Every year, 30 teams led by the greatest coaches and basketball players in the world grind it out to win the last game of the season. The brightest basketball minds search desperately to find the right mixture of talent, coaching, and star power, and pray for a little bit of luck. But before the season starts, before the games are played, the outcome has already been decided.

“Every battle is won or lost before it’s ever fought.” – Sun Tzu, The Art of War.

“Championships are won with defense,” the cliché goes, but in reality, championships are won with a single phone call to the Barclays Center in Brooklyn long before a season begins. Before a brush is lifted, the canvas must first be prepared. For every team, there is no event of greater importance than the NBA Draft.

The Draft. Where dynasties are built on the back of a Tim Duncan, or where franchises are sucked into lottery hell by a Darko Miličić. An event where not a single basketball is dribbled, but one that determines the fate of billion-dollar franchises for years to come. I am absolutely fascinated by it.

In honor of the 2016 NBA Draft taking place next week, I wrote a Python script to scrape advanced statistics, draft results, and salary information for the past 8 seasons. I then ran a statistical analysis on the data to find the average expected value of each draft position in terms of Additional Wins produced vs. Salary. I hope you enjoy the read.

## Objectives

To begin my analysis, I first needed to find a player’s *value*. Easier said than done. Using classic stats such as Points per Game overvalues inefficient chuckers, but advanced stats like Net Rating and PER are highly correlated with team record – a player is rewarded for simply playing with other good players. After testing multiple advanced stats, I decided to use Value Over Replacement Player (VORP), which measures how much a player produces relative to an average, minimum-cost replacement player, adjusting for minutes and games played.^{1} VORP alone with no context does not tell us much, but by multiplying VORP by 2.7, we can get a rough estimate of **Wins Over Replacement (WOR)** – the number of **Additional Wins** a player theoretically produces for his team. Now here is a meaningful stat I can use!
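As a quick illustration of that conversion, here is a minimal Python sketch. The function and constant names are my own labels for this example, not part of any library; the 2.7 multiplier is the rough rule of thumb described above.

```python
# Rough conversion from VORP to Wins Over Replacement (Additional Wins).
# WINS_PER_VORP and the function name are illustrative labels only.
WINS_PER_VORP = 2.7


def wins_over_replacement(vorp: float) -> float:
    """Estimate Additional Wins from a single-season VORP value."""
    return vorp * WINS_PER_VORP


# A VORP of 9.8 maps to 26.46 Additional Wins, the figure this post
# lists for Stephen Curry in 2015-16.
print(round(wins_over_replacement(9.8), 2))  # 26.46
```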

## Scraping the Data

Luckily, basketball-reference has VORP calculated for every player for every season since 1974, downloadable as a CSV. Awesome. There’s just one problem: there is no shortcut to pull multiple seasons of data in one go, so you would have to download each season individually and combine them into a single file. Unfortunately I don’t have an intern, so I wrote a Python script that uses the *BeautifulSoup* and *Pandas* modules with some regex to automatically scrape advanced statistics for a customized time frame and dump them all into a spreadsheet. I don’t want to bore you with technical jargon here, but those interested in web scraping can check out the downloadable code at the bottom of this post and email me with any questions. There is also a link to download the Excel sheet with the raw data at the bottom.
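For a taste of the regex part without any network access, here is a small sketch of two helpers that mirror what the script does: pulling a player ID out of a profile link and turning a season end year into a label. The helper names are hypothetical; the pattern and the label format follow the script at the bottom of the post.

```python
import re

# Capture the last path segment before the file extension.
PLAYER_ID_PATTERN = re.compile(r'/([^/?]+)\.')


def player_id_from_href(href: str) -> str:
    """Return the player ID embedded in a profile link."""
    match = PLAYER_ID_PATTERN.search(href)
    if match is None:
        raise ValueError(f"no player ID found in {href!r}")
    return match.group(1)


def season_label(end_year: int) -> str:
    """Turn a season end year (2008) into a label like '2007-08'."""
    return f"{end_year - 1}-{str(end_year)[-2:]}"


print(player_id_from_href('/players/a/acyqu01.html'))  # acyqu01
print([season_label(year) for year in range(2008, 2011)])
# ['2007-08', '2008-09', '2009-10']
```

Keying every row by player ID rather than by name is what lets the later salary and draft lookups line up with the right player.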

###### Findings – Additional Wins

I ran my script to get 8 years of data from 2007-16 on qualified players^{2}. I just arbitrarily chose 8 years, but the time frame is adjustable.

In the tables below, think of Additional Wins as the number of extra wins a player produced for his team. A negative Additional Wins number means that a player was so bad that his team would have actually won more games by subbing a minimum-cost replacement. **The average WOR (Additional Wins) for all qualified players from 2007-16 is 2.45**. Here are the 10 best and worst players since 2007 ranked by Additional Wins.

## Top 10 Players, 2007-16

Season | Player | Team | Additional Wins |
---|---|---|---|
2008-09 | LeBron James | CLE | 31.32 |
2009-10 | LeBron James | CLE | 29.43 |
2007-08 | LeBron James | CLE | 27.27 |
2008-09 | Chris Paul | NOH | 27.00 |
2012-13 | LeBron James | MIA | 26.46 |
2015-16 | Stephen Curry | GSW | 26.46 |
2008-09 | Dwyane Wade | MIA | 26.19 |
2007-08 | Chris Paul | NOH | 22.95 |
2013-14 | Kevin Durant | OKC | 22.95 |
2015-16 | Russell Westbrook | OKC | 22.41 |

## Worst 10 Players, 2007-16

Season | Player | Team | Additional Wins |
---|---|---|---|
2012-13 | Kevin Seraphin | WAS | -4.32 |
2012-13 | Michael Beasley | PHO | -4.32 |
2007-08 | Jeff McInnis | CHA | -3.78 |
2008-09 | Earl Watson | OKC | -3.78 |
2013-14 | Andrew Nicholson | ORL | -3.78 |
2014-15 | Lance Thomas | OKC/NYK | -3.78 |
2007-08 | Nenad Krstic | NJN | -3.51 |
2007-08 | Mark Blount | MIA | -3.51 |
2007-08 | Earl Barron | MIA | -3.51 |
2008-09 | Nick Young | WAS | -3.51 |

## Relative Value of Production

It’s not a surprise that Superstars have the highest Wins Over Replacement over the last 8 years. After all, they get a lot of money to be very good. What I’m more interested in though, is **relative value**. Which players produce the most Wins per dollar they are paid?

In the finance world, value is measured by the P/E ratio – the Price you pay per $1 of Earnings. A lower than average P/E ratio means an asset is undervalued; a higher one means the asset is overvalued.^{3} To translate this to basketball, I created a stat I call **Price per Additional Win (P/AW ratio)**, which measures how much a player is paid per Additional Win he generates. Remember that Additional Wins is a synonym for Wins Over Replacement (WOR).

P/AW is simply calculated as

**P/AW = Player Salary/(Wins over Replacement)**

for a specific season. For example, Steph Curry led the league this year by producing 26.46 Additional Wins for the Warriors. Curry was paid $11,370,786, so his P/AW is $11,370,786/26.46 = **$429,735**. This means the Warriors paid Curry $429,735 for each Additional Win he produced. By contrast, Kevin Love this year had a P/AW of **$2,579,365** ($15,719,062/6.75). The Cavs paid Kevin Love $2,579,365 per extra win he gave them. Since all wins are worth the same, Steph Curry not only produced more wins than Love did this year, he did so almost 6x more efficiently (from an owner’s perspective). In other words, Steph Curry was heavily *undervalued* relative to Kevin Love.
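The formula is easy to check in a few lines of Python. This is a minimal sketch: the function name is hypothetical, and returning `None` when Additional Wins is zero or negative is my own choice for handling cases where the ratio is not meaningful.

```python
from typing import Optional


def price_per_additional_win(salary: float, additional_wins: float) -> Optional[float]:
    """Salary paid per Additional Win, or None when the ratio is not meaningful."""
    if additional_wins <= 0:
        # Assumption for this sketch: skip players at or below replacement level.
        return None
    return salary / additional_wins


# Curry's 2015-16 figures as quoted above.
curry = price_per_additional_win(11_370_786, 26.46)
print(f"${curry:,.0f}")  # $429,735
```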

To find P/AW, I needed to write another function that would capture every player’s salary for each season. Again, this information is on basketball-reference, but it was a little trickier to scrape this time since I had to visit each individual player’s profile page. Coders can look at the bottom of this post for more details. After finally getting the script to work, I ran it again for the past 8 years. Here are the 10 most Undervalued players for 2015-16 first, then since 2007.^{4} Note that P/AW, like P/E, is invalid for players with negative WOR, so I can’t make an accurate most-overvalued list. **The average P/AW for qualified players from 2007-16 was $3,100,424**.^{5}
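The salary figures come off the page as text like `$11,370,786`, so they have to be cleaned before any division can happen. Here is a minimal sketch of that step using the same `Decimal` and `re.sub` approach as the script at the bottom; the helper name and the sample row are illustrative.

```python
from decimal import Decimal
from re import sub


def parse_salary(text: str) -> Decimal:
    """Drop every character except digits and '.', then convert to Decimal."""
    return Decimal(sub(r'[^\d.]', '', text))


# Sample scraped row: (season label, salary text as displayed on the page).
scraped = [('2015-16', '$11,370,786')]

# Key salaries by season, using '_' instead of '-' as the script does.
salary_by_season = {season.replace('-', '_'): parse_salary(text)
                    for season, text in scraped}

print(salary_by_season['2015_16'])  # 11370786
```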

## Top 10 Undervalued Players, 2007-16

Season | Player | Team | Additional Wins | Salary | P/AW |
---|---|---|---|---|---|
2009-10 | Anthony Tolliver | POR/GSW | 1.62 | $51,983 | $32,088 |
2007-08 | Jamario Moon | TOR | 7.02 | $427,163 | $60,849 |
2014-15 | Draymond Green | GSW | 11.88 | $915,243 | $77,041 |
2014-15 | Rudy Gobert | UTA | 11.61 | $1,127,400 | $97,106 |
2007-08 | Monta Ellis | GSW | 7.83 | $770,610 | $98,418 |
2008-09 | Paul Millsap | UTA | 8.10 | $797,581 | $98,467 |
2012-13 | Chandler Parsons | HOU | 8.91 | $888,250 | $99,691 |
2010-11 | Landry Fields | NYK | 4.59 | $473,604 | $103,182 |
2012-13 | Patrick Beverley | HOU | 2.70 | $281,377 | $104,214 |
2013-14 | Isaiah Thomas | SAC | 8.37 | $884,293 | $105,650 |

Notice how almost all these players were on the tail end of their rookie-scale contracts. This is one reason why draft picks are so coveted by NBA teams. No matter how much a drafted player overachieves, he is locked into getting paid peanuts for his first 3-4 years.

## Expected Value per Draft Position

Now, onto what I originally set out to do. I want to find out two things:

- The average Additional Wins per draft pick
- The average P/AW ratio per draft pick

In other words, I am finding how many Additional Wins the average player in each draft position produces, and what is the value of that production relative to his salary. I will only take players drafted in the 2007 Draft and onward since that is the time range of my data. Since a lot of rookies barely play, I decided to include all players, not only qualified ones (≥500 minutes played a season). But before I can continue my analysis, I have to adjust for a thing called Survivorship Bias. Let me explain.

###### Adjusting for Survivorship Bias

Let’s say I wanted to find Additional Wins for players drafted 60th overall from 2011-14 – three total seasons. Therefore, the player drafted in 2011 should have 3 seasons of data, the player drafted in 2012 should have 2 seasons, and the player drafted in 2013 should have 1 season of data for a total of 6 seasons. Here is how the actual data looks.

Season | Player | Additional Wins | Draft Year |
---|---|---|---|
2011-12 | Isaiah Thomas | 2.97 | 2011 |
2012-13 | Isaiah Thomas | 2.70 | 2011 |
2013-14 | Isaiah Thomas | 8.37 | 2011 |
2012-13 | Robert Sacre | -0.81 | 2012 |
2013-14 | Robert Sacre | -0.27 | 2012 |
Average | | 2.592 | |

Wow, that’s pretty good for the 60th pick, an average of 2.592 Additional Wins! But hold on a second, there are only 5 entries here – where is the 6th?? Well, the missing entry is the 60th pick in 2013 – *Jānis Timma*. Our friend Jānis never played in 2013-14, so we don’t have data for him. This is a classic case of survivorship bias. Jānis never made it to the NBA, so we didn’t account for him in our data. Consequently, our numbers will be inflated because we did not account for Jānis’ production of 0. In other words, we need to include players who did not **“survive”** the journey from the Draft to an NBA court. If we analyze only players who “survived”, we will have an overly optimistic view on what we can expect from each draft position. By including Jānis’ (lack of) production, the average Wins Over Replacement drops from 2.592 to 2.16.

Calculating average Additional Wins per draft position is simply

**Avg Additional Wins = ∑(Additional Wins)/(# of Seasons)**

I adjusted for survivorship bias by simply calculating the total number of seasons of data there should be for a given time frame (45 seasons for 2007-16 picks), and using that number as the denominator when calculating average Additional Wins. Therefore, each draft position should have the same denominator. Since a player who did not play has an Additional Wins value of 0, no adjustment is needed for the numerator.
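To make the adjustment concrete, here is a short sketch that reproduces the 60th-pick example from the table above. The function names are hypothetical; the only logic is counting how many seasons each draft class could have played and dividing by that count instead of by the number of rows we happen to have.

```python
def expected_seasons(first_draft_year: int, last_draft_year: int,
                     last_season_end_year: int) -> int:
    """Seasons one draft slot could have produced across all draft classes."""
    return sum(last_season_end_year - draft_year
               for draft_year in range(first_draft_year, last_draft_year + 1))


def adjusted_average(additional_wins: list, seasons_expected: int) -> float:
    """Average that counts every missing season as 0 Additional Wins."""
    return sum(additional_wins) / seasons_expected


# The five seasons actually recorded for 60th picks (Thomas x3, Sacre x2).
pick_60 = [2.97, 2.70, 8.37, -0.81, -0.27]

naive = sum(pick_60) / len(pick_60)
adjusted = adjusted_average(pick_60, expected_seasons(2011, 2013, 2014))
print(round(naive, 3), round(adjusted, 2))  # 2.592 2.16

# Draft classes 2007 through 2015, seasons through 2015-16.
print(expected_seasons(2007, 2015, 2016))  # 45
```

The second call shows where the 45-season denominator comes from: 9 + 8 + ... + 1 seasons across the draft classes in the data.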

###### Findings – Average Additional Wins per Draft Position

Interesting. This graph looks kind of like a stock chart with jagged ups and downs instead of a smooth linear regression as one might expect. However, as draft position increases, the average Additional Wins decreases, an inverse relationship which is to be expected. Some other things I noticed:

- There are 10 picks (16.67% of the draft) which actually lose their team wins on average:
  - 20, 28, 32, 36, 44, 51, 52, 54, 58, 59
- First overall pick has the highest Additional Wins at 4.734, even taking into account busts like Greg Oden and Anthony Bennett.
- There is a HUGE drop-off in the value of the 2nd overall pick (only 2.466, then back up to 4.422 and 4.41 for 3rd and 4th) and this data set even included 8 seasons of Kevin Durant since 2007! Here are the other players who were drafted 2nd overall: Michael Beasley, Hasheem Thabeet, Evan Turner, Derrick Williams, Michael Kidd-Gilchrist, Victor Oladipo, Jabari Parker, and D’Angelo Russell… Okay, makes sense. Call it the curse of the 2nd pick. Good Luck, Lakers!
- Wondering what that random spike is for the 35th pick? DeAndre Jordan and Draymond Green carrying hard.
- What about that 48th pick spike? Marc Gasol, carrying harder.
- The top 5 win-producing draft positions in order – 1st, 3rd, 4th, 9th, and 7th. A lot of solid players consistently drafted 9th actually: Joakim Noah, DeMar DeRozan, Gordon Hayward, Kemba Walker, and Andre Drummond.
- Worst 5 win-producing draft positions – 20th, 36th, 52nd, 32nd, 28th.
- The real “Mr. Irrelevant” is the 59th pick. Of the eight 59th picks drafted in the last 8 years, only D.J. Strawberry (2007) has played in the NBA, and even then he only played 270 minutes in one season.

2007-16 gives me 45 seasons of total data for each draft position. I feel like that is a good sample size, but I wonder how expanding my time frame would affect the data set. I think I will cover that in a follow-up post later. If you are interested in exploring further on your own, you can find the full data I used in the “Draft Position Value Raw” sheet from the Excel at the bottom of this post.

Onto the good stuff. **Relative** value per draft position by P/AW.

###### Findings – Relative Value by P/AW

Again, P/AW is good for finding undervalued players, but not for finding overvalued players because it’s not valid when Additional Wins is negative. Therefore, for this analysis, I focused only on draft positions with below league median P/AW. **The lower the P/AW value, the more undervalued**.

- Interestingly, there is an inverse relationship between draft position and P/AW. As draft position increases, you can find better bang-for-your-buck. Since Additional Wins trends downward with draft pick position, this implies that **Salary paid decreases at a rate faster than Wins Over Replacement decreases as we get later into the draft**.
- There are 19 picks that have below median P/AW and 41 that have above median P/AW. That means only 31.67% of the picks are true bargains.
- Of those 19 bargain picks, 11 are in the 1st round and 8 in the 2nd round.
- The Top 10 picks by lowest Price per Additional Win. These have been the highest relative value picks over the last 8 years (most undervalued). In order:
  - 46, 60, 42, 30, 22, 45, 34, 9, 48, 4
- Lots of high-value gems to be found in that 2nd round. Some notable examples – Danny Green (46), Isaiah Thomas (60), Patrick Beverley (42), Goran Dragic (45), Marc Gasol (48).
- This is a long-term numbers game. All a team needs to do is find one gem in the later picks to strike gold value-wise. Finding one Goran Dragic more than makes up for drafting 7 nobodies.

## Just for Kicks

Since my script captured a bunch of extra data, I also did some quick analysis to find average Wins Over Replacement based on qualified player **Age** (at the start of a season). Just for fun.

The fact that players aged 25 produce the highest Additional Wins is not surprising, but I was surprised at the hefty production a 35- or 36-year-old veteran can provide. Here are the top 10 Additional Wins seasons for players aged 35 or 36.

Season | Player | Age | Team | Additional Wins |
---|---|---|---|---|
2008-09 | Jason Kidd | 35 | DAL | 12.69 |
2009-10 | Jason Kidd | 36 | DAL | 12.42 |
2010-11 | Ray Allen | 35 | BOS | 11.07 |
2009-10 | Marcus Camby | 35 | LAC/POR | 10.26 |
2013-14 | Dirk Nowitzki | 35 | DAL | 9.45 |
2015-16 | Pau Gasol | 35 | CHI | 9.45 |
2012-13 | Paul Pierce | 35 | BOS | 9.18 |
2012-13 | Tim Duncan | 36 | SAS | 8.64 |
2009-10 | Ben Wallace | 35 | DET | 7.83 |
2009-10 | Steve Nash | 35 | PHO | 7.02 |

I wanted to filter by Position, but a vast majority of players play multiple positions over the course of a season (e.g. Marcus Morris at PF-SF), which makes it impossible to determine what production occurred at which position. Doing a pivot table on the data, the numbers for each position looked roughly equal without adjustments, hovering around 2.5 Additional Wins per position.

Which team is best at getting the most value out of its players? Here is P/AW by team – the lower the number, the more efficiently the team pays for wins.^{6} Note that I removed DET, LAC, PHI, CHA because they were so bad that they were actually paying their players to lose games from 2007-16 (negative P/AW).

## Resources

Downloadable Excel — Wins Over Replacement Data, 2007-16

```python
# Extract VORP, draft position, and salary data from basketball-reference using Beautiful Soup
# Author: Jasper Wu
# Date: 06/09/2016

from bs4 import BeautifulSoup
from urllib.request import urlopen
from re import sub
from decimal import Decimal
import openpyxl
import re
import warnings
import pandas as pd

# Set iPython's max row display
pd.set_option('display.max_rows', 1000)
# Set iPython's max number of displayed columns to 50
pd.set_option('display.max_columns', 50)

warnings.filterwarnings("ignore")

# Gather data between these seasons (refer to season end year, so 2007-08 would be 2008)
begDate = 2008
endDate = 2016

# Load data to this workbook. Change the location/name to wherever you put the file
wbLocation = r'C:\Users\Jasper\Documents\Basketball Blog Material\Posts\Jasper - Web Scraping\VORP Data.xlsx'
wb = openpyxl.load_workbook(wbLocation)

# We will dump our data into the sheet called 'VORP'
ws = wb['VORP']
last_row = ws.max_row


# Grab all advanced stats
def get_VORP(season, last_row):
    # Get advanced stats for all players for the given season and store them in Excel
    url = 'http://www.basketball-reference.com/leagues/NBA_' + str(season) + '_advanced.html'
    html = urlopen(url).read()
    soup = BeautifulSoup(html, 'lxml')
    rows = soup.find_all('tr', {'class': 'full_table'})
    tdcount = len(rows[1].findAll('td'))
    original_last = last_row

    # For every table row, get each TD
    for row in range(0, len(rows) - 1):
        last_row += 1
        columnCount = 3
        ws.cell(row=last_row, column=1).value = str(season - 1) + "-" + str(season)[-2:]
        for td in range(1, tdcount):  # no need to start from 0 because 0 is Rk
            # Convert the data to numbers. If data can't be converted, pass it as a string
            try:
                ws.cell(row=last_row, column=columnCount).value = float(rows[row].findAll('td')[td].text)
            except ValueError:
                ws.cell(row=last_row, column=columnCount).value = rows[row].findAll('td')[td].text
            columnCount += 1

    # This part is for finding the player ID in a 'href'. Use regex to eliminate the noise.
    # From '/players/a/acyqu01.html', just return acyqu01
    p = re.compile(r'\/([^\/\?]+)\.')

    # Get the playerIDs from links
    for row in range(0, len(rows) - 1):
        original_last += 1
        columnCount = 2
        balls = rows[row].findAll('a')[0].get('href')
        ws.cell(row=original_last, column=columnCount).value = re.search(p, balls).group(1)

        # This is to calculate Wins Over Replacement Player (2.7 * VORP)
        columnCount = 31
        ws.cell(row=original_last, column=columnCount).value = float(ws.cell(row=original_last, column=columnCount - 1).value) * 2.7

    wb.save(wbLocation)


# Returns dict of all row indexes where a unique player ID appears in the ws and seasons
# Keys: Season, rowIndex
def get_row_season(uniquePlayerID):
    row_season = dict.fromkeys(['Season', 'rowIndex'])
    seasons = []
    rows = []
    for row in range(2, ws.max_row + 1):
        if ws.cell(row=row, column=2).value == uniquePlayerID:
            # Can't use "-" in seasons as dict key later on
            seasons.append(ws.cell(row=row, column=1).value.replace('-', '_'))
            rows.append(row)
    row_season['Season'] = seasons
    row_season['rowIndex'] = rows
    return row_season


# For players currently listed, get their draft/salary information and write it to the sheet.
def get_Draft_Salary(ws, uniquePlayerIDs):
    for player in uniquePlayerIDs:
        print(player)
        full_info = get_row_season(player)
        player_url = 'http://www.basketball-reference.com/players/' + str(player)[0] + '/' + str(player) + '.html'
        html = urlopen(player_url).read()
        soup = BeautifulSoup(html, 'lxml')
        salaryDict = {}

        # ----------------------------- DRAFT Information ---------------------------------------
        # Older players in basketball-reference have Draft data stored in person_image_offset.
        # Newer players don't have person_image_offset.
        if not soup.find_all('div', {'class': 'person_image_offset'}):
            draft = soup.find_all('div', {'class': 'margin_left_half'})[0]
        else:
            draft = soup.find_all('div', {'class': 'person_image_offset'})[0]

        if "Draft:" in draft.text:
            matchDraft = re.compile(r'[\n\r].*Draft:\s*([^\n\r]*)')
            draftInfo = re.search(matchDraft, draft.text).group(1)

            matchTeam = re.compile(r'([^,]+)')
            draftTeam = re.search(matchTeam, draftInfo).group(1)
            for row in full_info['rowIndex']:
                ws.cell(row=row, column=34).value = draftTeam

            matchPosition = re.compile(r'\((.*)\)')
            positionInfo = re.search(matchPosition, draftInfo).group(1)
            matchPosition2 = re.compile(r',\s(.*?)[a-z]')
            draftPosition = re.search(matchPosition2, positionInfo).group(1)
            for row in full_info['rowIndex']:
                ws.cell(row=row, column=33).value = int(draftPosition)

            matchYear = re.compile(r'\d{4}')
            draftYear = re.search(matchYear, draftInfo).group(0)
            for row in full_info['rowIndex']:
                ws.cell(row=row, column=32).value = int(draftYear)
        else:
            for row in full_info['rowIndex']:
                ws.cell(row=row, column=32).value = 'Undrafted'
                ws.cell(row=row, column=33).value = 'Undrafted'
                ws.cell(row=row, column=34).value = 'Undrafted'

        # ---------------------------- GET salary Information ---------------------------------------------
        # Get historical salaries first
        try:
            salariesPast = soup.find_all('div', {'id': 'div_salaries'})[0]
            for td in range(0, len(salariesPast.findAll('td')) - 4, 4):
                season = salariesPast.findAll('td')[td].text.replace('-', '_')
                money = Decimal(sub(r'[^\d.]', '', salariesPast.findAll('td')[td + 3].text))
                salaryDict[season] = money
        except Exception:
            pass

        # Append current salary
        try:
            salariesNow = soup.find_all('div', {'id': 'all_contract'})[0]
            for entry in range(1, len(salariesNow.findAll('th'))):
                season = salariesNow.findAll('th')[entry].text.replace('-', '_')
                salary = Decimal(sub(r'[^\d.]', '', salariesNow.findAll('td')[entry].text))
                salaryDict[season] = salary
        except Exception:
            pass

        counter = 0
        # Loop through all seasons the player played and, if match, get salary and write to Excel
        for playedSeason in full_info['Season']:
            for season, salary in salaryDict.items():
                if playedSeason == season:
                    ws.cell(row=full_info['rowIndex'][counter], column=35).value = salary
            counter += 1

    wb.save(wbLocation)


# Append data from time range to end of excel sheet
for year in range(begDate, endDate + 1, 1):
    get_VORP(year, last_row)
    last_row = ws.max_row

# Get the unique playerIDs from the time range
VORP = pd.read_excel(wbLocation, sheet_name='VORP')
uniquePlayerIDs = VORP.PlayerID.unique()

# Append Draft data to all players
get_Draft_Salary(ws, uniquePlayerIDs)
```

### Footnotes

1. This is not a perfect stat because it is based on Box Score Plus Minus, but I think it does the best job of capturing value. A lot of other advanced stats I back-tested claimed that Boban Marjanović was better than Karl-Anthony Towns this season.
2. A qualified player played at least 500 minutes during the season.
3. Compared to similar assets.
4. Because of the rising salary cap & inflation, average P/AW will tend to trend upwards over time. Still working on an adjustment for this.
5. Ignoring players with negative WOR.
6. CHO is Charlotte Hornets. New Jersey Nets and Brooklyn Nets are treated as separate teams.
