As basketball fans know, our beloved announcers all have their pet sayings when calling games: Mike Breen’s “BANG!” after big shots, Mark Jackson’s “Hand down, man down” (which doesn’t really make sense), Hubie Brown’s “Now now, come on now”. One of my favorite announcers though is Jeff Van Gundy because of all his random ideas and thoughts he spurts out on every broadcast.
Usually, a lot of Jeff’s thoughts are just dumb things he says to kill time during blowouts, but there is one point he consistently makes that has interested me for a while. Why do teams always play better at home? After all, the rims are still 10 ft. tall, the ball’s the same, the coaches and players are the same, and the court is still 94×50 ft. of hardwood no matter where you play.
Yet year after year, teams almost always have a better home record than away record – even the terrible teams! In fact, not only do teams win more at home, they win MUCH more. The 10-72 76ers this year won 17.1% (7-34) of their home games, but only 7.3% (3-38) of their road games – more than twice as much at home!
In fact, looking at 12 years of data since the 2004-05 season[1], we see that teams had a better or equal home record than overall record 97.34% of the time! (329 out of 338 instances).[2] Why do teams consistently play so much better at home?
From what I gathered, there are three prevailing theories on why home court advantage is a thing.
- The home crowd somehow makes the home team play better, or away team play worse.
- The refs give the home team more calls.
- Scheduling. Players have more time to rest and get comfortable at home, hence they play better.
In this three-part series, I’m going to take a look at each theory and analyze its merit based on historical data using statistical methods. To start off, let’s look at Theory #1 – the crowd can influence games.
To measure a crowd’s impact, I used home Attendance Percentage, assuming that a larger crowd has more of an impact on the home team. Unfortunately, this is an imperfect stat because it does not capture crowd enthusiasm; a large crowd is not necessarily a loud or hostile crowd. It would be great if I could get something like crowd decibel level or number of fan ejections, but I couldn’t find anything like that, so Attendance PCT it is.
Collecting the Data
I want to test whether or not there is a significant relationship between home Attendance PCT and Home Record Outperformance, that is, whether crowd size influences a team’s home court advantage. This part is tricky; I can’t just use Home Record PCT as the dependent variable because there will be bias – good teams will naturally have better home records and attract higher attendance. Good teams will win a lot of games both at Home and on the Road, just as bad teams will lose a lot of games at Home and Away, which is why Home Record is not a good indicator of home court advantage.
To eliminate this bias, I regressed a team’s Outperformance at Home vs. Away (how much more often a team wins at Home than on the Road) on attendance. This way, I can standardize teams regardless of Overall Record by using their Away record as a kind of baseline.
For example, the 32-50 Knicks this year went 18-23 (43.90%) at Home and 14-27 (34.15%) on the Road. Their Outperformance at Home was $43.90\% - 34.15\% \approx 9.76\%$ (exactly 4/41, or four more wins at Home than on the Road over 41 games each).
By comparison, the 73-9 Warriors’ outperformance was 12.19%. Notice how I am not punishing the Knicks for losing 50 games nor rewarding the Warriors for winning 73. Instead, I am simply comparing each team’s Home performance relative to how they performed on the Road. In this way, outperformance does a better job than Home Record of measuring the true value of Home Court Advantage.
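The outperformance calculation above can be sketched in a few lines of R. This is only an illustration: `outperformance` is a hypothetical helper name that does not appear in the original script, and the records plugged in are the Knicks and 76ers splits quoted earlier in this piece.

```r
# Home Outperformance = home win % minus road win %.
# `outperformance` is a hypothetical helper, not part of the original analysis script.
outperformance <- function(home_w, home_l, away_w, away_l) {
  home_pct <- home_w / (home_w + home_l)
  away_pct <- away_w / (away_w + away_l)
  home_pct - away_pct
}

# Home and road records quoted in the article (2015-16 season)
knicks <- outperformance(18, 23, 14, 27)
sixers <- outperformance(7, 34, 3, 38)

# Express the gap in percent and in games over a 41-game home schedule
knicks_pct   <- round(100 * knicks, 2)
knicks_games <- knicks * 41
print(c(knicks_pct = knicks_pct, knicks_games = knicks_games))
```

Both teams come out at 4/41, even though their overall records are very different, which is the point of using the Road record as the baseline.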
To summarize, Attendance PCT is our independent variable and Home Outperformance PCT is our dependent variable.
To begin, I collected 12 seasons of Attendance, Home record, and Away record data for all 30 teams from ESPN, starting with the 2004-05 season and ending with the 2015-16 season.
Two things to note from this data set: 1. Some teams have over 100% attendance because they sell standing room only tickets (I assume). 2. ESPN was missing attendance for some teams for some earlier seasons, so I have 338 records instead of 360 (= 12 seasons *30 teams) in my data.
If you are interested in looking at the data I collected, you can find the Excel in the Resources section at the bottom of this piece.
Here is a scatter plot of Home Outperformance PCT vs. Attendance PCT. Teams with below zero Home Outperformance actually won more games on the Road than they did at Home that season.
We now need to make sure that our data conforms to the assumptions of a linear regression (such as homoskedasticity). Thankfully, R makes this easy.
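As a rough idea of what such a check can look like, here is a minimal sketch of a Breusch-Pagan style test (Koenker's studentized version) in base R. It runs on simulated data, not the real attendance data: the numbers in `sim` are made up for illustration, and only the column names `Attendance_PCT` and `Diff` are borrowed from the script at the end of this piece.

```r
# Simulated stand-in for the real data set (values are made up for illustration)
set.seed(1)
n <- 338
sim <- data.frame(Attendance_PCT = runif(n, min = 0.6, max = 1.05))
sim$Diff <- 0.10 + 0.10 * sim$Attendance_PCT + rnorm(n, sd = 0.10)

# Fit the same kind of one-predictor model
fit <- lm(Diff ~ Attendance_PCT, data = sim)

# Auxiliary regression: do the squared residuals depend on the predictor?
sim$resid_sq <- resid(fit)^2
aux <- lm(resid_sq ~ Attendance_PCT, data = sim)

# Under homoskedasticity, n * R^2 of the auxiliary regression is
# approximately chi-square with 1 degree of freedom (one predictor)
bp_stat <- n * summary(aux)$r.squared
bp_p <- pchisq(bp_stat, df = 1, lower.tail = FALSE)
print(c(statistic = bp_stat, p_value = bp_p))
```

A small p-value would suggest the residual spread changes with attendance. For a visual check, `plot(fit, which = 1)` draws residuals against fitted values, and the `lmtest` package offers `bptest()` as a packaged version of this test.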
Best and Worst Fanbases by Attendance
It seems like every team has the best fans in the world, but which fans actually back it up by consistently filling up the seats? Here are the top 10 teams in average home Attendance PCT over the last 12 seasons.[3] League average was 91%.
Top 10 Teams by Attendance, 2004-16
|Team||Avg Attendance PCT|
Those poor Knicks fans. Over the last 12 seasons, the mostly star-less Knicks had an average Overall record of 40.62% (33-49) and made the playoffs only 3 times, yet they still attracted higher attendance than the Kobe-led Lakers.
Here are the teams with the worst average attendance over the past 12 seasons.
Worst 10 Teams by Attendance, 2004-16
|Team||Avg Attendance PCT|
Interestingly, the “Highlight Factory” Hawks had the 4th highest Overall record over the last 12 seasons at 58.49%, but the 10th worst attendance.
Best and Worst Home Court Advantage
Here are the 10 best and worst teams in terms of Home Court Outperformance PCT. You can consider these teams to have the best/worst Home Court advantage over the last 12 seasons. Remember, outperformance does not consider how good or bad the team was overall, just how many more games it won at Home than Away. League average was 18.95% (7.77 more games won at Home than Away).
Top 10 Teams by Home Outperformance
|Team||Avg Home Outperformance PCT|
*Note that Hornets refers to the Charlotte franchise, not the old New Orleans Hornets.
If you didn’t know, the Utah and Denver home court advantage is largely because of their time zone difference and high altitude, especially when the away team is on the second leg of a back-to-back. This is actually kind of an unfair advantage that NBA owners have brought up in annual meetings and that the NBA has tried to address.
Worst 10 Teams by Home Outperformance
|Team||Avg Home Outperformance PCT|
Notice how even the worst Home Court Advantage teams still have positive Outperformance PCT. No doubt that the Home Court Advantage is real. Now to find out why.
I ran a single-factor linear regression on the data using R – just a fancy way of saying line of best fit, or $y = ax + b$, where the independent variable $x$ = Attendance PCT and the dependent variable $y$ = Home Outperformance PCT.
The middle blue line is our linear regression, or line-of-best-fit. The upper and lower lines are our 95% confidence interval lines. With 336 degrees of freedom, here is the Ordinary Least Squares (OLS) regression equation:
$y = 0.09965x + 0.09882$
In other words, for every 1 percentage point increase in Attendance PCT, Home Outperformance PCT increases by 0.09965 percentage points.
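To get a feel for the size of that slope, here is a small sketch that plugs numbers into the fitted line. The coefficients are the ones from the code listing at the end of this piece. `predict_outperformance` is a hypothetical helper name, and attendance is assumed to be stored as a fraction (0.91 for 91%), as the axis limits in the plotting code suggest.

```r
# Coefficients of the fitted line, taken from the code listing (fun.1)
intercept <- 0.09882
slope     <- 0.09965

# Hypothetical helper: predicted Home Outperformance for an Attendance PCT given as a fraction
predict_outperformance <- function(attendance_pct) intercept + slope * attendance_pct

# Prediction at the league-average attendance of 91% quoted earlier
at_league_avg <- predict_outperformance(0.91)

# Effect of one extra percentage point of attendance, as a fraction and in home wins (41 home games)
gain_per_point <- predict_outperformance(0.92) - at_league_avg
extra_wins     <- gain_per_point * 41

print(c(at_league_avg = at_league_avg, gain_per_point = gain_per_point, extra_wins = extra_wins))
```

The prediction at average attendance lands at roughly 0.1895, in line with the 18.95% league-average outperformance, while one more point of attendance is worth only about 0.04 extra Home wins per season.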
Cool, we have the regression model. Seems like Attendance just marginally has a positive relation to Home Court Advantage (Outperformance), but we have to make sure. We do this by testing how well our model fits the data.
First off, how confident are we that our coefficient (‘a’) of 0.09965 is significantly different from 0? In other words, is the relationship we found between Attendance and Home Outperformance PCT more than just chance? We can verify our coefficient’s significance using a hypothesis test where the Null Hypothesis is
$H_0: a = 0$
and the Alternative Hypothesis is
$H_a: a \neq 0$
Using a 95% confidence interval, we calculate with 95% confidence that the true value of the coefficient lies between -0.02023042 and 0.2195319. We also find that the p-value of our coefficient is 0.103.
Since zero lies within our confidence interval, we cannot reject the Null Hypothesis that ‘a’ is equal to zero. This conclusion is also supported by our relatively high p-value of 0.103, implying that we can only conclude ‘a’ is significantly different from zero with 89.7% or less confidence, which is too low; 95% confidence is considered standard.
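As a sanity check, the p-value can be recovered from the confidence interval alone. The sketch below assumes the slope bounds are the slope terms of `fun.2` and `fun.3` in the code listing at the end of this piece, and back-solves for the standard error, which the article does not report directly.

```r
# Slope estimate and its 95% bounds, taken from the code listing (fun.1, fun.3, fun.2)
slope    <- 0.09965
ci_lower <- -0.02023042
ci_upper <- 0.2195319
df       <- 338 - 2   # observations minus two estimated coefficients

# The interval is estimate +/- t_crit * se, so its half-width gives the standard error
t_crit <- qt(0.975, df)
se     <- (ci_upper - ci_lower) / (2 * t_crit)

# Two-sided t-test of H0: a = 0
t_stat  <- slope / se
p_value <- 2 * pt(-abs(t_stat), df)

print(c(se = se, t_stat = t_stat, p_value = p_value))
```

The recovered p-value comes out at about 0.103, so the interval and the p-value tell the same story: zero is a plausible value for the slope.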
More statistics support our findings. The Correlation between independent variable x (Attendance PCT) and dependent variable y (Home Outperformance PCT) is only $r = 0.0888$ and the Coefficient of Determination is $R^2 = 0.007894$, meaning that a measly 0.7894% of the dependent variable’s variance is explained by our independent variable. You might have already guessed it when you first saw the scatter plot visualization above, but it appears Attendance PCT has no significant impact on Home Outperformance. Just looking at the scatter plot alone, there is no clear relationship between x and y, linear or not. Looks like we need to search elsewhere to find the source of Home Court Advantage.
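For a regression with a single predictor, the Coefficient of Determination is simply the squared Correlation. The short sketch below applies that to the value 0.08884934 from the code listing at the end of this piece, then confirms the identity on simulated data (the simulated `x` and `y` are made up and have nothing to do with the attendance data).

```r
# Correlation value taken from the code listing
r <- 0.08884934
r_squared     <- r^2
pct_explained <- 100 * r_squared
print(c(r_squared = r_squared, pct_explained = pct_explained))

# Check the identity R^2 = cor(x, y)^2 on made-up data
set.seed(42)
x <- runif(100)
y <- 0.1 * x + rnorm(100)
identity_holds <- isTRUE(all.equal(cor(x, y)^2, summary(lm(y ~ x))$r.squared))
print(identity_holds)
```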
Interestingly, our test does NOT support Theory #1. Home crowd attendance shows NO statistically significant influence on a team’s outperformance at Home. In other words, there is no significant correlation between Attendance and Home Court Advantage (how much more a team wins at Home than Away).[4]
Specifically, we found that a 1% increase in Attendance leads to a 0.09965% increase in Home Court Advantage (Outperformance). However, we cannot conclude that this number is significantly different from zero (more than just a result of randomness) because of our high p-value and the fact that 0 is within our 95% confidence interval.
Home Court Advantage definitely exists, as evidenced by the league average of 18.95% outperformance at Home relative to Away. It’s just not correlated with Attendance PCT, and by extension, crowd size.
Having found no support for Theory #1, we will take a look at Theory #2 (refs are biased for the home team) in Part 2.
Regression Code: R and RStudio
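# Load the packages used below
library(readxl)   # read_excel()
library(ggplot2)  # ggplot(), geom_point(), stat_function(), theme()
# Set the working directory to the folder that contains the workbook before running
State <- read_excel("Attendance-Data.xlsm", sheet = "Sheet1")
# Drop team-seasons with missing attendance
State <- State[!is.na(State$Attendance_PCT),]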
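# Scatter plot of Home Outperformance (Diff) vs. Attendance PCT, saved to variable qq
# (ggplot2 ignores the `text` aesthetic; it only matters for tooltip tools such as plotly)
qq <- ggplot(data = State, aes(x = Attendance_PCT, y = Diff)) +
  geom_point(aes(text = paste("Home Outperformance:", Diff)), size = .5) + ylim(-.1, .6) + xlim(.5, 1.05)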
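# Linear regression model: Home Outperformance (Diff) on Attendance PCT
reg1 <- lm(Diff ~ Attendance_PCT, data = State)
# Residual degrees of freedom: n - 2 = 338 - 2 = 336
DF <- df.residual(reg1)
# Coefficient estimates and standard errors (first element: intercept, second: slope)
param <- summary(reg1)$coefficients[, 1]
se <- summary(reg1)$coefficients[, 2]
# Upper and lower 95% confidence bounds for each coefficient (same result as confint(reg1))
param + qt(0.975, DF) * se
param - qt(0.975, DF) * se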
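# Fitted line: intercept 0.09882, slope 0.09965
fun.1 <- function(x) 0.09882 + 0.09965*x
# Upper line: upper 95% bounds of the intercept (0.2085601) and the slope (0.2195319)
fun.2 <- function(x) 0.2085601 + 0.2195319*x
# Lower line: lower 95% bounds of the intercept (-0.01092759) and the slope (-0.02023042)
fun.3 <- function(x) -0.01092759 + -0.02023042*x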
qq + stat_function(fun = fun.1) +
stat_function(fun = fun.2) +
stat_function(fun = fun.3) +
theme(axis.title.x = element_text(face="bold", size=16),
axis.text.x = element_text(vjust=0.5, size= 16))+
theme(axis.title.y = element_text(face="bold", size=16),
axis.text.y = element_text(vjust=0.5, size= 16))
# Goal: find the line of best fit
# y = mx + b
# y = 0.09965x + 0.09882
# and get the correlation r (0.08884934), so R^2 = r^2 = 0.007894
# and get a 95% confidence interval for each coefficient
# upper bounds: y = 0.2195319x + 0.2085601
# lower bounds: y = -0.02023042x - 0.01092759
# y = 1.1183014x - 0.1304357
# looking to test whether the slope b1 is significantly different from 0
# p-value: 0.103
#Revert chart plots
*Written with help from Jasper Wu
1. I use 2004-05 as a starting point because that is when Charlotte rejoined the league as the Bobcats.
2. I only looked at teams with valid attendance records, hence 338 entries instead of 30*12=360. ESPN was missing 22 attendance records.
3. Hornets refers to the Charlotte franchise, not the old New Orleans Hornets.
4. Again, I want to point out the caveat I mentioned at the beginning. We are only measuring Attendance PCT (crowd size), not how enthusiastic the crowd is. There may be some other way the crowd impacts the game beyond just attendance, like level of noise produced. However, that kind of data is harder to find; I don’t even know if arenas record average noise level.