The Importance of Sample Size
By Kevin Pelton
Jan. 22, 2004
I've been working recently on a theory that what separates smart people from less smart ones is their understanding of sample size. Lately, I've generalized the theory a bit further: the difference lies in correctly attributing what is luck and what is skill. Sample size plays heavily into that idea, but there are other elements as well.
In baseball, Bill James and others have written that the difference between a .250 hitter and a .300 one, over the course of the season, is virtually indistinguishable to the naked eye. The players are separated over that span by only a couple dozen hits -- for a player playing every day, about one a week.
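The arithmetic behind that claim is easy to check. The sketch below assumes about 500 at-bats and 26 weeks for an everyday player. Those are round numbers chosen for illustration, not figures from James. It also computes the binomial standard error of a season average, which treats each at-bat as an independent trial with a fixed hit probability. That is a simplification.

```python
import math


def hit_gap(avg_low: float, avg_high: float, at_bats: int) -> float:
    """Extra hits the higher-average hitter collects over `at_bats` at-bats."""
    return (avg_high - avg_low) * at_bats


def binomial_se(p: float, n: int) -> float:
    """Standard error of an observed rate when the true rate is p over n trials."""
    return math.sqrt(p * (1 - p) / n)


AT_BATS = 500  # assumption: a full season for an everyday player
WEEKS = 26     # assumption: roughly six months of regular season

gap = hit_gap(0.250, 0.300, AT_BATS)
print(f"extra hits: {gap:.0f}, per week: {gap / WEEKS:.2f}")
# -> extra hits: 25, per week: 0.96
print(f"standard error of a season average: {binomial_se(0.275, AT_BATS):.3f}")
# -> standard error of a season average: 0.020
```

Under those assumptions the gap is about 25 hits, or roughly one a week, and a season average carries a standard error of about .020. Both .250 and .300 sit only about 1.25 standard errors from a true .275 hitter. Under a normal approximation, such a hitter would finish outside that range roughly one season in five.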
I believe that things are somewhat better in basketball, both because I want to believe it and because the sample sizes are larger when we talk about a regular's field-goal attempts. Still, there's no denying that the little things add up in the NBA. Basically one field goal per game distinguished last year's Nikoloz Tskitishvili -- who had the worst shooting percentage for an NBA regular in years -- from a hypothetical twin who would have shot 50% and been an incredible prospect.
Just how confident can we be in the statistics that we see?
I got a wake-up call this summer when I wrote my column comparing players' stats over a two-season span. I found huge, huge differences in two-point percentage. Arguably, the only reason I didn't find the same fluctuations in three-point and free-throw percentages was that I combined them (so a change in one might have been offset by an opposite change in the other).
The trend has, alas, continued this season. Last month, I saw a list of the NBA's leaders in two-point percentage from the perimeter (i.e., excluding power forwards and centers). Seeing several guys, like Devean George and DeShawn Stevenson, who had improved dramatically from the previous season, I went through and looked at how all of them had shot from two-point range in 2002-03. The results? After averaging 47.7% from two-point range last season, the players had improved to an average of 52.7% this year. That's huge.
I then went through the three-point leaderboard and found even bigger changes: those leaders went all the way from 36.0% to 44.0%.
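One likely culprit is selection. A leaderboard built on two months of shots favors players whose luck has run hot, so their numbers tend to drift back toward their true level, a pattern statisticians call regression to the mean. The simulation below is a sketch under invented assumptions, not a fit to the real 2003-04 numbers: 150 players, true two-point skill drawn around 47%, 400 attempts for a full prior season and 120 for two months. Every simulated player's true skill is identical in both seasons, so any "improvement" among the leaders comes from sampling noise and selection alone.

```python
import random
from statistics import mean


def simulate_leaderboard(n_players=150, prior_fga=400, two_month_fga=120,
                         top_k=15, seed=2004):
    """Simulate a two-point leaderboard where no player's true skill changes.

    Assumptions (illustrative, not fit to real NBA data): true skill is drawn
    from a normal distribution centered on 47% with a 3-point spread, and every
    attempt is an independent make/miss at that player's true rate.
    """
    rng = random.Random(seed)
    true_pct = [rng.gauss(0.47, 0.03) for _ in range(n_players)]

    def observed(p, attempts):
        # Count makes over `attempts` independent shots at true rate p.
        makes = sum(rng.random() < p for _ in range(attempts))
        return makes / attempts

    prior = [observed(p, prior_fga) for p in true_pct]
    current = [observed(p, two_month_fga) for p in true_pct]

    # Rank the way a leaderboard does: by the small current-season sample.
    leaders = sorted(range(n_players), key=lambda i: current[i],
                     reverse=True)[:top_k]
    return (mean(prior[i] for i in leaders),
            mean(current[i] for i in leaders),
            mean(true_pct[i] for i in leaders))


prior_avg, current_avg, true_avg = simulate_leaderboard()
print(f"leaders' prior season: {prior_avg:.1%}")
print(f"leaders' first two months: {current_avg:.1%}")
print(f"leaders' true skill: {true_avg:.1%}")
```

With settings like these, the leaders' two-month percentage comes out well above both their prior-season mark and their true skill. Raising `two_month_fga` shrinks that gap, which is the sample-size point in miniature.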
Now those are the most extreme players in the league, and the numbers are for just two months, but doesn't that seem pretty dramatic to you?
Of course, it works the other way. Eddy Curry's reputation has been beaten up in Chicago this year because of his inability to maintain his play from last season. Naturally, the most notable difference is his shooting percentage, which has gone from a league-best 58.5% all the way down to 49.2%. But how much of that is because Curry actually is playing worse, and how much of it is completely random? (There are statistical answers to this question, but I'll save us both the trouble of going through the process. Think of this as a theoretical question.)
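For readers who do want one of those statistical answers, a standard tool is the two-proportion z-test. It asks how surprising a gap between two shooting percentages would be if the underlying skill had not changed. Curry's attempt totals are not given here, so the sketch below uses hypothetical counts of 600 attempts last season and 300 this season. The test also assumes every shot is an independent trial with the same make probability, an assumption that shot selection and role changes violate.

```python
import math


def two_proportion_z(p1: float, n1: int, p2: float, n2: int) -> float:
    """z statistic for p1 - p2, using the pooled proportion for the standard error."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se


def two_sided_p(z: float) -> float:
    """Two-sided p-value for z under a standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical attempt counts, NOT Curry's actual totals.
LAST_SEASON_FGA, THIS_SEASON_FGA = 600, 300

z = two_proportion_z(0.585, LAST_SEASON_FGA, 0.492, THIS_SEASON_FGA)
print(f"z = {z:.2f}, two-sided p = {two_sided_p(z):.4f}")
```

With those made-up counts, z comes out around 2.6, which would be hard to blame on chance alone. Halve both counts, though, and z falls to about 1.87, short of the conventional 1.96 cutoff for significance at the 5% level. The same percentages, read through a smaller sample, say much less. Even a significant result would only suggest that something changed, not whether it was Curry's play or his role.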
If the numbers can deceive us so badly, how easy is it for us to be deceived by our own eyes? Probably quite easy, for a number of reasons beyond sample size. There are also perceptual errors, like the recency effect, our tendency to overemphasize whatever happened most recently. Or, as I would prefer to put it, how a player can go from a star to a bum in one night.
We're probably also all guilty of selective memory. It's a lot easier to remember the great plays by your favorite players and the mistakes of the guys you don't like.
So what is the answer to all this? Should we just give up and not even bother trying to make predictions based on either observations or stats?
I have to confess the thought has crossed my mind in recent days. Saturday produced a lot of cognitive dissonance for me. In the early afternoon, I was watching the University of Washington men's basketball team take on Oregon State. Shortly into the game, freshman Angelo Tsagarakis checked in and the Fox Sports announcers praised his shooting ability. Since the guy is a backup on a second-division team in a relatively weak power conference, I was naturally a bit dubious. Still, even I was stunned to pull up Tsagarakis' player page on ESPN.com and find he was shooting a paltry 30.3% from three-point range this season, including 2-for-21 in four Pac-10 games.
What did Tsagarakis end up doing? Oh, just hitting five of 10 three-pointers, half of them from an entirely different area code, and scoring 18 points.
Later that day, more of the same. Early in the Sonics' game at Washington, Sonics play-by-play announcer Kevin Calabro commented that Wizards forward Jarvis Hayes seemed to have "a nose for the ball," based on his fine early rebounding. I scoffed, knowing that Hayes is considered a one-dimensional player and had yet to reach double-digit rebounds this season. The result? A career-high 13 rebounds, followed by 14 on Monday against Chicago.
(I'm not picking on these announcers, by the way. They are merely prominent examples to me of a trend towards broadcasters being overly positive about players. Can we please call a spade a spade and not gloss over players' weaknesses? It's enough to make one long for Simon Cowell to become a color commentator; for now, we have to settle for Fox's brutally honest Cris Collinsworth.)
Here's why I don't give up, why I soldier on: Statistics -- or, more accurately, those who use them -- have never claimed to be perfect or infallible. Nobody believes that the Atlanta Hawks could contend for the NBA Finals next season if they adopted a statistical approach. The difference is incremental, not huge. There are only a handful of Carlos Boozers to be discovered, and only one team can get each.
Better decision-making pays off over the long run. If every decision you make is slightly better, that's going to start adding up. After all, the sample-size argument made earlier in this column shows how nearly imperceptible differences separate good players from average ones, and great ones from good ones.
The implicit assumption is, naturally, that making decisions with a statistical perspective will produce better decisions. There are certainly those who disagree to one extent or another, including the established basketball community. The tide, however, is changing, and as statistics continue to play an ever-increasing role in our understanding of the world around us, that will naturally extend to basketball.
What will be necessary for a true revolution to take place in the only place that matters -- NBA front offices -- will be patience. The Oakland A's were lucky, in a sense, to be a poor team during the mid-1990s. They had no success to risk by adopting a new strategy, and it was all the more difficult for the media to jump on them for daring to be different (now that they are successful, however, their methods have seemingly met more resistance).
One mistake, even a high-profile one, cannot be taken as an indictment of the reasoning behind it. Nobody's perfect. Jerry West gave Cezary Trybanski a long-term contract, and Geoff Petrie traded Jon Barry for Mateen Cleaves (admittedly a salary-related deal, but still . . .).
I guess that's the point I want you to take away from this column. Tsagarakis will have his day in the sun and the Spurs will lose to the Hawks, but in the long run, averages, order, and logic will rule, however little it may seem that way on any given day.
Lenny in the Big Apple
I once intended to write a column on Lenny Wilkens' selection as the Knicks' replacement for Don Chaney, but limited time and an inability to put my thoughts down on paper have conspired to keep me from doing so. Instead, I offer this CliffsNotes version of my take:
The Knicks are not as bad on defense as you think (not good, but not that awful). Mike Fratello's reputation for good defensive teams is overblown, a product of the slow pace his squads play at, and it's better for the game to keep him at least as far from the sidelines as the broadcast booth. Wilkens isn't getting enough credit as a defensive coach, as his Cleveland and Atlanta teams were consistently better defensively than offensively -- and very good overall. Has the game passed Wilkens by? Only as much as it has passed by Pat Riley -- after all, it's been longer since Riley took a team to the playoffs. Coaches who are perceived as defensive-minded are overrated. The Knicks will likely make the playoffs.
WNBA Dispersal Draft
Again, a skeleton version of the column I never wrote: After looking at the numbers, I feel better about the selection of Penny Taylor by the Mercury. Taylor is virtually the same age as LaToya Thomas and has posted better numbers. She's also a good fit for Phoenix if Maria Stepanova returns to take the center position and Adrian Williams plays the four.
Chasity Melvin and LaToya Thomas were no-brainers for Washington and San Antonio, respectively.
After that, the talent dropped off. New York did well to get Ann Wauters, who has the potential to be an impact player, and Indiana's Deanna Jackson and Seattle's Betty Lennox were each clearly the best player available when their teams picked.
That leaves only one player worth discussing, Helen Darling, the former starting point guard for the Rockers who was selected seventh by Minnesota. One of the changes I've had to make when analyzing the WNBA is accepting that pass-first point guards are held in much higher regard. Even given that fact, Darling isn't very good. Her pass rating, which combines assist rate and assist/turnover ratio, was 13th in the WNBA and the same as that of Kristi Harrower, whom Darling will likely replace as the Lynx's backup at the point. At the same time, Harrower was a far more efficient shooter (47.5% true shooting percentage), as were the backup point guards for other teams picking ahead of Minnesota, like Seattle's Tully Bevilaqua (48.1%); few WNBA players shot worse than Darling's pitiful 43.8%.
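True shooting percentage folds two-pointers, three-pointers, and free throws into one efficiency number. The sketch below uses the widely cited formula: points divided by twice the sum of field-goal attempts and 0.44 times free-throw attempts. The 0.44 weight is a common convention and may not match the exact calculation behind the figures above, and the stat line in the example is invented.

```python
def true_shooting_pct(points: float, fga: float, fta: float) -> float:
    """Points per two true shot attempts, counting each free throw as 0.44 of an attempt."""
    true_shot_attempts = fga + 0.44 * fta
    if true_shot_attempts <= 0:
        raise ValueError("need at least one shot attempt")
    return points / (2 * true_shot_attempts)


# Hypothetical season line, not any real player's totals.
print(f"{true_shooting_pct(points=200, fga=190, fta=40):.1%}")  # -> 48.2%
```

The 0.44 factor roughly converts free-throw attempts into shooting possessions, since and-ones and technical free throws mean not every trip to the line is a two-shot foul.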
Darling is a rotation player and wasn't a bad pick where she was taken, but teams ahead of Minnesota should not be criticized for passing on Darling, and she is most decidedly not a steal for the Lynx.
Kevin Pelton is an intern for the Seattle SuperSonics and is responsible for original content on Supersonics.com. He writes "Page 23" for Hoopsworld.com on a semi-regular basis. He can be reached via e-mail at firstname.lastname@example.org.