Can You Predict Performance from What a Player Says? You'd Be Surprised.
Taking some inspiration from MIT Sloan today!
Sometimes, I’ll do a bit of perusing of MIT Sloan papers. You know, the conference where guys with super-technical degrees decide to use their expertise in finance, engineering, or otherwise, in what is obviously a far more important field: Hoops!
I digress, but some of the stuff that comes out of the MIT Sloan Sports Analytics Conference every year is literally mind-blowing, and teams usually come calling as a result.
This past year, one paper that turned heads was “Beyond the Box Score: Using Psychological Metrics to Forecast NBA Success,” which aimed to determine whether a player would be drafted based on their interview transcripts. And it was successful, too: without any physical attributes, the researchers could predict who would get drafted with 67% accuracy, and with physical attributes included, a whopping 87%! That’s incredible accuracy for something as hard as judging draft-worthiness, hence why teams like the Heat reached out to the researchers after the paper was presented.
So, of course, as I am wont to do, I wanted to see what else players’ words can tell us — and as it turns out, they can tell us quite a lot, especially when you mix them with existing stats.
P.S. If you don’t care about the technical aspects of it and just want to get to the data, feel free to scroll down to the “Choice of Words” section.
Prepping The Program
But first, to actually analyze what players say, you have to have interviews to pull from. I ended up doing a similar thing to what the researchers of the MIT Sloan study did, grabbing transcripts from one of the NBA’s partners, ASAP Sports. After pulling literally every NBA player’s transcript in the database (it took a little bit!), I finally had a beautiful set of data to work with. Except, of course, I didn’t, because a lot of players had other players (or coaches) with them in their interviews, making it a bit problematic to actually use each interview transcript.
Hence, after tinkering a bit with some methods to clean the data, I was successfully able to create a file for each player labeled “All Individual Answers,” allowing me to begin to actually have some fun.
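To give you a flavor of that cleanup step, here’s a toy version of pulling one player’s answers out of a shared transcript. The format (a “Q.” tag for questions, an all-caps “NAME:” tag for answers) is my assumption about how these transcripts tend to look, not a guarantee, and my real cleaning code handles more edge cases:

```python
import re

def individual_answers(transcript: str, player: str) -> list[str]:
    """Collect only `player`'s answers from a Q&A transcript.

    Assumes (hypothetically) each turn starts with "Q." for a question
    or an upper-case "NAME:" tag for an answer.
    """
    answers, current_speaker, buffer = [], None, None
    for line in transcript.splitlines():
        # A new turn starts with "Q." (a question) or "NAME:" (an answer)
        match = re.match(r"^(Q\.|[A-Z][A-Z .'-]+:)", line)
        if match:
            if current_speaker == player.upper() and buffer:
                answers.append(" ".join(buffer).strip())
            buffer = [line[match.end():]]
            tag = match.group(1)
            current_speaker = None if tag == "Q." else tag.rstrip(":").strip()
        elif buffer is not None:
            # Continuation of the current speaker's turn
            buffer.append(line)
    if current_speaker == player.upper() and buffer:
        answers.append(" ".join(buffer).strip())
    return answers
```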
The researchers in the MIT study used a specialized model to analyze each player’s interview transcripts, one that requires a paid license to use. I, being cheap, went with a regular stock NLP (natural language processing) pipeline instead, but then I ran into an issue: what do you do when players lean on proper nouns (“NBA”), generic words (“tournament”), and other borderline useless (for our purposes) terms?
Thankfully, there’s a solution to that, too. Without getting too technical here (my head was spinning just figuring this out), I stripped out the proper nouns and used a type of regression called Lasso, which penalizes the model for relying on too many words and shrinks the useless ones’ influence down to zero (once again, a very unscientific explanation).
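If you’re curious what a “stock” pipeline of this flavor looks like, here’s a minimal sketch using scikit-learn. The answers, target numbers, and alpha value are all made up for illustration, and the real setup also strips proper nouns (e.g., via part-of-speech tagging) before vectorizing:

```python
# Minimal sketch of a bag-of-words + Lasso pipeline (illustrative only).
# The real version strips proper nouns first, which is omitted here.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import Lasso
from sklearn.pipeline import make_pipeline

answers = [  # toy interview answers
    "We just try to make plays and enjoy the opportunity",
    "I want to have a big impact every quarter",
    "Rebounding is all about effort on the glass",
    "I love creating opportunities for my teammates",
]
ast_pct = [28.0, 12.0, 8.0, 31.0]  # toy assist-percentage targets

model = make_pipeline(
    TfidfVectorizer(stop_words="english"),  # drop generic filler words
    Lasso(alpha=0.05),  # L1 penalty zeroes out useless terms
)
model.fit(answers, ast_pct)
print(model.predict(["make plays for my teammates"]))
```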
In short, I’m attempting to predict four key areas of players’ games:
Passing
Scoring
Defense
Rebounding
To do this, I trained four different models, which use the above NLP method to analyze players’ words, vocabulary richness, length of sentences, and all sorts of other metrics.
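Two of those surface-level metrics, vocabulary richness and sentence length, are simple enough to sketch. These are toy versions, not my exact definitions:

```python
import re

def linguistic_features(text: str) -> dict:
    """Toy versions of two surface features: vocabulary richness
    (type-token ratio) and average sentence length in words."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    tokens = re.findall(r"[a-z']+", text.lower())
    return {
        "type_token_ratio": len(set(tokens)) / len(tokens) if tokens else 0.0,
        "avg_sentence_len": len(tokens) / len(sentences) if sentences else 0.0,
    }
```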
So, does it work? Well…
Choice of Words
As it turns out, it kind of does!
Let’s start with the Passing Mindset model. Some of the words/phrases that corresponded well with stats like assists and assist percentage included:
“make plays”
“thankful”
“opportunity”
“fun”
“penetrate”
There’s a common thread here, at least on paper: Players who talk about the emotional aspects or general playmaking aspects of the game are more likely to be better at passing. That seems obvious, but I don’t think it’s something we recognize without actually looking at the data.
According to our Passing Mindset model, these are the most pass-minded players in recent NBA history:
Jayson Tatum
LeBron James
Richard Hamilton
Giannis Antetokounmpo
Russell Westbrook
Damian Lillard
Chris Paul
Not a bad list, huh? It’s imperfect — Rip Hamilton was never a pass-first guy in my book — but it has the hallmarks of great point guards, as well as point forwards like LeBron and, to some extent, Tatum, who averaged 6 assists per game last season. And, remember, this is about mentality, not about the actual numbers — yet.
Before I show you just how accurate the Passing Mindset model is at explaining how well a player plays, let’s move to our scoring model. Some of the words and phrases that coincided with PPG, USG%, and things of that nature were:
“impact”
“better”
“quarter”
“can’t wait”
Outside of “quarter,” it’s pretty obvious what’s going on in this case. Players who like to score talk about the impact they have on the game, are more likely to talk about wanting to be “better,” and have an air of excitement about them (“happy” was another term that popped up a lot).
With that in mind, the top players in Scoring Mindset are:
LeBron James
Jayson Tatum
Shai Gilgeous-Alexander
Carmelo Anthony
Damian Lillard
Giannis Antetokounmpo
Tyrese Haliburton
Also, not a bad list! Tatum and LeBron (plus Lillard and Antetokounmpo) show up on both the Scoring and Passing Mindset lists, but can you really argue that too much? SGA is an easy shoo-in, as is the very score-first Carmelo Anthony. So far, so good!
Now we move on to the two models that are tougher to quantify, at least in theory: Stocks Mindset and Rebounding Mindset. Defense is notoriously difficult to measure, but I think there’s something here. For example, here’s the top seven in Stocks Mindset, which aims to predict steals and blocks at the next level:
Victor Wembanyama
Kobe Bryant
Paul Pierce
Andrew Bynum
Shai Gilgeous-Alexander
Chris Paul
Kevin Garnett
The only players on this list not to make an All-Defensive team are SGA and Andrew Bynum, both of whom I would call above-average defenders. Still looking pretty clean here!
Lastly, let’s look at Rebounding Mindset. Here’s your top-7:
Tony Parker
Kobe Bryant
Dwight Howard
Andrew Bogut
Rudy Gobert
Lamar Odom
Carlos Boozer
Hey, pretty good, right? I feel like I keep surprising myself with these lists, just because of how weird it feels to predict success in an area from the words a player says — but sure enough, we seem to have something here. How much of something, you may ask?
The Stats on Accuracy
To calculate how accurate the models are at predicting what they aim to measure, we can use a little variable called R-squared. R-squared, if you’ve ever taken a statistics class, shows you how much a variable (for example, Passing Mindset Score) can explain (aka: predict) another variable (for example: AST%).
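For the curious, R-squared is simple enough to compute by hand. A quick toy version, not tied to the actual models:

```python
def r_squared(actual, predicted):
    """Share of the variance in `actual` explained by `predicted`:
    1 minus (residual sum of squares / total sum of squares)."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1 - ss_res / ss_tot
```

A perfect prediction gives 1.0; always guessing the average gives 0.0.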
This is where the stuff gets really fun. Here’s how much of AST% each model can explain, and the headline number is really, really good: using interview transcripts alone, the Passing Mindset Score explains 20.1% of a player’s assist percentage. It’s a pretty similar story with AST/36:
NBA AST/36 Explained by Each Model:
Passing Mindset: 18.3%
Scoring Mindset: 0.06%
Stocks Mindset: 0.12%
Rebounding Mindset: 0.68%
Of course, that means that there’s another 80-ish percent that our model can’t account for, which you’d expect. The model doesn’t take into account positions, height, weight, usage, or anything else; it’s just reading into the player’s words.
Let’s take a look at scoring next, starting with points per 36 minutes:
NBA PTS/36 Explained by Each Model:
Passing Mindset: 0.95%
Scoring Mindset: 13.9%
Stocks Mindset: 2.7%
Rebounding Mindset: 0.32%
Once again, it looks generally how we want it to look, though Scoring Mindset Score isn’t as clear a translation to points as passing mindset is to assists, and that makes sense on paper. Interestingly, Stocks Mindset gets into the equation here, too, which we’ll get back to later.
Usage is a similar story, and for brevity’s sake, I won’t put it below, but Scoring Mindset explains about 13.7% of a player’s usage rate.
Intriguingly, Rebounding Mindset explains about 11% of both REB% and rebounds per 36, but Passing Mindset seems to influence about 4% of that as well, which is strange, though it’s not a massive amount.
Lastly, we can talk about the one model that isn’t very explanatory, that is, the one that doesn’t seem to be as accurate. Stocks Mindset doesn’t explain much of STL% or BLK%, and in some cases explains less than the other models do:
NBA STL% Explained by Each Model:
Passing Mindset: 1.16%
Scoring Mindset: 0.33%
Stocks Mindset: 0.91%
Rebounding Mindset: 0.38%
NBA BLK% Explained by Each Model:
Passing Mindset: 8.16% (strangely)
Scoring Mindset: 0.10%
Stocks Mindset: 0.09%
Rebounding Mindset: 1.77%
So though the other models worked quite well, Stocks Mindset is all over the place — and that’s okay, since that was kind of the point of testing it as well.
Bringing the Real Stats Into Play
Thankfully, we don’t just have the transcripts to go off of; we also have each player’s college stats (at least, if they played college at all). So, like the original MIT Sloan paper, I want to throw some stats into the (literal) equation here in an attempt to predict a player’s NBA stats in Year 5 of their career (which gives us enough room for guys to grow into their roles). To do that, I’m pulling from the much-loved BartTorvik.com website, which has tons of college stats ranging from rim attempts to PORPAG, an all-in-one metric.
After putting together the stats and the mindset scores, we get an R-squared value of 0.35, averaged across PTS/36, AST/36, REB/36, AST%, REB%, BLK%, and STL%. That means 35% of the variability in those stats can be explained by our little model here, which I admittedly haven’t pruned all that aggressively; with (1) more data and (2) more time, that number could likely climb a bit. With many players, though, it feels awfully prescient, and the 0.35 is actually a little misleading for one reason: it’s close on all accounts.
For example, for PTS/36, the predictions have a mean absolute error of 2.16. In other words, a typical player’s prediction is only 2.16 PTS/36 off their actual number; that’s pretty good! For AST/36, it’s even better at 0.95, while REB is in the same range. Steal and block percentage are still a bit finicky (off by around 3 percentage points), but we’re getting awfully close on nearly everything.
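Mean absolute error, if you haven’t run into it, is just the average size of the miss, ignoring direction. Toy numbers here, not the model’s:

```python
def mean_absolute_error(actual, predicted):
    """Average absolute gap between predictions and reality."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)
```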
But enough talking about the metrics, let me actually show you! On the “woah, this is crazy accurate” side, we have Damian Lillard (note, all stats have been scaled from 0 to 1 just to show the “shape” of their NBA statlines):
James Harden:
Tobias Harris:
And Brandon Clarke:
These are pretty incredible predictions, at least in my book. The model predicted that James Harden would score 23 points per 36 minutes in his 5th year in the NBA, and he ended up scoring 24 points per 36 minutes — amazing!
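The 0-to-1 scaling used for those statline “shapes,” for what it’s worth, is plain min-max scaling:

```python
def minmax_scale(values):
    """Rescale a statline to the 0-1 range so charts share a 'shape'."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # flat statline, nothing to scale
    return [(v - lo) / (hi - lo) for v in values]
```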
But, of course, it’s not all fine and dandy, and I don’t want to fool you into thinking I’ve made a perfect oracle of a model. So here are some of the worst predictions:
Zion Williamson:
Draymond Green:
Talen Horton-Tucker:
Joel Embiid:
These are some of the worst predictions out of all of the model’s takes (and you’ll notice it has a bit more trouble with big men). The most average prediction (that is, right between the best and worst) belongs to Jalen Brunson, and it still feels really accurate for being the “average” one, other than the scoring:
PTS/36: Predicted 18.7, Actual 24.68
AST/36: Predicted 5.07, Actual 6.3
REB/36: Predicted 3.98, Actual 3.6
USG%: Predicted 22.42%, Actual 22.4% (great!)
Overall, the distribution of prediction errors looks great, too. You want that chart to lean toward the left (small misses) as much as possible, and I’m relatively happy with this for it being just a first run:
So, in conclusion, what do we have here? Well, it seems reasonable to say that at least some of a player’s future stats can be predicted strictly by listening to what they say, particularly on the offensive front. Players who pass tend to talk about opportunities and making plays (obviously), while scorers focus a lot on their “impact” and on getting “better.” Rebounders are somewhat predictable, too, but steals and blocks are a tough one to crack. And, when you throw college stats into the equation, you can actually get some really accurate results, though not perfect by any means.
Above all else, this is meant to be an experiment. Sean Farrell, Ethan Laity, Dean Oliver, and everyone else in the MIT Sloan paper successfully showed the impact of linguistic analyses for hoops, and I wanted to see whether we could take that to another side of the game besides strictly determining a player’s draftability. Yet, I have a hunch that the above could be used for the draft as well, which I’ll likely be exploring at some point in the next 6 months.
Lastly, let me close with some qualms you may have with the above data:
Do players pass well because of what they say, or do they say certain things because they pass well? It’s a chicken-and-egg problem, and I’m not sure which it is, but I also don’t think it matters if you can successfully predict future outcomes — even if they’re similar to the current ones.
How can you guarantee that the transcripts are accurate? Well, I can’t, and neither could the MIT Sloan paper. It’s taken from a partner of the NBA, so that’s all I get to work with. But I can guarantee that I only took the players’ answers into the equation, as I rooted out all questions from those in the press box.
Why do stars dominate the top of the mindset lists? Because it’s taken from interview transcripts, and it’s rare for non-stars to get tons of interviews recorded in ASAP Sports’ database, except in college. I’ve included college transcripts as well to solve that issue somewhat, and there are some non-stars deeper down the list if I were to publish it in full.
Why does Passing Mindset influence BLK% so much? To be honest, I’m not entirely sure. Such is the nature of doing random statistical correlations.
“But correlation doesn’t equal causation!” Yes, yes, I hear you — and I agree! But, sometimes, correlation can mean causation, and I believe that players who are positive about their teammates and opportunities are more likely to exploit those opportunities via passing (or scoring).
Did you backtest it? For those out of the loop, this is about making sure that the data I’m trying to predict doesn’t sneak into the training dataset, which could cause overly accurate results. I did out-of-fold tests for each season and have covered my bases here, so unless I missed something — and I’m not a statistician by trade, so it’s possible! — no data has leaked from the training to the testing sets.
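For the curious, the season-by-season out-of-fold idea can be sketched like this (an illustration of the concept, not my exact code):

```python
def leave_one_season_out(rows):
    """Yield (held_out_season, train, test) splits where each season is
    held out in turn, so no season appears in both sets.
    `rows` are (season, features, target) tuples."""
    seasons = sorted({season for season, *_ in rows})
    for held_out in seasons:
        train = [r for r in rows if r[0] != held_out]
        test = [r for r in rows if r[0] == held_out]
        yield held_out, train, test
```

Training only on the other seasons and scoring on the held-out one is what keeps the test data from sneaking into the fit.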
Hopefully you enjoyed reading through this one! I specifically made it free for all to read because of how fun it is, and because it builds upon previous research that I think is awfully interesting. If you thought so, too, drop me a comment (and, hey, if you’re with a team and like what you see, hit my line!).