Fun with Statistics
DCWildcat
09-17-2006, 10:56 PM
A while back I made this post: http://www.wildcatnation.net/forum/view_topic.php?id=20805&forum_id=2&highlight=tempo but was unable to follow up on the stuff I wanted to do because I didn't have access to statistical analysis software (I was at home, not at school).
Well, now I'm back at school, and here goes. I'll put the results and comments in the next post; this is how to interpret it.
First, an explanation on some of this crap: A correlation coefficient basically tells you how two things match. The result can be anywhere between -1 (perfect negative) and 1 (perfect positive). A positive correlation means that as one variable goes up, the other does as well. For example, as the heat in an oven rises, the temperature gauge on that thing you stick in a turkey will rise. That should be a very strong positive correlation. A negative correlation would be something like this: as points allowed go up, games won go down.
The strength of the correlation is the important part. A correlation between rainy weather and umbrellas observed will be pretty strong (people use umbrellas when it's raining, not when it's not) but not perfect (not everyone has an umbrella). In general, a value of +/- .6 is "high," +/- .4 is "medium," and +/- .2 is "low." (That's incredibly overgeneralized, but good enough for now.)
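For anyone who wants to compute this themselves, here is a minimal Python sketch of the Pearson correlation coefficient, plus a helper that applies the rough rule of thumb above. Treating .6/.4/.2 as hard cutoffs, and calling anything under .2 "negligible", are assumptions added here, not part of the original post.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of deviations (covariance numerator)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def strength(r):
    """Rough label using assumed cutoffs at |r| = .6, .4 and .2."""
    a = abs(r)
    if a >= 0.6:
        return "high"
    if a >= 0.4:
        return "medium"
    if a >= 0.2:
        return "low"
    return "negligible"  # assumed label for anything below .2
```

The sign of r gives the direction (positive or negative), and only its absolute value feeds the strength label, matching the "+/-" in the rule of thumb.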
All data used is from http://kenpom.com/stats.php
DCWildcat
09-17-2006, 10:56 PM
Tempo
The Pearson correlation coefficient between adjusted tempo and pythag win% was -.11. I chose adjusted tempo because, well, it’s better than raw, and I didn’t include rank because rankings are stupid when you can have scores instead (see here:
http://www.wildcatnation.net/forum/view_topic.php?id=23078&forum_id=2&highlight=What%5C%27s+in+a+rank%3F). Note that choosing adj. tempo rank and doing a Spearman correlation with that and adj. Pythagorean win % rank results in a -.12, significant correlation.
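Since the Spearman correlation is just the Pearson correlation computed on ranks, it's the natural choice when both inputs are rank columns. A minimal Python sketch, assuming tied values get the average of the ranks they span (the usual convention); the function names are illustrative.

```python
import math

def average_ranks(values):
    """Rank values 1..n in ascending order; ties share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def spearman_rho(xs, ys):
    """Spearman's rho: Pearson's r applied to the ranks of each variable."""
    return pearson_r(average_ranks(xs), average_ranks(ys))
```

Because Spearman only looks at order, any monotonic relationship (even a curved one) scores a perfect 1 or -1, while Pearson only does that for a straight line.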
-.11 is not statistically significant; however, it’s right on the borderline (-.12 would have been significant). What does this mean?
It probably is statistically significant. Basically, a correlation at least this strong would show up by chance only about 6% of the time if tempo and winning were truly unrelated (5% is the arbitrary accepted cutoff in science, but this is close enough). This is helped by the huge data sample I have, which makes almost any difference significant.
It probably isn’t clinically significant. Statistical significance evaluates whether two different scores really “are” different. Clinical significance focuses on efficacy, i.e., whether the difference matters in practice. For example, if men average 100.1 on IQ tests and women average 100.0, sampling tens of thousands of people will make that difference statistically significant, but it doesn’t really have any implications because it’s so small.
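For a rough check on a p-value like the one above, one common approximation is Fisher's z-transform: under the null hypothesis of no correlation, atanh(r) * sqrt(n - 3) is approximately standard normal. The sketch below uses that approximation (the function name is illustrative); the author's statistics package most likely used an exact t-based test, so small differences are expected, and because r is rounded to two decimals in the post, the output won't necessarily reproduce the 6% figure.

```python
import math

def correlation_p_value(r, n):
    """Approximate two-sided p-value for H0: no correlation.

    Uses Fisher's z-transform: z = atanh(r) * sqrt(n - 3) is roughly
    standard normal under H0, so p = 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    """
    if not -1.0 < r < 1.0:
        raise ValueError("r must be strictly between -1 and 1")
    if n < 4:
        raise ValueError("need at least 4 observations")
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2))
```

Calling `correlation_p_value(-0.11, 334)` plugs in the thread's numbers (334 D1 teams). Holding r fixed and raising n always shrinks p, which is the "huge data sample" effect described above.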
To me, this provides some evidence that tempo, in and of itself (i.e., not considering its effects on recruiting, for example), is not a very big factor in a team’s success. In fact, since the correlation is negative (win % goes down as tempo goes up), it would indicate the opposite, if anything. However, it’s still way too small to make any kind of difference.
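One standard way to put "too small to make any kind of difference" into numbers is the coefficient of determination, r squared: the share of the variation in one variable that a straight-line fit on the other accounts for. A quick sketch:

```python
def variance_explained(r):
    """Coefficient of determination for a simple linear fit: r squared."""
    return r * r

# Compare the thread's -.11 against the rule-of-thumb values
for r in (-0.11, 0.2, 0.4, 0.6):
    print(f"r = {r:+.2f} -> {variance_explained(r):.1%} of variance explained")
```

For r = -.11 this comes to about 1.2%, i.e., a linear fit on tempo accounts for roughly one percent of the spread in pythag win %. Even a "high" .6 only reaches 36%.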
I was actually going to hypothesize that there’d be a small but significant (~ +.20ish) correlation because more possessions means less variance, and so better teams would win more often…but I was wrong.
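The reasoning behind that hypothesis can be sketched with a simple model. Assume (purely for illustration) that each possession adds an independent amount to the final margin, with a small average edge for the better team and a fixed spread. Over N possessions the expected margin grows like N but its standard deviation only grows like sqrt(N), so under a normal approximation the better team's win probability rises with tempo. The edge and spread values below are hypothetical.

```python
import math

def win_probability(edge_per_poss, sd_per_poss, possessions):
    """P(final margin > 0) under a normal approximation.

    Assumes independent possessions, each adding a margin with mean
    edge_per_poss and standard deviation sd_per_poss, so the game margin
    has mean N * edge and standard deviation sd * sqrt(N).
    """
    mean = possessions * edge_per_poss
    sd = sd_per_poss * math.sqrt(possessions)
    # Standard normal CDF: Phi(x) = 0.5 * erfc(-x / sqrt(2))
    return 0.5 * math.erfc(-mean / (sd * math.sqrt(2)))

# Hypothetical edge of 0.05 points per possession, spread of 1.4
for n in (60, 67, 75):
    print(n, round(win_probability(0.05, 1.4, n), 3))
```

The model only says which way the effect should point; it says nothing about how big the correlation would be across real teams.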
There are a few things to keep in mind in this interpretation:
1) You may not agree with Pomeroy’s rankings. It is just a mathematical formula, after all. If you have some other source with rankings that you feel are better, I can run the correlation with that instead, provided it has all 334 D1 teams ranked. Alternatively, I could do it with fewer teams (e.g., only the top 25 ranked teams), but significance would drop a lot as a result.
2) Looking at tempo in a vacuum isn’t that important. Its effect on recruiting is probably the biggest impact of all, and this analysis doesn’t take that into account.
3) Other stuff you guys will probably come up with that I’m not taking into account now.
DCWildcat
09-17-2006, 10:57 PM
I'll post some other stuff later. I need to wait for Pomeroy to reply to an email to see if he can get me a CSV file of last year's RPI ranks (if anyone else has it, that'd be sweet too), so I don't have to type out 1000+ pieces of data.
AugustaDan
09-18-2006, 08:11 AM
DCWildcat wrote: 2) Looking at tempo in a vacuum isn’t that important. Its effect on recruiting is probably the biggest impact of all, and this analysis doesn’t take that into account.
Can't you conclude that the positives from better talent are more than offset by the negatives of higher tempo?
JPS
DC Wildcat,
I like graphs. (hint)
Also, I hope you're not planning on using the RPI for anything serious. The RPI is not a measure of how strong a team is.
Jon
UKfaninCO
09-20-2006, 08:31 AM
DCWildcat wrote: I'll post some other stuff later. I need to wait for Pomeroy to reply to an email to see if he can get me a CSV file of last year's RPI ranks (if anyone else has it, that'd be sweet too), so I don't have to type out 1000+ pieces of data.
DC,
do you have a web table that you can get the data from? Give me a URL and I'll write you a Perl script that mines a table from a URL... That way you can output it however you like.
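UKfaninCO's offer was a Perl script; the same idea is sketched here in Python using only the standard library (`html.parser`, `csv`, `urllib.request`). It's a minimal sketch under stated assumptions: it expects ordinary `<tr>`/`<td>`/`<th>` markup with closing tags, flattens every table on the page into one list of rows, and ignores colspan/rowspan. The function and class names are hypothetical, and no particular stats page's layout is assumed.

```python
import csv
import io
from html.parser import HTMLParser
from urllib.request import urlopen

class TableParser(HTMLParser):
    """Collects the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently open
        self._cell = None  # text fragments of the cell currently open

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # skip empty rows
                self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        # Text inside nested tags (e.g. <a>) still lands in the open cell
        if self._cell is not None:
            self._cell.append(data)

def table_rows(html):
    parser = TableParser()
    parser.feed(html)
    parser.close()
    return parser.rows

def rows_to_csv(rows):
    buf = io.StringIO()
    csv.writer(buf).writerows(rows)
    return buf.getvalue()

def fetch_rows(url):
    """Download a page and return its table rows (needs network access)."""
    with urlopen(url) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return table_rows(resp.read().decode(charset))
```

Usage would look like `print(rows_to_csv(fetch_rows(url)))`, with `url` being whatever page holds the table.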
UKfaninCO
09-20-2006, 08:36 AM
dxwils3 wrote: DCWildcat wrote: 2) Looking at tempo in a vacuum isn’t that important. Its effect on recruiting is probably the biggest impact of all, and this analysis doesn’t take that into account.
Can't you conclude that the positives from better talent are more than offset by the negatives of higher tempo?
How? I don't think you can statistically prove this, especially since "talent" is mostly a subjective thing. And a player's statistics have to be weighed against the players they play against; otherwise better-than-average but less "talented" players (compared with, say, top Div 1A talent) in weak conferences dominate the statistical analysis. You would have to do it by subjective analysis, and that's kind of weak when you try to do that. A better measure of talent would probably equate to offensive and defensive efficiency as opposed to tempo. In reality, tempo tells you nothing about how good a team is. Maybe a measure of how entertaining they are though... ;)
AugustaDan
09-20-2006, 09:27 AM
UKfaninCO wrote: How? I don't think you can statistically prove this. Especially since "talent" is mostly a subjective thing. [...]
I don't think it's necessary to quantify talent on an individual basis to make this statement. I think my claim follows logically from the following statements (and remember that these statements are in general, so individual counterexamples don't disprove any claim I'm making):
1. Better talent means more wins. Talent is used qualitatively here and I think this statement must be true or we have a stupid definition of talent.
2. Faster tempo means more talented recruits. This is a claim made ad nauseam on this board. I am not stating that I agree or disagree with this statement.
3. Slower tempo results in slightly better team performance. I'm saying this based on DC's analysis.
So, as tempo increases, the benefit of more talent must be more than offset by the harm caused by a faster tempo because the winning percentage will trend down as tempo increases.
Funnily enough, I had the thought that until last season, the first counterexample out of any UK fan's mouth for statement #1 above would have been the comparison of Florida and UK.
AugustaDan
09-20-2006, 09:33 AM
DCWildcat wrote: I was actually going to hypothesize that there’d be a small but significant (~ +.20ish) correlation because more possessions means less variance, and so better teams would win more often…but I was wrong.
I suspect that you would need to look at the product of the variance in efficiency with the number of possessions or something like that. The variance in the efficiency should decrease with more possessions, but the impact of any variance will be greater because the variance is per possession. To look at its overall impact on the game, you'd need to multiply by the number of possessions. I'm no statistician, so this may all be hogwash, but it makes sense to me.
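Under the simplest assumption (independent possessions, each with the same margin spread), this intuition can be checked directly: the spread of the per-100-possession efficiency margin shrinks like 1/sqrt(N), but multiplying back by the number of possessions gives a game-margin spread that grows like sqrt(N). The function name and the spread value are hypothetical.

```python
import math

def margin_spread(sd_per_poss, possessions):
    """Return (efficiency SD per 100 possessions, total game-margin SD).

    Assumes independent possessions with margin SD sd_per_poss.
    The average per possession has SD sd / sqrt(N); the game total is
    N times that average, so its SD is sd * sqrt(N).
    """
    efficiency_sd = 100 * sd_per_poss / math.sqrt(possessions)
    total_sd = sd_per_poss * math.sqrt(possessions)
    return efficiency_sd, total_sd

for n in (60, 75):
    eff_sd, total_sd = margin_spread(1.4, n)
    print(f"{n} poss: efficiency SD {eff_sd:.1f}, game margin SD {total_sd:.1f}")
```

So both halves of the intuition hold in this toy model: more possessions make efficiency less noisy, while the raw margin's noise still grows, just more slowly than the expected margin.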
Mr. Peanut
09-20-2006, 01:52 PM
Too many confounding variables to draw any conclusions.
If you could measure the effects of tempo in matched and similar teams over several trials... maybe.
But when you look around the country, there is a big slant toward run & gun, sling up the threes, press full court, all out, etc. at the mid-major level and below... therefore, a major confounder in gauging tempo effects on wins/losses.
UKfaninCO
09-20-2006, 03:11 PM
dxwils3 wrote: [...] So, as tempo increases, the benefit of more talent must be more than offset by the harm caused by a faster tempo because the winning percentage will trend down as tempo increases.
I see what you are saying, and I'm certainly not a statistician either, but I'm not sure that the "talent more than makes up for the harm done by a faster tempo" statement is necessarily true, even based on your 3 statements. I think that PERHAPS more talent will decrease the effects of a higher tempo, but I'm not sure you can say that it erases them. And I'm not certain that you could even statistically prove it one way or the other, mostly because we would have to assume some things about "talent" that may be too subjective to quantify in a reasonable manner. Say Player A is more talented than Player B and both teams play roughly the same tempo: which team benefits more from having the so-called better player? Hard to say, because then you have to figure out where players C, D, E, F, G, H, I, and J fit in the hierarchy and compare that to the talent of the teams they play. Too many variables to say definitively one way or another. You certainly can draw some conclusions based on the numbers, but I'm not sure how reliable those conclusions would be.
bleedbluelady
09-20-2006, 04:21 PM
:?:?:?:shrug:
Isn't fun with statistics an oxymoron? ;):P
SunBaller
09-20-2006, 10:57 PM
bleedbluelady wrote: :?:?:?:shrug:
Isn't fun with statistics an oxymoron? ;):P
No, it's a hyperbole, like "Your purse weighs a ton" or "I could sleep for a year." I haven't seen anything on this forum that makes any sense.
AugustaDan
09-21-2006, 09:36 AM
Mr. Peanut wrote: Too many confounding variables to draw any conclusions.
If you could measure the effects of tempo in matched and similar teams over several trials... maybe.
But when you look around the country, there is a big slant toward run & gun, sling up the threes, press full court, all out, etc. at the mid-major level and below... therefore, a major confounder in gauging tempo effects on wins/losses.
There's lots of statistical mumbo-jumbo here, but for those who don't like stats, here's the upshot:
A faster tempo doesn't make a team better, but it exacerbates the differences between teams. This is consistent with DC's prediction of lower variance.
Now the stats mumbo jumbo. This is a bit stream of consciousness because I wrote it as I did the calculations.
I looked at the numbers in an effort to test your claim and found that the bad teams, i.e., the really bad teams, tend to play substantially faster, but the better non-BCS schools tend to play at nearly the same pace as the BCS schools. The top half of all schools play at an average tempo of around 67.3, with the BCS schools playing slightly faster.
Thus, for the purpose of this analysis, I will restrict my attention to the top 50% of all schools. Feel free to question the wisdom of this, but I figure it'll get all the BCS schools and a fair approximation of the mid-majors.
For these teams, I repeated DC's calculation and came up with a correlation of -0.02. I'll call that insignificant. So, no correlation between tempo and pythagorean winning percentage.
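Here's a sketch of that split: sort the teams, cut the list in half, and compute the correlation separately within each half. The dictionary keys (`"adj_tempo"`, `"pythag"`) are hypothetical field names, and sorting by pythagorean win % to define the "top half" is an assumption, since the post doesn't say which ranking was used. With an odd number of teams, the extra team lands in the bottom half.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def split_correlations(teams, x_key, y_key, rank_key):
    """Correlate x_key with y_key within the top and bottom halves.

    Halves come from sorting on rank_key, higher values first.
    """
    ordered = sorted(teams, key=lambda t: t[rank_key], reverse=True)
    half = len(ordered) // 2
    result = {}
    for name, group in (("top", ordered[:half]), ("bottom", ordered[half:])):
        xs = [t[x_key] for t in group]
        ys = [t[y_key] for t in group]
        result[name] = pearson_r(xs, ys)
    return result
```

A call such as `split_correlations(teams, "adj_tempo", "pythag", "pythag")` returns one coefficient per half, which makes it easy to see whether the two halves pull in opposite directions.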
However, I was a little uncomfortable with DC's use of pythagorean winning percentage to test his hypothesis that higher tempo means less variance and therefore a higher winning percentage for the better teams. I did the calculation with actual winning percentage instead of pythagorean, and for the top half the correlation between tempo and winning percentage is 0.12 (DC's magic number). For the bottom half of all teams, the correlation is -0.09. Again, I think this may be because the really, really bad teams play very fast in general.
The conclusion I draw from these numbers is that team performance is not enhanced by playing at a faster tempo (that's what the pythagorean correlation tells us), but that the team's actual record is improved by playing a faster tempo, probably due to DC's guess of lower variance.
I would guess that if we look at the difference between the pythagorean record and the actual record, we should get a negative correlation with tempo, i.e. as tempo increases, the difference between actual record and expected record decreases. Doing the calculation, it turns out to be -0.17.
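The quantity being correlated here is the gap between actual and pythagorean (expected) winning percentage. A minimal sketch follows, with illustrative function names. The exponent is a modelling choice that the thread doesn't state, so it's left as a parameter, and the example value 11.5 is purely illustrative. The post also doesn't say which way the difference is taken (or whether it's an absolute value), and flipping the sign flips the sign of any correlation with tempo.

```python
def pythag_win_pct(offense, defense, exponent):
    """Pythagorean expectation: O^x / (O^x + D^x).

    offense/defense: points (or points per 100 possessions) scored
    and allowed; exponent: a modelling choice, not fixed here.
    """
    o = offense ** exponent
    d = defense ** exponent
    return o / (o + d)

def record_minus_expected(actual_win_pct, offense, defense, exponent):
    """Actual minus expected winning percentage (positive = outperformed)."""
    return actual_win_pct - pythag_win_pct(offense, defense, exponent)

# Hypothetical team: 110 points scored and 100 allowed per 100 possessions
print(round(pythag_win_pct(110, 100, 11.5), 3))
print(round(record_minus_expected(0.75, 110, 100, 11.5), 3))
```

Computing this gap for every team and correlating it with tempo is the calculation described above; using raw rather than adjusted efficiencies just changes the inputs.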
I would also guess that we'd see a positive correlation with the worse teams, i.e., as tempo increases, the team's record gets worse with respect to the expected record. That doesn't seem true, as I get a correlation of -0.02.
That still doesn't seem quite right. I don't like the fact that in reality, good teams tend to play much harder schedules than worse teams, but in these calculations the expected record is against a "normalized" schedule. This would tend to overrate the expected record of the teams with the tougher schedules and underrate the expected record of the teams with the weaker schedules.
I think that using raw numbers in this case is more appropriate because we don't want to normalize the quality/tempo of the competition. We want the numbers as they actually occurred. Ideally, I would use in-conference numbers only, but I don't have those.
So...
I did the calculation again using the raw numbers. This time I calculated a raw pythagorean percentage and used the raw tempo. We get an even more pronounced correlation.
This time, we see the expected negative correlation between tempo and difference from expected for the top half of teams, but it's bigger: -0.23.
And, we see the expected positive correlation between tempo and difference from expected for the bottom half of teams: 0.14.
Neat.
Edit: All stats courtesy of Ken Pomeroy's website: www.kenpom.com.
DCWildcat
09-25-2006, 12:36 AM
Mr. Peanut wrote: Too many confounding variables to draw any conclusions.
If you could measure the effects of tempo in matched and similar teams over several trials... maybe.
But when you look around the country, there is a big slant toward run & gun, sling up the threes, press full court, all out, etc. at the mid-major level and below... therefore, a major confounder in gauging tempo effects on wins/losses.
Be careful here. An effect is only going to be confounding if it won't cancel out in the long run. For example, a massive study of IQ scores across an entire population will include all kinds of confounding variables (English comprehension, schooling, etc.), but those will cancel out with a truly random sample.
If the study was done only using Princeton students, though, they wouldn't cancel out. Princeton students are not representative of the entire population.
Since I'm using the ENTIRE population of D1 basketball, I have the most representative sample possible. So there shouldn't be problems with confounds.
DCWildcat
09-25-2006, 12:38 AM
JPS wrote: DC Wildcat,
I like graphs. (hint)
Also, I hope you're not planning on using the RPI for anything serious. The RPI is not a measure of how strong a team is.
Jon
Given how weak the correlations have been thus far, the graph would look pretty much meaningless. With a .12 correlation, you'd see what appears to be a random splattering of dots with an arbitrary line drawn through them. The stronger the correlation, the more tightly the dots would cluster around the line, and if there were a perfect correlation (1.0), all the dots would fall exactly on the line.
So I guess I could make a graph, but it wouldn't do much :D
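To see what a .12 correlation looks like without the real data, one option is to simulate points with a chosen correlation and scatter-plot them with any plotting tool. A minimal Python sketch using only the standard library, assuming both variables are standard normal (an illustration, not a model of the actual tempo data); the function names are hypothetical.

```python
import math
import random

def correlated_sample(rho, n, seed=0):
    """Return n (x, y) pairs whose population correlation is rho.

    x and the noise term are independent standard normals; mixing them as
    y = rho * x + sqrt(1 - rho^2) * noise keeps y standard normal with
    corr(x, y) = rho.
    """
    if not -1.0 <= rho <= 1.0:
        raise ValueError("rho must be in [-1, 1]")
    rng = random.Random(seed)
    scale = math.sqrt(1.0 - rho * rho)
    pairs = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y = rho * x + scale * rng.gauss(0.0, 1.0)
        pairs.append((x, y))
    return pairs

def pearson_pairs(pairs):
    """Sample Pearson correlation of a list of (x, y) pairs."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy)
```

Plotting `correlated_sample(0.12, 334)` next to `correlated_sample(0.9, 334)` shows the contrast described above: the first looks like a shapeless cloud, the second clusters tightly around a line.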
Edit: Oh yeah, about the RPI. I hate it, it's stupid, poorly measured, and basically a cop-out. Unfortunately, if I'm going to use all of D1 basketball, I need something that ranks EVERY team, not just the top 25. Very few ranking services do that, and the RPI is one of them. :(
DCWildcat
09-25-2006, 12:48 AM
I'll continue this in a different thread