Gene Honda Civic Posted September 22, 2004

MLB debuted a new stat yesterday... O-Zone Percentage. I did a hatchet job on it at my blog. Here's the better portion of it.

What is O-Zone Factor?

O-Zone Factor is derived from comparing a team's ability to score runners from second and third base (scoring position) with its ability to prevent opposing runners from scoring from scoring position.

Here are the current O-Zone Factor standings.

Where do the White Sox rank?

The White Sox are 14th in O-Zone Factor. However, in the two sub-stats that go into calculating O-Zone Factor, the Sox rank first and last. In Offensive-RISP% (renamed for ease of use), the Sox are the best in baseball. They score runners from scoring position more often than any other team. In Defensive-RISP%, the Sox are the worst in baseball. They allow runners to score from scoring position more often than any other team.

So is the O-Zone Factor an accurate predictor of winning percentage?

This is the best. I didn't even have to come up with that question. It's right there in the article linked at the top of this entry. So here is MLB's answer.

"After running a few numbers, it was discovered that a greater correlation exists between recent O Zone success and winning percentage (.726 correlation from 1999-2004) than between red-zone scoring and victories in the NFL (.657 correlation last two seasons). In other words, how a team performs both offensively and defensively with RISP has been just as, if not more important, than an NFL team's red-zone success -- when comparing recent data. NHL scoring data offers similar results when looking at the past two seasons. If you take a team's success on power plays and subtract the rate they allow power play goals, there is also a strong correlation (.587) when comparing that number to winning (in this case team points), but still not as high as O Zone data from the past six seasons."
For those of you unfamiliar with the term correlation, the closer the correlation is to 1, the more accurate a predictor the item is.

And here is the inherent problem with O-Zone Factor. MLB is trying to sell it as a predictor of success, which it's not. MLB compares it to the NFL's red-zone scoring, but neglects to mention that rushing yards, time of possession, and turnover margin all correlate better with winning than red-zone scoring does. Similarly, there are plenty of other baseball statistics that correlate with winning percentage better than O-Zone Factor. OPS minus OPS allowed has a correlation of .911 and Runs Scored minus Runs Allowed is .951, compared to O-Zone's .726.

O-Zone is flawed because it's a rate stat. It's essentially Offensive BA w/RISP minus Defensive RISP BAA. A team could allow 1 run all season, but if that run was driven in from scoring position, they would automatically have a negative O-Zone unless they plated every one of their own runners who reached second base. It doesn't take into account the number of times a team puts runners in scoring position, or allows runners in scoring position. For some reason they failed to see this error.

This stat doesn't recognize that a run scored on a double with a runner on first is just as valuable as one scored with a RISP. Similarly, it doesn't account for HRs without runners in scoring position, which are very valuable. A run is a run. It counts just the same, no matter the route taken to get there.

They did, however, leave some raw data, which is fully sortable and can be used to better evaluate a team's chances of winning. The column labeled 'Net-RS-RISP' is a better indicator of winning percentage than O-Zone. 'Net-RS-RISP' is essentially the number of baserunners who reach scoring position minus the number of baserunners a team allows to reach scoring position.
This stat correlates much better than O-Zone, at a correlation of .8967.

On the whole, it's a pretty useless stat that MLB trotted out there without much thought to try and capitalize on the whole sabermetrics craze. The plan backfired, and I'm sure anyone with a little sense is laughing at the dinosaur that is MLB. MLB has done some good things; streaming all games over the internet at a relatively cheap price is probably among the best of them. That was ahead of its time, but they should allow the study of baseball to be led by those who truly care about the game, not those who are looking to turn a profit on it.

The Big Inning

There was a little bit that I was going to write about tonight's game, but I've rambled for far too long. I'll just leave you with this. DJ put some poor statistician to work to calculate what percentage of Jon Garland's runs came from "big innings", which he described as 3 or more runs. The result was that JG had allowed 42% of his runs this season during the 'big inning'. It's long been my belief that JG does not, in fact, allow the 'big inning' at a significantly greater rate than other pitchers, but Hawk and DJ have chosen to harp on it. I'll wait until the end of the season, but I'm going to go through the box scores and compare JG to Mark Buehrle. My gut feeling is that MB will be nominally better, but not significantly. Not as much as Hawk and DJ would like you to believe.
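To make the rate-stat complaint concrete, here is a rough sketch of the "one run allowed all season" scenario from the post. MLB's exact formula isn't given in the post, so `o_zone` below is only an assumption based on the description (offensive scoring rate from scoring position minus the rate allowed), and every number for the team is made up for illustration.

```python
def scoring_rate(scored, reached):
    """Fraction of runners in scoring position who came around to score."""
    return scored / reached


def o_zone(off_scored, off_reached, def_scored, def_reached):
    """Assumed form of the stat: offensive rate minus defensive rate.

    This follows the post's description only; it is not MLB's published formula.
    """
    return scoring_rate(off_scored, off_reached) - scoring_rate(def_scored, def_reached)


# Hypothetical team: scores 500 of its 1500 runners in scoring position,
# and all season allows exactly one runner to reach scoring position, who scores.
team = dict(off_scored=500, off_reached=1500, def_scored=1, def_reached=1)

factor = o_zone(**team)
net_runs = team["off_scored"] - team["def_scored"]

print(f"O-Zone factor (assumed formula): {factor:+.3f}")
print(f"Net runs from scoring position:  {net_runs:+d}")
```

The rate difference comes out negative even though this made-up team outscores its opponents by hundreds of runs from scoring position, because the rates ignore how often each side gets runners there in the first place.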
Gene Honda Civic (Author) Posted September 22, 2004

Oh, I left the "big inning" thing on there because I wanted to compare JG to a couple of other pitchers too... I was hoping for some suggestions; my brain is currently broken. I was thinking they should 1) be in the AL, 2) be a #3 starter, or thereabouts, and 3) pitch in a hitter-friendly home park.
MinnesotaSoxFan Posted September 22, 2004

oh boy.... another stat i need to learn before next season.
FlaSoxxJim Posted September 22, 2004

I was going to say it but you already did, Cheat... an r or r^2 of 0.726 is FAR from a strong correlation. Those numbers you posted for OPS and Runs Scored minus Runs Allowed - now those are strong correlations (as you said, closing in on a perfect r-value of 1.00).

I will take exception to your suggestion that that NHL stat with r=0.587 is a 'strong correlation.' From a statistical significance standpoint it is anything but strong, basically saying y is a good predictor of x just over half of the time. In other words, you'd do just as well to flip a coin.
Hatchetman Posted September 22, 2004

FlaSoxxJim said: I will take exception to your suggestion that that NHL stat with r=0.587 is a 'strong correlation.' From a statistical significance standpoint it is anything but strong, basically saying y is a good predictor of x just over half of the time. In other words, you'd do just as well to flip a coin.

actually no correlation would be r-squared = 0
FlaSoxxJim Posted September 22, 2004

Hatchetman said: actually no correlation would be r-squared = 0

True, which is why I kept with the coin toss analogy. With only two possible discrete outcomes, there's a 50/50 chance of getting it right (accidental perfect correlation) each time. With open-ended continuous variables (like in sports stats), the analogy is imperfect. The point was that no statistician is going to consider a correlation coefficient of 0.5 "strong".

Think of it as the difference in guessing the outcome of a coin toss (half of the time you are dead-on) versus guessing what number is going to come up on a roulette wheel (5% chance with 20 numbers). The roulette analogy is certainly closer to the sorts of sports stats in question, where accidentally guessing right happens much less than half of the time. But if the NHL stat is being used as some kind of predictive tool, having a best-fit line that only allows x to be predicted based on y about half of the time, it is no better than a coin toss.

Edited September 22, 2004 by FlaSoxxJim
southsider2k5 Posted September 22, 2004

Damn, with O-Zone in the title, I was hoping for something at least NC-17
Hatchetman Posted September 22, 2004

FlaSoxxJim said: True, which is why I kept with the coin toss analogy. With only two possible discrete outcomes, there's a 50/50 chance of getting it right (accidental perfect correlation) each time. With open-ended continuous variables (like in sports stats), the analogy is imperfect. The point was that no statistician is going to consider a correlation coefficient of 0.5 "strong". Think of it as the difference in guessing the outcome of a coin toss (half of the time you are dead-on) versus guessing what number is going to come up on a roulette wheel (5% chance with 20 numbers). The roulette analogy is certainly closer to the sorts of sports stats in question, where accidentally guessing right happens much less than half of the time. But if the NHL stat is being used as some kind of predictive tool, having a best-fit line that only allows x to be predicted based on y about half of the time, it is no better than a coin toss.

i don't agree with you, but i'm not interested in debating the point. here's a link on the subject: http://www.sportsci.org/resource/stats/effectmag.html
Gene Honda Civic (Author) Posted September 22, 2004

FlaSoxxJim said: I was going to say it but you already did, Cheat... an r or r^2 of 0.726 is FAR from a strong correlation. Those numbers you posted for OPS and Runs Scored minus Runs Allowed - now those are strong correlations (as you said, closing in on a perfect r-value of 1.00). I will take exception to your suggestion that that NHL stat with r=0.587 is a 'strong correlation.' From a statistical significance standpoint it is anything but strong, basically saying y is a good predictor of x just over half of the time. In other words, you'd do just as well to flip a coin.

Thankfully I didn't actually use the NHL comparison; that was taken verbatim from the linked MLB article... That was their attempt to make their stat appear meaningful.
jackie hayes Posted September 22, 2004

FlaSoxxJim said: True, which is why I kept with the coin toss analogy. With only two possible discrete outcomes, there's a 50/50 chance of getting it right (accidental perfect correlation) each time. With open-ended continuous variables (like in sports stats), the analogy is imperfect. The point was that no statistician is going to consider a correlation coefficient of 0.5 "strong". Think of it as the difference in guessing the outcome of a coin toss (half of the time you are dead-on) versus guessing what number is going to come up on a roulette wheel (5% chance with 20 numbers). The roulette analogy is certainly closer to the sorts of sports stats in question, where accidentally guessing right happens much less than half of the time. But if the NHL stat is being used as some kind of predictive tool, having a best-fit line that only allows x to be predicted based on y about half of the time, it is no better than a coin toss.

poorme is right. A coin flip would have a correlation of 0, not 1/2. An r-squared of 1/2 means that one variable "explains" 1/2 of the total variation in the other.
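For anyone who wants to check the coin-flip point numerically, here is a minimal sketch. It simulates random guesses against random coin flips and computes the Pearson correlation by hand, then squares the figures quoted in the thread. Two assumptions to flag: the simulation setup (seed, sample size) is arbitrary, and the thread never settles whether the quoted numbers are r or r-squared, so the squaring step only applies if they are r values.

```python
import math
import random


def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def variance_explained(r):
    """Share of variation 'explained' by a simple linear fit: r squared."""
    return r * r


# Arbitrary seed and sample size, chosen only to make the sketch repeatable.
rng = random.Random(2004)
outcomes = [rng.randint(0, 1) for _ in range(20000)]  # actual coin flips
guesses = [rng.randint(0, 1) for _ in range(20000)]   # independent guesses

hit_rate = sum(g == o for g, o in zip(guesses, outcomes)) / len(outcomes)
coin_r = pearson_r(guesses, outcomes)

print(f"coin-flip hit rate: {hit_rate:.3f}")
print(f"coin-flip correlation: {coin_r:+.3f}")

# Figures quoted in the thread, treated here as r values (an assumption).
for label, r in [("NHL power play", 0.587), ("O-Zone", 0.726),
                 ("OPS minus OPS allowed", 0.911), ("Runs minus Runs Allowed", 0.951)]:
    print(f"{label}: r = {r:.3f}, r^2 = {variance_explained(r):.3f}")
```

Guessing right about half the time goes with a correlation near 0, not 0.5. And if the quoted .587 is an r, the share of variation it accounts for is its square, which is well under half.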
FlaSoxxJim Posted September 22, 2004

Ach, yes, he is right and you are as well. The original point was that nobody is going to consider r^2 = 0.5 to be a 'strong' correlation. I could have and should have stopped there without trying to force a weak analogy.

Let me try it this way. For some omnistat purported to be a strong predictor of how a team will fare in a season, an r^2 of 0.5 is less than stellar.

How is that?
jackie hayes Posted September 22, 2004

FlaSoxxJim said: Ach, yes, he is right and you are as well. The original point was that nobody is going to consider r^2 = 0.5 to be a 'strong' correlation. I could have and should have stopped there without trying to force a weak analogy. Let me try it this way. For some omnistat purported to be a strong predictor of how a team will fare in a season, an r^2 of 0.5 is less than stellar. How is that?

Sounds fine, but then I know nothing about sabermetrics. But I'm curious, why is there so much emphasis on getting it down to one number? Why avoid multiple regression?
Hatchetman Posted September 22, 2004

jackie hayes said: Sounds fine, but then I know nothing about sabermetrics. But I'm curious, why is there so much emphasis on getting it down to one number? Why avoid multiple regression?

if it's going to be on mlb.com it's got to cater to the LCD. think QB passer rating.
FlaSoxxJim Posted September 22, 2004

Hatchetman said: if it's going to be on mlb.com it's got to cater to the LCD. think QB passer rating.

Then I have the perfect stat... The team that wins the most games is the one I think is going to take the Division...