I was very excited to release the first version of RAW Pitcher Ratings this past week, but there have been a few questions about how they work. One email conversation in particular helps answer some of them.
I don't claim to be a statistics whiz (my degree is in English), but I think I have a pretty solid grasp of how statistics relate to baseball. Having played the game helps, as I can relate why stats are good or bad to my own experience.
Q: How do you find your:
CHr – Chase rate (swings at pitches outside the strike-zone)
CTr – Contact rate (rate at which opposing hitters make contact on a pitcher’s offerings)
WHr – Whiff rate (rate at which opposing hitters swing and miss)
These are components of RAW, but I don't know where you could access that info. Fangraphs?
A: Yes, Fangraphs. All of these stats are found under "plate discipline". Fangraphs has O-Swing%, which is another way to say chase rate (I only use the term chase rate because it makes for a shorter abbreviation). Contact rate is also on there as Contact%, and whiff rate is simply 100% minus Contact%. So if a batter has an 80% contact rate, their whiff rate is 20%. You may notice that the RAW spreadsheet has both CTr and WHr. The number I use in the equation is whiff rate; contact rate is only there so I can easily find the whiff rate. I also change the percentages to number format, for clarity only, since my outcome is a number and not a percentage.
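For anyone who would rather script that step than do it in a spreadsheet, here is a minimal sketch of the whiff rate arithmetic described above. The function names are my own for illustration and are not part of the RAW spreadsheet, and it assumes Contact% arrives as text such as "80.0%".

```python
def percent_to_number(text: str) -> float:
    """Turn a percentage string such as '80.0%' into the plain number 80.0."""
    return float(text.strip().rstrip("%"))


def whiff_rate(contact_pct: float) -> float:
    """Whiff rate as a plain number: 100 minus Contact%."""
    if not 0.0 <= contact_pct <= 100.0:
        raise ValueError("contact_pct must be between 0 and 100")
    return 100.0 - contact_pct


# Hypothetical input: an 80% contact rate gives a 20% whiff rate.
contact = percent_to_number("80%")
print(whiff_rate(contact))  # 20.0
```

Only the whiff rate would be carried forward into the rating; the contact rate is just the input used to find it.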
Q: Interesting. By using Contact Rate & Whiff Rate in the same metric aren't you double-dipping though?
A: I don't use contact rate at all in the actual RAW equation. It's only there to find the whiff rate. If I could, I would erase it from the spreadsheet, but then I would have to manually enter each whiff rate.
Q: Oh, I see it. You use whiff rate, NOT contact rate. My mistake.
What's your experience with this metric as an indicator ... pretty predictive? Also, do you think there'd be any value in looking at the last 30 days instead of YTD ... to get a more recent performance rating?
A: Well, I have been tweaking this all season, and I ended up adding line drive rate to my original concept as well as adjusting the weight of each rate stat. However, this metric was successful in allowing me to target players like Carl Pavano and Jason Hammel to begin the season.
I did a test run in early May, and it indicated that pitchers like Randy Wells, Tom Gorzelanny, Brandon Morrow, Ricky Nolasco and a couple of others were pitching better than their ERAs showed or, in the case of Gorzelanny, that what he had done was no fluke.
I am very excited to follow this as the season moves along and especially when all is said and done to see where everyone stacks up.
One problem that I find is sample size, which is why I waited until the All-Star break and why a 30-day sample may not work as well. When I ran the stats in May, certain players like Justin Masterson and Matt Garza had much higher strikeout rates than they do now. Also, at that time, I think I had put a bit too much weight on strikeouts, whereas now it's more about K/BB rate.
This is where comparing a season's RAW rating to a career RAW rating may come in handy. It is something I will definitely do for next year.
What I plan to do for the rest of the season is run the numbers through the system every two weeks and keep track of the trends to see if that helps in predicting future outcomes.
RAW is definitely a measure of what a pitcher has done and not what he will do. The "what he will do" part comes from analysing the data against ERA, BABIP, etc. So if I see that Francisco Liriano has a high RAW rating (99.81 is great) and his ERA is 3.86 with a .361 BABIP, it would indicate that he has pitched much better than what the ERA, WHIP or win total indicate, especially looking at his extremely low 0.17 home runs per nine innings rate.
The RAW number itself is sort of an indication of how "dominant" that pitcher has been, taking out of the equation what happens as a result of the ball in play (home runs are not considered balls in play).
I think my biggest challenge going forward will be to use all of the ingredients (line drive rate in particular) to try and find their correlation to BABIP. For instance, a pitcher like Tim Hudson, who strikes out very few and has walked over three per nine innings, is ranked low on the RAW ratings. However, he gets a ton of ground ball outs and his line drive rate against is hovering around 10 percent. In a way then, it makes sense that his BABIP is an extremely low .232, but is it still too low?
Anyway, that's where I'm coming from. I am definitely open to constructive criticism or suggestions. That is part of the reason for releasing the spreadsheet now. This way, anyone who wants to dig into it can follow along for the rest of the season.
You can download the RAW Pitcher Ratings spreadsheet here.