Showing posts with label Fantasy Strategy. Show all posts

xWHIP 2.0: The Next Generation

The following is my most recent article for The Hardball Times.

A few months ago, I debuted the first version of the expected WHIP (xWHIP) calculator, which took a pitcher's batted ball distribution and, by determining an expected number of hits, calculated that pitcher's expected WHIP. The tool was tinkered with and refined until version 1.4.3 was released, and that has been the primary xWHIP tool available until now. xWHIP 1.4.3 overestimated WHIP a bit, but was otherwise pretty solid. Especially for relative comparison purposes, xWHIP 1.4.3 was a useful fantasy tool.

Not long ago, I was introduced to a fellow stathead by the name of Martin Alex Hambrick. He had independently done some number tinkering similar to what I had done with the xWHIP calculator, and he had an idea. He brought that idea to my attention, and from it a new formula for expected WHIP (xWHIP) was born.

Alex's idea was that a pitcher's actual innings pitched (aIP) are as much the byproduct of luck as expected hits (xHits). The theory is that a medley of defense, umpires, errors, random luck and the like skews the length of innings. The pitcher, for example, does not particularly control dropped third strikes by his catcher.

This idea is somewhat captured in the K% (K/TBF) and BB% (BB/TBF) movement of sabermetrics that rejects K/9 and BB/9 because the length of innings is largely out of the control of the pitcher, thereby skewing both K/9 and BB/9. Accordingly, we began work on a new denominator for xWHIP that incorporated an expected innings (xIP) total based on a pitcher's outs-creating events.

Law school delayed my work on a final formula until this week, but with "way too much time on my hands" (i.e., any lawyers out there need a law clerk for the summer?), I finally got around to hammering out a reliable formula and user-friendly interface, calibrated to Baseball Info Solutions (BIS) data.

The current formulation for expected innings is as follows:

xIP = ((K*1.000075)+((BB-IBB+HBP)*0.00016)+((0.808)*GB)+((0.278)*LD)+((0.992)*IFFB)+((0.745)*OFFB)+(0.020099*(BB+HBP+xH)))/3

The coefficients in the above formula represent the expected outs by event rate. You might notice the two percent adjustment applied to both modified walks (BB-IBB+HBP) and expected hits (xH). That figure represents a ten-year average outs-per-runners-put-on-base rate (ORB). ORB encapsulates the ten-year league average pickoff and caught stealing rates.
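For readers who would rather see the expected innings formula as code, here is a minimal Python sketch of it. The coefficients are copied from the formula above; the function name, argument names, and the sample season line are hypothetical and do not belong to any real pitcher or to the spreadsheet itself.

```python
def expected_innings(k, bb, ibb, hbp, gb, ld, iffb, offb, xh):
    """Expected innings pitched (xIP), transcribed from the formula above.

    Each term is an event count times its expected-outs rate; the total
    expected outs are divided by 3 to convert outs into innings.
    """
    expected_outs = (
        k * 1.000075                     # strikeouts
        + (bb - ibb + hbp) * 0.00016     # modified walks
        + 0.808 * gb                     # groundballs
        + 0.278 * ld                     # line drives
        + 0.992 * iffb                   # infield flyballs (popups)
        + 0.745 * offb                   # outfield flyballs
        + 0.020099 * (bb + hbp + xh)     # outs made on the bases (ORB)
    )
    return expected_outs / 3


# Hypothetical season line, made up purely for illustration.
xip = expected_innings(k=200, bb=50, ibb=5, hbp=5,
                       gb=300, ld=120, iffb=30, offb=200, xh=190)
print(round(xip, 1))
```

Note how little the baserunner term contributes: almost all of the expected outs come from strikeouts and balls in play.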

Because catcher defense and a pitcher's pickoff talents are difficult to measure, and also not widely available, using a league average rate helps make the calculator more accessible. The final xWHIP figure should be mentally modified based on one's own perception of a catcher's pickoff ability or a pitcher's pickoff ability. If Jason Varitek is the catcher, you might want to raise the pitcher's calculated xWHIP, while the opposite would be true for those pitchers handled by Yadier Molina.

Alex is working on a "Quick xWHIP" formula that simplifies the xWHIP calculation even further, to the point that you could do it on a calculator. He'll tell you more about that (and the accuracies of both xWHIP 2.0 and Quick xWHIP) in a (near-) future post. For now, I can say two things about the calculator's accuracy with some degree of certainty. First, xWHIP works best (that is to say, it is most predictive) when you use multi-year data rather than year N-1 data. Second, the R^2 of the data seems to be solid for a predictive stat.

Someone once told me (or maybe I just read it somewhere) that an R^2 of .30-.35 is strong for a predictive stat, while a .60 or greater R^2 is what is required of an evaluative stat. Using 2007 xWHIP 2.0 to predict 2008 actual WHIP resulted in an R^2 of .34 amongst the 78 pitchers who faced a minimum of 500 batters, compared to an R^2 of .26 for 2007 actual WHIP. Likewise, using 2008 xWHIP to predict 2009 actual WHIP resulted in an R^2 of .36 amongst the 80 pitchers who faced a minimum of 500 batters, compared to an R^2 of .30 for 2008 actual WHIP.

Strangely, however, using 2009 xWHIP to predict 2010 actual WHIP amongst the 82 pitchers who accrued 500+ total batters faced resulted in an R^2 of merely .15 (compared to a .14 R^2 for 2009 actual WHIP). Maybe I crunched the 2009-2010 data incorrectly. Maybe this is a sample size issue. Maybe not. As I mentioned above, Alex will supply more details on the accuracy of xWHIP 2.0 shortly.
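For anyone who wants to run the same kind of year-over-year check, here is a rough Python sketch of one common way to get an R^2: square the Pearson correlation between a year-N stat and year-N+1 actual WHIP. This is an assumption about the method, not a record of how the figures above were computed, and the numbers in the example are made up.

```python
def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return (cov * cov) / (var_x * var_y)


# Hypothetical values for five pitchers (not real data):
# year-N xWHIP paired with the same pitchers' year-N+1 actual WHIP.
xwhip_year_n = [1.10, 1.25, 1.32, 1.18, 1.40]
whip_year_n1 = [1.15, 1.22, 1.41, 1.30, 1.35]
print(round(r_squared(xwhip_year_n, whip_year_n1), 2))
```

Running it once with xWHIP and once with actual WHIP as the year-N input gives the side-by-side comparison quoted above.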

I also tinkered some with the expected hits formula, but the changes are relatively minor and hardly warrant discussion. The important thing to note about the new xWHIP tool is that it is now calibrated per the past five years of BIS data rather than Game Day. I have done this because I believe that Fangraphs utilizes BIS, not Game Day, as its source for ball in play (BIP) data. Accordingly, this should make the tool more accurate for the average user. Most of the data held relatively stable, but here are the new expected hits by batted ball types:
  • Popups: 0.004
  • Groundballs: 0.236
  • Outfield Flyballs: 0.250
  • Line Drives: 0.716
These data points include home runs, which is why the Outfield Flyball expected hits rate is so high. If you take home runs out of the equation and account for them separately (as the xWHIP calculator does), the expected hits rate, per BIS, for Outfield Flyballs and Line Drives falls to .158 and .714, respectively.
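As a rough illustration of how these rates turn into an expected hits total, here is a simplified Python sketch. It assumes the batted ball counts passed in already exclude home runs, uses the home-run-free rates quoted above, and adds home runs back in separately. It leaves out the line drive regression, park factor, and defensive adjustment that the actual calculator applies, so treat it as an approximation; the names and the sample counts are hypothetical.

```python
# Expected hits per batted ball (BIS-calibrated rates quoted above),
# using the outfield flyball and line drive rates with home runs removed.
XH_RATES = {"iffb": 0.004, "gb": 0.236, "offb": 0.158, "ld": 0.714}


def expected_hits(gb, ld, iffb, offb, hr):
    """Simplified expected hits: rate-weighted balls in play plus home runs.

    Assumes gb, ld, iffb and offb are counts of batted balls that were
    NOT home runs; hr is counted on its own.
    """
    in_play_hits = (
        XH_RATES["gb"] * gb
        + XH_RATES["ld"] * ld
        + XH_RATES["iffb"] * iffb
        + XH_RATES["offb"] * offb
    )
    return in_play_hits + hr


# Hypothetical batted ball profile, made up for illustration.
xh = expected_hits(gb=300, ld=120, iffb=30, offb=180, hr=20)
print(round(xh, 2))
```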

You can download the new xWHIP tool, version 2.0, by clicking here. The password to utilize the xWHIP tool is still "soto 18" and the batted ball data you will need to plug in can be found at Fangraphs.com.

Pictured below is a screenshot of the xWHIP 2.0 tool, which was used in my Zack Greinke forecast article. For explanatory purposes, this screenshot has the 2010 numbers of Roy Halladay plugged in.


As the instructions on the tool indicate, the gray cells are for data you should manually input. The magenta park factor cell is also a manual data cell, though the number should be left at "1.00000" unless you have the relevant park factor HR/FB index figure. You should not enter any data into any of the blue, green or yellow-orange cells.

The green cells feature the line drive-regressed expected-ball-in-play data. The yellow-orange cells display the expected innings, expected hits and expected WHIP for the pitcher, irrespective of defense. If you enter data into the Team Innings Pitched and Team UZR gray cells, then the blue cells will display a crude defensive adjustment to the expected hits total, assuming uniform defense and that all saved hits would be of the singles variety. All of the data cells are pre-formatted to visually round all numbers to keep the sheet clean, though cells will retain the full value of any number entered.

I also included a cell for xWHIP 1.4.3, calibrated from Game Day to BIS, in case people wanted to know a player's expected WHIP using expected hits and actual innings, rather than expected innings.

I hope everyone enjoys this. If you have any questions/concerns/comments/criticisms, please post them in the comments below or email them to gameofinchesblog@gmail.com, with the subject line "xWHIP 2.0 Calculator."

On a final note, I would like to give a special thank you to several of my THT colleagues who have been invaluable in the creation of the xWHIP 2.0 tool. Without the assistance of Derek Carty, Dave Studemund, and Harry Pavlidis, none of this would have been possible. I apologize to each of you for my incessant e-mailing in an attempt to work out the mathematical kinks in the formula.

Why I Hate Adam Dunn: An Analysis Into Batting Average and Fantasy Baseball

I hate Adam Dunn because he's slow, he can't hit the ball, he's awful at defense, and he doesn't care about baseball. Oh wait, that's why baseball GMs hate Adam Dunn. I love the fact that Dunn is an on-base machine and hits 40 home runs like clockwork. I would love for him to be my team's DH. However, I do not like Adam Dunn for fantasy purposes.

Dunn hits in a decent offense, surrounded by Z-Pack and Nyjer Morgan. And when you include Pudge, Cristian Guzman, Josh Willingham, and Elijah Dukes, that lineup is not half bad. Sure, it's no New York Yankees lineup, but it's certainly no Kansas City Royals lineup either. Which means Dunn can easily get 80+ runs and 100+ RBIs to go along with his 40 home runs. So why do I hate Dunn? It's because of that career .249 average. And don't let his .267 batting average last year fool you. That's the outlier, not the norm. In fact, Dunn had a career high in BABIP last year (.324).

But it's not just Dunn I'm very low on. I purposefully avoided others like Carlos Pena, Mark Reynolds, Jay Bruce, and even Ian Kinsler. I did this not only because I was burned by low-average guys last year, but also because you cannot make up average on the waiver wire.

Have you ever had teachers, educators, or parents tell you that you had to get good grades during your first few semesters of school? Ever wondered why? Well, the answer is easy: the grades you post earlier in your educational career tend to have more of an impact on your GPA than the grades you earn toward the end (that, and I'm sure those people are only looking out for your well-being, but let's ignore that right now). If you got straight C's your first year in school, it's clearly within the realm of possibility to raise your GPA, but it's much harder to do so. The same concept holds true for batting average (and for ERA and WHIP too, for that matter). And it's for this reason that you cannot find batting average on the waiver wire to help out your team.

You can always find HRs, RBIs, and SBs on the waiver wire. And when you pick up players for that sole purpose, you can most certainly accrue a handful of these counting stats to lift you up. But if you see you have a poor average, finding guys with high averages will not have the same effect as finding a guy like, say, Luke Scott to get you a few more HRs. Now, the earlier you realize you need batting average help, the easier it will be for you to fix your team. But as your batters accrue more and more at-bats and that sample size gets bigger, it gets harder and harder to fill your batting average void.
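To put rough numbers on the GPA analogy, here is a small Python sketch that works out what your team has to hit over its remaining at-bats to finish at a target average. The 6,000 at-bat season total, the .250 starting average, and the .270 target are all hypothetical, picked only to show how the math moves.

```python
def required_average(hits, at_bats, target_avg, remaining_ab):
    """Average needed over the remaining at-bats to finish at target_avg."""
    needed_hits = target_avg * (at_bats + remaining_ab) - hits
    return needed_hits / remaining_ab


# Hypothetical roto team hitting .250 that wants to finish at .270,
# assuming roughly 6,000 team at-bats over the full season.
early = required_average(hits=375, at_bats=1500, target_avg=0.270, remaining_ab=4500)
late = required_average(hits=1125, at_bats=4500, target_avg=0.270, remaining_ab=1500)

print(f"Fixing it a quarter of the way in: {early:.3f} the rest of the way")
print(f"Fixing it three quarters in:       {late:.3f} the rest of the way")
```

Under these made-up numbers, catching the problem early means the team needs to hit about .277 the rest of the way, while waiting until late in the year means it needs to hit .330, which no waiver wire pickup is going to deliver.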

Now, this strategy only holds water for roto leagues. If you're in a head-to-head league, feel free to punt a category like batting average. Hell, punt WHIP and ERA too if you so choose. But in roto, you cannot punt ANY category and expect to do well.

So when I'm drafting and I see a guy like Adam Dunn available, I'm going to pass. I can get home runs, RBIs, and runs elsewhere in the draft. Hell, I could easily get those categories in free agency. But I can't get batting average. And why would I want to make it harder for myself? I know Dunn will hurt my batting average and put me in a bigger hole to climb out of. I'm essentially guaranteeing myself that I'm going to get straight C's my first year in school. And that's never fun.

NOTE: Sure, if you're smart and drafted a whole bunch of high-average guys from the get-go, you can take the hit of ONE player having an Adam Dunn-like BA. But again, it might be more hassle than it's worth.

In Defense Of My Fantasy Strategy

First of all, Journalissimo, when did you become a contributor? I thought we told you "NO!" Anyway, this seriously is a private matter; although America's personal privacy has declined via new technology like Twitter and Facebook, that doesn't mean it went the way of the dodo bird. But since you have publicized it, I will retort.

1) Right now my roto teams are both in third (although in the pay co-op league we've been in first for a long time and just so happen to be in third at this exact moment), one of my head-to-head teams is in second place, and I won my last fantasy football league. Let's give me some props, establish my ethos if you will: after a few years of this, I've gotten the hang of it. And although this season's not even a month old, I don't think it's unfair to say that a strategy of mine is worth, at minimum, entertaining (and of course, at maximum, following).

2) Manny Parra sucks! I don't want any player on my team for the sole purpose of having him, Parra included. Although I believe in accumulating pitchers, I believe those pitchers still have to be quality. Guys whom I drafted in the later rounds or pick up in free agency all have to meet personal standards of quality. I believe every single starting pitcher on my team will have a low WHIP and ERA. This must come first. And if for some reason they don't, they need to strike out a shit ton of guys. (Ideally, getting a lot of wins helps too, and/or if they don't get a lot of Ks they DO get a lot of Ws, but that's so hard to predict that I just really stay away.) I believe C.C., Jurrjens, W-Rod, Kazmir (although I'm skeptical lately), Liriano, and Slowey will all have low WHIPs and low ERAs, or have proven to me that they can realistically do so. Brett Myers is the only exception, although currently he's our team's best K guy.

2) It's necessary. One of the main benefits of this strategy is that we will more easily be able to accrue Ws and Ks. If all of our guys have low WHIPs and ERAs like I predict, then the other two stats are just icing on the cake. And not only that, but there are other teams that have better pitching than us (like David "MVP" Eckstein's team; seriously guys, why are you still trading with him? He's gotten the better end of like the 10 million trades he's made already!). If we want to compete with better teams, then we NEED all the help we can get. And this also gives our team the advantage over the few teams that have absolutely no pitching depth.

3) This is a Zero Sum Game. If I were the only owner to have this strategy, then yes maybe I could be blamed for causing inflation. But I am not. If I don't get a lot of pitchers, someone else will. I believe this is an effective strategy so if someone else implements it (which they will), our team will be hurt in some way. This will cause "overvaluing" as you say. But just because I partake in this strategy doesn't mean I am to blame.

4) Who says we can't have awesome hitting all year? This is an assumption you make, and I don't think it's necessarily true. I think I drafted extremely well and I think, and I know you agree, our offense is pretty stacked. In fact, right now we're first in R and SB and t-2nd in HRs. I think this offense can consistently produce all year. And if for some reason one player gets injured, we can always ride the wave of someone who's hot and sitting on waivers in the short term. You claim "A suitable replacement might not always be available on the waivers, but perhaps on the bench." "Perhaps"? Perhaps our bench sucks. Perhaps there IS someone good via waivers. Perhaps DaMonkeys will win that pay league. Perhaps a lot of things COULD happen. But I can almost assure you that at least ONE good player will be available via waivers for us.


5) How can you say for sure when a player is in or out of a streak? Recently, I got fed up with Chris Iannetta. The day I benched him, he hit a HR with 2 RBI and 2 R and went like 2-4. In the short term, it was a terrible idea to bench him. But who says in the long term our replacement (Suzuki) won't be better? Maybe Iannetta will slump some more, or maybe he just got out of it. The point is you can never really say for sure when a player is in or out of a slump. You can never tell if a player will continue his hitting streak or if the next day he goes into one of his lows. In fact, because it's a roto league, we can handle players that go streaky for a bit. We are projecting how a player will perform over the course of an entire 162-game season. If I project a player to hit .300, I can deal with him hitting .100 for a week or so, because I can say with confidence he'll rebound. Obviously, at some point, if you think a player is getting too many ABs while slumping, you have to bench him, which is what I did with Iannetta. But I'm not going to bench Cristian Guzman when he goes like 6-50 one week. In fact, I know he will do this soon, because no way in hell he's going to manage a .400+ BA. In fact, you are the one that will hurt the team long term if you have batting depth that you sub in for a player just because he had a few bad games. Because of the uncertainty of streaks, I will say you are more prone to potentially hurting the team that way.

I believe all players on the team are very good and will positively produce. Which means all the offensive starters will produce and so will every single pitcher I have. So why waste batting on the bench? Even though I believe anyone I have on my team is good, hitters sitting on the bench do nothing to help my overall team.