The Quintessential Sabermetrics Argument: Batting Average (and Hits)

If you believe in statistics, then you have undoubtedly encountered what I like to call "The Chris Rongey Quotient" (granted, in hindsight, arguing that Rios is a worse value than Nix was quite silly on my part -- still, Rongey's underlying denial that stats explain important things is fun to listen to). This set of humanity consists of people who have watched sports longer than you, know more about useless stats than you, may or may not work "in the industry," think Andre Dawson is better than Tim Raines, and ignore or insult everything that disagrees with them. Such persons will constantly pester you with the same statements when you try to talk about sports intelligently: Who the hell is Billy Beane and why is his stupid book (which I've never read) so stupid? Stats are for people who live in their mother's basement. You can't score runs if you don't get any hits. You just made that stat up.

Maybe you are not a stat-a-phobic person. Maybe you are just a person who wants to know "what's up." You like sports and want to learn more, but do not know where to begin.

Whether you are the guy looking to shut Chris Rongey down or the guy who wants to learn more about basic sabermetrics, let this post and those that follow it (I am dubbing this series "The Quintessential Sabermetrics Argument") be your guiding light. We will begin with the basics (the quintessential truths, etc.) and then move towards their application.

Let's get things started with a very basic topic: Batting Average (and Hits).

To explain why batting average and hits are pointless "metrics" by which to measure a hitter's abilities and run scoring, we must first consider what a hit is. Put most simply, a "hit" is a ball put into play that is not converted into an out. The real question is not what a hit is but why it is a hit, and the answer illustrates the futility of the metric. Does the hitter truly earn his "hit"? In theory, yes. A ball smoked to the gap is clearly an "earned hit." But why was the hit earned? Where is this "gap"? Is it a fixed position? No.

A hit is simply this: a ball put into play that slips through the defensive positioning and ability of the fielders. Is either of these factors within the control of the hitter? No, not really. The difference between a double to the gap and a caught liner is simply "was the shift on?" A hit up the middle versus a double play can be a question of whether or not the shortstop was holding the runner on at second. The difference between a liner down the line and a caught ball is whether Ryan Howard was playing 1B or DH in an interleague game.

Clearly the hitter exercises some control over the direction of the ball and the strength of contact by "timing" and "squaring" the pitcher's offering, but once the ball is in play, whether or not that ball is a "hit" depends almost entirely on the positioning and ability of the defense. The glaring exception to this rule, of course, is the home run. That is a "hit" truly and 100% earned, although fielders can even steal those sometimes.

In short, hits are not something a player particularly controls. There is a lot of luck involved, and over large enough samples, luck tends to average out. It is not shocking, therefore, that over 162 games against 20 or so teams, the share of a player's "balls in play" (BIP, non-HR balls put into play) that are converted into hits, his batting average on balls in play (BABIP), tends to fluctuate within a normal band (usually between .290 and .310, though any given player's BABIP varies with his speed, types of contact (GB, FB, and LD each correlate differently with BABIP), and strength of contact). Last season, the lowest team BABIP was .285 (the Reds) and the highest was .326 (the Angels). Only four teams did not have a collective BABIP between .288 and .312 last season. The MLB average BABIP last season was .302.
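For anyone who wants to check these numbers at home, here is a rough sketch of how BABIP is usually computed. The post only describes it in words; the formula below is the commonly published definition, (H - HR) / (AB - K - HR + SF), and the stat line is made up for illustration, not any real player's season.

```python
def babip(h, hr, ab, so, sf=0):
    """Batting average on balls in play.

    Numerator: hits that stayed in the park (H - HR).
    Denominator: at-bats that ended with a non-HR ball in play,
    plus sacrifice flies (AB - K - HR + SF).
    """
    balls_in_play = ab - so - hr + sf
    if balls_in_play <= 0:
        raise ValueError("no balls in play for this stat line")
    return (h - hr) / balls_in_play


# Hypothetical season line (an assumption, not real data):
# 150 hits, 20 home runs, 550 at-bats, 100 strikeouts, 5 sac flies.
example = babip(h=150, hr=20, ab=550, so=100, sf=5)
print(f"BABIP: {example:.3f}")
```

This invented line works out to 130 hits on 435 balls in play, or about .299, which sits inside the usual band.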

Thus, knowing that BABIPs tend to normalize (in aggregate towards .300 and individually towards a player's expected BABIP) and that hits are mostly dependent on BIP averages, it is not so difficult to conclude that hits are a poor measure of a hitter's ability -- if for no other reason than that a hit is more in the fielders' control than the hitter's.

Is batting average also a poor metric by which to measure a player and team's ability to score runs? I will pretend that you answer my rhetorical question by stating "of course it isn't, that's why we have RBIs" because 1) runners being on is situational and independent of a hitter's ability, 2) the normalizing effects of BIP do not cease to apply in high-leverage (clutch) situations, and 3) clutchiness really does not exist (read the link for more info on why).

To answer my question, I pose another: how does one score runs? Scoring runs is a two-step process: putting runners on and moving them over. Putting a runner on base is measured by On Base Percentage (OBP), which accounts for both hits and walks. Hits are largely a function of luck and regression towards some mean over time, while walking is more of a static skill (a player's ability to read the strike zone and determine a pitch's trajectory is not as dependent on outside factors, other than an umpire's [in]ability to call balls and strikes). Moving the runner over is measured by the hitter's power, or ISO (Slugging Percentage (SLG) minus Batting Average (AVG)). A double will move a runner over more bases than a single and a triple more than a double, while a home run will clear the bases and score the batter. The higher a player's power, the higher his SLG. Because SLG is measured as (1B + 2*2B + 3*3B + 4*HR)/AB, a player with absolutely no power (one who hits only singles) would have a SLG of 1B/AB, and his H (hits) would be equal to his 1B. Thus, the SLG of a player with no power would be equal to his AVG (AVG = H/AB). As a player has more power, his SLG becomes larger than his AVG. This is why a player's power is measured by ISO.
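The SLG-versus-AVG arithmetic above is easy to check with a few lines of Python. This is only a sketch of the formulas as stated in this post; the two stat lines are hypothetical hitters invented for the comparison, not real players.

```python
def avg(singles, doubles, triples, hr, ab):
    """AVG = H / AB, where H counts every hit once."""
    hits = singles + doubles + triples + hr
    return hits / ab


def slg(singles, doubles, triples, hr, ab):
    """SLG = (1B + 2*2B + 3*3B + 4*HR) / AB."""
    total_bases = singles + 2 * doubles + 3 * triples + 4 * hr
    return total_bases / ab


def iso(singles, doubles, triples, hr, ab):
    """ISO = SLG - AVG, i.e. the extra bases a hitter adds per at-bat."""
    return slg(singles, doubles, triples, hr, ab) - avg(singles, doubles, triples, hr, ab)


# Two hypothetical hitters (assumptions for illustration), each with 150 hits in 500 AB.
slap_hitter = dict(singles=150, doubles=0, triples=0, hr=0, ab=500)
slugger = dict(singles=90, doubles=30, triples=5, hr=25, ab=500)

for name, line in (("slap hitter", slap_hitter), ("slugger", slugger)):
    print(f"{name}: AVG {avg(**line):.3f}  SLG {slg(**line):.3f}  ISO {iso(**line):.3f}")
```

Both invented hitters bat .300, yet the singles-only hitter has a SLG equal to his AVG and an ISO of zero, while the slugger's ISO comes out around .230. Batting average alone cannot tell the two apart.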

Where, I dare ask you, is "hits" a component of this run-scoring model? It exists, hidden away in getting on base and to some extent moving the runner over, but the "ability to get more hits than the average guy" component of the game that most people attend to when they say "he's a good hitter" is more noise than evaluation. AVG, though not entirely useless, is a misleading and inefficient metric by which to measure run-scoring ability. It's a part of the equation, but it is a mistake to point to batting average as a point of leverage in the equation. The best metrics which account for run scoring are OBP (which encapsulates AVG (which does account for 50-65% of OBP)) and ISO.

Any questions?

6 comments:

The 'Bright' One said...

this is random, but you know what we need? xTTO

Jack Cust's HR/FB rates were crazy high in 2007 and 2008

David "MVP" Eckstein said...

HR/FB rates do not normalize for hitters. More power, more HR/FB. Ryan Howard hits flyballs harder and farther than David Eckstein.

The 'Bright' One said...

in that case i may have to retract my retraction on xFIP

David "MVP" Eckstein said...

HR/FB normalizes for PITCHERS. Remember, hitters "control" the strength of contact

The 'Bright' One said...

with the unbalanced schedule, pitchers face different levels of competition based on their division/league. Maybe the hitters in the AL East have more "pop," hence the pitchers in the AL East are exposed to higher HR/FB hitters, hence their normalized rates can be different from those of pitchers in "weaker" divisions.

the Coach of Crush said...

Cool Blog
agree 100% that AVG and HITS are poor "metrics" to evaluate hitters.

I look at the exception, the HR, and see the concession that a HR is earned. 100%? Then, naturally, I want to know how much of a 2B is "earned."

I have a hard time equating a 2B to a single among BIP when I see "so many" doubles that are long drives over fielders or off walls. 30%? 40%? 60%??
These doubles could almost be considered slightly mishit (force, accuracy, trajectory) HRs.