Baseball Project

On Base Plus Slugging Average (OPS)
Course assignment for MATH 623 - Probability and Statistics for Teachers - Ball State University [Summer 2013]

**Author**
Jacob teaches at R.J. Baskett Middle School, a middle school serving the small communities of Gas City and Jonesboro, Indiana. In 2013/2014 he has a total of 156 students: one advanced 7th grade class and one advanced 8th grade class which will both take the Algebra ECA at the end of the year, and four regular 8th grade math classes.

Introduction
I decided to watch the TinkerPlots (TP) webinar by Mark Pinkerton on Baseball Statistics and the MVP. I learned some new ideas about how to use TP in my Algebra class, such as using statistics to talk about real-life examples of slope. In my class we usually talk about distance/time or weight/time when discussing slope, but Mark took a completely different tack when he compared home runs (HR) to strikeouts (SO) and HR to batting average (BA). For my project, I decided to do some more research on baseball statistics and apply some things I've learned in class.

Before I begin, please understand I know absolutely nothing about sports in general and baseball specifically. I just happened to be on DASL (the Data and Story Library) and saw a data file that included two statistics for baseball players: OBA = (hits + bases on balls + hit by pitcher) / (at bats + bases on balls + hit by pitcher) and EBP = (total bases - hits) / at bats. Branch Rickey, the author of the 1954 Life magazine article that introduced these ideas, argued that these statistics measure a hitter more accurately than simple Batting Average or number of home runs. He says, "batting average is only a partial means of determining a man's effectiveness on offense. It neglects a major factor, the base on balls, which is reflected only negatively in the batting average (by not counting it as a time at bat). Actually walks are extremely important" (see reference below).

At this point I wanted to know if there were any newer ideas on baseball statistics out there, and came across the field of Sabermetrics, which analyzes baseball through statistics. WAR (Wins Above Replacement) and OPS (On Base Plus Slugging Average) seemed to be especially useful.

Rickey, B. (1954, August 2). Goodbye to some old baseball ideas. Life magazine. Retrieved from http://www.baseballthinkfactory.org/btf/pages/essays/rickey/goodby_to_old_idea.htm
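As a quick check on the two formulas from the data file, here is a minimal Python sketch of OBA and EBP exactly as they are written above. The function names and the season numbers are made up for illustration; they are not taken from the DASL file or from any real player.

```python
def rickey_oba(hits, walks, hit_by_pitch, at_bats):
    """OBA as written in the text: times on base divided by
    at bats plus bases on balls plus hit by pitcher."""
    return (hits + walks + hit_by_pitch) / (at_bats + walks + hit_by_pitch)


def rickey_ebp(total_bases, hits, at_bats):
    """EBP as written in the text: bases gained beyond one per hit,
    divided by at bats."""
    return (total_bases - hits) / at_bats


# Made-up season line, for illustration only (not a real player).
oba = rickey_oba(hits=150, walks=50, hit_by_pitch=5, at_bats=500)
ebp = rickey_ebp(total_bases=250, hits=150, at_bats=500)
print(round(oba, 3), round(ebp, 3))
```

Notice that walks appear in both the numerator and the denominator of OBA, which is how the statistic gives a batter credit for the bases on balls that batting average ignores.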

I decided to calculate OPS using TinkerPlots and analyze those values in the same way that Mark looked at Home Runs and Batting Average. OPS is similar to OBA (which I mentioned above) in that it also includes Bases on Balls and other factors as part of a hitter's "score". The calculation for OPS adds a player's on-base percentage to his slugging average. Everything I needed was already in TP except for SF (Sacrifice Flies) and TB (Total Bases), so I decided to "regrab" the data from www.baseball-reference.com, as Mark did. The new data is labeled "with SF and TB." I have recreated the graph that Mark made. As you can see, the data is virtually the same, although several players are different (I'm guessing either Mark edited out some players or the stats have been updated since he recorded the video).



At this point I created a formula to calculate OPS. Luckily, the formula from Wikipedia (http://en.wikipedia.org/wiki/On-base_plus_slugging) uses exactly the same letters to represent each statistic as the data file does! I then learned that OPS had already been calculated and imported along with the new data I grabbed from the website. The nice thing is that my calculated values exactly match the values found in the OPS column!
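For readers without TinkerPlots, the same calculation can be sketched in Python. This assumes the standard definition from the Wikipedia page cited above: OPS = OBP + SLG, where OBP = (H + BB + HBP) / (AB + BB + SF + HBP) and SLG = TB / AB. The function name and the sample numbers are hypothetical and are not rows from the data file.

```python
def ops(h, bb, hbp, ab, sf, tb):
    """On-base plus slugging, using the same letters as the data file:
    H hits, BB bases on balls, HBP hit by pitch, AB at bats,
    SF sacrifice flies, TB total bases."""
    obp = (h + bb + hbp) / (ab + bb + sf + hbp)  # on-base percentage
    slg = tb / ab                                # slugging average
    return obp + slg


# Made-up season line, for illustration only (not a real player).
example = ops(h=150, bb=50, hbp=5, ab=500, sf=5, tb=250)
print(round(example, 3))
```

Comparing a column of values computed this way against the OPS column from the website is the same sanity check described above.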

Wikipedia also sorts OPS values into a 7-category scale, similar to the 7-point Likert Scale we mentioned in class. I used a formula to categorize the OPS scores this way. I was going to use a nested IF, but discovered the SWITCH function and researched how to make it work.
|| Category || Classification || OPS Range ||
|| A || Great || 0.9000 and higher ||
|| B || Very Good || 0.8334 to 0.8999 ||
|| C || Above Average || 0.7667 to 0.8333 ||
|| D || Average || 0.7000 to 0.7666 ||
|| E || Below Average || 0.6334 to 0.6999 ||
|| F || Poor || 0.5667 to 0.6333 ||
|| G || Atrocious || 0.5666 and lower ||
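The categorizing step can also be sketched in Python. This is not the TinkerPlots SWITCH formula itself, just the same idea written as a list of cutoffs checked from highest to lowest. It assumes each category starts at its lower bound from the table, with 0.8334 taken as the start of "Very Good" so that the ranges do not overlap, and it rounds OPS to four decimal places first. The function name is my own.

```python
def ops_category(ops_value):
    """Return (letter, classification) for an OPS value."""
    # Lower bound of each category, highest first (assumed from the table).
    cutoffs = [
        (0.9000, "A", "Great"),
        (0.8334, "B", "Very Good"),
        (0.7667, "C", "Above Average"),
        (0.7000, "D", "Average"),
        (0.6334, "E", "Below Average"),
        (0.5667, "F", "Poor"),
    ]
    value = round(ops_value, 4)  # the table is stated to four decimals
    for lower_bound, letter, label in cutoffs:
        if value >= lower_bound:
            return letter, label
    return "G", "Atrocious"      # anything below 0.5667


print(ops_category(0.95))
print(ops_category(0.75))
```

Checking the cutoffs in descending order means each value lands in the first category whose lower bound it reaches, which is what a chain of nested IFs would do.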

In the scatterplot Calculated_OPS vs HR (below left), I am comparing Calculated OPS to Home Runs. I found that there is a linear relationship between the variables, even though HR is not an explicit part of the OPS formula! (Home runs do enter indirectly, since each one counts as a hit and adds four total bases.) I put a horizontal reference line at 0.9; all the batters above it are categorized as "Great" on the OPS scale. These batters include those mentioned by Mark in the webinar as very good players, including Jose Bautista, Albert Pujols, and Josh Hamilton.

In the dot plot (below right), you can see the number of players falling into each OPS category. I found it very interesting that this follows (roughly) a normal distribution, and that it does not match up completely with the Batting Average (box plot below). In other words, some players with poor batting averages are rated Great on the OPS scale. For example, Jose Bautista has a 0.26 BA but a Great OPS classification.

The last thing I did was try to find some relationship between a particular category of player and their batting record. I compared which league players were in to Batting Average, Home Runs, and OPS, but no difference popped out. I then thought about grouping the players by age. I put players under 25 years old into the "Young" category, and everyone else into the "Old" category, to single out those players who were new to the game. Then I compared Batting Average and OPS between the groups, but no difference appeared. Finally I compared the age groups on Home Runs, and found that the older group hit 3.9 more home runs on average. Was this a statistically significant difference?

H0 - There is no difference between the mean number of home runs hit by young and old baseball players.

HA - Experienced baseball players (25 and older) hit a greater mean number of home runs than young baseball players.

I ran 500 history cases in TinkerPlots, each recording a difference of the means, and found that 5% of those differences were above 3.9 home runs, which puts the p-value right on the line at 5%. With an alpha level of 5%, this borderline result means that I must fail to reject the null hypothesis that there is no difference between young and old players, although if my alpha level were slightly higher, I could claim a statistically significant difference. In baseball, it is probably okay to have a slightly higher alpha level, since it is only a sport and most people would accept a 94% confidence level. In that case, I would reject the null hypothesis and say that there is a statistically significant difference! It seems that in the world of statistics you can "prove" anything (I use the word "prove" lightly, since we're really only rejecting the null hypothesis) as long as you are willing to accept a larger chance of being wrong.
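The simulation above can be approximated outside TinkerPlots. The sketch below assumes the history cases came from randomly reshuffling the Young/Old labels and recording the difference of the means each time (a randomization, or permutation, test). The function name, the seed, and the home run counts are placeholders I invented for illustration; they are not the project's data, so the printed p-value will not match the 5% reported above.

```python
import random
from statistics import mean


def permutation_p_value(young, old, trials=500, seed=0):
    """One-sided randomization test for HA: mean(old) > mean(young).

    Returns the observed difference of the means and the fraction of
    reshuffled differences that are at least as large.
    """
    rng = random.Random(seed)           # fixed seed so the run is repeatable
    observed = mean(old) - mean(young)
    pooled = list(young) + list(old)
    n_young = len(young)
    at_least_as_large = 0
    for _ in range(trials):
        rng.shuffle(pooled)             # reassign age labels at random
        diff = mean(pooled[n_young:]) - mean(pooled[:n_young])
        if diff >= observed:
            at_least_as_large += 1
    return observed, at_least_as_large / trials


# Placeholder home run counts, invented for illustration only.
young_hr = [5, 8, 12, 10, 7, 15]
old_hr = [9, 14, 20, 11, 18, 25, 13, 16]
observed, p = permutation_p_value(young_hr, old_hr)
print(f"observed difference = {observed:.2f}, one-sided p = {p:.3f}")
```

The returned fraction plays the role of the p-value: it estimates how often a difference as large as the observed one would show up if age group really made no difference.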

**Jacob's investigation into Handedness will be posted soon ...**