Getting Started In Baseball Analytics

In today's post, Sam Bornstein shares his best advice for those looking to get into baseball analytics: which language to learn, how to get started, and a whole lot more!

In my last post, we learned that making informed decisions through the use of data increases your chances of success. We covered the new era of baseball technology, walked through a specific example of a pitcher going through the pitch design process, and, most importantly, discussed how to apply data to drive these decisions. This week, I want to take a step back and lay out the resources available to equip you with the right tools to analyze baseball data.


Resources


Whether you are a high school student, a college freshman, or a long-time baseball coach, there is no better time to pick up coding. There's no blueprint for adding a programming language to your toolbox; I learned to code with help from co-workers and by getting good at crafting Google searches. At first glance, the syntax can be daunting, but once you get comfortable with the language, you'll be glad you started.


For college students reading this, the first resource I'd recommend taking advantage of is your school's curriculum. If you are working towards a degree in analytics, statistics, computer science, or any related field, this shouldn't be a problem for you. If you're working towards a different degree, talk to your advisor to see which classes you can enroll in to pick up these skills. Additionally, LinkedIn Learning might be discounted or free, depending on your university. High school students may have a tougher time finding a class with a similar curriculum. In that case, other resources come into play.


DataCamp is a popular structured online learning platform that offers interactive courses, practice modules, projects, and assessments. Codecademy is another online resource with similar structured courses. Both websites offer the first few lessons of their chapters for free, with the option to pay for a subscription to access the rest of their courses. Beyond these two, there are a myriad of other websites, such as Coursera, edX, and Udemy.


Only a couple of these websites offer baseball-specific analytics courses, so it may be difficult to apply what you learn back to the baseball world. Fortunately, there is a book titled 'Analyzing Baseball Data with R', which introduces R to those interested in exploring the rich sources of baseball data. It is a great place to start for individuals who don't need as much structure as an online class or a college-level course.


Finally, there is an even less structured way to learn to code that may appeal to some, especially at the low price of $0. A simple "how to code in R/Python" search on YouTube or Google will lead to hundreds of results, and you may even be overwhelmed by the number of options. This approach requires some trial and error, but it is a perfectly practical way to equip yourself with the proper skills. Overall, there are a ton of options out there, free or not. My best advice is to explore the resources available and find what works best for you.


Choosing a Programming Language


R and Python are among the most popular programming languages baseball analysts use for data visualization, statistical analysis, and machine learning. Let's run through some details on each of the two:


R specializes in statistical analysis, data handling, visualization, and reporting. It sees less use in advanced engineering and computing tasks. This combination best suits those who want an environment built specifically for data analysis and visualization.


On the other hand, Python is useful throughout the entire data analysis process and is well suited for advanced engineering and computing. Its ecosystem is not as rich in cutting-edge libraries for statistical analysis and visualization. This combination best suits those who prefer a general-purpose programming language, want to focus on machine learning, and need to integrate with other tools.
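To make "useful throughout the entire data analysis process" a bit more concrete, here is a minimal sketch of the kind of first exercise a new analyst might write in Python: grouping pitch-level data by pitch type and averaging velocity and spin rate, the sort of summary that could feed a post-game report. It uses only the standard library. The rows are made-up numbers, and the column names (`pitch_type`, `velocity_mph`, `spin_rpm`) and the helper `summarize_by_pitch_type` are hypothetical, not any tracking system's actual export format.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical pitch-level data. A real session export (e.g. from Trackman)
# has its own column names; "pitch_type", "velocity_mph", and "spin_rpm"
# are assumptions made for this illustration.
pitches = [
    {"pitch_type": "Fastball", "velocity_mph": 92.0, "spin_rpm": 2250},
    {"pitch_type": "Fastball", "velocity_mph": 94.0, "spin_rpm": 2310},
    {"pitch_type": "Slider", "velocity_mph": 84.0, "spin_rpm": 2480},
    {"pitch_type": "Slider", "velocity_mph": 83.0, "spin_rpm": 2520},
    {"pitch_type": "Changeup", "velocity_mph": 85.5, "spin_rpm": 1750},
]


def summarize_by_pitch_type(rows):
    """Group pitches by type; return count, mean velocity, and mean spin per type."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["pitch_type"]].append(row)

    summary = {}
    for pitch_type, group in groups.items():
        summary[pitch_type] = {
            "count": len(group),
            "avg_velocity_mph": round(mean(r["velocity_mph"] for r in group), 1),
            "avg_spin_rpm": round(mean(r["spin_rpm"] for r in group)),
        }
    return summary


if __name__ == "__main__":
    # Print a small table, one line per pitch type, in the order first seen.
    for pitch_type, stats in summarize_by_pitch_type(pitches).items():
        print(
            f"{pitch_type:<10} n={stats['count']:<3} "
            f"velo={stats['avg_velocity_mph']:>5} mph  "
            f"spin={stats['avg_spin_rpm']:>5} rpm"
        )
```

For a real export, you would likely read rows with `csv.DictReader` (which yields strings, so numeric fields need converting) or use pandas, where something like `df.groupby("pitch_type")[["velocity_mph", "spin_rpm"]].mean()` does the same grouping in one line. The plain-Python version simply makes each step visible: group the rows, then aggregate each group.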


Each language has its strengths and weaknesses, but at the end of the day, I believe someone looking to learn to code is best off learning the language that best fits their needs. There is no wrong path to take when choosing between the two, and the more time you spend deciding, the less time you have to become an expert in one. For an in-depth examination of R vs. Python for those looking to enter the baseball industry, check out this video from the Driveline Research & Drinks podcast.


Personal Experience


How did my path to becoming a data analyst begin? Luckily, I've known which direction I wanted to take my career since the end of high school. Shortly after choosing to attend the University of Iowa, I reached out to the baseball program's Director of Operations to ask about student manager positions for the start of my freshman year.


I interviewed during summer orientation heading into the school year and earned a spot on the manager staff. My first year on staff included the responsibilities of a traditional college baseball student manager: organizing and assisting with drills, flipping baseballs underhand to hitters in the cages, playing catch with pitchers, completing scouting reports, etc. One piece of the puzzle was missing, though, and I wanted to get involved with it: using the data our team was collecting.


At the time, our program was using Trackman and HitTrax (other pieces of technology would be added later on). I was fascinated by the data Trackman was collecting, but I didn't really get hands-on with it until the following school year. When new managers joined the staff in my second year, I was able to cut back slightly on my time at practice and work on adding a new skill to my toolbox: coding. With the help of another analytically minded manager, YouTube, and Google searches, I started to pick up basic skills in R.


These skills developed over my sophomore year and into my junior year, when I began to churn out post-game reports and put together small projects focused on specific aspects of player development. At this point, my focus had shifted away from lending a hand at practice to putting in hours in the office. My initial focus was on setting up a structure for organizing data collection, data analysis, and data application. One tool that made an impact on this process was HawkDashboard, Iowa's current internal information system. Developing this application expanded my skill set and set me up to bring in three new student managers to work strictly on data analysis.


The founding of this analytics department marked the beginning of my fourth year in the program. Our staff worked together daily to answer baseball research questions and put together unique projects to advance our analytical capability. Working with this group of analysts solidified my belief in "learning by doing." While this approach was critical in the early stages of learning to code, I credit the rest of my learning to my undergraduate experience.


As a Business Analytics student, I was required to take several business classes, such as marketing, accounting, and management. While I took away valuable skills from these classes, I found myself wanting more. If a Baseball Analytics 101 course had been offered, I would have taken it every semester, twice. Unfortunately, such courses are rarely offered. Since the analytics and coding classes did not come until my junior year, I had to "learn by doing" outside of class to build the knowledge needed to complete baseball research projects. Fortunately, this head start allowed me to excel in those classes once they came and to add structure to my abilities. Now, as a graduate student, I take a course load focused entirely on these types of classes. I am able to dig further into the weeds with my work and use those on-campus resources to help both myself and my analytics team grow as baseball analysts.


Wrapping It Up


If you are reading this article because you are looking to analyze baseball data, there are a few key points I want you to take away:

  1. There are endless resources out there to get started
  2. There is no right or wrong programming language to choose
  3. There is no better time than now to start coding

If this article helps at least one person choose the right direction to start analyzing baseball data, then I've done my job. In future articles, I plan to build on this topic by sharing real-world baseball code that solves different problems and visualizes different kinds of information. We will cover how to create baseball-specific visualizations such as pitch movement plots, spray charts, and more!


If you have any further questions about what we covered in this article, do not hesitate to message me on Twitter or LinkedIn. Thanks for reading!
