wILDCARD: The Baseball Metric for a Diseased MLB Season

Dominating an MLB season is usually about which teams can create the most runs, prevent the most runs and thus win more games. But if winning the 2020 MLB Season really just comes down to which team can get the fewest cases of coronavirus, we’re going to need a new set of sabermetrics. mhatter106 has you covered.

Photo by Mary Knox Merrill/The Christian Science Monitor via Getty Images

Let’s get the obvious out of the way. 2020 is weird and the 2020 MLB season is weird. Opening Day is in late July. The season is 60 games long. Players are opting out of playing this season for health and safety reasons tied to the risk of COVID-19. Players and fans alike are reassessing how important baseball is in an uncertain pandemic, and the answers arrived at are varied.

Every day, we hear new reports of players being put on the IL for undisclosed reasons. But every reason that isn’t COVID-19-related has to be disclosed, so when a reason isn’t disclosed, we are left to assume that it is because of coronavirus.

“I really do think that whichever team has the fewest cases of coronavirus is going to win,” Astros general manager James Click was quoted as saying. “It’s impossible to state how that can devastate a team and that’s why we have to be so vigilant about it.”

Click is probably right. The biggest risk to a team’s success isn’t that a team’s players will underperform. It’s that they might not get to perform at all, because they sat too close to the wrong person.

Traditionally, projections of success are based on models predicting how many runs players will create compared to how many runs they will allow. WAR, whether it be fWAR, bWAR, or WARP, is based on this. Figure out who gets the runs, and you figure out who’s going to come out on top. According to Click, though, for 2020, figure out who isn’t getting coronavirus, and you figure out who triumphs.

But just because the key statistic of the season has changed doesn’t mean we need to abandon mathematical models and sabermetrics. We just need to come up with new ones.

And I have a seventh grade 1st place Mathlete trophy sitting somewhere in my Mom’s house that says that I can do it. (It’s probably sitting next to my fourth grade yellow belt 2nd place karate trophy, so if you don’t agree with me, come at me.)

Exposure-Contacts

Let’s start with the two things that probably most influence the contracting and spread of the virus: Exposures and Contacts. The more people a player is exposed to (particularly known COVID positive people), the higher the likelihood is of becoming infected. The more people a player comes into contact with (particularly, but not limited to, other players), the more likely it is he will spread the disease.

The two factors must be multiplied together. The risk of every contact a player has is compounded by the number of exposures that player had. The potential effect of each exposure a player has is amplified by the number of contacts that player goes on to have. The product of the two, we will call Exposure-Contacts, or EC.

To make it a rate stat that lets us easily compare two players, we will divide Exposure-Contacts by the number of days, giving us ECD (Exposure-Contacts per Day).

Weighted Exposure-Contacts per Day

But we know that all exposures and all contacts aren’t the same, just as every hit doesn’t have the same impact: home runs are worth more than doubles, which are worth more than singles and walks. This is why we have wOBA. Similarly, not all contacts and exposures have the same impact. Those occurring indoors without masks in very close proximity have a greater likelihood of transmission than those occurring outdoors at distances of 3 or 4 feet.

So we need to weight a player’s ECD by their Mask Usage Propensity (Pmu) and Social Distancing Aptitude (SDA). Other factors also come into play, such as Hand Sanitizations per Hour (hSPH), but Pmu and SDA play the greatest role in weighting a player’s ECD. This is their Weighted Exposure-Contacts per Day, or wECD.

Expected Weighted Exposure-Contacts per Day

wECD is a good start, but we are going to need to be able to compare players from different teams, so we can effectively create models of “what if Player X were added to Team Y?” Just as hitter-friendly Coors Field and pitcher-friendly Comerica Park can influence batted ball outcomes, a player’s environment is going to skew wECD numbers for players in COVID hotbeds like Texas, Florida and California. We need to normalize that by adjusting Exposures to reflect the prevalence of COVID-19 in the general baseball population and not the local baseball population. We also need to adjust Contacts to account for varied degrees of community shutdowns and differing levels of social interaction allowed by local and state governments. This becomes our Expected Weighted Exposure-Contacts per Day, or xwECD.

Weighted IL Days Created

Now that we have xwECD, we have a metric that can be used across the majors, regardless of locale. This xwECD can now be converted to IL Days Created, or ILDC. More accurately, it is xwEC that is converted to ILDC, because ILDC is not a rate metric. From a player’s xwEC, we can calculate the number of infections that player contributes to. Then, based on the average consequence of each new positive COVID test in a player, ILDC is derived.

But again, not all ILDC is the same. IL days toward the end of the shortened season may have less impact, because part of the IL time would fall when games are no longer being played, meaning fewer games missed. So we weight ILDC according to the point in the season at which it occurs, and how many games would be impacted. This is Weighted IL Days Created, or wILDC.

The absolute number that is a player’s wILDC does not have much meaning without some sort of common reference point to make comparing all players’ wILDC easier. So in the same way that WAR is concerned with Wins Above Replacement and not Wins without a reference point, we need to examine Weighted IL Days Created Above Replacement, or wILDCAR.

wILDCARD

You have to be kidding me if you think we’re going to get that close to an acronym and not complete it. We’re getting that “D” on the end of there some way somehow.

Our end product is Weighted IL Days Created Above Replacement Dude - wILDCARD.

(I also considered “Doppelganger” and “Data”, but I went with “Dude”, because... dude.)

So whose team will come out on top in 2020? If Click is correct that the team with the fewest cases of coronavirus wins, we need only add up each roster’s wILDCARD. Whichever team has the lowest number has a good chance to hoist Rob Manfred’s hunk of metal come the end of the season.

Mathemagic!

At this point, I would like to point out that there was never any actual math in this article.

Sabermetrics in the time of cholera coronavirus.

I applaud you for making it to the end of this article.