Major League Baseball (Part 1)

spidercrab · September 13, 2022, 5:58pm

Isn’t it surprising that people talk so much more about WAR than WAA? Is it because it’s easier to pronounce? Like this article points out, WAA seems much more consistent with the way we actually talk about great players.

I especially like the discussion of breaking WAR down into 2 components:

The wins due to being better than the average player
The wins due to just showing up, even though you’re worse than the average player, as long as you’re better than the AAA-level replacement.

So WAR massively overrates players like Pete Rose and Rafael Palmeiro.
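To make the two-part split concrete, here's a rough sketch in Python. It assumes (my assumption, not something taken from the article) that replacement level sits about 2 wins below average per full season. The function name and the sample numbers are made up for illustration.

```python
# Assumption for illustration: a replacement-level player is roughly
# 2 wins below a league-average player over one full season.
REPLACEMENT_GAP_PER_SEASON = 2.0


def split_war(war, playing_time):
    """Split WAR into (wins above average, wins for just showing up).

    playing_time is the fraction of a full season played (1.0 = full season).
    """
    showing_up = REPLACEMENT_GAP_PER_SEASON * playing_time
    waa = war - showing_up
    return waa, showing_up


# Hypothetical hang-around years: five full seasons at 1.0 WAR each.
late_career = [split_war(1.0, 1.0) for _ in range(5)]
total_war = sum(waa + showing_up for waa, showing_up in late_career)
total_waa = sum(waa for waa, _ in late_career)
print(total_war, total_waa)
```

Under that assumption, the five extra seasons add 5 to career WAR while subtracting 5 from career WAA, which is the whole disagreement in one line.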

zimmer4141 · September 13, 2022, 6:13pm

I think especially for baseball, durability is a very important trait and should be rewarded when we talk about player value.

spidercrab · September 13, 2022, 6:15pm

Durability at a high level, sure. But I don’t think there’s any merit in continuing to play an extra 5 years as the 20th best player at your position. (I am implicitly thinking in terms of Hall of Fame-type conversations.)

zimmer4141 · September 13, 2022, 6:19pm

That’s totally fair.

WAR is more useful for evaluating a player’s individual value in a given season (162 games of a league-average player are probably worth about the same as, or more than, 100 games of a +2 WAA player in terms of impact on the team winning).

But for HoF talk, I agree that a player who has a borderline HoF career after 15 years shouldn’t get bumped up because they hung on for an additional 5 years as a below average but above replacement player.

Lawnmower_Man · September 13, 2022, 6:33pm

That’s why we have stuff like JAWS.

mosdef · September 13, 2022, 8:12pm

I think there’s a value in a negative number meaning “not even a major leaguer”. If someone is a -1 WAA player, there’s a good chance that he’s still better than readily available alternatives, so the “negative” just carries a stronger implied condemnation than is really the case.

Yuv · September 14, 2022, 4:42pm

so close to a perfect game

eyebooger · September 14, 2022, 4:45pm

Has a perfect ump scorecard game ever happened?

Yuv · September 14, 2022, 4:55pm

not on that website. dunno how long they’ve been tracking.

Lawnmower_Man · September 14, 2022, 9:04pm

I feel like all of this stuff is bunk to some degree, and I’m not willing to put a ton of stock into it. Instead, I prefer to look at all of it as a constellation of information and downplay individual pieces. Switching gears a bit but are you familiar with adjusted plus-minus models for NBA (and NHL)? It’s a very different type of estimation problem since you’re trying to get coefficients of variables that mostly coexist (the same players tend to be on the court at the same time). So for instance, the data structure may look like this:

Plus_minus, Possession, Home_1, … , Home_5, Away_1, … , Away_5
-1, 1, Lebron, … , Westbrick, Steph, …, Klay
0, 2, Lebron, … , Westbrick, Steph, …, Klay
.
.
.
1, 100, Lebron, … , Westbrick, Steph, …, Poole

That’s oversimplified but the point here is that each player is an indicator variable with his own column, and you regress the point differential (+/-) onto the on-court players for a given unit of time / possession. So in theory it makes sense that you could get beta estimates for each player’s effect on the scoring that goes beyond things that are directly measurable like box score stats…
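Here's a rough sketch of how rows like those could be turned into the indicator columns. The +1 for home / -1 for away coding is an assumption on my part (a common convention, but not the only one), and the player labels and dictionary keys are placeholders, not real data.

```python
def build_design_matrix(stints):
    """Turn lineup rows into one signed indicator column per player.

    Assumed coding: +1 if the player is on the court for the home side,
    -1 for the away side, 0 if he is on the bench.
    """
    players = sorted({p for s in stints for p in s["home"] + s["away"]})
    column = {name: j for j, name in enumerate(players)}
    X, y = [], []
    for s in stints:
        row = [0] * len(players)
        for p in s["home"]:
            row[column[p]] = 1
        for p in s["away"]:
            row[column[p]] = -1
        X.append(row)
        y.append(s["plus_minus"])
    return players, X, y


# Toy 2-on-2 possessions with placeholder player labels.
stints = [
    {"plus_minus": -1, "home": ["A", "B"], "away": ["C", "D"]},
    {"plus_minus": 1, "home": ["A", "B"], "away": ["C", "E"]},
]
players, X, y = build_design_matrix(stints)
print(players)
for row in X:
    print(row)
```

Note that the columns for A and B are identical in this toy data because they never play apart, so nothing in it can tell the two of them apart.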

Except I’m sure you already see the problem here. The variance at this level of granularity is already large, but there’s also massive multicollinearity since the same players tend to be on the court at the same time a lot. What you’d get from stock OLS regressions are wildly inefficient estimates that aren’t converging anytime soon and that amplify even minor misspecifications, and even minuscule changes in the data can lead to dramatic shifts in the outputs.

Luckily we have penalization techniques like ridge regression, which are exactly the types of methods used in this kind of modeling to handle the limitations of the data. As far as I know, the basketball versions were mostly developed by the good people over at the APBRmetrics forum, the SABR of basketball. Even ESPN has been publishing an RPM model for NBA since 2014 along with concomitant WAR estimates.
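A minimal sketch of the ridge idea in plain Python, assuming the textbook closed form beta = (X'X + lambda*I)^-1 X'y. The toy data, the function names, and the lambda value are all made up, and actual RAPM implementations will differ in the details.

```python
def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [[float(v) for v in A[i]] + [float(b[i])] for i in range(n)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(M[r][c]))
        if abs(M[pivot][c]) < 1e-12:
            raise ValueError("singular system")
        M[c], M[pivot] = M[pivot], M[c]
        for r in range(n):
            if r != c:
                factor = M[r][c] / M[c][c]
                M[r] = [u - factor * v for u, v in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]


def ridge_fit(X, y, lam):
    """Solve the penalized normal equations (X'X + lam * I) beta = X'y."""
    k = len(X[0])
    A = [[sum(row[i] * row[j] for row in X) + (lam if i == j else 0.0)
          for j in range(k)] for i in range(k)]
    b = [sum(row[i] * t for row, t in zip(X, y)) for i in range(k)]
    return solve(A, b)


# Two hypothetical players who are always on the court together,
# with their lineup going +2 in each of three stints.
X = [[1, 1], [1, 1], [1, 1]]
y = [2, 2, 2]

try:
    ridge_fit(X, y, lam=0.0)  # lam = 0 is plain OLS
except ValueError as err:
    print("OLS:", err)

print("ridge:", ridge_fit(X, y, lam=1.0))
```

With no penalty the system is singular because the two columns are identical. With the penalty there is a unique answer that splits the credit evenly and shrinks it toward zero, which is a modeling choice rather than something the data established.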

So that’s a happy ending to this sports analytics story, right? Right??? You’ll have to imagine the Star Wars jpg and also the sound of this sad trombone I’m about to play:

We extend the analysis by using ESPN’s estimated values as explanatory variables in a set of fixed effects and the two-stage least square (2-SLS) regressions that seek to explain player-season APM variation.

The results provide strong evidence that regularized adjusted plus minus player productivity measures are not, in fact, “teammate-independent.” Rather, we find evidence that lineup-teammate productivity positively influences a given player’s real plus minus value. As this result is conditional upon a given player’s baseline productivity via player fixed effects and age, we interpret this as a significant and fairly strong complementarity effect that is uncontrolled in adjusted plus minus measures such as real plus minus.

Based on the estimations above, for each unit average gain in the teammate’s RPM, a player’s RPM is overestimated by a range of 0.17 to 0.66 points according to point estimates. We find that RPM is not context-independent.

Like holy shit that’s an enormous amount of bias in the ESPN estimator. There’s a ton of endogeneity here, and I’m not aware of any popular NBA APM model that controls for it (because it’s hard and maybe impossible). The annoying part, and the point I’m attempting to make here, is that none of this is surprising for people who are trained to do this kind of work. Every economist I know would have quickly raised objections over endogeneity in this data.

So that’s interesting but what does this have to do with baseball? My impression from hanging out at APBR is that the people over there are much sharper and have more knowledge than the people doing SABR. At least the models and techniques they use are real ones used in modern science, and while penalized regression may not be quite good enough for plus-minus, and some combination of fixed effects / 2SLS or instruments may be required (if possible), that discussion is at least germane w.r.t. how this stuff actually works. Baseball data seems to present unique problems that are dissimilar to basketball and other sports, but I’m equally skeptical that it holds up at this level of scrutiny. In fact, I’m more skeptical.

Lawnmower_Man · September 14, 2022, 9:11pm

A much shorter answer to your question is yes, I think above replacement is a weird basis for comparison. It also leads to these incorrect conclusions now in MVP voting where people are arguing that the Yankees would have won 10 fewer games without Aaron Judge, as if his actual replacement would have been some scrub call-up from Scranton/Wilkes-Barre.

CanadaMatt3004 · September 15, 2022, 12:01am

Holy shit, Seattle is going to break the longest playoff drought huh.

Who will be the new leader?

King_of_NY · September 15, 2022, 12:17am

I think it would be Philly but they’re about to break their streak too, which is going to leave the Angels and Detroit as the new leaders in the clubhouse.

Edit: looked it up

CanadaMatt3004 · September 15, 2022, 12:18am

I meant in all of sports. Gotta be an NFL team with a longer drought or an NHL team.

King_of_NY · September 15, 2022, 12:18am

Oh. Hmm.

Sabo · September 15, 2022, 12:19am

team	no-playoff streak (seasons)
1. Mariners	20
2. Phillies	10
3. Angels, Tigers	7
5. Pirates, Royals	6
7. Mets, Orioles, Rangers	5
10. D’backs	4
11. Rockies	3
12. Nationals	2

Lawnmower_Man · September 15, 2022, 2:08am

I think all 3 panelists on MLB Tonight just nonchalantly agreed that Ohtani is the best baseball player of all time. Looks like Jake Peavy and Alex Avila are the player panelists. Progress!

spidercrab · September 15, 2022, 2:29am

This is all very interesting, even though I’m only familiar with plus-minus metrics at the most superficial level. That being said, I don’t think the criticisms applied in the basketball setting necessarily translate to the baseball setting.

I’ll caveat this by saying I’m not sure I entirely followed the article you posted, but here’s my interpretation of the punchline:
RPM is supposed to be a measure of teammate-independent performance. But this paper shows that, empirically, RPM (once you strip out the player’s baseline level) is correlated with teammate performance.

If that’s the case, I’m not sure this is a damning indictment of anything. It just seems like a natural consequence of the complementarity that exists in basketball. [Here’s where my naive and probably wrong understanding of plus-minus might reveal itself] Suppose that the shooter on every possession rotates deterministically, so if a player is in the game for exactly 5 possessions they are guaranteed exactly 1 shot. If you have player A and 4 scrubs, the opposing team is going to be able to devote more defense to player A than if player A is surrounded by 4 all-stars. Holding the defensive players constant, Player A should perform better when he’s playing with all-stars, even though by assumption his shot rate is unchanged, because those all-stars will force a lighter defensive effort on player A. So I believe that:

his RPM would increase when paired with all-stars
this would lead to the documented relation between RPM and teammate RPM.

I think the authors would view this as a problem with the idea of a player-specific RPM, but I guess I just view it as saying that each player’s RPM is just the weighted summation of his conditional-on-other-players RPMs. Maybe that’s just semantics? Is the authors’ idea that a properly calculated RPM would be equal in the two scenarios I described? That doesn’t seem possible in a world where there’s possible synergy - two players’ values are greater together than independently. Who, if anyone, should get credit for that synergy?

Anyway, I think the reason I place a lot more weight on baseball statistics and their ability to measure player value is that the nature of the sport is far less likely to have those complementarities. Instead, you have a collection of discrete events based on individual batters/pitchers doing individual actions. Of course there is some complementarity there, like the ability to generate RBIs or score runs will obviously depend on the ability of your teammates to get on base or drive you in. But I don’t think those are nearly as evident in baseball, and I think they’re more easily dealt with statistically (like using FIP to measure pitchers’ WAR).

eyebooger · September 15, 2022, 1:13pm

Without looking, I’ll guess the Sacramento Kings.

They have been absolutely terrible since the Mike Bibby/Chris Webber years.

amead · September 15, 2022, 1:34pm

Feel like my Sabres haven’t made the playoffs in forever. Dunno if it is right tho.