Major League Baseball (Part 1)

I thought McClanahan was a clear favorite, but his last two starts brought him back down, and he isn't winning any sort of close race as a Tampa Bay Ray.

Gausman's BABIP is insane. Good god, .367? Dude is either throwing flames by people or straight into barrels.

https://twitter.com/Brandon_Warne/status/1557013727093866497?t=23INOAHxMuyx2gh6YwZISw&s=19


I don't really find WAR of any type to be that informative within seasons, and I've always been a fan of just looking at everything and trying to figure it out. Verlander being an odds favorite for CY is the same baseball-writer insanity that's existed my entire lifetime.

McClanahan should be #1 in any ranking of performance-to-this-point, but I'm not sure about odds favorite for CY, since this is already the most innings he's ever thrown.

He has the ridiculously low HR/FB rate to match it. I think Ohtani will qualify after tonight, but he's drawing dead because morons love WINS. Cease should have a shot, though, and would be a deserving winner.

Enjoy


Wtf is SIERA

SIERA = 6.145
        - 16.986*(SO/PA)
        + 11.434*(BB/PA)
        - 1.858*((GB-FB-PU)/PA)
        + 7.653*((SO/PA)^2)
        +/- 6.664*(((GB-FB-PU)/PA)^2)
        + 10.130*(SO/PA)*((GB-FB-PU)/PA)
        - 5.195*(BB/PA)*((GB-FB-PU)/PA)
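If anyone wants to actually plug a stat line into that, here's a minimal sketch. The stat line is made up, and the sign convention on the +/- term (subtract when net grounders per PA are positive, add when negative) is my reading of the published definition, so treat that as an assumption.

```python
# Sketch of evaluating the published SIERA formula for a single pitcher line.
# Assumption: the +/- term is subtracted when (GB - FB - PU)/PA is positive
# and added when it is negative (my reading of the published definition).

def siera(so, bb, gb, fb, pu, pa):
    k = so / pa                    # strikeouts per plate appearance
    w = bb / pa                    # walks per plate appearance
    net_gb = (gb - fb - pu) / pa   # net grounders per plate appearance
    sign = -1.0 if net_gb > 0 else 1.0
    return (6.145
            - 16.986 * k
            + 11.434 * w
            - 1.858 * net_gb
            + 7.653 * k ** 2
            + sign * 6.664 * net_gb ** 2
            + 10.130 * k * net_gb
            - 5.195 * w * net_gb)

# Hypothetical stat line, not a real pitcher's numbers.
print(round(siera(so=220, bb=40, gb=210, fb=160, pu=30, pa=750), 2))
```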

Precisely.


It's just common sense

I deal with a lot of data. Those interaction terms are… numerous, and the squared terms can dominate the math. Looks like the output of a multivariate model fit. Label me skeptical.

Fuck off and good riddance Al Avila (Tigers GM)


Good thing we waited until after the draft and trade deadline ¯\_(ツ)_/¯

Tigazzzzz

Hader was dreadful the other night. If the Brewers figured out he's just washed and dumped him, tip of the cap, but still, wow at actually doing that to a guy off of like two bad outings.

Meanwhile Dombrowski looks like a genius.

I don’t see any reason to be skeptical about that on its own. There are plenty of models with real interactions and quadratic terms. The thing to be skeptical about is that these models are almost always made by amateurs who are just doing recipes in Excel and tweaking numbers to overfit historical data.

Like, what on earth is (BB/PA)*((GB-FB-PU)/PA) supposed to be in terms of an in-real-life variable you could conceptualize and explain in plain English? What's its distribution? It's a product of ratios of (essentially) negative binomial RVs, one of which is itself a difference of several, and it's also highly correlated with a lot of the other terms in the model.
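For what it's worth, the collinearity worry is easy to eyeball with a quick simulation. The counts below come from made-up negative binomial parameters, not real batted-ball data, so take it as a sketch of the check rather than a measurement.

```python
# Rough sketch: simulate per-pitcher counting stats and check how strongly the
# BB-rate x net-grounder interaction tracks its own components.
# Distributions and parameters are made up for illustration only.
import numpy as np

rng = np.random.default_rng(0)
pa = 700
bb = rng.negative_binomial(n=20, p=0.25, size=5000)   # walk totals
gb = rng.negative_binomial(n=60, p=0.25, size=5000)   # ground balls
fb = rng.negative_binomial(n=45, p=0.25, size=5000)   # fly balls
pu = rng.negative_binomial(n=8, p=0.25, size=5000)    # pop-ups

bb_rate = bb / pa
net_gb = (gb - fb - pu) / pa
interaction = bb_rate * net_gb

print("corr(interaction, BB/PA):    ", round(np.corrcoef(interaction, bb_rate)[0, 1], 2))
print("corr(interaction, net GB/PA):", round(np.corrcoef(interaction, net_gb)[0, 1], 2))
```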

This is pretty much how sports “modeling” always goes. Does it mean these things are bad predictors? No, not necessarily. For example, it would be hard enough to beat Steamer projections even doing everything the “right” way that it wouldn't really be worth most people's time unless they had massive amounts of money riding on it. In that regard, it's not like SIERA is ever out of line with the other advanced pitching metrics. It correlates highly with all of them and effectively tells the same story.

Pretty much my skepticism exactly. It's all just curve fitting, describing terms that may be highly aliased with one another. Basically a mathematical circle jerk.

I'd rather the major awards focus on the traditional metrics first and then, if it's close, on some of the more direct advanced stats (ballpark or opponent adjustments).

I use these types of models all the time in my work, but it's important to note what is built from “first principles” and what is just fancy curve fitting. Sometimes outliers need to be mathematically corrected; sometimes they indicate something “different” that needs to be understood.

This doesn’t matter for purely predictive models though (assuming it’s cross-validated). SIERA is functioning strictly as a predictive model of park-adjusted ERA, so if they wanted to include a cubic term for the price of tea in China, more power to them if it works. The ensemble techniques that are best-in-class predictors produce the kind of uninterpretable ham sandwich mosaics that SIERA could only dream of. So my issue isn’t with the model being too fancy, having too many quadratics and interactions, or anything like that.

My issue is that they claim their only goal with SIERA was to beat the other models at predicting park-adjusted ERA, and then proceed to (badly) use a modeling technique (OLS regression) that is poorly suited for that purpose while violating its assumptions and not providing any meaningful model diagnostics. This time it worked out, but think about how many amateurs hacking away at Excel end up building models that don't work. We're only seeing the lucky ones.
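To make that concrete, here's roughly what the held-out comparison would look like in miniature. The data is purely synthetic and the scikit-learn models are generic stand-ins, not whatever SIERA's authors actually ran; the point is the workflow (same folds, held-out error for both), not the specific numbers.

```python
# Sketch: cross-validate a quadratic OLS against a boosted-tree ensemble on
# synthetic data. Everything here is made up; it illustrates the evaluation
# workflow, not any real pitching model.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
X = rng.normal(size=(2000, 3))                     # fake K%, BB%, net-GB%
y = (4.0 - 1.5 * X[:, 0] + 0.8 * X[:, 1] - 0.4 * X[:, 2] ** 2
     + 0.3 * X[:, 0] * X[:, 2] + rng.normal(scale=0.7, size=2000))

quad_ols = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
boosted = GradientBoostingRegressor(random_state=1)

for name, model in [("quadratic OLS", quad_ols), ("gradient boosting", boosted)]:
    rmse = -cross_val_score(model, X, y, cv=5,
                            scoring="neg_root_mean_squared_error").mean()
    print(f"{name}: 5-fold RMSE = {rmse:.3f}")
```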

We have a pretty good idea about what the best prediction techniques are now, and it’s not what these guys are doing. So my complaint is that they’re (mis)using a hammer when a screwdriver is needed. Can you drive a screw using the sharp edge of the claw on a hammer? Maybe, but why would anyone do that? You’re familiar with the Netflix prize, right?

From one of the goat papers:

Each observation consisted of a user ID, a movie title, and the rating that the user gave this movie. The task was to accurately predict the ratings of movie-user pairs for a test set such that the predictive accuracy improved upon Netflix's recommendation engine by at least 10%.

At the data exploration and reduction step, many teams, including the winners, found that the noninterpretable Singular Value Decomposition (SVD) data reduction method was key in producing accurate predictions: “It seems that models based on matrix factorization were found to be most accurate.” As for choice of variables, supplementing the Netflix data with information about the movie (such as actors, director) actually decreased accuracy:

:vince3:
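The matrix-factorization idea itself is simple to sketch: fill a partially observed ratings matrix, take a truncated SVD, and guess the missing cells from the low-rank reconstruction. The toy below uses random data and mean imputation; it's the textbook version of the idea, not the winning team's actual pipeline.

```python
# Toy matrix-factorization sketch: low-rank SVD reconstruction of a ratings
# matrix, used to predict unobserved entries. Random data; textbook idea only,
# not the Netflix winners' actual method.
import numpy as np

rng = np.random.default_rng(2)
true_users = rng.normal(size=(50, 3))     # hidden user factors
true_items = rng.normal(size=(3, 40))     # hidden movie factors
ratings = true_users @ true_items + rng.normal(scale=0.1, size=(50, 40))

observed = rng.random((50, 40)) < 0.3     # pretend we only see 30% of cells
filled = np.where(observed, ratings, ratings[observed].mean())  # mean-impute gaps

U, s, Vt = np.linalg.svd(filled, full_matrices=False)
k = 3                                     # keep only the top-k factors
approx = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

rmse = np.sqrt(np.mean((approx[~observed] - ratings[~observed]) ** 2))
print(f"RMSE on held-out cells: {rmse:.3f}")
```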

In terms of choice of methods, their solution was an ensemble of methods that included nearest-neighbor algorithms, regression models, and shrinkage methods. In particular, they found that “using increasingly complex models is only one way of improving accuracy. An apparently easier way to achieve better accuracy is by blending multiple simpler models.”

:vince2:
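The blending point is just as easy to illustrate: average the predictions of two simple models and compare held-out error against each model on its own. Synthetic data again, and the ridge/kNN pair is an arbitrary choice for illustration.

```python
# Sketch of blending: average predictions from two simple models and compare
# held-out error against each model alone. Synthetic data, illustration only.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.neighbors import KNeighborsRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(3)
X = rng.normal(size=(1500, 5))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] + rng.normal(scale=0.3, size=1500)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=3)

ridge = Ridge().fit(X_tr, y_tr)
knn = KNeighborsRegressor(n_neighbors=25).fit(X_tr, y_tr)
blend = 0.5 * ridge.predict(X_te) + 0.5 * knn.predict(X_te)

for name, pred in [("ridge", ridge.predict(X_te)),
                   ("knn", knn.predict(X_te)),
                   ("50/50 blend", blend)]:
    print(f"{name}: RMSE = {mean_squared_error(y_te, pred) ** 0.5:.3f}")
```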

There are really no first principles or curve fitting or anything going on here. It's just a computer throwing the kitchen sink at a large data set to see what sticks, and in ways that aren't necessarily interpretable. The actual model it comes up with could be some crazy thing that far exceeds the complexity of a quadratic OLS with interactions; in fact, I'm certain it would be if there were a meaningful way to compile the ensemble contributors into a single interpretable thing. Gimme that all day over amateurs hacking around in Excel until they hit something.

Watching the Field of Dreams game… The novel the movie was based on was Shoeless Joe, by the late W.P. Kinsella. I read the novel before the movie was made. Folks who are fans will surely enjoy Kinsella's follow-up novel, The Iowa Baseball Confederacy.

ETA: I long ago gave away my copy of Shoeless, but I still have my first edition copy of Iowa in my library.

well no interest in watching the teams play but I, for one, welcome a good night for CORN

turned it on, saw they had a fence in front of the corn, turned it off.
