This doesn’t matter for purely predictive models though (assuming it’s cross-validated). SIERA is functioning strictly as a predictive model of park-adjusted ERA, so if they wanted to include a cubic term for the price of tea in China, more power to them if it works. The ensemble techniques that are best-in-class predictors produce the kind of uninterpretable ham sandwich mosaics that SIERA could only dream of. So my issue isn’t with the model being too fancy, having too many quadratics and interactions, or anything like that.
My issue is that they claim their only goal with SIERA was to beat the other models at predicting park-adjusted ERA, and then proceed to (badly) use a modeling technique (OLS regression) that is poorly suited for that purpose, all while violating its assumptions and providing no meaningful model diagnostics. This time it worked out, but think about how many amateurs hacking away at Excel end up building models that don’t work. We’re only seeing the lucky ones.
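To make the "it worked out this time" point concrete, here's a toy sketch (not SIERA itself; the data and coefficients are made up) of why an OLS loaded with quadratic and interaction terms can look great in-sample while its cross-validated accuracy, the number a purely predictive model should actually be judged on, is much worse:

```python
# Toy illustration: in-sample fit vs. cross-validated fit for an OLS
# with full quadratic/interaction expansion. All data here is synthetic.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 4))                                  # stand-in "pitcher stats"
y = X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=1.0, size=60)  # stand-in "ERA"

# Degree-2 expansion: squares and pairwise interactions of every predictor.
model = make_pipeline(PolynomialFeatures(degree=2, include_bias=False),
                      LinearRegression())

in_sample_r2 = model.fit(X, y).score(X, y)            # fit and score on same data
cv_r2 = cross_val_score(model, X, y, cv=5).mean()     # honest out-of-sample estimate

# The in-sample R^2 flatters the model; the cross-validated R^2 does not.
print(in_sample_r2, cv_r2)
```

Without reporting something like the cross-validated number, "it fits the historical data" tells you very little about how the model will predict next season.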
We have a pretty good idea about what the best prediction techniques are now, and it’s not what these guys are doing. So my complaint is that they’re (mis)using a hammer when a screwdriver is needed. Can you drive a screw using the sharp edge of the claw on a hammer? Maybe, but why would anyone do that? You’re familiar with the Netflix prize, right?
From one of the GOAT papers:
“Each observation consisted of a user ID, a movie title, and the rating that the user gave this movie. The task was to accurately predict the ratings of movie-user pairs for a test set such that the predictive accuracy improved upon Netflix’s recommendation engine by at least 10%.”
At the data exploration and reduction step, many teams including the winners found that the noninterpretable Singular Value Decomposition (SVD) data reduction method was key in producing accurate predictions: “It seems that models based on matrix factorization were found to be most accurate.” As for choice of variables, supplementing the Netflix data with information about the movie (such as actors, director) actually decreased accuracy.
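A minimal sketch of the matrix-factorization idea (the ratings below are invented for illustration): take a users-by-movies ratings matrix, keep only the top few singular-value components, and read predictions off the low-rank reconstruction. None of the latent factors need mean anything interpretable.

```python
# Low-rank SVD approximation of a toy ratings matrix.
# Rows = users, columns = movies; the numbers are made up.
import numpy as np

R = np.array([[5., 4., 1., 1.],
              [4., 5., 1., 2.],
              [1., 1., 5., 4.],
              [2., 1., 4., 5.]])

U, s, Vt = np.linalg.svd(R, full_matrices=False)
k = 2                                            # keep top-k latent "taste" factors
R_hat = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]    # rank-k reconstruction

# The reconstruction recovers the two taste clusters while smoothing
# over individual ratings; entries for unseen pairs would be read off R_hat.
print(np.round(R_hat, 1))
```

The factors U and Vt are just directions in rating-space that happen to predict well, which is exactly the "noninterpretable but accurate" trade the Netflix teams made.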
In terms of choice of methods, their solution was an ensemble of methods that included nearest-neighbor algorithms, regression models, and shrinkage methods. In particular, they found that “using increasingly complex models is only one way of improving accuracy. An apparently easier way to achieve better accuracy is by blending multiple simpler models.”
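The blending point is easy to demonstrate with a sketch (synthetic data, two stand-in "simple models" of my choosing): average the out-of-fold predictions of a ridge regression and a k-nearest-neighbors model, and the equal-weight blend is, by convexity of squared error, never worse than the worse of the two, and often beats both.

```python
# Equal-weight blend of two simple models, evaluated out-of-fold.
# Data and model choices are illustrative, not from the Netflix solutions.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_predict
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
y = X[:, 0] ** 2 + X[:, 1] + rng.normal(scale=0.5, size=200)

# Out-of-fold predictions so the comparison is honest (no in-sample flattery).
pred_ridge = cross_val_predict(Ridge(), X, y, cv=5)
pred_knn = cross_val_predict(KNeighborsRegressor(n_neighbors=10), X, y, cv=5)
pred_blend = (pred_ridge + pred_knn) / 2          # the "apparently easier way"

def mse(p):
    return np.mean((y - p) ** 2)

print(mse(pred_ridge), mse(pred_knn), mse(pred_blend))
```

That's the whole trick: two mediocre models that err in different ways average out to something better, with no extra cleverness required from any single model.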
There are really no first principles or curve fitting or anything going on here. It’s just a computer throwing the kitchen sink at a large data set to see what sticks, and in ways that aren’t necessarily interpretable. The actual model it comes up with could be some crazy thing that far exceeds the complexity of a quadratic OLS with interactions. In fact, I’m certain it would be, if there were a meaningful way to compile the ensemble contributors into a single interpretable thing. Gimme that all day over amateurs hacking around in Excel until they hit something.