| OOTP 25 - General Discussions Everything about the brand new 25th Anniversary Edition of Out of the Park Baseball - officially licensed by MLB, the MLBPA, KBO and the Baseball Hall of Fame. |
#1 |
|
Minors (Single A)
Join Date: Apr 2022
Posts: 52
|
Pitcher WAR
Pitcher WAR is not calculated correctly at all. As it stands, it is a complete noob trap, and it absolutely crushes the AI's ability to play the game.
I wrote a long, ranty post about it on Reddit because people keep suggesting that new players use it. List of issues I know about:
|
|
|
|
|
|
#2 |
|
Hall Of Famer
Join Date: May 2011
Posts: 3,106
|
Valid critiques! I really wish that OOTP would make its methods of calculating its various WARs, park factors and other stats less opaque, including what the linear weights are for offensive events, which would in turn feed into wOBA and offensive WAR. And for pitchers, it would be great if the game had a "home-grown" WAR calculation that incorporated the effects of BABIP. One potential way to do this (not perfect by any means) would be to base it on "pitcher wOBA allowed" vs league average (corrected with appropriate park factors), then apply the dynamic runs-per-win calculation. For the linear weights, you'd probably want to use a catch-all weight for all non-HR hits, as (to my knowledge) an OOTP pitcher has no control over what type of hit he allows, if it's not a home run.
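A minimal sketch of the "pitcher wOBA allowed" idea, in Python. The linear weights, the wOBA scale, the way the park factor is applied, and the `replacement_runs` term are all placeholder assumptions for illustration; none of them are OOTP's actual values or method.

```python
from dataclasses import dataclass

# Hypothetical linear weights; OOTP does not publish its actual values.
W_WALK = 0.69
W_HIT_BY_PITCH = 0.72
W_NON_HR_HIT = 1.00  # single catch-all weight for every non-HR hit
W_HOME_RUN = 2.00
WOBA_SCALE = 1.20    # assumed divisor converting wOBA points to runs per batter faced


@dataclass
class PitchingLine:
    batters_faced: int
    walks: int
    hit_by_pitch: int
    non_hr_hits: int
    home_runs: int


def woba_allowed(line: PitchingLine) -> float:
    """wOBA against, using one catch-all weight for all non-HR hits."""
    weighted = (
        W_WALK * line.walks
        + W_HIT_BY_PITCH * line.hit_by_pitch
        + W_NON_HR_HIT * line.non_hr_hits
        + W_HOME_RUN * line.home_runs
    )
    return weighted / line.batters_faced


def runs_saved(line: PitchingLine, league_woba: float, park_factor: float) -> float:
    """Runs prevented relative to a league-average pitcher facing the same batters."""
    park_adjusted = woba_allowed(line) / park_factor  # crude adjustment (assumption)
    return (league_woba - park_adjusted) / WOBA_SCALE * line.batters_faced


def pitcher_war(line: PitchingLine, league_woba: float, park_factor: float,
                runs_per_win: float, replacement_runs: float) -> float:
    """Convert runs saved (plus an assumed replacement-level credit) into wins."""
    total_runs = runs_saved(line, league_woba, park_factor) + replacement_runs
    return total_runs / runs_per_win
```

Here `runs_per_win` is passed in so a dynamic runs-per-win value can be supplied per league and season.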
|
|
|
|
|
|
#3 |
|
All Star Starter
Join Date: Nov 2019
Posts: 1,182
|
And thus the inherent problem with WAR. It can be whatever the person making the formula wants it to be.
__________________
“Baseball isn’t statistics; it’s Joe DiMaggio rounding second.” “Once, centuries ago, it was the beloved national pastime of the Americas, Wesley. Abandoned by a society that prized fast food and faster games. Lost to impatience.” “ The term ‘WAR’ should be replaced by ‘WAG’. WAR isn’t an actual measurement; it’s just a wild-ass guess” -Bill James RIP National League 1876-2022 Floreat semper vel invita morte. I make custom ballparks. |
|
|
|
|
|
#4 |
|
All Star Starter
Join Date: May 2022
Posts: 1,268
|
the introduction of pitcher BABIP was a godsend for simulation accuracy, but yes it greatly undercuts the entire premise of FIP.
It is WAR that is a noob trap. No one should be relying on any version of it, real or in OOTP, to evaluate single-season performance. I would be shocked if anyone working with MLB teams relies on it; its error bars are far too wide. It is best used as a career ranking stat for HOF voters, where all of those error bars can hopefully cancel out over long careers. Last edited by uruguru; 01-28-2025 at 03:01 PM. |
|
|
|
|
|
#5 | |
|
Banned
Join Date: May 2016
Location: St Petersburg Florida USA
Posts: 6,693
Infractions: 0/2 (4)
|
Quote:
https://www.youtube.com/watch?v=4OvLU1DSrmw |
|
|
|
|
|
|
#6 | |
|
All Star Starter
Join Date: May 2022
Posts: 1,268
|
Quote:
He then went through the trouble of manually calculating WAR and came across all of the same systemic flaws in the calculation that I have critiqued before. And then he ultimately shrugged his shoulders and said basically, "I guess it's not that good but it's still the best we have" https://www.youtube.com/watch?v=ipD053CE3PI |
|
|
|
|
|
|
#7 |
|
Banned
Join Date: May 2016
Location: St Petersburg Florida USA
Posts: 6,693
Infractions: 0/2 (4)
|
Good job. It's tough to get people to change their minds.
Last edited by Brad K; 01-28-2025 at 06:32 PM. |
|
|
|
|
|
#8 |
|
Banned
Join Date: May 2016
Location: St Petersburg Florida USA
Posts: 6,693
Infractions: 0/2 (4)
|
He talks too fast. I attended a wedding last year in which only one of the speakers was truly understandable. I commented to him about it and he said the key is to talk slowly, so slowly that he as the speaker thinks it's too slow.
|
|
|
|
|
|
#9 | |
|
Hall Of Famer
Join Date: May 2011
Posts: 3,106
|
Quote:
I just looked at MLB on Fangraphs for the last 25 years, and an online league I'm in for a 15-year period. R-squared for total team Fangraphs WAR vs. team wins for MLB was 0.97. R-squared for total OOTP WAR vs. team wins for the league was 0.93. Pretty good! Ignore at your peril.
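For anyone who wants to run the same check on their own league export, a rough sketch of the R-squared calculation in Python. The team totals below are made-up numbers for illustration, not the MLB or league data referred to above.

```python
def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return (cov * cov) / (var_x * var_y)


# Made-up team totals, for illustration only.
team_war = [50.1, 42.3, 35.0, 28.7, 20.4]
team_wins = [98, 90, 84, 77, 68]
print(round(r_squared(team_war, team_wins), 3))
```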
|
|
|
|
|
|
|
#10 | |
|
Minors (Single A)
Join Date: Apr 2022
Posts: 52
|
Quote:
Linear weights do not exist for a pitcher. Pitchers have too much control over the arrangement of base-out states. Pitchers who walk a lot of batters see a lot more 1 on, no out situations than those who do not, which descriptively alters the linear weight result. SIERA is the actual stat you would want, but given the inability to implement easy stats correctly, I am not holding my breath about it being implemented correctly. But really, just making the game factor strikeouts into HR and BABIP prevention like it does in real life would solve most of the issues with using FIP. Then all they need to do is apply park factors correctly, which isn't actually hard, and it is pretty shocking that this has been done incorrectly. I am not talking about it being wrong in the sense that it overweights the HR park factor. It is just not applied correctly at all. Applying park factors to innings regardless of the park the event took place in makes zero sense, and only kind of works for hitters due to them having roughly the same number of PA at home as away. To be clear, it doesn't actually work out correctly even then. (runs_at_home * home_park_factor) + runs_away != (runs_at_home + runs_away) * (home_park_factor / 2). The second is what OOTP is doing, despite the two expressions not being equivalent at all. |
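A small Python sketch of why a per-venue adjustment and a blanket adjustment generally disagree. Two assumptions that are not from the post: runs are neutralized by dividing by the park factor, and the blanket version uses the average of the home factor and a neutral 1.0 for the road. This is not a reconstruction of OOTP's code.

```python
def adjust_per_venue(runs_at_home, runs_away, home_park_factor):
    """Neutralize only the runs allowed at home; road parks are treated as neutral (assumed)."""
    return runs_at_home / home_park_factor + runs_away


def adjust_blended(runs_at_home, runs_away, home_park_factor):
    """Apply one blended factor to all runs, as if half the workload came at home."""
    blended_factor = (home_park_factor + 1.0) / 2.0
    return (runs_at_home + runs_away) / blended_factor


# Even workload, home runs inflated exactly by the park factor: both methods agree.
even_split = (adjust_per_venue(36, 30, 1.2), adjust_blended(36, 30, 1.2))

# Uneven workload (fewer runs at home because fewer innings there): they diverge.
uneven_split = (adjust_per_venue(40, 50, 1.2), adjust_blended(40, 50, 1.2))
```

The two functions match only when the home/away split behaves the way the blended factor assumes; any pitcher whose usage departs from that gets a different answer.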
|
|
|
|
|
|
#11 | |
|
All Star Starter
Join Date: May 2022
Posts: 1,268
|
Quote:
At the individual player level, WAR is still a blurry measure of contribution. |
|
|
|
|
|
|
#12 | |
|
Minors (Single A)
Join Date: Apr 2022
Posts: 52
|
Quote:
Team-level comparisons don't actually address these issues and can actually cover them up. WAR is always normalized at the league level, so as long as systematic errors only shift value around within a team, you would not observe them. In the case of systematic issues with starting pitchers, that value gets shifted to relievers. The issue then won't really show up, except that value is incorrectly attributed at the player level. Further, because it's based on park factors, if most teams in your league are around 1 you won't even see the bulk of the issues show up. That being said, the other comments made about WAR are mostly not addressing the same thing as me. WAR models a specific thing, and when using it you need to understand what it models. That isn't a knock on WAR; that is a reality of the math that underpins it. |
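A toy Python illustration of that point, with made-up WAR values and a hypothetical `shift_per_starter` amount: value moved from starters to relievers leaves the team total untouched while every individual number is wrong.

```python
def shift_value(true_war, starters, relievers, shift_per_starter):
    """Move a fixed amount of WAR from each starter and spread it evenly over the relievers."""
    measured = dict(true_war)
    moved_total = shift_per_starter * len(starters)
    for name in starters:
        measured[name] -= shift_per_starter
    for name in relievers:
        measured[name] += moved_total / len(relievers)
    return measured


# Made-up "true" values for one team, for illustration only.
true_war = {"SP1": 4.0, "SP2": 2.5, "RP1": 1.0, "RP2": 0.5}
measured_war = shift_value(true_war, ["SP1", "SP2"], ["RP1", "RP2"], 0.5)

team_true = sum(true_war.values())
team_measured = sum(measured_war.values())
player_errors = {name: measured_war[name] - true_war[name] for name in true_war}
```

A team-level correlation against wins only ever sees `team_measured`, so it cannot detect the non-zero entries in `player_errors`.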
|
|
|
|
|
|
#13 | |
|
Minors (Single A)
Join Date: Apr 2022
Posts: 52
|
Quote:
That is the consequence of the central limit theorem and the law of large numbers. It also only applies to random errors, not systematic errors. There is systematic canceling here that is likely responsible for why the r^2 remains so high at the team level. So no, it is not equivalent at all, and you really need to be careful when you make assumptions about which types of errors are contributing to the problem, since random vs non-random errors have different effects in aggregate. |
|
|
|
|
|
|
#14 | |
|
All Star Starter
Join Date: May 2022
Posts: 1,268
|
Quote:
I think it's fair to assume that the standard errors of individual player WARs are mostly independent. This suggests that summing their WARs would increase the total error by the square root of the number of players (N) added together, even as the total WAR increases by the full factor of N. So yeah, much like what you would see with the law of large numbers. Either way, you still get an averaging of the fluctuations in error for individual WARs. But speaking precisely, player WARs are not identically distributed (for lots of reasons), so the law of large numbers technically does not apply. But the broader concept still applies enough to use as a reference point. Last edited by uruguru; 01-29-2025 at 12:00 AM. |
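The square-root-of-N claim can be checked with a quick simulation. A rough Python sketch, assuming independent, zero-mean, normally distributed errors with the same spread `sigma` for every player (a simplification; the post itself notes real player WARs are not identically distributed):

```python
import math
import random


def expected_total_error_sd(n_players, sigma):
    """Standard deviation of the sum of n independent errors, each with spread sigma."""
    return sigma * math.sqrt(n_players)


def simulated_total_error_sd(n_players, sigma, trials=5000, seed=0):
    """Estimate the same quantity by summing simulated per-player errors."""
    rng = random.Random(seed)
    totals = [
        sum(rng.gauss(0.0, sigma) for _ in range(n_players))
        for _ in range(trials)
    ]
    mean = sum(totals) / trials
    variance = sum((t - mean) ** 2 for t in totals) / (trials - 1)
    return math.sqrt(variance)


print(expected_total_error_sd(25, 1.0))
print(simulated_total_error_sd(25, 1.0))
```

This only covers random error. A bias shared by every player would add up in proportion to N rather than its square root, which is the distinction drawn in the previous post.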
|
|
|
|
|
|
#15 |
|
OOTP Developer
Join Date: Jun 2009
Location: Here and there
Posts: 15,833
|
As a baseball fan, if anyone can come up with a better metric for modelling players, I'm all ears.
As for the in-game version, we try to follow actual rWAR or fWAR as closely as we can, although yes, we do make some simplifying assumptions. We do the dynamic runs per win, we do adjust for leverage, split starter/RP, etc. Some of them have started counting popups in the strikeout bucket for FIP; we don't necessarily track popups the same way in players' historical records. We do rope the park factors into a single factor number for some of the run calculations, and we also use the set park factors; we don't re-calculate park factors for each team based on actual results for the season. We also do assume that your league park factors average out. If you set everyone in your league to a 1.2 HR park factor, you'll probably see some weird adjusted numbers, because if one park is a 1.2 we simplify and assume the average of the other parks you play in will balance it out. There are other adjustments that impact things, especially if you look at sub-splits for WAR, yes, or players who have sample sizes outside of average. A lot of that is because, despite how much we love stats, we also don't store everything in history: the average user does not want the entire BBRef database stored on their machine for every saved game they play, or to wait half an hour for us to recalculate every WAR value on the fly because something got tweaked. |
|
|
|
|
|
#16 | |
|
All Star Reserve
Join Date: Oct 2015
Posts: 598
Infractions: 0/1 (1)
|
Quote:
Somewhat related question: is there a reason why in OOTP wRC+ is not park adjusted, whereas OPS+ is? |
|
|
|
|
|
|
#17 |
|
OOTP Developer
Join Date: Jun 2009
Location: Here and there
Posts: 15,833
|
|
|
|
|
|
|
#18 |
|
All Star Reserve
Join Date: Oct 2015
Posts: 598
Infractions: 0/1 (1)
|
First, I appreciate the quick reply. This must have just changed with v25, then (I bought it in March but never used it). Through v24, if you summed through a season and compared batters' individual wRC+ and OPS+, it was obviously the case that [wRC+ - OPS+] was highest for players in good hitting parks and lowest (i.e. most negative) for those in poor hitting parks. The wRC+ numbers simply reflected the raw batting line, while the OPS+ stats punished or compensated based on park context.
|
|
|
|
|
|
#19 | ||
|
Minors (Single A)
Join Date: Apr 2022
Posts: 52
|
Quote:
I don't think anyone is actually asking you to calculate park factors based on seasonal data. I am really just asking for the park factors to be applied correctly to the actual games. Applying them broadly cannot possibly work out to the correct numbers. Simple example from my Reddit post: I had a starting pitcher in Coors with 50 innings at home and 100 on the road (because he was shelled at home), and the adjustment is being applied as if he pitched half his innings at home. There is no reason to assume starting pitchers do this at all, especially in any hitter-friendly park in an era where pitchers aren't just pitching 9 innings. Quote:
I am really struggling to understand what data you think is needed that isn't currently being stored. If we are able to look up home/away splits and vs-team splits, there is clearly enough data being stored to apply the park adjustments to the actual parks the games occurred in. |
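The Coors example above, worked through in Python. The 1.30 park factor is a made-up illustrative value, road parks are assumed to average a neutral 1.0, and the "half home" formula is an assumption about what a blanket adjustment looks like, not OOTP's actual code.

```python
def innings_weighted_factor(innings_by_park):
    """Effective park factor from (innings, park_factor) pairs, weighted by innings pitched."""
    total_innings = sum(innings for innings, _ in innings_by_park)
    weighted = sum(innings * factor for innings, factor in innings_by_park)
    return weighted / total_innings


def half_home_factor(home_park_factor):
    """Blanket factor that assumes half of all innings were pitched at home."""
    return (home_park_factor + 1.0) / 2.0


COORS_FACTOR = 1.30  # hypothetical value, for illustration only

# 50 innings at home, 100 on the road (road parks assumed neutral).
actual = innings_weighted_factor([(50, COORS_FACTOR), (100, 1.0)])
assumed = half_home_factor(COORS_FACTOR)
print(actual, assumed)
```

Under these assumptions the blanket factor is larger than the innings-weighted one, so the pitcher would be credited with more park relief than the innings he actually threw at home justify.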
||
|
|
|
|
|
#20 | |
|
Minors (Single A)
Join Date: Apr 2022
Posts: 52
|
Quote:
Just for clarity for anyone else poking around here: I checked this on 25 and that is still the case. I am guessing this is a side effect of the run modifier itself, though I have no idea if it is similar to the normal issues of OPS+ vs wRC+ or if this is some specific issue with OOTP. |
|
|
|
|