#1
All Star Starter
Join Date: May 2006
Posts: 1,414
Neutralized play - fact or fiction
Using Garlon's explanation below, I would like to start a discussion/analysis of neutralized data within OOTP. I am seeing several things which appear to be anomalies. I think a discussion to get clarification would help and may lead to improvements for OOTP 12.
Quote:
OOTP files with the odb file removed. I also created leagues with neutralized stats using the Gambo/Spritze DB v3 and the Spritze HS DB v33. I ran these tests as part of the beta patch 4 testing.

I started looking at players for Baltimore and noticed some strange things. I have some guesses as to what they are, but would like the views of the experts as well. I am still looking at other teams and will look at other years as well, so yes, this is a small sample, but there were also many anomalies.

Let me start with what I believe to be data anomalies in the various databases. Jim Jackson was a rookie in 1901. According to Baseball-Reference.com he played 35 games in LF, 59 games in CF, and 2 games in RF. (There is no innings/fielding outs data on Baseball-Reference.) However, the data in the game shows real stats of 35g/280in in LF, 59g/472in in CF and 2g/16in in RF. These stats are the same for all four versions of the OOTP database and the Gambo/Spritze DB. But the Spritze HS data shows the following for real stats in 1901: 75g/607.2in in LF, 59g/501.1in in CF and 2g/17in in RF.

Interestingly, all versions list Jim Jackson as a LF when he clearly played more games and innings at CF. In fact, he is not even rated at CF in the leagues set up on real stats, either with or without the ODB. He is not rated at CF in the leagues based upon neutralized stats with the ODB, including Gambo/Spritze, but is rated at CF using the OOTP files without the ODB and in the Spritze HS DB with his ODB.

This gets even more intriguing when looking at the neutralized fielding tab. Using the league on real stats with the odb file, Jackson shows the following neutralized fielding data: LF 75g/607.2in, CF 59g/558.2in and NO RF data. This obviously looks very similar to the Spritze HS DB real data. The same result is seen in all the OOTP-based files.

Of course, if you create a league using neutralized stats or remove the ODB file, then the data is duplicated and in fact you get different rating values (but the position eligibility stays the same). However, the Gambo/Spritze DB shows the following in the neutralized stats: LF 75g/607.2in, CF 71g/603.2in and RF 2g/17in. The Spritze HS DB shows the same as the Gambo/Spritze.

So in addition to the issue with different data, we are also looking at three questions. Why does Jackson have more games/innings in LF than CF in the neutralized dataset versus his real life? Why does the game put him as only a LF when he is clearly a CF? And where is all this data coming from, and why isn't it the same?

Before you say it is only one player, I should point out that I found 3 other players on the Baltimore 1901 roster who also have differences before I stopped trying to look at every player on Baltimore, so either I am very lucky at finding anomalies or there are many more.

This leads to the second issue, which is even more complex, so I'll summarize just to keep this post short. Even though there is only 1 set of fielding values in the master file for ratings like Inf Arm and InfRange, there are different values for almost every player depending upon whether you start using real stats with or without the ODB or neutralized stats with or without the ODB. In fact, two separate runs using real stats with the ODB created two different sets of ratings. Obviously more analysis is needed, but if one is expecting to use neutralized stats, there are some anomalies which need to be understood.
__________________
Commish of the Home Nations Baseball Association
Commish of the Baseball Association League
Commish of the League of WAR
Commish of the On-Line Dynasty League
SIMBL2 - Westbury Cannons
Great Lakes Baseball - Toledo Neptunes
World Baseball - Guantanamo Marines
OMLB - Cincinnati Reds
#2
OOTP Historical Czar
Join Date: Dec 2001
Location: Bothell Wa
Posts: 7,253
The neutralized database process would first look to fill any necessary additional data based on Mr. Jackson's complete career stats. He played only 2 additional games in CF over the remaining 3 years of his career, and well over 200 games in LF.
If there is no additional career data available for a specific player, the process would use replacement player data based on Year/League/Position.
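The fallback order described in this post can be pictured with a small sketch. It is only an illustration, not OOTP's or the database authors' actual code: the function name `fill_missing`, the two lookup tables, and the placeholder values in them are all assumptions made up for the example.

```python
# Hypothetical lookup tables. The keys and placeholder values are invented
# for illustration and do not come from any real database.
career_stats = {
    ("Jim Jackson", "LF"): {"note": "placeholder career-based data"},
}
replacement_stats = {
    (1901, "AL", "CF"): {"note": "placeholder replacement-level data"},
}


def fill_missing(player, year, league, position):
    """Return (source, data), preferring the player's own career data.

    Falls back to replacement player data keyed on Year/League/Position
    when no career data exists for that player at that position.
    """
    career = career_stats.get((player, position))
    if career is not None:
        return "career", career
    return "replacement", replacement_stats.get((year, league, position))


print(fill_missing("Jim Jackson", 1901, "AL", "LF")[0])  # career
print(fill_missing("Someone Else", 1901, "AL", "CF")[0])  # replacement
```

The point of the sketch is just the ordering: career data is consulted first, and the Year/League/Position replacement data is used only when that lookup comes back empty.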
#3
Hall Of Famer
Join Date: Jun 2004
Posts: 4,256
In my original post I mentioned that we used a method to determine Estimated Defensive Innings for all players where real-life data did not exist. The method we used is based on Bill James' Win Shares, pp. 155-160. We took the extra step of creating discrete estimated defensive inning outs for each of the 3 OF positions for every player.
The estimated data is very solid and it is the best that can be done. When we were creating the DB for the game, we even compared how close results from the estimated method were to actual known data for some seasons. The results were very good for a very large portion of the players in the DB. The method was less accurate for players who had only a handful of games played in a season.
#4
All Star Starter
Join Date: May 2006
Posts: 1,414
Thanks for the quick response. Just to help clarify, I'll break this into a couple of areas.

1) I sent Spritze a PM about the data in his HS DB, because it is probably more an issue of my not understanding the premise, and it is better to just discuss it with him. (He has responded, so that will get sorted.)
2) The neutralized data itself. There may or may not be anomalies in that data, and there are some differences across all the databases. The assumption is that understanding the data will help in understanding how it is used.
3) How is the statistical data utilized? This should be the same regardless of whether it is real or neutralized data; just the underlying data is different. This may not be a valid assumption.

Part of my reason for looking for a discussion is that I would like to develop a historical player guide, which would discuss how to do various types of historical leagues. There seem to be a lot of questions on how to do certain things, and an overall guide would appear to be helpful. The other part is to improve the game for 12 and beyond. That is why I re-did the master file for the bio data.

I also think there is guidance which we could establish as requirements that could be passed on to the OOTP developers. One such requirement, in my mind, is that a player should be rated for any position he played in a particular season/career. Don't over-react here, as I do believe there may be some caveats, but as a working premise it is a good starting point. I use this because, playing in an on-line historical league, there are usually rules about positions, and it is tough to argue that player X can't play position Y in the league because he has no rating for that position, yet he played that position in real life, even in that particular year.

So in looking at the responses:
Quote:
The key question, and I believe the improvement needed in the OOTP engine, is not whether his primary position should be LF or CF, but whether there should be ratings for a player at the positions he played in real life. I'll break this into simming with real-life stats and with neutralized stats.

Real-life stats
Jackson doesn't play in 1903, so there is no 1903 season data (i.e. no third year of real-life stats for the sim to utilize). Question: what does it do? Forgetting that for a moment, the 3 years of stats on which to base the ratings are LF 66 games/528 innings; CF 60 games/480 innings; RF 4 games/32 innings. Just using a blink test, it is hard to understand why Jackson doesn't get rated at CF.

Neutralized stats
Here there are 3 years' worth of data, and the neutralization process has filled in the gap. The interesting note is that there is no RF neutralized data, but again that is a later discussion. LF 186 games/1512 innings; CF 99 games/938 innings. While the disparity is greater (see the discussion on the next point), 99 games should be enough to create positional ratings.
Quote:
The other interesting note on the neutralized data is that while it left the games played at CF the same at 59, it increased the innings from 472 to 558, which means he has to average 9.5 innings per game played in CF.
Quote:
Recognizing that this is pitching, not outfield, Bobby Wallace and Jake Beckley both pitched 1 game in 1902 and none in 1901 or 1903. Yet in the 1901 start-up, both appear in the neutralized stats and both are rated at pitcher. Yet if a position player does not reach a minimum (it looks like 5 or 6, but more analysis is needed), he does not get neutralized data generated and of course is not rated at those positions. Again, these are just questions for understanding, looking to improve upon the historical capabilities.
Last edited by Bristolduke; 10-11-2010 at 01:54 PM.
#5
Hall Of Famer
Join Date: Jun 2004
Posts: 4,256
Regarding Jim Jackson 1901: the Innings Played in the "Real Stats" tab are simply calculated as Games Played times 8, i.e. 8 innings per game. That's what OOTP generates as a default, since there is no data for so many seasons.
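As a quick check, the games-times-8 default reproduces the "real stats" innings quoted for Jackson earlier in the thread. This is a minimal sketch; the function and variable names are made up for the example and are not OOTP internals.

```python
DEFAULT_INNINGS_PER_GAME = 8  # default described in this post


def default_innings(games: int) -> int:
    """Innings shown in the Real Stats tab when no real innings data exists."""
    return games * DEFAULT_INNINGS_PER_GAME


# Games by position for Jim Jackson, 1901, as quoted in post #1.
jackson_1901_games = {"LF": 35, "CF": 59, "RF": 2}
jackson_1901_innings = {
    pos: default_innings(games) for pos, games in jackson_1901_games.items()
}
print(jackson_1901_innings)  # {'LF': 280, 'CF': 472, 'RF': 16}
```

These match the 35g/280in, 59g/472in and 2g/16in figures from the first post.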
For the neutralized stats we used Bill James' method, which analyzes a team's IPouts and A + PO + E at each position. With that neutralization process, a team's total innings played at each position will match the total innings pitched by its pitchers. For Jim Jackson, his 59 games played in CF remain the same; however, his innings played are estimated to be 558.2. I don't see why his games played in LF have been boosted and his RF games have disappeared. Spritze, why is this?

Although Jim Jackson's defensive innings seem high, they are still possible. With the limited roster sizes in the early 1900's, it is possible that he played every inning of every game, even extra-inning games. Still, 28+ outs per game seems high considering his team averaged 26.outs per game. However, that's what the data yields. There is a margin of error for outs per game when using the estimated defensive innings method. This may be a situation in which there is greater error in the estimate.

The estimates get a bit muddy when a position is split between many players. Remember that in Lahman we only have a lump value for "OF" PO/A/E, and then we are given the distribution of games played at each of the 3 OF positions. In such situations we had to use a standard distribution of plays made across each of the 3 OF positions. What we have found from the real-life discrete data that goes back about 50 years is that about 30% of the outfield plays go to LF, 40% go to CF, and 30% go to RF. So when we reverse engineer this, the same assumption is used for calculating the defensive innings of teams. For the 1901 Orioles, if their real-life distribution of OF plays was significantly different from 30/40/30, it could cause some additional error in the process.
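The two pieces of arithmetic in this post can be written out in a short sketch. It assumes that 558.2 uses the usual baseball notation (558 innings plus 2 outs), and the total of 1000 outfield plays is a made-up figure used only to show the 30/40/30 split. This is not the authors' actual neutralization code.

```python
def innings_to_outs(innings_notation: str) -> int:
    """Convert baseball notation such as '558.2' (558 innings + 2 outs) to outs."""
    whole, _, partial = innings_notation.partition(".")
    return int(whole) * 3 + int(partial or 0)


def split_outfield_plays(total_plays: float, shares=(0.30, 0.40, 0.30)) -> dict:
    """Distribute a lump 'OF' play total across LF/CF/RF using fixed shares."""
    lf, cf, rf = shares
    return {"LF": total_plays * lf, "CF": total_plays * cf, "RF": total_plays * rf}


# Jackson's neutralized CF line: 59 games, 558.2 innings.
cf_outs = innings_to_outs("558.2")   # 1676 outs
outs_per_game = cf_outs / 59          # roughly 28.4 outs per game
innings_per_game = outs_per_game / 3  # roughly 9.5 innings per game
print(cf_outs, round(outs_per_game, 1), round(innings_per_game, 1))

# Hypothetical lump total of 1000 OF plays, split 30/40/30.
print(split_outfield_plays(1000))
```

The first calculation gives the "28+ outs per game" figure and the 9.5 innings per game mentioned in post #4. The second shows where extra error can enter: if a team's true split was far from 30/40/30, the per-position estimates shift with it.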
I can tell you that we were extremely meticulous in doing the defensive neutralization process, and I believe we have by far the best neutralized PO/A/DP and defensive inning data that can be generated from the limited real-life data available. So if you look hard you can find some anomalies, but if you consider everything as a whole you will see that the neutralization process is actually quite accurate and realistic. And yes, for players with very few AB or IP or Defensive Innings, we did pro-rate them based on their career rates so that OOTP would generate proper ratings from these stats. This was done to avoid sample size issues when generating ratings.
#6
Hall Of Famer
Join Date: Jun 2004
Posts: 4,256
Consider the 1906 Cubs.
Their real defensive stats give them 39.5 PO+A per 27 outs. Their neutralized defensive stats yield 48.7 PO+A per 27 outs. Their real defensive stats look pretty average or even slightly below league average. The neutralization process boosts them by 23%. This is realistic since the 1906 Cubs won 116 games and had more defensive win shares than any team in history.
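The 23% figure follows directly from the two rates quoted above. A minimal check, with variable names chosen just for the example:

```python
real_rate = 39.5         # real PO+A per 27 outs, 1906 Cubs
neutralized_rate = 48.7  # neutralized PO+A per 27 outs

boost = neutralized_rate / real_rate - 1
print(f"{boost:.1%}")  # 23.3%
```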
#7
All Star Starter
Join Date: May 2006
Posts: 1,414
Thanks again for the quick response.
My software background says to check the integrity of the data first, then the software logic which utilizes it, to avoid chasing your tail. I edited my post because I type very poorly and so much slower than I think. The point I was trying to make was:
Quote:
Quote:
I'll do some more analysis to see what is there. I can relate to the problem of checking 17,000+ players. I am assuming you use a T-Test or another statistical method for accuracy, but that is still not foolproof at the player/season level.
#8
All Star Starter
Join Date: May 2006
Posts: 1,414
Quote:
If you say "true", then would you be surprised that there are over 27,000 occurrences where it is false? If it isn't true, then what are the criteria for a player not having a neutralized equivalent to his real stats at a position?
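For anyone who wants to repeat this kind of count, here is one way it could be sketched. It assumes the check is "a real-stats row exists for a player/year/position but no neutralized row does". The row layout and the function name are hypothetical and not the actual database schema. The sample rows use the Jim Jackson 1901 case from post #1, where the OOTP-based files show neutralized LF and CF data but no RF data.

```python
def real_without_neutralized(real_rows, neutral_rows):
    """Return real-stats rows that have no neutralized row for the same key."""
    neutral_keys = {(r["player"], r["year"], r["pos"]) for r in neutral_rows}
    return [
        r for r in real_rows
        if (r["player"], r["year"], r["pos"]) not in neutral_keys
    ]


# Jim Jackson, 1901: real fielding rows at three positions.
real = [
    {"player": "Jim Jackson", "year": 1901, "pos": "LF"},
    {"player": "Jim Jackson", "year": 1901, "pos": "CF"},
    {"player": "Jim Jackson", "year": 1901, "pos": "RF"},
]
# Neutralized rows in the OOTP-based files: LF and CF only, no RF.
neutral = [
    {"player": "Jim Jackson", "year": 1901, "pos": "LF"},
    {"player": "Jim Jackson", "year": 1901, "pos": "CF"},
]

gaps = real_without_neutralized(real, neutral)
print([(g["player"], g["year"], g["pos"]) for g in gaps])
# [('Jim Jackson', 1901, 'RF')]
```

Run over a full export of both tables, the length of the returned list would be the number of occurrences in question.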