Can you provide more information on the type of historical simulation you are running?
Your comment that Arenado was moved to first base makes me think you are allowing the development engine to take over and not necessarily using recalc.
A few other thoughts:
1.) It is well documented that defensive stats are difficult to translate accurately into ratings. Most of the best defensive metrics out there do not rely on traditional stats and do not exist for most of baseball's history.
2.) With Arenado in particular, I wonder if Coors Field causes problems when translating stats into defensive ratings, since that park itself produces a much higher BABIP (batting average on balls in play).
3.) You may want to search for Garlon's posts on historical fielding ratings. I believe he was involved in a major rework a version or two back that, from all the accounts I have seen, improved things quite a bit (although I am sure it still isn't perfect).