SMJ has done more to look into this than anyone whose work I have read. The purpose of the 4-25s (four equal 25% weights) is to get the AI to make the best decisions, and in his experience this weighting works best. Like all things OOTP, that is subjective, and the best I can add is that it works for me. That could be down to my very slow style, since I play out every inning of every game.
To me, leveling out does not mean turning off. IMHO, what it does is keep the AI from overreacting to any one category and making poor decisions as a result. An average player could hit into bad luck this year, post a career-low BABIP, and be sent down or cut. With level evaluation that hopefully will not happen, because the other two years of stats plus the ratings can save the AI from itself. Reverse it: he posts a career-high BABIP in his free-agent year, and instead of being cut he gets a long-term deal from the AI. Again, the other two years of stats and the ratings put all of this in context for the AI.
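To make the "leveling out" idea concrete, here is a toy model in Python. It is NOT OOTP's actual formula; it simply assumes each of the four components (ratings, this season, last season, two seasons ago) has been converted to a common 0-100 value scale and averaged with equal 25% weights. The names WEIGHTS and blended_value are made up for illustration.

```python
# Toy model only: not OOTP's real evaluation formula.
# Assumes every component is already on a common 0-100 "value" scale.

WEIGHTS = {
    "ratings": 0.25,
    "this_season": 0.25,
    "last_season": 0.25,
    "two_seasons_ago": 0.25,
}


def blended_value(scores: dict[str, float]) -> float:
    """Weighted average of the four evaluation components."""
    return sum(WEIGHTS[k] * scores[k] for k in WEIGHTS)


# An average player (50 everywhere) who has an unlucky, low-BABIP year...
unlucky = {"ratings": 50, "this_season": 30, "last_season": 50, "two_seasons_ago": 50}
# ...and the same player with a lucky, high-BABIP free-agent year.
lucky = {"ratings": 50, "this_season": 70, "last_season": 50, "two_seasons_ago": 50}

print(blended_value(unlucky))  # 45.0, versus 30 if judged on this season alone
print(blended_value(lucky))    # 55.0, versus 70 if judged on this season alone
```

In this sketch a 20-point swing in one season moves the overall value by only 5 points either way, which is the kind of dampening described above.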
It also helps the AI in the other direction. When the AI did not take stats into account, it could only react to ratings. You could have a reigning MVP take a ratings hit halfway through the following season and be released even though he was still putting up good numbers. Some would argue that is a good thing, as the AI needs all the help it can get. Others would argue that IRL, as long as a player was putting up numbers, he would never be benched or released. Imagine an aging Reggie Jackson with 20 HRs at the All-Star break being released the following week because his ratings went down. I think this is the type of thing these settings are trying to avoid.
Also consider that the AI prorates some of these evaluation numbers while the sample size is still small, i.e. a hot first two weeks of the season does not get the "benefit" of the full 25% weight.
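Here is one way proration could work, again as a toy Python model rather than OOTP's real logic. The 100-game threshold (FULL_SAMPLE_GAMES), the linear scaling, and renormalizing the leftover weight onto the other components are all assumptions for illustration; the game's actual proration rules may differ.

```python
# Toy model only: the 100-game threshold, linear scaling and renormalization
# are assumptions, not OOTP's documented behavior.

FULL_SAMPLE_GAMES = 100  # hypothetical games needed for the full 25% weight


def season_weights(games_played: int, base: float = 0.25) -> dict[str, float]:
    """Shrink the current-season weight for small samples, then renormalize."""
    share = min(1.0, games_played / FULL_SAMPLE_GAMES)
    raw = {
        "ratings": base,
        "this_season": base * share,
        "last_season": base,
        "two_seasons_ago": base,
    }
    total = sum(raw.values())
    # Rescale so the weights still sum to 1.
    return {k: w / total for k, w in raw.items()}


def blended_value(scores: dict[str, float], games_played: int) -> float:
    """Weighted average using the sample-size-adjusted weights."""
    weights = season_weights(games_played)
    return sum(weights[k] * scores[k] for k in weights)


# An average player (baseline 50) on a red-hot pace (90) this season.
hot_start = {"ratings": 50, "this_season": 90, "last_season": 50, "two_seasons_ago": 50}

print(round(blended_value(hot_start, games_played=12), 1))   # 51.5
print(round(blended_value(hot_start, games_played=150), 1))  # 60.0
```

After two weeks the hot start nudges the player only slightly above his baseline; sustained over a full season, the same pace counts at the full weight.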
To me, I don't see why one would want to go more than two years back. If the AI is evaluating an aging Harmon Killebrew in 1973, I would argue it should not care how he hit in 1963, or in 1970 for that matter. The idea, I think, is to get the AI to look at the player as he is now and has been recently, so it can make good decisions on extensions, trades, non-tenders, cuts, etc.
I'm not a statistician, but IIRC, when these weights were added to the game and discussed in the forums, three seasons was deemed a good sample size for seeing where a player has been and where he is going.
Finally, consider that each AI GM/scout has their own interpretation of how to evaluate players. That then gets mixed into the 25/25/25/25 to vary things even more. At least that is SMJ's theory as I understand it. He can correct me if I'm not understanding him.
IMHO, SMJ's 4-25s work well in helping the AI handle its rosters. This gives me a more realistic world in terms of the transactions I see.
You might try it and disagree, and that is fine. There is no right answer.
