View Single Post
Old 01-18-2024, 07:20 PM   #16
Syd Thrift
Hall Of Famer
 
Syd Thrift's Avatar
 
Join Date: May 2004
Posts: 10,616
I've changed up the way I do the "reports" part of my dynasty this year and I wanted to share the DB tools I've been using to get that data.

First up, I'm getting the list of the teams in Major League Baseball along with some extra data using this:

Code:
select team_id
		,name
        ,nickname
        ,background_color_id
        ,text_color_id
	from teams
	where level = 1
    order by name
This brings in the team_id, which I'm using elsewhere, plus the name/nickname, and then, because I use this to color the names of stuff in the reports, the background color ID and, theoretically, the text color ID. Pretty straightforward!

The next thing I have, which I re-run for every team, is this roster "getter":

Code:
set @year = 1972, @team_id = 13;

select distinct tm.team_id
        ,tr.player_id
        ,concat(plr.first_name, ' ', plr.last_name) as name
        ,plr.position
        ,bat.pa
        ,pit.ip
from team_roster tr
	left join teams tm on tr.team_id = tm.team_id
    left join players plr on plr.player_id = tr.player_id
    left join players_career_batting_stats bat on bat.player_id = plr.player_id 
		and bat.year = @year 
        and bat.split_id = 1 
        and bat.team_id = tm.team_id
    left join players_career_pitching_stats pit on pit.player_id = plr.player_id 
		and pit.year = @year 
        and pit.split_id = 1
        and pit.team_id = tm.team_id
where tm.team_id = @team_id
order by position, ip desc, pa desc
limit 250
I realize as I'm reading through this that the left join on teams is entirely superfluous but I've got it there just in case. The team_roster table is what is called a mapping table that describes a many-to-many relationship in the data. In this case, there are many team rosters, and each team roster has many players. This is pretty standard; there are different ways of handling one-to-one and one-to-many data relationships as well. The big thing they teach you about in database school is that you want to try to duplicate data as little as possible.

Anyway, because I also wanted a quick and dirty way of seeing IPs and PAs to help determine who I should show data for, I have left joins to the pitcher and batter data as well. Note there that the left join starts with team_roster: there is only one player per team in that table so I can be OK with understanding that there is at most one line in the corresponding pitcher/batter record so long as I specify the player, the team, the year, and the split_id (1 is "regular season").

From there I started with this to show the bio data:

Code:
SET @playerID = 13749, @teamID = 5;

select concat('[color="', tm.background_color_id, '"]', plr.first_name, ' ', plr.last_name, '[/color]') as line1
		,concat('POS No. ', plr.uniform_number) as line1pt5
		,concat((
					case
						when plr.bats = 1 then 'R'
                        when plr.bats = 2 then 'L'
                        when plr.bats = 3 then 'S'
                    end), (
                    case
						when plr.throws = 1 then 'R'
                        when plr.throws = 2 then 'L'
                    end), ', '
						,truncate((plr.height / (2.54 * 12)), 0)
                        ,''''
                        ,round(((plr.height / 2.54) % 12), 0)
                        ,'" ',
                        plr.weight, ' lbs.') as line2
		,concat('Born ', plr.date_of_birth) as line3
	from players plr
		left join teams tm on tm.team_id = @teamID
    where plr.player_id = @playerID
I've since migrated this over to the pitcher/hitter scripts I'm using to save how many times I need to copy/paste the player ID but this is the gist.

This is doing a fair bit more work so I'll try and explain it... You'll note that each entry is just a concatenated string. I set this up so that I can most easily just hit enter a couple times in the right spots. The format here is cribbed from the 80s era Scouting Report books that were produced by Harmon Killebrew and Brooks Robinson among others. The one line item I'm missing is the birthplace and that is because, if I'm being honest, adding that was going to be too much of a PITA . Looking at it, I'll probably add it for the future though.

The top line also inserts a chunk of BBCode that takes the background color for the teams - usually that's a dark color - and sets that as the text color.

Setting batting and throwing should be pretty straightforward.

The next bit is height, which is a pain because the game lists it in centimeters, Markus being German and all. First I get the feet, which is basically the height in inches (so height / 2.54), divided by 12, and then we only need the integer so lop off everything else. Then I get inches, which starts with the height in inches, and then gets the *modulus* of 12 - that means, the remainder... so a guy who's listed at 5'7" in this would be 67 inches (some other number of cm but whatever), which in turn translates into 5 feet and 7 "modulus feet". This calculation occasionally churns out players who are 5'12" because it doesn't know how to round them off. I would write an if statement to handle that but to be perfectly honest I've run into it twice so far and so I just change it manually.

POS is just a placeholder for the position I'm adding. I could automate that but I don't really have a cut-and-dried algorithm for it so I just take the extra second to type in, like, "CF" or "1B/OF".

Now onto the fun stuff:

Code:
set @playerID = 5132, @teamID = 13;

select concat('[color="', tm.background_color_id, '"]', plr.first_name, ' ', plr.last_name, '[/color]') as line1
		,concat('POS No. ', plr.uniform_number) as line1pt5
		,concat((
					case
						when plr.bats = 1 then 'R'
                        when plr.bats = 2 then 'L'
                        when plr.bats = 3 then 'S'
                    end), (
                    case
						when plr.throws = 1 then 'R'
                        when plr.throws = 2 then 'L'
                    end), ', '
						,truncate((plr.height / (2.54 * 12)), 0)
                        ,''''
                        ,round(((plr.height / 2.54) % 12), 0)
                        ,'" ',
                        plr.weight, ' lbs.') as line2
		,concat('Born ', plr.date_of_birth) as line3
	from players plr
		left join teams tm on tm.team_id = @teamID
    where plr.player_id = @playerID;

select concat(pit.year, ' ', tm.abbr, ' ',(CASE
			when tm.level = 2 THEN 'AAA'
			when tm.level = 3 THEN 'AA'
            when tm.level = 4 THEN 'A'
            when tm.level = 5 THEN 'S A'
            when tm.level = 6 THEN 'R'
            else 'MLB'
		END)) as yrtmlvl
		,pit.w
        ,pit.l
	     ,pit.s as sv
        ,truncate((pit.er / (pit.outs / 3) * 9), 2) as era
        ,pit.g
        ,pit.gs
        ,pit.cg
        ,concat(pit.ip, '.', pit.ipf) as ip
        ,pit.ha as h
        ,pit.r
        ,pit.er
        ,pit.bb
        ,pit.k
FROM players_career_pitching_stats pit
	LEFT JOIN teams tm on tm.team_id = pit.team_id
WHERE pit.player_id = @playerID
	and pit.split_id = 1
    and pit.year in (1970, 1971, 1972)
ORDER BY pit.year asc
This is basically asking the game to return a short little set of statistics for the seasons 1970-72 for the player in question, split ID 1, and sorted by the year from 1970 to 1972. So far it looks like when a player plays at the same level for two teams in the same year (like when a guy gets traded) it shows them in reverse order somehow. Those have been rare enough that I just copy/paste the lines. There's no separate pitcher_career_id - the table is differentiated by several keys instead of just one primary - so it's not so easy as to just order by that. Anyway, it's relatively rare and it takes all of 2 seconds to fix.

I end up with something like this:

Code:
Yr   Tm  Lvl   W   L  Sv   ERA   G  GS  CG     IP    H   R  ER   BB   SO
1970 PHI MLB  13  13   0  3.44  34  34   7  230.0  225 106  88   73  182
1971 PHI MLB  17  14   0  3.38  39  39  11  287.0  271 121 108   85  223
1972 STL MLB  13  14   0  2.69  34  34  10  271.0  221  88  81   84  231
What I'm doing here is copying the results into VSCode, doing a CTRL-F on the tab character and changing it into 2 spaces, and then going in and manually editing down the individual rows so they all line up. This is a little bit of busy work but not a huge amount.

I should note here too that while running two SELECT statements in MS-SQL via SSMS outputs two separate windows that you can select from, MySQL just flashes the first and then shows the second, so what I need to do here is select the first part of the query, run it, and then deselect and run the whole thing.

Hitting is similar:

Code:
SET @playerID = 16201, @teamID = 13;

select concat('[color="', tm.background_color_id, '"]', plr.first_name, ' ', plr.last_name, '[/color]') as line1
		,concat('POS No. ', plr.uniform_number) as line1pt5
		,concat((
					case
						when plr.bats = 1 then 'R'
                        when plr.bats = 2 then 'L'
                        when plr.bats = 3 then 'S'
                    end), (
                    case
						when plr.throws = 1 then 'R'
                        when plr.throws = 2 then 'L'
                    end), ', '
						,truncate((plr.height / (2.54 * 12)), 0)
                        ,''''
                        ,round(((plr.height / 2.54) % 12), 0)
                        ,'" ',
                        plr.weight, ' lbs.') as line2
		,concat('Born ', plr.date_of_birth) as line3
	from players plr
		left join teams tm on tm.team_id = @teamID
    where plr.player_id = @playerID;

select concat(bat.year, ' ', tm.abbr, ' ',(CASE
			when tm.level = 2 THEN 'AAA'
			when tm.level = 3 THEN 'AA'
            when tm.level = 4 THEN 'A'
            when tm.level = 5 THEN 'S A'
            when tm.level = 6 THEN 'R'
            else 'MLB'
		END)) as yrtmlvl
		,round(bat.h / bat.ab, 3) as avg
        ,bat.g
        ,bat.ab
        ,bat.r
        ,bat.h
        ,bat.d as 2b
        ,bat.t as 3b
        ,bat.hr
        ,bat.rbi
        ,bat.bb
        ,bat.k as so
        ,bat.sb
	from  `modern-ish-baseball`.players_career_batting_stats bat
    left join  `modern-ish-baseball`.players plr on plr.player_id = bat.player_id
    left join  `modern-ish-baseball`.teams tm on bat.team_id = tm.team_id
		where split_id = 1
            and plr.player_id = @playerID
            and bat.year in (1970, 1971, 1972)
	order by year asc, level desc
...and it leaves you with something like this:

Code:
Yr   Tm  Lvl    Avg    G   AB    R    H  2B  3B  HR  RBI   BB   SO  SB
1970 LEW S A  0.000    4   15    0    0   0   0   0    0    0    3   0
1970 MOD A    0.280    9   25    4    7   1   0   3    6    8    7   0
1970 CR  A    0.223   26   94   11   21   5   0   1    9   12   21   0
1970 STP A    0.242   37  128    7   31   5   0   1   15   12   31   1
1971 CR  A    0.204   94  314   30   64   8   1   4   30   34   71   0
1971 ARK AA   0.241   17   58    3   14   4   0   0    8    5   13   0
1972 TUL AAA  0.297   47  155   19   46   6   1   4   16   19   27   0
1972 STL MLB  0.213   41  141    7   30   6   0   1   12   12   25   0
If I really wanted to take the time to drop the leading 0 off of batting average, I could convert it into a varchar and then lop the first character off of it. Occasionally that would mean that guys who hit 1.000 would show up as .000 but how often does that happen?
__________________
Quote:
Originally Posted by Markus Heinsohn
You bastard....
The Great American Baseball Thrift Book - Like reading the Sporting News from back in the day, only with fake players. REAL LIFE DRAMA THOUGH maybe not
Syd Thrift is offline   Reply With Quote