View Single Post
Old 12-14-2021, 06:41 PM   #10
Syd Thrift
Hall Of Famer
 
Syd Thrift's Avatar
 
Join Date: May 2004
Posts: 10,668
All right, so with all that out of the way I wanted to try to get a little more complicated. First up, wouldn't it be cool if we could add a little asterisk next to all of the seasons where Henry Riggs made the All-Star Game? Okay though, how do we do this? It turns out, there is a league_history_all_star table that looks like it is pretty much exactly what you'd expect it to be. This time I'm going to list the SQL first before the results since I think reading through the SQL will make them make more sense:

Code:
SELECT * FROM `modern-ish-baseball`.league_history_all_star asg
	where asg.all_star = 7620;
7620, as you'll remember from before, is Henry Riggs' player_id. In this particular table, "all_star" is actually the player_id. Yell at Markus, not at me!

Code:
league	sub	year	aspos	all_star
100	1	1956	33	7620
100	1	1957	34	7620
100	1	1958	30	7620
100	1	1959	27	7620
100	1	1960	26	7620
100	1	1961	25	7620
100	1	1962	25	7620
100	1	1963	18	7620
100	1	1964	10	7620
100	1	1965	13	7620
100	1	1966	9	7620
100	1	1967	6	7620
100	1	1968	3	7620
These aren't the actual column names; I truncated a couple to make the layout make sense. Anyway, it looks like we do need to add a "league_id = 100" to our where clauses and I don't know what to do with that "all_star_position" field, but anyway, I do know that all we really need is the year.

Code:
SELECT 
	asg.year 
FROM `modern-ish-baseball`.league_history_all_star asg
where 
	asg.all_star = 7620
	and asg.league_id = 100;
Code:
year
1956
1957
1958
1959
1960
1961
1962
1963
1964
1965
1966
1967
1968
As you can see, this is a pretty decent set. Riggs, who will certainly go into the Hall of Fame when his career is over, made the ASG every season from 1956 through 1968. He missed out this year because he was injured, though, and he also missed his first two seasons in the majors. So this works to test out because there will be several years where he'll need the asterisk but a couple where he won't. If he had nothing but ASGs in his career, you could accidentally write code that just always added an asterisk no matter what and... well, that would be pretty useless, wouldn't it?

In any case, this is the test subject I'm going to use. In real life programming you'd have several of these, you'd probably automate them (writing code that would evaluate the data you got back to confirm you're seeing what you're supposed to see), and then you'd have a QA department doing its own testing. But hey, for personal use, this should be fine...

All right, so this won't be the end result but just to test this out, let's tell our career batting report to only bring back years in which Riggs made the All Star Game... actually, no, first I'm going to get even simpler.

I feel like it's best when you're doing relatively complicated things like this to break things off into the smallest pieces you can, test the tiny increment you did, go ahead the next little increment test it to make sure that *that* works, and so on and so forth. This may sound kind of slow but trust me, this is way, way better than writing a whole bunch of code you think works only to find out that you made some very basic error in step 3 out of 50.

With SQL it's even worse because you can't really step through your code to figure this out, but I do the equivalent of this in "normal" code all the time. If anything it winds up speeding things up, in part because if you really get to thinking about breaking things down into the tiniest components you can, you free your brain to only having to have one small thing to think about instead of a much larger picture.

All of that is to say, let's bring back only... the year 1968. Let's just do one year for now; multiple years are a little tricky to hardcode, and we'll be able to bootstrap the final deal off of this anyway. I won't copy and paste the whole thing because y'all don't need to read all that but here's the relevant portion (there "where"):

Code:
	where player_id = @player_id
		and tm.league_id = 100
		and split_id = 1
		and bat.year in ( select  1968)
Code:
year	team	g	ab	r	h	2B	3B	hr	rbi	bb	ibb	k	hpb	sh	sf	sb	cs	gdp	ba	obp	slg
1968	ATL	157	563	86	149	22	4	25	77	74	6	56	3	0	6	1	0	15	0.265	0.350	0.451
Totals		2075	7543	1305	2425	353	22	437	1270	1043	91	503	15	0	75	9	11	191	0.321	0.401	0.548
The important part there is that bit where I'm saying "bat.year in (more code)". That's basically saying, "if you can find the year in this set, include it in the report". In this case we only have one hardcoded year. It's not very useful! But... what if we took that code we were just working on - that block inside of the parentheses just wants a select statement - and asked the report to use that instead?

Code:
	where player_id = @player_id
		and tm.league_id = 100
		and split_id = 1
		and bat.year in ( SELECT 
							asg.year 
							FROM `modern-ish-baseball`.league_history_all_star asg
							where 
								asg.all_star = @player_id
								and asg.league_id = 100
						)
Code:
year	team	g	ab	r	h	2B	3B	hr	rbi	bb	ibb	k	hpb	sh	sf	sb	cs	gdp	ba	obp	slg
1956	ATL	125	454	70	142	21	4	20	64	58	2	32	0	0	4	1	0	13	0.313	0.388	0.509
1957	ATL	152	552	99	182	38	1	23	79	65	4	30	0	0	7	0	0	14	0.330	0.396	0.527
1958	ATL	151	570	98	192	29	2	27	94	77	3	42	0	0	5	0	0	10	0.337	0.413	0.537
1959	ATL	153	577	96	188	28	1	37	96	72	6	47	2	0	4	0	0	11	0.326	0.399	0.570
1960	ATL	140	499	79	173	25	2	24	72	74	12	20	0	0	3	1	1	9	0.347	0.429	0.549
1961	ATL	142	527	94	175	33	1	28	88	76	7	23	1	0	6	2	2	20	0.332	0.413	0.558
1962	ATL	148	540	103	188	26	3	33	92	76	7	21	1	0	4	1	0	12	0.348	0.427	0.591
1963	ATL	113	423	87	144	20	0	37	104	57	2	15	1	0	6	0	2	15	0.340	0.415	0.650
1964	ATL	85	314	63	108	17	1	16	41	39	2	20	1	0	1	0	1	12	0.344	0.417	0.557
1965	ATL	159	558	108	177	17	1	45	116	95	9	29	1	0	5	1	1	16	0.317	0.414	0.593
1966	ATL	157	573	111	191	21	0	44	119	88	9	38	0	0	4	1	4	14	0.333	0.420	0.600
1967	ATL	127	465	79	136	17	0	29	72	69	12	40	1	0	5	0	0	12	0.292	0.381	0.516
1968	ATL	157	563	86	149	22	4	25	77	74	6	56	3	0	6	1	0	15	0.265	0.350	0.451
Totals		2075	7543	1305	2425	353	22	437	1270	1043	91	503	15	0	75	9	11	191	0.321	0.401	0.548
I hope you can see what I mean by bite-sized chunks now! Just that where clause is now saying, "where the player is Henry Riggs, and where it's the Major Leagues, and for total splits only (no lefty/righty stuff... and... from years where another table says Henry Riggs was in the All Star Game". That's kind of complicated just to spit out, and the way you have to do it with SQL is with that nested SELECT statement, but as you can see, if you just do it one little step at a time, it's really not complicated at all.

Anyway though, we don't want to bring back *only* years where Riggs made the ASG, we want to add a star to the end of the year where that's the case. So, let's go ahead and yoink that nested SELECT from the WHERE clause and go up to at-bats. Adding an asterisk is pretty simple and we've basically done something similar with innings pitched:

Code:
	concat(bat.year, '*') as year,
I'm not going to return show the entire data set; suffice it to say, every year has an asterisk at the end now. Also not useful but it's a start.

So... in order to make the asterisk appear on some items and not others, we've got to use an IF() statement. I looked this up on w3schools - I work with MS-SQL, not mySQL, and sometimes the syntax is a little different - and this is the format you have to use:

SELECT IF(500<1000, "YES", "NO");

So... anything inside of parentheses, we can nest. Again, do NOT attempt to wrap your brain around the entire thing! Anyway, let's add an if statement that's always true into that concat() statement above to make sure everything works right...

Code:
	concat(bat.year, 
		if(true, "*", "")) as year,
Yep, as expected, it does the same thing as the last bit of code does.

NOW FOR THE BIG MONSTER REVEAL... let's take that statement we added to the where clause - all by itself, it essentially returns a "true" or "false" - and see what happens now!

Code:
	concat(bat.year, 
			if(bat.year in (SELECT 
							asg.year 
							FROM `modern-ish-baseball`.league_history_all_star asg
							where 
								asg.all_star = @player_id
								and asg.league_id = 100
							),
				"*", "")
            ) as year,
    tm.abbr as team,
LA REVEAL MAGNIFICO

Code:
year	team	g	ab	r	h	2B	3B	hr	rbi	bb	ibb	k	hpb	sh	sf	sb	cs	gdp	ba	obp	slg
1954	ATL	70	243	23	66	10	1	4	29	21	2	28	1	0	4	1	0	6	0.272	0.327	0.370
1955	ATL	84	288	31	83	14	1	13	48	23	1	36	2	0	6	0	0	4	0.288	0.339	0.479
1956*	ATL	125	454	70	142	21	4	20	64	58	2	32	0	0	4	1	0	13	0.313	0.388	0.509
1957*	ATL	152	552	99	182	38	1	23	79	65	4	30	0	0	7	0	0	14	0.330	0.396	0.527
1958*	ATL	151	570	98	192	29	2	27	94	77	3	42	0	0	5	0	0	10	0.337	0.413	0.537
1959*	ATL	153	577	96	188	28	1	37	96	72	6	47	2	0	4	0	0	11	0.326	0.399	0.570
1960*	ATL	140	499	79	173	25	2	24	72	74	12	20	0	0	3	1	1	9	0.347	0.429	0.549
1961*	ATL	142	527	94	175	33	1	28	88	76	7	23	1	0	6	2	2	20	0.332	0.413	0.558
1962*	ATL	148	540	103	188	26	3	33	92	76	7	21	1	0	4	1	0	12	0.348	0.427	0.591
1963*	ATL	113	423	87	144	20	0	37	104	57	2	15	1	0	6	0	2	15	0.340	0.415	0.650
1964*	ATL	85	314	63	108	17	1	16	41	39	2	20	1	0	1	0	1	12	0.344	0.417	0.557
1965*	ATL	159	558	108	177	17	1	45	116	95	9	29	1	0	5	1	1	16	0.317	0.414	0.593
1966*	ATL	157	573	111	191	21	0	44	119	88	9	38	0	0	4	1	4	14	0.333	0.420	0.600
1967*	ATL	127	465	79	136	17	0	29	72	69	12	40	1	0	5	0	0	12	0.292	0.381	0.516
1968*	ATL	157	563	86	149	22	4	25	77	74	6	56	3	0	6	1	0	15	0.265	0.350	0.451
1969	ATL	112	397	78	131	15	0	32	79	79	7	26	1	0	5	0	0	8	0.330	0.438	0.610
Totals		2075	7543	1305	2425	353	22	437	1270	1043	91	503	15	0	75	9	11	191	0.321	0.401	0.548
Okay, that's a lot of work for one measly character, I will admit. But imagine if you had to do 500 of those. I have to say that doing something like manually adding asterisks to players if they played in the All Star Game is a thing that's going to feel tedious to me by, like, the 2nd or 3rd player. And yes, I am 100% a person who will spend an hour designing a way to get out of 5 minutes of busy work. But seriously, even though this might seem like a lot, it's really, really not. Or maybe it kind of is but it's just a long list of easy things to do, done one at a time.

Next up... how about an awards column like they have on BBRef, where it might say "AS, MVP". UGH, this will already be a pain in the butt...
__________________
Quote:
Originally Posted by Markus Heinsohn
You bastard....
The Great American Baseball Thrift Book - Like reading the Sporting News from back in the day, only with fake players. REAL LIFE DRAMA THOUGH maybe not
Syd Thrift is offline   Reply With Quote