./sqlite3 -header lahman.db(if in the previous section you downloaded a precompiled binary) or
sqlite3 -header lahman.dbotherwise.
proj1.sql, to help you get started. In the file, you'll find a
CREATE VIEWstatement for each part of the first 4 questions below, specifying a particular view name (like
q2i) and list of column names (like
lastname). The view name and column names constitute the interface against which we will grade this assignment. In other words, don't change or remove these names. Your job is to fill out the view definitions in a way that populates the views with the right tuples.
proj1.sqlfile we provide:
q0as above (via copy-paste), and for all of the following queries, which you will need to write yourself.
python3 test.py -q 0
LEFT OUTER JOINbut not
RIGHT OUTERyou can reverse the order of the tables (i.e.
A RIGHT JOIN Bis the same as
B LEFT JOIN A.
FULL OUTER JOINcan be done by
~) tilde operator. You can use
peopletable, find the
birthyearfor all players with weight greater than 300 pounds.
birthyearof all players whose
namefirstfield contains a space. Order the results by
namefirst, breaking ties with
namelastboth in ascending order
peopletable, group together players with the same
birthyear, and report the
height, and number of players for each
birthyear. Order the results by
birthyearin ascending order.
NULLheight value in the dataset (i.e.
height IS NULL); your query should return
NULLfor the height in those years.
70. Again order the results by
birthyearin ascending order.
yearidof all people who were successfully inducted into the Hall of Fame in descending order of
yearid. Break ties on
yearidin descending order of
yearid. Break ties on
schoolid, playerid(ascending). For this question,
yearidrefers to the year of induction into the Hall of Fame.
schoolidof all people who were successfully inducted into the Hall of Fame -- whether or not they played in college. Return people in descending order of
playerid. Break ties on
NULLif they did not play in college.)
slg(Slugging Percentage) of the players with the 10 best annual Slugging Percentage recorded over all time. A player can appear multiple times in the output. For example, if Babe Ruth’s
slgin 2000 and 2001 both landed in the top 10 best annual Slugging Percentage of all time, then we should include Babe Ruth twice in the output. For statistical significance, only include players with more than 50 at-bats in the season. Order the results by
slgdescending, and break ties by
slgproperly as a floating point number---you'll need to figure out how to convince SQL to do this!
battingmentions two columns
3B. On your local copy of the data set these have been renamed
H3Brespectively (columns starting with numbers are tedious to write queries on).
Ho f the
battingtable represents all hits = (# singles) + (# doubles) + (# triples) + (# home runs), not just (# singles) so you’ll need to account for some double-counting
anderma02in 2006) treat their time on each team separately for this calculation
lslg(Lifetime Slugging Percentage) for the players with the top 10 Lifetime Slugging Percentage. Lifetime Slugging Percentage (LSLG) uses the same formula as Slugging Percentage (SLG), but it uses the number of singles, doubles, triples, home runs, and at bats each player has over their entire career, rather than just over a single season.
lslg. Order the results by
lslg(descending) and break ties by
namelastand Lifetime Slugging Percentage (
lslg) of batters whose lifetime slugging percentage is higher than that of San Francisco favorite Willie Mays.
playeridin your query (
mayswi01), but you may not include his slugging percentage -- you should calculate that as part of the query. (Test your query by replacing
mayswi01with the playerid of another player -- it should work for that player as well! We may do the same in the autograder.)
Using the Lahman database as your guide, make an argument for when MLBs “Steroid Era” started and ended. There are a number of different ways to explore this question using the data.
yearid, min, max and average of all player salaries for each year recorded, ordered by
yearidin ascending order.
binids 0 through 9, and count the salaries in each bin. Return the
highboundaries for each bin, as well as the number of salaries in each bin, with results sorted from smallest bin to largest.
binid0 corresponds to the lowest salaries, and
binid9 corresponds to the highest. The ranges are left-inclusive (i.e.
[low, high)) -- so the
highvalue is excluded. For example, if bin 2 has a
highvalue of 100000, salaries of 100000 belong in bin 3, and bin 3 should have a
lowvalue of 100000.
highvalue for bin 9 may be inclusive).
python3 test.py -q 4ii_bins_0_to_8and
python3 test.py -q 4ii_bin_9to run the tests
countof zero, NOT just excluding the bin altogether.
binids. We'll only be testing with these possible binid's (there aren't any hidden tests using say, 100 bins) so using the hardcoded table is fine
avgdiffwith respect to the previous year. Order the output by
yearidin ascending order. (You should omit the very first year of recorded salaries from the result.)
yearidfor those two years. If multiple players tied for the max salary in a year, return all of them.
diffAvg(the difference between the team's highest paid all-star's salary and the team's lowest paid all-star's salary).
allstarfull.teamidin the SELECT statement for this).
python3 test.pyto see if you're passing tests. If so, follow the instructions in the next section to submit your work.