Like I mentioned in a previous post , I’m determined to level-up my SQL game this year. As such, and having just finished Learning SQL: Generate, Manipulate, and Retrieve Data , I thought I could keep a running list here of SQL things that I’m learning (or learning in greater depth).
This list is in many ways for myself, so I can keep track of the things I’ve learned and come back and reference some of these tools in my day-to-day.
SQL Toolbox #
- Adding
DISTINCT
to any query sorts the data before removing duplicates, which, depending on the size of data, could be cumbersome - You can create temporary tables with
CREATE TEMPORARY TABLE foo_table
. - You can have multiple criteria in the
ORDER BY
clause BETWEEN
is inclusive of both date limits- Checking for null: this one has gotten me many times: in SQL, you have to check that
foo IS NULL
not thatfoo = NULL
- If you don’t specify how to join your tables, you get the Cartesian product of all data in tables. There aren’t a ton of useful cases for a Cartesian product of two tables.
- When joining multiple tables, the SQL server will choose a “driving” table and then merge the tables onto that one. The server will generally pick the most efficient way to do this, but if you think you know better, you can do
SELECT STRAIGHT_JOIN
and the tables will be joined in the order they are listed - You can do self joins! I’m not sure why I’ve never thought of this, but you can merge a table on itself. The example given for usefulness is a
film
table with aprequel_id
column that points to the table itself. - When working with compound queries, if you want to
ORDER BY
, you have to use columns from the first of the compound queries - Escaping single quotes: You can either add another quote, eg
This isn''t a good idea!
, or you can add a backslash:This isn\'t a good idea!
. You can select withquote
if you want the returned data to be escaped as well egSELECT quote(moby_dick)
- Databases start their positioning at 1, not 0!
- MySQL’s
strcmp()
is case insensitive - If for whatever reason you want to round not decimals but the numbers to the left of the decimal, the
truncate
andround
functions allow negative numbers - When filtering a
GROUP BY
data set, you have to useHAVING
since theGROUP BY
occurs after theWHERE
clause. I knew this one but sometimes catch myself forgetting! - Rollups! I had no idea this was a think. If you do
GROUP BY {...} WITH ROLLUP
SQL will rollup the grouped data into aggregates. I totally would have done this in code in the past, so this is super helpful to know that SQL can do this for me. - Noncorrelated vs. correlated subqueries: I didn’t know this, but a correlated subquery can reference data from the parent query. But keep in mind that for correlated queries, the subquery will be run for every row of the parent query!
- In MySQL, table aliases are not allowed when deleting
- Common table expressions: basically subqueries acting as tables at the beginning of queries
- Outer Joins: I’m not sure why but I guess I’ve assumed there was some difference other than positioning between a
LEFT OUTER JOIN
and aRIGHT OUTER JOIN
. This is not the case. So I can continue, like any rational human, only usingLEFT OUTER JOINS
since it is by far easier to understand. - You can
NATURAL JOIN
tables if you trust the server to know the best columns to join them on. Never trust the server to do this. - Searched vs. Simple Case expressions: generally use searched case since it gives you access to more qualifying logic but simple can work for — you guessed it — simple logic
- You can set up transaction savepoints and then roll back to those points!
- For low-cardinality data, you can set up bitmap indexes
- If you’re curious how the SQL server will execute a statement, you can prepend the whole statement with
EXPLAIN
. Looking at how a statement is executed and trying to find ways to optimize it is a process called “query tuning”. - You can have too many indexes. Since indexes are actually their own table, for every new item added to an original table, things need to be added to each applicable index table as well. this can slow things down
- People will sometimes drop and re-create indexes on an as-need basis to keep things running smoothly
- Some views are updateable
- You can set up data windows with the
over ()
keyword to sum or aggregate data over specific periods of time - You can
rank
things! - There are three types of ranking:
row_number
,rank
,dense_rank
- You can use SQL to generate rolling sums or averages with the
over
keyword - The
lag
andlead
keywords allow you to access the previous and prior row’s data for rolling averages - Partitioning tables is a way to split up large table while still maintaining a single “table” you can reference
- Sharding databases takes partitioning to the next level and is something I still know very little about in practice