SQL Things I Probably Should Have Known

Like I mentioned in a previous post , I’m determined to level-up my SQL game this year. As such, and having just finished Learning SQL: Generate, Manipulate, and Retrieve Data , I thought I could keep a running list here of SQL things that I’m learning (or learning in greater depth).

This list is in many ways for myself, so I can keep track of the things I’ve learned and come back and reference some of these tools in my day-to-day.

SQL Toolbox #

Adding DISTINCT to any query sorts the data before removing duplicates, which, depending on the size of data, could be cumbersome
You can create temporary tables with CREATE TEMPORARY TABLE foo_table .
You can have multiple criteria in the ORDER BY clause
BETWEEN is inclusive of both date limits
Checking for null: this one has gotten me many times: in SQL, you have to check that foo IS NULL not that foo = NULL
If you don’t specify how to join your tables, you get the Cartesian product of all data in tables. There aren’t a ton of useful cases for a Cartesian product of two tables.
When joining multiple tables, the SQL server will choose a “driving” table and then merge the tables onto that one. The server will generally pick the most efficient way to do this, but if you think you know better, you can do SELECT STRAIGHT_JOIN and the tables will be joined in the order they are listed
You can do self joins! I’m not sure why I’ve never thought of this, but you can merge a table on itself. The example given for usefulness is a film table with a prequel_id column that points to the table itself.
When working with compound queries, if you want to ORDER BY, you have to use columns from the first of the compound queries
Escaping single quotes: You can either add another quote, eg This isn''t a good idea!, or you can add a backslash: This isn\'t a good idea!. You can select with quote if you want the returned data to be escaped as well eg SELECT quote(moby_dick)
Databases start their positioning at 1, not 0!
MySQL’s strcmp() is case insensitive
If for whatever reason you want to round not decimals but the numbers to the left of the decimal, the truncate and round functions allow negative numbers
When filtering a GROUP BY data set, you have to use HAVING since the GROUP BY occurs after the WHERE clause. I knew this one but sometimes catch myself forgetting!
Rollups! I had no idea this was a think. If you do GROUP BY {...} WITH ROLLUP SQL will rollup the grouped data into aggregates. I totally would have done this in code in the past, so this is super helpful to know that SQL can do this for me.
Noncorrelated vs. correlated subqueries: I didn’t know this, but a correlated subquery can reference data from the parent query. But keep in mind that for correlated queries, the subquery will be run for every row of the parent query!
In MySQL, table aliases are not allowed when deleting
Common table expressions: basically subqueries acting as tables at the beginning of queries
Outer Joins: I’m not sure why but I guess I’ve assumed there was some difference other than positioning between a LEFT OUTER JOIN and a RIGHT OUTER JOIN. This is not the case. So I can continue, like any rational human, only using LEFT OUTER JOINS since it is by far easier to understand.
You can NATURAL JOIN tables if you trust the server to know the best columns to join them on. Never trust the server to do this.
Searched vs. Simple Case expressions: generally use searched case since it gives you access to more qualifying logic but simple can work for — you guessed it — simple logic
You can set up transaction savepoints and then roll back to those points!
For low-cardinality data, you can set up bitmap indexes
If you’re curious how the SQL server will execute a statement, you can prepend the whole statement with EXPLAIN. Looking at how a statement is executed and trying to find ways to optimize it is a process called “query tuning”.
You can have too many indexes. Since indexes are actually their own table, for every new item added to an original table, things need to be added to each applicable index table as well. this can slow things down
People will sometimes drop and re-create indexes on an as-need basis to keep things running smoothly
Some views are updateable
You can set up data windows with the over () keyword to sum or aggregate data over specific periods of time
You can rank things!
There are three types of ranking: row_number, rank, dense_rank
You can use SQL to generate rolling sums or averages with the over keyword
The lag and lead keywords allow you to access the previous and prior row’s data for rolling averages
Partitioning tables is a way to split up large table while still maintaining a single “table” you can reference
Sharding databases takes partitioning to the next level and is something I still know very little about in practice