Booknotes 2.
I’m writing a guide to building your own database server. Here’s what I’ve been doing:
Progress
This week I’ve been working on two chapters, the first is about implementing filtering, sorting and limiting data (with WHERE
, ORDER BY
and LIMIT
) and the second is about joining data.
Researching
A question I’ve tried to answer this week is why the syntax for CAST
is CAST(value AS type)
and not just CAST(value, type)
like other functions in SQL. I’ve read people suggesting that the AS
improves readability, but I’m not sure I believe that as an answer. My reckon is that it’s because most other functions take values and type references such as INTEGER
or BOOLEAN
aren’t values.
Finding reasons behind decisions like this can be tricky for SQL as many of them happened in the 70s. CAST
did come later in the 1992 SQL specification, but was in some implementations before then. It’s not that important to know, but I was curious and I feel that I owe a justification to readers as to why parsing CAST
will be that little bit different to other functions.
Surprising behaviour
I was surprised as to how references in ORDER BY
clauses get resolved. For example:
SELECT -id AS id FROM orders ORDER BY id;
SELECT -id AS id FROM orders ORDER BY id + 0;
These return rows in different orders in SQLite, PostgreSQL and MySQL. Turning id
in the ORDER BY
clause from just a reference into an expression causes the column to be used rather than the value from the select list.
MySQL and SQLite allow you to use named items from the select list within ORDER BY
expressions when there are no clashes, but PostgreSQL doesn’t (e.g: SELECT id AS x FROM orders ORDER BY -x;
). I’m assuming it’s meant to behave this way. I couldn’t find this in the documentation for any of the databases I looked at, I’m yet to delve into the SQL specification and could also check the source code of some databases to confirm the behaviour.
Language
I’m a strong believer of finding the right name. When describing databases, there are lots of overloaded words1, or words that don’t quite fit in all situations. An example of this is talking about what is being joined with JOIN
. The easy answer is tables, but this isn’t always accurate. You could join a view, or join to already joined data.
Relational algebra would describe it as a relation but strictly relations are sets of tuples that can’t contain duplicate rows. Also, in my experience, the word ‘relations’ isn’t commonly used by application developers. I haven’t yet settled on what I should call it. I could stick with ‘table’, but with an explanation. I could be vague and call it ‘data’ or a ‘dataset’. Tabular data? Table-like? A collection of rows? A bag of tuples?
-
‘Database’ has to be the most overloaded word, referring to a database management system, a database server, a set of tables, multiple sets of tables, persistent files that store tables or any structured collection of data. ↩