This release includes a feature I added [1] to support partial foreign key updates in referential integrity triggers!
This is useful for schemas that use a denormalized tenant id across multiple tables, as might be common in a multi-tenant application:
CREATE TABLE tenants (id serial PRIMARY KEY);

CREATE TABLE users (
    tenant_id int REFERENCES tenants ON DELETE CASCADE,
    id serial,
    PRIMARY KEY (tenant_id, id)
);
CREATE TABLE posts (
    tenant_id int REFERENCES tenants ON DELETE CASCADE,
    id serial,
    author_id int,
    PRIMARY KEY (tenant_id, id),
    FOREIGN KEY (tenant_id, author_id)
        REFERENCES users ON DELETE SET NULL
);
This schema has a problem. When you delete a user, it will try to set both the tenant_id and author_id columns on the posts table to NULL:
INSERT INTO tenants VALUES (1);
INSERT INTO users VALUES (1, 101);
INSERT INTO posts VALUES (1, 201, 101);
DELETE FROM users WHERE id = 101;
ERROR: null value in column "tenant_id" violates not-null constraint
DETAIL: Failing row contains (null, 201, null).
When we delete a user, we really only want to clear the author_id column in the posts table, and we want to leave the tenant_id column untouched. The feature I added is a small syntax extension to support doing exactly this. You can provide an explicit column list to the ON DELETE SET NULL / ON DELETE SET DEFAULT actions:
CREATE TABLE posts (
    tenant_id int REFERENCES tenants ON DELETE CASCADE,
    id serial,
    author_id int,
    PRIMARY KEY (tenant_id, id),
    FOREIGN KEY (tenant_id, author_id)
        -- Clear only author_id, not tenant_id
        REFERENCES users ON DELETE SET NULL (author_id)
        --                                  ^^^^^^^^^^^
);
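For completeness, re-running the earlier demo against this new definition should now leave tenant_id alone and clear only author_id. A sketch (this assumes a Postgres build with the patch applied; psql shows NULL as blank by default):

```sql
INSERT INTO tenants VALUES (1);
INSERT INTO users VALUES (1, 101);
INSERT INTO posts VALUES (1, 201, 101);

-- Only the columns named in the SET NULL list are cleared:
DELETE FROM users WHERE id = 101;

SELECT * FROM posts;
--  tenant_id | id  | author_id
-- -----------+-----+-----------
--          1 | 201 |
```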
I initially encountered this problem while converting a database to use composite primary keys in preparation for migrating to Citus [2], and it required adding custom triggers for every single foreign key we created. Now it can be handled entirely by Postgres!
The posts table also has a foreign key to the tenants table specified with ON DELETE CASCADE, so all the tenant's posts will be deleted. (I don't know if Postgres makes any effort to find some optimal ordering of the referenced tables—in this case deleting the tenant's posts first, then its users—to avoid updating records that will just get deleted anyway by cascading deletes.)
Shameless self-promotion: the homepage of plaintextsports.com is 5.2kb today [1], an in-progress WNBA game (4th quarter) is 11.2kb [2], and an extra inning MLB game is 8.8kb [3]. I wasn't aware of this size threshold, and I'm not at this level of optimization, but I'm always pleased to find more evidence of my playful claim that it's the "fastest website in the history of the internet".
It's very small, but it's difficult to scan and painful to read. You could easily use built-in HTML structures to make it actually readable. Your site is, in my opinion, as much a deviation from the old readable web as the over-designed modern sites are.
There are lots[1] of small, "class-less" CSS libraries that would keep your site as small (or smaller, with tree-shaking in a modern build system) and it would end up much more user-friendly.
I found it easy to read on my phone in light mode, and still easy to skim in dark mode, but the losing team's text is too dark and I have to focus to read it.
I had never heard of GROUP BY CUBE either! It looks like it's part of a family of special GROUP BY operators—GROUPING SETS, CUBE, and ROLLUP—that basically issue the same query multiple times with different GROUP BY expressions and UNION the results together.
Using GROUP BY CUBE(a, b, c, ...) creates GROUP BY expressions for every element in the power set of {a, b, c, ...}, so GROUP BY CUBE(a, b) does separate GROUP BYs for (a, b), (a), (b) and ().
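Under the hood, CUBE is just shorthand for an explicit GROUPING SETS list, so the two-column case can be spelled out directly (hypothetical table t with columns a and b):

```sql
-- These two queries are equivalent:
SELECT a, b, count(*) FROM t GROUP BY CUBE (a, b);

SELECT a, b, count(*) FROM t
GROUP BY GROUPING SETS ((a, b), (a), (b), ());
-- CUBE over n columns expands to 2^n grouping sets.
```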
It's like SQL's version of a pivot table, returning aggregations of data filtered along multiple dimensions, and then also the aggregations of those aggregations.
It seems like it's well supported by Postgres [1], SQL Server [2] and Oracle [3], but MySQL only has partial support for ROLLUP with a different syntax [4].
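For comparison, the two ROLLUP spellings look like this (hypothetical table t; MySQL/MariaDB bolt the modifier onto a plain GROUP BY instead of taking a column list):

```sql
-- Standard syntax (Postgres, SQL Server, Oracle):
SELECT a, b, sum(x) FROM t GROUP BY ROLLUP (a, b);

-- MySQL / MariaDB syntax:
SELECT a, b, SUM(x) FROM t GROUP BY a, b WITH ROLLUP;
```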
I would gladly buy a book of "SQL Recipes" ranging from beginner-level to advanced stuff that uses features like this, ideally with coverage of at least a few popular database systems, but at minimum Postgres.
Not a pivot table equivalent. It's most useful for calculating multiple related aggregates at once for reporting purposes, but ROLLUP doesn't substitute values for columns, i.e. it doesn't pivot results on an axis.
For folks just learning about ROLLUP et al, I highly recommend this comparison chart for an overview of major features offered by modern relational databases.
https://www.sql-workbench.eu/dbms_comparison.html
There's a whole constellation of advanced features out there that arguably most application developers are largely unaware of. (Which explains why most app devs still treat relational databases like dumb bit buckets at the far end of their ORMs.)
I had a situation recently where I had a huge amount of data stored in a MariaDB database and I wanted to create a dashboard where users could interactively filter subsets and view the data. The naive solution of computing the aggregate statistics directly from the users' filter parameters was too slow; most of the aggregation needed to be done ahead of time and cached. The website's backend code was a spaghetti house of horrors, so I wanted to do as much as possible in the DB. (It was the first time in my career I chose to write more SQL rather than more code.)
If I had a fancy DB I could use CUBE or GROUPING SETS and MATERIALIZED VIEWs to easily pre-calculate statistics for every combination of filter parameters that automatically get updated when the source data changed. But I had MariaDB so I made do. I ended up with something like this:
SELECT ... SUM(ABS(r.ilength)) AS distance, COUNT(*) AS intervals FROM r
GROUP BY average_retro_bucket, customer, `year`, lane_type, material_type, state, county, district WITH ROLLUP
HAVING average_retro_bucket IS NOT NULL AND customer IS NOT NULL;
"The WITH ROLLUP modifier adds extra rows to the resultset that represent super-aggregate summaries. The super-aggregated column is represented by a NULL value. Multiple aggregates over different columns will be added if there are multiple GROUP BY columns."
So you can query like this to get stats for all districts in CA->Mendocino county:
SELECT * FROM stats_table WHERE state = 'CA' AND county = 'Mendocino' AND district IS NULL
or like this to get a single aggregate of all the counties in CA put together:
SELECT * FROM stats_table WHERE state = 'CA' AND county IS NULL AND district IS NULL
However unlike CUBE, WITH ROLLUP doesn't create aggregate result sets for each combination of grouping columns. If one grouping column is a NULL aggregate, all the following ones are too. So if you want to query all the years put together but only in CA, you can't do:
SELECT * FROM stats_table WHERE year IS NULL AND state = 'CA'
If `year` is null, all the following columns are as well. The solution was to manually implement wildcards before the last filtered group column by combining the rows together in the backend.
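On a database with GROUPING SETS, the missing combinations could be generated directly instead of merged in the backend. A sketch using just two of the grouping columns above (Postgres-style syntax, so no backtick quoting):

```sql
-- ROLLUP (year, state) only yields the prefix aggregates
-- (year, state), (year), and (). Explicit grouping sets can also
-- produce the "all years, one state" rows that ROLLUP can't:
SELECT year, state, SUM(ABS(r.ilength)) AS distance
FROM r
GROUP BY GROUPING SETS ((year, state), (year), (state), ());
```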
I worked around not having materialized views by creating an EVENT that would re-create the stats tables every night. The stats don't really need to be real-time. Re-writing the multiple-GB statistics tables every night will wear out the SSDs in 20 years or so, oh well.
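The nightly rebuild can be sketched as a MariaDB event (all names hypothetical; the event scheduler must be enabled, and the compound body needs a client-side DELIMITER change):

```sql
SET GLOBAL event_scheduler = ON;

DELIMITER //
CREATE EVENT rebuild_stats
ON SCHEDULE EVERY 1 DAY
STARTS CURRENT_DATE + INTERVAL 1 DAY + INTERVAL 3 HOUR  -- nightly at 3am
DO
BEGIN
  -- Build the new stats, then swap atomically so readers
  -- never see a half-built table.
  DROP TABLE IF EXISTS stats_table_new;
  CREATE TABLE stats_table_new AS
    SELECT ...;  -- the WITH ROLLUP query above
  RENAME TABLE stats_table TO stats_table_old,
               stats_table_new TO stats_table;
  DROP TABLE stats_table_old;
END //
DELIMITER ;
```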
The most common constraint is whether the DB as a service offerings support a given extension, since they don't support installing custom ones. Naturally choosing to support an extension across a fleet of hundreds of thousands of instances (running dozens of different minor versions) isn't a decision made lightly, so it can take a while for new extensions to be supported.
Shameless plug (StackGres team member here), but StackGres possibly has the largest selection of ready-to-use Postgres extensions [1].
Give it a quick try on any Kubernetes cluster, like k3s on your laptop (one-command install), and install any extension from the web console or with a one-line entry in the SGCluster yaml.
If anyone's looking for a great long-form article about calculating digits of Pi from the good ol' days, this 1992 New Yorker profile of the Chudnovsky brothers (creators of the algorithm used by Google here!) is fantastic:
Reading that, it's hard not to see the parallels to Darren Aronofsky's movie Pi. Building a supercomputer in an apartment to find patterns in Pi that have some higher meaning, the computer constantly breaking, etc.
On my to-do list is making a personalized page where you can select exactly which teams you want to follow, something closer to what dpeck wants, but I've been having trouble finding the time...
This would be awesome! I love plaintext sports and the minimalist aesthetics of it.
Curious, it seems college baseball scores are generally not as easy to find as others. Do they tend to be behind more restrictive APIs, or is it more just that there’s so little interest in it vs MLB?
For college football and basketball I just use some JSON endpoints that the ncaa.com frontend hits, and they're actually very good. Literally all I had to do to support Women's basketball in addition to Men's was change "men" to "women" in the url. I haven't checked the data for baseball, but I'm sure it's more than adequate. That being said, I personally don't have any interest in college baseball, and I don't think it has a huge following, so I'm not going to invest any time into it, especially when there are higher profile leagues that I still haven't added yet (specifically, all of European football).
completely understand that, and you’re right on the small following. I like the high energy (for baseball that is) offensive heavy style of play, but I know I’m an outlier vs the MLB fan base size.
Great site and wish you much continued success with it!
Fair point, but on the other hand, the destructor has to work for all contexts. The usual argument is that the creator of the class has the best vision for what should be in the destructor, but I don't agree with that either.
I'm a big fan of Ruby, and feel like I know the language pretty well, but, my goodness, I don't know how I've never come across this one before. THIS is something that feels like magic!!
[1]> 3.method(:days).source_location
=> [".../lib/active_support/core_ext/numeric/time.rb", 37]
[2]> puts 3.method(:days).comment
# Returns a Duration instance matching the number of days provided.
#
# 2.days # => 2 days
=> nil
[3]> puts 3.method(:days).source
def days
ActiveSupport::Duration.days(self)
end
=> nil
[1]: https://www.postgresql.org/message-id/flat/CACqFVBZQyMYJV%3D...
[2]: https://www.citusdata.com/