Unless you’re CPU-bound (unlikely), the r-family is usually a better fit for an RDBMS, IME: 8 GiB of RAM per vCPU, compared to 4 GiB per vCPU for the m-family.
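If you’d rather eyeball the ratios than trust the marketing pages, here’s a minimal sketch using boto3’s DescribeInstanceTypes (the instance types and region are just examples):

```python
# Sketch: compare RAM-per-vCPU for an m-family vs r-family size via the
# EC2 DescribeInstanceTypes API. Requires AWS credentials; instance types
# and region below are examples only.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

resp = ec2.describe_instance_types(InstanceTypes=["m6i.2xlarge", "r6i.2xlarge"])
for it in resp["InstanceTypes"]:
    vcpus = it["VCpuInfo"]["DefaultVCpus"]
    mem_gib = it["MemoryInfo"]["SizeInMiB"] / 1024
    print(f'{it["InstanceType"]}: {vcpus} vCPU, {mem_gib:.0f} GiB '
          f"({mem_gib / vcpus:.0f} GiB per vCPU)")
# Expected: m6i.2xlarge -> 4 GiB/vCPU, r6i.2xlarge -> 8 GiB/vCPU
```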
Then there’s the x-family, which can go up to bonkers levels, like 4 TiB of RAM (and local NVMe).
As a sibling comment mentioned, though, if you can fit the data into local storage, that’s the way to do it. EBS latency, even with io2 Block Express, simply cannot compete.
That said, if I did the math correctly based on the 71 GB/31 days footnote, you’re looking at about 2.15 PB to load the entire thing, so, uh, good luck.
Yes. Most RDBMS are memory-bound (indexes, connection overhead, buffer pool…), so the more, the better.
At my last job, I switched a large-ish (100K QPS at peak) m5 MySQL instance to a smaller (in AWS size terms) r6i that was cheaper, despite having the same amount of RAM and being a generation newer. That, combined with careful tuning and testing, sped queries up 20-40%, AND we then had plenty of room for vertical scaling again if necessary.
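For anyone wanting to check whether their own MySQL box is actually memory-bound, the InnoDB buffer pool hit ratio is a quick tell. A minimal sketch, assuming pymysql and monitoring credentials (host/user/password are placeholders):

```python
# Sketch: estimate the InnoDB buffer pool hit ratio to see whether reads are
# served from memory or falling back to disk. Assumes pymysql; connection
# details are placeholders.
import pymysql

conn = pymysql.connect(host="db.example.internal", user="monitor", password="...")
try:
    with conn.cursor() as cur:
        cur.execute("SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_read%'")
        status = {name: int(value) for name, value in cur.fetchall()}

    requests = status["Innodb_buffer_pool_read_requests"]  # logical reads
    disk_reads = status["Innodb_buffer_pool_reads"]        # reads that missed the pool
    hit_ratio = 1 - disk_reads / max(requests, 1)
    print(f"buffer pool hit ratio: {hit_ratio:.4%}")
    # A ratio well below ~99% on a steady workload usually means the working
    # set doesn't fit in RAM, and more memory (or a bigger buffer pool) helps.
finally:
    conn.close()
```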
Things are changing in AWS recently, but a few years ago the R-family instances had some of the fastest uplinks to EBS. For example, only the largest M4 instance (m4.16xlarge) had a 10 Gbps uplink, whereas the R5b starts at 10 Gbps for the lowest tier and goes up to 60 Gbps @ 260K IOPS. You can very easily choke EBS with a DB.
EDIT: only the newer C instances have a comparable uplink to EBS; C4 and C5 (and some C6) start from ~4.7 Gbps. Just compare the EBS bandwidth column in https://aws.amazon.com/ec2/instance-types/
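Those EBS limits are also exposed programmatically, so you don’t have to squint at the table. A rough sketch with boto3’s DescribeInstanceTypes (the instance types are just examples):

```python
# Sketch: pull the EBS-optimized bandwidth/IOPS limits from the
# DescribeInstanceTypes API. Requires AWS credentials; instance types
# and region are examples only.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

resp = ec2.describe_instance_types(
    InstanceTypes=["m4.16xlarge", "r5b.large", "r5b.24xlarge"]
)
for it in resp["InstanceTypes"]:
    ebs = it["EbsInfo"].get("EbsOptimizedInfo")
    if not ebs:
        # Some older types don't publish EBS-optimized figures here.
        print(f'{it["InstanceType"]}: no EBS-optimized figures published')
        continue
    print(
        f'{it["InstanceType"]}: '
        f'{ebs["MaximumBandwidthInMbps"]} Mbps max EBS bandwidth, '
        f'{ebs["MaximumIops"]} max IOPS '
        f'(baseline {ebs["BaselineBandwidthInMbps"]} Mbps)'
    )
```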
The other fun thing about AWS instances is that the network uplink speed isn’t always what’s advertised. There are, of course, the “Up to X Gbps” tiers (rated speed guaranteed for only 30 minutes every 24 hours), but there are also other limits, like cross-region speed being capped at 50% of rated capacity.
This rarely matters until it does, like if you’re setting up an active-active DB across regions. Then it’s a fun surprise.
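If you want to see what you actually get rather than what the spec sheet says, a crude single-flow check between two instances looks something like the sketch below (host/port are placeholders; note that a single TCP flow is itself capped well below the aggregate figure on AWS, so real tests typically use iperf3 with parallel streams):

```python
# Sketch: crude single-flow throughput check between two instances. Run with
# "server" as the argument on one box and the server's IP on the other.
# Port and addresses are placeholders.
import socket
import sys
import time

PORT = 5001
CHUNK = 1024 * 1024          # 1 MiB send buffer
DURATION = 10                # seconds to transmit

def server() -> None:
    with socket.create_server(("0.0.0.0", PORT)) as srv:
        conn, addr = srv.accept()
        total, start = 0, time.monotonic()
        with conn:
            while data := conn.recv(CHUNK):
                total += len(data)
        elapsed = time.monotonic() - start
        print(f"received {total / 1e9:.2f} GB in {elapsed:.1f}s "
              f"= {total * 8 / elapsed / 1e9:.2f} Gbps")

def client(host: str) -> None:
    payload = b"\0" * CHUNK
    with socket.create_connection((host, PORT)) as conn:
        end = time.monotonic() + DURATION
        while time.monotonic() < end:
            conn.sendall(payload)

if __name__ == "__main__":
    if sys.argv[1] == "server":
        server()
    else:
        client(sys.argv[1])
```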