At this point you can get 24TB of RAM in an EC2 instance (along with 448 vCPUs, 100Gbps of network bandwidth and 38Gbps of EBS bandwidth). That won't scale forever, but Stack Overflow has been running on a single primary/standby setup with 1.5TB of RAM, so that would be 16x Stack Overflow's RAM.
I think a lot of work goes into horizontal scaling, which is necessary at a certain scale, but very few people actually get anywhere near that scale. It's important to understand which things are needed at your scale and where you can simply buy some beefier hardware. I've been at places where people run a dozen sharded DB servers with each server having 16GB of RAM. Maybe that's resume-driven development, where someone wants to say they've done that.
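To put rough numbers on that, here is a quick back-of-the-envelope sketch using only the figures quoted in this thread. The variable names are mine, and I'm assuming decimal units (1TB = 1000GB) just to keep the arithmetic simple.

```python
# Back-of-the-envelope comparison using only figures quoted in this thread.
# Assumption: decimal units (1 TB = 1000 GB) for simplicity.
SHARD_COUNT = 12                      # "a dozen sharded DB servers"
RAM_PER_SHARD_GB = 16                 # "each server having 16GB of RAM"
STACK_OVERFLOW_RAM_GB = 1.5 * 1000    # the 1.5TB primary/standby setup
LARGEST_EC2_RAM_GB = 24 * 1000        # the 24TB EC2 instance

# Total RAM across the whole sharded cluster.
cluster_ram_gb = SHARD_COUNT * RAM_PER_SHARD_GB

print(f"Sharded cluster total: {cluster_ram_gb} GB")
print(f"Share of a 1.5TB box:  {cluster_ram_gb / STACK_OVERFLOW_RAM_GB:.1%}")
print(f"Share of a 24TB box:   {cluster_ram_gb / LARGEST_EC2_RAM_GB:.1%}")
```

So the whole dozen-server cluster holds 192GB, which fits on a single mid-sized box with plenty of room to spare. RAM is only one dimension, of course, but it gives a sense of how far that setup is from needing shards.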
A bunch of smaller distributed instances could be cheaper than one big one at equivalent size/compute. It also allows you to grow as needed, without worrying about things like DB transfer, instead of absorbing a big upfront cost.
I agree it adds a lot of complexity to the problem, which is another cost.
I guess this would be another argument for pay-as-you-go cloud-managed DBs, despite their being more expensive than rolling your own.
Scaling horizontally undoubtedly introduces complexity but it also comes with some upsides:
* DB backups are now (much) faster.
* Smaller backups mean faster restores, which reduces your RTO (Recovery Time Objective).
* If you have a well-architected application, a catastrophic DB failure will now only impact a portion of your userbase instead of all of them.
There are probably more good reasons but these are the ones I could think of now.
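To illustrate the last point, here is a minimal sketch of what "only a portion of your userbase" looks like, assuming users are assigned to shards by hashing their ID. The `shard_for` and `affected_fraction` helpers, the 12-shard count, and the synthetic user IDs are all made up for illustration; real systems often use directory-based or range-based sharding instead.

```python
import hashlib

def shard_for(user_id: str, shard_count: int) -> int:
    """Map a user ID to a shard index with a stable hash (hypothetical scheme)."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an integer, then take it modulo the shard count.
    return int.from_bytes(digest[:8], "big") % shard_count

def affected_fraction(user_ids, shard_count: int, failed_shard: int) -> float:
    """Fraction of users whose data lives on the failed shard."""
    hit = sum(1 for u in user_ids if shard_for(u, shard_count) == failed_shard)
    return hit / len(user_ids)

# Synthetic user IDs, purely for illustration.
users = [f"user-{i}" for i in range(100_000)]
frac = affected_fraction(users, shard_count=12, failed_shard=3)
print(f"Users affected by losing one of 12 shards: {frac:.1%}")
```

With an even hash you'd expect roughly one in twelve users to be hit. That only holds if the application degrades gracefully for the unaffected users, which is the "well-architected" part.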
Are high availability or easier backups really why people look to horizontal scaling, though? I don't think that's ever been the primary reason in any story I've read. It's a great "bonus", but I don't think it would be a compelling reason to choose horizontal over vertical scaling.