Perf Is Not Enough (motherduck.com)
262 points by shubhamjain 7 months ago | 77 comments



> "A few years later, after numerous customer complaints, we realized that bugs in our JDBC driver were killing performance." [...] "We had been spending many engineer years making the queries fast, shaving off fractions of a second here and there from query times. But the connectors that most of our users were using added far more latency than we had saved. What’s more, we were completely blind to that fact. No one at Google actually used the JDBC drivers" [...] "The query time that users were seeing was invisible to us, and we considered it someone else’s problem."

This is frustrating to read: "completely blind" to years of customer complaints, and they don't eat their own dogfood.


It doesn't seem like they really took away the correct lesson either. It's not that "perf is not enough"; it's that the performance of real-world use matters more than that of some individual component which doesn't represent how customers actually use the product. The problem isn't that they spent too much effort optimizing, but that they didn't start from the customer pain and trace it back to the root of the problem (which _was_, in fact, a performance issue).


I think you took away the wrong lesson, personally. The point is that no single metric, or even group of metrics, is going to cover the entire reality of what's important about a database, even when what's important is performance. Just focusing on metrics is not enough. The post gives other examples about times when e.g. clean, maintainable code matters more (DuckDB), or how there are fundamental architectural limitations that make "worse performing" databases better for what you need (e.g. Clickhouse not being good at joins; some DBs' friendlier SQL and CSV parsing also come up), but the message that unites all of its examples is that benchmark performance, even on very well-designed benchmarks, is often misleading about the actual reality of using a database.


It's interesting to me that you think my comment is at odds with what you're saying, because even re-reading it I'm not sure where you got anything about "metrics" from what I said. Yes, I said that the issue did happen to be performance, but only because that happened to be where the customers' pain was. My point was that the idea of a feature being "enough" or "not enough" doesn't really make sense outside of the context of how people are actually experiencing the product.

What's weird to me about the claim "performance is not enough" (and your longer quote at the end) is the implication that looking at what customers are actually doing and finding their pain points is somehow best expressed by decrying performance as a metric, rather than decrying _any_ single factor as "enough". If I go to a restaurant, order spaghetti, and then get sauce spilled on my lap by the server, the takeaway isn't "having high quality sauce is not enough"; it's "spilling things on customers makes for a bad restaurant experience".


I'm with the other poster on this. I think there are good analogies to things like Goodhart's law. At times you do need to re-ground why you are measuring something. That said, most of these lessons were about making the measurement you're trying to improve more holistic.

This is part of an argument people often use for why end-to-end testing is superior to unit testing. It isn't that you don't want the units tested. Nor is it that end-to-end is easier. Rather, it is harder to optimize for an end-to-end test purely for the test's own sake. (No panacea, mind.)


> Just focusing on metrics is not enough

This is not about focusing on the metrics themselves; it's about focusing on what matters to your customers: the metric is a tool, not an end goal.

It's obvious in the airplane example the OP gives: what matters is end-to-end time (which is also why high-speed rail is more successful at the travel game than you'd expect, btw).


The JDBC substory is fantastic.

Google built a database, works great internally, all well.

They subcontract out the adapter layer for the wider world, it doesn't work properly, and so the wider world gets a crappy database: a carefully engineered core that Google uses, wrapped in broken nonsense, creating an emergent whole that is unnecessarily rubbish. No one internally notices they've done this, and the external people are poorly placed to work it out.

That seems absolutely on point for Google's open source strategy.


Yeah, that was the most interesting point for me as well. If you view it through a management lens, it makes sense; "we hire top CS people, let them do the top CS problems, JDBC drivers aren't a core competency so we can outsource that."

The problem is that people can do a bad enough job at the non-core stuff that it doesn't matter that you're good at your core competency. Outsourcing is not a free ride.


The story of all their Python wrappers for their APIs.


This is due to the lack of vertical integration. Apple is winning in many ways because they got really good vertical integration.


Also perfectly describes their business contracts for e.g. Workspace. A great core product wrapped in a worse-than-useless "support" contract through some of the worst consulting firms in the world skimming 15% or whatever off the top.


It's weird that the blog argues that "performance is subjective" and that it's not enough to just measure performance, but at the same time the only example given is where performance DOES matter and is objective. Just that they measured the wrong thing.


It's surprising Amdahl's Law isn't even mentioned in this article, even though it starts with a perfect example of it in the first paragraph.
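
For reference, Amdahl's Law: if a fraction p of the end-to-end time is sped up by a factor s, the overall speedup is

  \text{speedup} = \frac{1}{(1 - p) + p/s}

With query execution being only a small slice of what users actually experienced (small p), no amount of shaving fractions of a second off it could outweigh seconds of JDBC connector latency.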


This sounds like a company organizational problem — if the ultimate goal is to get people to use your cloud and provide value, then why would you have metrics that diverge from what the customer sees as the important things? There should be somebody at Google actively talking to customers about what problems they have and communicating that to the engineers so they know what to improve. The organization should be structured so the engineers get the metrics they need or make it part of their job description to make the metrics.


"I've noticed when the anecdotes and the metrics disagree, the anecdotes are usually right" -Jeff Bezos, who unfortunately made a few good points.


> There should be somebody at Google actively talking to customers about what problems they have and communicating that to the engineers so they know what to improve.

It seems like Google is a bit allergic to this.


It’s like many other institutional problems. People are afraid of suddenly doing the right thing because of the huge backlog they’ll be forced to address.



From what I read here on HN, this book describes Google.


Every organization is inflexible; you need a swarm of organizations to be flexible. That is the main difference between the state and the private sector: the private sector is a herd.


In this metaphor, every organization is a state.


You mean every state is an organization so they share many traits that all organizations have? I don't see why you would go the other way.


I phrased that badly.

I was trying to say that any monopoly organization will have similar dysfunctions that a state has.

My thought is that states aren't dysfunctional primarily because they're states, but because they're monopolies.


If our solutions don't solve our customers' problems, then either our customers need different problems, or we need different customers.


Totally agreed, this seems like a problem of picking the wrong metrics to target. But it goes beyond the engineering team picking too narrow a subset of latency to measure. What metrics were the product and organization leadership team targeting that led them to miss this customer feedback?


>It takes about 4.5 hours for me to go door to door from my house in Seattle to our office in San Francisco

Not enough founders are moving at one hundred and seventy nine miles per hour anymore but I guess that’s what happens when the fed jacks up interest rates


I thought he was driving too when I read this, but this must be a flight + trip to airport + security?


I'm literally going from a house in Seattle to an office in SF right now. I left 48 minutes ago, I'll update when I arrive and I'll personally add a data point here lol.


Made it door to door in 4h20m. Clearly the original person was taking the scenic route.


Imagine the productivity gains if someone implemented a SEA - SFO Concorde flight


Or trains? With current (40-year-old) high speed train technology, this could be under 4 hours.


Definitely some good points, but a few comments:

- I don't think performance is secondary as put here, but perhaps closer to binary. As in: does it perform well enough? Check. Now let's move onto judging everything else. In this sense, it's not secondary because before you can work on anything else you need to work on performance. Else you're not even at the table. Once you're at the table it's a different story though. The author points this out themselves: "I should mention that DuckDB is fast". If it wasn't, then you probably would be competing on performance, at least until you got that "performant" tick out of the way.

- "The one [database engine] who is moving most quickly will be the one that wins in the end": Perhaps a valid-ish point, but certainly not practical. Acceleration isn't constant. Your progress will be much faster as a new player in the space, but if you manage to get to where e.g. Snowflake is, your progress will certainly slow down. So as someone choosing a system today, I can't just judge by today's acceleration and extrapolate that it will continue into the future.

So overall some good points in there, but the conclusions feel a bit off for me. Although this: "how quickly you can go from idea to answer, not query to result" is a point I'd love to see explored. Feels interesting enough to warrant its own investigation.


Performance is “relative”, not “subjective”. Its meaning is relative to the task at hand.

Unless we are talking about user interfaces designed to give the user a feeling of greater speed, with fast-moving status indicators that communicate great effort, so the user feels the response was fast for all the work being done. But that is an interface thing, not a database thing.


Subjective is the correct word. What task is relevant depends on the subject.

Relative would be if there's no way to put a number on performance other than a comparison between systems, which is simply not true.


My first popular web app kept all state in a Python dict that was dumped to disk every few minutes. Fastest API you've ever seen. When we moved to Mongo, performance never recovered. And yet, when I make a website today I do not reach for “pickledb”.
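
For anyone curious what that pattern looks like, here's a minimal sketch (not my actual app; the filename and interval are invented):

  import pickle, threading

  STATE_FILE = "state.pkl"  # hypothetical path
  state = {}                # all app state lives in one dict

  def load():
      global state
      try:
          with open(STATE_FILE, "rb") as f:
              state = pickle.load(f)
      except FileNotFoundError:
          pass  # first run: start empty

  def snapshot():
      # dump everything every 3 minutes; writes since the last
      # dump are lost on a crash. A real version would write to
      # a temp file and rename it to avoid torn snapshots.
      with open(STATE_FILE, "wb") as f:
          pickle.dump(state, f)
      threading.Timer(180, snapshot).start()

In between, reads and writes are plain dict operations, which is why nothing that serializes through a network hop can match it.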


sqlite as a replacement for fopen is the middle ground


I think SQLite is probably a better approach than manually snapshotting (how often do you snapshot?). You get significantly less data loss and architecturally very similar operational simplicity. You can also handle disk-sized data sets vs. memory-sized ones.


Now I'm wondering: can you use the LiteStream online backup tool against an in-memory database? Or does that machinery only work against a materialized database?


Good Q. I don’t think so? But you could tune SQLite to not wait for disk writes and give it a larger memory allowance, which would make it extremely fast for memory-sized data sets. If you don't intend to use the disk at all, then there's not much point having backups?
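
For what it's worth, that tuning is a couple of PRAGMAs (a sketch via Python's stdlib sqlite3; the cache size is an arbitrary example value):

  import sqlite3

  conn = sqlite3.connect("fast.db")
  # WAL keeps the file consistent without an fsync per commit
  conn.execute("PRAGMA journal_mode = WAL")
  # don't wait for the disk; power loss can drop recent commits
  conn.execute("PRAGMA synchronous = OFF")
  # keep ~1 GiB of pages in memory (negative value means KiB)
  conn.execute("PRAGMA cache_size = -1048576")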


Honestly, I think more people should consider at least starting with an in-memory-with-snapshotting architecture rather than a transactional database architecture. Less so for request/response-style user interaction, but IMO this approach is less common than it should be for incremental (/batch) processing of large static or replayable streaming data.


>> "The caveat to this rule, of course, is that architectural differences are hard to overcome. Shared nothing databases are at a disadvantage vs shared disk, and it took Redshift many years to switch to a primarily shared disk architecture. Lakehouses that rely on persisting metadata to an object store will have a hard time with rapid updates; this is built into the model."

Looking for good literature on this topic


Great post. I think this is one of the reasons that pandas shone in the past decade. Performance on a single machine was good enough, and it could ingest 99% of the CSVs known to humanity.


Were I the author of this article, I could not have resisted titling it "Perf is Not Enerf"


With the right accent the title already rhymes.


I'm using DuckDB for a side project and it keeps getting more powerful. Ironically, the project was originally envisioned to use SQLite (we're not doing OLAP processing), but DuckDB is faster and more feature-complete.


My issue with DuckDB is:

- unstable execution (random crashes)

- out-of-memory errors where I would've hoped for DuckDB to gracefully take the slow route to completion if no more memory is available (tried all the different conf settings)


The article mentioned that DuckDB keeps improving very quickly. The next couple of months of DuckDB are all about stabilization, with no new features getting added. Once it is robust enough it will be declared "1.0". My guess is that will be in late April.

You mentioned OOMs; this has been a focus for a while and has gotten steadily better over the past few releases. 0.9 added spill-to-disk to prevent most OOMs, and 0.10, released a couple of weeks ago, fixes a bunch more memory usage problems. The storage format, which another commenter brought up, is now fully backwards compatible.

I'd suggest giving it another try, especially once 1.0 comes out.


It might be getting better, but the examples are currently so egregious that it's tough to keep giving DuckDB a chance.

Example of a query that should never, ever, out-of-memory, but absolutely will in the latest DuckDB:

  -- read two columns from many Parquet files on S3 and
  -- write them back out as a single local Parquet file
  COPY
    (
      SELECT
        rs.my_int,
        rs.my_bigint
      FROM
        READ_PARQUET('s3://some/folder/my-large-files-*.parquet')
        AS rs
    )
  TO
    '/my/home/folder/my-large-file.parquet'
    (
      FORMAT PARQUET,
      ROW_GROUP_SIZE 100000,
      COMPRESSION 'ZSTD'
    )
  ;
This query should simply read the two selected column series based on the Parquet metadata and then stream the data to disk.

And yet it will try to load data in memory before crashing.


Does it fail on nightly?

There were some recent fixes: https://github.com/duckdb/duckdb/issues/10737


I've been testing DuckDB's ability to scan multi-TB Parquet datasets in S3. I have to say that I've been pretty impressed with it. I've done some pretty hairy SQL (window functions, multi-table joins, etc.), stuff that takes less time in Athena, but not by that much. Coupled with its ability to pull and join that data with information in RDBs like MySQL, it's a really compelling tool. Strangely, the least performant operations were the MySQL lookups (had to set SET GLOBAL mysql_experimental_filter_pushdown=true;). Anyway, definitely worth another look. I'm using v0.9.2.


- each version breaks the previous format and renders it unusable


We heard your feedback! Backward compatibility was just implemented! Version 0.9 is actually fully readable by 0.10. With version 1.0 coming in a few months, this will be readable for several years' worth of version updates.


> readable for several years

Why not just make shims to migrate dbs for future compatibility? So you could read db 1.0 in v2.0 but only insofar as to migrate it to v2. The implication that you don't want to promise backwards read compatibility feels antithetical to a db driver.

For example, if I have an ancient mssql db that was started in 2001, I'm confident that I can grab the latest mssql driver and still use it. I don't have to track down mssql 2007 to migrate incrementally. Not sure about postgres or mysql but I assume it's the same there. Sqlite is definitely backwards read compatible.


Postgres:

> Major versions usually change the internal format of system tables and data files. These changes are often complex, so we do not maintain backward compatibility of all stored data.

https://www.postgresql.org/support/versioning/


You're confusing a network protocol client (the MSSQL "driver") with an on-disk format. You can't upgrade the MSSQL server from 2001 to current in-place: https://learn.microsoft.com/en-us/sql/database-engine/instal...


That seems entirely fair for pre-1.0 software.


Every time I've tried to use DuckDB I've made it segfault, so I'm simply using Datafusion instead, Rust saves the day there.

Three separate occasions with different uses all leading to crashes in the first hour of using DuckDB is enough that I frankly see no point in trying it again; I don't expect it to ever magically become reliable.


I haven't used duckdb since I got OOM on my dataset too. I think I will try again on 1.0.


The same lesson played out 30 years ago. WordPerfect insisted that their engineers write code in assembly to achieve maximum performance. Borland did not want to bet on Win32 because they wanted maximum performance under DOS. Lotus 1-2-3 thought it was ridiculous for Excel to use so much memory and have higher latency on Windows. In the meantime, Microsoft charged ahead with a focus on features and user experience. The rest is history.


It's unwise to obsess over raw performance, for various reasons.

For example, a library might perform 2x faster than its closest alternative, yet not be the best option for your project. There are other factors to consider, like compatibility: if the library breaks every time you upgrade your Node.js version, it adds risk to your project and requires extra maintenance. Also, the library itself may be 2x faster, but once you've added all your application logic on top of it, your product might only end up 5% faster than with the alternative because, as is often the case, most of the workload is in the business layer.
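
To put rough (made-up) numbers on that last point: if the library accounts for 10% of total runtime and gets 2x faster, the new runtime is

  0.9 + \frac{0.1}{2} = 0.95

i.e. only about a 5% overall win from a "2x faster" dependency.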

Also, performance is often at odds with scalability; you often need to make some per-node performance sacrifices to scale out. For example, load balancing with consistent hashing adds overhead on a per-machine basis, but you can't scale to multiple machines without it.
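
A minimal hash-ring sketch, to make the tradeoff concrete (illustrative only; real implementations add virtual nodes and replication):

  import bisect, hashlib

  def _hash(key: str) -> int:
      return int(hashlib.md5(key.encode()).hexdigest(), 16)

  class HashRing:
      def __init__(self, nodes):
          # every lookup pays an extra hash + binary search (the
          # per-machine overhead), but adding a node only remaps
          # roughly 1/N of the keys
          self._ring = sorted((_hash(n), n) for n in nodes)

      def node_for(self, key: str) -> str:
          i = bisect.bisect(self._ring, (_hash(key), "")) % len(self._ring)
          return self._ring[i][1]

  ring = HashRing(["db-1", "db-2", "db-3"])
  print(ring.node_for("user:42"))  # deterministic routing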


You wouldn't need to worry about libraries breaking OR performance if you replaced Node.js with ASP.NET Core :D

Sometimes choosing worse performance is not a tradeoff in favor of ease of use; sometimes the technology just has a strict performance ceiling or a critical deficiency that cannot be addressed.


I am a bit disappointed. I thought this was about the Linux perf tool and would make the point that better tools are required, while providing some suggestions.

All I got was a piece that more or less states that generic benchmarks are not as useful as one might expect and that other stuff is important too. Which, TBH, is not really that surprising...


Perf is very powerful but indeed can be hard to get good information out of. Try pprof and read a bunch of Brendan Gregg’s articles about perf (and bpftrace).


I built my own database system and wanted to benchmark its performance against other popular databases (Postgres, Sqlite, MySQL, SQL Server).

I didn't just measure the time between the API call and the results returning. I measured from the time the user pressed the 'go button' and saw the results displayed on the screen.

I wasn't satisfied until mine was faster in both cases across a wide range of queries.


The same applies to the compression space. You look at LZ4 benchmarks decompressing at 4 GB/s, only to realize that in the real world you're not decompressing a _single_ hot block in a loop, but rather a whole file consisting of thousands of small blocks. Fetching, iterating over, and processing these blocks shaves off more than half the decompression speed, and you end up with 1.5 GB/s.
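
The arithmetic is easy to reproduce with plausible (invented) numbers: decoding one 64 KiB block at 4 GB/s takes about 16 µs, so roughly 27 µs of fetch-and-dispatch overhead per block is all it takes to end up at

  \frac{64\ \text{KiB}}{16\ \mu\text{s} + 27\ \mu\text{s}} \approx 1.5\ \text{GB/s}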


There is quite a difference between the performance a database is able to provide in the hands of an expert and what an inexperienced user or developer is able to achieve.

I think this is where we still have a lot of unrealised opportunity.


Yeah, lots of bs in this text. The comparison of Synapse Analytics to BigQuery is totally flawed. Not only is the cost presented in the chart pulled out of thin air, the scales of those solutions don't match. Try doing multiple-petabyte-scale analysis on Synapse and then compare cost and performance to BigQuery. Maybe Databricks could match BQ in this use case.


Do I understand you correctly that you're saying that the author of this post, one of the founding engineers of BigQuery, is being unfair to BigQuery?


I believe they’re saying the opposite.


99% of the BQ use cases I've seen are reporting on a dataset < 100 GiB. 99% of all big data is really just enterprise reporting, and having one tool that can work for nearly all reporting needs. Querying a petabyte just isn't needed these days for nearly all SQL needs, and for the 1% of the time it is, teams often prefer Spark.


I think you're missing the point of the article. The article is arguing that these kinds of performance benchmarks/graphs as a whole are not the most important thing when it comes to database success. I think the points made here would stand if you shuffled all the names around in the images, or made up new names and data. There's even a graph later on that just compares "our system" (very fast) to "their system" (10x slower).

I'm working on a database system right now, and my favorite paragraph was this one:

> Performance must be measured from the user’s perspective, not the database’s. It is a UX problem and, like any UX problem, can’t really be described in a single number. This is surprising to many people, since they think performance, like car racing, is an objective thing. Just because you can say that a Lamborghini is faster than a Prius, they believe you should also be able to say that My database is faster than Your database. But just like a Lamborghini might not get me to work any faster than a Prius (or a bicycle, if there is traffic), the actual workload for a database is going to determine which one is faster.


I also work in databases. The thing is, the database engine isn't usually slow by itself; it's the programmers whose programs are just as lame as they are. For example, during one of my "help us with the slow database server" consultancy assignments, I discovered a query executed over 1,000 times per second on average, adding considerable load to an already overloaded RDBMS, querying a single-row table where the company data (name and address) was stored, which had not changed in over 10 years. No caching in the app whatsoever: the clients _had to_ ask for this info afresh every so often. Eliminating this and a few similar patterns reduced overall CPU consumption by some 50% and sped up the damn system a lot. I could write a book about this kind of bs I see every day.
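
For that particular pattern the fix is a handful of lines (a sketch assuming a DB-API connection; the TTL is arbitrary, since data that hasn't changed in a decade can tolerate staleness):

  import time

  _cache = {}

  def cached_query(conn, sql, ttl=3600):
      # serve repeated identical queries from memory; only hit
      # the database when the cached result is older than `ttl`
      hit = _cache.get(sql)
      if hit and time.time() - hit[0] < ttl:
          return hit[1]
      cur = conn.cursor()
      cur.execute(sql)
      rows = cur.fetchall()
      _cache[sql] = (time.time(), rows)
      return rows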

So, comparing BigQuery to some small data warehouse product, on a workload where BigQuery hasn't even gotten off the ground yet, is what I call bs and put in the same drawer as the above. If the intention was to say "product x" vs "product y", that is what should have been done. But the author deliberately put the real product names in a lame and dishonest light, which makes the whole thing not credible.


You seem very hung up on a particular graph, and don’t seem to have much to say about the text which follows it and has a substantially different takeaway than the graph alone seems to have impressed on you. Maybe give the article another read with a little bit more of an open mind.


It seems clear that you haven’t actually read the rest of the piece (or are purposefully ignoring it). That graph is used as an example of why benchmarks don’t matter.


Write that book.


Perf... ection? Perf... oration? Perf... ume? Ah, they mean performance!

(Ok, I guess I should be glad they don't use "p9e" instead)


There is also the measurement tool called "Perf" for Linux. I thought it was going to be about performance tooling.



