Hacker News | past | comments | ask | show | jobs | submit | fiddlerwoaroof's comments

This might be programmer-brain, but I find sqlite is pretty nice for things people would use a spreadsheet for. It’s a little bit higher friction, but when I started designing a Improv-like terminal spreadsheet a while ago, I eventually realized I was just reinventing databases.
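For what it's worth, the kind of thing I mean looks roughly like this (a toy sketch using Python's sqlite3; the table and numbers are made up):

```python
import sqlite3

# A throwaway "spreadsheet": one table, with formulas replaced by queries.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE expenses (item TEXT, qty INTEGER, unit_price REAL)")
con.executemany(
    "INSERT INTO expenses VALUES (?, ?, ?)",
    [("coffee", 2, 4.50), ("paper", 10, 0.99), ("stapler", 1, 12.00)],
)

# SUM() over a derived column stands in for the usual spreadsheet formula.
total = con.execute("SELECT SUM(qty * unit_price) FROM expenses").fetchone()[0]
print(f"total: {total:.2f}")  # 2*4.50 + 10*0.99 + 1*12.00 = 30.90
```

Higher friction than a spreadsheet, but the "formulas" are ordinary queries and the data survives in one file.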

The Cayenne would not be safer going 35 instead of 40 "regardless of all other variables": it's statistically safer to go closer to the flow of traffic because you're then "at rest" with respect to other drivers (assuming a controlled access road without pedestrian traffic). If the speed limit is 55 and the flow of traffic is 70–80 (as is the case with the Beltway around DC, despite automated enforcement), then going 55 is more dangerous than "speeding". The issue with 100% enforcement is every law assumes certain circumstances or variables and the real world is infinitely more complex than any set of variables that can reasonably be foreseen by law (and laws that attempt to foresee as many variables as possible are more complicated and, consequently, harder for normal people to apply, which is another reason for latitude in enforcement).


safer for whom? Remember cars are not the only ones participating in traffic.


“assuming a controlled access road without pedestrian traffic”


Such roads barely need speed limits. In some places they don't have them.


The problem with your DC beltway example wouldn’t be automated enforcement then, but with the speed limit itself.

A road without pedestrians or intersections should have a speed limit that reflects the reality of its use (70–80).


Yeah, mine too, which I find really annoying


Yeah, I had to ask it to stop doing that, as well as chaining commands with && that it could split. I got tired of having to manually grant permissions all the time (or leaving it to churn, only to come back after a while to see it had asked for permissions very early into the task)


Normalization is possible but not practical in a lot of cases: nearly every “legacy” database I’ve seen has at least one table that just accumulates columns because that was the quickest way to ship something.

Also, normalization solves a problem that’s present in OLTP applications: OLAP/Big Data applications generally have problems that are solved by denormalization.
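A toy illustration of the contrast (sqlite3 in Python; all names made up): the normalized pair of tables suits row-at-a-time OLTP updates, while the denormalized wide copy suits analytical scans.

```python
import sqlite3

con = sqlite3.connect(":memory:")

# OLTP style: normalized, one fact per table, updates touch one row.
con.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY,
                     customer_id INTEGER REFERENCES customers(id),
                     total REAL);
""")

# OLAP style: the same data denormalized into one wide table,
# so analytical scans need no joins.
con.execute("CREATE TABLE orders_flat (order_id INTEGER, customer_name TEXT, total REAL)")

con.execute("INSERT INTO customers VALUES (1, 'acme')")
con.execute("INSERT INTO orders VALUES (10, 1, 99.0)")
con.execute("INSERT INTO orders_flat VALUES (10, 'acme', 99.0)")

# Normalized read needs a join; the denormalized read is a single scan.
joined = con.execute("""
    SELECT c.name, o.total FROM orders o JOIN customers c ON c.id = o.customer_id
""").fetchone()
flat = con.execute("SELECT customer_name, total FROM orders_flat").fetchone()
assert joined == flat == ("acme", 99.0)
```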


Yep, this comment sums it up well.

We have many large enterprises from wildly different domains using Feldera, and from what I can tell there is no correlation between the domain and the number of columns. As fiddlerwoaroof says, it seems to be more a function of how mature/big the company is and how much time it has had to 'accumulate things' in its data model. And there might be very good reasons they designed things the way they did; it's very hard to question it without being a domain expert in their field. I wouldn't dare :).


> I can tell there is no correlation between the domain and the amount of columns.

This is unbelievable. In purely architectural terms that would require your database design to be an amorphous big ball of everything, with no discernible design or modelling involved. This is completely unrealistic. Are queries done at random?

In practical terms, your assertion is irrelevant. Look at the sparse columns. Figure out those with sparse rows. Then move half of the columns to a new table and keep the other half in the original table. Congratulations, you just cut down your column count by half, and sped up your queries.

Even better: discover how your data is being used. Look at queries and check what fields are used in each case. Odds are, that's your table right there.

Let's face it. There is absolutely no technical or architectural reason to reach this point. This problem is really not about structs.
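A sketch of that split, with made-up table and column names (sqlite3 in Python): move the sparsely populated columns into a side table that only has rows where there is actual data.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# A "wide" table where the last columns are rarely populated.
con.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, fax TEXT, pager TEXT)")
con.executemany(
    "INSERT INTO users VALUES (?, ?, ?, ?)",
    [(1, "ann", None, None), (2, "bob", "555-1", None), (3, "cid", None, None)],
)

# Move the sparse columns to a side table keyed by the same id;
# only rows that actually have data get a row there.
con.executescript("""
CREATE TABLE user_contact_extras (
    user_id INTEGER PRIMARY KEY REFERENCES users(id),
    fax TEXT, pager TEXT);
INSERT INTO user_contact_extras
    SELECT id, fax, pager FROM users
    WHERE fax IS NOT NULL OR pager IS NOT NULL;
""")
# (In a real migration you would then drop users.fax / users.pager.)
n = con.execute("SELECT COUNT(*) FROM user_contact_extras").fetchone()[0]
print(n)  # 1 -- only bob had any sparse data
```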


The Feldera folks speak from lived experience when they say 100+ column tables are common in their customer base. They speak from lived experience when they say there's no correlation in their customer base.

Feldera provides a service. They did not design these schemas. Their customers did, and probably over such long time periods that those schemas cannot be referred to as designed anymore -- they just happened.

IIUC Feldera works in OLAP primarily, where I have no trouble believing these schemas are common. At my $JOB they are, because it works well for the type of data we process. Some OLAP DBs might not even support JOINs.

Feldera folks are simply reporting on their experience, and people are saying they're... wrong?


Haha, looks like it.

I remember the first time I encountered this thing called TPC-H back when I was a student. I thought "wow surely SQL can't get more complicated than that".

Turns out I was very wrong about that. So it's all about perspective.

We wrote another blog post about this topic a while ago; I find it much more impressive because this is about the actual queries some people are running: https://www.feldera.com/blog/can-your-incremental-compute-en...


> Normalization is possible but not practical in a lot of cases: nearly every “legacy” database I’ve seen has at least one table that just accumulates columns because that was the quickest way to ship something.

Strong disagree. I'll explain.

Your argument would support the idea of adding a few columns to a table to get to a short time to market. That's ok.

Your comment does not come close to justify why you would keep the columns in. Not the slightest.

Tables with many columns create all sorts of problems and inefficiencies. Over-fetching is a problem all on its own. Even the code gets brittle, where every single tweak risks being a major regression.

Creating a new table is not hard. Add a foreign key, add the columns, do a standard parallel write migration. Done. How on earth is this not practical?


I’m not justifying the design but splitting a table with several billion rows is not a trivial task, especially when ORMs and such are involved. Additionally, it’s easier to get work scheduled to ship a feature than it is to convince the relevant players to complete the swing.


> I’m not justifying the design but splitting a table with several billion rows is not a trivial task, especially when ORMs and such are involved.

I don't agree. Let me walk you through the process.

- create the new table
- follow a basic parallel-writes strategy:
  - update your database consumers to write to the new table without reading from it
  - run a batch job to populate the new table with data from the old table
  - update your database consumers to read from the new table while writing to both old and new tables

From this point onward, just pick a convenient moment to stop writing to the old table and call the migration done. Do post-migration cleanup tasks.
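The phases could be sketched roughly like this (hypothetical Python; UserRepo, users_old/users_new, and the db interface are all made up for illustration, not any real library):

```python
# Phases of a parallel-writes migration, as described above.
PHASES = ["write_old", "dual_write", "read_new", "old_retired"]

class UserRepo:
    def __init__(self, db, phase="write_old"):
        # `db` is a stand-in for whatever data-access layer the app uses.
        self.db, self.phase = db, phase

    def save(self, row):
        # The old table is written until the migration is declared done;
        # the new table is written from the dual-write phase onward.
        if self.phase != "old_retired":
            self.db.write("users_old", row)
        if self.phase != "write_old":
            self.db.write("users_new", row)

    def load(self, key):
        # Reads flip to the new table once the backfill has caught up.
        table = "users_new" if self.phase in ("read_new", "old_retired") else "users_old"
        return self.db.read(table, key)
```

Advancing the phase flag is the whole migration; the batch backfill runs while the repo sits in `dual_write`.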

> Additionally, it’s easier to get work scheduled to ship a feature than it is to convince the relevant players to complete the swing.

The ease of piling up technical debt is not a justification to keep broken systems and designs. It's only ok to make a mess to deliver things because you're expected to clean up after yourself afterwards.


I've done this sort of thing or worked with people doing it. The concept is simple, actually executing can take months.


There are sometimes reasons this is harder in practice, for example let’s say the business or even third parties have access to this db directly and have hundreds of separate apps/services relying on this db (also an anti-pattern of course but not uncommon), that makes changing the db significantly harder.

Mistakes made early on and not corrected can snowball and lead to this kind of mess, which is very hard to back out of.


> How on earth is this not practical?

Fine, but you still need to read in those 100+ fields. So now you gotta contend with 20+ joins just to pull in one record. Not more practical than a single SELECT in my opinion.


You don't need to join what you don't actually need. You also need to be careful writing your queries, not just the schema. The most common ones should be wrapped in views or functions to avoid the problem of everyone rolling their own later.

Performance generally isn't an issue for an arbitrary number of joins as long as your indices are set up correctly.

If you really do need a bulk read like that I think you want json columns, or to just go all in with a nosql database. Even then, the above regarding indexing is still true.


> the effort and cost to download an ad-blocker that automatically removes the prompt to accept/deny entirely is practically zero

It's only zero if you don't need to interact with sites that break when you're running an adblocker. I run an ad-blocker nearly continuously, but there are all sorts of sites where I have to disable it in order to use the actual functionality of the site (and these are frequently sites I _have_ to interact with).


Yeah, there's something of a tension between the Perlis quote "It is better to have 100 functions operate on one data structure than 10 functions on 10 data structures" and Parse, don't validate.

The way I've thought about it, though, is that it's possible to design a program well either by encoding your important invariants in your types or in your functions (especially simple functions). In dynamically typed languages like Clojure, my experience is that there's a set of design practices that have a lot of the same effects as "Parse, Don't Validate" without statically enforced types. And, ultimately, it's a question of mindset which style you prefer.
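A small sketch of the "Parse, Don't Validate" side, assuming Python with a made-up Port type: once parsing succeeds, the invalid states are unrepresentable downstream.

```python
from dataclasses import dataclass

# "Parse, don't validate": instead of checking a raw string everywhere,
# parse it once into a type that can only hold valid values.
@dataclass(frozen=True)
class Port:
    value: int

    def __post_init__(self):
        if not (0 < self.value < 65536):
            raise ValueError(f"not a port: {self.value}")

def parse_port(raw: str) -> Port:
    # Downstream code takes a Port, never a str.
    return Port(int(raw))

port = parse_port("8080")
print(port.value)  # 8080
```

In a dynamic language the same effect comes from discipline at the boundaries: everything past the parser trusts the shape of the data.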


There's probably a case for both. Core logic might benefit from hard types deep in the bowels of an unchanging engine.

The real world often changes though, and more often than not the code has to adapt, regardless of how elegantly our systems are designed.


Coalton ( https://coalton-lang.github.io ) is the sort of thing I like: a Haskell-style language hosted inside a very dynamic one with good interop.


Yes it's quite the blend!


Do those design practices protect you when you apply a refactor and now you don't know which call sites may be broken now?


Yes


I'm not sure why "954 partners" is surprising: log10(954) is between 2 and 3, so if you assume SoundCloud uses at least 10 SaaS products to manage data (AWS, Snowflake, Datadog, etc.; this is definitely a low estimate) and each of those entities processes the data through 10 partners of various kinds, it only takes 3 steps out to get to ~1,000.
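The back-of-the-envelope math, spelled out (a trivial Python sketch; the 10x fan-out per step is the assumption from the comment, not SoundCloud's actual numbers):

```python
import math

partners = 954
print(math.log10(partners))  # ~2.98, i.e. between 2 and 3

# Fan-out estimate: ~10 direct processors, each with ~10 partners,
# each of those with ~10 more -- three steps of roughly 10x.
estimate = 10 * 10 * 10
print(estimate)  # 1000 -- same order of magnitude as 954
```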


Over time I evolved to Debian testing for the base system and Nix for getting precise versions of tools, which worked fairly well. But I just converted my last Debian box to NixOS


I've been using Debian testing on my daily-driver desktop(s) for the last, checks notes, 20 years now?

Servers and headless boxes use stable, and all machines are updated regularly. Most importantly, stable-to-stable (i.e. 12 to 13) upgrades take around 5 minutes incl. the final reboot.

I reinstalled Debian once. I had to migrate my system to 64 bit, and there was no clear way to move from 32 to 64 bit at that time. Well, once in 20 years is not bad, if you ask me.


I've had a couple of outages due to major version upgrades: the worst was the one that introduced systemd, but I don't think I've ever irreparably lost a box. The main reasons I like NixOS now are:

1) Nix means I have to install a lot fewer packages globally, which prevents accidentally using the wrong version of a package in a project.

2) I like having a version-controlled record of what my systems look like (and I actually like the Nix language)


I prefer to isolate my development environment in various ways already (virtualenv, containers, or VMs depending on the project), so I don't need that part of NixOS. My systems already run on a well-curated set of software. Two decades allowed me to fine-tune that aspect pretty well.

While I understand the gravitas of NixOS, that modus operandi just is not for me. I'm happy and fine with my traditional way.

However, as I said, I understand and respect those who use NixOS. I just don't share the same perspective and ideas. Hope it never breaks on you.


I think you could mitigate some of the problems by making the drug company pay for the treatment before approval.


Genius did something like this to prove that Google was stealing lyrics from them: https://www.pcmag.com/news/genius-we-caught-google-red-hande...?

