More

jamesblonde · 2024-11-03T15:21:54 1730647314

You're basically describing the Lakehouse Tables architecture. Store your data as tabular data in Iceberg/Hudi/Delta on S3. Save a bucket on storage. Query with whatever engine you like (Snowflake, Redshift, BQ, DuckDB, etc).

aoeusnth1 · 2024-11-03T15:59:23 1730649563

Yes, this is the vast majority of my data work at Google as well. Spanner + Files on disk (Placer) + distributed query engine (F1) which can read anything and everything (even google sheets) and join it all.

It’s amazingly productive and incredibly cheap to operate.

jamesblonde · 2024-10-29T07:50:48 1730188248

You don't back up your claim with any evidence: "Second, the Gulf Stream is a western boundary current driven by physical forces like gyre circulation that are due to the rotation of the Earth and midlatitude coriolis effects, it won't be shut down under warming."

The experts are not saying what you are claiming here. You provide links which are not directly related to AMOC collapse. I call cherry picking.

photochemsyn · 2024-10-29T16:02:29 1730217749

It's just basic oceanographic physics, that's how midlatitude gyres work. Since you ask:

> "The trade winds create the equatorial currents that flow east to west along the equator; the North Equatorial and South Equatorial currents. If there were no continents, these surface currents would travel all the way around the Earth, parallel to the equator. However, the presence of the continents prevents this unimpeded flow. When these equatorial currents reach the continents, they are diverted and deflected away from the equator by the Coriolis Effect; deflection to the right in the Northern Hemisphere and to the left in the Southern Hemisphere. These currents then become western boundary currents; currents that run along the western side of the ocean basin (i.e. the east coasts of the continents). Since these currents come from the equator, they are warm water currents, bringing warm water to the higher latitudes and distributing heat throughout the ocean."

https://rwu.pressbooks.pub/webboceanography/chapter/9-1-surf...

The issue is that increased poleward Gulf Stream heat transport could easily overpower the effect of Greenland ice melt outflows into the North Atlantic. This raises other problems, it could still decrease deep ocean ventilation and push the increasing oceanic hypoxia trend.

jamesblonde · 2024-10-27T07:55:46 1730015746

'We don't really consider it low probability anymore'

https://www.livescience.com/planet-earth/rivers-oceans/we-do...

jamesblonde · 2024-10-16T19:10:01 1729105801

Patrick Collison (Stripe) credits the Transition Year in Ireland for his computer interest

" Ireland actually has this interesting thing called “transition year,” this year between two major exams of high school or at least Ireland’s high school equivalent.Transition year is a formally designated year that’s optional, where you can go and pursue things that you might not otherwise naturally tend to pursue, and the school tends to be much more permissive of going and spending three months abroad or going and doing some work experience in this area or whatever the case may be. And so, in that year, I basically decided to spend as much of it as possible programming, and so I did that.”"

https://networkcapital.beehiiv.com/p/stripe-ceo-patrick-coll...

jamesblonde · 2024-10-11T12:46:12 1728650772

Similar experience here. We got burnt betting on ROCm being released in consumer GPUs a few years ago, but it never happened. I think you have to win the consumer market to get the Enterprise market, not the other way around.

dhruvdh · 2024-10-11T13:00:31 1728651631

And yet Meta is using MI300X exclusively for all live inference on Llama 405B.

Clearly there are workloads AMD wins at, and just going Nvidia by default for everything without considering AMD is suboptimal.

samspenc · 2024-10-11T19:16:42 1728674202

The difference is that Meta and the FAANG companies make hundreds of billions of dollars in annual revenue, and are capable of hiring top talent to solve this problem of their AI running well on any GPU they choose for their data center.

Consumers, open-source solutions and smaller companies unfortunately can't afford this, so they would be dependent fully on AMD and other providers to solve this implementation gap; so ironically smaller companies may prefer to use Nvidia just so they don't have to worry about odd GPU driver issues for their workloads.

ebalit · 2024-10-12T23:26:11 1728775571

But Meta is the main company behind Pytorch development. If they make it work and upstream it, this will cascade to all Pytorch users.

We don't have to imagine far, it's slowly happening. Pytorch for ROCm is getting better and better!

Then they will have to fix the split between data-center and consumer GPU for sure. From what I understand, this is on the roadmap with the convergence of both GPU lines on the UDNA architecture.

ErikBjare · 2024-10-13T09:32:03 1728811923

If Meta/FAANG can make it work for them, it's not unreasonable to assume those improvements will trickle down to consumers/smaller companies.

jamesblonde · 2024-10-11T12:33:16 1728649996

The price is good compared to Nvidia H100 or B100 at around $15k.

jamesblonde · 2024-09-29T18:49:58 1727635798

It's based on Z-Sets - a generalization of relational algebra. Many of the aggregations, projections, filters from SQL are associative and can be implemented in Z-Sets. Z-Sets supports incremental operations (adding one value to a set while computing the 'max' is just the max of the two arguments - rather than requiring recomputing the 'max' over the entire set.

nikhilsimha · 2024-09-29T19:06:06 1727636766

dumb question: how do z-sets or feldera deal with updates to values that were incorporated into the max already?

For example - max over {4, 5} is 5. Now I update the 5 to a 3, so the set becomes {4, 3} with a max of 4. This seems to imply that the z-sets would need to store ALL the values - again, in their internal state.

Also there needs to be some logic somewhere that says that the data structure for updating values in a max aggregation needs to be a heap. Is that all happening somewhere?

lsuresh · 2024-09-29T19:44:10 1727639050

We use monotonicity detection for various things. I believe (can double check) that it's used for max as well. But you're correct that in the general case, max is non-linear, so will need to maintain state.

Update from Leonid on current implementation: each group is ordered by the column on which we compute max, so it's O(1) to pick the last value from the index.

nikhilsimha · 2024-09-30T16:12:09 1727712729

So the writes are O(N) then - to keeps reads at O(1)?

ryzhyk · 2024-09-30T16:20:17 1727713217

Both reads and writes are O(1) in time complexity. Writes additionally have the log(N) amortized cost of maintaining the LSM tree.

nikhilsimha · 2024-09-30T16:50:11 1727715011

gotcha! thanks for the clarification

shuaiboi · 2024-09-29T19:29:09 1727638149

Just a guess... wouldl like to hear the answer as well.

they probably have a monotonicity detector somewhere, which can decide whether to keep all the values or discard them. If they keep them, they probably use something like a segment tree to index.

ryzhyk · 2024-09-29T22:26:35 1727648795

That's right, we perform static dataflow analysis to determine what data can get discarded. GC itself is done lazily as part of LSM tree maintenance. For MAX specifically, we don't have this optimization yet. In the general case, incrementally maintaining the MAX aggregate in the presence of insertions and deletions requires tracking the entire contents of the group, which is what we do. If the collection can be proved to be append-only, then it's sufficient to store only the current max element. This optimization is yet coming to Feldera.

lsuresh · 2024-09-29T19:45:04 1727639104

Yes, we do a lot of work with monotonicity detection. It's central to we perform automatic garbage collection based on lateness.

nextaccountic · 2024-09-30T05:54:51 1727675691

> Z-Sets - a generalization of relational algebra

Does this have a paper or some material describing the theory?

lsuresh · 2024-09-30T13:39:35 1727703575

This is our paper describing the theory underlying Feldera: https://www.vldb.org/pvldb/vol16/p1601-budiu.pdf

jamesblonde · 2024-09-18T17:35:45 1726680945

For MLOps platforms, Papermill is the one of the reasons why we don't include experiment tracking out of the box any longer in Hopsworks. You can easily see the results of training runs as notebooks - including loss curves, etc. Any models that completed get registered in the model registry along with plots, a model card, and model evaluation/validation metrics.

jamesblonde · 2024-09-18T17:32:18 1726680738

I agree completely with this. Papermill output is a notebook - that is the log file. You can double click on it, it opens in 1-2 seconds and you can see visually how far your notebook progressed and any plots you added for debugging.

jamesblonde · 2024-09-09T20:47:44 1725914864

If your competitor does the same thing as you - but cheaper or better. Then buy them or bury them or buy them, then bury them.