
Are you saying they are like compact memoizers? What Stable Diffusion can fit into that model is amazing.


Certainly they retain not just information but compute capacity in a way that other expensive transformations don’t. I’m hard pressed to think of another example where compute spend now can be banked and used to reduce compute requirements later. Rainbow tables maybe? But they’re much less general purpose.
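A toy sketch of that "spend compute now, look it up later" idea in Python (the expensive function, table size, and hash-round count here are made up purely for illustration, in the spirit of a rainbow table):

    import hashlib

    def expensive(x: int) -> str:
        # Stand-in for a costly transformation: many rounds of hashing
        # just to burn CPU up front.
        h = str(x).encode()
        for _ in range(10_000):
            h = hashlib.sha256(h).digest()
        return h.hex()

    # "Training time": spend the compute once and bank the results.
    BANK = {x: expensive(x) for x in range(256)}

    # "Inference time": answering a query is now a dictionary lookup,
    # amortizing the earlier compute across every future call.
    def cheap(x: int) -> str:
        return BANK[x]

    print(cheap(42) == expensive(42))  # True, but the left side is ~instant

Like a rainbow table, though, the bank only covers inputs you precomputed; the interesting thing about trained models is how far they generalize beyond that.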


HashLife seems like a scale free memoizer, https://en.wikipedia.org/wiki/Hashlife
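A toy sketch of the memoization at the heart of HashLife (not the real algorithm, which adds quadtrees, hash-consing, and exponential time jumps):

    from functools import lru_cache

    def next_cell(tile, r, c):
        """Conway's rule for one interior cell of a tuple-of-tuples tile."""
        n = sum(tile[r + dr][c + dc]
                for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                if (dr, dc) != (0, 0))
        return 1 if n == 3 or (tile[r][c] and n == 2) else 0

    @lru_cache(maxsize=None)
    def centre_after_one_step(tile):
        """Advance a 4x4 tile one generation and return its 2x2 centre.
        Because the argument is hashable, identical tiles seen anywhere on
        the board (or at any later generation) hit the cache instead of
        being re-simulated."""
        return tuple(tuple(next_cell(tile, r, c) for c in (1, 2))
                     for r in (1, 2))

    # Example: a 4x4 tile holding a horizontal blinker; repeats are free.
    tile = ((0, 0, 0, 0),
            (0, 1, 1, 1),
            (0, 0, 0, 0),
            (0, 0, 0, 0))
    print(centre_after_one_step(tile))  # ((0, 1), (0, 1))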

How Well Can DeepMind's AI Learn Physics? https://www.youtube.com/watch?v=2Bw5f4vYL98 https://arxiv.org/abs/2002.09405 https://sites.google.com/corp/view/learning-to-simulate/home

Discovering Symbolic Models from Deep Learning (Physics) https://www.youtube.com/watch?v=HKJB0Bjo6tQ

Scientific Machine Learning: Physics-Informed Neural Networks with Craig Gin https://www.youtube.com/watch?v=RTPo6KgpvBA

Steve Brunton's channel is even more mind blowing than Two Minute Papers, https://www.youtube.com/@Eigensteve

Not only can we bank computation and speed up physical simulations by 100x, but I also saw some work on being able to design outcomes in GoL (Game of Life).

There was a paper on using a NN to build or predict arbitrary patterns in GoL, but I can't find it right now.


I don't know about NN prediction, but apparently you can bootstrap anything* with 15 strategically placed gliders.

https://btm.qva.mybluehost.me/building-arbitrary-life-patter...


It would be interesting to see an analysis of this. I see your point - otoh is there a reason to believe that more computation is being "banked" than say matrix inversion, or other optimizations that aren't gradient descent based?

The large datasets involved let us usefully (for some value of useful) bank lots of compute, but it's not obvious to me that it's done particularly efficiently compared to other things you might precompute.
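For comparison, a sketch of the matrix case: factor once, then every later solve reuses the banked work (assumes NumPy/SciPy; the sizes are arbitrary):

    import numpy as np
    from scipy.linalg import lu_factor, lu_solve

    rng = np.random.default_rng(0)
    n = 2000
    A = rng.standard_normal((n, n))

    # Bank the expensive part: one O(n^3) LU factorization.
    lu, piv = lu_factor(A)

    # Later queries only pay O(n^2) per right-hand side.
    for _ in range(10):
        b = rng.standard_normal(n)
        x = lu_solve((lu, piv), b)
        assert np.allclose(A @ x, b)

The open question is whether gradient-descent training banks compute as efficiently, per FLOP spent up front, as classical precomputation like this.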

Training to convergence is often quite inefficient because the weight updates decay toward zero, so most epochs have a very small individual effect. I think for e.g. Stable Diffusion they don't train anywhere near convergence, so the weight updates have a bigger average effect. Not sure if that applies to LLMs.
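A toy illustration of that decay (plain gradient descent on a made-up least-squares problem, nothing like an actual diffusion model): the per-epoch update norm shrinks toward zero, so late epochs bank very little extra change.

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.standard_normal((256, 8))
    w_true = rng.standard_normal(8)
    y = X @ w_true

    w = np.zeros(8)
    lr = 0.1
    for epoch in range(1, 201):
        grad = X.T @ (X @ w - y) / len(y)
        update = -lr * grad
        w += update
        if epoch % 50 == 0:
            # Near convergence the update norm decays toward zero.
            print(epoch, np.linalg.norm(update))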


When SD1.4 dropped, someone here described how those models are a form of lossy compression.


Back when wavelet compression was still being developed, there was a joke that the best compression algorithm is "give an image to a grad student and tell them to figure out the best transform".


That was a specific fitting/optimization step and not the whole algorithm.

Fractal Image Compression, https://en.wikipedia.org/wiki/Michael_Barnsley https://www.abebooks.com/servlet/BookDetailsPL?bi=3131987970...



