Hacker News | new | past | comments | ask | show | jobs | submit | fooker's comments | login

> my graduating classmates refused to work at companies that did let their systems be used for war

Holy mother of bubbles. No, for several decades it was common for L3Harris, Lockheed Martin, etc. to scoop up half the geeks from most graduating classes.


This seems like the kind of metric that 3 users with 15 year old machines can skew significantly.

Has to be normalized, and outliers eliminated in some consistent manner.


I'm pretty sure I saw them present on exactly this at FOSDEM?


It’s not misleading. This is how research works.

LLMs are really good at the ‘re’ in research.


That seems somewhat reasonable.

Storage should be cheaper, complain about Apple making you pay a premium.


The point of hiring junior devs is that they slowly evolve into senior devs, managers, and founders.

We were just pretending otherwise, now it is explicit.

I bet the number of successful junior devs is going to keep going up, while the number of people coasting slowly tends to zero.


Fun fact - single bit neural networks are decision trees.

In theory, this means you can 'compile' most neural networks into chains of if-else statements, but it's not well understood when this sort of approach works well.
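As a toy sketch of that claim (not from the thread, names made up): a neuron with one-bit weights and a sign-style activation computes a Boolean function of its inputs, and that function can be unrolled into nested if-else branches that each test a single input, exactly like a decision tree:

```python
# Toy example: a 3-input neuron with 1-bit weights (+1, -1, +1) and a
# "fire when sum > 0" activation...

def neuron(x1, x2, x3):
    # pre-activation: x1 - x2 + x3, using weights (+1, -1, +1)
    s = (1 if x1 else 0) - (1 if x2 else 0) + (1 if x3 else 0)
    return s > 0

# ...and the same function "compiled" into an if-else chain that inspects
# one input per branch, i.e. a decision tree.
def neuron_as_tree(x1, x2, x3):
    if x1:
        if x2:
            return x3    # sum is x3: fires iff x3 is set
        return True      # sum is at least 1
    if x2:
        return False     # sum is at most 0
    return x3            # sum is x3 again
```

The two functions agree on all eight input combinations; the open question the comment alludes to is when this blow-up into branches stays small enough to be useful.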


> single bit neural networks are decision trees.

I didn't exactly understand what was meant here, so I went out and read a little. There is an interesting paper called "Neural Networks are Decision Trees" [1]. Thing is, this does not imply a nice mapping of neural networks onto decision trees. The trees that correspond to the neural networks are huge. And I get the idea that the paper is stretching the concept of decision trees a bit.

Also, I still don't know exactly what you mean, so would you care to elaborate a bit? :)

[1] https://arxiv.org/pdf/2210.05189


Closest thing I found was:

Single Bit Neural Nets Did Not Work - https://fpga.mit.edu/videos/2023/team04/report.pdf

> We originally planned to make and train a neural network with single bit activations, weights, and gradients, but unfortunately the neural network did not train very well. We were left with a peculiar looking CPU that we tried adapting to mine bitcoin and run Brainfuck.


> I still don't know exactly what you mean

Straightforward quantization, just to one bit instead of 8 or 16 or 32. Training a one-bit neural network from scratch is apparently an unsolved problem, though.
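A minimal sketch of what such one-bit quantization might look like, under an assumed scheme (keep only the sign of each trained weight, plus one per-layer scale so the weighted sums stay in range; similar in spirit to BinaryConnect-style methods):

```python
# Assumed post-training quantization scheme: each weight is reduced to its
# sign (the single stored bit), and a shared scale preserves the magnitude.

def quantize_1bit(weights):
    # scale = mean absolute value of the original float weights
    scale = sum(abs(w) for w in weights) / len(weights)
    bits = [1 if w >= 0 else -1 for w in weights]
    return scale, bits

def dot_quantized(scale, bits, xs):
    # the dot product degenerates into additions/subtractions times one scale
    return scale * sum(b * x for b, x in zip(bits, xs))
```

The point of the exercise: inference needs no multiplications per weight, but nothing here helps with *training* from scratch, which is what the comment calls unsolved.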

> The trees that correspond to the neural networks are huge.

Yes, if the task is inherently 'fuzzy'. Many neural networks are effectively large decision trees in disguise and those are the ones which have potential with this kind of approach.


> Training a one bit neural network from scratch is apparently an unsolved problem though.

It was until recently, but there is a new method which trains them directly without any floating point math, using "Boolean variation" instead of Newton/Leibniz differentiation:

https://proceedings.neurips.cc/paper_files/paper/2024/hash/7...


Nice!

Unfortunately the paper seems to have been mostly overlooked. It has only a few citations. I think one practical issue is that existing training hardware is optimized for floating-point operations.

>Many neural networks are effectively large decision trees in disguise and those are the ones which have potential with this kind of approach.

I don't see how that is true. A decision tree looks at one parameter at a time and can split into multiple branches (i.e., more than two branches are possible). Single input -> discrete multi-valued output.

Neural networks do the exact opposite. A neuron takes multiple inputs and computes a weighted sum, which is then fed into an activation function. That activation function produces a scalar value where low values mean inactive and high values mean active. Multiple inputs -> continuous scalar output.

Quantization doesn't change anything about this. If you have a 1 bit parameter, that parameter doesn't perform any splitting, it merely decides whether a given parameter is used in the weighted sum or not. The weighted sum would still be performed with 16 bit or 8 bit activations.
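The two computation patterns being contrasted can be sketched like this (illustrative names, not from the thread):

```python
# A decision-tree node inspects ONE feature and branches on it.
def tree_step(x, feature_index, threshold):
    return "left" if x[feature_index] <= threshold else "right"

# A neuron combines ALL features into one weighted sum, then thresholds it.
def neuron_step(x, weights, bias):
    s = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if s > 0 else 0
```

Quantizing `weights` to one bit changes what values they can take, but `neuron_step` still mixes every input into a single sum, which is the structural difference being argued here.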

I'm honestly tired of these terrible analogies that don't explain anything.


> I'm honestly tired of these terrible analogies that don't explain anything.

Well, step one should be trying to understand something instead of complaining :)

> Single input -> discrete multi valued output.

A single node in a decision tree is single input. The decision tree as a whole is not. Suppose you have a 28x28 image, each 'pixel' being eight bits wide. Your decision tree can query 28x28x8 possible inputs as a whole.

> A neural network neuron takes multiple inputs and calculates a weighted sum, which is then fed into an activation function.

Do not confuse the 'how' with the 'what'.

You can train a neural network that, for example, tells you if the 28x28 image is darker at the top or darker at the bottom or has a dark band in the middle.

Can you think of a way to do this with a decision tree with reasonable accuracy?
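For what it's worth, a sketch of why this example favors the neural view (hypothetical helper, assuming the image is a list of rows of brightness values, where dark means low brightness): a single threshold-style unit with +1 weights on the top half and -1 on the bottom half answers the question in one weighted sum, whereas a tree branching on individual pixels has to approximate a global comparison.

```python
# One linear comparison over all pixels decides "darker at top": this is
# exactly the kind of weighted sum a single neuron computes.
def darker_at_top(image):
    n = len(image)
    top = sum(sum(row) for row in image[:n // 2])
    bottom = sum(sum(row) for row in image[n // 2:])
    return top < bottom  # less total brightness on top = darker on top
```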


> Training a one bit neural network from scratch is apparently an unsolved problem though.

I don't think it's correct to call it unsolved. The established methods are much less efficient than those for "regular" neural nets but they do exist.

Also note that the usual approach when going binary is to make the units stochastic. https://en.wikipedia.org/wiki/Boltzmann_machine#Deep_Boltzma...
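A stochastic binary unit of the kind used in Boltzmann machines can be sketched as follows (toy version: the unit fires with probability sigmoid of its weighted input, so the binary activation is informative in expectation):

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def stochastic_binary_unit(weights, xs, rng=random):
    # fire (output 1) with probability sigmoid(weighted sum of inputs)
    p = sigmoid(sum(w * x for w, x in zip(weights, xs)))
    return 1 if rng.random() < p else 0
```

With a strongly positive net input the unit fires essentially always, and with a strongly negative one essentially never; in between, the randomness is what makes gradient-free or contrastive training of binary units workable.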


Interesting.

By unsolved I guess I meant: this looks like it should be easy and efficient but we don't know how to do it yet.

Usually this means we are missing some important science in the classification/complexity of problems. I don't know what it could be.


Perhaps. It's also possible that the approach simply precludes the use of the best tool for the job. Backprop is quite powerful and it just doesn't work in the face of heavy quantization.

Whereas if you're already using evolution strategies, a genetic algorithm, or something similar, then I don't expect changing the bit width (or pretty much anything else) to make any difference to the overall training efficiency (which is presumably already abysmal outside of a few specific domains, such as RL applied to a sufficiently ambiguous continuous control problem).
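For concreteness, a gradient-free trainer of the sort mentioned here can be as simple as a single-bit-flip hill climber (toy sketch; `loss` is an assumed callable scoring a bit vector, lower is better):

```python
import random

def train_binary(loss, n_bits, steps=2000, seed=0):
    rng = random.Random(seed)
    w = [rng.choice([0, 1]) for _ in range(n_bits)]
    best = loss(w)
    for _ in range(steps):
        i = rng.randrange(n_bits)
        w[i] ^= 1            # flip one randomly chosen bit
        cand = loss(w)
        if cand <= best:     # keep the flip if loss did not get worse
            best = cand
        else:
            w[i] ^= 1        # otherwise revert the flip
    return w, best
```

Note the bit width never appears in the search logic, which is the point: such methods are indifferent to quantization, they are just slow in general.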


Do you know of any software that does this? Or any papers on the matter? It could be a fun weekend project

Made me think of https://github.com/xoreaxeaxeax/movfuscator. Would be definitely cool to see it realized even if it would be incredibly impractical (probably).

I think any quantization approach should work, but I'm not an expert on this.

"It is difficult to get a man to understand something, when his salary depends upon his not understanding it"

It is evil to block your email and hold your photos hostage over it though :)

They only blocked access to Antigravity and GeminiCLI for the offense.

Didn’t they only block Antigravity though, leaving other services available?

That didn't happen though.

I’m amazed at how many people think this happened, despite it not being true.

I would question the judgment of anyone who thought they would maintain "don't be evil" beyond IPO.

Your argument is basically: human beings will always choose money over ethics.

Could be true, but a somewhat depressing worldview.


Unrelated, but want to buy a bridge?

You could recoup your investment in a year by collecting tolls. Expedited financing available on good credit!


Please don’t do this here.
