
I realize it's all just embeddings and probability blah blah blah... But this kind of meta prompting is really interesting to me. Can you ask a model about its weights?


If a model hasn't been explicitly told (via some system prompt or something) about its weights, it won't know them. It would be akin to asking you how many neurons you had. How would you know?


I don't know, but the fact that the model can suggest the most relevant sentence is intriguing to me. I realize it's just looking at the probabilities. Would it be possible to craft adversarial inputs to learn the model's weights? It seems like it should be, and in some sense you're then getting it to output the weights, but you'd almost certainly need to know the model's structure to do that.
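Something like this is roughly what I have in mind (a toy distillation step in PyTorch; query_blackbox is a hypothetical helper that returns the target model's next-token probabilities, and student is a small local model whose architecture you'd have to guess):

    import torch
    import torch.nn.functional as F

    def distillation_step(student, prompt_ids, query_blackbox, optimizer):
        # query_blackbox stands in for the API of the model being copied;
        # it returns a probability distribution over the next token.
        target_probs = query_blackbox(prompt_ids)       # shape: (vocab_size,)
        student_logits = student(prompt_ids)[-1]        # last-position logits
        # Pull the student's next-token distribution toward the black box's.
        loss = F.kl_div(F.log_softmax(student_logits, dim=-1),
                        target_probs, reduction="sum")
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()

Even if the student ends up behaviorally close, its weights generally won't match the original's, which is the "you'd need to know the model's structure" problem.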


It doesn’t have access to its own probabilities in this regard. Instead, the output is encouraged to be a ranking of preferences from the dataset being modeled. It outputs the preferences of the average human writer in its dataset (plus whatever custom changes are left over from instruction fine-tuning).
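To make that concrete, here's roughly what generation looks like from the outside with an open model (a sketch using Hugging Face transformers and GPT-2; the specific model doesn't matter). The probability distribution exists in the runtime, but the only thing fed back into the context is the sampled token, so the model never "sees" its own probabilities as text:

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    ids = tok("The most relevant sentence is", return_tensors="pt").input_ids
    with torch.no_grad():
        for _ in range(20):
            logits = model(ids).logits[0, -1]         # scores for the next token
            probs = torch.softmax(logits, dim=-1)     # the distribution lives out here
            next_id = torch.multinomial(probs, 1)     # only the sampled token survives
            ids = torch.cat([ids, next_id[None]], dim=-1)  # ...and is fed back in
    print(tok.decode(ids[0]))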


This is what confuses me though, people don't write things like: What is the most relevant sentence in this book?

I have a vague understanding of the mechanisms here, but I just don't think I get how it goes from "the most relevant sentence" to an attention vector that "points to" the right place. I would have thought this was beyond what they could do by just completing training data.

I also realize that the model has no ability to "introspect" itself, but I don't know what's stopping it from using a chain-of-thought style output to get at it in some way.

Do you think you could get it to reveal the attention vector at some point in time by, say, repeatedly asking it for the Nth most relevant word and working backwards?


> This is what confuses me though, people don't write things like: What is the most relevant sentence in this book?

I think it's because this is confusing even to researchers. The current explanation for why these models are robust (and even accurate) on data not found in their training set is that various regularizations are applied to the data during training. One example is random token dropout (on the order of 10%). The optimizers also apply regularization of sorts via weight decay and other math tricks I'm not privy to. This consistent regularization means the model will try to overfit but randomly fail. Since tokens are occasionally missing, the model instead learns a robust strategy of sorts for handling tropes/cliches/common patterns.
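The token-dropout part is mechanically very simple, if that helps. A toy sketch (assuming a PyTorch training loop and some designated mask token id; the 10% rate is just the figure mentioned above):

    import torch

    def drop_tokens(input_ids, drop_prob=0.10, mask_token_id=0):
        # Randomly replace ~10% of input tokens with a mask token so the model
        # can't rely on every surface token being present verbatim.
        drop = torch.rand(input_ids.shape) < drop_prob
        return torch.where(drop, torch.full_like(input_ids, mask_token_id), input_ids)

    # Applied to each batch before the forward pass during training:
    # noisy_ids = drop_tokens(batch_ids)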

Basically, the idea is that since the model has seen enough "most relevant sentence..." examples, it actually does begin to grok/internally model the intent and meaning of those words across a variety of contexts it has also learned (even if it's never seen the exact combination, e.g. "relevant sentence in this book"). Modeling this internally may be a waste of parameter space at first, but it quickly becomes the most efficient way of storing the information - rather than memorizing every instance in the dataset, you just call up the subset of weights which "understand" how those words tend to be used.

Since this happens recursively as the generated output gets longer (feeding back into itself), other such strategies that have been developed are also called upon, and the whole thing becomes difficult or impossible to interpret in a meaningful way.

I'm not sure there's a whole lot of proof of this, but I see these ideas thrown around a lot. Something similar shows up in biology, where cells and multicellular life experience lots of damage to their structure, even down to the DNA, over a lifespan or series of lifespans. To account for this, instead of memorizing exactly how to walk with n limbs depending on how many you happen to lose, life may instead develop a system which can learn on the fly how to walk (or, in humans' case, wheelchair) around.

As for your last point about the attention vector - I don't know that it could accurately print its own attention vector. But I do think it could use those values as a sort of temporary proxy for "ranking", perhaps. I don't think that's what happens in the natural-language case of "subjectively ranking the `best` sentence in the article", though - I still think that is mostly a case of modeling language well across _many_ domains and modes.
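For what it's worth, you don't need the model to print its attention - with an open model you can read the attention weights straight out of the runtime. A sketch with Hugging Face transformers (GPT-2 purely as an example), showing which source tokens the final position attends to most:

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    inputs = tok("The cat sat on the mat because it was tired.", return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs, output_attentions=True)

    # out.attentions: one tensor per layer, shape (batch, heads, seq, seq).
    # Average the last layer's heads and look at where the final token attends.
    attn = out.attentions[-1].mean(dim=1)[0, -1]
    top = torch.topk(attn, k=3)
    tokens = tok.convert_ids_to_tokens(inputs.input_ids[0])
    for score, idx in zip(top.values, top.indices):
        print(f"{tokens[idx.item()]!r}: {score.item():.3f}")

Whether any of that maps onto what the model is doing when you ask it in natural language for the "most relevant sentence" is exactly the open interpretability question.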


That's the perfect intelligence test, as Ilya said: ask it about something it has not been trained on, but might be able to infer.



