I always dismissed this category as just more “markdown engineering”, but this opened my eyes to some genuinely interesting things. The AI memory space is more varied than I expected.
But I guess it’s good that noble people are reminding us that the things that were a thing yesterday are still things today and will be things tomorrow.
Not really an accurate comparison, since buffer overflows and SQL injection are bugs that ultimately allow user data to co-mingle with executable code. LLMs take user data and mix it with the "executable code" (if we are extremely generous in our description of a user prompt) by design.
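To make the SQL half of that comparison concrete, here is a minimal sketch using Python's standard `sqlite3` module. The table, the function names, and the payload are made up for illustration. The first lookup splices user data into the query text, so the data can change what the query does; the second passes it as a bound parameter, so it stays data.

```python
import sqlite3


def make_db():
    """Build a throwaway in-memory database with two users (illustrative data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [("alice", "a-secret"), ("bob", "b-secret")],
    )
    return conn


def lookup_unsafe(conn, name):
    # The bug: user data is concatenated into the statement text,
    # so the parser cannot tell it apart from the query itself.
    query = "SELECT secret FROM users WHERE name = '" + name + "'"
    return [row[0] for row in conn.execute(query)]


def lookup_safe(conn, name):
    # The fix: the statement and the value travel separately.
    # The value is bound to the placeholder and is never parsed as SQL.
    query = "SELECT secret FROM users WHERE name = ?"
    return [row[0] for row in conn.execute(query, (name,))]


# Hypothetical attacker input that rewrites the WHERE clause when concatenated.
payload = "x' OR '1'='1"
```

With `conn = make_db()`, `lookup_unsafe(conn, payload)` returns every secret in the table, while `lookup_safe(conn, payload)` returns an empty list because no user has that literal name. The point of the comparison is that the database has a second channel for the value; a plain prompt string does not.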
The issue here is unavoidable because LLMs are broken by design. There is no encapsulation where you can separate instructions and data because LLMs are nothing more than next-token predictors and the input sequence MUST be a sequence. They can't build a model with one stream for instructions and another for data because the training data they stole from the internet and books is a single stream.
While I agree that LLMs have yet again surfaced the “new tech fails to separate data and control” issue that affected everything from pay phones to SQL, I disagree that there’s something different that prevents the introduction of separate planes.
That “stolen” training data, most of which itself was stolen from older works, does not include user prompts. It is data, not control.
We will see models with annotations for whether a token is part of the user prompt, and other approaches as well.
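A rough sketch of what such an annotation could look like, purely as an assumption and not any existing model's API: every token carries a role tag, and the tag selects an extra vector that is added to the token's embedding. `Role`, `TaggedToken`, the toy embedding function, and the 4-dimensional vectors are all hypothetical; a real model would learn these tables.

```python
from dataclasses import dataclass
from enum import Enum


class Role(Enum):
    SYSTEM = 0  # trusted instructions
    USER = 1    # user prompt
    TOOL = 2    # retrieved or tool-produced content


@dataclass(frozen=True)
class TaggedToken:
    token_id: int
    role: Role


DIM = 4  # toy embedding width, chosen only for readability


def token_embedding(token_id):
    # Stand-in for a learned lookup table: deterministic toy values.
    return [((token_id * (i + 1)) % 7) / 7.0 for i in range(DIM)]


# Stand-in for a learned per-role embedding (hypothetical values).
ROLE_EMBEDDING = {
    Role.SYSTEM: [1.0, 0.0, 0.0, 0.0],
    Role.USER: [0.0, 1.0, 0.0, 0.0],
    Role.TOOL: [0.0, 0.0, 1.0, 0.0],
}


def embed(tok):
    # The model input is the token vector plus the vector for its role,
    # so the same token id looks different depending on where it came from.
    base = token_embedding(tok.token_id)
    tag = ROLE_EMBEDDING[tok.role]
    return [b + t for b, t in zip(base, tag)]


def tag_sequence(system_ids, user_ids):
    # Still one sequence, but each position now carries provenance metadata.
    return ([TaggedToken(t, Role.SYSTEM) for t in system_ids]
            + [TaggedToken(t, Role.USER) for t in user_ids])
```

Here `embed(TaggedToken(5, Role.SYSTEM))` and `embed(TaggedToken(5, Role.USER))` produce different vectors for the same token id. Whether a network trained on inputs like this would reliably refuse to follow instructions in tagged data is a separate question that the sketch does not answer.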
You’re obviously passionate about the subject but as someone who works in the field, I assure you there is no now-and-forever requirement for a single stream with no metadata about tokens. We will positively see control and data separated just like they were for phones and databases.
> You’re obviously passionate about the subject but as someone who works in the field, I assure you there is no now-and-forever requirement for a single stream with no metadata about tokens
I'm quite familiar with how LLMs work internally. If you have an example of how the isolation you are describing could work, you'll have to explain it. By what possible mechanism could "tagging" tokens allow you to isolate the influence between tokens once they are taken into the network? They're still just floating point numbers at the end of the day. To actually treat user prompt data separately from untrusted data, you will need to figure out some new kind of multiplication.
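To illustrate the mixing I mean, here is a toy single-head attention step in plain Python. It is a simplified sketch: no learned projection matrices, made-up 4-dimensional vectors, and a hypothetical layout where the last two components act as the "tag".

```python
import math


def softmax(xs):
    # Numerically stable softmax; every output is strictly positive.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attend(query, keys, values):
    """Scaled dot-product attention for one query position."""
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, k) / scale for k in keys])
    dim = len(values[0])
    # The output is a weighted sum over ALL visible positions.
    out = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return weights, out


# Toy vectors: first two components are "content", last two are the "tag".
instruction = [1.0, 0.0, 1.0, 0.0]  # tagged as instruction
data_a = [0.0, 1.0, 0.0, 1.0]       # tagged as data
data_b = [0.0, 5.0, 0.0, 1.0]       # same tag, different content

weights_a, out_a = attend(instruction, [instruction, data_a], [instruction, data_a])
weights_b, out_b = attend(instruction, [instruction, data_b], [instruction, data_b])
```

The weight on the data position is greater than zero in both runs, and `out_a` differs from `out_b`, so the representation at the instruction position depends on the data content even though the tags are identical. A trained model could learn to give tagged tokens very little weight, but that is a learned tendency, not a hard boundary like a bound SQL parameter.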
> That “stolen” training data, most of which itself was stolen from older works, does not include user prompts.
“I joined this fine company to help accelerate the destruction of society, and now instead I’m expected to help it destroy society in a _different way_ by creating puzzles for AI. Now my morale is low. Poor me.”