"Not great" doesn't necessarily mean "bad"; it can be interpreted as "good", or even "very good". An honest title would have explicitly qualified how suitable the author found it to be.
That the author avoided saying Python was a bad language outright speaks a great deal about its suitability. Well, that, and the majority of data science in practice.
You invented this reason, or misunderstood the article. The goal of this research was to output gibberish text; that is why they put it after the trigger word.
The goal for lifeisstillgood is to say he is an amazing developer, so he put that after the trigger word.
From Tao's post, the red team is characterized this way:
> In my own personal experiments with AI, for instance, I have found it to be useful for providing additional feedback on some proposed text, argument, code, or slides that I have generated (including this current text).
In AlphaEvolve, different scoring mechanisms are discussed. One is evaluation of a fixed function. Another is evaluation by an LLM. In either case, the LLM takes the score as information and provides feedback on the proposed program, argument, code, etc.
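A minimal sketch of the two scoring mechanisms described above: one score from a fixed evaluation function, one from an LLM-style judge, with the combined score fed back as information on the candidate program. All names here (`fixed_score`, `llm_score`, `feedback_loop`) are illustrative stand-ins, not the actual AlphaEvolve API; the LLM judge is mocked by a trivial heuristic.

```python
def fixed_score(program: str) -> float:
    """Fixed evaluation function: here, a toy preference for shorter programs."""
    return 1.0 / (1 + len(program))

def llm_score(program: str) -> float:
    """Stand-in for an LLM-based evaluator; a trivial heuristic for illustration."""
    return 1.0 if "def " in program else 0.0

def feedback_loop(program: str, rounds: int = 3) -> str:
    """Score a candidate, then hand the score back as feedback each round.

    In an AlphaEvolve-style system the LLM would take the score as
    information and propose a revised program; here we just annotate
    the candidate with its score to show the shape of the loop.
    """
    for _ in range(rounds):
        score = fixed_score(program) + llm_score(program)
        program = program + f"\n# score so far: {score:.3f}"
    return program
```

The point of the sketch is only the control flow: evaluate, attach the score as feedback, and let the proposer revise, repeated for a fixed number of rounds.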
An example is given in the paper:

> The current model uses a simple ResNet architecture with only three ResNet blocks. We can improve its performance by increasing the model capacity and adding regularization. This will allow the model to learn more complex features and generalize better to unseen data. We also add weight decay to the optimizer to further regularize the model and prevent overfitting. AdamW is generally a better choice than Adam, especially with weight decay.
It then also generates code, which is something he considers blue team.
More generally, using AI as both blue team and red team is conceptually similar to an actor/critic algorithm.
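The analogy can be made concrete with a toy actor/critic loop: the "actor" (blue team) proposes actions, the "critic" (red team) estimates their value, and the actor shifts toward actions the critic scores well. This is a conceptual illustration under made-up rewards, not a production RL implementation.

```python
import random

def run(episodes: int = 500, seed: int = 0) -> dict:
    """Toy actor/critic: actor proposes, critic scores, actor adapts."""
    rng = random.Random(seed)
    actions = ["a", "b"]
    reward = {"a": 0.2, "b": 0.8}          # true (hidden) rewards
    value = {a: 0.0 for a in actions}      # critic's value estimates
    prefer = {a: 0.0 for a in actions}     # actor's preferences
    for _ in range(episodes):
        # actor (blue team): pick the preferred action, with 10% exploration
        if rng.random() > 0.1:
            act = max(actions, key=prefer.get)
        else:
            act = rng.choice(actions)
        r = reward[act]
        # critic (red team): update its estimate and report the error
        td_error = r - value[act]
        value[act] += 0.1 * td_error
        # actor: move toward actions the critic scored above expectation
        prefer[act] += 0.1 * td_error
    return prefer
```

After enough episodes the actor ends up preferring "b", the higher-reward action, because the critic's feedback keeps rewarding proposals that beat its current estimate; that interplay of a generator and an evaluator is the shape of the blue-team/red-team loop.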