The author of this post should benchmark his own blog for accessibility metrics, text contrast is dreadful..
On the other hand, this would be interesting for measuring agents in coding tasks, but there's quite a lot of context to provide here, both input and output would be massive.
On the other hand, this would be interesting for measuring agents in coding tasks, but there's quite a lot of context to provide here, both input and output would be massive.