I guess get better at detecting LLMs instead of accusing everything of being an LLM reply?
1. Retrieval Accuracy - https://github.com/johannschopplich/toon?tab=readme-ov-file#...
2. Performance by dataset - https://github.com/johannschopplich/toon?tab=readme-ov-file#...