"The training, validation and test data sets are generated without overlap from periods in sequence. Successive periods of 400, 12, 40, 40 and 12 h are used to sample, respectively, training, validation, and test data, with the two 12 h periods inserted as hiatus."
What's missing is any statement that they did not use the test set during the training process.
A common split is train/validate/test, but all three are used while the model is being built -- train to actually fit the model, validate to track intermediate loss, test to compare candidate models.
What you want is a fourth, held-out test set that isn't looked at until publish time.
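A minimal sketch of that four-way split, assuming scikit-learn and made-up proportions (60/15/15/10); `chosen_model` at the end is hypothetical, standing in for whatever model survived comparison on the test set.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Illustrative data; shuffle=False preserves temporal order, which matters
# for sequential data like the paper's.
X, y = np.arange(1000).reshape(-1, 1), np.arange(1000)

# Carve off the held-out set first, and do not look at it again.
X_rest, X_holdout, y_rest, y_holdout = train_test_split(
    X, y, test_size=0.10, shuffle=False)

# Split the remainder into the three partitions the training process may
# see: train (fit), validation (intermediate loss), test (model comparison).
X_train, X_tmp, y_train, y_tmp = train_test_split(
    X_rest, y_rest, test_size=1/3, shuffle=False)
X_val, X_test, y_val, y_test = train_test_split(
    X_tmp, y_tmp, test_size=0.5, shuffle=False)

# ... train, tune, and compare models on train/val/test only ...

# Only at "publish time" does the chosen model touch the hold-out:
# final_score = chosen_model.score(X_holdout, y_holdout)  # hypothetical model
```

The point of splitting the hold-out off first is that no decision anywhere downstream -- hyperparameters, early stopping, which model to report -- can have been conditioned on it.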
This paper has two test sets, but they have different data properties, and it's not clear they were held out until publication.
I can't believe you can publish with such an obvious flaw in study design. You don't need machine learning expertise to catch it, because the same idea applies to backtesting any kind of model at all.
And yet they do. I can't remember ever reading a machine learning paper where the authors point out that they held out a test partition they couldn't touch until some "final" step of the experiment.