"The training, validation and test data sets are generated without overlap from periods in sequence. Successive periods of 400, 12, 40, 40 and 12 h are used to sample, respectively, training, validation, and test data, with the two 12 h periods inserted as hiatus."
What's missing is any statement that they did not use the test set during the training process.
A common split is train/validate/test, but all three are used while the model is being built -- train to actually fit the model, validate to track intermediate loss, test to compare candidate models.
What you want is a fourth, held-out test set that isn't looked at until publish time.
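A minimal sketch of that four-way split, assuming scikit-learn and made-up proportions (60/15/15/10); `chosen_model` at the end is hypothetical, standing in for whatever model survived comparison on the test set.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Illustrative data; shuffle=False preserves temporal order, which matters
# for sequential data like the paper's.
X, y = np.arange(1000).reshape(-1, 1), np.arange(1000)

# Carve off the held-out set first, and do not look at it again.
X_rest, X_holdout, y_rest, y_holdout = train_test_split(
    X, y, test_size=0.10, shuffle=False)

# Split the remainder into the three partitions the training process may
# see: train (fit), validation (intermediate loss), test (model comparison).
X_train, X_tmp, y_train, y_tmp = train_test_split(
    X_rest, y_rest, test_size=1/3, shuffle=False)
X_val, X_test, y_val, y_test = train_test_split(
    X_tmp, y_tmp, test_size=0.5, shuffle=False)

# ... train, tune, and compare models on train/val/test only ...

# Only at "publish time" does the chosen model touch the hold-out:
# final_score = chosen_model.score(X_holdout, y_holdout)  # hypothetical model
```

The point of splitting the hold-out off first is that no decision anywhere downstream -- hyperparameters, early stopping, which model to report -- can have been conditioned on it.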
This paper has two test sets, but they have different data properties, and it's not clear they were held out until publication.
I can't believe you can publish with such an obvious flaw in study design. You don't need machine learning expertise to catch it, because the same idea applies to backtesting any kind of model at all.
And yet they do. I can't remember ever reading a machine learning paper where the authors point out that they held out a test partition they couldn't touch until some "final" step of the experiment.