At a high level, I am very skeptical. Usually when your model does not agree with observations, the problem is with the model. A few things stand out to me. I have not thought deeply about any of them, so please correct me if I am mistaken.

> This primary data set contained 252 species from five vertebrate classes [...] We removed humans (Homo sapiens) from the data set as they were listed with a maximum lifespan of 120 years, which does not reflect the variability and the true global average lifespan (60.9–86.3 years)

So why should we trust the rest of this dataset? Garbage in, garbage out (GIGO) comes to mind.

> We used promoter sequences centred around the transcription start site (TSS) (-499 to 100 bp of each promoter) in Humans (Homo sapiens) from the EPD as the data set of promoter sequences. [...] Briefly, as described previously, using Basic Local Alignment Search Tool (BLAST) v2.2.31 the promoter sequences were mapped to the single top hit in each species.

This would seem to imply a weird correlation structure between data examples that could pose problems for the training/test split and/or linear models. I would also have liked to see some QC showing how well this recovered known (i.e. annotated) promoter regions. Are they picking up false positives? Are they missing stuff?
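To make the train/test concern concrete, here is a minimal sketch of a split that keeps related species on the same side. It is not what the paper does. Grouping by taxonomic family is my own assumption about where the correlation would come from, and the `toy` species and family labels are made up.

```python
import random
from collections import defaultdict

def group_split(species_to_group, test_fraction=0.3, seed=0):
    """Assign whole groups to train or test so that no group straddles the split."""
    groups = defaultdict(list)
    for species, group in species_to_group.items():
        groups[group].append(species)

    group_names = sorted(groups)
    random.Random(seed).shuffle(group_names)

    n_total = len(species_to_group)
    train, test = [], []
    for name in group_names:
        # Fill the test set until it reaches the target size, then send the rest to train.
        target = test if len(test) < test_fraction * n_total else train
        target.extend(groups[name])
    return train, test

# Hypothetical toy data: species name -> family label (both invented for illustration).
toy = {f"sp{i}": f"family{i % 6}" for i in range(30)}
train, test = group_split(toy)
train_groups = {toy[s] for s in train}
test_groups = {toy[s] for s in test}
print(len(train), len(test), train_groups.isdisjoint(test_groups))
```

With a purely random species-level split, close relatives with near-identical BLAST hits can land on both sides, which would make the test error look better than it should.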

> The glmnet function was set to a 10-fold cross validation which returns the best performing model. [...] This resulted in a total of 42 promoters for estimate lifespan.

So they're selecting features with the lasso and then doing inference on the selected model, so any p-values are suspect without a post-selection correction. Tibshirani (inventor of the lasso) and Taylor recently released a package for post-selection inference, which I do not see them using here.
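A small simulation shows why statistics computed after selection are optimistic. This is only a sketch under my own assumptions: it uses neither the lasso nor the Tibshirani/Taylor package, and it stands in for selection by picking the single noise feature most correlated with the response. All sizes (`n`, `k`, 20 trials) are arbitrary.

```python
import random
import statistics

def abs_corr(x, y):
    """Absolute Pearson correlation between two equal-length lists."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return abs(sxy) / (sxx * syy) ** 0.5

def one_trial(rng, n=50, k=100):
    """Pick the best of k pure-noise features on n samples, then re-check it on n fresh samples."""
    y = [rng.gauss(0, 1) for _ in range(2 * n)]
    features = [[rng.gauss(0, 1) for _ in range(2 * n)] for _ in range(k)]
    # Selection step: uses only the first n samples.
    best = max(features, key=lambda f: abs_corr(f[:n], y[:n]))
    # Same feature, scored on the selection data and on the untouched second half.
    return abs_corr(best[:n], y[:n]), abs_corr(best[n:], y[n:])

rng = random.Random(0)
results = [one_trial(rng) for _ in range(20)]
mean_selected = statistics.fmean([r[0] for r in results])
mean_holdout = statistics.fmean([r[1] for r in results])
print(f"mean |r| on selection data: {mean_selected:.2f}")
print(f"mean |r| on held-out data:  {mean_holdout:.2f}")
```

Every feature here is noise, yet the selected one looks clearly correlated on the data used to choose it. That gap is what a post-selection correction is meant to account for.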

> Species were randomly assigned to either a training (176 samples) or testing (76 samples) data set (70/30 split).

Rule of thumb: you usually want the number of examples to be at least 10x the number of features to avoid overfitting. With 42 features, 176 training examples seems kind of thin. Even worse when you consider that there might be a correlation between training and test examples imposed by the initial selection of promoter sites using BLAST.
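The arithmetic behind that rule of thumb, as a quick sketch. The sample and feature counts are the ones quoted from the paper. The 10x multiplier is just the heuristic stated here, not a hard threshold, and `examples_per_feature` is a helper name I made up.

```python
def examples_per_feature(n_examples, n_features):
    """Ratio of examples to features; the heuristic wants this to be at least 10."""
    return n_examples / n_features

n_train, n_features = 176, 42  # figures quoted from the paper
rule_multiplier = 10           # rule-of-thumb assumption, not a hard threshold

ratio = examples_per_feature(n_train, n_features)
needed = rule_multiplier * n_features
print(f"{ratio:.1f} training examples per feature; "
      f"the rule of thumb asks for {needed} examples")
```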