That's true of any hiring system though: how do you gather data on the job performance of the people who don't pass a resume screen? A coffee date? A phone screen? Missing counterfactuals everywhere.
To run a full analysis, you would need to hire people at random, including both those who pass and those who fail your hiring process, and then assess their subsequent performance. This never happens in real life.
However, you can still run an informative statistical analysis on the people you did hire, based on the variability in their interview scores and performance scores. For example, if the interview is predictive, the people who scored 5/5 on the interview should, on average, perform better than the ones who scored 4/5.
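As a rough sketch of that kind of within-hires comparison, the snippet below groups hires by interview score and computes a plain Pearson correlation between interview score and a later performance rating. Everything here is an assumption for illustration: the `hires` data is made up, the 1-5 performance scale is hypothetical, and the function names are not from any particular tool.

```python
from collections import defaultdict
from math import sqrt

def mean_performance_by_score(records):
    """Average performance rating for each interview score."""
    groups = defaultdict(list)
    for interview, performance in records:
        groups[interview].append(performance)
    return {score: sum(vals) / len(vals) for score, vals in sorted(groups.items())}

def pearson(records):
    """Pearson correlation between interview score and performance."""
    n = len(records)
    mean_x = sum(x for x, _ in records) / n
    mean_y = sum(y for _, y in records) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in records)
    spread_x = sqrt(sum((x - mean_x) ** 2 for x, _ in records))
    spread_y = sqrt(sum((y - mean_y) ** 2 for _, y in records))
    return cov / (spread_x * spread_y)

# Made-up (interview score, later performance rating) pairs, hires only.
hires = [(5, 4.5), (5, 4.0), (4, 3.5), (4, 4.0), (3, 3.0), (3, 3.5)]

by_score = mean_performance_by_score(hires)
r = pearson(hires)
```

If the 5/5 group does not outperform the 4/5 group on average, or the correlation is near zero, that is at least a hint that the interview score is not telling you much, even without data on the people you rejected.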
The really exciting point comes when we can re-run all this analysis based on actual job performance rather than interview results.