
I think you are overestimating human performance here.

The weakness in the evaluation system is mostly because humans are imperfect translators.



That is why there are metrics like inter-rater reliability, significance tests, and confidence intervals, all of which seem to be missing from the paper...
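For anyone unfamiliar with these, here is a rough sketch of what reporting them could look like: Cohen's kappa as the inter-rater reliability measure, plus a percentile bootstrap for a confidence interval around it. The ratings are made up for illustration, and the function names (`cohen_kappa`, `bootstrap_ci`) are my own, not anything from the paper.

```python
import random
from collections import Counter

def cohen_kappa(a, b):
    """Cohen's kappa for two raters labelling the same items."""
    n = len(a)
    # Fraction of items on which the two raters agree.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Agreement expected by chance, from each rater's label frequencies.
    ca, cb = Counter(a), Counter(b)
    expected = sum(ca[k] * cb[k] for k in ca) / (n * n)
    if expected == 1.0:
        # Degenerate case: both raters used one identical label throughout.
        # Returning 1.0 is a convention chosen for this sketch.
        return 1.0
    return (observed - expected) / (1 - expected)

def bootstrap_ci(a, b, stat, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for stat(a, b), resampling items."""
    rng = random.Random(seed)
    n = len(a)
    stats = []
    for _ in range(n_resamples):
        # Resample item indices with replacement, keeping ratings paired.
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(stat([a[i] for i in idx], [b[i] for i in idx]))
    stats.sort()
    lo = stats[int((alpha / 2) * n_resamples)]
    hi = stats[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi

# Hypothetical data: two human raters judging 12 translations (1 = acceptable).
rater_a = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 0, 1]
rater_b = [1, 0, 0, 1, 0, 1, 1, 1, 1, 1, 0, 0]

kappa = cohen_kappa(rater_a, rater_b)
low, high = bootstrap_ci(rater_a, rater_b, cohen_kappa)
print(f"kappa = {kappa:.2f}, 95% CI = [{low:.2f}, {high:.2f}]")
```

With only a dozen items the interval will be wide, which is exactly the kind of thing a reader would want to see before trusting a human baseline.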



