Almost any regex containing a common English phrase of a few words, like /of course/, would give you very high accuracy on a large enough dataset and a very low false positive rate. English has low entropy per character.
That strategy would return results for any phrase not containing "of course", which would be rather a lot of results in any document. I am envisioning a regex that plucks hashes from emails, for example: some parts English, some parts cryptographic "noise". Pick the noise out from the English.
(Note: the inspiration was a nefarious regex scanner for finding bitcoin hashes. I have no intention of building such a thing, but the idea of a regex that detects randomness intrigued me. Is it possible?)
My idea was more along the lines of calculating the statistics of character n-grams in English (all of which can occur in random noise too), then counting the most unlikely occurrences until you hit some probabilistic threshold that marks a well-thought-out decision boundary.
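Roughly something like the sketch below, assuming you score candidate strings by their average trigram surprisal under an English model (corpus.txt, the trigram length, and the 12-bits-per-trigram threshold are all placeholder choices of mine, not anything from this thread):

```python
# Sketch of the n-gram surprisal idea above; everything here is illustrative.
import math
from collections import Counter

def build_trigram_model(corpus_text):
    """Count overlapping character trigrams in a lowercased English corpus."""
    text = corpus_text.lower()
    counts = Counter(text[i:i + 3] for i in range(len(text) - 2))
    return counts, sum(counts.values())

def avg_surprisal_bits(s, counts, total):
    """Average -log2 probability per trigram, with add-one smoothing for unseen trigrams."""
    s = s.lower()
    grams = [s[i:i + 3] for i in range(len(s) - 2)]
    if not grams:
        return 0.0
    vocab = len(counts) + 1
    bits = [-math.log2((counts.get(g, 0) + 1) / (total + vocab)) for g in grams]
    return sum(bits) / len(bits)

counts, total = build_trigram_model(open("corpus.txt", encoding="utf-8").read())

THRESHOLD = 12.0  # bits per trigram; the decision boundary you'd tune on real data
for s in ["of course we will meet tomorrow", "3f7a9c04be118d2aa06c"]:
    score = avg_surprisal_bits(s, counts, total)
    print(f"{score:5.1f}  {'noise' if score > THRESHOLD else 'english'}  {s}")
```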
I had to recognize English for a cryptography project in college. This was my entire strategy, which the professor advised for rejecting non-English (it turns out recognizing non-English is easier than recognizing English), and it works well:
1. Parse the full text of some long book available on Project Gutenberg. Record every trigram that occurs. Frequency is irrelevant; we just want to know whether a trigram occurs or not.
2. Go through your text, counting the number of trigrams that didn't exist in the sample text. If the number of strange trigrams exceeds a threshold (for my project I used a threshold of 3, but you can tune this), reject the text as non-English.
Given the constant finite threshold I used, that does amount to a regex, but I don't recommend trying to write it out explicitly.
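For concreteness, here's roughly what those two steps look like in code. This is my own sketch, not the original project's code, and corpus.txt is a stand-in for whatever Gutenberg book you pick:

```python
# Step 1: record every character trigram that occurs in the sample text (frequency ignored).
def seen_trigrams(text):
    text = text.lower()
    return {text[i:i + 3] for i in range(len(text) - 2)}

# Step 2: reject once more than `threshold` trigrams are absent from the sample text.
def looks_like_english(candidate, known, threshold=3):
    candidate = candidate.lower()
    strange = 0
    for i in range(len(candidate) - 2):
        if candidate[i:i + 3] not in known:
            strange += 1
            if strange > threshold:
                return False
    return True

known = seen_trigrams(open("corpus.txt", encoding="utf-8").read())
print(looks_like_english("it was the best of times, it was the worst of times", known))  # likely True
print(looks_like_english("9f86d081884c7d659a2feaa0c55ad015", known))                     # likely False
```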
If you're after hashes, you could just match a string of hexadecimal digits of a specific length (such as [0-9a-f]{16}); that's much easier to define than English text.
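A quick illustration of that approach; the lengths 32/40/64 are my assumption for common digest sizes (MD5, SHA-1, SHA-256), so adjust them to whatever you're actually hunting:

```python
import re

# Word boundaries stop the pattern from firing inside longer alphanumeric tokens;
# 32/40/64 hex characters cover MD5, SHA-1, and SHA-256 digests respectively.
HEX_RUN = re.compile(r"\b(?:[0-9a-fA-F]{64}|[0-9a-fA-F]{40}|[0-9a-fA-F]{32})\b")

text = ("Thanks, merged. New id: "
        "9c56cc51b374c3ba189210d5b6d4bf57790d351c96c47c02190ecf1e430635ab "
        "-- see you tomorrow, of course.")
print(HEX_RUN.findall(text))
```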