
For this? No, it is not considered performant.

Regular expressions are efficient in that one line of code can save you writing hundreds of lines. But they're normally slower (even when pre-compiled) than thoughtful hand-written code, simply due to the overhead.
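The thread doesn't show the code under discussion, so as a hypothetical illustration assume the task is trimming whitespace from both ends of a string. This sketch puts the one-line regex next to the hand-written loop it replaces; both helper names are made up for the example, and neither is meant as the "right" implementation.

```python
import re

# Hypothetical helpers: both strip leading and trailing whitespace,
# where "whitespace" means whatever str.isspace() / \s accepts.
_EDGE_WS = re.compile(r"\A\s+|\s+\Z")

def trim_regex(s: str) -> str:
    # The one-liner: delete a whitespace run at the start or the end.
    return _EDGE_WS.sub("", s)

def trim_by_hand(s: str) -> str:
    # Walk inward from both ends until a non-whitespace character is found.
    start, end = 0, len(s)
    while start < end and s[start].isspace():
        start += 1
    while end > start and s[end - 1].isspace():
        end -= 1
    return s[start:end]

print(trim_regex("  hello  "), trim_by_hand("  hello  "))
```

In Python the standard-library answer is simply s.strip(), which uses the same definition of whitespace. If speed matters, timeit on your own inputs is the way to settle which variant wins.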

Generally, the simpler the objective, the worse regular expressions are; they're better suited to complex operations. Plus, people write regular expressions REALLY poorly, doubly so for Unicode.
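A typical example of a poorly written "whitespace" regex is a hand-rolled ASCII character class. In this sketch (the sample text is made up), Python's \s, which is Unicode-aware by default for str patterns, splits on NO-BREAK SPACE and IDEOGRAPHIC SPACE, while the hand-rolled class silently misses both.

```python
import re

# A common hand-rolled "whitespace" class that only knows about ASCII.
naive = re.compile(r"[ \t\r\n]+")
# Python's \s on str patterns matches Unicode whitespace by default.
unicode_aware = re.compile(r"\s+")

text = "price:\u00a0100\u3000yen"  # U+00A0 NO-BREAK SPACE, U+3000 IDEOGRAPHIC SPACE

print(len(naive.split(text)))     # 1: nothing was split
print(unicode_aware.split(text))  # ['price:', '100', 'yen']
```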

Ideally you should use the standard library for this. For example, C# has Char.IsWhiteSpace(), which supports a wide range of Unicode whitespace characters and can be updated to cover whitespace characters that don't even exist today.
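Python's closest counterpart to Char.IsWhiteSpace() is str.isspace(). This sketch lists every code point the running interpreter treats as whitespace; the result tracks the Unicode database bundled with that Python build (see unicodedata.unidata_version), which is how such a predicate picks up new characters without code changes. The exact set can differ slightly between languages' standard libraries, so check the documentation of the one you use.

```python
import sys
import unicodedata

# Every code point that Python's standard library classifies as whitespace.
whitespace = [chr(cp) for cp in range(sys.maxunicode + 1) if chr(cp).isspace()]

print("Unicode database version:", unicodedata.unidata_version)
for ch in whitespace:
    # unicodedata.name() raises for unnamed control characters, so pass a default.
    print(f"U+{ord(ch):04X}  {unicodedata.name(ch, '<control>')}")
```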



This isn't true. Regular expressions can be fast even when supporting Unicode by building finite state machines that recognize UTF-8 directly. This particular benchmark explains a bit: http://blog.burntsushi.net/ripgrep/#linux-unicode-word
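To make "recognize UTF-8 directly" concrete, here is a toy sketch (not how any particular engine is implemented; the linked post describes the real thing). It builds a byte-level automaton that accepts exactly the UTF-8 encodings of the whitespace characters and scans raw bytes without ever decoding them. Because the set is finite, a trie of the encodings is already a DFA; production engines build far more compact byte-range automata. The function names are hypothetical.

```python
import sys

def build_utf8_dfa(chars):
    """DFA over bytes accepting exactly the UTF-8 encodings of `chars`."""
    transitions = [{}]  # transitions[state] maps a byte value to the next state
    accepting = set()   # state 0 is the start state
    for ch in chars:
        state = 0
        for byte in ch.encode("utf-8"):
            nxt = transitions[state].get(byte)
            if nxt is None:
                nxt = len(transitions)
                transitions.append({})
                transitions[state][byte] = nxt
            state = nxt
        accepting.add(state)
    return transitions, accepting

def find_whitespace(data: bytes, dfa):
    """Yield (start, end) byte offsets of each whitespace char in UTF-8 `data`."""
    transitions, accepting = dfa
    i = 0
    while i < len(data):
        state, j = 0, i
        # UTF-8 encodings are prefix-free, so stop at the first accepting state.
        while j < len(data) and data[j] in transitions[state]:
            state = transitions[state][data[j]]
            j += 1
            if state in accepting:
                break
        if state in accepting:
            yield i, j
            i = j
        else:
            # No match here; continuation bytes never start a match, so
            # stepping one byte at a time stays in sync on valid UTF-8.
            i += 1

WS = [chr(cp) for cp in range(sys.maxunicode + 1) if chr(cp).isspace()]
dfa = build_utf8_dfa(WS)
data = "a\u00a0b\u3000c d".encode("utf-8")
print(list(find_whitespace(data, dfa)))  # [(1, 3), (4, 7), (8, 9)]
```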


What isn't true? I never said that regular expressions can't handle Unicode quickly. I said that regular expressions are slower than hand-written code in all scenarios, due to the overhead.

You're responding to a point never made.


I am responding to your claim. I'm saying that not all regex implementations are created equal. Some can be just as fast as what you might write by hand.


Regular expressions can be Unicode-aware, right? You should be able to use a shortcut specifier that's equivalent to calling something like IsWhitespace.
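In Python, at least, the shortcut exists: for str patterns, \s is Unicode-aware by default, and re.ASCII opts back into ASCII-only matching. This sketch checks every code point and, on CPython, finds \s agreeing exactly with str.isspace(). Other regex engines define \s in their own way, so this agreement is specific to Python's re module.

```python
import re
import sys

unicode_ws = re.compile(r"\s")          # default for str patterns: Unicode-aware
ascii_ws = re.compile(r"\s", re.ASCII)  # ASCII-only: [ \t\n\r\f\v]

# Compare the shortcut class with the standard-library predicate everywhere.
mismatches = [cp for cp in range(sys.maxunicode + 1)
              if bool(unicode_ws.fullmatch(chr(cp))) != chr(cp).isspace()]
print(mismatches)  # expected: []

print(bool(unicode_ws.fullmatch("\u00a0")), bool(ascii_ws.fullmatch("\u00a0")))
```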


Yes. It's people's ability to write good Unicode regular expressions that is at issue.



