seanwilson's comments

Maybe I'm missing something and I'm glad this idea resonates, but it feels like sometime after Java got popular and dynamic languages got a lot of mindshare, a large chunk of the collective programming community forgot why strong static type checking was invented and are now having to rediscover this.

In most strong statically typed languages, you wouldn't often pass strings and generic dictionaries around. You'd naturally gravitate towards parsing/transforming raw data into typed data structures that have guaranteed properties instead to avoid writing defensive code everywhere e.g. a Date object that would throw an exception in the constructor if the string given didn't validate as a date (Edit: Changed this from email because email validation is a can of worms as an example). So there, "parse, don't validate" is the norm and not a tip/idea that would need to gain traction.
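A minimal sketch of that Date example, written in Python with type hints. The names `BirthDate` and `age_in_year` are hypothetical, and it assumes ISO-formatted input; the point is only that a failed parse raises at the boundary, so downstream code needs no defensive checks. Python will not stop someone from calling the constructor directly, so the guarantee is by convention here rather than enforced by a compiler.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class BirthDate:
    """Hypothetical wrapper: obtained via parse(), which fails on bad input."""
    value: date

    @classmethod
    def parse(cls, raw: str) -> "BirthDate":
        # date.fromisoformat raises ValueError for anything that is not a valid ISO date.
        return cls(date.fromisoformat(raw))


def age_in_year(born: BirthDate, year: int) -> int:
    # No defensive checks here: the parameter type already implies a real date.
    return year - born.value.year
```

Callers parse once at the edge (`BirthDate.parse(form_field)`) and pass the typed value everywhere else.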


> In most strong statically typed languages, you wouldn't often pass strings and generic dictionaries around.

In 99% of the projects I've worked on in my professional life, anything that comes from human input is manipulated as a string, and most of the time it stays that way through all of the application layers (with more or fewer checks along the way).

On your precise example, I can even say that I never saw something like an "Email object".


I've seen a mix between stringly typed apps and strongly typed apps. The strongly typed apps had an upfront cost but were much better to work with in the long run. Define types for things like names, email address, age, and the like. Convert the strings to the appropriate type on ingest, and then inside your system only use the correct types.

> On your precise example, I can even say that I never saw something like an "Email object".

Well that's.... absolutely horrifying. Would you mind sharing what industry/stack you work with?


> horrifying.

IMO it's worth distinguishing between different points on the spectrum of "email object", ex:

1. Here is an Email object with detailed properties or methods for accessing its individual portions, changing things to/from canonical forms (e.g. lowercase Punycode domain names), running standard (or nonstandard) comparisons, etc.

2. Here is an immutable Email object which mainly wraps an arbitrary string, so that it isn't easily mixed-up with other notable strings we have everywhere.

__________

For e-mails in particular, implementing the first is a nightmare--I know this well from recent tasks fixing bad/subjective validation rules. Even if you follow every spec with inhuman precision and cleverness, you'll get something nobody will like.

In contrast, the second provides a lot of bang for your buck. It doesn't guarantee every Email is valid, but you get much better tools for tracing flows, finding where bad values might be coming from, and for implementing future validation/comparison rules (which might be context-specific) later when you decide you need to invest in them.


> IMO it's worth distinguishing between different points on the spectrum of "email object"

If it's neither, who cares? This is an obvious nightmare for all involved


The easiest and most robust way to deal with email is to have 2 fields. string email, bool isValidated. (And you'll need some additional way to handle a time based validation code). Accept the user's string, fire off an email to it and require them to click a validation link or enter a code somewhere.

Email is weird and ultimately the only decider of a valid email is "can I send email to this address and get confirmation of receipt".

If it's a consumer website you can do some clientside validation of ".@.\\..*" to catch easy typos. That will end up rejecting a super small amount of users but they can usually deal with it. Validating against known good email domains and whatnot will just create a mess.


In the spirit of "Parse, Don't Validate", rather than encode "validation" information as a boolean to be checked at runtime, you can define `Email { raw: String }` and hide the constructor behind a "factory function" that accepts any string but returns `Option<Email>` or `Result<Email,ParseError>`.

If you need a stronger guarantee than just a "string that passes simple email regex", create another "newtype" that parses the `Email` type further into `ValidatedEmail { raw: String, validationTime: DateTime }`.

While it does add some "boilerplate-y" code no matter what kind of syntactical sugar is available in the language of your choice, this approach utilizes the type system to enforce the "pass only non-malformed & working email" rule when `ValidatedEmail` type pops up without constantly remembering to check `email.isValidated`.

This approach's benefit varies depending on programming languages and what you are trying to do. Some languages offer 0-runtime cost, like Haskell's `newtype` or Rust's `repr(transparent)`, others carry non-negligible runtime overhead. Even then, it depends on whether the overhead is acceptable or not in exchange for "correctness".
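A rough Python rendering of the `Email` / `ValidatedEmail` idea described above, as a sketch under stated assumptions rather than a definitive implementation. The regex is a deliberately loose, made-up pattern, `confirm` and its `code_ok` flag are hypothetical stand-ins for "the user clicked the link", and `validation_time` is the snake_case spelling of `validationTime`. Python only checks these annotations through a static checker such as mypy, so the enforcement is weaker than in Haskell or Rust.

```python
import re
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

# Deliberately loose, hypothetical pattern: something@something.something
_LOOSE_EMAIL = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


@dataclass(frozen=True)
class Email:
    raw: str

    @staticmethod
    def parse(raw: str) -> Optional["Email"]:
        """Factory function: returns None rather than a malformed Email."""
        return Email(raw) if _LOOSE_EMAIL.match(raw) else None


@dataclass(frozen=True)
class ValidatedEmail:
    raw: str
    validation_time: datetime


def confirm(email: Email, code_ok: bool) -> Optional[ValidatedEmail]:
    """Stand-in for the confirmation step; code_ok is a hypothetical flag."""
    if not code_ok:
        return None
    return ValidatedEmail(email.raw, datetime.now(timezone.utc))


def send_newsletter(to: ValidatedEmail) -> str:
    # Accepts only ValidatedEmail, so there is no is_validated flag to forget.
    return f"sending to {to.raw}"
```

The design choice is that `send_newsletter` names its precondition in its signature instead of checking a boolean at runtime.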


I would still usually prefer email as just a string and validation as a separate property, and they both belong to some other object. Unless you really only want to know if XYZ email exists, it's usually something more like "has it been validated that ABC user can receive email at XYZ address".

Is the user account validated? Send an email to their email string. Is it not validated? Then why are we even at a point in the code where we're considering emailing the user, except to validate the email.

You can use similar logic to what you described, but instead with something like User and ValidatedUser. I just don't think there's much benefit to doing it with specifically the email field and turning email into an object. Because in those examples you can have a User whose email property is a ParseError and you still end up having to check "is the email property result for this user type Email or type ParseError?" and it's very similar to just checking a validation bool except it's hiding what's actually going on.


> I would still usually prefer email as just a string and validation as a separate property, and they both belong to some other object. Unless you really only want to know if XYZ email exists, it's usually something more like "has it been validated that ABC user can receive email at XYZ address".

> Is the user account validated? Send an email to their email string. Is it not validated? Then why are we even at a point in the code where we're considering emailing the user, except to validate the email.

You are looking at this single type in isolation. The benefit of an email type over using a string to hold the email is not validating the actual string as an email address, it's forcing the compiler to issue an error if you ever pass a string to a function expecting an email.

Consider function `foo`, which takes an email and a username parameter.

This compiles just fine but is a logic error:

    void foo (char *email, char *username);
    ...
    char *my_email = parse_input ();
    char *my_user = parse_input ();
    foo (my_user, my_email);

Using a separate type for email means that this refuses to compile:

    void foo (email_t *email, char *username);
    ...
    email_t *my_email = parse_input ();
    char *my_user = parse_input ();
    foo (my_user, my_email); // Compiler error

I hope you can see the value in having the compiler enforce correctness. I have a blog post on this, with this exact example.

> Because in those examples you can have a User whose email property is a ParseError and you still end up having to check "is the email property result for this user type Email or type ParseError?"

In languages with a strong type system, `User` should hold `email: Option<ValidatedEmail>`. This will reject erroneous attempts `user.email = Email::parse(raw_string);` at compile time, as `Result<Email,ParseError>` is not compatible / assignable to `Option<ValidatedEmail>`.

It's kind of a "oh I forgot to check `email.isValidated`" reminder, except now being presented as an incompatible type assignment and at compile-time. Borrowing Rust's syntax, the type error can be solved with

  user.email = Email::parse(raw_string)
      .ok()
      .and_then(|wellformed_email| {
          email_service.validate_by_send_email(wellformed_email)
      });

Which more or less gets translated as "Check email well-formedness of this raw string. If it's well-formed, try to send a test email. In case of any failure during parsing or test email, leave the `user.email` field empty (represented with `Option::None`)".

> and it's very similar to just checking a validation bool except it's hiding what's actually going on.

Arguably, it's the other way around. Looking back at `email: Option<ValidatedEmail>`, it's visible at compile time that `User` demands "checking the validation bool"; violate this and you will get a compile-time error.

On the other hand, the usual approach of assigning a raw string directly doesn't say anything at all about its contract, hiding the requirement that `user.email` must be a well-formed, contactable email. Not only is it possible to assign an arbitrary malformed "email" string, remembering to check `email.isValidated` is also left to programmer diligence; forget once and now there's a bug.


My preferred solution would be:

You have 2 types

UnvalidatedEmail

ValidatedEmail

Then ValidatedEmail is only created in the function that does the validation: a function that takes an UnvalidatedEmail and returns a ValidatedEmail or an error object.


That can work in some situations. One thing I won't like about it in some other situations is that you now have 2 nullable fields associated with your user, or whatever that email is associated with. It's annoying or even impossible in a lot of systems to have a guaranteed validation that user.UnvalidatedEmail or user.ValidatedEmail must exist but not both.

I see. In my example they would be just types and internally a newtype string.

So an object could have a field

email: UnvalidatedEmail | ValidatedEmail

Nothing would be nullable there in that case. You could match on the type and break if not all cases are handled.
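A small Python sketch of that union field, assuming two wrapper types named as in the comment above; `User`, `EmailState` and `next_action` are hypothetical names added for illustration. Python has no compiler-enforced exhaustiveness, so the final `raise` is a runtime stand-in for "break if not all cases are handled".

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class UnvalidatedEmail:
    raw: str


@dataclass(frozen=True)
class ValidatedEmail:
    raw: str


# The field is always exactly one of the two; nothing is nullable.
EmailState = Union[UnvalidatedEmail, ValidatedEmail]


@dataclass
class User:
    name: str
    email: EmailState


def next_action(user: User) -> str:
    if isinstance(user.email, ValidatedEmail):
        return "send newsletter"
    if isinstance(user.email, UnvalidatedEmail):
        return "send confirmation link"
    # Unreachable while EmailState has two cases; fails loudly if a third is added.
    raise TypeError(f"unhandled email state: {type(user.email).__name__}")
```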


I've seen some devs prefer that route of programming and it very often results in performance problems.

An undiscussed issue with "everything is a string or dictionary" is that strings and dictionaries both consume very large amounts of memory. Particularly in a language like Java.

A Java object which has 2 fields in it with an int and a long will spend most of its memory on the object header. You end up with an object that has 12 bytes of payload and 32 bytes of object header (Valhalla can't come soon enough). But when you talk about a HashMap in Java, just the map structure itself ends up blowing way past that. The added overhead of 2 Strings for each of the fields plus a Java `Long` and `Integer` just decimates that memory requirement. It's even worse if someone decided to represent those numbers as Strings (I've seen that).

Beyond that, every single lookup is costly, you have to hash the key to lookup the value and you have to compare the key.

In a POJO, when you say "foo.bar", it's just an offset in memory that Java ends up doing. It's absurdly faster.

Please, for the love of god, if you know the structure of the data you are working with, turn it into your language's version of a struct. Stop using dictionaries for everything.
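The comment above is about Java, but the same "known structure goes in a struct" move can be sketched in Python. This is an illustration only: `Reading` and its two fields are made-up names, and no memory or speed figures are claimed here. Declaring `__slots__` is the assumption that gets closest to fixed-offset fields, since instances then carry no per-object dictionary.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class Reading:
    # Hypothetical record type; __slots__ means instances have no __dict__.
    __slots__ = ("sensor_id", "timestamp_ms")
    sensor_id: int
    timestamp_ms: int

    @classmethod
    def from_dict(cls, d: Dict[str, Any]) -> "Reading":
        # Convert once at the boundary; a missing key raises KeyError here,
        # not somewhere deep inside the program.
        return cls(int(d["sensor_id"]), int(d["timestamp_ms"]))
```

After `Reading.from_dict(row)`, the rest of the code uses `reading.sensor_id` instead of string-keyed lookups.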


I work with PHP, where classes are supposedly a lot slower than strings and arrays (PHP calls dictionaries "associative arrays").

Benchmark it, but from what I can find this is dated advice. It might be faster on first load but it'd surprise me if it's always faster.

Edit: looking into how PHP has evolved, PHP 8 added a JIT in 2020. That will almost certainly make it faster to use a class rather than an associative array. Associative arrays are very hard for a JIT to look through and optimize around.


Obviously one where no-one who cared or knew better had any say.

Python has an "email object" that you should definitely use if you're going to parse email messages in any way.

https://docs.python.org/3/library/email.message.html

I imagine other languages have similar libraries. I would say static typing in scripting languages has arrived and is here to stay. It's a huge benefit for large code bases.


That's for messages. The discussion was about email _addresses_. The former logically makes sense as an object, but the latter can easily be implemented as a raw string, hence the discussion.

What's funny is that this is exactly one of the reasons I happen to like JavaScript... at its core, the type coercion and falsy boolean rules work really well (imo) for ETL type work, where you're dealing with potentially untrusted data. How many times have you had to import a CSV with a bad record/row? It seems to happen all the time. Why? Because people use and manually manipulate data in spreadsheets.

In the end, it's a big part of why I tend to reach for JS/TS first (Deno) for most scripts that are even a little complex to attempt in bash.


Trying to parse email will result in bad assumptions. Better be a plain string than a bad regex.

For example, many websites reject the + character, which is totally valid and which Gmail uses for temporary emails.

Same for addresses.


A lot of posts in this thread are conflating two separate but related topics. Statically typing a string as EmailAddress does not imply validating that the string in question is a valid email address. Both operations have their merits and downsides, but they don't need to be tied together.

Having a type wrapper of EmailAddress around a string with no business logic validation still allows me to take a string I believe to be an email address and be sure that I'm only passing it into function parameters that expect an email address. If I misorder my parameters and accidentally pass it to a parameter expecting a type wrapper of UserName, the compiler will flag it.
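A minimal sketch of a validation-free type wrapper in Python, using `typing.NewType`. The `invite` function is hypothetical. Note the assumption: the mix-up is only caught if a static checker such as mypy is run over the code; at runtime both values are plain `str` and Python itself would accept the swapped call.

```python
from typing import NewType

# Distinct names for the type checker; at runtime both are ordinary strings.
EmailAddress = NewType("EmailAddress", str)
UserName = NewType("UserName", str)


def invite(email: EmailAddress, user: UserName) -> str:
    return f"invite {user} at {email}"


email = EmailAddress("ann@example.com")  # no validation, just a label
user = UserName("ann")

ok = invite(email, user)
# invite(user, email)  # a static checker reports incompatible argument types here
```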


Recently got a bank account which allowed my custom domain during registration, but rejected it as invalid during login. The problem? Their JS client code has a bad regex rejecting TLDs longer than 4 chars (trivial for a dev to bypass, but wow.)

this is likely an ecosystem sort of thing. if your language gives you the tools to do so at no cost (memory/performance) then folks will naturally utilize those features and it will eventually become idiomatic code. kotlin value classes are exactly this and they are everywhere: https://kotlinlang.org/docs/inline-classes.html

Haxe has a really elegant solution to this in the form of Abstracts[0][1]. I wonder why this particular feature never became popular in other languages, at least to my knowledge.

0 - https://code.haxe.org/category/abstract-types/color.html

1 - https://haxe.org/manual/types-abstract.html


Clearly never worked in any statically typed language then.

Almost every project I've worked on has had some sort of email object.

Like I can't comprehend how different our programming experiences must be.

Everything is parsed into objects at the API layer, I only deal with strings when they're supposed to be strings.


Well that's terrifying

My condolences, I urge you to recover from past trauma and not let it prohibit a happy life.

At first I had a negative reaction to that comment and wanted to snap back something along the lines of "that's horrible" as well, but after thinking for a while, I decided that if I have anything to contribute to the discussion, I have to kinda sorta agree with you, and even defend you.

I mean, of course having a string, when you mean "email" or "date" is only slightly better than having a pointer, when you mean a string. And everyone's instinctive reaction to that should be that it's horrible. In practice though, not only did I often treat some complex business-objects and emails as strings, but (hold onto yourselves!) even dates as strings, and am ready to defend that as the correct choice.

Ultimately, it's about how much we are ready to assume about the data. I mean, that's what modelling is: making a set of assumptions about the real world and rejecting everything that doesn't fit our model. Making a neat little model is what every programmer wants. It's the "type-driven design" the OP praises. It's beautiful, and programmers must make beautiful models and write beautiful code, otherwise they are bad programmers.

Except, unfortunately, programming has nothing to do with beauty, it's about making some system that gets some data from here, displays it there and makes it possible for people and robots to act on the given data. A beautiful model is essentially only needed for us to contain the complexity of that system into something we can understand and keep working. The model doesn't truly need to be complete.

Moreover, as everyone with 5+ years of experience must know (I imagine), our models are never complete, it always turns out that assumptions we make are naïve at best. It turns out there was time before 1970, there are leap seconds, time zones, DST, which is up to minutes, not hours, and it doesn't necessarily happen on the same date every year (at least not in terms of Gregorian calendar, it may be bound to Ramadan, for example). There are so many details about the real world that you, brave young 14 (or 40) year old programmer, don't know yet!

So, when you model data "correctly" and turn "2026-02-10 12:00" (or better yet, "10/02/2026 12:00") into a "correct" DateTime object, you are making a hell of a lot of assumptions, and some of them, I assure you, are wrong. Hopefully, it just so happens that it doesn't matter in your case, this is why such modelling works at all.

But what if it does? What if it's the datetime on a ticket that a third party provided to you, and you are providing it to a customer now? And you get sued if it ends up the wrong date because of some transformations that happened inside of your system? Well, it's best if it doesn't happen. Fortunately, no other computations in the system seem to rely on the fact it's a datetime right now, so you can just treat it as a string. Is it UTC? Event city timezone? Vendor HQ city timezone? I don't know! I don't care! That's what was on the ticket, and it's up to you, dear customer, to get it right.

So, ultimately, it's about where you are willing to put the boundary between your model and scary outer world, and, pragmatically, it's often better NOT to do any "type-driven design" unless you need to.


> So, when you model data "correctly" and turn "2026-02-10 12:00" (or better yet, "10/02/2026 12:00") into a "correct" DateTime object, you are making a hell of a lot of assumptions, and some of them, I assure you, are wrong.

I think that's the benefit of strong typing: when you find an assumption is wrong, you fix it in a single place (in this example, the DateTime object).

If your datetime values are stored as strings everywhere in your code:

a) You are going to have a bad day trying to fix a broken assumption in every place storing/using a datetime, and

b) Your wrong assumptions are still baked in, except now you don't have a single place to fix it.


First of all, you are imagining some strawman situation, where indeed that datetime is encoded-decoded all across the codebase. Not keeping your code DRY is an entirely different problem, which doesn't need to happen with this approach any more than if you use a DateTime. I don't mean it theoretically, I mean, really, I was working with codebases, where this approach was taken, and all was fine. You still would have 1 class that works with DTs, it's just that it mostly contains functions of type str → str, and for the rest of the system it's a MySQL-format datetime (i.e. a string). And the point is that your system doesn't try to make any assumptions about that string unless really needed, so you always preserve the original string (which may be completely invalid gibberish for all you care), and while some auxiliary processes might break, you never lose or destroy the original data that you received (usually from some very important 3rd party system, that doesn't care about us, so you cannot really break on input: you must do your best to guess what that input means, and drop whatever you couldn't process yourself into some queue for human processing).

And also, second, this is more specific to this particular example, but when we say "DateTime object" we usually mean "your programming language stdlib DateTime object". Or at least "some popular library DateTime object". Not your "home-baked DateTime object". And I've yet to see a language where this object makes only correct assumptions about real-life datetimes (even only as far, as my own current knowledge about datetime goes, which almost certainly still isn't complete!). And you'd think datetimes are trivial compared to the rest of objects in our systems. I mean, seriously, it's annoying, but I have to make working software somehow, despite the backbone of all of our software being just shit, and not relying on this shit more than I need to is a good rule to follow. Sure, I totally can use whatever broken DateTime objects when the correctness is not that important (they still work for like 99% of use-cases), but when correctness is important, I'd better rely on a string (maybe wrapped as NewType('SpecialDate', str)) that I know won't modify itself, than on stdlib DateTime object.
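A sketch of the "keep the original string, interpret only when needed" approach described above. `TicketDate`, `try_interpret` and `render_ticket` are hypothetical names, and the MySQL-style format string is an assumption about what the auxiliary code expects. The original value is never rewritten; a failed interpretation just yields `None`.

```python
from datetime import datetime
from typing import NewType, Optional

TicketDate = NewType("TicketDate", str)  # hypothetical name; still a plain str at runtime


def try_interpret(d: TicketDate) -> Optional[datetime]:
    """Best-effort reading for auxiliary features; never replaces the original."""
    try:
        return datetime.strptime(d, "%Y-%m-%d %H:%M")
    except ValueError:
        return None


def render_ticket(d: TicketDate) -> str:
    # Shown to the customer verbatim, exactly as the third party sent it.
    return f"Starts: {d}"
```

Input that cannot be interpreted still renders unchanged, and could be routed to a queue for human processing.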


> it feels like sometime after Java got popular [...] a large chunk of the collective programming community forgot why strong static type checking was invented and are now having to rediscover this.

I think you have a very rose-tinted view of the past: while on the academic side static types were intended for proof, on the industrial side they were for efficiency. C didn't get static types in order to prove your code was correct, and it's really not great at doing that, it got static types so you could account for memory and optimise it.

Java didn't help either, when every type has to be a separate file the cost of individual types is humongous, even more so when every field then needs two methods.

> In most strong statically typed languages, you wouldn't often pass strings and generic dictionaries around.

In most strong statically typed languages you would not, but in most statically typed codebases you would. Just look at the Windows interfaces. In fact, while Simonyi's original "apps hungarian" had dim echoes of static types, that got completely washed out in Systems Hungarian, which was used widely in C++, which is already a statically typed language.


> I think you have a very rose-tinted view of the past

I think they also forgot the entire Perl era.


That's understandable. Youthful indiscretion is best forgotten.

I can still remember trying to deal with structured binary data in Perl, just because I didn't want to fiddle around with memory management in C. I'm not sure it was actually any less painful, and I ultimately abandoned that first attempt.

(Decades later, my "magnum opus" has been through multiple mental redesigns and unsatisfactory partial implementations. This time, for sure...)


> You'd naturally gravitate towards parsing/transforming raw data into typed data structures that have guaranteed properties instead to avoid writing defensive code everywhere e.g. a Date object that would throw an exception in the constructor if the string given didn't validate as a date

It's tricky because `class` conflates a lot of semantically-distinct ideas.

Some people might be making `Date` objects to avoid writing defensive code everywhere (since classes are types), but...

Other people might be making `Date` objects so they can keep all their date-related code in one place (since classes are modules/namespaces, and in Java classes even correspond to files).

Other people might be making `Date` objects so they can override the implementation (since classes are jump tables).

Other people might be making `Date` objects so they can overload a method for different sorts of inputs (since classes are tags).

I think the pragmatics of where code lives, and how the execution branches, probably have a larger impact on such decisions than safety concerns. After all, the most popular way to "avoid writing defensive code everywhere" is to.... write unsafe, brittle code :-(


> You'd naturally gravitate towards parsing/transforming raw data into typed data structures that have guaranteed properties instead to avoid writing defensive code everywhere e.g.

There's nothing natural about this. It's not like we're born knowing good object-oriented design. It's a pattern that has to be learned, and the linked article is one of the well-known pieces that helped a lot of people understand this idea.


My experience was that enterprise programmers burned out on things like WSDL at about the same time Rails became usable (or Django if you’re that way inclined). Rails had an excellent story for validating models which formed the basis for everything that followed, even in languages with static types - ASP.NET MVC was an attempt to win Rails programmers back without feeling too enterprisey. So you had these very convenient, very frameworky solutions that maybe looked like you were leaning on the type system but really it was all just reflection. That became the standard in every language, and nobody needed to remember “parse don’t validate” because heavy frameworks did the work. And why not? Very few error or result types in fancy typed languages are actually suited for showing multiple (internationalised) validation errors on a web page.

The bitter lesson of programming languages is that whatever clever, fast, safe, low-level features a language has, someone will come along and create a more productive framework in a much worse language.

Note, this framework - perhaps the very last one - is now ‘AI’.


2 out of 3 problematic bugs I've had in the last two years or so were in statically typed languages where previous developers didn't use the type system effectively.

One bug was in a system that had an Email type but didn't actually enforce the invariants of emails. The one that caused the problem was it didn't enforce case insensitive comparisons. Trivial to fix, but it was encased in layers of stuff that made tracking it down difficult.

The other was a home grown ORM that used the same optional / maybe type to represent both "leave this column as the default" and "set this column to null". It should be obvious how this could go wrong. Easy to fix but it fucked up some production data.

Both of these are failures to apply "parse, don't validate". The former didn't enforce the invariants of the type it had supposedly parsed the data into. The latter didn't differentiate between two different parse results.
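The ORM bug can be sketched in Python as follows, with made-up names throughout (`Unset`, `RowUpdate`, `columns_to_write`). It is not the home-grown ORM from the comment, only an illustration of giving "leave this column at its default" its own type so it can no longer be confused with "set this column to NULL".

```python
from dataclasses import dataclass
from typing import Dict, Optional, Union


class Unset:
    """Hypothetical marker type meaning 'leave this column at its default'."""


UNSET = Unset()


@dataclass
class RowUpdate:
    # Three distinct states: a string, an explicit None (NULL), or UNSET.
    nickname: Union[str, None, Unset] = UNSET


def columns_to_write(u: RowUpdate) -> Dict[str, Optional[str]]:
    cols: Dict[str, Optional[str]] = {}
    if not isinstance(u.nickname, Unset):
        cols["nickname"] = u.nickname  # may be None, meaning an explicit NULL
    return cols
```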


that's a bit of a hairy situation. You're doing it wrong. Or not really, but.. complicated.

As per [RFC 5321](https://www.rfc-editor.org/rfc/rfc5321.html):

> the local-part MUST be interpreted and assigned semantics only by the host specified in the domain part of the address.

You're not allowed to do that. The email address `foo@bar.com` is identical to `foo@BAR.com`, but not necessarily identical to `FOO@bar.com`. If we're going to talk about 'commonly applied normalisations at most email providers', where do you draw that line? Should `foo+whatever@bar.com` be considered equal to `foo@bar.com`? That sounds weird, except - that is exactly how gmail works, a couple of other mail providers have taken up that particular torch, and if your aim is to uniquely identify a 'recipient', you can hardcode that `a@gmail.com` and `a+whatever@gmail.com` definitely, guaranteed, end up at the same mailbox.

In practice, yes, users _expect_ that email addresses are case insensitive. Not just users, even - various intermediate systems apply the same incorrect logic.

This gets to an intriguing aspect of hardcoding types: you mostly lose the flex. Types are still better. The alternative is that you reliably attempt to write the same logic (or at least a call to some logic) to disentangle this mess every time you do anything with a string you happen to know is an email address, which is terrible, but which gives you the option of intentionally not doing that if you don't want to apply the usual logic.

That's no way to program, hence actual types and the general trend that comes with them (namely: we do this right, we write it once, and there is no flexibility left). Programming is too hard to leave room for exotic cases that programmers aren't going to think about when dealing with this concept. And if you do need to deal with it, it can still be encoded in the type, but that then makes visible things that in untyped systems are invisible: if my email type only has a '.compare(boolean caseSensitive)' style method, and is not itself inherently comparable because of the case sensitivity thing, that makes it _seem_ much more complicated than plain old strings. This is a lie: emails in strings *ARE* complicated. They just are. You can't make that go away. But you can hide it, and shoving all data in overly generic data types (numbers and strings) tends to do that.
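A sketch of an email type that makes the case-sensitivity decision explicit, in the spirit of the `.compare(boolean caseSensitive)` method mentioned above. The class and method names are hypothetical, the split on the last `@` is a simplification, and `str.lower()` is an ASCII-minded assumption that does not address internationalised addresses. Passing `eq=False` keeps `==` as plain identity, so callers have to choose a comparison.

```python
from dataclasses import dataclass


@dataclass(frozen=True, eq=False)
class EmailAddress:
    local: str
    domain: str

    @classmethod
    def parse(cls, raw: str) -> "EmailAddress":
        # Simplification: split on the last "@" and require both halves.
        local, sep, domain = raw.rpartition("@")
        if not sep or not local or not domain:
            raise ValueError(f"not an email address: {raw!r}")
        return cls(local, domain)

    def same_mailbox(self, other: "EmailAddress", *, fold_local_case: bool) -> bool:
        # Domains compare case-insensitively; the local part is folded only on request.
        if self.domain.lower() != other.domain.lower():
            return False
        if fold_local_case:
            return self.local.lower() == other.local.lower()
        return self.local == other.local
```

With this shape, `foo@bar.com` and `foo@BAR.com` match either way, while `FOO@bar.com` matches only when the caller opts in to folding the local part.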


These days the world assumes that all parts of emails are case-insensitive, even if RFC5321 says otherwise. If it’s true for Google, Outlook & Apple mail then it’s basically true everywhere & everyone else has to get with the program.

If you don’t want to lose potentially important email then you need to make sure your own systems are case-insensitive everywhere. Otherwise you’ll find out the hard way when a customer or supplier is using a system that capitalises entire email addresses (yes, I have seen this happen) & you lose important messages.


Genuinely curious: Are non-ascii characters also case-insensitive? With Unicode comes different case-sensitivity rules according to Unicode version and locale.

I honestly have no idea!

I strongly suspect the systems that are uppercasing everything were not written to handle unicode in the first place though.


In my experience that's pretty rare. Most people pass around string phone numbers instead of a phonenumber class.

Java makes it a pain though, so most code ends up primitive obsessed. Other languages make it easier, but unless the language and company has a strong culture around this, they still usually end up primitive obsessed.


    record PhoneNumber(String value) {}

Huge pain.

I’m very much a proponent of statically typed languages and primarily work in C#.

We tried “typed” strings like this on a project once for business identifiers.

Overall it worked in making sure that the wrong type of ID couldn’t accidentally be used in the wrong place, but the general consensus after moving on from the project was that the “juice was not worth the squeeze”.

I don’t know if other languages make it easier, but in C# it felt like the language was mostly working against you. For example data needs to come in and out over an API and is in string form when it does, meaning you have to do manual conversions all the time.

In C# I use named arguments most of the time, making it much harder to accidentally pass the wrong string into a method or constructor’s parameter.


In F# you can use a single case discriminated union to get that behaviour fairly cheaply, and ergonomically.

https://fsharpforfunandprofit.com/posts/designing-with-types...


What have you gained?

Without any other context? Nothing - it's just a type alias...

But the context this type of an alias should exist in is one where a string isn't turned into a PhoneNumber until you've validated it. All the functions taking a string that might end up being a PhoneNumber need to be highly defensive - but all the functions taking a PhoneNumber can lean on the assumptions that go into that type.

It's nice to have tight control over the string -> PhoneNumber parsing that guarantees all those assumptions are checked. Ideally that'd be done through domain based type restrictions, but it might just be code - either way, if you're diligent, you can stop being defensive in downstream functions.
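A sketch of that string -> PhoneNumber boundary in Python. The rule used here (optional leading `+`, then 7 to 15 digits after stripping spaces and punctuation) is a made-up assumption for illustration, not a real numbering plan, and `call_number` is a hypothetical downstream function.

```python
import re
from dataclasses import dataclass

# Hypothetical rule, not a real numbering plan: optional "+", then 7 to 15 digits.
_DIGITS = re.compile(r"^\+?[0-9]{7,15}$")


@dataclass(frozen=True)
class PhoneNumber:
    value: str

    @classmethod
    def parse(cls, raw: str) -> "PhoneNumber":
        # Strip spaces, parentheses, dots and dashes before checking the shape.
        cleaned = re.sub(r"[\s().-]", "", raw)
        if not _DIGITS.match(cleaned):
            raise ValueError(f"not a phone number: {raw!r}")
        return cls(cleaned)


def call_number(n: PhoneNumber) -> str:
    # Downstream code leans on the invariant instead of re-checking it.
    return f"dialing {n.value}"
```

All the defensive code lives in `parse`; functions that take a `PhoneNumber` can assume the cleaned, checked form.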


> All the functions taking a string that might end up being a PhoneNumber need to be highly defensive

Yeah, I can't relate at all with not using a type for this after having to write gross defensive code a couple of times e.g. if it's not a phone number you've got to return undefined or throw an exception? The typed approach is shorter, cleaner, self-documenting, reduces bugs and makes refactoring easier.


>But the context this type of an alias should exist in is one where a string isn't turned into a PhoneNumber until you've validated it.

Even if you don't do any validation as part of the construction (and yeah, having a separate type for validated vs unvalidated is extremely helpful), universally using type aliases like that pretty much entirely prevents the class of bugs from accidentally passing a string/int typed value into a variable of the wrong stringy/inty type, e.g. mixing up different categories of id or name or whatever.
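
As a rough illustration of that point, here is a minimal Python sketch using `typing.NewType`. The names `CustomerId`, `OrderId` and `cancel_order` are made up for the example and do not come from the thread. No validation happens here; the only benefit is that a static checker can tell the two kinds of ID apart.

```python
from typing import NewType

# Hypothetical ID types for illustration; neither appears in the thread.
CustomerId = NewType("CustomerId", str)
OrderId = NewType("OrderId", str)


def cancel_order(order_id: OrderId) -> str:
    # As far as a static checker is concerned, only an OrderId is accepted.
    return f"cancelled {order_id}"


customer = CustomerId("c-42")
order = OrderId("o-1001")

print(cancel_order(order))
# cancel_order(customer)  # a checker such as mypy is expected to reject this
```

At runtime a `NewType` value is just the underlying `str`, so this particular flavour of wrapper adds no extra object.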


One issue is that it’s not a type alias but a type encapsulation. This has a cost at runtime; unlike in some functional languages, it’s not a zero-cost abstraction.

Correctness is more important than runtime costs.

In languages like Kotlin and Rust you can have a type encapsulation like this that does not exist at runtime.

Validation, readability, and prevention of accidentally passing in the wrong string (e.g., by misordering two string arguments in a function).

I don't see any validation here.

An explicit type

Obviously the pseudocode leaves a lot to the imagination, but what benefits does this give you? Are you checking that it is 10 digits? Are you allowing for + symbols for the international codes?

Can't pass a PhoneNumber to a function expecting an EmailAddress, for one, or mix up the order of arguments in a function that may otherwise just take two or more strings

You have functions

    void callNumber(string phoneNumber);
    void associatePhoneNumber(string phoneNumber, Person person);
    Person lookupPerson(string phoneNumber);
    Provider getProvider(string phoneNumber);

I pass in "555;324+289G". Are you putting validation logic into all of those functions? You could have a validation function you write once and call in all of those functions, but why? Why not just parse the phone number into an already validated type and pass that around?

    PhoneNumber PhoneNumber(string phoneNumber);
    void callNumber(PhoneNumber phoneNumber);
    void associatePhoneNumber(PhoneNumber phoneNumber, Person person);
    Person lookupPerson(PhoneNumber phoneNumber);
    Provider getProvider(PhoneNumber phoneNumber);

Put all of the validation logic into the type conversion function. Now you only need to validate once from string to PhoneNumber, and you can safely assume it's valid everywhere else.
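
A minimal Python sketch of the same idea, under a loud assumption: the "10 digits after stripping separators" rule is invented for the example, and the real rules depend on the use case, as discussed elsewhere in the thread. `PhoneNumber.parse` and `call_number` are illustrative names.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class PhoneNumber:
    digits: str

    @staticmethod
    def parse(raw: str) -> "PhoneNumber":
        # Assumed rule for this sketch only: strip common separators,
        # then require exactly 10 ASCII digits.
        cleaned = re.sub(r"[\s().-]", "", raw)
        if not re.fullmatch(r"[0-9]{10}", cleaned):
            raise ValueError(f"not a phone number: {raw!r}")
        return PhoneNumber(cleaned)


def call_number(phone: PhoneNumber) -> str:
    # No defensive checks here: the parsing already happened.
    return f"dialing {phone.digits}"
```

Python cannot stop someone calling `PhoneNumber("garbage")` directly, so the guarantee only holds if the codebase always goes through `parse`.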

Remember that the ancestor gave a pointless wrapper class plus the sarcastic remark "Huge pain."

That's going to be up to the business building the logic. Ideally those assumptions are clearly encoded in an easily readable manner but at the very least they should be captured somewhere code adjacent (even if it's just a comment and the block of logic to enforce those restraints).

How to make a crap system that users will hate: Let some architecture astronaut decide what characters should be valid or not.

And parentheses. And spaces (that may, or may not, be trimmed). And all kinds of Unicode-equivalent characters that might have to be canonicalized. Why not treat it as a byte buffer anyway.

If you are not checking that the phone number is 10 digits (or whatever the rules are for the phone number for your use case), it is absolutely pointless. But why would you not?

I would argue it's the other way around. If I take a string I believe to be a phone number and wrap it in a `PhoneNumber` type, and then later I try to pass it in as the wrong argument to a function (say I get the order of name and phone number reversed), it'll complain. Whereas if both name and phone number are strings, it won't complain.

That's what I see as the primary value to this sort of typing. Enforcing the invariants is a separate matter.


What did you lose?

This is an idea that is not ON or OFF

You can get ever so gradually stricter with your types, which means that the operations you perform on a narrow type are even more solid.

It is also 100% possible to do in dynamic languages, it's a cultural thing


I'm not sure, maybe a little bit. My own journey started with BASIC and then C-like languages in the 80s, dabbling in other languages along the way, doing some Python, and then transitioning to more statically typed modern languages in the past 10 years or so.

C-like languages have this a little bit, in that you'll probably make a struct/class from whatever you're looking at and pass it around rather than a dictionary. But dates are probably just stored as untyped numbers with an implicit meaning, and optionals are a foreign concept (although implicit in pointers).

Now, I know that this stuff has been around for decades, but it wasn't something I'd actually use until relatively recently. I suspect that's true of a lot of other people too. It's not that we forgot why strong static type checking was invented, it's that we never really knew, or just didn't have a language we could work in that had it.


Strong static type checking is helpful when implementing the methodology described in this article, but it is beside its focus. You still need to use the most restrictive type. For example, uint, instead of int, when you want to exclude negative values; a non-empty list type, if your list should not be empty; etc.

When the type is more complex, specific constraints should be used. For a real-life example: I designed a type for the occupancy of a hotel booking application. The number of occupants of a room must be positive and a child must be accompanied by at least one adult. My type Occupants has a constructor Occupants(int adults, int children) that verifies that condition on construction (and also some maximum values).
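
A rough Python sketch of such a type, for illustration only. The original was presumably not written in Python, and the maximum values below are assumptions, since the comment does not say what they were.

```python
from dataclasses import dataclass

MAX_ADULTS = 4    # assumed limit, for illustration only
MAX_CHILDREN = 4  # assumed limit, for illustration only


@dataclass(frozen=True)
class Occupants:
    adults: int
    children: int

    def __post_init__(self) -> None:
        # Runs on every construction, so an invalid Occupants never exists.
        if self.adults < 0 or self.children < 0:
            raise ValueError("counts cannot be negative")
        if self.adults + self.children == 0:
            raise ValueError("a room needs at least one occupant")
        if self.children > 0 and self.adults == 0:
            raise ValueError("children must be accompanied by an adult")
        if self.adults > MAX_ADULTS or self.children > MAX_CHILDREN:
            raise ValueError("too many occupants")
```

Any function that receives an `Occupants` can then rely on those rules without re-checking them.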


> The number of occupants of a room must be positive and a child must be accompanied by at least one adult. My type Occupants has a constructor Occupants(int adults, int children) that verifies that condition on construction (and also some maximum values).

Or, you could do what I did when faced with a similar problem - I put in a PostgreSQL constraint.

Now, no matter which application, now or in the future, attempts to store this invalid combination, it will fail to store it.

Doing it in code is just asking for future errors when some other application inserts records into the same DB.

Business constraints should go into the database.
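
To make that concrete, here is a small sketch of a CHECK constraint. It uses SQLite through Python's `sqlite3` module purely so the example is self-contained; the comment above is about PostgreSQL, and the `booking` table and its columns are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE booking (
        id INTEGER PRIMARY KEY,
        adults INTEGER NOT NULL CHECK (adults >= 0),
        children INTEGER NOT NULL CHECK (children >= 0),
        CHECK (adults + children > 0),
        CHECK (children = 0 OR adults >= 1)
    )
    """
)


def try_insert(adults: int, children: int) -> bool:
    """Return True if the database accepted the row, False if a constraint failed."""
    try:
        conn.execute(
            "INSERT INTO booking (adults, children) VALUES (?, ?)",
            (adults, children),
        )
        return True
    except sqlite3.IntegrityError:
        return False
```

Whichever application does the insert, a row with children and no adults is rejected by the database itself.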


Using uint to exclude negative values is one of the most common mistakes, because underflow wrapping is the default instead of saturation. You subtract a big number from a small number and your number suddenly becomes extremely large. This is far worse than e.g. someone having traveled a negative distance.

In C# I use the 'checked' keyword in this or similar cases, when it might be relevant: c = checked(a - b);

Note that this does not violate the "Parse, Don't Validate" rule. This rule does not prevent you from doing stupid things with a "parsed" type.

In other cases, I use its cousin unchecked on int values, when an overflow is okay, such as in calculating an int hash code.
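
The wrapping behaviour and the checked alternative can be sketched in Python, which has no fixed-width integers of its own. `ctypes.c_uint32` stands in for a 32-bit uint here, and both function names are made up for the example.

```python
import ctypes


def wrapping_sub_u32(a: int, b: int) -> int:
    # ctypes does no overflow checking, so the result wraps modulo 2**32.
    return ctypes.c_uint32(a - b).value


def checked_sub_u32(a: int, b: int) -> int:
    # Rough analogue of C#'s checked(a - b): fail loudly instead of wrapping.
    result = a - b
    if not 0 <= result <= 0xFFFFFFFF:
        raise OverflowError(f"{a} - {b} does not fit in an unsigned 32-bit integer")
    return result


print(wrapping_sub_u32(3, 5))  # 4294967294
```

Subtracting 5 from 3 silently becomes a number just under 2**32 in the wrapping version, while the checked version raises.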


It's a design choice more than anything. Haskell's type safety is opt-in — the programmer has to actually choose to properly leverage the type system and design their program this way.

I worked (a long time ago) on a C project where every int was wrapped in a struct. And a friend told me about a C++ project where every index is a uint8 or uint16, and they have to manage many different types of objects, leading to lots of bugs. So it isn't really linked to the language.

> Edit: Changed this from email because email validation is a can of worms as an example

Email honestly seems much more straightforward than dates... Sweden had a Feb 30 in 1712, and there's all sorts of date ranges that never existed in most countries (e.g. the American colonies skipped September 3-13 in 1752).


It’s an ISO standard to use Gregorian dates even for dates predating its invention. If you need to support anything else (I never had to in my Eurocentric work so far), you’ll need to model calendars, similar to how Temporal did for JavaScript: https://tc39.es/proposal-temporal/docs/calendars.html
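
For a concrete case of a date type that parses under that convention: Python's `datetime.date` assumes the Gregorian calendar extended in both directions, so it rejects Sweden's 30 February 1712 and accepts days the American colonies skipped in 1752. A minimal sketch, where `parse_iso_date` is just an illustrative wrapper name:

```python
from datetime import date


def parse_iso_date(text: str) -> date:
    # Raises ValueError for anything that is not a valid YYYY-MM-DD date
    # in the proleptic Gregorian calendar.
    return date.fromisoformat(text)


skipped_in_colonies = parse_iso_date("1752-09-05")  # accepted

try:
    parse_iso_date("1712-02-30")
except ValueError as err:
    print("rejected:", err)
```

Supporting historical calendars would need a separate calendar model on top of this.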

Dates are unfortunate in that you can only really parse them reliably with a TZDB.

I think you're quite right that the idea of "parse don't validate" is (or can be) quite closely tied to OO-style programming.

Essentially the article says that each data type should have a single location in code where it is constructed, which is a very class-based way of thinking. If your Java class only has a constructor and getters, then you're already home free.

Also for the method to be efficient you need to be able to know where an object was constructed. Fortunately class instances already track this information.


And then Clojure enters: let’s keep a few data structures but with tons of methods.

So things stay as maps or arrays all the way through.


this is very much a nitpick, but I wouldn't call throwing an exception in the constructor a good use of static typing. sure, it's using a separate type, but the guarantees are enforced at runtime

I wouldn't call it a good use of static typing, but I'd call it a good use of object-oriented programming.

This is one of the really key ideas behind OOP that tends to get overlooked. A constructor's job is to produce a semantically valid instance of a class. You do the validation during construction so that the rest of the codebase can safely assume that if it can get its hands on a Foo, it's a valid Foo.


Given that the compiler can't enforce that users only enter valid data at compile time, the next best thing is enforcing that when they do enter invalid data, the program won't produce an `Email` object from it, and thus all `Email` objects and their contents can be assumed to be valid.

This is all pretty language-specific and I think people may end up talking past each other.

Like, my preferred alternative is not "return an invalid Email object" but "return a sum type representing either an Email or an Error", because I like languages with sum types and pattern matching and all the cultural aspects those tend to imply.

But if you are writing Python or Java, that might look like "throw an exception in the constructor". And that is still better than "return an Email that isn't actually an email".
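
For comparison, the "return either an Email or an Error" shape can be approximated in Python with a union of two small classes. This is only a sketch: the validation rule is deliberately naive (email validation being the can of worms mentioned earlier), and `ParseError`, `parse_email` and `describe` are names invented for the example.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Email:
    local: str
    domain: str


@dataclass(frozen=True)
class ParseError:
    reason: str


def parse_email(raw: str) -> Union[Email, ParseError]:
    # Deliberately naive rule for the sketch: exactly one "@" with
    # non-empty text on both sides. Real email validation is far messier.
    local, sep, domain = raw.partition("@")
    if not sep or not local or not domain or "@" in domain:
        return ParseError(f"not an email address: {raw!r}")
    return Email(local, domain)


def describe(raw: str) -> str:
    result = parse_email(raw)
    # The caller has to look at which variant came back before using it.
    if isinstance(result, ParseError):
        return f"error: {result.reason}"
    return f"ok: {result.local} at {result.domain}"
```

Unlike an exception, the error case is visible in the return type, so a static checker can insist that callers handle it.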


Ah yeah, I guess I assumed by the use of the term "constructor" that GP meant a language like Python or Java, and in some cases it can be difficult to prevent misuse by making an unsafe constructor private and only providing a public safe constructor that returns a sum type.

I definitely agree returning a sum type is ideal.


I agree and for several reasons.

If you have onerous validation on the constructor, you will run into extremely obvious problems during testing. You just want a jungle, but you also need the ape and the banana.


What big external dependencies do you need for a parser?

`String -> Result<Email, Error>` shouldn't need any other parameters?

But you should ideally still have some simple field-wise constructor (whatever that means, it's language-dependent) anyways, the function from String would delegate to that after either extracting all of the necessary components or returning/throwing an error.


> When we pack high-density information into a data table or a complex dashboard we are increasing the visual entropy of the entire system. Forcing the brain to decode intricate, non-universal shapes in a tiny 16-pixel footprint, creates a “cognitive tax” that users pay en masse every time they scan the table.

What if it's an icon with a simple shape? How does that compare to noising up the table with long phrases and repetitive words? Is the cognitive tax of icons a lot higher or just a little higher? What if it's an app where the user will be using it for hours, so they'll quickly learn what the icons mean and will appreciate the space they save?

Is a tick icon really that big a deal in place of "Task completed"? Or a pencil instead of "Edit"? Sometimes you don't have a choice because of lack of space too. There are always tradeoffs to make. Obviously try to avoid icons that are hard to guess, though that's not always possible.

I can't say I've ever felt tired looking at icons in a table, but when designing I have had the experience of replacing wordy repetitive text with some intuitive icons in a complex table and it suddenly looking less intimidating.


Right, this article overlooks the difference between a first encounter and regular encounters. The concise representation pays off when you do learn it, as long as it's executed well.

And I'm fine with a bit of cognitive exploration to figure out a green check and red X scheme rather than see a whole table column filled up with words like "active" and "inactive". The former allows more columns on screen at once. Horizontal scrolling is a worse impediment to assimilating information from a table.


I would almost always rather have the words; words are things I can easily search for and manipulate using the text-processing tools in my possession.

Personally, my brain "page faults" whenever it has to interpret an emoji, which makes most use of in-line icons far worse than the text they represent. I expect few people have this problem, but I also expect that I'm not the only one with it.


I agree that certain icons that are common parlance can increase cognition (✓ vs. x). However I think expanding a user's icon lexicon and forcing memorization can actually harm cognitive experience.

Our users are context switching across dozens if not hundreds of digital experiences a day. Forcing memory recall is a tax. The question is always "what's the ROI?"

IMO color and words go just as far as an icon without relying on net new visual language.

As per your comment on horizontal scrolling, I couldn't agree more. Horizontal scrolling is booty. However, depending on the job to be done you can avoid overly wide tables with customizable columns, expandable rows, hover states, and strategic truncation.

I certainly would prefer those strategies over relying on a unique icon language that isn't part of the dozen or so immediately recognizable icon schemas already familiar to users.


My gut feel (personal experience, not research) is that the whole of the icons' nature is important. Them having simple shapes doesn't necessarily solve the problem and could in some cases make it worse.

Imagine for example a set of icons that are monochrome, open-ended glyphs comprised of a single stroke with line weight similar to that of the text. This could complicate visual parsing greatly due to high visual similarity to text.

On the other hand, a 16px checkbox control with subtle gradients, shadows, and depth cues looks absolutely nothing like text and is filtered out by the brain almost automatically (unless of course the checkbox state is pertinent to the user's intent). Same goes for a 16px colorful icon with shading like used to be ubiquitous in desktop operating systems.


The box itself around a data table label could hint at a state, if the goal is to define only a handful of states (green rounded capsule for a completed state; diamond capsule for an in-progress condition; red square for an error; purple parallelogram for some special condition; etc).

Not sure how this is for accessibility in terms of colour selection, but I’m sure this could be fine-tuned.


> The rules of the language insist that when you use a nullable variable, you must first check that variable for null. So if s is a String? then var l = s.length() won’t compile. ...

> The question is: Whose job is it to manage the nulls. The language? Or the programmer? ...

> And what is it that programmers are supposed to do to prevent defects? I’ll give you one guess. Here are some hints. It’s a verb. It starts with a “T”. Yeah. You got it. TEST!

> You test that your system does not emit unexpected nulls. You test that your system handles nulls at it’s inputs.

Am I reading or quoting this wrong?

Just some pros of static type checking: you can't forget to handle the null cases (how can you confirm your tests didn't forget some permutation of null variables somewhere?), it's 100% exhaustive for all edge cases and code paths across the whole project, it handholds you while refactoring (changing a field from being non-null to null later in a complex project is going to be a nightmare relying on just tests especially if you don't know the code well), it's faster than waiting for a test suite to run, it pinpoints to the line where the problem is (vs having to step through a failed test), and it provides clear, concise, and accurate documentation (instead of burying this info across test files).

And the more realistic comparison is most programmers aren't going to be writing lots of unhappy path tests for null edge cases any way so you'll be debugging via runtime errors if you're lucky.

Static typing here is so clearly better and less risky to me that I think expecting tests instead is...irresponsible? I try to be charitable but I can't take it seriously anymore if I'm honest.
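
A tiny Python sketch of the null-checking point, assuming a static checker such as mypy is run over the code (Python itself does not enforce this at runtime). `name_length` is a made-up function.

```python
from typing import Optional


def name_length(name: Optional[str]) -> int:
    # Writing `return len(name)` directly is what a static checker such as
    # mypy is expected to reject, because `name` may be None.
    if name is None:
        return 0
    return len(name)
```

The check is demanded at every use site across the project, without anyone having to remember to write a test for each one.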


The idea that tests can replace a type system (and vice versa) is a known fallacy.

Discussed here, two years before this article was written: https://www.destroyallsoftware.com/talks/ideology


A tool for creating CSS color palettes for web UIs that pass WCAG accessibility standards for color contrast, where you can fine tweak all the tints/shades quickly using a hue/saturation/lightness curve editing interface:

https://www.inclusivecolors.com/

Unlike most tools based around autogenerating colors, this is more of an editor that lets you fully customise all the tint/shades to your liking with a focus on accessibility. This is important when you've got existing brand colors to include and want to find accessible color combinations that work together.

Would love feedback in general and especially from designers/devs who have different needs in how they go about creating branded palettes!


This is great! As a non-designer, I've been relying on ChatGPT to select color schemes/palettes for me.

> I've been relying on ChatGPT to select color schemes/palettes for me

Thanks! Any problems you've found with this approach or it's usually good enough?

For me, I couldn't find a tool that would let me customize multiple color scales at once, check they look good together on a mockup, and also be accessible. It's one of those problems where you can autogenerate something that gets you most of the way there, but then for it to be usable you need to see how it looks on designs and fine tweak it.


Have you tried https://huetone.ardov.me/? Multiple color scales, P3, export to CSS and figma, as well as APCA & WCAG for accessibility.

So for my tool, I really need the live UI mockup without having to export first to tweak the colors until they work (e.g. often the off-white/very-light colors used for backgrounds are too vibrant otherwise), the control-point based curve editing helps to explore hue/saturation/lightness curves around a brand color without a lot of clicking, and I want the option for palettes where each color scale follows the same steps in lightness (for predictable contrast between steps from different color scales).

Barely any designers I work with know about P3 colors (feels like P3 mostly appeals to developers right now, for programmatic reasons?), so I'm not that interested in P3 if it means using OKLCH with its intimidating looking color picker. My tool uses HSLuv, which looks familiar like an HSL color picker, where unlike HSL only the lightness slider alters the WCAG contrast, so HSLuv (while limited to sRGB) is great for exploring accessible colors.

I've actually got support for APCA, but I find many struggle understanding WCAG contrast requirements already. There's Figma export too.

Anyway, there's lots of overlap between different color tools but the small details are important for different workflows and needs. I've started to realise too that most designers need a lot of introduction into building (accessible) color palettes in general so it's a tricky puzzle between adding features and trying to keep it simple, which is why I'm very open to suggestions!


Location: Edinburgh, UK

Remote: Yes (I’m used to time zone differences and async work)

Willing to relocate: No

Technologies: Figma, Sketch, TypeScript, JavaScript, Vue, Hugo, Jekyll, WordPress, Django, HTML/CSS, Bootstrap, Tailwind, OCaml, Java, Python, C, analytics, WCAG accessibility, website SEO/speed optimisation.

Résumé/CV: See https://seanw.org/ for portfolio, and https://checkbot.io/ and https://inclusivecolors.com/ for live example projects

Email: sw@seanw.org

---

SEEKING FREELANCE WORK | UX/UI & web design

I help startups with the UX/UI and web design of their products. This includes web apps, websites, landing pages, copywriting, and I can assist with frontend development where needed. My background of launching my own products and being a full stack developer helps me create practical designs that balance usability, aesthetics, development effort, and performance. I work to fixed price quotes for self-contained projects.

---

The best live example of my work is Checkbot (https://checkbot.io/), a browser extension that tests websites for SEO/speed/security problems. The entire project is my own work including coding the extension itself, UX/UI design, website design (the homepage is optimised to load in 0.7 seconds, 0.3MB data transferred), marketing, website copy, and website articles on web best practices.

[ Rated 4.9/5, 80K+ active users, 100s of paying subscribers ]

---

I have 10+ years of experience, including a PhD in software verification and 5+ years working for myself helping over 25 companies including Just Eat, Triumph Motorcycles and Fogbender (YC W22). See my website for testimonials, portfolio and more: https://seanw.org

Note: For large projects, my partner usually assists me in the background (I’m working on starting a design studio with her in the future)

---

Email sw@seanw.org with a short description of 1) your project 2) how you think I can help 3) the business outcome you’re looking for and 4) any deadlines. I can get back to you in one working day to arrange a call to discuss a quote and how we can work together!


There's also the significant cost to climate change because growing crops to feed to animals instead of eating crops directly loses the majority of calories, but it gets ignored because doing something about it is going to be unpopular:

https://ourworldindata.org/global-land-for-agriculture

> More than three-quarters of global agricultural land is used for livestock, despite meat and dairy making up a much smaller share of the world's protein and calories.

> Despite the vast land used for livestock animals, they contribute quite a small share of the global calorie and protein supply. Meat, dairy, and farmed fish provide just 17% of the world’s calories and 38% of its protein.

https://ourworldindata.org/land-use-diets

> Livestock are fed from two sources – lands on which the animals graze and land on which feeding crops, such as soy and cereals, are grown. How much would our agricultural land use decline if the world adopted a plant-based diet?

> Research suggests that if everyone shifted to a plant-based diet, we would reduce global land use for agriculture by 75%.


> but it gets ignored because doing something about it is going to be unpopular

It gets talked about all the time.


Do you think it widely leads to behavior changes among people that support environmental causes?


In the US 8 out of the top 10 environmental organizations with most membership oppose nuclear power broadly and the majority oppose wind and solar locally so I think we can safely conclude that climate change is not important to US environmental causes.

The primary work by US environmentalists (or at least the popular ones) is in ensuring rich people’s homes abut publicly-maintained parks.


What are you using as the "top 10 environmental organizations" and which of them oppose wind/solar?


Not really; but talking about it more also seems like it will have approximately zero marginal benefit, and trying to insinuate that other people are immoral is probably net counterproductive.


If the goal of the post is to pick terminal colors that contrast on both white/light and black/dark backgrounds, it means you're stuck with midtone colors (between light and dark). This is really limiting for color choice (there's no such thing as "dark yellow" for example), and lowers the maximum contrast you can have for text because you get the best contrast when one color is dark and the other is light.

Ideally, instead of the CLI app switching to "bright green", it would pick a "bright contrasting green". So if the terminal background was dark, it would pick bright green, and for light background it would pick a darker green. Aren't there CLI app implementations for this? This is similar to how you'd implement dark mode in a web app.


> Ideally, instead of the CLI app switching to "bright green", it would pick a "bright contrasting green". So if the terminal background was dark, it would pick bright green, and for light background it would pick a darker green. Aren't there CLI app implementations for this? This is similar to how you'd implement dark mode in a web app.

The responsibility for this lies with the color scheme not the terminal program.


CLI apps can detect the background color of the terminal, and determine contrasting colors accordingly.


They can? Is this a recent thing? I remember wanting to detect the background colour years ago, and not finding any way to do it.


It's not recent, and most terminals support it. You send an escape sequence to the terminal, and get back a sequence that tells you the exact background color.
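
A hedged sketch of the reply-handling half of that, in Python. It assumes the xterm-style OSC 11 query and a reply of the form `rgb:RRRR/GGGG/BBBB`; actually sending the query and reading the reply needs a real terminal in raw mode and is left out. The function names and the 0.5 brightness cutoff are made up for the example.

```python
import re
from typing import Optional, Tuple

# OSC 11 query string; sending it and reading the reply needs a real
# terminal in raw mode, which this sketch leaves out.
QUERY_BACKGROUND = "\x1b]11;?\x07"

_REPLY = re.compile(r"\x1b\]11;rgb:([0-9a-fA-F]+)/([0-9a-fA-F]+)/([0-9a-fA-F]+)")


def parse_background(reply: str) -> Optional[Tuple[float, float, float]]:
    """Return (r, g, b) scaled to 0..1, or None if the reply is unrecognised."""
    match = _REPLY.search(reply)
    if match is None:
        return None
    # Scale each channel by the maximum value for its number of hex digits.
    r, g, b = (int(h, 16) / (16 ** len(h) - 1) for h in match.groups())
    return (r, g, b)


def pick_green(background: Tuple[float, float, float]) -> str:
    r, g, b = background
    # Crude brightness estimate, only used to decide light vs dark.
    brightness = 0.2126 * r + 0.7152 * g + 0.0722 * b
    # SGR 92 is bright green, SGR 32 is normal green.
    return "\x1b[92m" if brightness < 0.5 else "\x1b[32m"
```

How dark "normal green" actually looks still depends on the user's color scheme, which is the other side of the argument in this thread.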


Huh, indeed. I still can't find much information about this, but this page is very informative: https://jwodder.github.io/kbits/posts/term-fgbg/


That's called `\e[0;92m`, aka the ANSI terminal escape sequence for bright green. You have 15 others, that will be displayed however the terminal's user wants. They're already available in most terminal color libraries, too.


I wish one of those regex libraries that replaces the regex symbols with human readable words would become standard. Or they don't work well?

Regex is one of those things where I have to look up to remind myself what the symbols are, and by the time I need this info again I've forgotten it all.

I can't think of anywhere else in general programming where we have something so terse and symbol heavy.


It’s been done. Emacs, for example, has rx notation. From the manual:

    35.3.3 The ‘rx’ Structured Regexp Notation
    ------------------------------------------
    
    As an alternative to the string-based syntax, Emacs provides the
    structured ‘rx’ notation based on Lisp S-expressions.  This notation is
    usually easier to read, write and maintain than regexp strings, and can
    be indented and commented freely.  It requires a conversion into string
    form since that is what regexp functions expect, but that conversion
    typically takes place during byte-compilation rather than when the Lisp
    code using the regexp is run.
    
       Here is an ‘rx’ regexp(1) that matches a block comment in the C
    programming language:
    
         (rx "/*"                    ; Initial /*
             (zero-or-more
              (or (not "*")          ;  Either non-*,
                  (seq "*"           ;  or * followed by
                       (not "/"))))  ;     non-/
             (one-or-more "*")       ; At least one star,
             "/")                    ; and the final /
    
    or, using shorter synonyms and written more compactly,
    
         (rx "/*"
             (* (| (not "*")
                   (: "*" (not "/"))))
             (+ "*") "/")
    
    In conventional string syntax, it would be written
    
         "/\\*\\(?:[^*]\\|\\*[^/]\\)*\\*+/"

Of course, it does have one disadvantage. As the manual says:

       The ‘rx’ notation is mainly useful in Lisp code; it cannot be used in
    most interactive situations where a regexp is requested, such as when
    running ‘query-replace-regexp’ or in variable customization.

Raku also has advanced the state of the art considerably.
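
Short of a word-based notation, some regex engines at least allow whitespace and comments inside the pattern. As a sketch, here is the same C block comment regex written with Python's `re.VERBOSE` flag; the constant name is arbitrary.

```python
import re

C_BLOCK_COMMENT = re.compile(
    r"""
    /\*              # initial /*
    (?:
        [^*]         # either a non-*,
      | \*[^/]       # or a * followed by a non-/
    )*
    \*+              # at least one star,
    /                # and the final /
    """,
    re.VERBOSE,
)

match = C_BLOCK_COMMENT.search("int x; /* counter */ int y;")
print(match.group(0) if match else None)
```

The symbols are still the terse ones, but each piece can be laid out and annotated the way the rx example is.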


I know what you meant, but WCAG2 is actually flawed for dark mode. For gray body text on black, going with the minimum 4.5:1 ratio is hard to read. APCA attempts to fix this https://git.apcacontrast.com/documentation/APCA_in_a_Nutshel....
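
For reference, the WCAG 2 ratio being discussed can be computed like this. The sketch follows the relative luminance and contrast ratio definitions as published in WCAG 2.0; the function names are my own, and it does not implement APCA.

```python
from typing import Tuple

RGB = Tuple[int, int, int]


def _linear(channel: int) -> float:
    # sRGB channel (0-255) to linear light, with the threshold used in WCAG 2.0.
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(color: RGB) -> float:
    r, g, b = (_linear(v) for v in color)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(a: RGB, b: RGB) -> float:
    lum_a, lum_b = relative_luminance(a), relative_luminance(b)
    lighter, darker = max(lum_a, lum_b), min(lum_a, lum_b)
    return (lighter + 0.05) / (darker + 0.05)


print(round(contrast_ratio((255, 255, 255), (0, 0, 0)), 2))  # 21.0
```

The formula only looks at the two luminance values, so it treats light-on-dark and dark-on-light text identically, which is the symmetry APCA moves away from.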


> They act as stand-ins for actual users and will flag all sorts of usability problems.

I think everyone on the team should get involved in this kind of feedback because raw first impressions on new content (which you can only experience once, and will be somewhat similar to impatient new users) is super valuable.

I remember as a dev flagging some tech marketing copy aimed at non-devs as confusing and being told by a manager not to give any more feedback like that because I wasn't in marketing... If your own team that's familiar with your product is a little confused, you can probably x10 that confusion for outside users, and multiply that again if a dev is confused by tech content aimed at non-devs.

I find it really common as well that you get non-tech people writing about tech topics for marketing and landing pages, and because they only have a surface level understanding of the tech the text becomes really vague with little meaning.

And you'll get lots of devs and other people on the team agreeing in secret that e.g. the product homepage content isn't great but are scared to say anything because they feel they have to stay inside their bubble and there isn't a culture of sharing feedback like that.

