Email: Explained from first principles

albertgoeswoof · on Feb 19, 2023

If you’re going to learn anything, email is a good bet. It’s barely changed in 30 years, and I suspect it will not change much for a long time.

I built my own transactional email provider (https://mailpace.com), and I wish I had found this link before I started.

I would also recommend reading the email RFCs, they’re not that difficult to understand and the history explains a lot. Email aside we can learn a lot from this kind of distributed, decentralised system design in the future.

klabb3 · on Feb 19, 2023

> Email aside we can learn a lot from this kind of distributed, decentralised system design in the future.

Nit for other people curious about email: email isn’t decentralized by the popularized use of the term, both in theory and more so in practice.

Email relies on DNS, which is federated. That doesn’t mean that it’s worse; federation is a proven-to-work trade-off which provides many of the benefits of decentralization (arguably the most important parts).

As for practice, email relies on opaque reputation systems to prevent abuse, which acts as a centralizing force. I’m sure you know (better than me) that the barrier of entry to sending email on your own is quite a bit above zero.

dwheeler · on Feb 19, 2023

Few things are perfectly anything, and email is way more distributed than (say) Twitter messages.

In particular, I own my own domain (which costs little), and then set up where my email goes. If I want to change email providers, I can do it without having to track down everyone who knew my old email address. Similarly, if I don't like my domain registrar, I can change that too.

Being able to control my own destiny, and not being completely under the control of someone else, is in my opinion the real goal. Distributed & decentralized systems at least hold out the promise of meeting that goal, and where they do, I'm delighted. It's not perfect, but it beats many other systems.

account42 · on Feb 22, 2023

To add to that, while DNS is centralized at the root, that centralization is unavoidable if you want to have human-readable addresses that consistently resolve to the correct recipient. And each TLD is (at least the ccTLDs and the old gTLDs) managed by a government or international organization acting on behalf of governments, which in turn (at least in theory) act on your behalf. While this system is not perfect (see e.g. the US government's history of seizing domains) it is almost infinitely better than most other communication mediums on the internet where the single company operating it can kick you out more or less for arbitrary reasons and without recourse.

That alone makes this system precious enough that any pains from running your own mail server are IMHO worth it if it means that we can preserve Email as we know it - as a federated system not under the control of a single or a small number of corporations. Unfortunately, Email is at risk. For a lot of people it is already synonymous with Gmail and while Gmail is hardly the worst when it comes to deliverability (IME that crown goes to Microsoft) they do control enough accounts that they can pretty much nuke your little server if they want to.

lisper · on Feb 19, 2023

> the barrier of entry to sending email on your own is quite a bit above zero

I can attest to this. I ran my own email server with no problem for fifteen years. Then my provider was acquired, and I decided to switch to a different provider, and so the IP address of my server changed for the first time in 15 years. I have not been able to reliably send email since, and I finally threw in the towel and signed up with Fastmail to handle my outgoing email.

The biggest problem is people banning entire blocks of IP addresses. This can give you a bad reputation even if your behavior is stellar. All it takes for you to get blacklisted is for someone using the same hosting provider to be a spammer.

The whole situation sucks big fat honking weenies.

klabb3 · on Feb 20, 2023

Thanks for the data point. That sucks.

I guess generally speaking, federation works well for DNS itself but not so well for email, since the malicious stuff you can do with DNS is greatly limited. Abuse/spam is everywhere, not just email, and the sad part is that providers not only went and rolled their own anti abuse systems, but they also went to great lengths to keep them secret.

Without solving the spam problem, we won’t see decentralized or even federated systems widely popularized for any product which has these incentives. This includes Mastodon, Nostr etc. The worst part is that projects can become pretty big before the spam starts, so it kills projects at the worst possible time. I wish the FOSS community came together and focused on finding solutions that work, because decent abuse prevention today is neither easy nor cheap to get right. And going without one is a ticking time bomb.

lisper · on Feb 20, 2023

No, it's not a fundamental problem. It's just laziness on the part of many email providers. My email server has proper SPF records, and so anyone whose email server is configured properly can easily verify that my server is authorized to send email on behalf of my domain(s). But email admins just subscribe to blacklists and follow them blindly. There are a few big players, including Microsoft and AT&T, who just don't care if a few small players get blocked by false positives. (Ironically, despite their tolerance for false positives, they still let through an enormous amount of spam. I know this because my parents have their email on AT&T.)

franga2000 · on Feb 20, 2023

Spam and forgery aren't the same. I can be 1000% sure that you own the sending domain, that the mail came from an authorized server, etc., but that tells me absolutely nothing about whether I want to read what you're sending. Spammers used to be lazy and not configure the whole stack, so that became a simple heuristic, but these days it's nearly trivial to set up automatic provisioning of new domains and IPs with full SPF+DKIM+DMARC setups, so the big players went back to blacklists and IP reputation.

Spam definitely is a fundamental problem of all open-inbox systems like email.

lisper · on Feb 20, 2023

Yes, that is certainly true, but in this case it is irrelevant because it is my server's IP address that is being banned. I am sending email to people with whom I have been corresponding for years (like my parents) and it is being blocked simply because the IP address of my server changed.

klabb3 · on Feb 20, 2023

Parent poster here. Assuming malicious actors can easily set up all the fancy acronyms to verify the sender’s identity, how should spam be prevented if not through IP reputation (and whatever else they have today)? Should the reputation apply to the domain name instead? Would that protect against Sybil attacks?

To my knowledge, these systems are proprietary, obscure secret sauce heuristics. My point is that we should find a way to do spam prevention in an open and easy to implement way. Such an algorithm could then be applied to any open-inbox system, not just email.

lisper · on Feb 20, 2023

> Assuming malicious actors can easily set up all the fancy acronyms to verify the sender’s identity

How is a malicious actor going to set up a fake SPF record for my domain without compromising my DNS account?

It makes sense to block a domain even with a correct SPF record. It makes no sense to block an IP with a correct SPF record for a domain that you already trust. And, like I said, I'm getting bounced mail from people I've been corresponding with for years, so I know the recipient email providers do trust the domain. They're just being stupid with IP blacklisting.
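For readers who have not looked at one: the SPF check discussed here boils down to comparing the connecting IP address against the mechanisms a domain publishes in a TXT record. The following is a minimal Python sketch under stated assumptions, not a real evaluator: the record is made up, only `ip4:` and `ip6:` mechanisms are handled, and the function name `spf_allows` is hypothetical. A full RFC 7208 implementation also has to follow `include:`, `a`, `mx` and `redirect=` via DNS.

```python
import ipaddress

def spf_allows(record: str, sender_ip: str) -> bool:
    """Return True if sender_ip matches an ip4:/ip6: mechanism in record.

    Simplified on purpose: ignores include:, a, mx, redirect= and explicit
    qualifiers, so this is not a full RFC 7208 evaluator.
    """
    terms = record.split()
    if not terms or terms[0].lower() != "v=spf1":
        return False
    ip = ipaddress.ip_address(sender_ip)
    for term in terms[1:]:
        name, _, value = term.partition(":")
        if name.lower() in ("ip4", "ip6") and value:
            network = ipaddress.ip_network(value, strict=False)
            if ip.version == network.version and ip in network:
                return True
    return False

# Hypothetical record built from documentation address ranges.
EXAMPLE_RECORD = "v=spf1 ip4:192.0.2.0/24 ip6:2001:db8::/32 -all"

print(spf_allows(EXAMPLE_RECORD, "192.0.2.25"))
print(spf_allows(EXAMPLE_RECORD, "198.51.100.7"))
```

Note what the check does and does not establish: a match says the IP is authorized to send for the domain, and nothing about the reputation of that IP, which is the distinction this subthread is arguing over.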

KasparEtter · on Feb 20, 2023

Just to be clear: Due to email forwarding, having an SPF record isn't enough for domain authentication; you also need DKIM and DMARC. Mailbox providers want to deliver emails that people actually want to get, so not relying only on SPF records is a reasonable policy. But yes, thanks to domain authentication, we should be able to move away from IP reputation to domain reputation – at least in theory.

efreak · on Feb 20, 2023

And setting up DMARC opens you up to an entirely new type of spam: corporate networks emailing you every time someone spoofs you. I had it set up for a short time before I quickly turned it back off.

KasparEtter · on Feb 21, 2023

For one thing, you can configure a DMARC policy without a reporting address. For another thing, you can use third-party services, such as https://dmarc.postmarkapp.com/, to aggregate DMARC reports for you (if you're fine with the privacy implications of that).
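To make the "policy without a reporting address" point concrete, here is a small Python sketch that splits a DMARC TXT record (published at `_dmarc.<domain>`) into its tags. Both records are hypothetical, and `parse_dmarc` is an illustrative helper rather than part of any library. The `rua` tag is the aggregate-report address and `ri` is the requested report interval in seconds; leaving `rua` out means no reports are requested.

```python
def parse_dmarc(record: str) -> dict[str, str]:
    """Split a DMARC TXT record into a tag -> value dict."""
    tags = {}
    for part in record.split(";"):
        key, sep, value = part.partition("=")
        if sep:
            tags[key.strip().lower()] = value.strip()
    return tags

# Hypothetical records for example.com; neither is taken from a real domain.
WITHOUT_REPORTS = "v=DMARC1; p=quarantine"
WITH_REPORTS = "v=DMARC1; p=quarantine; rua=mailto:dmarc@example.com; ri=86400"

for record in (WITHOUT_REPORTS, WITH_REPORTS):
    tags = parse_dmarc(record)
    print(tags["p"], "reports to:", tags.get("rua", "(nobody)"))
```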

LeonM · on Feb 21, 2023

You can also just set up DMARC without a reporting endpoint. But DMARC aggregate reports are very useful, so I wouldn't recommend using DMARC without reporting. Also, you do not receive a report 'every time someone spoofs you', but rather periodically, at an interval which you can even configure.

That said, DMARC aggregate reports are not supposed to be human readable. You don't want to set the reporting endpoint to your personal inbox. You need a DMARC aggregation tool, such as included in https://www.mailhardener.com to process them. (full disclosure: I work there)
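To illustrate why these reports want tooling rather than an inbox: they arrive as XML attachments. The sketch below tallies message counts per source IP and disposition using only the Python standard library. The sample document is hand-written and heavily trimmed, following the element names of the RFC 7489 aggregate-report layout as I understand it. Real reports carry more sections, usually arrive compressed, and may use an XML namespace that this sketch does not handle.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hand-written, trimmed sample; not output from any real mailbox provider.
SAMPLE_REPORT = """\
<feedback>
  <record>
    <row>
      <source_ip>192.0.2.25</source_ip>
      <count>12</count>
      <policy_evaluated>
        <disposition>none</disposition><dkim>pass</dkim><spf>pass</spf>
      </policy_evaluated>
    </row>
  </record>
  <record>
    <row>
      <source_ip>198.51.100.7</source_ip>
      <count>3</count>
      <policy_evaluated>
        <disposition>reject</disposition><dkim>fail</dkim><spf>fail</spf>
      </policy_evaluated>
    </row>
  </record>
</feedback>
"""

def summarize(report_xml: str) -> Counter:
    """Count messages per (source_ip, disposition) across all records."""
    totals = Counter()
    for row in ET.fromstring(report_xml).iter("row"):
        key = (
            row.findtext("source_ip"),
            row.findtext("policy_evaluated/disposition"),
        )
        totals[key] += int(row.findtext("count", default="0"))
    return totals

for (ip, disposition), count in summarize(SAMPLE_REPORT).items():
    print(f"{ip}: {count} message(s), disposition={disposition}")
```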

icedchai · on Feb 21, 2023

I have some procmail rules set up that send most of that stuff to a different mailbox that I never look at.

lisper · on Feb 20, 2023

See my response here: https://news.ycombinator.com/item?id=34869796

account42 · on Feb 22, 2023

IMO the mail spam problem is overstated. For technically inclined people, sorting through the trash does not take a significant amount of time (at least it doesn't for me) and people don't complain as much about physical mail spam or ad spam being everywhere - or at least demand laws going after the spammers instead of dropping mail based on heuristics.

imetatroll · on Feb 20, 2023

As someone who also tries to run his own mail server, this makes me laugh ;)

hackernews1134 · on Feb 20, 2023

For those who did not immediately grasp the parent's claim that DNS is federated and not decentralized, here [0] is a nice article I found that talks about the network architecture terminology used in various famous research papers. It is a very nice, terse compare-and-contrast.

[0] https://networkcultures.org/unlikeus/resources/articles/what...

Thank you for sending me down the rabbit hole klabb3! :)

codetrotter · on Feb 19, 2023

> Email relies on DNS which is federated

To a great extent, yes.

If I give my public email address to someone and expect them to be able to send me email from anywhere at anytime, they will almost certainly have to rely on the DNS that exists like we know it.

But the situation can be slightly different in various ways:

- If our mail servers have globally routable IP-addresses permanently assigned to them, and are set up to accept mails where the IP address is used directly, we could use the IP addresses instead of domain names. But that's still similar to the situation with domain names, since it's still ICANN at the top for that too.

- In a LAN or a VPN we could exchange email between hosts using either DNS that is completely under our control, or we could use mDNS on it which is also under our control, or in the case of either really small networks or a short time span we could even rely on the IP addresses our mail servers have on said LAN or VPN and send mail and again that would be completely under our own control as well.

- Since Tor offers a base for TCP connections, every TCP based protocol can operate on top of it. So if we both use Tor connected email servers then we could exchange emails that way. Our email clients could connect to our email servers over clearnet without having to be aware of Tor, and we could use email addresses like bob@fastrcl5totos3vekjbqcmgpnias5qytxnaj7gpxtxhubdcnfrkapqad.onion and [email protected]

In these ways, e-mail could still be regarded as decentralized. It is also decentralized in a simpler sense: unlike, for example, Facebook, where one company owns the whole system and decides who gets to have an account, you and I can always find another e-mail provider to give us a new account, and we can communicate with others from those new accounts.

And in fact, if we were willing to do it, we could change how we set up our e-mail servers so that we have the server accept mail from any other e-mail server but with the condition that the mail was encrypted with a known PGP key. Now imagine that you and I set up our mail servers like this and we exchange PGP keys. Then even if I lose access to my original email account that I had told you about, I could mail you from a different account with another provider using the same PGP key, and your server would recognise that the mail was from me and accept it and then you could respond back. But what if we both lose access to our accounts at the same time? If we plan ahead we could make not just one account each but a number of accounts on a number of different mail servers that we both tell each other about. That would give us quite a lot of wiggle room.

OJFord · on Feb 20, 2023

Seconded, especially re RFCs. I built/am building on the client/personal user side, and agree they're really accessible. Of course what I've done is still riddled with bugs, and my hours spent on it are few and far between, but it's fun to build something to such a common protocol; have it work and interact with others.

I was finally pushed into getting it off the ground when Fastmail stopped working the way I relied on it (niche/awkward/not documented as explicitly working) a week before my renewal - so I spent long nights getting something usable off the ground with that as a deadline. Extremely satisfying. (Then Fastmail started working as it had again, a day or so before my time was up! >_<)

KasparEtter · on Feb 20, 2023

Out of curiosity: What's the "feature" you needed; and what for?

OJFord · on Feb 20, 2023

I can't remember exactly what aspect broke - though I think I have notes on it somewhere because I intended to blog a bit about it (not to hate on Fastmail, just to explain what I was doing and how I fixed it) - but it was something to do with relaying email, i.e. delivering it 'envelope to' my Fastmail address, but addressed to another.

NetOpWibby · on Feb 19, 2023

Thanks for sharing, I think I'll use this over Postmark. Just spent ~10 minutes looking over the site and the blog. Seems super solid.

layer8 · on Feb 19, 2023

I love such thorough expositions, although it isn’t quite as thorough as I would like. Two examples that caught my attention in the parts I read:

1. It doesn’t mention the old convention for writing an email address with display name as

  [email protected] (John Doe)

instead of:

  John Doe <[email protected]>

2. It states that subject prefixes like “Re:” have no technical relevance. That’s not entirely true, because email clients recognize existing prefixes when replying/forwarding, in order to not add a redundant one. There are several issues here:

- Localized email software sometimes uses a different prefix than “Re”, based on the local language. This is an issue when having an email thread between email clients who don’t recognize each other’s local-language prefixes. (Arguably, it would be better for everyone to stick with “Re” regardless of language.)

- Some email clients have the convention of adding a count to the “Re”, e.g. “Re[2]:”, “Re[3]:”, and so on. When you write an email client, you may want to consider recognizing those.

Due to the variety in subject prefixes, some email clients allow users to configure a regex.
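A rough Python sketch of the kind of regex such a client might use. The prefix list is an assumption limited to prefixes that appear in this thread (Re, AW, SV, RIF), and the names `strip_reply_prefixes` and `reply_subject` are hypothetical. It strips any chain of recognized prefixes, including counted forms like "Re[2]:", before adding a single "Re: ".

```python
import re

# Assumed prefix list; extend it for the languages your users write in.
REPLY_PREFIXES = ("re", "aw", "sv", "rif")
PREFIX_RE = re.compile(
    r"^\s*(?:" + "|".join(REPLY_PREFIXES) + r")(?:\[\d+\])?\s*:\s*",
    re.IGNORECASE,
)

def strip_reply_prefixes(subject: str) -> str:
    """Remove every leading reply prefix, including counted ones like Re[2]:."""
    previous = None
    while previous != subject:
        previous = subject
        subject = PREFIX_RE.sub("", subject, count=1)
    return subject

def reply_subject(subject: str) -> str:
    """Build the subject for a reply with exactly one "Re: " in front."""
    return "Re: " + strip_reply_prefixes(subject)

print(reply_subject("Re[2]: AW: SV: Lunch?"))
print(reply_subject("Regarding: the invoice"))
```

The second call is there to show that a word that merely starts with "Re" is left alone, because the pattern requires the colon right after the prefix (or its count).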

KasparEtter · on Feb 20, 2023

Hi layer8, thanks for the feedback! I must say that your standards for thoroughness are pretty high. Since I claimed at the very top that my article covers all aspects of modern email, I don't mind your criticism at all.

I haven't thought about the prefix chaining issue, and I'm happy to mention this in a future revision of the article. I would still argue that this is below a reasonable level of technical significance as neither conversation grouping nor message delivery is affected by it. It's more like displaying "(No subject)" instead of actually displaying no subject.

Do you have any source for what you say is an old display name convention? I've just checked the standards, and as far as I can tell after a quick glance, RFC 822 (https://datatracker.ietf.org/doc/html/rfc822#section-6) doesn't mention display names at all, and its successor RFC 2822 (https://datatracker.ietf.org/doc/html/rfc2822#section-3.4) mentions display names only with angle brackets.
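One way to see the older form in the wild is that mail libraries still accept it. The short Python sketch below feeds both spellings to the standard library's `email.utils.parseaddr`; the addresses are hypothetical and use a reserved example domain. As far as I can tell, the parenthesised part is an RFC 822 comment that parsers conventionally treat as the name, which might explain why the standards do not describe it as a display name.

```python
from email.utils import parseaddr

# Hypothetical addresses on a reserved example domain.
old_style = "jdoe@example.org (John Doe)"
new_style = "John Doe <jdoe@example.org>"

for raw in (old_style, new_style):
    display_name, address = parseaddr(raw)
    print(f"{raw!r} -> name={display_name!r}, address={address!r}")
```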

PS: I have quite a few topics on my todo list, which I should add in a future revision of the article in order to live up to the claim of covering all aspects of modern email. These include MAPI, Microsoft's autodiscover mechanism, direct mailbox addressing, domain-to-domain encryption, and link rewriting in incoming mails. Some information is also no longer up to date by now.

sbuk · on Feb 19, 2023

Email clients don’t really have a spec to follow other than parsing messages that are received via POP or more likely IMAP these days (not forgetting MAPI either). Adding FWD or RE to a subject field doesn’t appear anywhere in the email message RFCs, starting with 822. As such they are undocumented conventions or extensions to the specs, and as the article points out technically irrelevant - no MDAs, MTAs or MUAs that do not “support” features that utilise these conventions will fail to parse a message.

Alex3917 · on Feb 20, 2023

To plug my own software, I've been working on an API to normalize raw email messages so that e.g. they can be easily displayed within web apps.

https://github.com/fwdeveryone/email-parsing-api

Right now building even the simplest products on email takes several years because of needing to support all of these undocumented "features", but I'm trying to make it no more difficult than building any other Django or Rails app.

codetrotter · on Feb 19, 2023

> This is an issue when having an email thread between email clients who don’t recognize each other’s local-language prefixes.

https://en.wikipedia.org/wiki/List_of_email_subject_abbrevia...

RE:SV:RE:FWD:AW:FWD:AW:FWD:REF:RIF:Verbale della riunione, venerdì 7

:p

But yeah, I think the big email clients are good at recognizing these across different languages.

account42 · on Feb 22, 2023

Arguably, the Re: prefix should not have been added to the subject at all but implied by the In-Reply-To or References headers, allowing clients to display it in whatever language they want. Trying to standardize an in-band redundant flag in the subject header is probably not worthwhile. Do Gmail and other clients with thread views even display the supplied subject when replying to a known mail?

justinator · on Feb 20, 2023

> 2. It states that subject prefixes like “Re:” have no technical relevance. That’s not entirely true, because email clients recognize existing prefixes when replying/forwarding, in order to not add a redundant one.

Sort of. That can just be a clever regex. There is an In-Reply-To header that may actually give some suggestion of whether this is a reply to something, by telling you what it's a reply to.
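A minimal Python sketch of using those headers instead of the subject line, assuming a hand-written message with made-up Message-IDs. The helper name `parent_message_id` is hypothetical. It prefers In-Reply-To and falls back to the last entry in References.

```python
from email import message_from_string
from typing import Optional

# Hypothetical raw reply; the Message-IDs are made up for illustration.
RAW_REPLY = """\
From: bob@example.org
To: alice@example.org
Subject: SV: Lunch?
Message-ID: <reply-1@example.org>
In-Reply-To: <original-1@example.org>
References: <original-1@example.org>

Sounds good.
"""

def parent_message_id(raw: str) -> Optional[str]:
    """Return the Message-ID this mail replies to, or None if it is not a reply."""
    msg = message_from_string(raw)
    in_reply_to = msg.get("In-Reply-To")
    if in_reply_to:
        return in_reply_to.strip()
    # Fall back to the last References entry, the most recent ancestor.
    references = (msg.get("References") or "").split()
    return references[-1] if references else None

print(parent_message_id(RAW_REPLY))
```

With this, the "SV:" in the subject never has to be understood: the reply relationship comes from the headers alone.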

hilbert42 · on Feb 19, 2023

This is the best simplified description of email and the email system I've read.

It's simple to understand and fully comprehensive—with the exception of RFCs of course (but often they aren't easy to understand).

A highly recommended read/reference whether you're an email neophyte or one who builds email clients or server software.

tuhinnair · on Feb 19, 2023

I think this explanation does a great job of showing just how large the bottom of the Email iceberg is. When I first worked with Email, I was overwhelmed by the work required to handle all the variation.

I didn't even try to run my own mail server, I was just trying to parse and store email in a structured form. What I thought would be a quick evening's work turned into a full week.

While building Hypermail [0], I actually resorted to using GPT-3 with some of the parsing. It did extremely well in handling variations in email reply formats (email clients all have their own way of doing things).

[0] - https://www.idiotlamborghini.com/articles/the_hypermail_expe...

agumonkey · on Feb 19, 2023

> What I thought would be a quick evening's work turned into a full week

makes me wanna write a book series titled like this. so often i overlook details and time required is 5-10x more :)

also makes me wanna find a slightly deterministic way to assess this more precisely. i was trying to do vague dimensional analysis and see how many parts and relations there would be for any given ideas.

denton-scratch · on Feb 19, 2023

Ow, it seems heavy going! I mean, it's good stuff; detailed, well-organized, and accurate (as far as I can see). But in my mind, it just shouldn't be such hard work to understand.

The title's clear: "from first principles". I can't complain!

I'm only about a quarter of the way through; I need a rest now. I may edit later.

jabroni_salad · on Feb 19, 2023

I don't have any kind of pedigree with email but this page has been very helpful to me over the past year. Every now and then something that seems weird or esoteric makes it to the ticket board and having this explain how email in general is supposed to work as opposed to anything vendor-specific let me take cases that the rest of the consultancy passed up on.

I also really like the website's software. I wish all the technical writing I have to deal with was presented in this way.

shon · on Feb 20, 2023

This is how everything used to be taught. I was one of the first CCNPs and we learned networking from wire signaling up. 7 layers, starting with the physical.

First principles is good.

ggm · on Feb 19, 2023

EHLO Richie, Lionel is it 8-bit clean me you're looking for?

andris9 · on Feb 20, 2023

Well, I for one, hope that email stays as complicated as described in the post. Otherwise my project that simplifies access to email accounts (https://emailengine.app) would get no traction :D

dieselgate · on Feb 20, 2023

I'm interested in this topic but didn't have the attention span to read the whole thing.

One part of this website I found interesting is the light/dark toggle. Clicking on the button sends an HTTP request and the page is loaded with the newly selected theme. Only mentioning this because it's something I've thought about implementation-wise but never actually coded out, because there seem to be other options that don't utilize an HTTP request. Just cool to see it in the wild, though.

hit8run · on Feb 19, 2023

I read the RFCs whenever I need some more understanding. They are quite good and readable.

jeffrallen · on Feb 19, 2023

In order to check that document's detail level, I went and searched for "envelope". Understanding the difference between envelope To and the To header is critical to thinking about email routing. (Thank you Arnold, for teaching me that.)
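A small Python sketch of that difference, with hypothetical addresses. Nothing is sent: `smtp_transcript` is an illustrative helper that only lists what a client would say to the server. The envelope recipients go into `RCPT TO` commands and decide routing, while the To header is just part of the data that follows `DATA`.

```python
from email.message import EmailMessage

def smtp_transcript(envelope_from: str, envelope_to: list[str], msg: EmailMessage) -> list[str]:
    """List what an SMTP client would send; no network connection is made."""
    lines = [f"MAIL FROM:<{envelope_from}>"]
    # Routing uses only these RCPT TO addresses, never the header fields.
    lines += [f"RCPT TO:<{rcpt}>" for rcpt in envelope_to]
    # The headers, including To, travel inside the message data.
    lines += ["DATA", msg.as_string(), "."]
    return lines

# Hypothetical message: the To header names alice, the envelope names carol.
msg = EmailMessage()
msg["From"] = "bob@example.org"
msg["To"] = "alice@example.org"
msg["Subject"] = "Lunch?"
msg.set_content("Noon at the usual place?")

transcript = smtp_transcript("bob@example.org", ["carol@example.net"], msg)
print("\n".join(transcript))
```

In this transcript the message would be delivered to carol even though the header says alice. Python's own `smtplib.SMTP.sendmail` mirrors the split by taking `from_addr` and `to_addrs` as arguments separate from the message.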

peter_retief · on Feb 20, 2023

I have always wanted to create an email based forms application using subdomains as variables and identifiers. My early experience was with Lotus Notes which is in essence an email application. Maybe one day if I get the motivation...

Sai_ · on Feb 20, 2023

> _using subdomains as variables and identifiers._

Can you expand on this? I think formspree.io does some version of this, if I understand your ask correctly.

peter_retief · on Feb 20, 2023

I will have a look thanks. As for my vaporware, the first obvious subdomains would be for company->location->department->person. For variables I would use ->xxx->uuid->yyy to create indexed threads for high speed lookups. Just an idea, but I do love mail.

a-dub · on Feb 19, 2023

nicely done! especially liked the clear discussion on the nuances involving smtp envelopes vs. message headers and the concise discussions of extensions like spf, dkim and rbl.

kuharich · on Feb 20, 2023

Past comments: https://news.ycombinator.com/item?id=27086608

marcopicentini · on Feb 20, 2023

Great project. Why not improve the existing Wikipedia pages for the Internet, email, etc.? Wikipedia is more likely to stay online forever and is indexed better.

KasparEtter · on Feb 20, 2023

This is a valid question. I'm the author of the blog and I've asked myself this in the past. The main reasons why I haven't done this so far are

- artistic freedom (regarding the form/structure, the style/tone, the interactivity),

- the process (I prefer not to have long discussions and unexpected revisions; but I must admit that I should collect actual experience in this regard),

- and the lack of attribution (part of it is certainly vanity, but I'd also like to make a living out of this at some point).

philistine · on Feb 20, 2023

The content is CC BY 4.0; in a sense it's not that you're not contributing to Wikipedia, you're simply at the very first step of delegating that task!

Oreko · on Feb 20, 2023

SCRAM seems incredibly insecure. Does anyone have information on how many servers support SCRAM?

KasparEtter · on Feb 21, 2023

> SCRAM seems incredibly insecure.

Why do you think so?

> Does anyone have information on how many servers support SCRAM?

I'd be interested in this as well. :-)

riffic · on Feb 19, 2023

what is a "first principle" and why does it matter here in an explanation of email?

owlglass · on Feb 19, 2023

It's an approach to reasoning and thinking used in philosophy and science [1]. It could be applied to any concept. The author's articles and those by Bartosz Ciechanowski [2] are good examples of first-principles explanations of various concepts.

[1] https://en.wikipedia.org/wiki/First_principle [2] https://ciechanow.ski/

macintux · on Feb 19, 2023

I once gave a talk about Erlang starting with the core idea that = is an assertion of truth and building (most of) the rest of the language from that. Found it a very rewarding approach.

schoen · on Feb 19, 2023

Is there any way for the public to see this talk or learn more about this?

macintux · on Feb 19, 2023

Here’s the Midwest.io talk. Shame the conference didn’t last.

https://youtu.be/E18shi1qIHU

benatkin · on Feb 20, 2023

That's why I don't like it when languages use = for key: value pairs like TOML. There seems to be a big meaning of "=" that people forget.

KasparEtter · on Feb 20, 2023

I chose the name of the blog to mean analytic, reductionist reasoning (in contrast to associative, vague reasoning). In the meantime, I also like the following framing, which I wanted to add to the front page of the blog for some time: https://www.cold-takes.com/minimal-trust-investigations/

billforsternz · on Feb 19, 2023

Informally, it means a methodical and well organized explanation starting from a top level overview then steadily increasing the level of detail until the subject has been fully explained, all while assuming as little prior knowledge as possible from the reader.

benatkin · on Feb 20, 2023

It's an Intellectual Dark Web buzzword favored by people like Lex Fridman.

riffic · on Feb 20, 2023

that's what it seems to be lol