Vint Cerf on mistakes he made in TCP/IP

em3rgent0rdr · on May 8, 2023

Cerf: "I'm serious, the decision to put a 32-bit address space on there was the result of a year's battle among a bunch of engineers who couldn't make up their minds about 32, 128 or variable length. And after a year of fighting I said -- I'm now at ARPA, I'm running the program, I'm paying for this stuff and using American tax dollars -- and I wanted some progress because we didn't know if this is going to work. So I said 32 bits, it is enough for an experiment, it is 4.3 billion terminations -- even the defense department doesn't need 4.3 billion of anything and it couldn't afford to buy 4.3 billion edge devices to do a test anyway. So at the time I thought we were doing a experiment to prove the technology and that if it worked we'd have an opportunity to do a production version of it. Well -- [laughter] -- it just escaped! -- it got out and people started to use it and then it became a commercial thing. So, this [IPv6] is the production attempt at making the network scalable. Only 30 years later."

[1] https://youtu.be/mZo69JQoLb8?t=815 "Google IPv6 Conference 2008: What will the IPv6 Internet look like?"

checkyoursudo · on May 8, 2023

This is a good lesson. I've done a lot of pilot studies, prototypes, proof-of-concept experiments. It is scary^1 how many times "let's just get this tested and see if it even works" turns into "welp, that worked and we don't have time or funding to do another one, so that's the final version." The constraint is often time. Gotta get a product out, or run an follow-on experiment that depended on the first, or submit a paper or funding application, or whatever.

Test the thing you want to test!

1. Not scary like fear, just like, I don't know, astonishing.

chrisfosterelli · on May 8, 2023

On the flip side, how many times do we hear about projects that failed because the dev team was stuck building things that were never needed or never used?

It's hard to see the future.

ajross · on May 8, 2023

That's it. The causality analysis in the grandparent is exactly backwards. We don't live in a world of "productized prototypes" because people are too lazy to do things right. We live in the world of prototypes because prototypes are the only products that reach market. Even where beautiful works of engineering exist, then tend to be beaten to the punch by the competing prototype anyway.

You don't treat this by whining that the prototypes are winning. You treat it by finding ways to cleanly evolve prototypes[1], and by collecting intuition (c.f. the linked article) about common mistakes and how to avoid them.

[1] Once upon a time in an age lost to history, this was the idea behind something called "agile". The thing we call "agile" today is pretty much the opposite.

derefr · on May 8, 2023

You can also design prototypes so that they're impossible to operationalize beyond experimental scale.

I forget the name of it (there are probably several by now), but there's a UX prototyping toolkit where all the components look hand-drawn, so that nobody expects to be able to interact with them and then gets mad when nothing happens. It also prevents you-the-designer from being tempted to add interactivity logic at design time.

There's nothing stopping someone from creating a programming language that works the same way — something intentionally constrained to e.g. run on an abstract machine that uses a 24-bit address space (and builds this into the design, using the other 40 bits of each pointer on 64-bit VM implementations for other important info) so that only prototype use-cases can be implemented and tested, while the system would inherently be unable to reach any kind of scale serving multiple concurrent customers.

While I don't think anyone's ever tried creating a runtime like this for general-purpose software prototyping, I know of at least one domain-specific example:

The PICO-8 (https://www.lexaloffle.com/pico-8.php) is a game runtime that's kind of like this, designed to put constraints on game development that resemble, but are orthogonal to, the kind of constraints imposed by old consoles like the Gameboy. (For example, it is programmed in high-level Lua, but has an 8192-lexeme source-code size limit, to achieve similar complexity constraints to old games that needed to fit their object code on 16k ROMs, while not forcing you to compile to/write in an assembly language, nor impacting source-code readability.)

QuercusMax · on May 9, 2023

I've done something like this when bootstrapping a system that would eventually have an API with proper admin tools, but we just need something quick-and-dirty to get off the ground.

So I wrote some scripts in bash to do things the "worst" way possible. Directly accessing the production DB, minimal safety checks, etc. Nobody else on the team liked writing bash, so there was no fight to replace these awful tools with proper APIs.

throwaway50603 · on May 9, 2023

But that's missing the point again! If you do that, you will never reach the release with your perfect app because you wasted that time and effort on making an unusable mockup and there might be no money left to do the actual app. Those who make a mockup that can actually be used release it and that becomes the business.

derefr · on May 9, 2023

Note this part of my statement: "It also prevents you-the-designer from being tempted to add interactivity logic at design time."

In practice, in many industries, prototyping tools are used as the first step in the design process. The constraints built into these prototyping tools force those using them to spend less — often orders of magnitude less — time, and effort, and money(!) prototyping, than they would if they allowed themselves to prototype with production-oriented tooling.

Consider painting. A painter — unless they're working off of a photographic reference — will almost always draw a pencil sketch of the scene they want to capture in paint, before they begin the actual painting. A pencil sketch has no need to consider color (or how to mix to achieve particular color effects), or light and shadow, or dimensionality (how real light reflects off of built-up paint on the canvas), or any of that. They just need to concentrate on proportion, perspective, anatomy, etc. The sketch focuses only on (a subset of) the design of the painting, while inhibiting any of the implementation details specific to the medium of paint, from being worked on. Which means the sketch only takes a few minutes, rather than days.

Once they have this sketch, they can show the sketch to the client who commissioned the painting (or to the master of the studio, if you're painting for gallery sale), and the client/"product owner" can point out places where the sketch does not align to their vision for the painting, which can be used to iterate on the design.

There are many other things to get right once the "real" painting starts, but if you don't get the "bones" of the thing right, the client won't want the painting. The sketch lets you evaluate just the "bones" of the painting before even considering the meat on those bones.

A prototyping tool for programming should be the same: something to let you consider the "bones" of business logic, preconditions/postconditions, etc., without the "meat" of the particular library APIs and data-structure juggling required to glue things together and achieve scalability in a particular language.

The best prototyping tools are ones that pare down your focus to the smallest useful kernel of design, and thereby allow you to iterate in near-realtime. An experienced painter will become skilled enough at making sketches, that they can sit down with a client and sketch in response to the client's words, changing the sketch "interactively" in response to the client.

In fact, some prototyping tools are streamlined enough to allow the client themselves to iterate and ideate on the sketch by themselves; and then only submit it to the productization process once they're happy with it!

Game-development example again: RPG Maker. As far as I can tell, RPG Maker as a software product was never really expected by its vendor to be used in the production of commercial games (although it has been repeatedly marketed that way.) Until very recent releases in the series, it was far too constrained for that — unless you entirely eschewed most of its engine [as most of the RPG Maker "walking simulators" like Yume Nikki do], it gives you a very static set of engines: battles that work exactly one way, menus that work exactly one way, etc. A tool that's actually for building role-playing games as commercial products, would have an almost-monomaniacal focus on letting you customize these systems to make your game distinctive; but RPG Maker is entirely the opposite. Rather, I believe that in its idiomatic use, RPG Maker has always been intended as a tool to allow a client to "sketch out" the narrative(!) "bones" of an RPG. That's why it gives you so many built-in assets to work with, but also why these assets are so generic, and also why ASCII/Enterbrain never created an "asset store", nor documented the asset formats: the assets are meant to act essentially as wireframe components. You're not meant to re-skin an RPG Maker game; you're meant to just use the generic assets to build a generic-looking game, because the look of the game isn't the point, any more than the look of a pencil sketch is the point.

p_l · on May 8, 2023

The prototyping toolkit you're thinking of is probably Balsamiq Mockups, which explicitly emulates making wireframes on paper. Other wireframing tools adopted the technique as well.

pmarreck · on May 8, 2023

Right. In that vein, it wasn't the 32 bit limitation that was the real problem. It was the inescapable 32 bit limitation built into the design that was the problem. They could have spent a few bits to version the packet format, for example, which would have at least provided an escape hatch. Or heck, a single bit: Value of 0, known original format. Value of 1, unknown format (or known future format, to be parsed by some future IP packet decoder, but which errors out old decoders).

And they repeated the same mistake with the IPv6 design by not making it an evolution of or backwards-compatible with IPv4.

Literally neither of these was built with scalability in mind.

gizmo686 · on May 8, 2023

The first 4 bits of both IPv4 and IPv6 packets are a version field.

cesarb · on May 9, 2023

> The first 4 bits of both IPv4 and IPv6 packets are a version field.

And in case it wasn't clear enough: it's called IPv4 because the value of that version field is 4, and it's called IPv6 because the value of that version field is 6.

GoblinSlayer · on May 10, 2023

TLS has version fields everywhere, so TLS 1.3 identifies as 1.2 and hides truth deep in extension fields, because anything else broke traffic analysis tools.

_the_inflator · on May 8, 2023

In hindsight. ;)

To me this sounds like the winner's bias: in hindsight.

It is hard to balance a prototype against a "works under all conditions no matter what they are" production-ready solution without growing into a hard to handle monster solution. On the other side I know off quite some overhyped apps, that rest on a overpowered machine, because the specs did not account for one thing: no one cares about the product.

Extensibility is the key to balance certain weaknesses in a spec.

Dalewyn · on May 8, 2023

An old Russian proverb goes: There is nothing more permanent than a temporary solution.

justinclift · on May 8, 2023

Yeah, I've heard the same thing in several contexts before too. Without any mention of nationality. :)

contingencies · on May 8, 2023

Nice. Added to https://github.com/globalcitizen/taoup

pmarreck · on May 8, 2023

oooh, it's on nix! done and done

toyg · on May 9, 2023

In Italy we have "temporarily permanent" laws. I wish I were kidding.

saghm · on May 9, 2023

If by doing things the way I need them to work now rather than the future lets me achieve the success, longevity, and ubiquity of TCP, I think I'm, fine with waiting until later to figure out things that don't matter right away.

rinka_singh · on May 8, 2023

:-P ChatGPT?

makeitdouble · on May 8, 2023

That's always the bind.

- If it fails: welp, we were right to not spend months on this discussion.

- If it wildly succeeds: welp, it's now too costly to change for so many users.

Having a mild and steady usage growing curve is a blessing that's too often overlooked.

2devnull · on May 8, 2023

Knowing this and being the person (or part of the team) to first implement a thing is the true thrill of programming. You, sometimes exclusively you, understand the weight and permanence of the minor decisions you make. This is why non-programmers often find programmers hard to understand, overly picky about minor details, strangely emotional about things they have a hard time verbalizing and so on.

jandrese · on May 8, 2023

It's always frustrating and amusing to read an anecdote about a design flaw in something where the creator goes "Yeah, we discovered that X was a bad idea, but there were already almost 10 people using it so it was too late to change."

amoss · on May 8, 2023

Tabs in makefiles is another classic.

kjs3 · on May 8, 2023

Except tabs in makefiles only require ones to acknowledge 'makefiles use tabs and here are the rules'. It's otherwise only something for pedants to obsess over.

IPV4 address contention has a bit more real-world impact.

pmarreck · on May 8, 2023

No one and I mean NO ONE had any clue how widely deployed IPv4 would end up being. It went literally beyond everyone's wildest expectations who was alive at the time of its inception.

kjs3 · on May 8, 2023

I don't think anything I wrote disagrees. As someone who started implementing IP networks just a couple of years after RFC791 was published, I'd say that was completely correct. But more than 40 years later, the fact that it has gone beyond everyone's wildest expectations means some of those early design decisions are starting to bite. Unlike tabs in makefiles, which is just something certain types of people like to complain about.

amoss · on May 9, 2023

> Unlike tabs in makefiles, which is just something certain types of people like to complain about.

http://catb.org/~esr/writings/taoup/html/ch15s04.html

Which "certain type" are Stuart Feldman and Steve Johnson?

agumonkey · on May 8, 2023

Maybe there should be a margin factor, just like in architecture. Build it for 5x the load, then 10x that.

em3rgent0rdr · on May 8, 2023

Well ARPA already had a margin factror...they had about 256 computers that needed addresses, and left space for 2^24 times 256, or 4,294,967,296 computers.

The growth ended up to be exponential over time. So maybe a better margin would be expressed exponentially along the lines of "build it for 2^(doubling_constant * number_of_years) the load."

agumonkey · on May 8, 2023

fair enough but cerf seemed to hint at the fact that 32bit was their lower estimate, I meant to multiply that number.

tzs · on May 9, 2023

For experiments it is often a good idea to purposefully limit them so that if they do escape they can't become too entrenched.

In this case probably the right number of address bits would have been 16. That would be big enough to to build test networks (real or simulated) with more hosts than were currently on ARPANET, but small enough that it would not take too long for ARPANET growth to hit the limit.

With 32 bits the running out of addresses problem is far enough in the future that someone choosing to go ahead and put the experiment in production is not making a problem for themselves. They are making a problem for whoever has their job long after they are retired.

With 16 bits the running out of addresses problem is soon enough that they can see that it might be something they might have to deal with.

em-bee · on May 8, 2023

Klingon software is not released. Klingon software escapes, leaving a bloody trail of design engineers and quality assurance testers in its wake.

mixmastamyk · on May 8, 2023

Never have gotten a good answer to this, “why is 64 bits not enough?” Here, why not an option?

It’s astronomically larger than 32, not double.

dagenix · on May 8, 2023

In practice, ipv6 often uses only 48-56 bits for global routing with 8-16 bits usable for subnet ids or routing within a customer site. The last 64 bits of the address are used to identify a particular device in a subnet. So, an ipv6 address is a lot bigger than an ipv4 address - but that is because those bits are doing more things than they are in ipv4.

mixmastamyk · on May 9, 2023

IP4 already has the concept of subnet routing in 32 bits. Let's say we add 1 more byte for country: 0 = Legacy, 1 = USA, … 127 = Vanuatu.

IP6 didn't add one byte, it added 12! I think 64bits is plenty (as mentioned astronomically larger than 32), and still looking for reasons to change my mind.

dagenix · on May 13, 2023

IPv4 subnetting and IPv6 subnetting don't really work the same way, though. With IPv4, there are (almost?) no ISPs that are handing out class A blocks to consumers. The vast majority of consumers will get a single IP address - and then they have to use a NAT setup to create subnets on their end.

IPv6 is _dramatically_ different. Its actually possible to get a /48 or a /56 - sometimes just for free as part of general operations. That leaves 8 or 16 bits for the customer to create hundreds or thousands of subnets. Unlike with IPv4 where most customers don't get more than one IP address, you don't have to use a NAT setup if you have a /48 or a /56. Even if you only get a /64, that still leaves 64 bits to give every device its own globally routable address without having to setup NAT.

Do we need 64 bits to identify individual devices in a subnet? IDK, maybe not. but if you are doing an apples-to-apples comparison, with IPv4 you have 32 bits for global routing. With IPv6 you have somewhere between 48 and 64 bits. The remaining 64-80 bits of the IPv6 address don't have a good analog in IPv4. So, in a lot of ways, IPv6 is a lot like taking IPv4 and expanding it to something between 48 and 64 bits and then the remaining bits you can think of basically almost like extra fields to encode information that IPv4 doesn't support.

JohnFen · on May 8, 2023

My answer to that is that there's no way to know in advance if 64 (or any other #) of bit is enough or not. Tech history is full of examples of arbitrary limits being defined thinking that there's no possible way anyone would need to exceed them, only to find that everyone needs to exceed them later on.

Is 64 bits enough? Maybe. Maybe not. But if you're wanting to future-proof, then setting an arbitrary limit is not the way to go.

mixmastamyk · on May 8, 2023

I'd agree, but the historical computing restrictions were usually hitting ceilings over 8, 12, or 16 bits when those things were very expensive.

32bits is just about large enough (within an order of magnitude sense) if the space was used more efficiently.

128bits I've heard described is like "every atom in the universe" big. If so, then 64 is probably enough for every atom on Earth.

Now I've just thought of another angle, similar to UUIDs. They are used because they can be assigned randomly without worry of collision. But I don't think IP6 addresses are being assigned randomly, hmm.

labcomputer · on May 9, 2023

> 32bits is just about large enough (within an order of magnitude sense) if the space was used more efficiently.

Well, not really. Between just the populations of the US, Europe and China (places with high levels of internet connectivity), you have over 2 billion people (this site claims over 5 billion internet users: https://www.statista.com/topics/1145/internet-usage-worldwid...).

> But I don't think IP6 addresses are being assigned randomly, hmm

That is, in fact, exactly how IPv6 addresses are assigned using SLACC with Prefix Delegation. Your ISP assigns you a prefix, and your computer randomly picks an address within it. You can also self-assign a (non-routable, like 10...*) prefix from the fc00::/7 ULA block by randomly filling in the remaining 57 bits to form a /64 subnet.

cesarb · on May 9, 2023

> Between just the populations of the US, Europe and China (places with high levels of internet connectivity), you have over 2 billion people

You have to consider the context: back then, multi-user computers were common. Each user didn't have their own computer; instead, they had a terminal to connect to a central computer. So a single computer would serve tens or hundreds of people, and as computers became more powerful, you could expect each computer to be able to serve even more people.

mixmastamyk · on May 9, 2023

Not really. Under ten billion would be plenty if used efficiently. A significant fraction of addresses we want to be private, and not directly routable.

Sure, if you want your refrigerator, oven, and dishwasher publicly addressable on the internet it isn't enough, but you don't actually want that.

Further, 64 bits is many orders of magnitude overkill already. So what does 128 bring to the party, besides making addresses harder to type?

Also, random with a prefix is not really random.

CaptainNegative · on May 9, 2023

Those estimates typically confuse the 10^100 upper bound on the number of atoms in the universe with 2^100. The 2^128 number of addresses in IPv6 is clearly more than the latter, but dwarfed by the former. There are roughly 10^40 or so atoms in the universe per IPv6 address; by mass that's approximately one address for each combined biomass of earth.

Earth mass divided by 2^64 is roughly 357 tons. There are roughly 2^33 humans on earth, so 2^64 is "only" a billion or two addresses per person; tha's far fewer addresses than the number of human cells out there.

pmarreck · on May 8, 2023

Heck, just adding another byte to it to make it 2^40 would have given us 1,099,511,627,776 or 1 trillion+ possible addresses.

labcomputer · on May 9, 2023

> “why is 64 bits not enough?” Here, why not an option?

Because then you don't really have enough bits to do some nice things:

1. It would be nice for your upstream network provider (and you) if they can delegate some network prefix and thus don't need to concern themselves with the address plan inside your subnet. That means, if there is an address collision on your prefix it's not their problem.

2. Assuming you have been delegated a network prefix, having at least 48 bits for within-subnet addressing greatly simplifies self-assigned addresses:

2.1 You can use the data link layer's address (the 48 bit MAC address for ethernet) which often has weak guarantees of uniqueness (though this isn't really advised today)

2.2 You can also randomly generate addresses or generate them by hashing things, and be reasonably confident of not colliding with another station.

2.2.1 Of course, you want to check that no other station is using the same address, but there's always the hidden node problem: How does station C cope with stations A and B both claiming address XYZ?

2.3 You can (if you want) still use DHCP to assign addresses if you want. But you don't have to.

3. It would be nice if each station was not limited to a single address.

3.1 Among other things, having multiple addresses vastly simplifies building a multi-homed network. For example, you can (today) sign up for two (or more!) ISPs that support IPv6 Prefix Delegation and have your router(s) issue Router Advertisements (RAs) for each prefix.

If a link goes down, your router(s) don't need to do anything or share any state... it just works (yes, really, I've tried it!). Each of your routers, of course, only routes packets for the prefix it was delegated by its upstream.

(BTW, IPv6 also has some neat RFCs for letting foreign hosts know about a prefix change, so you can even transfer existing open connections initiated from an address on prefix A through a different router with connectivity on prefix B)

3.2 For privacy, wouldn't it be nice if you could generate a new address for every external server you connect to? You can do that with the 64 bit subnet address space--remember, your ISP has no say about the address plan within your network.

4. With 128 bit addresses, IPv6 allows you to delegate a /64 to yourself from the ULA block (the IPv6 equivalent of 192.168.. or 10...). You can just randomly pick a 57 bit suffix to append to fc00::/7, and you can be pretty darn confident that nobody else will pick the same one. And you get all the same advantages of self-assigned addresses (mentioned above) within that prefix.

4.1 Having a unique* local prefix maybe doesn't sound like such a big deal until you've tried to merge two enterprise networks that both have hosts on 10..., and clients hard-coded to connect to those hosts.

Finally, think about how a 64 bit global address space would be split up:

We're running out of IPv4 addresses, even with* NAT (and "CGNAT", which is just regular NAT with a fancy name). Most of those endpoints are users, not servers--you need at least 30 bits to give one prefix to each household with a current internet connection. Between the expected growth of internet users and all current enterprise networks, you need well in excess of 40 bits for the prefix.

So you'd realistically get no more than /48 prefix at home (16 subnet bits). That's not enough to do the cool autoconfig and privacy things I mentioned. More likely, you'd get a /56 because 256 hosts "ought to be enough for anybody" and you can't do the cool stuff anyway.

patrakov · on May 9, 2023

> For example, you can (today) sign up for two (or more!) ISPs that support IPv6 Prefix Delegation and have your router(s) issue Router Advertisements (RAs) for each prefix.

As a user of OpenWrt (which does exactly this) I strongly disagree that it is the right solution. If you announce on the LAN the prefixes obtained from a fast fiber and a slow LTE link, then devices will, well, get an address for each prefix. The problem is that they have absolutely no information to choose the source address correctly - it's all just numbers! So, just by bad luck, they choose the source address from the LTE prefix and waste the LTE data and my money if I use the default setup. Which is why I don't. My network uses IPv6 NPT, but announcing one prefix at a time would have been even better (because I only want fail-over, not real multihoming), although impossible with OpenWrt.

Forum discussion, which also serves as a proof that it is really hard to explain the "don't overcomplicate fail-over by generalizing it to multi-homing" notion: https://forum.openwrt.org/t/ipv6-wan-fail-over-without-ipv6-...

mixmastamyk · on May 9, 2023

Interesting, but still looks like most of this could be done in 64 bits, or ~4+ billion Internets, one for each public household.

IPX had a lot of convenience features in a similar number of bits. (80, but 24 were wasted on manufacturer, so 56 useful bits)

A rotating address number will not provide privacy if the prefix attached to you is the same. Would be like saying a rotating port number would provide it, but not the case.

No organization will ever need a /64 (not even close, not even wastefully-that's 18 quintillion).

The slow uptake in IPv6 seems to imply that it's over-engineered and people don't care about these potential additional features. I'm a lifelong geek and can barely get interested. Network engineers are maybe .01% of the population.

I guess bits are only getting cheaper in the future so why not spill them incredibly wastefully at anything we can think of, is ultimately the answer. Although it doesn't answer why 64bits was not even an option.

hn_throawlles · on May 8, 2023

> even the defense department doesn't need 4.3 billion of anything

uhm. counterexample: they need more dollars than that

em3rgent0rdr · on May 8, 2023

Every bullet gets its own address.

agumonkey · on May 8, 2023

it's hard to make a bill communicate over copper wire though

account42 · on May 9, 2023

Not that hard really, smart cards are a thing and the chips themselves can be tiny.

couchand · on May 8, 2023

need?

hn_throawlles · on May 8, 2023

they are able to create the need.

cryptonector · on May 8, 2023

CNLP had variable-length addresses. I wonder what the world would look like with variable-length addresses.

p_l · on May 8, 2023

It had variable length addressing "up to 20 bytes", which IIRC was done so that it could easily accommodate different addressing systems into one. Unfortunately, variable length fields make for really annoying hardware implementations.

TUBA (replacement of IP with CLNP, running TCP and UDP on top) mandated use of max size addresses always, to make it simpler to decode, as part of "Internet Profile CLNP".

pdimitar · on May 9, 2023

A prominent example of: nothing more permanent than a temporary solution.

jiggawatts · on May 8, 2023

In my experience, the biggest issue with TCP is that it assumes that the traffic is a single continuous byte stream, whereas the majority of applications send messages.

Vint mentioned the lack of encryption as a mistake, but even on private networks, the 16-bit TCP checksums are too weak to protect against bit-flips. Many orgs noticed that enforcing encryption reduces "random" crashes and errors, because it also enforces a strong checksum, typically 128 or 256 bit. However, even a 32-bit checksum would have sufficed for ordinary use.

Almost all "real" protocols built on top of TCP eventually end up reinventing the same wheels: message boundaries, logical sub-streams with priorities, and strong checksums or encryption. Examples include Citrix's ICA, HTTP 3.0, gRPC, etc...

jandrese · on May 8, 2023

IMHO TCP made the correct call here. TCP is a transport, not an application protocol. You are supposed to run your protocol over top of it, including things like blocking, substreams, etc... It's easy to add features to a simple transport, it's not so easy to work around features when you don't want them, plus they end up complicating the stack and become a possible source of bugs. Even "simple" TCP proved to be quite a challenge to implement back in the 90s, with many many buggy stacks out on the Internet.

By being just a stream of bytes you are free to do whatever you want with the protocol. Sure that may mean implementing your own form of blocking but that's better than being stuck in a protocol where you are forced to work around the message oriented features even though you are just streaming data.

The checksum does hail from an earlier era where 16 bit machines were commonplace and the total amount of data sent is minimal. Also, there is some assumption that the underlying transport is going to have its own checksums. You will note that IPv6 ditches the checksum entirely.

pclmulqdq · on May 8, 2023

It sounds fine from a theoretical standpoint to use a pure byte stream rather than reliable messaging, which is what most applications actually want. However, messaging protocols can run byte streams at 0 overhead (you still need a message sequence number/tag, and this field turns into the same thing as the TCP seq num for a byte stream), while byte streams need some overhead to turn into messaging protocols, since there is otherwise no protocol-level delimiter between messages.

This seems like nearly nothing, but when you think about the lost compute power and network bandwidth throughout the history of TCP to this distinction, it's actually a very significant chunk of power, CO2, and wasted human life. Not to mention the lost human life that comes from HOL blocking.

The byte stream abstraction idea, I think, was a suspect decision at the time (born out of a desire to look exactly like a serial teletype port despite the fact that it's the wrong abstraction layer to use for that), and clearly wrong for a transport layer today. Almost everything built on top of TCP re-introduces messaging semantics.

Also, IP never had a message checksum, only a header checksum. The checksumming is at the TCP layer and it is still alive.

IshKebab · on May 8, 2023

Sure that's how people are forced to use TCP, but for performance reasons that's not how they want to use it.

IMO something like SCTP would have been better. You generally want multiple independent streams (to avoid head of line blocking), message and stream transmission, and optional reliability on a per-stream/message basis.

This sort of thing is very very common in real time games (e.g. enet). Actually enet is built on UDP, but if you compile it to WASM then the Emscripten UDP implementation uses the WebRTC data channel which uses SCTP!

klabb3 · on May 8, 2023

> By being just a stream of bytes you are free to do whatever you want with the protocol.

Not entirely. Head-of-line blocking is an inherent problem with TCP so you can’t have real priority “streams” like you can with eg QUIC. Another “missing feature” is datagrams, ie simply opting out of the retransmission part of tcp.

That said, I still agree with the general statement.

foobarian · on May 8, 2023

Didn't they build that weird "emergency packet" feature to skip to the head of line? IIRC it was never well supported in various system stacks..

sanxiyn · on May 9, 2023

Yes, TCP header had 16 bits allocated to "Urgent Pointer". It's (unfortunately) still there for compatibility, but RFC 9293, which updated RFC 793 after more than 40 years, has the following to say about it:

> As a result of implementation differences and middlebox interactions, new applications SHOULD NOT employ the TCP urgent mechanism. However, TCP implementations MUST still include support for the urgent mechanism.

LegionMammal978 · on May 9, 2023

As it happens, that feature is used in the FTP protocol (RFC 959) to abort the current file transfer (or to immediately print a status message), since some old implementations would have to stop listening on the control connection while a transfer was in progress and could only be woken up with an interrupt. At least vsftpd supports it, I haven't looked at any other implementations. The main difficulty is that the libc API for reading urgent data is somewhat painful, and most higher-level TCP libraries haven't bothered with it for obvious reasons.

Of course, basic ftp:// clients (the only kind that are really used today) would probably rather just close the control connection, which has the same effect.

jiggawatts · on May 8, 2023

> TCP is a transport, not an application protocol

It turns out that what people wanted was application protocols, not transports.

This is why HTTP(S) is basically the only protocol used inside and outside of the data centre these days. Even storage protocols are migrating towards HTTPS, with the public clouds using S3-like protocols on top of HTTPS even for virtual disks.

jandrese · on May 8, 2023

This is also because firewall administrators don't want to let anything through except DNS and HTTP. Everything runs on HTTPS because it's the only thing that won't leave your help desk constantly running traces to figure out where you're getting blocked.

It's ironic that in an effort to only allow web traffic on the network, these administrators have instead made it impossible to block anything else, because their actions forced all of those other application to disguise themselves as web traffic.

MichaelZuo · on May 8, 2023

Not in classified networks.

imtringued · on May 8, 2023

No they made the wrong decision and SCTP and QUIC fixed almost everything wrong with TCP.

I personally stopped using raw TCP because it is almost always the wrong solution. If I need a bare bones protocol where I want to send raw bytes I now always use websockets because they work exactly as they should.

Streaming is such a niche usecase that is only natural if you don't semantically interpret the data, i.e. arbitrary file transfers but the vast majority of the complexity isn't there.

> It's easy to add features to a simple transport, it's not so easy to work around features when you don't want them

No, in my experience, you will spend most of your time working around TCP warts.

jerf · on May 8, 2023

"No they made the wrong decision and SCTP and QUIC fixed almost everything wrong with TCP."

In the context of 2023, this is a defensible position. I can quibble, but it's defensible.

However, are you sure you want a 1974 take on a "message based protocol"? Trust me, you're not getting SCTP out of that, in a world where having two streams instead of just one is a noticeable resource impact. Heck, even by the 1990s there were still a lot of really bad things coming out of the world of "message based protocols".

You wouldn't be here in 2023 with a message-based TCP. You'd be here in 2023 with some other protocol having become the foundation of the Internet, and it wouldn't be "message based" either. There's no option where someone in 1974 is building with the nearly 50 years of experience that haven't happened yet, and I'm not going to grade TCP on that basis personally.

kjs3 · on May 8, 2023

No, they made TCP for a particular set of use cases, and UDP for a different set, and there's always plain IP for you to build whatever you want. If you were using TCP for the wrong use case, how is that Vint Cerfs (or anyone elses) fault?

zymhan · on May 8, 2023

> WebSocket is a computer communications protocol, providing full-duplex communication channels over a single TCP connection

It doesn't seem like you've actually moved away from using TCP.

Maybe it's just too low of a layer for your needs?

harha_ · on May 8, 2023

This is also how I see it, even though I've only done relatively simple work with plain old TCP. It's just a stream of bytes, you then interpret it however your protocol on top of TCP requires.

throw0101b · on May 8, 2023

> In my experience, the biggest issue with TCP is that it assumes that the traffic is a single continuous byte stream, whereas the majority of applications send messages.

TCP and IP used to be one thing, and they split off TCP in what we now call Layer 4 (where UDP, and others, also exist). So they did think ahead in separating things to a certain extent.

The fact that we have a whole bunch of middle-boxes that don't allow DCCP, SCTP, etc, is hardly the fault of Cerf et al.

zubnix · on May 8, 2023

Http 3.0 is UDP, ICA these days is also UDP.

klabb3 · on May 8, 2023

Technically true, but parent’s point still stands (even if it should have said “on top of IP”).

Byte streams are not ideal to deal with for protocols. As soon as you have encryption, like tls, you have discrete messages anyway (records) and the stream is largely an illusion.

That said, I still like TCP as a compromise between simplicity and versatility. It’s been holding up incredibly well for a huge variety of use cases.

jayd16 · on May 8, 2023

Yes, but by necessity and only after http/2 attempted everything listed over TCP.

Animats · on May 8, 2023

XNS, which predates TCP, had 96-bit addresses. "IDP uses Ethernet's 48-bit address as the basis for its own network addressing, generally using the machine's MAC address as the primary unique identifier. To this is added another 48-bit address section provided by the networking equipment; 32 bits are provided by routers to identify the network number in the internetwork, and another 16 bits define a socket number for service selection within a single host." Think of the network number as corresponding to today's autonomous system number. This was Xerox's approach to scaling. It could have worked.

Routinely looking up things in hashes was not common back then. Hashing was computationally expensive when routers ran on little machines with well under 1 MIPS. A huge flat address space was not feasible with memory costs at the time. Nor was keeping huge routing tables in sync well understood. There had to be some cheap way to make routing decisions based on the IP address.

tssva · on May 8, 2023

The fixed 48-bit network field of XNS and derivative protocols such as IPX and VIP limited scalability since it didn't allow for hierarchical routing where as the subnet mask of IP effectively provides a variable length network field and greater scaling through hierarchical routing.

bawolff · on May 8, 2023

> Before public-key cryptography came around, key distribution was a really messy manual process,” Cerf says. “It was awful, and it didn’t scale. So that’s why I didn’t try to push that into the Internet.

I dont really think this is a mistake. Its not like there are great options even now for end2end security on the transport layer (like we have vpns but they mostly require having preshared secrets). It would have been pretty impossible back in the day and added a lot of friction

tptacek · on May 8, 2023

Look at the first round of IPSEC work --- a decade and a half or so later, in the late 1990s --- to see what a disaster native cryptography would have been for IPv4.

unwind · on May 8, 2023

I was going to post that quote too, it was really fun (and kind of jarring in a good way) to see someone use "the Internet" and really not [1] be another marketing/management/whatever person who just doesn't know the difference between the Internet and the web, so on. Loved that!

[1]: I mean like really not. At all.

zackmorris · on May 8, 2023

It's easy to poke holes in any protocol, so I usually just stick to conceptual flaws (oversights that are obvious to someone who hasn't seen the implementation). These are the biggest for me:

* TCP should have been a layer above UDP, not beside it. Or UDP and the urgent/out-of-band option in TCP should have been equivalent. This would prevent the blocking of UDP in corporate networks and countries trying to stop P2P networks.

* Both the network address and checksum should have been variable-length.

* NAT should not exist, because variable-length addresses would have negated most of its usefulness (other than for censorship).

* TCP should have been designed for networks with high latency (minutes/hours/days) to be ready for use in space. Optimistic delivery (is this the word?) should have been used instead of handshakes, but this might have required encryption from the start, to prevent sending sensitive information to the wrong recipient before it's verified.

* Address negotiation should have been built-in, so that peers IDs stay connected if the network changes, regardless of their IP addresses. TCP is a connected protocol (unlike UDP which is connectionless) so this was never really considered, but connected protocols simply don't work on the mobile web without yak shaving or embedding the stream in a tunnel that handles reconnection.

These issues are all severe enough that we probably shouldn't be using TCP directly. I know that they would haunt me had I designed it. It would be nice if the web provided a WebSocket that wasn't terrible, that handled everything mentioned above. Also I wonder if we scrapped all NAT workarounds, what it would take to provide something mathematically equivalent to direct connections, perhaps with homomorphic encryption, through open matchmaking servers kind of like Tor exit nodes.

Edit: I forget to add why these discussions are important. There's a tendency today to drink the Kool-Aid and assume that standards are perfect, when in reality they are often highly-opinionated, which creates a heavy burden on people who think differently. Flawed standards are a form of injustice.

blep_ · on May 8, 2023

I can't decide how I feel about NAT. It's a ridiculous hack that shouldn't exist, but it also probably had the side effect of saving us from a world where ISPs charge per internet-connected device, because there's no obvious way to do that in a world where they have no visibility into your LAN routing.

zackmorris · on May 9, 2023

Ya I was perhaps a bit harsh.

NAT is a great idea in principle to connect networks in a scalable way. There are various UDP hole punching techniques that (depending on the devices) can be used repeatedly to get through multiple layers of NAT. Vs something like UPnP, which from what I understand, has poison pills which prevent it from communicating past 1 layer of NAT. So some of the features I complained about were probably engineered intentionally through a great deal of effort, and I might have even supported those efforts at the time. They just didn't know the negative effects those solutions would have on the open internet for stuff like networked games and P2P file sharing (which the status quo just loves).

manv1 · on May 8, 2023

1974 was a long time ago. 32 bits was a big number. The 8080 had 8 bit words and ran at 2Mhz.

For people today, it's hard to understand how expensive everything was back then. 1K of RAM in 1974 cost about $307 dollars. Yeah, you're not going to be putting that into your router. So those extra bits have a super-high hard dollar cost.

We're lucky he didn't go to 16 bits.

COGlory · on May 9, 2023

Is that $307 1974 dollars, or $307 today dollars?

mannyv · on May 9, 2023

I believe it's 307 in today's dollar (inflation adjusted)

mncharity · on May 8, 2023

Around 1990, IETF had a big fight over where to go from "not enough bits" IPv4. Should we IPvNext with an incremental "mostly just add more bits"? Or attempt a "second-system syndrome" "let's punt this time on rough consensus and working code" kitchen-sink monstrosity of a... err, IPv6. Followed by years and years of "ok, IPv6 was arguably the wrong call, but here we are now, things are turning around, and it will replace IPv4 RSN".

I don't recall what Vint Cerf's position was. But attributing the last few decades of IP dysfunction to 1970's choices seems... incomplete.

commandersaki · on May 9, 2023

> I don't recall what Vint Cerf's position was. But attributing the last few decades of IP dysfunction to 1970's choices seems... incomplete.

Yeah, I don't really think Cerf's decision really applies to the blunder that is IPv6. There were other options available to the IPng team that were foregone for SIPP knowing there was absolutely no transition/migration plan whereas the others had at least given this area some consideration.

I'm sort of expecting in year 40 of this attempt to transition to IPv6, we'll collectively realise it's a losing battle and look at future Internet architectures as salvation.

helsinkiandrew · on May 8, 2023

> “I thought 32 bits ought to be enough for Internet addresses”. “I didn’t pay enough attention to security”. “I didn’t really appreciate the implications of the World Wide Web.”

Most experiments, prototypes, and even full blown systems designed for the long term die well before 30-40 years and those extra addresses and features would ever be needed.

Much more human effort is wasted prematurely 'future proofing' systems and trying to predict the future. The brilliance of TCP/IP is its simplicity - if it had got more complex perhaps it would have been replaced in its entirety. We could be using Gopher over OpenDECNet.

hughesjj · on May 8, 2023

So many network admins still prefer to configure smaller networks with IPv4 instead of ipv6 due to simplicity, then just nat peer it out

Hell even AWS didn't start seriously supporting it until the mid 2010s... And heck I don't know if they support ipv6 across the board or not

cryptonector · on May 8, 2023

> “I didn’t pay enough attention to security.”

In the late 70s / early 80s there was a ton that the public didn't yet know about cryptography. Few, if any private citizens could have gotten security remotely right at that point. Remember, Diffie-Hellman was invented in 1976, and RSA in 1977, and PKI came some years after that, and DES was also from 1976/1977. As well the public's understanding of the need to do authenticated encryption came later, and the industry didn't really start catching on until the 90s, and AEAD cipher modes didn't really become a thing till the 00s, and didn't start to get used widely for another decade, and so on and on.

ksec · on May 8, 2023

Had TCP/IP been 64bit to begin with things would be a lot different.

bushbaba · on May 8, 2023

I really wish we just did ipv4 on 64 but addresses. IPv6 is nearly a totally different paradigm, and has lead to adoption pain. Biggest being usability taking a back seat. People are being forced into using ipv6 not cause they want to, but from ip space exhaustion.

elcritch · on May 8, 2023

I used to think that, until I actually started using IPv6. It's a much more well designed protocol. The way switches and routers announce prefixes and network information using well defined UDP multicast is elegant and more robust. You can use link-local IP to always connect to a device on a local network.

The biggest benefit for most users is having a large enough IP space that devices can self-assign a unique address with the networks prefix with very low risk of collision. Many of the problems seen on wifi or home networks are due to various ARP issues and devices not properly releasing their own DHCP addresses. It's still very common. I had a work laptop get booted off my network after 10 minutes for a while until eventually figuring out another device was stealing its assigned IP.

merlyn · on May 8, 2023

IPv4 today is a totally different paradigm than what IPv4 was in the 1980's and early 1990's.

IPng (IPv6) was designed when the original paradigm existed, not what we have now-a-days.

So, if we actually had been on track to do IPv6 in the decade it was defined, we could have gone a totally different way. Instead due to the rapid growth of the Internet in the mid 90's, IPv4 was morphed into something different and IPv6 had its own model until we started to need it again.

astrange · on May 8, 2023

IPv6 itself isn't the right thing for mobile internet either; people expect their IP address to not change when they migrate towers, meaning the entire approach to routing doesn't work.

hughesjj · on May 8, 2023

Isn't it like literally a data structure issue though? Like you have to route on prefixes? Like how else would you know where to go?

Ip routing is kind of like a tree (not a graph) in the ip regards, how would you get around it?

I admittedly don't know as much as I'd like about network, hence why I'm posing this question ;-)

If theres a way to efficiently model hierarchical domains for stuff like DNS/IP i'd love to know. Like, consistent hash rings maybe, but then we'd be reintroducing ring topologies...

Oh wait duh though. A cell tower is owned by a single company. Yeah that's prolly doable, the company can always buy beefier routers if they need the memory. And realistically speaking the virtual tree could be closely aligned to the physical one given how cell towers work.

somat · on May 8, 2023

The way it is done is to have an invisible layer underneath (mpls perhaps) that can do the routing.

However the correct ip way to do it would to have tcp(layer 4) decoupled from ip(layer 3) such that an ip address could change while maintaining the tcp stream. unfortunately this was not done, tcp has too many layer violations and is firmly interlocked with ip. I don't know much about the state of the art, does http over udp (quic) allow for ip address change?

wastedpotential · on May 8, 2023

Yes. Packets contain a connection identifier, so the connection can continue were it left off when the client switches ip.

ly3xqhl8g9 · on May 8, 2023

In some ideal world IP values vary between 0 and 999 and you just add a new block if you run out of them, `999.999.999.999` would be the last standard IP, simply add another `.000` block and you get some more billions IPs.

Having letters in IPv6 feels wrong, at that point why not just have words, reinvent DNS at a lower level. You are already living on Cherry Picked Street, number 15, let your computer have the IP `cherry.picked.15.1`.

bauruine · on May 8, 2023

An IPv4 address on the wire is just a 32 Bit integer in big endian. The quad-dotted notation is the most common for IPv4 but you could also use hex or decimal. There was an article on HN just recently about that. I can't find it right now but try ping 0x7F000001 to get an idea. You don't even need a new protocol you could implement something like what three words or your 999.999.999 naming scheme locally and use it on the internet right now! No need to replace any routers or have anyone else upgrade.

suprjami · on May 8, 2023

It's the human method of interacting with that integer which is the problem.

Nobody is going to start saying IPv4 addresses as hex, people aren't used to base 16. "Just set your gateway to co.a8.0.fe" said nobody ever.

A 128-bit dotted quad would be equally unwieldy. Again "The IPv6 local range is 252.000.000.000.000.000.000.000.000.000.000.000.000.000.000.000/7" said nobody ever.

Honestly 128 bits was too much. They should have just done 64 and stuck to dotted quad and ARP. That would probably have got more acceptance than what we have now.

mixmastamyk · on May 8, 2023

64bits in hex would be reasonably compact, 1.5x the length of a MAC address.

ikiris · on May 9, 2023

Sure, we could even call those network subnets that way we've got plenty of space for all the internet and expansion since 32 wasn't enough, and have another 64 bits for host identifiers.

This feels familiar somehow...

samwillis · on May 8, 2023

So basically just "what three words" but for IP. Just don't let a VC backed company own the address space and then encourage public/government organisation to adopt it!

ly3xqhl8g9 · on May 8, 2023

Funny idea, wanted to write some quip on why it's bad, given the inherent polysemy, but someone already did it [1].

[1] "What’s wrong with what3words?", Chapter 3, "offends.people.easily", covers it, https://www.youtube.com/watch?v=SqK0ciE0rto

badrabbit · on May 8, 2023

Or... hear me out: variable length, ideally with support for ascii or other encodings allowing human readable addresses ( no dns!) and routing done using top level addresses.

"AAA.CA" becomes 4143.414141, routers will find the best route for 4143 and anycast ti 414141.<more specific address>, the apex AAA in this case would also be the ASN equivalent for bgp type routing and the TLD CA would also be the PKI certificate authority used to validate both routing updates and address ownership by the applications on either end. No port numbers either, the full address should describe the layer4+ addresses, so the full address user types instead of https://AAA.CA:8443, it becomes HTTPS.8443..AAA.CA (.. meaning anycast to the closest endpoint).

A good protocol is the easy part, the hard part is getting all the big networking companies and their engineers to agree on something.

jandrese · on May 8, 2023

The hard part is handling variable length addresses fast in hardware. Historically there has always been a sizeable speed penalty for making hardware deal with variable length anything.

badrabbit · on May 8, 2023

That's why you make addressing hierarchial. With IP it was one address, 16bitx2 addressing will be fixed for the purpose of routing in that it either looks at the top level 16bit and decides the route to choose or for core routers, the destination keeps the other end of the address space in the routing table,again 16x2 on core/transit and just 16 for LAN/Host routing (higher order 16bit addresses each get their own RT, scaling as needed). Routing becomes a fast two step process ("i have a table for that, then let me lookup the route" or "I don't have the table, default route or drop"). For that small cost of additional RT lookup and memory you get faster lookups and comparisons (16 vs 32bit) and you can get small routers that support as little as 65k routes cheal for small sites and stack/scale that to meet demand. So R1 gets the packet, looks at the first 16bit and decides which R3 has the RT for that and forwards it, R3 looks at only the second 16bit due to it's role in the stack and forwards it elsewhere. Internet routers will do the most work since they will do multiple (but predictable and limited since it scales exponentially) lookups then downstream looks up less and less until the actual application gets it. 16bit to support utf16.

Arbitrary length but fixed hierarchial address partitioning.

senectus1 · on May 8, 2023

i dont think so really.

He made the right choice. the world will adapt, eventually.

I'd be willing to bet IPv6 wont be the last network upheaval we'll have as a civilization. (assuming we dont wipe ourselves out first)

What do you think will happen if we generate some really really big AI, and they realize that they'll be super reliable if we made every synapse have its own IP, then we network all the AI's together as each AI to behave as its own synapse node :-D

IPv256?

Quekid5 · on May 8, 2023

I imagine a bigger problem with your hypothetical might be the sheer mass of such a thing. Plus, you underestimate just how mind-bogglingly huge 2^64 is. (Granted, the IPv6 address space is going to very quite sparsely populated, but still.)

I mean, the estimate on the number of atoms in the observable universe is "only" 10^80, for comparison.

m348e912 · on May 8, 2023

For the time 32bit was more than adequate, not to mention routers only had so much memory and processing power.

elihu · on May 8, 2023

Funny enough, one of the arguments for longer key lengths now is to save memory. With IPV4 the way it is now, the address space is very fragmented so you need a lot of router table entries. More available address space means more contiguous address blocks and therefore fewer router table entries.

That wasn't an issue back when IPv4 was being defined though.

convolvatron · on May 8, 2023

The issue here isn’t the size of the address, but the way it’s allocated and managed. Since the provider block routing scheme was carried forward while cloth - realistically the most were going to get is a deprecation of the really tiny blocks - ‘the swamp’.

em3rgent0rdr · on May 8, 2023

It is possible 64-bits might have discouraged adopters who didn't want to waste expensive unnecessary extra bits at the time, so TCP/IP might not have taken off.

kragen · on May 8, 2023

the ipv4+tcpv4 header is minimally 320 bits long so i don't think so

jandrese · on May 8, 2023

Routers don't care about the TCP header though, they only care about the 20 bytes of IP header. The interfaces on the router also care about the 14 byte MAC header, but that's a separate step.

That said I agree that the world likely would have sucked it up and just gone with the 64 bit addresses, but there would have been a whole lot of grumbling for decades about the memory use. It's hard to imagine these days, but back then memory was outrageously expensive and enormous amounts of engineering went into minimizing memory use wherever possible. Having these addresses where we wouldn't even touch half of the bits for decades would have been unpopular with a lot of people, even if he would have been praised for being so forward thinking in the end.

On the other hand, the transition to IPv6 does give us a chance to fix some of the longstanding defects in IPv4 that would otherwise be baked into the protocol until the end of time, so it's not all bad.

FullyFunctional · on May 8, 2023

"Having these addresses where we wouldn't even touch half of the bits for decades would have been unpopular with a lot of people, even if he would have been praised for being so forward thinking in the end."

Worse perhaps, people, algorithms, hardware, ... will all start to assume those bits are zero and start optimizing for that. We have seen this happen repeatedly throughout history (eg. Lisp pointers would stuff tags in the "spare" bits in addresses, making implementations non-portable).

These are all hard problems, but the lesson I take away is to

1. Always have a plan for evolving the design/protocol/API/...

2. Aim for a watertight design that supports 1. (Example: TLS 1.3 puts random stuff into reserved fields so nobody can assume they are zero).

3. Really stretch your imagination about future usages (assume 50+ years)

4. Have a ridiculously comprehensive test suite for both clients and servers

It seems clear that in terms of lifetimes, hardware gets used much longer than expected, software even more so (Y2K anyone), but protocols/formats/specs/... literally never dies

kragen · on May 9, 2023

we're talking about decisions that were made in 01979 and finalized by 01981

https://datatracker.ietf.org/doc/html/rfc791

there was no such thing as a router in the sense of dedicated hardware; a router ('gateway') was just a host with two or more interfaces

the ram used by ipv4 addresses was the same kind of ram used by the rest of the ip header and indeed packet

i mean originally you did have the imps but they were just looking at the arpanet node number; they weren't gateways

the switch from variable-length addresses to 32-bit addresses happened between ien 80 in february 01979 https://www.rfc-editor.org/ien/ien80.pdf and ien 111 in august 01979 https://www.rfc-editor.org/ien/ien111.txt

cisco was founded five years later, in 01984, at which point regular ram cost on the order of a dollar a kibibyte, so the content-addressable memory you need to accelerate a router was maybe ten or twenty dollars a kibibyte

would people complain about the ip headers having unnecessary junk in them? certainly, but there is plenty of that in the actually adopted protocol header design too, so clearly it wasn't a showstopper

imtringued · on May 8, 2023

We live in a NAT and CGNAT world. Routers do care about the TCP header.

detaro · on May 8, 2023

now. Not at the time were "adopting TCP or not" was a question.

kortilla · on May 8, 2023

This is not how routers look at it though. Routing is performed on the destination IP address, which is only 32 bits.

Doubling from 32 to 64 would have had a significant impact on routing table size and it would have dropped the back pressure to keep it from growing too quickly for TCAMs.

It would certainly be nice today, but it’s not a sure thing it wouldn’t have been killed off as “bloated” when there weren’t even 2^12 hosts.

oh_sigh · on May 8, 2023

Things also could have been a lot different if IPv6 was just IPv4 with 128 bit addresses.

jandrese · on May 8, 2023

I see this sentiment from time to time, but it doesn't make sense to me. Any way you do it you need to deploy new hardware, new firewall rules, new end user applications, etc... What parts of IPv6 are harder now than they would be with just a fat IPv4? Is it really so complicated now that you can't fragment packets on path? Is Neighbor Discovery Protocol really so much more difficult to understand than Address Resolution Protocol? Do you yearn for DHCP and NAT?

kortilla · on May 8, 2023

Administrators memorized IPs a lot. This is hard to under estimate and is frequently the disconnect I see when a SWE doesn’t understand why people could possibly complain about ipv6.

This is likely the #1 issue I hear from people in the field.

#2, for better or worse, NAT became peoples’ comfort blanket. It’s a boneheaded stateful firewall regardless of how long people can scream “NAT isn’t security”. By forcing people to take both changes at the same time, adoption was setup for failure. So many places in the US turned it off “because nothing I use is v6 only” and they weren’t quite sure if they were exposing their end hosts accidentally.

#3, link local addresses, guids, etc all assigned at the same time is a steep learning curve. Which will be used for which flows in the network? If my ISP changes my prefixes, do I know have to reconfigure all of my local network firewalls?

#4, prefix delegation, should each client behind my router get its own /64? My ISP only gives me 128 of them to hand out which is a problem for IOT. Guidance here is weak.

Burning all of the early adopters with the slaac vs dhcpv6 vs privacy extensions didn’t help either.

Nobody gives a shit about fragmentation. The upgrade UX was a disaster.

jandrese · on May 8, 2023

#1 seems pretty unavoidable. Yes, bigger numbers are harder to remember, but we're in this mess in the first place because we made the numbers too small. It's kind of like complaining that phone numbers are too hard to memorize after we switched to 10 digit dialing. That may be true, but since we all have address books built into the phones now it's really not an issue.

#2 is people being nervous, but seriously the "DENY incoming on $WANIF if not in state table" rule is all you need. This always struck me as "I refuse to learn anything new, no matter how little I need to learn".

#3 Again, people worried about the firewalls when they really don't need to be.

#4 A /64 is a subnet. Normally all of your devices would be on the same subnet, although with IPv6 you get a whole bunch which is nice. You can put the IoT devices on a different subnet that is partitioned from the rest of your network because they are IoT devices which means they are adorable little security vulnerabilities.

I think your last point is a good one. DHCP6 was poorly named. People thought it would work like DHCP did in IPv4 but it's really not intended for that. It's really only meant for routers, not for end hosts.

You missed a couple of points where I think IPv6 did have some flaws. SLAAC originally lacked a lot of the extensions that DHCP offered, like DNS server advertisement. In theory you could anycast your DNS queries, but DNS is a dodgy enough protocol already and this was really not a great idea. It's one of those things that works great in the lab, but is kind of a nightmare in the real world. It also offers no way to communicate back to a DNS server what IP address you have chosen, so if two hosts want to communicate with each other, especially if they are not in the same subnet, it becomes difficult to determine what their partner's address is. All in all the IPv6 committee considered DNS to be an application protocol and outside of their scope, but application developers consider it part of the network and so it got left out in the cold.

Using the MAC address to create the IPv6 address was also a bad idea from a big data standpoint. Luckily that was one of many options so it was easy to switch without breaking anything.

kortilla · on May 9, 2023

> #2 is people being nervous, but seriously the "DENY incoming on $WANIF if not in state table" rule is all you need.

This didn’t exist. Many of the early implementations of “support” for ipv6 was a checkbox that said “enable”. These cheap routers (which is what the majority of people without deep pockets are on) rarely even gave you an explicit stateful firewall UI.

If you got a “firewall” option at all, it was to block stuff from getting out of your network, not in. Know why? Because “NAT already did that”. If you wanted inbound unsolicited traffic you used port forwards or the “DMZ”.

It’s not people being nervous, it’s the vast majority of network vendors not making the UX any good. I pushed people to try v6 hard, it was not the rosy transition it was supposed to be. v6 for a while ended up being a great accidental exfil path for malware for administrators that screwed this up. Don’t try to downplay it, it just makes you sound like an armchair quarterback, not smart.

> #4 A /64 is a subnet. Normally all of your devices would be on the same subnet, although with IPv6 you get a whole bunch which is nice.

A /60 is also a subnet, so is a /127. Subnetting is just breaking up larger IP spaces.

Anyway, you missed the point of the comment. What I was getting at is that doing prefix delegation for home users for anything more than a /64 is immensely wasteful. Yet a bunch of very large ISPs do just that.

Want to guess why? Once again, bone-headed implementations in off the shelf routers that do give a /64 per WiFi client. Untold waste in the v6 space because addressing guidance was (and still is) so poor.

jandrese · on May 9, 2023

If your firewall can do NAT (and spoiler alert: it can) then it has stateful logic built in. Maybe some of the really early home stuff didn't support it on IPv6, but that's long in the past. If you're worried about IPv6 using to exfil data then you're well beyond a great many administrators in locking down the network. Usually malware just exfils on IPv4 HTTPS because that's already allowed, and it's only avoided in places where the deep packet inspection services only run on IPv4.

This is where you see a lot of corporate pushback on IPv6. The Deep Packet Inspection vendors have been very slow to adopt so corporate policy is often just to block all IPv6 period.

I won't argue against a lot of home routers having egregiously bad firewall configuration UIs though.

In IPv6 a /64 is special. It's the smallest network you are supposed to allocate. Some people think it is the equivalent of an IPv4 single address, but this isn't quite right. It is still a full subnet. It is better thought of as the IPv4 /24 behind a single NAT address. Home administrators are expected to put all of their hosts on it. Assigning a /64 to each client is not supposed to be a typical use case, and would be mostly to avoid having the clients inter-communicate. The problem however isn't wasted space, it's just that you'll exhaust your typical home /56 too quickly.

I can see you're concerned about running out of IPv6 addresses due to excessive waste, but we've not even scratched the surface on them. The address space is mindbogglingly huge. A lot of the optimizations we have to do to save space with IPv4 are simply not relevant in IPv6.

garbagecoder · on May 8, 2023

Just about every thread on HN has a debate between the optimal and why can't people just get it and the practical and why they can't get it to be optimal.

If you can't implement it easy enough for it to be optimal once in place, it's not optimal. Whatever the advantages of ipv6 are, it wasn't enough to suck everyone in. You have to consider your environment. And re-training sysadmins is part of it and just telling them "lol it's easy" isn't going to do much to get them there.

By the time we get fully switched over, I wonder if we won't be pining for whatever comes next.

WorldMaker · on May 8, 2023

A lot of the trade-offs that people like to complain about IPv6 were specifically made to "rip the (backwards compatibility hack) band-aid off" and help make sure that we won't be pining for "whatever comes next" anytime soon.

IPv6 could have been a lot more pragmatically backwards compatible, absolutely. It would have been much more doomed as a temporary solution in that case. The IPv6 we got was designed to be a more permanent solution, which makes it feel much less pragmatic. That's somewhat how trade-offs work.

iainmerrick · on May 8, 2023

Administrators memorized IPs a lot.

Right - I’m not a network admin, but back a few decades when I was hanging around the CS lab and setting up home networks, I sure had to type in IP addresses by hand a lot. If they were more than four bytes long that would have been painful!

amoss · on May 8, 2023

People hate kludges and backward compatibility, but IPv4.1 with 5-byte addresses would have been the dominant standard by now.

XorNot · on May 8, 2023

No, it wouldn't have been. Where are you going to put that extra byte? How is an intermediate box going to know what to do with it? How is a routing ASIC going to handle it?

This is thinking which appeals to people who don't understand the problem because it feels like a compromise, but you can't negotiate with silicon. If it expects a 4 byte address payload, then that's all it will ever handle.

Adding more bytes is exactly the same amount of redesign and replacement work as adding more. If the problem was solely software related, it would've been done by now.

amoss · on May 8, 2023

Nah, hardware people just don't negotiate hard enough.

Add a one-byte option into the header, give it an unassigned value (e.g. 6) and assume that any box that does not understand it will pass it through unchanged. Use it as the most significant byte of the address, and obsolete all routing equipment that doesn't understand it from the backbone so that they can be used to handle routing in one of the new 32-bit super-A class networks.

Sure it's ugly as hell, but it is just backward compatible enough to work.

cesarb · on May 8, 2023

> Add a one-byte option into the header, give it an unassigned value (e.g. 6) and assume that any box that does not understand it will pass it through unchanged.

As many have found out, unfortunately "any box that does not understand it will pass it through unchanged" is frequently false. What often happens is that a box which does not understand it silently drops the whole packet, under the mistaken impression that it's an attempt to invade the network.

> Use it as the most significant byte of the address, and obsolete all routing equipment that doesn't understand it from the backbone

You'd have to not only obsolete all routing equipment, but also all end hosts. Consider what happens when host A, which does not understand the option, receives a packet from host B, which has an address which requires the option; the reply packets from host A will never reach host B, since host A doesn't know it has to add the option.

These "just add another byte" solutions are often proposed in these discussions, and look deceptively simple at a first glance, but fall apart once you start looking at it in detail.

XorNot · on May 8, 2023

"just upgrade every backbone router to handle it" - you've just recreated the same problem.

Routers don't run packet addresses through software, they run then through silicon ASICs which know the address is 32 bits, that's it.

Your solution is literally how Cisco implemented IPv6 on a bunch of it's switches: the IPv6 routing path goes through the firmware and is slow as hell compared to the pure ASIC IPv4 handling. It didn't catch on because it was terrible at any scale (we were building Hadoop clusters, 100+ big data nodes with constrained throughout is money down the drain).

amoss · on May 8, 2023

So how many backbone routers are 20 years old?

Your claim was that any change to the ASIC was the same design cost. But really? A 40-bit routing table design, where there is a 32-bit fall-back to preserve the top 8-bits vs a 128-bit routing table design. Why would these be the same cost?

XorNot · on May 8, 2023

How much cost in labor, disruption and time do you think upgrading every backbone router would be?

When would you have done it? When the hardware was 5 years old? 10 years old? I've been on change operations where the hardware was over a decade old and our biggest concern was making sure we had replacement PSUs on hand since the thermal shock of a reboot might take them out.

This is the problem, you have absolutely no idea what anything costs or how it works - you're simply thinking "lifting those bits sounds heavy, what if we made them lighter?" as though a committee of engineers, designers and industry didn't spend a lot of time considering exactly that problem.

If you could add 1 byte in software and be done with it, it would have already happened.

amoss · on May 8, 2023

Upgrading hardware is an ongoing process. You are proposing a single point of change without any reason for it to be handled that way.

You are arguing that all change has the same cost. I don't see your reasoning for this. A smaller change to a system has a lower cost in R&D, and a lower cost in manufacture.

Although it is probably true that the cost of updating all equipment dominates the cost of redesign, this is only an issue for a big bang update rather than a rolling upgrade.

ikiris · on May 8, 2023

This shows a basic lack of understanding as to how any of this works.

ikiris · on May 8, 2023

If there existed a magic wand to make longer addresses work with existing stuff sure, but since there didn't, you have all the exact same problems as ipv6 which is already basically the same thing as ipv4 anyway with 128 bit addressing, and some minor tweaks to how the "arp" works.

If you can articulate how this magic wand would work in practice we're all ears since you solved a worldwide problem, but I'm betting my salary that you can't.

ikiris · on May 8, 2023

It basically is.

alfons_foobar · on May 8, 2023

ikiris · on May 8, 2023

Yes...

1letterunixname · on May 8, 2023

I don't see all of the criticisms for security, but there are some fair points.

0. Gating mechanisms should live at lower and higher levels in the stack. We have 801.1x for wifi/port security and TLS between client and server.

1. DHCPv4 (and conflation with BOOTP) is messy and doesn't have a trust model. There ought to be a mechanism to validate that the DHCP server authentic, its data is valid, and it's supposed to be there. (Rogue DHCP servers can be thwarted with participation by the network gear, but requires more engineering than client simply being able to ignore bogus DHCP servers. Networks still must police and shutdown rogue endpoints nonetheless.)

2. DHCPv6 (stateful -SLAAC, stateless +SLAAC, DHCPv6-PD), SLAAC, and M and O flags make it more painful to deploy than it ought to be.

3. DNS DNSSEC/DANE hasn't really caught on (yet?). DNS needs a top-down trust model or at least make it user-friendly enough (tools) to deploy.

4. EBGP security. (Let's not talk about that.)

In general:

i. Optional features don't get used.

ii. Complexity is a barrier to adoption and a source of security issues.

iii. Over-simplicity is another source of security issues and a source of lacking functionality.

tptacek · on May 8, 2023

Or, DNS doesn't need any security model at all, because the rest of the stack is designed to assume it's not secure, which is a model that has worked for the last 30 years.

ralferoo · on May 10, 2023

One thing that might have worked better is if the concept of separate address and port had been scrapped, and there was just a 64-bit "service" instead.

Naive early implementations could have been nearly the same, just using the top 32-bits to route to a host, and let the host handle all lower 32-bits (effectively a big port number). But later revisions could have split it into e.g. /48 for host and 16-bits for service selection, or take to an extreme maybe even /56 for host and 8-bits for service.

The stupid thing is that the vast majority of IPv4 addresses are only used for a handful of public services, almost always HTTP and HTTPS. So even though we have extreme pressure on the 32-bits of IP address, the 16-bits of port number is mostly wasted and underused.

ralferoo · on May 10, 2023

Maybe bad form to follow up on your own post, but just thinking about this more...

At the time, before IPv4 was introduced, IP addressing was 8-bit, and this was mostly a single machine per site. Moving to a 64-bit service address, early implementations could have kept their 8-bit address and had 56-bits for service ID, most of which could just be 0 and unused.

Some sites might then have gradually move to /8 for site and /16 for site+machine and 48-bits for service ID, most of which would still be unused.

Later on, we might see bigger networks taking a /8, /16 for network+site, and /24 for network+site+machine, etc...

Up to the present-day, we could easily see how /48 for network+customer, /56 for network+customer+machine and 8-bits for service would still satisfy almost every need. Given that IPv6 /48 are being given out like candy and a /40 is trivial to get for a larger organisation, there's not really a massive difference.

So, that largely covers inbound services, but there's also outgoing connections to consider. Given that IPv4/IPv6 already has to consider address+port on both sides to identify a connection, we could still have something like a /56 for network+customer+machine leaving 8-bits for discriminating between connections to a /64 remote service.

It's possibly interesting to consider that this is actually kind of how the original IP stack works - a connection is just the pair of 32-bit addresses plus the protocol and then UDP and TCP bodge in some extra port numbers as extra data within their protocol-specific data. If we'd instead had an IPv5 that extended the addresses to a pair of 64-bit addresses plus protocol, that TCPv5 and UDPv5 might not have even needed port numbers. This would then have shifted the burden of supporting TCP and UDP from the kernel (because it needs to be there for IPv4 to check access control to different ports) to potentially allowing user-space protocols to be implemented because the OS can assign the socket its 64-bit address and wouldn't really need to care what happens to it after that.

garbagecoder · on May 8, 2023

I don't know, I think he did a good job. We're just using the tech, longer than almost anything else has survived. It's always good to catalog mistakes and learn from them, but I think there's probably more to learn about what he got right to make it that durable.

tomcam · on May 9, 2023

What’s amazing is not TCP/IP’s warts, but how well conceived it was. His humility overshadows the fact this has been a magnificently flexible and robust technology. It’s one of the most impressive creations of the 21st century.

captainkrtek · on May 8, 2023

#4: IP Fragmentation

cyberax · on May 8, 2023

To be precise, MTU.

The MTU should have been reified to the IP layer. And it would have been so easy!

When a system sends a packet, it sets the "forward MTU" field in the IP packet to be equal to the MTU of the outgoing link. Each router along the way inspects the packet and if its outgoing link has a lower MTU, it replaces the "forward MTU" with its own.

Once the packet reaches the destination, the target host copies the "forward MTU" field (which now has the lowest MTU seen along the path) to the "reflected MTU" and sets it on the next reply packet.

So the originating system can discover the limiting MTU by the time it gets the ACK packet.

Sigh. Instead, we have a Ruby Goldberg machine with fragmentation, ICMP and all the related unreliable crap that has forever cemented max MTU as 1500.

unnah · on May 8, 2023

That would work only as long as the route does not change over time, and does not depend on which direction the packet is going. They probably did not want to make that assumption in the protocol, even if it turned out to hold most of the time in practice.

cyberax · on May 8, 2023

This process should happen continuously, with every packet sent. Then the intermediary router just needs to chop the packet to its MTU and let it travel to the destination.

The destination will detect that the packet has been chopped, send the updated reflected MTU and everything would be fine. Of course, the minimal allowed MTU (say, 512 bytes) should provide enough information for the target to find the correct TCP flow to which the packet belongs.

No need for out-of-band ICMP signalling or fragmentation.

And in practice, topology changes happen rarely, so this should be very infrequent.

cryptonector · on May 8, 2023

Explicit congestion control notification could also have been done this way.

Retrofitting features like this into the middleboxen takes decades, and never finishes, so you either get them in on day zero or you never get them.

p1mrx · on May 8, 2023

It probably would've been better for routers to set a "truncated" bit and chop off the rest of the packet.

UDP-based applications would have to do MTU discovery, but that's much easier when you're receiving truncated packets.

dabiged · on May 8, 2023

Related, I was going to say lack of growth in packet size.

I work with magnetic tape and have seen tape block sizes grow from ~512 bytes in the early 1990s, to 10s MB today, resulting in huge performance improvements. Network packet sizes haven't changed by anything near that amount in 30 years.

clouddrover · on May 8, 2023

> “I didn’t really appreciate the implications of the World Wide Web.”

That's not a mistake in TCP/IP.

astrange · on May 8, 2023

QUIC doesn't run on TCP for this reason; it wants to avoid the head of line blocking issues.

The earlier attempts to replace it like SCTP didn't work out because too much middleware blocks it.

peterfirefly · on May 8, 2023

Is IPv6 on his list?

zoobab · on May 8, 2023

"I didn’t pay enough attention to security"

Crypto is too hard, it should be offloaded to AI.

AnimalMuppet · on May 8, 2023

An AI smart enough to build secure crypto is also smart enough to backdoor it.