Internationalization-puzzles: Daily programming puzzles just like Advent of Code (i18n-puzzles.com)
75 points by birdculture 9 hours ago | 4 comments





In the first puzzle, the first line is in Hungarian, but it doesn't even contain the Hungarian letters that i18n usually struggles with most: üö űő. Before UTF-8 was widely adopted, these characters regularly got mangled when they were passed between multiple systems.

I still see basic accented characters like éá messed up sometimes, which is especially a shame in 2025.
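To make the failure mode concrete, here is a minimal sketch of two common ways this happens, assuming the text is written in one encoding and read back in another. The helper name `misread` is made up for illustration. ISO 8859-2 (Latin-2) and ISO 8859-1 (Latin-1) agree on ü and ö but not on ű and ő, which is probably why those two suffered the most.

```python
def misread(text, written_as, read_as):
    """Encode text in one encoding, then decode the same bytes as another."""
    return text.encode(written_as).decode(read_as)

# Latin-2 bytes interpreted as Latin-1: ü and ö survive, ű and ő do not.
print(misread("üö űő", "iso8859_2", "latin_1"))  # üö ûõ

# UTF-8 bytes interpreted as Latin-1: each accented letter becomes two characters.
print(misread("éá", "utf-8", "latin_1"))  # Ã©Ã¡
```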


Starts with I18N in SMS. There should be a trigger warning on those. SMS is dreadful in itself. In an international setting, it's a nightmare. But once people wonder why their costs exploded after they changed their welcome message to include an accented character...
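A rough sketch of why one accented character can do that, under stated assumptions: a real SMS carries 140 octets, which is 160 characters in the GSM 7-bit alphabet but only 70 in UCS-2, and a single character outside the 7-bit alphabet switches the whole message to UCS-2. Concatenated messages get 153 or 67 characters per part. The alphabet tables below are written from memory and the function name `sms_segments` is hypothetical, so treat this as an illustration rather than a billing-grade implementation.

```python
import math

# GSM 03.38 basic alphabet, as recalled here (an assumption, not copied from the spec).
GSM7_BASIC = set(
    "@£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞÆæßÉ"
    " !\"#¤%&'()*+,-./0123456789:;<=>?"
    "¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑÜ§"
    "¿abcdefghijklmnopqrstuvwxyzäöñüà"
)
# Extension-table characters cost two septets each (escape plus character).
GSM7_EXT = set("^{}\\[~]|€")


def sms_segments(text):
    """Estimate how many SMS parts a message is billed as."""
    if all(ch in GSM7_BASIC or ch in GSM7_EXT for ch in text):
        units = sum(2 if ch in GSM7_EXT else 1 for ch in text)
        single, multi = 160, 153
    else:
        # Any other character forces UCS-2 for the whole message.
        units = len(text.encode("utf-16-be")) // 2
        single, multi = 70, 67
    return 1 if units <= single else math.ceil(units / multi)


plain = "Welcome to our network! " * 4   # 96 characters, all in the 7-bit alphabet
accented = plain[:-1] + "á"              # same length, but á is not in that alphabet
print(sms_segments(plain), sms_segments(accented))
```

With these assumptions the 96-character message fits in one part, while the version with a single á needs two.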

The programming challenge does not require interfacing with actual SMS, or knowing or using any part of the SMS protocol.

(from the first puzzle)

> The venerable SMS system uses a message limit of 160 bytes. This was designed so that a message could fit in exactly one packet, thus being really cheap and fast to handle on first-generation mobile phone networks. Although the approach makes sense for technical reasons, it unfairly penalizes people who use non-latin (e.g. Russian, Greek, Japanese) alphabets - in most encodings, they need more bytes per character than latin alphabets.

Except that obviously the system is going to use an encoding that makes sense for the local language. It has long been remarked that Chinese Twitter users enjoyed a less restrictive limit. [Practically no limit at all, since, as this puzzle notes, Twitter limited by the character instead of the byte.]

You need two bytes per character in Chinese (unless you really want to use UTF-8).

    是她吗?                         -  8 bytes
    Is that her?                     - 12 bytes
    是的                             - 4 bytes
    Yes                              - 3 bytes
    我做了很多宝宝的表情包             - 22 bytes
    I made a lot of stickers of her  - 31 bytes
This doesn't look like a penalty to me. If we did switch the Chinese into UTF-8, it would take about as much space as the English.
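
The counts above can be reproduced with a short sketch, assuming "two bytes per character" means a GB2312/GBK-style encoding (Python's `gbk` codec here) and that the first line ends in a fullwidth question mark "？"; with an ASCII "?" it would come to 7 bytes rather than 8. The helper name `byte_len` is made up for illustration.

```python
samples = [
    ("是她吗？", "Is that her?"),
    ("是的", "Yes"),
    ("我做了很多宝宝的表情包", "I made a lot of stickers of her"),
]


def byte_len(text, encoding):
    """Number of bytes the text occupies in the given encoding."""
    return len(text.encode(encoding))


for zh, en in samples:
    # Chinese in GBK, Chinese in UTF-8, then the English line in ASCII.
    print(byte_len(zh, "gbk"), byte_len(zh, "utf-8"), byte_len(en, "ascii"))
```

In UTF-8 each of these Chinese characters takes three bytes, which gives 12, 6, and 33 bytes against 12, 3, and 31 for the English.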


