The XML parsing/validation bugs are, I suppose, not shocking, but deeply disappointing.
The one thing XML & its tooling were supposed to get right was document well-formedness. Sure, it might be a mess of a standard in other ways, but at least we could agree on what a parser should and shouldn’t accept! (Not the case for the HTML tag soup of then or now.)
That, 25 years on, a popular XML processor can’t even meet that low bar for tag names is maddening.
1) Don't rely on two parsers having identical behaviour for security. Yes, parsers for the same format should behave the same, but bugs happen, so don't design a system where small differences result in such a catastrophic bug. If you absolutely have to do this, at least use the same parser on both ends.
2) Don't allow layering violations. All content of XML documents is required to be valid in the configured character encoding. That means layer 1 of your decoder should be converting a byte stream into a character stream, and layers 2+ should not even have the opportunity to mess up decoding a character (see the sketch after this list). Efficiency is not a justification, because you can use compile-time techniques to generate the exact same code as if you had combined all layers into one. This has the added benefit that it removes edge cases: if there is only one place where bytes are decoded into characters, you can't get a bug where that decoding is broken only in tag names, and your test coverage is automatically better.
3) Don't transparently download and install stuff without user interaction, regardless of where it comes from!
4) Revoke certificates for old compromised versions of an installer so that downgrade attacks are not possible.
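To make point 2 concrete, here is a minimal Rust sketch (illustrative only, not taken from any real parser) of what "one place where bytes are decoded into characters" looks like: the byte-to-character conversion happens exactly once, and everything downstream only ever sees validated text, so a decoding bug cannot end up confined to tag names.

```rust
// Minimal sketch of the layering in point 2. All names are illustrative.
fn parse_document(input: &[u8]) -> Result<(), String> {
    // Layer 1: bytes -> characters, in exactly one place.
    let text: &str = std::str::from_utf8(input)
        .map_err(|e| format!("invalid UTF-8 at byte offset {}", e.valid_up_to()))?;

    // Layers 2+: tokenizing, tag names, attributes, etc. operate on `&str`
    // only, so a malformed byte sequence can never reach them.
    for token in text.split('<') {
        let _ = token; // real tokenizing/parsing logic would go here
    }
    Ok(())
}
```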
> 4) Revoke certificates for old compromised versions of an installer so that downgrade attacks are not possible.
I suggest the following alternative: when your own software triggers the upgrade process, don't allow upgrading to an older version of the software.
In other words: If a user wants to downgrade, they will have to do the work of running the installer for the older version (and possibly uninstalling the newer version first).
This modified behavior addresses the problem mentioned in the article (a newer version of software running the installer for an older version), but still gives users the power to install an older version if they want.
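A minimal sketch of that policy in Rust (the names and version format are hypothetical, not Zoom's actual updater code): the automatic path compares versions and simply refuses to run any installer that is not strictly newer, while manual installs stay possible outside this path.

```rust
// Hypothetical auto-updater policy: never auto-install anything that is not
// strictly newer than the currently running version.
#[derive(PartialEq, Eq, PartialOrd, Ord)]
struct Version(u32, u32, u32);

fn parse_version(s: &str) -> Option<Version> {
    let mut parts = s.split('.').map(|p| p.parse().ok());
    Some(Version(parts.next()??, parts.next()??, parts.next()??))
}

fn should_auto_install(current: &str, offered: &str) -> bool {
    match (parse_version(current), parse_version(offered)) {
        // Strictly newer versions only; everything else requires a manual install.
        (Some(cur), Some(new)) => new > cur,
        _ => false, // unparseable version strings never auto-install
    }
}
```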
It's not entirely clear to me that this would be a sufficient mitigation in this case: the endpoint could claim that Zoom version 999 is being served, then serve the old exe and cab, which would then be run, possibly before other checks can even be done.
> 3) Don't transparently download and install stuff without user interaction, regardless of where it comes from!
This is an interesting one. I totally get your point. But also users are terrible about updating their software if you give them the choice. Automatic updates have very practical security benefits. I've witnessed non-technical folks hit that "remind me later" button for years.
Unfortunately, the problem here is programmers more so than formats. It literally doesn't matter what you specify; programmers will not implement it to a T. Most programmers simply don't know that every single detail matters. Many of those who may have some idea don't really care, since they can't imagine how something like this could happen.
It's not just XML. It's every ecosystem I've ever used. Push it around the edges and you will find things.
This is neat, not because it is special to JSON in particular, but because it's an example of examining a good chunk of a large ecosystem: https://seriot.ch/projects/parsing_json.html Consider that this is likely to be true in any ecosystem that doesn't make avoiding it a top priority.
I disagree. The way the format is designed has a direct effect on how likely implementors are to implement it correctly. So the format designers bear some responsibility.
For example, how many Protobuf parser libraries have security bugs? I'm guessing very few, because the standard is nice and simple, and it's very clearly defined without much "it's probably like this" wiggle room (much easier for binary formats!).
XML had a ton of unnecessary complexity that could have been avoided to make implementations simpler. I haven't actually read about this bug, so let's see if it was one of:
* Closing tags having to repeat the name / two different ways of closing tags.
* CDATA
* Namespaces (especially how they are defined)
* &entities;
Edit: Ha, it wasn't any of those - but it was still an issue with text-based formats. It seems Expat assumes the content is valid UTF-8 (and doesn't validate it), while Gloox assumes it is ASCII. Obviously this couldn't have happened with binary formats.
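As a toy illustration of why that difference matters (this is not the actual Expat or Gloox code), here are two plausible ways of handling the same malformed byte sequence that end up seeing different text, which is exactly the kind of parser disagreement the exploit relied on:

```rust
fn main() {
    // 0xC1 can never appear in valid UTF-8.
    let bytes = b"foo\xC1bar";

    // Strategy A: treat the input as UTF-8 and replace invalid sequences.
    let utf8_view = String::from_utf8_lossy(bytes);

    // Strategy B: treat every byte as a standalone one-byte character.
    let byte_view: String = bytes.iter().map(|&b| b as char).collect();

    // The two "parsers" now disagree about what text the same wire bytes contain.
    println!("A: {:?}", utf8_view); // "foo\u{FFFD}bar"
    println!("B: {:?}", byte_view); // "fooÁbar"
}
```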
If you care about security DON'T USE TEXT FORMATS!
XML is a bad text-based format. It doesn't know if it wants to be human-readable or computer-readable, so it does both poorly (if you think this vuln is bad, check out some of the SAML vulns).
I wouldn't blame XML's silliness on text-based formats in general, even if they are full of risks.
If you care about security, verify your goddamn invariants.
This is not a software problem. This is a lazy programmer/software engineer problem. Electrical Engineering, or hell, any mature engineering field understands this concept.
If you have not read your entire codepath, you have no idea what it is you are doing.
Welcome to why my life as a QA is effing miserable. Every bit of ignorance by devs following the philosophy of "abstraction is good" is dealt with at the level of Software BoM audit.
There is a difference between not writing bugs, and checking your invariants.
If you have not read the implementation code you are dependent on, you by definition have not had the signal that raises that invariant violation into your consciousness.
It would be like a civil engineer building a bridge out of limestone at a thickness where even steel would need to be thicker, and just saying "to hell with it, go figure it out".
The write-only programmer is a threat to themselves and everyone around them. And to be frank, even more dangerous are those members of management whose expectations around implementation time are so skewed by this cavalier attitude toward knowing the dynamics of your stack.
You will make bugs. Crossed invariants are completely preventable though.
> If you care about security, verify your goddamn invariants.
While it would be nice to be able to do this, sadly we don't have infinite resources, unless we're okay with shipping software in 5-10 years instead of 1-2. I know that I would be okay with such a world, but the people who pay my salary might not share that point of view. Nor do the people who have to choose an app to use in the near future, instead of waiting a decade to do so.
> This is not a software problem. This is a lazy programmer/software engineer problem. Electrical Engineering, or hell, any mature engineering field understands this concept.
The thing is, that the majority of the development out there is like the Wild West. If my code throws a NullPointerException or a NullReferenceException, then someone is going to be mildly annoyed and it might result in a Jira issue to fix. Code failing in a variety of ways is almost considered normal in some respects, outside of specific (expensive) contexts, where correctness matters a lot.
Admittedly, even in programming there are fields where the stakes are higher, though writing code for planes (as an example) is wildly different than what 90% of people out there would call "programming". Personally, I'd like 100% test coverage (lines, code branches, everything), but outside of these high stakes environments it would be wasteful to do so.
> If you have not read your entire codepath, you have no idea what it is you are doing.
For many out there, this is pretty much impossible to do in a meaningful way. Let's use something like the Spring framework, a popular option in Java for web dev, a stack that has a rather high level of abstraction. In it, the actual code path that you're dealing with would involve your application code, the framework code (which is likely many times longer than your actual application, uses reflection and other complex mechanisms, overall being truly Eldritch at times), any integrated libraries, as well as the JVM and some other code on your actual system, that interfaces with the JVM.
Even if you toss out Java from the stack, the actual hot code path in any non-trivial piece of software will be pretty difficult to reason about, due to different types of linking, different external package versions etc. Unless you feel okay with very, very slowly stepping through everything with a debugger, which probably still won't give you too good of an idea of what's actually happening and what should have happened.
Though maybe traversing 20 layers of abstraction in Spring and coming out of that debugging session more confused than when you entered it is just a Java/Spring thing, who knows.
> Welcome to why my life as a QA is effing miserable. Every bit of ignorance by devs following the philosophy of "abstraction is good" is dealt with at the level of Software BoM audit.
All hail being able to pay rent by delivering sub-optimal software to meet ever changing business demands in an environment where nobody wants to pay for perfect software. That's simply the world we live in, take it or leave it (e.g. pursue whichever environment feels better to you, within the bounds of your opportunities in life).
And thus we come back to the age-old quandary. The implicit act of economic violence tucked away in our current society.
I have capital, you don't, do what I want, or starve.
It always comes back to violence.
This is why we waste so much time reinventing things and white labelling, and subjecting other professions to the most ungodly tooling. We mass-produce suffering. We engineer it into the product in the form of a lack of care, under the guise of "we're innovating guyz".
Ehh, that's a somewhat grim view, though I doubt I can offer many valuable points about the wider nature of capitalism as a whole.
That said, what I can say is that there definitely is a wide spectrum of different circumstances that people are dealing with and therefore the level of care that certain things will get will also vary.
For example, would it be cool to spend 2 decades working on the perfect GUI framework that'd be tested, dependable, performant and would also have exceedingly good usability? Sure. Is that going to happen in our current world? Perhaps not.
But hey, starting out with a bit of pushback and selling the concept of TDD or quality gates is a start as well, or even having proper tests for all of the important business logic, whilst willingly ignoring (putting off) the things that are just infeasible.
While I'm not defending the screw-up here - it's bad - it does it a slight injustice to omit that the issue was not something simplistic around ASCII/UTF-8 parsing, but rather a failure to reject/escape malformed UTF-8 strings. Unicode handling, even in actual programming language implementations, is an extremely common and well-documented problem.
I think it's worth remembering that XML parsing is also a big historic source of bugs which suggests to me that while it may look simple and well formed on the surface it's probably a lot harder than it looks.
Could you give examples? There were plenty of problems with certain standards layered atop XML, or with self-made implementations of XML parsers and unparsers [1], but there is also a well-tested set of standards-compliant XML libraries that avoid those issues.
[1]: An internationally known consulting firm, that I won't name, had (perhaps has) an internal tool that compiles an Excel description of a service interface into actual XML parsing code that accepts only one hard-coded namespace alias for each given namespace. Over the years I've come across multiple companies with that bug in some service. Every time I looked into it, the reason was the same internal tool of that consulting firm. And I've met, multiple times, people who had already discovered that same thing.
I have the same question as the sibling commenter: are you sure you mean parsing (i.e. well-formedness) and not handling (i.e. logic to do things with the parsed data: e.g. XXE, namespace separation, etc.)?
Obviously all software has some bugs and I'm sure XML parsers are no exception but I haven't been personally aware of any high profile ones before this.
For a quick example of a lowish-level XML bug that isn't parsing-related, I reported a bug many years ago in a piece of software whereby attributes without curie prefixes were being placed into the wrong namespace. A weird quirk of the XML spec is that unprefixed tags go into the default namespace but unprefixed attributes go into a "NULL" namespace (or, if I recall correctly, sometimes a specific namespace depending on the tag?). That's not a parser bug though since the parser has parsed the tag, attributes and associated prefix strings (or lack thereof) correctly: it just does something wrong post-parsing.
I feel like that class of bug is very common with XML, but it's more of an application stability concern than a security one (XXE being a notable exception just because it deals with IO)
IMO the best response to this kind of analysis is to humbly realize that any of us, working under real-world pressures, could make such a screw-up, and contemplate how we'll remain vigilant and mitigate the damage that comes from our inevitable screw-ups.
Assuming properly-created data, yes. You aren't immune to problems but you will reduce them, especially in a memory-safe language.
Unfortunately, in a security context, that is not only not guaranteed, but will be actively attacked, so in practice I'm not sure it buys you that much from a security perspective. A net positive, I think, but certainly not enough that you can metaphorically kick back and enjoy your lemonade.
The binary length field is one of the oldest security vulnerabilities: simply claim a length larger than the buffer allocated in the C program. Though I'm inclined to credit that particular joy to C and not the data itself. Nowadays there aren't many languages where simply claiming to be really long will get you anywhere like that.
More generally, if you want to include a block of untrustworthy structured data in a protocol, it’s very much preferable to do so in a way that does not require inspecting the data in question to figure out where it ends and thus where the outer protocol resumes.
English is not immune. Think about “who’s on first” — there is no way to distinguish the untrustworthy name “who” from a grammatical part of the conversation.
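In code, the length-prefixed approach looks roughly like the sketch below (Rust, not from any real protocol); the one thing the reader still has to do is sanity-check the claimed length against the bytes actually available, which is exactly the classic failure mode mentioned above.

```rust
// Read one length-prefixed field without trusting the claimed length.
// Returns (field, remaining input), or None if the input is malformed.
fn read_field(buf: &[u8]) -> Option<(&[u8], &[u8])> {
    if buf.len() < 4 {
        return None; // not even a complete 4-byte big-endian length prefix
    }
    let (len_bytes, rest) = buf.split_at(4);
    let len = u32::from_be_bytes(len_bytes.try_into().ok()?) as usize;
    if len > rest.len() {
        return None; // claimed length exceeds the data we actually have
    }
    Some(rest.split_at(len))
}
```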
Sure, if you like ingesting 4 GB records. There is nothing inherently safer in binary formats. It's easy to write parsers that can handle properly formatted files; it's when you're dealing with corrupt or malformed files that everything gets complicated.
> There is nothing inherently safer in binary formats.
Sure there is. Barring a pathologically bad wire format design, they’re easier to parse than an equivalent human editable encoding.
Eliminating the human-editing ability requirement also enables us to:
- Avoid introducing character encoding — a huge problem space just on its own — into the list of things that all parsers must get right.
- Define non-malleable encodings; in other words, ensure that there exists only one valid encoding for any valid message, eliminating parser bugs that emerge around handling (or not) multiple different ways to encode the same thing.
> Define non-malleable encodings; in other words, ensure that there exists only one valid encoding for any valid message, eliminating parser bugs that emerge around handling (or not) multiple different ways to encode the same thing.
I've said similar things to this before. E.g. if you want a boolean, there's nothing simpler and less error-prone than a single bit. It represents exactly the values you need; nothing more and nothing less. You could take a byte if you didn't want to pack, and use the "0 is false, nonzero is true" convention, which is naturally usable in a lot of programming languages; that way there are 256 different values, but the set of inputs is still small and finite with each one having a defined interpretation.
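A tiny Rust sketch of the two conventions (illustrative only): the strict decoder gives the non-malleable property from the parent comment, since each boolean value has exactly one wire representation, while the permissive "0 is false, nonzero is true" reading gives every byte a defined meaning but 255 encodings of true.

```rust
// Strict: only 0x00 and 0x01 are accepted, so each boolean value has exactly
// one on-the-wire encoding (the non-malleable option).
fn decode_bool_strict(byte: u8) -> Result<bool, &'static str> {
    match byte {
        0x00 => Ok(false),
        0x01 => Ok(true),
        _ => Err("non-canonical boolean encoding"),
    }
}

// Permissive: every input has a defined meaning, but 255 different byte
// values all decode to `true`.
fn decode_bool_permissive(byte: u8) -> bool {
    byte != 0
}
```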
Which would be a lot easier to catch by bounds checks in the language / data types used / sanitizers / fuzzers / static analysis than cases like this where you can have two implementations seemingly successfully parse the data but disagree on the result.
Programmers respond to their incentives. Like most security bugs, this one happened because someone was dumb enough to use C for something connected to the internet. But the reason programmers do that is because of a culture that rewards fast and insecure more than slightly less fast and correct.
It appears that Gloox, a relatively low-level XMPP client C library, rolled much of its Unicode and XML parsing itself, which made such vulnerabilities more likely. There may be good reasons not to re-use existing modules and rely on external libraries, especially if you target constrained low-end embedded devices, but you should always be aware of the drawbacks. And the Zoom client typically does not run on those.
One of the harder things with XMPP is that it is a badly-formed document up until the connection is closed. You need a SAX-style/event-based parser to handle it. That makes rolling your own understandable in some cases (e.g. dotnet's System.Xml couldn't do this prior to XLinq).
That being said, as you indicated Gloox is C-based, and the reference implementation of SAX is in C. There is no excuse.
Not only that, but before the TLS session starts you have to handle an invalid XML document (the starttls mechanism starts encrypting stuff right in the middle of the initial XML document).
Also, some XML constructs are not valid in XMPP (like comments).
I think rolling your own XML parser for XMPP is a fairly reasonable thing to do. In the past at least, many, if not most, implementations had their own parser (often a fork of a proper XML parser). What is more surprising to me is why they would choose XMPP for their proprietary stuff. I don't think they want to interoperate or federate with anything?
(if I remember correctly and if it hasn't changed compared to many years ago, when I looked at that stuff.)
> One of the harder things with XMPP is that it is a badly-formed document up until the connection is closed. You need a SAX-style/event-based parser to handle it.
That is a common misconception, although I am not sure of its origin. I know plenty of XMPP implementations that use an XML pull parser.
Smack uses an XML pull parser and non-blocking I/O. It does so by splitting the XMPP stream into top-level elements first and only feeding complete elements to the pull parser.
I find that response a bit strange, since the whole reason the Zoom client has these particular vulnerabilities is because they didn’t roll their own, and instead rely on layers of broken libraries.
It’s quite possible they’d have more bugs without doing that, but re-using existing modules could just as easily have been an even worse idea.
Using what everyone and their dog is using is just as prone to bugs, because software without bugs either doesn't exist or isn't very useful, but it also has the benefit of many versatile eyeballs looking at it in many different contexts.
So if there's a bug found and fixed in libxml2 which is used by almost everything else, everyone else instantly benefits. Same with libicu which is being used, for example, by NodeJS with its huge deployments footprint. Oh, and every freakin' Webkit-based browser out there.
OTOH, they rolled their own, so all the bugs they hit are confined to Zoom alone, and are guaranteed to get Zoom all the bad press.
If they roll their own it also becomes less interesting to actively exploit.
Obviously this doesn’t really work for Zoom any more, since their footprint is too large, but it can stop driveby attackers in other situations. Nobody is going to expend too much effort figuring out joe schmuck’s homegrown solution, where they’d happily run a known exploit against the unpatched wordpress server.
I think the point is that Unicode and XML parsing are known to be security critical components and you should take care that they are handled only by well tested code designed specifically for the purpose. You need to not roll your own and also ensure that any third party components didn’t roll their own.
I get your confusion. But keep in mind that it is not only about picking the library that shows up as the first result of your Google search. My naive self thinks that a million-dollar company should do some research and evaluate different options when choosing an external codebase to build their flagship product on. There are dozens of XMPP libraries, and they picked the one that does not seem to delegate XML and Unicode handling to other libraries, which should raise a flag.
I think that's a false dichotomy; IMO the best default choice is to rely on the most well-tested library in any given category. That suggests to me that they should have used expat on the client side.
IMO we should use external libraries, and should invest engineering time in the library rather than just take it as-is. Not using a good third-party library means you need to invest at least a few engineer-months to get the same result, and you will need to invest a lot more to do better than the third-party library. Instead, you can take the library and invest a few engineer-months in improving the open-source library.
Why? If anything, the client does the more reasonable interpretation of the XML-in-malformed-UTF-8: skipping to the next valid UTF-8 sequence start. It's the server that has really weird UTF-8 handling, where it somehow special-cases multi-byte UTF-8 sequences but then does not handle invalid ones.
This is a very common issue across all of software engineering I've found. But I really don't get why. If I was given the task of parsing Unicode or XML, I'd run and find a library as fast as possible, because that sounds terrible and tedious, and I'd rather do literally anything else!
This is another lesson that you should always parse+serialize rather than just validate. It is much harder to smuggle data that exploits parser differences this way.
Basically the set of all messages that will satisfy your validator is far larger than the set of all messages that will be produced by your serializer.
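A rough sketch of the idea in Rust, where `parse_stanza` and `serialize_stanza` are hypothetical stand-ins for whatever XML library is in use: the bytes forwarded downstream come from our own serializer, never from the sender, so input that only one parser would accept cannot be smuggled through unchanged.

```rust
// Hypothetical parse-then-reserialize relay; the types and the two stanza
// functions are placeholders, not a real library's API.
struct Stanza;

fn parse_stanza(_s: &str) -> Result<Stanza, &'static str> { Ok(Stanza) }
fn serialize_stanza(_s: &Stanza) -> String { String::new() }

fn relay(untrusted: &[u8]) -> Result<Vec<u8>, &'static str> {
    // Reject anything that is not valid UTF-8 before XML parsing even starts.
    let text = std::str::from_utf8(untrusted).map_err(|_| "invalid UTF-8")?;

    // Parse into a structured representation...
    let stanza = parse_stanza(text)?;

    // ...and forward only what we re-serialize from that representation.
    Ok(serialize_stanza(&stanza).into_bytes())
}
```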
Or, it's another lesson that you should not completely trust any code but compartmentalize instead. Thanks to Qubes OS, I am still safe, since Zoom is running in a hardware-virtualized VM.
How is that helpful? This exploit completely replaces the Zoom software with arbitrary attacker software and it executes in your VM that has access to camera, microphone, network, and presumably screen recording. It sounds to me like the highest possible level of access and your VM is just performative.
The real lesson is not to use Zoom. Anyone who does deserves everything they get. There have been so, so many red flags that using Zoom will leak your data to 3rd parties (often in China) and compromise your security that people using it now must simply not care if it happens. So no surprise, it's happened yet again, and you can bet it will again and again in the future.
There are other options besides Zoom. They are different from Zoom, each with their own strengths and weaknesses, but they don't have example after example showing total incompetence and/or malicious intent the way Zoom does.
I am not sure this applies in this case. I don't know how Zoom's XMPP backend works, but it could very well parse and serialize and still be vulnerable. If the XML library accepts invalid 3-byte UTF-8 sequences on parse, then its internal representation supports these characters, and I don't see why they would not be serialized just as well.
Having multiple, potentially different parsers is incredibly dangerous. One person exploited the fact that different plist parsers in the macOS kernel choked in different ways when interpreting malformed XML, leading some parsers to believe the plist was "safe" because it did not grant certain permissions, while others trusted this "safe" plist but believed it did grant those permissions.
I didn’t even consider the existence of XMPP vulns until I listened to the Darknet Diaries episode about Kik[0]. It’s a really interesting class of vulnerabilities.
How do you do that? On any OS I tried (Debian, Windows) it always *forces* me to download the standalone client, otherwise I can't join. There's no alternative link ("Join via web") like MS Teams has for example.
I really feel uncomfortable each time I have to install the client on a machine for my relatives :/
I've always been able to use the in-browser client, but you have to download the client once or twice before the page will update to show the alternative "use browser". It's definitely an intentional dark pattern.
I actually started boycotting Zoom meetings where I can. If anyone sends me a Zoom invitation and I know they are not forced into it by needing to be available to larger audiences, I suggest they use basically anything else.
I don't know why, but from the first time I visited their website until today, I have the feeling I can't trust the company.
After you click "download Zoom client", the button will turn into "use Web app". You don't even need to download anything if you cancel the system dialog asking you where to save the download. However, I still find this UX pattern incredibly deceptive. People and companies seriously need to stop using Zoom.
At some point we are going to need enforceable professional standards that effectively deal with commercial software publishers who choose to parse untrusted inputs in non-performance-sensitive contexts with C libraries.
Since most software users are not tech-savvy and care about convenience and price significantly more than they care about security (revealed preference), the "worse is better" phenomenon incentivizes commercial developers to implement the minimum security practices that their customers will bear. This is individually rational for the developers and the users, but the result is untold billions of dollars of costs. Regulation would be one way to change the incentives.