Zoom: Remote Code Execution with XMPP Stanza Smuggling (chromium.org)
231 points by Flowdalic on May 24, 2022 | 90 comments



The XML parsing/validation bugs are, I suppose, not shocking, but deeply disappointing.

The one thing XML & its tooling were supposed to get right was document well-formedness. Sure, it might be a mess of a standard in other ways, but at least we could agree what a parser should and shouldn’t accept! (Not the case for the HTML tag soup of then or now.)

That, 25 years on, a popular XML processor can’t even meet that low bar for tag names is maddening.


There are just so many issues here.

1) Don't rely on two parsers having identical behaviour for security. Yes, parsers for the same format should behave the same, but bugs happen, so don't design a system where small differences result in such a catastrophic bug. If you absolutely have to do this, at least use the same parser on both ends.

2) Don't allow layering violations. All content of XML documents is required to be valid in the configured character encoding. That means layer 1 of your decoder should be converting a byte stream into a character stream, and layers 2+ should not even have the opportunity to mess up decoding a character. Efficiency is not a justification, because you can use compile-time techniques to generate the exact same code as if you combined all layers into one. This has the added benefit that it removes edge cases (if there is one place where bytes are decoded into characters, then you can't get a bug where that decoding is only broken in tag names, and so your test coverage is automatically better). There's a sketch of this after the list.

3) Don't transparently download and install stuff without user interaction, regardless of where it comes from!

4) Revoke certificates for old compromised versions of an installer so that downgrade attacks are not possible.
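
To make point 2 concrete, here's a minimal sketch (my own illustration, not code from any of the libraries involved) of a single byte-to-code-point choke point; every layer above it consumes only validated characters:

    #include <cstdint>
    #include <optional>
    #include <string_view>
    #include <utility>

    // Decode one UTF-8 code point, or fail. If this is the only place in
    // the program where bytes become characters, a tag-name scanner can
    // never disagree with an attribute-value scanner about the encoding.
    std::optional<std::pair<char32_t, size_t>> decode_one(std::string_view in) {
        if (in.empty()) return std::nullopt;
        unsigned char b0 = in[0];
        if (b0 < 0x80) return std::pair<char32_t, size_t>{b0, 1};  // ASCII fast path
        if (b0 < 0xC0 || b0 >= 0xF8) return std::nullopt;  // stray continuation / illegal lead
        size_t len = (b0 >= 0xF0) ? 4 : (b0 >= 0xE0) ? 3 : 2;
        if (in.size() < len) return std::nullopt;          // truncated sequence
        char32_t cp = b0 & (0x7F >> len);
        for (size_t i = 1; i < len; ++i) {
            unsigned char b = in[i];
            if ((b & 0xC0) != 0x80) return std::nullopt;   // bad continuation byte
            cp = (cp << 6) | (b & 0x3F);
        }
        static const char32_t min_cp[] = {0, 0, 0x80, 0x800, 0x10000};
        if (cp < min_cp[len] || (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
            return std::nullopt;      // overlong / surrogate / out of range
        return std::pair<char32_t, size_t>{cp, len};
    }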


> Revoke certificates for old compromised versions of an installer so that downgrade attacks are not possible.

Worth noting that Windows accepts signatures from revoked code signing certificates so long as the signature has a signed timestamp from before the revocation.


….and I assume the revocation can’t be back-dated?


Timestamps must come from a globally recognized signing source, like DigiCert or VeriSign.


The CA could backdate the CRL’s revocation timestamp if they wanted, but it seems unlikely and presumably it’s not allowed.


> 4) Revoke certificates for old compromised versions of an installer so that downgrade attacks are not possible.

I suggest the following alternative: When your own software is triggering the upgrade process, don't allow triggering an upgrade to an older version of the software.

In other words: If a user wants to downgrade, they will have to do the work of running the installer for the older version (and possibly uninstalling the newer version first).

This modified behavior addresses the problem mentioned in the article (a newer version of software running the installer for an older version), but still gives users the power to install an older version if they want.
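
A minimal sketch of that policy (compare_versions and may_auto_install are made-up names; note that the version compared must come from the signed installer itself, not from whatever the update endpoint claims):

    #include <sstream>
    #include <string>

    // Compare dotted version strings ("5.10.4" vs "5.9.3"); <0, 0, >0 like strcmp.
    int compare_versions(const std::string& a, const std::string& b) {
        std::istringstream sa(a), sb(b);
        std::string ta, tb;
        while (true) {
            bool ga = static_cast<bool>(std::getline(sa, ta, '.'));
            bool gb = static_cast<bool>(std::getline(sb, tb, '.'));
            if (!ga && !gb) return 0;
            long va = ga ? std::stol(ta) : 0;
            long vb = gb ? std::stol(tb) : 0;
            if (va != vb) return va < vb ? -1 : 1;
        }
    }

    // Auto-update path only: refuse anything not strictly newer. A user who
    // really wants to downgrade runs the old installer by hand instead.
    bool may_auto_install(const std::string& current, const std::string& offered) {
        return compare_versions(offered, current) > 0;
    }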


Not entirely clear to me that would be a sufficient mitigation in this case: the endpoint could claim it is serving Zoom version 999 but serve the old exe and cab, which would then be run, possibly before other checks can even be done.


> 3) Don't transparently download and install stuff without user interaction, regardless of where it comes from!

This is an interesting one. I totally get your point. But also users are terrible about updating their software if you give them the choice. Automatic updates have very practical security benefits. I've witnessed non-technical folks hit that "remind me later" button for years.


> I've witnessed non-technical folks hit that "remind me later" button for years.

Doesn't that then become their problem and responsibility?


> I've witnessed non-technical folks hit that "remind me later" button for years.

Maybe take the hint and add a "no" button instead of this manipulative "remind me later" shit.


I doubt anyone actively revokes certificates ever - except maybe the game console makers.




Unfortunately, the problem here is programmers more so than formats. It literally doesn't matter what you specify; programmers will not implement it to a T. Most programmers simply don't know that every single detail matters. Many of those who may have some idea don't really care, since they can't imagine how something like this could happen.

It's not just XML. It's every ecosystem I've ever used. Push it around the edges and you will find things.

This is neat, not because it is special to JSON in particular, but because it's an example of examining a good chunk of a large ecosystem: https://seriot.ch/projects/parsing_json.html Consider that this is likely to be true in any ecosystem that doesn't make avoiding it a top priority.


I disagree. The way the format is designed has a direct effect on how likely implementors are to implement it correctly. So the format designers bear some responsibility.

For example how many Protobuf parser libraries have security bugs? I'm guessing very few because the standard is nice and simple, and it's very clearly defined without much "it's probably like this" wiggle room (much easier for binary formats!).

XML had a ton of unnecessary complexity that could have been avoided to make implementations simpler. I haven't actually read this bug so let's see if it was one of:

* Closing tags having to repeat the name / two different ways of closing tags.

* CDATA

* Namespaces (especially how they are defined)

* &entities;

Edit: Ha, it wasn't any of those - but it was still an issue with text-based formats. Seems like Expat assumes the content is valid UTF-8 (and doesn't validate it), while Gloox assumes it is ASCII. Obviously this couldn't have happened with binary formats.
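
A toy illustration of the class of bug (my own sketch, not Zoom's actual code): one scanner inspects every byte, the other trusts a UTF-8 lead byte's declared length and skips "continuation" bytes unvalidated, so a crafted lead byte can swallow a delimiter.

    #include <cstdio>
    #include <cstring>

    // ASCII-style scan: every byte is inspected, so the '>' is always found.
    const char* find_gt_bytewise(const char* p, const char* end) {
        for (; p < end; ++p)
            if (*p == '>') return p;
        return end;
    }

    // UTF-8-style scan that trusts the lead byte and skips what it claims
    // are continuation bytes, without validating them.
    const char* find_gt_skipping(const char* p, const char* end) {
        while (p < end) {
            unsigned char b = *p;
            if (b == '>') return p;
            p += (b >= 0xF0) ? 4 : (b >= 0xE0) ? 3 : (b >= 0xC0) ? 2 : 1;
        }
        return end;
    }

    int main() {
        // 0xF0 claims "three continuation bytes follow", but the next byte
        // is really the '>' that ends the tag.
        const char msg[] = "<a attr=\"\xF0>\" ></a>";
        const char* end = msg + strlen(msg);
        printf("bytewise scanner: '>' at offset %td\n", find_gt_bytewise(msg, end) - msg);
        printf("skipping scanner: '>' at offset %td\n", find_gt_skipping(msg, end) - msg);
        // The two scanners disagree about where the tag ends: exactly the
        // primitive needed to smuggle a stanza past one of them.
    }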

If you care about security DON'T USE TEXT FORMATS!


XML is a bad text-based format. It doesn't know if it wants to be human-readable or computer-readable, so it does both poorly (if you think this vuln is bad, check out some of the SAML vulns).

I wouldn't blame XML's silliness on text-based formats in general, even if they are full of risks.


Wrong.

If you care about security, verify your goddamn invariants.

This is not a software problem. This is a lazy programmer/software engineer problem. Electrical Engineering, or hell, any mature engineering field understands this concept.

If you have not read your entire codepath, you have no idea what it is you are doing.

Welcome to why my life as a QA is effing miserable. Every bit of ignorance by devs following the philosophy of "abstraction is good" is dealt with at the level of Software BoM audit.

All hail Time to Market!


> This is not a software problem. This is a lazy programmer/software engineer problem.

The old "good programmers don't write bugs" fallacy. How do so many people still think like this in 2022??


There is a difference between not writing bugs, and checking your invariants.

If you have not read the implementation code you depend on, you by definition have not had the signal that would raise that invariant violation into your consciousness.

It would be like a civil engineer building a bridge out of limestone at a thickness that would be insufficient even for steel, and just saying "to hell with it, go figure it out".

The write-only programmer is a threat to themselves and everyone around them. And to be frank, even more dangerous are those members of management who have their expectations around implementation time so skewed by this cavalier attitude toward knowing the dynamics of your stack.

You will make bugs. Crossed invariants are completely preventable though.


> If you care about security, verify your goddamn invariants.

While it would be nice to be able to do this, sadly we don't have infinite resources, unless we're okay with shipping software in 5-10 years instead of 1-2. I know that I would be okay with such a world, but the people who pay my salary might not share that point of view. Nor would the people who have to choose an app to use in the near future, instead of waiting a decade to do so.

> This is not a software problem. This is a lazy programmer/software engineer problem. Electrical Engineering, or hell, any mature engineering field understands this concept.

The thing is that the majority of development out there is like the Wild West. If my code throws a NullPointerException or a NullReferenceException, then someone is going to be mildly annoyed and it might result in a Jira issue to fix. Code failing in a variety of ways is almost considered normal, outside of specific (expensive) contexts where correctness matters a lot.

Admittedly, even in programming there are fields where the stakes are higher, though writing code for planes (as an example) is wildly different than what 90% of people out there would call "programming". Personally, I'd like 100% test coverage (lines, code branches, everything), but outside of these high stakes environments it would be wasteful to do so.

> If you have not read your entire codepath, you have no idea what it is you are doing.

For many out there, this is pretty much impossible to do in a meaningful way. Take something like the Spring framework, a popular option in Java for web dev, a stack with a rather high level of abstraction. There, the actual code path you're dealing with involves your application code, the framework code (which is likely many times longer than your actual application, uses reflection and other complex mechanisms, and is overall truly eldritch at times), any integrated libraries, the JVM itself, and whatever code on your system the JVM interfaces with.

Even if you toss out Java from the stack, the actual hot code path in any non-trivial piece of software will be pretty difficult to reason about, due to different types of linking, different external package versions etc. Unless you feel okay with very, very slowly stepping through everything with a debugger, which probably still won't give you too good of an idea of what's actually happening and what should have happened.

Though maybe traversing 20 layers of abstraction in Spring and coming out of that debugging session more confused than when you entered it is just a Java/Spring thing, who knows.

> Welcome to why my life as a QA is effing miserable. Every bit of ignorance by devs following the philosophy of "abstraction is good" is dealt with at the level of Software BoM audit.

I think there's plenty of misery to be had all around. For a humorous take at the state of things, have a look at this article: https://www.stilldrinking.org/programming-sucks

> All hail Time to Market!

All hail being able to pay rent by delivering sub-optimal software to meet ever changing business demands in an environment where nobody wants to pay for perfect software. That's simply the world we live in, take it or leave it (e.g. pursue whichever environment feels better to you, within the bounds of your opportunities in life).


And thus we come back to the age-old quandary: the implicit act of economic violence tucked away in our current society.

I have capital, you don't, do what I want, or starve.

It always comes back to violence.

This is why we waste so much time reinventing things and white-labelling, and subjecting other professions to the most ungodly tooling. We mass-produce suffering. We engineer it into the product in the form of lack of care, under the guise of "we're innovating guyz".


Ehh, that's a somewhat grim view, though I doubt I can offer many valuable points about the wider nature of capitalism as a whole.

That said, what I can say is that there definitely is a wide spectrum of different circumstances that people are dealing with and therefore the level of care that certain things will get will also vary.

For example, would it be cool to spend 2 decades working on the perfect GUI framework that'd be tested, dependable, performant and would also have exceedingly good usability? Sure. Is that going to happen in our current world? Perhaps not.

But hey, starting out with a bit of pushback and selling the concept of TDD or quality gates is a start as well, or even having proper tests for all of the important business logic, whilst willingly ignoring (putting off) the things that are just infeasible.


This is just so basic a screwup though. The W3C spec for XML has had a formal syntactic description of valid tag names for decades:

https://www.w3.org/TR/2006/REC-xml11-20060816/#sec-common-sy...

Plenty of libraries get this right because it’s so easy. You’d almost have to try—probably by being “clever”—to get it wrong.


While I'm not defending the screw-up here - it's bad - it does it a slight injustice to omit that the issue was not something simplistic around ASCII/UTF-8 parsing, but rather a failure to reject or escape malformed UTF-8 strings. Unicode handling, even in actual programming language implementations, is an extremely common and well-documented problem.


I think it's worth remembering that XML parsing is also a big historic source of bugs, which suggests to me that while XML may look simple and well-defined on the surface, it's probably a lot harder to parse than it looks.


Could you give examples? There were plenty of problems with certain standards layered atop XML or with self-made implementations of XML parsers and unparsers [1], but there is also a well-tested set of standards-compliant XML libraries that avoid those issues.

[1]: An internationally known consulting firm, which I won't name, had (perhaps has) an internal tool that compiles an Excel description of a service interface into actual XML parsing code that accepts only one hard-coded namespace alias for each given namespace. Over the years I've come across multiple companies with that bug in some service. Every time I looked into it, the reason was the same internal tool of that consulting firm. And I've repeatedly met people who had already discovered the same thing.


I have the same question as the sibling commenter: are you sure you mean parsing (i.e. well-formedness) and not handling (i.e. logic to do things with the parsed data: e.g. XXE, namespace separation, etc.)?

Obviously all software has some bugs and I'm sure XML parsers are no exception but I haven't been personally aware of any high profile ones before this.

For a quick example of a lowish-level XML bug that isn't parsing-related: I reported a bug many years ago in a piece of software whereby attributes without CURIE prefixes were being placed into the wrong namespace. A weird quirk of the XML spec is that unprefixed tags go into the default namespace but unprefixed attributes go into a "null" namespace (or, if I recall correctly, sometimes a specific namespace depending on the tag?). That's not a parser bug though, since the parser has parsed the tag, attributes and associated prefix strings (or lack thereof) correctly: it just does something wrong post-parsing.

I feel like that class of bug is very common with XML, but it's more of an application stability concern than a security one (XXE being a notable exception just because it deals with IO)


IMO the best response to this kind of analysis is to humbly realize that any of us, working under real-world pressures, could make such a screw-up, and contemplate how we'll remain vigilant and mitigate the damage that comes from our inevitable screw-ups.


I suppose it's safest to use a binary format where variable-length fields are prefixed with their length.


Assuming properly-created data, yes. You aren't immune to problems but you will reduce them, especially in a memory-safe language.

Unfortunately, in a security context, that is not only not guaranteed, but will be actively attacked, so in practice I'm not sure it buys you that much from a security perspective. A net positive, I think, but certainly not enough that you can metaphorically kick back and enjoy your lemonade.

Length-prefixed binary formats are behind one of the oldest security vulnerabilities: simply claim a length larger than the buffer allocated in the C program. Though I'm inclined to credit that particular joy to C and not the data format itself. Nowadays there aren't many languages where a field simply claiming to be really long will get you anywhere like that.


More generally, if you want to include a block of untrustworthy structured data in a protocol, it’s very much preferable to do so in a way that does not require inspecting the data in question to figure out where it ends and thus where the outer protocol resumes.

English is not immune. Think about “who’s on first” — there is no way to distinguish the untrustworthy name “who” from a grammatical part of the conversation.


Sure, if you like ingesting 4 GB records. There is nothing inherently safer in binary formats. It's easy to write parsers that can handle properly formatted files; it's when you're dealing with corrupt or malformed files that everything gets complicated.


> There is nothing inherently safer in binary formats.

Sure there is. Barring a pathologically bad wire format design, they’re easier to parse than an equivalent human editable encoding.

Eliminating the requirement for human editability also enables us to:

- Avoid introducing character encoding — a huge problem space just on its own — into the list of things that all parsers must get right.

- Define non-malleable encodings; in other words, ensure that there exists only one valid encoding for any valid message, eliminating parser bugs that emerge around handling (or not) multiple different ways to encode the same thing.
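
That second property can even be enforced mechanically. A sketch, assuming you control both sides of the protocol (Msg, encode and decode are hypothetical placeholders): decode, re-encode, and require byte equality, so only the single canonical encoding of any message is ever accepted.

    #include <optional>
    #include <string>

    struct Msg { /* fields elided */ };
    std::string encode(const Msg&);                       // canonical encoder
    std::optional<Msg> decode(const std::string& bytes);  // tolerant decoder

    // Accept only bytes that are exactly the canonical encoding of what
    // they decode to; every malleable alternative encoding is rejected,
    // so no two parsers downstream can be made to disagree on content.
    std::optional<Msg> decode_canonical(const std::string& bytes) {
        auto msg = decode(bytes);
        if (!msg || encode(*msg) != bytes) return std::nullopt;
        return msg;
    }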


> Define non-malleable encodings; in other words, ensure that there exists only one valid encoding for any valid message, eliminating parser bugs that emerge around handling (or not) multiple different ways to encode the same thing.

I've said similar things to this before. E.g. if you want a boolean, there's nothing simpler and less error-prone than a single bit. It represents exactly the values you need; nothing more and nothing less. You could take a byte if you didn't want to pack, and use the "0 is false, nonzero is true" convention, which is naturally usable in a lot of programming languages; that way there are 256 different values, but the set of inputs is still small and finite with each one having a defined interpretation.


Sure, until someone sets the prefix to 100MB large, and sends zero bytes of data :)


Which would be a lot easier to catch by bounds checks in the language / data types used / sanitizers / fuzzers / static analysis than cases like this where you can have two implementations seemingly successfully parse the data but disagree on the result.
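
Concretely, something like this (my sketch): the claimed length is checked against both a sanity cap and the bytes actually received before anything is sliced or allocated.

    #include <cstdint>
    #include <cstring>
    #include <optional>
    #include <string_view>

    // Read one length-prefixed field, never trusting the prefix. Fails if
    // the claimed length exceeds the cap or the bytes actually present
    // (e.g. a 100 MB prefix followed by nothing).
    std::optional<std::string_view> read_field(std::string_view& in,
                                               uint32_t max_field = 1u << 20) {
        if (in.size() < 4) return std::nullopt;  // no room for the prefix
        uint32_t len;
        std::memcpy(&len, in.data(), 4);         // byte order elided for brevity
        if (len > max_field || in.size() - 4 < len) return std::nullopt;
        std::string_view field = in.substr(4, len);
        in.remove_prefix(4 + len);               // consume prefix + payload
        return field;
    }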


Programmers respond to their incentives. Like most security bugs, this one happened because someone was dumb enough to use C for something connected to the internet. But the reason programmers do that is because of a culture that rewards fast and insecure more than slightly less fast and correct.


It appears that Gloox, a relatively low-level C++ XMPP client library, rolled much of its Unicode and XML parsing itself, which made such vulnerabilities more likely. There may be good reasons not to re-use existing modules or rely on external libraries, especially if you target constrained low-end embedded devices, but you should always be aware of the drawbacks. And the Zoom client typically does not run on those.


One of the harder things with XMPP is that the stream is not a well-formed document until the connection is closed. You need a SAX-style/event-based parser to handle it. That makes rolling your own understandable in some cases (e.g. dotnet's System.Xml couldn't do this prior to XLinq).

That being said, as you indicated, Gloox is a C++ library, and the reference implementation of SAX is in C. There is no excuse.


Not only that, but before the TLS session starts you have to handle an invalid XML document (the starttls mechanism starts encrypting stuff right in the middle of the initial XML document). Also, some XML constructs are not valid in XMPP (like comments).

I think rolling your own XML parser for XMPP is a fairly reasonable thing to do. In the past at least, many, if not most, implementations had their own parser (often a fork of a proper XML parser). What is more surprising to me is why they would choose XMPP for their proprietary stuff. I don't think they want to interoperate or federate with anything?

(if I remember correctly and if it hasn't changed compared to many years ago, when I looked at that stuff.)


> One of the harder things with XMPP is that it is a badly-formed document up until the connection is closed. You need a SAX-style/event-based parser to handle it.

That is a common misconception, although I am not sure of its origin. I know plenty of XMPP implementations that use an XML pull parser.


It's possible by blocking the thread that's reading the XML, but now you're in thread-per-client territory, and that doesn't scale.


Smack uses an XML pull parser and non-blocking I/O. It does so by splitting the XMPP stream top-level elements first and only feeding complete elements to the pull parser.
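
A deliberately naive sketch of that splitting step (my illustration, not Smack's actual code; it assumes XMPP's XML profile with no comments or CDATA, ignores '>' inside attribute values, and elides the special-casing needed for the never-closed <stream:stream> header):

    #include <optional>
    #include <string_view>

    // Return the length of the first complete top-level element in the
    // buffer, or nullopt if more bytes are needed from the socket. Only
    // complete slices are handed to the (otherwise blocking) pull parser.
    std::optional<size_t> complete_element_end(std::string_view buf) {
        int depth = 0;
        for (size_t i = 0; i < buf.size(); ++i) {
            if (buf[i] != '<') continue;
            size_t close = buf.find('>', i);
            if (close == std::string_view::npos)
                return std::nullopt;                  // tag split across reads
            if (buf[i + 1] == '/') --depth;           // </foo>
            else if (buf[close - 1] != '/') ++depth;  // <foo> (not <foo/>)
            if (depth == 0) return close + 1;         // one whole stanza buffered
            i = close;
        }
        return std::nullopt;                          // element still open
    }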


https://github.com/igniterealtime/Smack/blob/master/smack-xm...

I don't see any opportunity not to block when calling "next"


DOM-based XML parsers use SAX parsing under the hood.


Right, but if they don't give you access to the SAX parser then you are SOL.


I find that response a bit strange, since the whole reason the Zoom client has these particular vulnerabilities is because they didn’t roll their own, and instead rely on layers of broken libraries.

It’s quite possible they’d have more bugs without doing that, but re-using existing modules could just as easily have been an even worse idea.


Using what everyone and their dog is using is just as prone to bugs, because software without bugs doesn't exist (or is not very useful), but it also has the benefit of many versatile eyeballs looking at it in many different contexts.

So if there's a bug found and fixed in libxml2, which is used by almost everything else, everyone else instantly benefits. Same with libicu, which is used, for example, by NodeJS with its huge deployment footprint. Oh, and every freakin' WebKit-based browser out there.

OTOH, they rolled their own, so all bugs they hit are confined only to Zoom, and are only guaranteed to get Zoom all the bad press.

Choose your poison carefully.


If they roll their own it also becomes less interesting to actively exploit.

Obviously this doesn’t really work for Zoom any more, since their footprint is too large, but it can stop driveby attackers in other situations. Nobody is going to expend too much effort figuring out joe schmuck’s homegrown solution, where they’d happily run a known exploit against the unpatched wordpress server.


Security by obscurity has been debated to hell and back. It only works if you stay obscure... and don't leak your code.


I think the point is that Unicode and XML parsing are known to be security critical components and you should take care that they are handled only by well tested code designed specifically for the purpose. You need to not roll your own and also ensure that any third party components didn’t roll their own.


> You need to not roll your own and also ensure that any third party components didn’t roll their own.

If you're not writing the code and somebody else isn't writing the code then who is writing the code?!


A well-tested Unicode library built for security should be doing your Unicode parsing in security critical components.

It’s just another way of saying you should be doing a security audit as part of selecting a library and integrating it into your product.


I get your confusion. But keep in mind that it is not only about just picking the library that shows up as the first result of your Google search. My naive self thinks that a million-dollar company should do some research and evaluate different options when choosing an external codebase to build their flagship product on. There are dozens of XMPP libraries, and they picked the one that does not seem to delegate XML and Unicode handling to other libraries, which should raise a flag.


I think that's a false dichotomy; IMO the best default choice is to rely on the most well-tested library in any given category. That suggests to me that they should have used expat on the client side.


IMO we should use external libraries, and should invest engineering time in the library rather than just taking it as-is. Not using a good third-party library means you need to invest at least a few engineer-months to get the same result, and a lot more to do better than the third-party library. Instead, you can take the library and invest a few engineer-months improving the open-source library.


Why? If anything, the client's interpretation of the XML-in-malformed-UTF-8 is the more reasonable one - skipping to the next valid UTF-8 sequence start. It's the server that has the really weird UTF-8 handling: it somehow special-cases multi-byte UTF-8 sequences but then does not handle invalid ones.


This is a very common issue across all of software engineering I've found. But I really don't get why. If I was given the task of parsing Unicode or XML, I'd run and find a library as fast as possible, because that sounds terrible and tedious, and I'd rather do literally anything else!

Why aren't people more lazy, in other words?


Some relevant info in case you don’t want to read the whole description but wonder whether you’re affected by the issue:

> Zoom fixed the server-side issues in February and client-side issues on April 24 in version 5.10.4.

> Zoom published a security bulletin about client-side fixes at https://explore.zoom.us/en/trust/security/security-bulletin

CVE-2022-25235, CVE-2022-25236 (fixed 2022-Apr-24), CVE-2022-22784, CVE-2022-22785, CVE-2022-22786, CVE-2022-22787


This is another lesson that you should always parse+serialize rather than just validate. This makes it much harder to smuggle data that exploits differences between parsers.

Basically the set of all messages that will satisfy your validator is far larger than the set of all messages that will be produced by your serializer.
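
In code, the lesson looks something like this (a sketch; parse_xml and serialize_xml are hypothetical helpers): the server forwards its own serialization of what it understood, never the client's raw bytes.

    #include <optional>
    #include <string>

    struct XmlDoc { /* parsed representation elided */ };
    std::optional<XmlDoc> parse_xml(const std::string& raw);  // strict parser
    std::string serialize_xml(const XmlDoc& doc);             // one canonical output

    // Relay a stanza: whatever bytes arrive, downstream clients only ever
    // see the serializer's output, so they cannot be made to parse
    // something this server's parser did not see.
    std::optional<std::string> relay_stanza(const std::string& raw) {
        auto doc = parse_xml(raw);
        if (!doc) return std::nullopt;   // reject instead of passing through
        return serialize_xml(*doc);
    }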


Or, it's another lesson that you should not completely trust any code but compartmentalize instead. Thanks to Qubes OS, I am still safe, since Zoom is running in a hardware-virtualized VM.


I'm safe as well, because I only use the web version of Zoom. Code you don't trust should always run in a sandbox, if it runs at all.


This is however a very different level of sandboxing.


Sure, but it's much easier for most people to run things in a browser sandbox.


How is that helpful? This exploit completely replaces the Zoom software with arbitrary attacker software and it executes in your VM that has access to camera, microphone, network, and presumably screen recording. It sounds to me like the highest possible level of access and your VM is just performative.


1. It will not have access to anything else than Zoom.

2. It will not have access to the camera or network, when I'm not using Zoom.

3. If I'm using a disposable VM, it's cleaned every reboot.

> and presumably screen recording

Screen recording of this VM.


How is screen recording only of Zoom itself of any use to you?


If needed, I can move a presentation to that VM, or open a browser in it.

It gets a bit complicated if you want to share a screen from another VM, see https://forum.qubes-os.org/t/share-screen-of-qube-with-anoth...


The real lesson is not to use Zoom. Anyone who does deserves everything they get. There have been so many red flags that using Zoom will leak your data to 3rd parties (often in China) and compromise your security that people using it now must simply not care if it happens. So no surprise, it's happened yet again, and you can bet it will again and again in the future.

There are other options besides Zoom. They are different from Zoom, each with their own strengths and weaknesses, but they don't have example after example showing total incompetence and/or malicious intent the way Zoom does.


I am not sure this applies in this case. I don't know how Zoom's XMPP backend works, but it could very well parse and serialize and still be vulnerable. If the XML library accepts invalid 3-byte UTF-8 sequences on parse, then its internal representation supports these sequences, and I don't see why they would not be serialized just as well.


XMPP servers (including Zoom's) already parse + serialize ;)


Having multiple, potentially different parsers is incredibly dangerous. One researcher exploited the fact that different plist parsers on macOS choked in different ways when interpreting malformed XML: one parser concluded a plist was "safe" because it did not grant certain permissions, while another parser, trusting that verdict, read the same plist as granting them.

https://blog.siguza.net/psychicpaper/


I didn’t even consider the existence of XMPP vulns until I listened to the Darknet Diaries episode about Kik[0]. It’s a really interesting class of vulnerabilities.

[0]: https://darknetdiaries.com/episode/93/


This vuln writeup is extremely well written. Actually quite interesting to read!


How much of Zoom is powered by XMPP? Do we know much about these internals? This would be super cool to learn about.


Good thing that I never used the standalone client and always the in-browser webapp instead.


How do you do that? On any OS I tried (Debian, Windows) it always *forces* me to download the standalone client, otherwise I can't join. There's no alternative link ("Join via web") like MS Teams has for example.

I really feel uncomfortable each time I have to install the client on a machine for my relatives :/


I've always been able to use the in-browser client, but you have to download the client once or twice before the page will update to show the alternative "use browser". It's definitely an intentional dark pattern.


Check out https://github.com/arkadiyt/zoom-redirector. You can also join meetings from https://pwa.zoom.us/wc/.


OMG, thank you so much! That's a huge relief.

I actually started boycotting Zoom meetings where I can. If anyone sends me a Zoom invitation and I know they are not forced into it by needing to reach larger audiences, I suggest they use basically anything else.

I don't know why, but from the first time I visited their website until today, I have the feeling I can't trust the company.


After you click "download Zoom client" the button will turn into a "use Web app". You don't even need to download anything if you cancel the system dialog asking you where to save the download. However I still find this UX pattern incredibly deceptive. People and companies seriously need to stop using Zoom


Unfortunately they don't allow you to both speak and present using the webapp - forcing desktop client use.


Heh, it’s like an AIM punter, but better!


Are these issues bugs in libxml, gloox, ejabberd? Or just in the Zoom client and server?


At some point we are going to need enforceable professional standards that effectively deal with commercial software publishers who choose to parse untrusted inputs in non-performance-sensitive contexts with C libraries.


This bug has nothing to do with language choice.

I agree that better professional standards and accountability should be introduced for software like zoom though.


No. We don't need more authoritarian dystopia.


We are? Why?


Since most software users are not tech-savvy and care about convenience and price significantly more than they care about security (revealed preference), the "worse is better" phenomenon incentivizes commercial developers to implement the minimum security practices that their customers will bear. This is individually rational for the developers and the users, but the result is untold billions of dollars of costs. Regulation would be one way to change the incentives.


Thanks to Ivan Fratric and Google Project Zero!




