On Python security amidst recent Rails/YAML vulnerabilities (nedbatchelder.com)
161 points by whalesalad on Feb 3, 2013 | 44 comments


I think load and dangerous_load is a wonderful paradigm for API design. It bakes security into the core use of your framework for non-security devs. It also makes the framework more productive for security professionals, since for an assessment you can grep for dangerous_load and focus your efforts on making sure those (rare) calls are hermetically sealed from user input rather than having to audit every possible use of the 100x more common load command. (That is where the Rails community is right now, and it sucks.)

This is similar to requiring a whitelist for mass assignment (rather than making a blacklist optional): it calls out in the code "These are our weak points! Check them carefully!"
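A minimal sketch of the naming idea in Python, under stated assumptions: `load` and `dangerous_load` here are hypothetical functions of an imaginary serialization module, not PyYAML's API. JSON stands in for the safe subset and pickle for the feature-rich format that can run code.

```python
import json
import pickle


def load(data: bytes):
    """Safe default: yields only plain data (dict, list, str, numbers, bool, None)."""
    return json.loads(data.decode("utf-8"))


def dangerous_load(data: bytes):
    """Feature-rich loader: unpickling can run arbitrary code.

    Never call this on input that an untrusted party could have influenced.
    """
    return pickle.loads(data)


# The obvious name is the safe one, so casual use stays safe.
config = load(b'{"name": "example", "retries": 3}')
print(config["retries"])  # 3

# The risky call site is easy to grep for during an audit.
trusted_blob = pickle.dumps({"cache": [1, 2, 3]})
print(dangerous_load(trusted_blob))
```

With this layout an auditor only has to search for `dangerous_load` and check where each argument comes from.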


This "encode problems into names" approach is something I first ran into at Facebook.

When a particular function or class is found to be a problem (or when it is known to be a problem in advance), it is (re)named something like "foo_POTENTIAL_XSS_HOLE" or "FooMapNonStlCompliant" throughout the code base (using pfff or a similar tool), and then the potentially long process of removing the problem can go forward with some confidence that nobody will add to the problem while you're working on it.

pfff: https://github.com/facebook/pfff
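Once the risk is encoded in the name, finding every remaining call site becomes mechanical. Below is a rough sketch in Python using the standard `ast` module. It is not how pfff works; the `_POTENTIAL_XSS_HOLE` suffix is taken from the comment above, while `flagged_calls` and the sample source are made up for illustration.

```python
import ast

MARKER = "_POTENTIAL_XSS_HOLE"  # naming convention quoted in the comment above


def flagged_calls(source: str):
    """Return (line, name) for each call to a function whose name ends in MARKER."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            func = node.func
            # Plain calls f(x) carry the name in .id, method calls obj.f(x) in .attr.
            name = func.id if isinstance(func, ast.Name) else getattr(func, "attr", None)
            if name and name.endswith(MARKER):
                hits.append((node.lineno, name))
    return sorted(hits)


# Hypothetical code base fragment; it is only parsed, never executed.
sample = '''
html = render_POTENTIAL_XSS_HOLE(user_bio)
safe = render(user_bio)
page.write_POTENTIAL_XSS_HOLE(html)
'''

for line, name in flagged_calls(sample):
    print(line, name)
```

A check like this could run in CI so that the number of flagged call sites can only go down while the cleanup is in progress.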


> I think load and dangerous_load is a wonderful paradigm for API design.

It is, but I think it ignores the reality of API design. Who would ever start with dangerous_load? If you knew it was dangerous, you'd probably fix it. No, you start with load(), then find out it's dangerous afterwards. The fix will be a breaking change, so you make safe_load().

"You should replace load()!", I hear you cry. But that removes backwards compatibility. And people don't update legacy software. So then old software becomes vulnerable to who knows how many other security problems because of its obsolescence.


> ... ignores the reality of API design. Who would ever start with dangerous_load?

When I wrote the Texcaller library, which simplifies compiling (La)TeX code, I disabled dangerous features such as "write18" from the very beginning.

If you don't take basic elements such as "secure/unsecure" variants into account, you're not really doing API design. You then just have a historically grown API, which is not necessarily bad in itself, but doesn't qualify to be called API design.

> But that removes backwards compatibility. And people don't update legacy software. So then old software becomes vulnerable to who knows how many other security problems because of its obsolescence.

In that case, at least only old hard-to-update legacy software is affected, and not all the other (good, mostly well-written) software that is written today and in the future.

Of course, don't forget to increase the major version number, as for every API change that breaks backward compatibility.


> doesn't qualify to be called API design.

True. I should have said the reality of the difficulty of API design.

> In that case, at least only old hard-to-update legacy software is affected, and not all the other (good, mostly well-written) software that is written today and in the future.

I like the sentiment, but I can't agree with it. In an ideal world all critical systems would always be kept up to date, but that doesn't happen. Has Python 3 usage overtaken 2.x yet?


> Who would ever start with dangerous_load?

Haskellers. This is why you get lovely long and alarming names like `unsafePerformIO`


Which, since it explicitly breaks type safety (you can write coerce :: a -> b with it), is an appropriate name for that function.


On the other hand, for serialization/deserialization of a data type from format X to Haskell, you would not need a "dangerous_load" function, since the code would be pure.


This is often the case in Haskell APIs: functions that are easily abused or have hard-to-predict side effects are usually prefixed with "unsafe", for example unsafeCoerce or unsafePerformIO.


Another cool thing, from a security perspective, is that there is actually a tool (Safe Haskell[1]) that makes it easy to disallow unsafe behavior and operate a whitelist of trusted modules. So not only is unsafePerformIO well-named, it's also trivial to restrict its usage.

[1]: http://www.haskell.org/ghc/docs/7.4.1/html/users_guide/safe-...


I suggest Rails people have a look at safe_yaml which (now) replaces YAML.load by a safe (safer?) version:

https://github.com/dtao/safe_yaml

(nb: I've enabled it on apps without issues)


This reminds me a lot of how the PHP crowd handles these kinds of problems.

So there's some piece of functionality, but security or some other obviously-important factor wasn't considered at all when it was initially "designed" and implemented. It's soon found to exhibit numerous problems, often including major security flaws.

Then a "safe" version of said functionality is offered. Yet for some reason (incompetence?) it has its own set of problems. And so the developers keep trying again and again, never seeming to get anywhere close to even a suitable solution.

An example is the mysql_escape_string() function, which was replaced by mysql_real_escape_string(), which has in turn been replaced by mysqli_escape_string(), mysqli::real_escape_string() and PDO::quote().

The end result is confusion, especially for novice users, or those coming back to PHP after some time. They don't know which of the several functions to use, or they're using older reference material that suggests the use of the faultiest of the functions.

It's better just to implement things properly the first time around, especially when security or data integrity, for example, are involved. Trying to hack on "safe" versions of functions, or even an entire "safe mode" is the wrong approach.


I have never, ever, not even once, in my 8 years working with rails had to call YAML.load myself in an app.

Changing how Rails or other libs process YAML is a behavior that can and should be performed transparently to users. And besides, most people processing user input aren't accepting YAML. It is extremely weird that Rails was accepting YAML in XML requests. It is slightly more understandable that Rails was using a YAML parser to parse JSON (although not a good idea as we've seen). But by and large, if you have an API that accepts user data, standard practice is to parse JSON, not YAML.

Your analogy is not correct.


I never thought I'd see "Rails" and "transparent" in the same sentence...


Nice generalization!

Safe_yaml is a first drop-in work-around that can be dumped into existing apps: a fix to quickly reduce the risk of exposure. If you have a proper set of tests in your app, it's quick to verify whether anything breaks once you start using it.

But then, the underlying issue is being discussed actively [1], with talks about how to incorporate the safe default into coming versions of Ruby.

So I don't really see the parallel with what you describe...

[1] https://github.com/tenderlove/psych/issues/119


Composing MySQL queries manually is a Bad Idea, and the safe version is not safe because the algorithm is fundamentally flawed. People need to be doing parametrized queries. The reason novices can't adapt to PHP is that the library design is bad to begin with. I don't think either Python or Ruby has this problem. I don't think anyone will be confused because suddenly YAML.load doesn't execute arbitrary code.
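For illustration, here is what a parametrized query looks like, sketched with Python's built-in `sqlite3` module instead of PHP and MySQL (a swapped-in stack chosen only so the example is self-contained). The table, the data, and `find_user` are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, is_admin INTEGER)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [("alice", 1), ("bob", 0)])


def find_user(name: str):
    # The ? placeholder binds the value as data; it is never spliced into the SQL text,
    # so no escaping function is needed at all.
    query = "SELECT name, is_admin FROM users WHERE name = ?"
    return conn.execute(query, (name,)).fetchall()


print(find_user("bob"))             # [('bob', 0)]
print(find_user("bob' OR '1'='1"))  # []  (the quote characters are just part of the name)
```

The hostile string matches no row because the database only ever sees it as a literal value to compare against.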


> I don't think anyone will be confused because suddenly YAML.load doesn't execute arbitrary code.

One case where people would be confused if YAML.load stopped _instantiating arbitrary objects_ (which is what it really does; the ability to 'execute arbitrary code' is a poorly thought-through side effect of that) is ActiveRecord::Base.serialize, which would break if you were serializing any objects other than strings, hashes, integers, and arrays.

While we've realized that allowing de-serialization of arbitrary objects is incredibly likely to end up 'allowing execution of arbitrary code', referring to the problem simply as the latter obscures the nature of the problem and the efficacy of various fixes.


Except when, in PHP, you occasionally have a feature that is itself a security hole by definition, e.g. register_globals.


That's why the next PHP versions should remove all that crap and keep only PDO. Things must break; libraries that use deprecated stuff should not be used, period.

Removing confusing stuff from a language makes it better, but the PHP language designers don't have a clue.


I like that project but have been trying to avoid saying "Use this!" because I haven't personally audited it and am scared that Rails devs might adopt it and think it means it definitively resolves all of their issues for February, which it might not.


So far, the people I've seen using it are "security-aware" developers who know there is always a remaining risk and who are just trying to reduce the attack surface a bit.

For the others, well, we'll have to educate them, or push a safe yaml default quickly into Ruby [1].

This is no holy grail for sure, and there are plenty of other topics to be addressed (eg: secret tokens stored in SCM, pushed to third-party CI services and shared with freelancers and remote employees using non-encrypted disk, shared as well between production and non-regularly updated staging servers etc!)

[1] https://github.com/tenderlove/psych/issues/119


I realise you probably meant this for the more general case, but if you are building an API it should not accept YAML. If you are not building an API and are using YAML to de-serialize stuff you serialised yourself (i.e. use YAML as intended) then there is no issue anyway.

So having "dangerous_load" is not going to help much with YAML: if it's exposed to untrusted input then grep for "YAML" not for "dangerous_load".


So I guess the STD/STI prevention community was actually ahead of the computer security community in having the term "safer sex" replace "safe sex"?


From the article:

> PyYAML has a .load() method and a .safe_load() method. Why do serialization implementers do this? If you must extend the format with dangerous features, provide them in the non-obvious method. Provide a .load() method and a .dangerous_load() method instead.

I think this is very good advice that holds in general:

The default should never be the most feature-rich version, but the safest version. This is also why you should generally prefer a whitelist approach over a blacklist approach. And this is why templating systems should perform escaping by default, forcing you to explicitly disable it, at concrete places, when including raw HTML.
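A toy sketch of "escape by default, opt out explicitly" in Python. `render` and `Raw` are hypothetical names made up for this example and are not taken from any real templating library; only `html.escape` is a real standard-library call.

```python
from html import escape


class Raw(str):
    """Hypothetical marker type: wrap a string to opt out of escaping on purpose."""


def render(template: str, **values) -> str:
    # Every value is escaped unless the caller explicitly wrapped it in Raw.
    # The template itself is assumed to be written by the developer, not the user.
    cleaned = {
        key: value if isinstance(value, Raw) else escape(str(value))
        for key, value in values.items()
    }
    return template.format(**cleaned)


print(render("<p>{comment}</p>", comment="<script>alert(1)</script>"))
# <p>&lt;script&gt;alert(1)&lt;/script&gt;</p>

print(render("<div>{widget}</div>", widget=Raw("<b>trusted markup</b>")))
# <div><b>trusted markup</b></div>
```

The unsafe path still exists, but every use of it is spelled `Raw(...)` in the source, which gives reviewers a short list of places to check.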


+1 for `dangerous_load`. If it was implemented this way, rather than `load` and `safe_load`, I doubt we would have seen vulnerabilities like the one in both tastypie and piston, the two leading API libraries for Django.

See https://www.djangoproject.com/weblog/2011/nov/01/piston-and-...

Whilst Python might seem safer than Ruby/Rails, it also has its own history of vulnerabilities.


The comments on this post are really interesting (save the first, which unfortunately stoops to much the same sort of tribal, content-free attacks we've seen on Rails vuln news on HN recently). I particularly liked the long one from Nick Coghlan, and though it does seem there are still some worrying vulnerabilities in Python, they are ahead of Ruby in their packaging system, at least in relying on signatures. Keen to see RubyGems step up and take security equally seriously.


Yes, Ruby and Ruby on Rails have received a lot of flak lately, but it is well-deserved and I don't think it is "tribal" in nature.

To many of us, these are just yet another set of tools in our very large toolbox. It's obvious that some tools are inherently better than others, however.

When I point out that Ruby on Rails or JavaScript have some serious inherent problems, it's not because I think that I belong to some Python "tribe" or the Perl "camp", for instance. It's because I'm doing rational, emotionally-detached analysis of certain pieces of software, and this analysis shows there to be serious problems with said software.

I think the same goes for the other people out there who have the courage to point out flaws with JavaScript, Ruby and related technologies. If anyone is acting "tribal", it's those who are so emotionally tied to a particular language or web development framework that they can't stand to hear legitimate concerns regarding important factors like security, performance, maintainability and reliability.


In my opinion, the first comment on the post isn't 'tribal' simply because it blindly attacks Ruby, but because it also blindly disregards the vulnerabilities in Python (which the author carefully and reasonably outlined).


Have you ever looked at the pickle docs though? There is like, literally a 'red alert' banner saying how dangerous it is in combination with arbitrary untrusted input.

PyYAML's warning is not quite so blatant (http://pyyaml.org/wiki/PyYAMLDocumentation) but when you get past installation blah blah blah onto actually loading YAML, the first thing it says (in bold) is that using .load is as dangerous as pickle.load, and it points you to .safe_load instead.

However, it would be better for the tutorial that immediately follows to use safe_load() everywhere that it can reasonably be used and to only mention .load as an advanced topic (except to mention at first that it exists but shouldn't be used).
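To make the pickle warning concrete, here is a small harmless demonstration plus a restricted loader in the style the pickle documentation describes (overriding `Unpickler.find_class`). `Payload`, `NoGlobalsUnpickler` and `restricted_loads` are names invented for this sketch, and the "attack" only calls `print`.

```python
import io
import pickle


class Payload:
    """Stand-in for attacker-controlled data; here it only asks pickle to call print()."""

    def __reduce__(self):
        # On load, pickle will call print("side effect during load").
        return (print, ("side effect during load",))


class NoGlobalsUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Refuse to resolve any class or function named in the pickle stream.
        raise pickle.UnpicklingError(f"global {module}.{name} is forbidden")


def restricted_loads(data: bytes):
    return NoGlobalsUnpickler(io.BytesIO(data)).load()


blob = pickle.dumps(Payload())
pickle.loads(blob)  # prints the message: merely loading the data triggered a call

try:
    restricted_loads(blob)
except pickle.UnpicklingError as exc:
    print("rejected:", exc)

# Plain containers and scalars need no globals, so they still load.
print(restricted_loads(pickle.dumps({"plain": [1, 2, 3]})))
```

Even with such a restriction in place, a format that cannot express code at all (JSON, or a safe YAML subset) remains the simpler choice for untrusted input.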


You're missing the "real" vulnerability here. Yaml or in general object instantiation is the attack vector - and admittedly a particularly stupid and painful one - but the real vulnerability is sharing code via unsigned repositories. There are more vectors to break into a repository server. So the more serious problem is "How do we secure code?" and "How do we establish trust for shared code?"

This issue will follow us around for quite a bit, even after the YAML bugs have been fixed and gemcutter rebuilt. For ruby that means sign gems, for python sign pip packages. And that's a point where python is not substantially better off than ruby; heck, even PHP (packagist), node (npm) and java (Maven) are in the same boat here. There's something more to learn here than pointing out that "the pickle docs are better than the psych docs", and for all the pain this incident brought the ruby developers, I'd be grateful if all other language communities learned from it. But if you'd rather lean back in your chair and point out that your community's docs state the danger clearly[1], then you're welcome. This is exactly the trap that the first commenter on the blog falls into. He sees this mess as a pure ruby problem and attacks ruby instead of stepping back and trying to figure out why this affects him as well.

[1] e.g. like npm, which helpfully states that packages should be inspected before installing them. How many people do you expect to actually do that?


You're exactly right to bring up the issues with code distribution (e.g. for CPAN, PyPI), but the YAML usage is a more general/different problem. As far as I understand it Ruby on Rails would have been vulnerable even if you installed it from cryptographically-signed tarballs without any additional code from Rubygems.

But pointing out that other languages don't have super-secure code distribution systems doesn't change that they at least understand the danger of deserializers that can run arbitrary code or create arbitrary objects, especially when Ruby is also weak in this area.

At least for Python, I would hope that they had learned their lesson in 2011, when some popular third-party Django plugins used YAML.load instead of YAML.safe_load, instead of waiting for this. Of course, RoR devs might have noticed the same issue at that time, but there's nothing we can do about it now.


See, there's a lot more to ruby than rails - I love and use padrino, since it doesn't include as much magic. It doesn't suffer from the rails vulnerabilities caused by the yaml usage. There's also the problem that in ruby, yaml is the go-to marshalling and config format; that's what allowed the attack against rubygems. But now, suddenly all ruby applications were in danger - even my apps, even though I don't use yaml. Shell-scripts, daemons, everything that uses ruby - whether it uses yaml.load or safe_yaml.load. Chef and Puppet were at risk - and those provide root access to hundreds of servers, machines that don't even run ruby apps.

And that is because gems are not signed. If gems were signed, all of this would be a major nuisance, but with limited fallout. Signatures would get checked, approved, done - no matter which attack was used to get to the gem repo. There will be more attacks, using other vectors - a kernel exploit, a webserver exploit, a mail account hacked into. And containing that fallout is way more important. And that's the lesson that needs to be learned by all language communities: Code distribution needs to be secured since otherwise a single attack puts the whole community at risk.

So feel free to point at the python docs and pretend that the lack of insight about yaml is what caused the problem. It's the spark that blasted the powder keg, but we were sitting on it long before.


> emotionally-detached analysis of certain pieces of software

Really? OK, so please show us your analysis, if you have one, instead of repeating your supposed conclusion again and again without giving us any argument.


Yes, the packaging system has signatures, but unfortunately Python's SSL implementation doesn't verify certificates, nor do its packaging tools. The signatures come from the same server as the packages, so if someone MITMs the server you can still get hosed. There was a talk at last year's PyCon demonstrating this and these kinds of problems still have not been fixed.
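The "same server" point can be illustrated with a simpler stand-in for signatures: checking a download against a SHA-256 digest that was obtained through a separate, trusted channel. This is digest pinning, not GPG signature verification, and `verify_artifact` plus the sample bytes are made up for the sketch.

```python
import hashlib
import hmac


def verify_artifact(data: bytes, pinned_sha256: str) -> bool:
    """Return True only if data hashes to the digest we pinned out of band."""
    actual = hashlib.sha256(data).hexdigest()
    # compare_digest avoids leaking where the strings differ via timing.
    return hmac.compare_digest(actual, pinned_sha256.lower())


# Pretend this is the package as originally published.
package = b"print('hello from the package')\n"

# Assumption: this digest was recorded earlier and stored somewhere the
# package server cannot modify (e.g. a lock file in your own repository).
pinned = hashlib.sha256(package).hexdigest()

# What a man-in-the-middle or a compromised server might hand back instead.
tampered = package + b"import os\n"

print(verify_artifact(package, pinned))   # True
print(verify_artifact(tampered, pinned))  # False
```

If the digest were fetched from the same server as the package, an attacker who controls that server could swap both, which is exactly the weakness described above.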


And don't forget to sign your packages, because vulnerabilities will always happen anyway, and if those compromise a distribution point, it's hard to authenticate said packages.


If PyPI was compromised like RubyGems, I'm not sure they'd be any better able to recover from it reliably and quickly either.

PyPI does support package signing (with GPG), but pip doesn't support signature verification and hardly any packages are actually signed anyway. It's actually probably more secure to load your python packages off of specific commits on a public git repo over HTTPS right now. (Except that pip doesn't validate HTTPS certificates either...) And if you are lucky enough to be using a package that is signed, establishing a WOT with the author to validate their cert might not be easy.

It's not like people aren't working on this stuff though. https://www.updateframework.com/ have a 'secure' PyPI mirror (the upstream isn't, obviously), and the PEP 427 Wheel format (http://wheel.readthedocs.org/en/latest/) seems to be giving security more consideration than previous attempts at Python packaging have.


Please +1 this pip ticket if you feel that supporting TLS cert and GPG verification should be given the highest priority.

https://github.com/pypa/pip/issues/425

I think it's paramount that pip gets this done right now. Installing code directly from PyPI is extremely scary now and has been for years...


That's exactly why I posted this comment. I'm fully aware of the state of the things :)

I'm hoping this starts to raise enough attention for people to actually fix these things up. The patches to support this are around, and btw, pip now also supports HTTPS properly. (Yeah, this is actually merged into the experimental branch.)

But, people in general have to understand the benefit AND figure out it actually serves a purpose (because as always with security, nobody gives a crap 'til someone gets compromised in a terrible way). That's also because all the library makers (which is virtually everyone and their dog nowadays) have to actually get a proper gpg key, understand how it works, and actually sign their stuff. That's a major effort.


I'm not sure if we're talking about a language problem here. The tools are there for you to use them in the proper way and there's always a chance that someone misunderstands how things work.

Perl Data::Dumper can be used with eval for serialization, but it is a bad idea. Just like using pickle in Python (most of the time, at least).

No matter what you do, there's always room for someone doing something stupid. So I'd rather have the tools.
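A Python analogue of the Data::Dumper-plus-eval pattern, as a rough sketch: serializing with `repr` and reading back with `eval` would run whatever expression the string contains, whereas the standard library's `ast.literal_eval` accepts only literals. The sample data and the hostile string are invented for the example.

```python
import ast

# "Serialized" data in the Data::Dumper spirit: source text that rebuilds the value.
stored = repr({"user": "alice", "roles": ["admin", "dev"]})

# literal_eval accepts only literals (strings, numbers, tuples, lists, dicts, ...).
print(ast.literal_eval(stored))

# An expression that eval() would happily execute is refused here.
hostile = "__import__('os').getcwd()"
try:
    ast.literal_eval(hostile)
except ValueError as exc:
    print("rejected:", type(exc).__name__)
```

The tool stays available (`eval` is still there), but the safe variant covers the common case of reading plain data back in.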


Saying that it is a language problem might be true for that specific case. YAML#load and #dump do Object marshaling, which is always dangerous when the marshaled objects come from untrusted sources and parsed without a template. Similar techniques exist in almost all other languages and object instantiation attacks are nothing unheard of. So most of the bile is unwarranted, unless you are asking for bile back if something like this happens in your language. Everyone who is ranting about the security bugs of other projects clearly lacks the humility that Nick Coghlan is asking for in the comments.

But the reason why this specific one is very widespread is actually a cultural one: YAML was propagated as a very convenient serialization format despite the described property, especially as it was in stdlib very early. It turned into one of Ruby's beloved conventions. E.g. some static website generators like Jekyll use YAML for meta data, called "front matter" (they use safe_yaml now, don't try), Rubygems used it to dump their specs, etc. Combine that with the fact that Rails activated certain parameter parsers without the user's knowledge (by convention, again) and made everyone vulnerable, and you have a recipe for disaster.

This is hard to fix, but that's the pain of suddenly being under attack. But instead of all the hate, members of other communities should take away these lessons and educate everyone they know that uses Ruby about these topics. In clear words, but without hate.


If the original author is reading, those are George Orwell's "pithy maxims", not Allen Short's, but it's an entertaining way of getting a Big Brother reference into a security/trust context.


George Orwell wrote them, but Allen Short was the person who applied them in that way to a security context. Much of the post apparently owes a great deal to Allen Short, so he deserves some mention.


For what it's worth, I applied "freedom is slavery" and "ignorance is strength" to programming back in the 90s in a rambly post on my website. I don't know if Allen ever saw it, and security wasn't much on my mind back then. (We're acquaintances, I admire him, and I'm glad to hear of this talk.)


I think you are aggressively misunderstanding the attribution here.

The maxims being referenced are the ones about code ('input is an attack', 'small interfaces'). The pithy part is phrasing them in terms of Orwell.



