I think an important bit of context here is that computers are very, very good at speculative happy-path execution.
The examples in the article seem gloomy: how could a JIT possibly do all the checks to make sure the arguments aren’t funky before adding them together, in a way that’s meaningfully better than just running the interpreter? But in practice, a JIT can create code that does these checks, and modern processors will branch-predict the happy path and effectively run it in parallel with the checks.
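To sketch what that looks like (written in Python for readability; a real JIT emits machine code, and the function names here are made up):

```python
import operator

def generic_add(a, b):
    # Slow path: full dynamic dispatch, standing in for the interpreter's
    # general-purpose machinery.
    return operator.add(a, b)

def specialized_add(a, b):
    # Guard: check that the types the JIT speculated on still hold.
    if type(a) is int and type(b) is int:
        # Happy path: the branch predictor quickly learns to assume this
        # branch, so it effectively runs in parallel with the check.
        return a + b
    # Guard failed: deoptimize to the generic slow path.
    return generic_add(a, b)
```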
JavaScript, too, has complex prototype chains and common use of boxed objects - but v8 has made common use cases extremely fast. I’m excited for the future of Python.
That makes it so that in absolute terms, Python is not as slow as you might naively expect.
But we don't measure programming language performance in absolute terms. We measure it in relative terms, generally against C. And while your Python runtime is speculating about how this Python object will be unboxed, where its methods are, how to unbox its parameters, what methods will be called on those, etc., compiled code is speculating on the actual code the programmer wrote and running that in parallel. By the time the Python interpreter has successfully speculated its way through resolving one method call on actual objects, the compiled language is already done with ~50 lines of code of similar grammatical complexity. (Which is a sloppy term, since this is a bit of a sloppy conversation, but consider a series of "p.x = y"-level statements in Python versus C as the case I'm looking at here.)
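To make the "p.x = y" comparison concrete (my own sketch, not the parent's code): in CPython that single statement goes through the dynamic attribute machinery, while the C equivalent compiles to one store at a known struct offset.

```python
class Point:
    pass

p = Point()
y = 3

# In CPython, this one statement roughly involves: looking up the name "x",
# checking type(p) for a data descriptor, writing into the instance's
# attribute dict, and reference-count bookkeeping on the old and new values.
p.x = y

# This is essentially the dict write the runtime performs under the hood:
p.__dict__["x"] = y

# The C version of "p.x = y" is a single store to a fixed offset in a struct.
```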
There's no way around it. You can spend your amazingly capable speculative parallel CPU on churning through Python interpretation or you can spend it on doing real work, but you can't do both.
After all, the interpreter is just C code too. It's not like it gets access to special speculation opcodes that no other program does.
I love this “real work”. Real work, like writing linked lists, array bounds checking, all the error handling for opening files, etc., etc.? There is a reason Python and C both have a use case, and it’s obvious Python will never be as fast as C at doing “1 + 1”. The real “real work” is getting stuff done, not making sure the fewest possible CPU cycles are used to accomplish some web form generation.
Anyway, I think you’re totally right, in your general message. Python will never be the fastest language in all contexts. Still, there is a lot of room for optimization, and given it’s a popular language, it’s worth the effort.
I can't figure out what your first paragraph is about. The topic under discussion is Python performance. We don't generally try to measure something as fuzzy as "real work" (as you seem to be using the term) in performance discussions, because what even is that? There's a reason my post referenced "lines of code", still a rather fuzzy thing (which I already pointed out), but it gets across the idea that while Python has to do a lot of work for "x.y = z" to cover everything "x.y" might mean, including the possibility that the user has changed what it means since the last time this statement ran, compiled languages generally do over an order of magnitude less "work" to resolve it.
This is one of the issues with Python I've pointed out before, to the point that I've suggested someone could make a language around this idea: https://jerf.org/iri/post/2025/programming_language_ideas/#s... In Python you pay and pay and pay and pay and pay for all this dynamic functionality, but in practice you aren't actually dynamically modifying class hierarchies and attaching arbitrary attributes of arbitrary types to arbitrary instances. You pay for these features, but you benefit from them far less often than the number of times Python pays for them. Python spends rather a lot of time spinning its wheels double-checking that it's still safe to do the thing it thinks it can do, and it's hard to remove that even with a JIT, because it is extremely difficult to prove those checks can be eliminated.
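To illustrate the kind of dynamism every attribute access has to be prepared for (my own toy example, not from the linked post):

```python
class Account:
    def __init__(self, balance):
        self.balance = balance

acct = Account(100)

# All of these are legal, so every "acct.balance" has to re-check its
# assumptions, even though almost no real code ever does these things:
acct.note = "attribute that exists on this one instance only"
Account.audit = lambda self: "method attached to the class at runtime"
Account.balance = property(lambda self: self.__dict__["balance"] * 2)

print(acct.audit())   # "method attached to the class at runtime"
print(acct.balance)   # 200 -- the class changed underneath the instance
```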
I understand what you're saying. In a way, my comment is actually off-topic to most of yours. What I was saying in my first paragraph is that the words you use in the context of a language runtime's inefficiency can also be used to describe why those inefficiencies exist, in the context of higher-level processes like business efficiency. I find your choice of words amusing given the juxtaposition of those contexts, especially "you pay, pay, pay".
You claimed churning through Python interpretation is not "real work". You now correctly ask the question: what is "real work"? Why is interpreting Python not real work, if it means I don't have to check for array bounds?
To put it another way, I choose Python because of its semantics around dynamic operator definition, duck typing etc.
Just because I don’t write the bounds-checking and type-checking and dynamic-dispatch and error-handling code myself, doesn’t make it any less a conscious decision I made by choosing Python. It’s all “real work.”
Type checking and bounds checking aren't "real work" in the sense that, when somebody checks their bank account balance on your website or applies a sound effect to an audio track in their digital audio workstation, they don't think, "Oh good! The computer is going to do some type checking for me now!" Type checking and bounds checking may be good means to an end, but they are not the end, from the point of view of the outside world.
Of course, the bank account is only a means to the end of paying the dentist for installing crowns on your teeth and whatnot, and the sound effect is only a means to the end of making your music sound less like Daft Punk or something, so it's kind of fuzzy. It depends on what people are thinking about achieving. As programmers, because we know the experience of late nights debugging when our array bounds overflow, we think of bounds checking and type checking as ends in themselves.
But only up to a point! Often, type checking and bounds checking can be done at compile time, which is more efficient. When we do that, as long as it works correctly, we never† feel disappointed that our program isn't doing run-time type checks. We never look at our running programs and say, "This program would be better if it did more of its type checks at runtime!"
No. Run-time type checking is purely a deadweight loss: wasting some of the CPU on computation that doesn't move the program toward achieving the goals we were trying to achieve when we wrote it. It may be a worthwhile tradeoff (for simplicity of implementation, for example) but we must weigh it on the debit side of the ledger, not the credit side.
______
† Well, unless we're trying to debug a PyPy type-specialization bug or something. Then we might work hard to construct a program that forces PyPy to do more type-checking at runtime, and type checking does become an end.
Well, originally I wrote "more like Daft Punk", but then I thought someone might think I was stereotyping musicians as being unoriginal and derivative, so I swung the other way.
Yeah, I get it, but I found the choice of words funny, because these words apply in the larger context too. It's like saying Python transfers work from your man-hours to CPU hours :)
"There is a reason Python and C both have a use case [..]"
If you mean historically, then yes, but I don't think there is an inherent reason why we couldn't have a language as convenient as Python and fast as C.
In order to get conciseness, you need to accept defaults that work for everyone. Those defaults might not be that great for you, but they're good enough.
In order to get performance, those "good enough defaults" are suddenly no longer good enough, and now you need to get very specific with what you're doing.
Maybe with a smart enough compiler, a high-level language could be compiled to something with very good performance, but the promise of that "sufficiently smart compiler" has yet to be fulfilled.
I've been hearing promises about "better than C" performance from Python for over 25 years. I remember them on comp.lang.python, back on that Usenet thing most people reading this have only heard about.
At this point, you just shouldn't be making that promise. Decent chance that promise is already older than you are. Just let the performance be what it is, and if you need better performance today, be aware that there are a wide variety of languages of all shapes and sizes standing by to give you ~25-50x better single threaded performance and even more on multi-core performance today if you need it. If you need it, waiting for Python to provide it is not a sensible bet.
I maintain a program written in Python that is faster than the program written in C that it replaces. The C version can do a lot more operations, but it amounts to enumerating 2^N alternatives when you could enumerate N alternatives instead.
Certainly my version would be even faster if I implemented it in C, but the gains of going from exponential to linear completely dominate the language difference.
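A rough back-of-the-envelope illustration of why the algorithmic win dominates (numbers made up, just for scale: assume C runs each step 50x faster, but the old approach enumerates 2^N alternatives to the new one's N):

```python
# Illustrative only: C gets a 50x per-step discount, but does 2**N steps
# while the linear version does N.
C_SPEEDUP = 50

for n in (10, 20, 30, 40):
    c_work = 2 ** n / C_SPEEDUP   # C's steps, discounted by its speed
    py_work = n                   # Python's steps at "full price"
    print(f"N={n}: the exponential C version still costs "
          f"{c_work / py_work:,.0f}x more")
```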
So you're saying two different programs implementing two different algorithms perform differently, and that lets you draw a conclusion about how the underlying languages/compilers/interpreters behave?
I must have been unclear. C is faster. The Python program would be faster if I reimplemented it in C. However, Python makes it so much easier to transform the problem into a linear form that it’s a bigger win to use Python than to continue maintaining the C version.
And that’s my point: raw execution speed is only helpful when you’re executing the right thing. Don’t discount how much easier it can be to implement the right thing in Python.
I am a bit older than Python :). I imagine the creator of Clang and LLVM has a fairly good grasp on making things performant. Think of Mojo as Rust with better ergonomics and a more advanced compiler, one that you can mix and match with regular Python.
Mojo feels less like a real programming language for humans and more like a language primarily for AIs. The docs for the language immediately dive into chatbots and AI prompts.
The main problem is when the optimizations silently fail because of seemingly innocent changes and suddenly your performance tanks 10x. This is a problem with any language really (CPU cache misses are a thing after all, and many non-dynamic languages have boxed objects), but it is much, much worse in dynamic languages like Python, JS and Ruby.
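A hedged example of the kind of "seemingly innocent change" meant here (the actual impact depends entirely on the runtime; this is just the shape of the problem):

```python
# This loop is easy for a specializing runtime (e.g. PyPy's JIT, or CPython's
# adaptive interpreter) to keep on an int fast path while the types stay stable.
def total(values):
    acc = 0
    for v in values:
        acc += v
    return acc

fast = list(range(1_000_000))
slow = list(range(1_000_000))
slow[0] = 0.0   # "innocent" change: one float turns acc into a float partway
                # through, so the int-specialized path no longer applies

print(total(fast), total(slow))
```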
Most of the time it doesn't matter: most high-throughput Python code just invokes C/C++, where these concerns are not as big of a problem. Most JS code just invokes C/C++ browser DOM objects. As long as the hot path is not in those languages, you are not at such high risk of an "innocent change tanked performance" situation.
Even server-side, most JS/Python/Ruby code is just simple HTTP handlers, database calls, and shuffling data around. And often a large part of the process of handling a request (encoding JSON/XML/etc., parsing HTTP messages, etc.) can be written in lower-level languages.
Although JS supports prototype mutation, the with operator, and other constructs that make optimization harder, typical JS code does not use them. Thus the JIT can add a few checks for the presence of problematic constructs to divert to a slow path, while optimizing a not-particularly-big set of common patterns. The JS JIT also does not need to care much about calling arbitrary native code, since the browser internals can be adjusted/refactored to suit the JIT's needs.
With Python that does not work. There are simply more optimization-unfriendly constructs, and popular libraries use them. And Python calls arbitrary C libraries through a fixed ABI.
So optimizing Python is inherently more difficult.
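For a feel of what those constructs look like (an illustrative, non-exhaustive sketch; the class names are made up), all of these patterns show up in real, popular libraries:

```python
import sys

class Meta(type):
    # Metaclasses can rewrite a class while it is being created.
    def __new__(mcls, name, bases, namespace):
        namespace["extra"] = 1
        return super().__new__(mcls, name, bases, namespace)

class Model(metaclass=Meta):
    # Arbitrary code can run on every failed attribute lookup.
    def __getattr__(self, name):
        return f"computed {name}"

# Frame introspection makes local variables externally observable,
# which limits how aggressively they can be optimized away.
frame = sys._getframe()

print(Model().extra, Model().anything, frame.f_lineno)
```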
> but v8 has made common use cases extremely fast. I’m excited for the future of Python.
Isn't v8 still entirely single threaded with limited message passing? Python just went through a lot of work to make multithreaded code faster, it would be disappointing if it had to scrap threading entirely and fall back to multiprocessing on shared memory in order to match v8.
Multithreaded code is usually bottlenecked by memory bandwidth, even more so than raw compute. C/C++/Rust are great at making efficient use of memory bandwidth, whereas scripting languages are rather wasteful of it by comparison. So I'm not sure that multithreading will do much to bridge the performance gap between binary compiled languages and scripting languages like Python.
To be slightly flip, we could say that the Lisp Machine CISC-supports-the-language, full-stack design philosophy lives on in how massive M-series reorder buffers and ILP support JavaScriptCore.
I wonder if branch prediction can still hide the performance loss when the happy-path checks become large/complex. Branch prediction is a very low-level optimisation, and even when the predictor is right you don't get everything for free: the CPU must still evaluate the condition, which takes resources, though it's no longer on the critical path. I'd also think the CPU would stall if it got too far ahead of the condition's execution (ultimately all the code must execute before the program completes). Perhaps, given the nature of Python, the checks would be so complex that in a tight loop they'd exert significant resource pressure?
There's an obvious answer - run everything on GPUs. Each speculative branch runs in parallel on its own core, and you add a layer of super-fast branch switching when a branch runs into a problem.
Given the current state of computing, I am unable to state definitively if this suggestion is satire.