Python is just another language VM. CPython should work just fine in WebAssembly, and bridging the APIs to the web (e.g. the filesystem) is already possible through Emscripten.
We don't plan to do things for WebAssembly; we plan to make WebAssembly powerful enough that anyone can run their language VM :-)
I think what they meant is that they could use Python without having to also ship an implementation like CPython. I mean, that thing is huge. How is it feasible to serve 10 MB for a website?
That's awesome, and it means that, alongside Objective-C, I've effectively waited out deep-diving on two (soon-to-be) formerly popular programming languages. I get to use my preferred language, Python, for even more.
Julia would still require the garbage collector, but I don't think the downvotes here are fair at all. If there were a language that would be a natural transition for people used to JavaScript but wanting native speed, Julia would be it. In a way, it is a glimpse of what JavaScript could have been, given a lot of experience and much more design.
Question, please: even though there is a stated desire to support the Python VM and others, won't this be hampered by the need for the client to download the large VM, versus native code that can be distributed by itself?
I see your excitement about Numba; I guess you follow it closely, or perhaps have other valid reasons, so all the best. However, the last time I tried to use it, about 4 months ago, installation was a bitch and it would crap out compiling some functions. It holds promise, of course, but so do the alternatives.
What I think the current post is about is bringing similarly nice and performant abstractions to D. I love Python a lot, but I have to pay extra attention to make it performant and to avoid stupid typos. I have to rewrite parts in Cython or Weave (sadly, the latter gets no love anymore). The larger the code base becomes, the more time I have to spend on the tail part of the 'tooth to tail' ratio. In D, many of these things are taken care of by default, so I have to worry a lot less about them.

The other fantastic thing is that DMD has super fast compilation times. It used to be faster than Go in compilation time, but the latter has caught up in that department (while grossly lacking in others). D binaries execute a lot faster (when compiled with LDC or GDC).

D may lack some libraries here and there, but that does not worry me as much; everyone has to start somewhere, and to be honest, Python is lacking in that respect compared to R. What would be worrisome is the presence of structural aspects that might get in the way of such an ecosystem emerging. I don't see anything of that kind in D.
In any case, I don't quite see how Numba obviates the issues I mentioned. It does not change the ndarray representation, and the extra indirections lie right there. OTOH, some copy elision I would grant.
I can give concrete examples. The isotonic regression implementation in scikits.learn was absolutely pathetic. It has been rewritten several times, and its performance is still severely lacking in spite of being Cythonized. You can take a stab at it with Numba and see what you can do. In C++ (D would have been nicer), the very first idiomatic implementation in the STL was faster than any of those rewrites; I wrote it once and moved on. But C++ is a horrifying mess. Now, if there were a language that gave me C++ performance (not asking for much) but none of the mess, and NumPy-level expressiveness to boot, that would be sweet. Julia, Nim, D, and PyPy are some of the contenders trying to reach this holy grail.
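To make it concrete, the core of isotonic regression is the pool-adjacent-violators loop, which looks roughly like this (a from-memory Python sketch of the algorithm, not the actual scikits.learn code): a sequential, stack-based merge that is painful to vectorize but trivial in a compiled language.

```python
def isotonic_regression(y):
    # Pool adjacent violators: maintain a stack of (mean, size) blocks;
    # merge backwards whenever a new value breaks monotonicity.
    means, sizes = [], []
    for v in y:
        m, s = float(v), 1
        while means and means[-1] > m:
            m = (means[-1] * sizes[-1] + m * s) / (sizes[-1] + s)
            s += sizes[-1]
            means.pop()
            sizes.pop()
        means.append(m)
        sizes.append(s)
    # Expand the block means back to the original length.
    out = []
    for m, s in zip(means, sizes):
        out.extend([m] * s)
    return out
```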
Numba certainly does not obviate all of these issues. I think user `tadlan` is referring to some of the compiler optimizations, like loop unrolling or fusion (e.g. noticing that two subsequent loops can be 'fused' into the body of one single loop). These things can offer speed-ups even beyond NumPy, and they can work even when the Python code you start out with already uses NumPy.
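A minimal sketch of what that fusion buys you (my example, not from the article): the NumPy version makes two passes over the data and allocates a temporary array, while the jitted loop fuses the multiply and the reduction into a single pass.

```python
import numpy as np
from numba import jit

def dot_numpy(a, b):
    # Two vectorized passes: a temporary array for a * b, then a sum.
    return np.sum(a * b)

@jit(nopython=True)
def dot_fused(a, b):
    # One fused pass: multiply and accumulate in a single loop,
    # with no temporary array.
    total = 0.0
    for i in range(a.shape[0]):
        total += a[i] * b[i]
    return total
```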
The thing is (and `tadlan` seems unaware of this), a lot of this stuff just fails in production environments and hits corner cases that the Numba compiler does not handle. I'm talking about the first part of the Numba compiler pipeline, where it converts CPython bytecode to Numba IR (not yet to LLVM IR) and, among other things, rewrites CPython's stack-based representation into a register-based one that is compatible with LLVM and ultimately with the actual machine itself. In that compilation step, the only things that can be handled are things that the Numba team (I used to be a member of it) explicitly supports. They don't support a full-blown compiler for the entire Python language, nor even for every type of NumPy operation. That's not a knock against Numba at all -- it's a specializing compiler, and obviously they need to prioritize what to support and set longer-term goals about supporting more general things. But the point still remains that you cannot just assume that if you call `jit` on arbitrary Python code, it will always become faster. In some cases, it can even become slower.
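You can watch that specialization happen yourself (a quick sketch; assumes a reasonably recent Numba):

```python
from numba import jit

@jit(nopython=True)
def add(x, y):
    return x + y

add(1.0, 2.0)          # first call triggers compilation for (float64, float64)
print(add.signatures)  # the specialized signatures compiled so far
add.inspect_types()    # dump the inferred Numba IR types, line by line
```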
I suspect `tadlan` is very interested in Numba and enthusiastic about knowing the taxonomy of Numba details, but it does not seem like that user has had real-world experience trying to get Numba to work in production and seeing all of the numerous buggy and missing features. I don't want to diminish anyone's enthusiasm for Numba, so it's probably best just not to engage with `tadlan` about it. That user's mind seems made up already.
As a former member of the Numba team, what is your outlook on the technology and the broader Continuum ecosystem?
I'm looking at Numba and friends (DyND, Blaze) for a new stack, but I'm not sure where the development arc will end up versus, say, Julia. I'm also curious about the sustainability of Continuum's business model and practices in the medium and long term.
Any thoughts on this? I understand if you are limited in what you can say, but I'm open to any nuggets.
This is one of those things that is extremely hard to predict. Even though I was part of that team, it doesn't mean I have any special insight into what will happen.
A lot will depend on sources of funding. Will folks like Nvidia start sponsoring Numba, and if so, what will that mean for support of Nvidia alternatives like OpenCL?
It seems like NumbaPro as a stand-alone for-pay product is not viable on its own, but that claim could be wrong based on more recent data that I don't have access to. So external sponsorship may be necessary.
One form of this could be through Continuum's already established business model of consulting and support services. But then the question is whether the nature of those consulting and support projects will let developers actually further the cause of Numba, or merely hack in poorly conceived features demanded by the customers. Since Numba is open source, it should be easy enough for anyone to follow the commits and discussions on GitHub and form their own opinion about which direction it is going.
The other question that is always hard is staffing. The colleagues I had the chance to work with on the Numba team were far and away amazingly good. But it's not clear whether working solely on Numba can justify the sort of salary that would be required to attract very top engineers and grow the team. You might start to see more interns and/or post-doc-type labor feeding into Numba, and again, I don't know what that will mean for the project ... it could be good or bad.
At the same time, there is also a lot of active development on Julia and PyPy, and a lot of people still prefer to use Cython rather than jitting functions. Some people even call into question the entire goal of making something that is "easy" but also a "black box" -- like the way just dropping in `jit` works for people who merely use, but don't understand the inner workings of, CPython.
It's an exciting area, and the Numba team has as much talent and ability as anyone else to claim a significant piece of the tool space surrounding high-performance computing. Whether that will pan out for them is still really hard to predict.
What do you think about the foundational tech of Numba itself?
Do you think it is any more of a black box than, say, Julia? Is there anything about it that would impede its extension into a more stable, predictable, and feature-rich product?
Julia is really fast in 95% of cases, but the remaining 5% still makes all the difference. Pairwise summation is one example. I will post D vs. Julia benchmarks next week ;)
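For those unfamiliar, pairwise summation looks roughly like this (a Python sketch just to show the shape of the algorithm; the actual benchmarks will be in D and Julia):

```python
def pairwise_sum(a, lo=0, hi=None):
    # Recursively split and sum halves; rounding error grows like
    # O(log n) instead of the O(n) of naive left-to-right summation.
    if hi is None:
        hi = len(a)
    if hi - lo <= 8:              # small base case: plain sequential sum
        return sum(a[lo:hi])
    mid = (lo + hi) // 2
    return pairwise_sum(a, lo, mid) + pairwise_sum(a, mid, hi)
```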
I would say the biggest benefit of D is static typing. In Julia, you can run a simulation and discover only after half an hour that you misspelled a function name.
Considering that in the benchmarked example only one line of NumPy code was used, which already calls into compiled C, I have a hard time believing that that approach would catch up to using all compiled code.
While I agree broadly that the benchmark example in the post is not representative or useful as a comparison between D and NumPy, I disagree with your strong insistence on Numba in this case.
There is still a lot of Python code that the Numba-to-LLVM compiler cannot handle. Yes, it is true that Numba can do a decent job of removing CPython virtual machine overhead, even for functions in which you statically type the arguments merely as 'pyobject' -- but not universally.
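Roughly this kind of thing (an illustrative sketch; the exact decorator options have shifted across Numba versions): even with fully generic object-typed arguments, Numba compiles the loop control flow and shaves off some interpreter dispatch overhead.

```python
from numba import jit

# Object mode: arguments remain generic PyObjects, so no type
# specialization happens, but the loop structure is still compiled,
# trimming some bytecode-dispatch overhead.
@jit(forceobj=True)
def count_truthy(items):
    n = 0
    for x in items:
        if x:
            n += 1
    return n
```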
And there are also a ton of super basic things that Numba doesn't handle -- for example, creating a new array inside the body of a function when using Numba in 'nopython' mode [1]. These things will improve over time, but they may not improve quickly enough for a given use case, and the D language and this ndarray implementation may be a fair competitor to Numba in the short-to-medium term (and could even be superior in the long run, who knows).
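Concretely, code like the following is the kind of thing that trips 'nopython' mode (a sketch; whether it actually fails depends on your Numba version):

```python
import numpy as np
from numba import jit

@jit(nopython=True)
def squares(n):
    out = np.empty(n)      # allocating a new array inside the jitted body:
    for i in range(n):     # historically unsupported in nopython mode,
        out[i] = i * i     # producing a typing error at compile time
    return out
```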
A fairer alternative would be to compare with Cython, which for my money still hands-down beats anything like D or Nim/Pymod for performance enhancement without sacrificing pleasantness of design and development.
Though, of course, none of this stuff holds a candle to Haskell :)