Julia, Python and Cython (groups.google.com)
107 points by synparb on April 22, 2012 | 15 comments


> It's a common theme when we scientific Python users talk that we don't really use Python for the language. We use it for the community and the libraries.

This statement doesn't make sense to me. First, I use Python for scientific/numerical work, and I definitely use it "for the language" - at least, certain aspects of it that are rare elsewhere.

Second, the language, community, and libraries are all interdependent. The design of a language strongly influences the tone of its community, as well as the libraries available (both the number and the types).


> The design of a language strongly influences the tone of its community, as well as the libraries available (both the number and the types).

I certainly think this is true, and I think that Dag's underestimating the importance of the underlying Python language here.

At the same time, a lot of scientists I know are used to just using whatever language has the right library in it (either for functionality, or to duplicate another paper's results). I know that in grad school I used Python, Matlab, IDL, Fortran, and Java, all on the same scientific project (if not the same software project). I used each language not because I liked it, but because it had some important piece that I really wanted to use.

And while that was a Frankenstein of a project, it wasn't all that unusual compared to the other researchers I knew. We weren't full-time software developers, we wanted to avoid re-implementing anything we could... so the libraries won, all the time.


This kind of frankensteinism was one of the motivating reasons for creating Julia (see this InfoWorld interview: http://www.infoworld.com/d/application-development/new-julia..., and please ignore the misleading title). Python, although it can get you closer to single-language scientific computing, can't quite get you there because of the split-language model: at the very least, performance-critical libraries have to be written in C, and complex projects often end up requiring C components of their own.

Doing practical scientific computing only using code implemented in Julia is completely unrealistic. At the very least, a lot of fast kernels (BLAS, LAPACK, etc.) are going to be implemented in C and/or Fortran for the foreseeable future. What Julia brings to the table is the ability to call C/Fortran libraries without writing anything but Julia code — you can just load a .so file and call functions without compiling anything (see http://julialang.org/manual/calling-c-and-fortran-code/). If we could have the same level of trivial interoperability with Python, then you could potentially use scientific libraries written in Python, while still only ever writing Julia code. That helps alleviate the "frankenstein project" problem without throwing out all the excellent work that's been done by the SciPy community.

In the other direction, there are people who already have projects that are all or mostly Python. Forcing them to have some random piece in Julia just to use a nice bit of Julia functionality makes the frankenstein problem worse. If they can trivially call Julia code (perhaps via a .so, as if it were just a library written in C), then that's one less frankenstein project in the world.
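(For reference, Python's standard library already offers this flavor of no-compile interop with C shared libraries through ctypes. Here's a minimal sketch using the system math library as a stand-in, so the library name and lookup may vary by platform:)

```python
import ctypes
import ctypes.util

# Locate and load the C math library at runtime -- no compilation step involved.
libm = ctypes.CDLL(ctypes.util.find_library("m"))

# Declare the C signature so arguments and return values convert correctly.
libm.cos.restype = ctypes.c_double
libm.cos.argtypes = [ctypes.c_double]

print(libm.cos(0.0))  # -> 1.0
```

A Julia library exposing plain C-callable symbols could, in principle, be loaded from Python the same way.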


I think it's hard to overstate how important this is.

Consider the number of bolt-on solutions there are to get decent performance in Python: numexpr, weave, psyco (dead), the emerging island of NumPyPy, Theano, the brilliant/crazy chimera language Cython, and now perhaps Numba.
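To give a flavor of what these bolt-ons look like in practice, here's a minimal numexpr sketch (assuming numexpr and NumPy are installed; the array names and sizes are just illustrative):

```python
import numpy as np
import numexpr as ne

a = np.random.rand(1000000)
b = np.random.rand(1000000)

# The expression is compiled and evaluated in one multithreaded pass,
# avoiding the intermediate temporaries that plain NumPy would allocate.
result = ne.evaluate("2*a + 3*b")
```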

The multi-modal nature of development affects libraries (too few devs needing too many hammers) and even the very sustainability of the ecosystem -- as evidenced by myriad numpy-discussion threads in the past year [0][0.1] about core implementation issues running to 100-150+ messages, yet only about three people [1] have the requisite interest, C chops, understanding of Python and the CPython API, and time to do the work in the NumPy core. As a result, the core has been mostly stuck in a local maximum for years now.

Contrast Julia: arrays are written in the language itself [2]. Folks practically off the street, with single-digit weeks of Julia experience, are implementing non-trivial array datatypes [3][4]. Julia appears to have the flexibility to allow the kind of fine-tuned optimizations seen in the guts of a BLAS library [5], while simultaneously facilitating carefree, throw-away data exploration scripts by MATLAB refugees.

If anything, I fear that Julia may be susceptible to the Lisp curse if extreme care is not taken to ensure the interoperability needed to build an ecosystem. (The Python straitjacket is a blessing in this respect.) On the other hand, maybe this time GitHub will help balance the equation...

[0] http://thread.gmane.org/gmane.comp.python.numeric.general/44...

[0.1] http://thread.gmane.org/gmane.comp.python.numeric.general/48...

[1] http://thread.gmane.org/gmane.comp.python.numeric.general/49...

[2] https://github.com/JuliaLang/julia/blob/master/base/array.jl

[3] https://groups.google.com/d/topic/julia-dev/YYRa6Iveevg/disc...

[4] https://groups.google.com/d/topic/julia-dev/x3xFSa8iCog/disc...

[5] https://groups.google.com/d/msg/julia-dev/vETgqnpesDk/OuZaB7...


Being able to call libraries in language A from language B is only half of the solution. Automatic compatibility layers won't be able to bridge semantic differences between the languages. What you end up with is writing language A in language B: for example, "writing C in Python" or "writing Java in Python", instead of writing Pythonic code. I think most people will agree that Java is a better Java than Python is.


As a grad student I was definitely a polyglot programmer, but I've since settled into an almost 100% Python workflow, except when I use highly optimized simulation packages that run on massively parallel machines. The reason Python stuck for me was a combination of the simplicity of the language, its ability to just let me get work done, and the robust scientific library support, not to mention that I rarely need to step out of Python to do things.

Need to set up a quick and dirty batch-processing queue to churn through a series of simulations on my workstation? `import multiprocessing` and set up a worker pool. Need to automate the creation of several hundred simulation config scripts? Simple with standard-library string templating. Need to work with HDF5 files? `import h5py`, etc., etc. On top of that, it's been dead simple to wrap existing C code with Cython and write small compiled modules for CPU-intensive tasks when needed. And IPython for interactive data exploration and development.
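For the curious, here's a minimal sketch of that kind of workflow using only the standard library (the config fields and worker body are made up for illustration):

```python
import multiprocessing
from string import Template

# Template for a simulation config file; $run_id and $resolution are filled in per run.
CONFIG = Template("run_id = $run_id\nresolution = $resolution\n")

def run_one(run_id):
    """Write the config for one run; a real worker would launch the simulation here."""
    with open("run_%03d.cfg" % run_id, "w") as f:
        f.write(CONFIG.substitute(run_id=run_id, resolution=256))
    return run_id

if __name__ == "__main__":
    # Quick and dirty batch queue: a pool of workers churns through the runs.
    pool = multiprocessing.Pool()
    print(pool.map(run_one, range(8)))
```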

So from that perspective, I'd be much more interested in calling Julia from within Python, or writing some small function that needs optimization in Julia and then calling it from Python, than the other way around.


I've been playing with Julia a bit lately, but I haven't had any good reasons to write real code in it yet. I've got to say, a useful Python compatibility layer between the two languages would be awesome. Being able to call SciPy libraries from Julia would make it a lot more interesting to start writing Julia code.



Heh. That was the easy part. Being able to get a PyObject back from the embedded Python interpreter and manipulate it in useful ways is the real trick. But at least it's a start. And it was pleasantly easy, due to two things: how easy it is to call C functions from Julia, and the fact that Python has made it trivial to be called from C code.

(Imagine using a C++ library from any other language like this. Wait, you can't.)


I don't mind writing Cython to judiciously optimize my scientific Python code when needed, but efforts along these lines, as well as Travis Oliphant's Numba project (https://github.com/ContinuumIO/numba), seem like promising alternatives. It's great to see some productive conversations between the Julia and SciPy dev folks.
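For anyone who hasn't tried it, the Numba approach boils down to decorating a plain Python function and letting it be JIT-compiled. A rough sketch (assuming numba and NumPy are installed; the function and array names are just illustrative):

```python
import numpy as np
from numba import jit

@jit(nopython=True)
def pairwise_sum(x):
    # A CPU-intensive double loop of the kind one would otherwise push into Cython or C.
    n = x.shape[0]
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            total += abs(x[i] - x[j])
    return total

print(pairwise_sum(np.random.rand(1000)))
```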


There is also a project to port parts of NumPy/SciPy to PyPy, which would probably give interesting results.

The coming years are going to be exciting for open-source numerical work. But I fear fragmentation might ruin it for everybody.


Can someone on the team or an early adopter explain to me: why use the "end" statement?


The syntax is inspired by/based on that of MATLAB, likely so as to be as familiar as possible.


I don't think one should do something just because it is familiar syntactically. Having left MATLAB for python, there were enough compelling reasons to ditch the familiar.


Agreed. The extra "end"s make the code look hideous.



