Nice read, even though I haven't written any Python in a long time. Memory management in modern OSes is very complicated and there isn't a simple answer to "how much memory is my process using, exactly". [1] Memory growth / usage isn't usually something to worry about unless the growth is constant under load and/or the OOM killer kicks in.
Also consider that `heapy` will probably only report on objects created by your Python code, and not any memory taken up by native code in the interpreter itself or any shared libraries.
Once I had to fix a severe memory leak in a piece of Python code. None of the available tools revealed anything useful. I ended up just adding debug statements to the hot loop to see which step caused the memory usage to jump. I should have known the answer: a bad C extension was leaking memory. Lesson learned.
libzmq apparently has a bug where it creates structures to handle connections every time you connect, and only cleans them up after processing data. This means that if you connect and then disconnect without sending any data, you leak memory.
It's trivial to run a "for i in …" and cause a zmq app to leak gigabytes of memory in a matter of seconds. Highly problematic.
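Something like this minimal pyzmq sketch is what I mean (the endpoint and socket type are just made up for illustration, and disconnect() needs libzmq >= 3.2):

    import zmq

    ctx = zmq.Context()
    sock = ctx.socket(zmq.REQ)
    endpoint = "tcp://127.0.0.1:5555"  # placeholder endpoint

    # Each connect makes libzmq allocate per-connection state; on affected
    # versions it is only freed after data is processed, so connecting and
    # disconnecting without ever sending leaks a little memory per iteration.
    for i in range(1000000):
        sock.connect(endpoint)
        sock.disconnect(endpoint)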
I encountered the "memory hogging" behavior of Python processes once, where I was sure that my GC worked correctly and that I had released all unused objects, but the memory of the process would still keep growing. I remember having this problem with a C++ program once as well. It doesn't seem to be a real problem, though, since the OS should normally reclaim the memory if it's needed by another process. Still, this kind of behavior can definitely drive you nuts.
BTW Celery is usually not a good fit for long-running jobs: if you have many of them running in parallel within a production system, it gets really difficult to restart the Celery daemon, as it has to wait for all those jobs to finish (during which no new tasks can be processed). Why restart at all, you ask? Well, restarting is necessary to reload the code, as it is not recommended to use the autoreloader in a production system. This problem persists even when using the multiprocessing module, btw (as the author suggested), since on Linux Python uses fork() to create a new process, thereby just copying the whole memory of the parent process.
We solved the Celery + long running + code reloading problem by having each push of new code be associated with a new Celery queue. On push of new code, start the new queue, and SIGTERM the old one, which will wait for any long-running tasks to finish before exiting.
Hm that's an interesting idea, thanks for sharing this! How exactly do you perform the restart of the Celery service and where are your queues configured? I guess you don't specify them in /etc/defaults/celeryd?
Correct, we didn't use /etc/defaults/celeryd, or use a standard init/systemd/upstart/etc-based way of starting celeryd. Instead we used daemontools to add and remove celeryd processes with their own configuration. It took a bit of work, but ultimately let us do code updates with zero downtime.
Except under Linux fork() has COW semantics, so only the pages that actually get written to are copied, though I suspect that could be a large portion of a Python process anyway (reference counting alone touches most pages holding objects).
Yes, the operation is quite efficient; the problem is that fork() normally makes reloading the code (without resorting to special tricks) impossible, as the interpreter will usually have loaded all imports already before forking. The only way to circumvent this would be to delay the loading of the required modules until after the fork, roughly as sketched below.
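A minimal sketch of that "import after fork" idea, using plain multiprocessing (the "tasks" module and tasks.run() are placeholders, not anything Celery provides):

    import importlib
    from multiprocessing import Process

    def run_task(task_args):
        # Import the task code only after the fork: the child then picks up
        # whatever is on disk at that moment instead of inheriting the
        # parent's already-loaded modules.
        tasks = importlib.import_module("tasks")
        tasks.run(task_args)

    if __name__ == "__main__":
        # The parent deliberately never imports "tasks" at module level.
        p = Process(target=run_task, args=({"id": 1},))
        p.start()
        p.join()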
For launching a bunch of IO-bound "tasks", for example calling external services from Django views, I'd consider using Twisted (or Tornado, or asyncio). Your tasks would need to be either written in async style, or you'd need to spawn new processes from within Twisted (but built-in functionality makes this rather easy). Still, Twisted is rock-solid, doesn't leak and is capable of handling a lot of (IO-bound) tasks concurrently.
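For illustration only, a toy asyncio version of this idea (the coroutine names are made up and the sleep just stands in for a real service call; the async/await syntax needs Python 3.5+):

    import asyncio

    async def call_external_service(name):
        # Stand-in for an HTTP request or other IO-bound work.
        await asyncio.sleep(1)
        return "%s done" % name

    async def main():
        # One process runs many IO-bound "tasks" concurrently.
        results = await asyncio.gather(
            *(call_external_service("task-%d" % i) for i in range(100)))
        print(len(results), "tasks finished")

    asyncio.run(main())  # Python 3.7+; older versions use loop.run_until_complete()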
If your tasks are CPU bound you pretty much have no choice other than something based on multiprocessing. You can still use Twisted, but only in the second way. If the code of your tasks doesn't use C extensions you could use Jython with threads. This way you'd get parallelism without having to rewrite much code.
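The CPU-bound case, in its most basic multiprocessing form, looks something like this (cpu_heavy is a placeholder task):

    from multiprocessing import Pool

    def cpu_heavy(n):
        # Placeholder for an actual CPU-bound task.
        return sum(i * i for i in range(n))

    if __name__ == "__main__":
        # One worker process per CPU core by default; each item of the map
        # runs in its own process, sidestepping the GIL.
        with Pool() as pool:
            results = pool.map(cpu_heavy, [10 ** 6] * 8)
        print(results[:2])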
If you need your tasks parallelized and you want to run a lot of them concurrently then I'm afraid you're out of options in Pythonland. Personally I'd go for Erlang with ErlPort, but I know Erlang rather well.
On the other hand, Celery is a nice piece of code. I think in most cases you don't need anything else, or at least nothing drastically different, like the options above. Perhaps rq would be a good idea. I also encountered an interesting project called Pulsar (http://pythonhosted.org/pulsar/overview.html), but it seems to be usable only on 3.3 and above.
If you find one, let me know ;) I'm looking for something myself currently.
There's RQ (http://python-rq.org/) but it seems to have a similar design to Celery (just a simpler architecture), so it probably suffers from the same problem.
A good solution would be to have a series of workers that can launch new independent Python processes for each task, e.g. using the subprocess module.
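Roughly along these lines (run_task.py is a hypothetical task script, not an existing tool):

    import subprocess
    import sys

    def run_task(task_id):
        # Launch a fresh interpreter per task; when it exits, all of its
        # memory (including anything leaked by C extensions) goes back to
        # the OS.
        subprocess.check_call([sys.executable, "run_task.py", str(task_id)])

    for task_id in (1, 2, 3):
        run_task(task_id)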
I have long-running jobs (say, 5 minutes on average, up to an hour). I originally used Celery (after picloud shut down) but it just doesn't work well with those characteristics. Each worker reserved an extra job, so it was impossible to get good CPU utilisation.
I switched to rq and it's all been much easier. The behaviour is easy to understand and it's easy to inspect redis to see what's going on.
In terms of the code restart angle - I'm fairly sure you can effectively restart the workers. They run as a single process that forks to do work. Each copy you run only has a single worker, so you need to run multiple instances yourself. If you kill the parent it waits until the child has finished the job it is on before terminating.
I could be wrong about some of the details. I'd recommend giving it a shot. I must have run 100,000s of jobs through it now and I haven't had a single issue.
There are in fact bad outcomes of continued memory growth: first there is a catastrophic performance degradation when processes start getting swapped to disk, and then, when swap space runs out, Linux's OOM killer starts killing processes in an attempt to free up memory. Kind of surprised you didn't know this.
If you think you actually have a memory leak in Python, the first thing to do is to recompile the interpreter without its internal memory manager (pymalloc). This avoids the problem the author is addressing.
1. For example see:
- http://stackoverflow.com/questions/860878/tracking-actively-...
- https://mail.gnome.org/archives/gnome-list/1999-September/ms...
- http://bmaurer.blogspot.co.uk/2006/03/memory-usage-with-smap...