Nice read, even though I haven't written any Python in a long time. Memory management in modern OSes is very complicated and there isn't a simple answer to "how much memory is my process using, exactly". [1] Memory growth / usage isn't usually something to worry about unless the growth is constant under load and/or the OOM killer kicks in.
Also consider that `heapy` will probably only report on objects created by your Python code, and not any memory taken up by native code in the interpreter itself or any shared libraries.
Once I had to fix a severe memory leak in a piece of Python code. None of the available tools revealed anything useful. I ended up just adding debug statements to the hot loop to see which step caused the memory usage to jump. I should have known the answer: a bad C extension was leaking memory. Lesson learned.
libzmq apparently has a bug where it creates structures to handle connections every time you connect, and only cleans them up after processing data. This means that if you connect and then disconnect without sending any data, you leak memory.
It's trivial to run a "for i in …" and cause a zmq app to leak gigabytes of memory in a matter of seconds. Highly problematic.
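Something like this minimal pyzmq sketch is what I mean (the endpoint and socket type are just made up for illustration, and disconnect() needs libzmq >= 3.2):

    import zmq

    ctx = zmq.Context()
    sock = ctx.socket(zmq.REQ)
    endpoint = "tcp://127.0.0.1:5555"  # placeholder endpoint

    # Each connect makes libzmq allocate per-connection state; on affected
    # versions it is only freed after data is processed, so connecting and
    # disconnecting without ever sending leaks a little memory per iteration.
    for i in range(1000000):
        sock.connect(endpoint)
        sock.disconnect(endpoint)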
I encountered the "memory hogging" behavior of Python processes once, where I was sure that my GC worked correctly and that I had released all unused objects, but the memory of the process would still keep growing. I remember having this problem with a C++ program once as well. It doesn't seem to be a real problem, though, since the OS should normally reclaim the memory if it's needed by another process. Still, this kind of behavior can definitely drive you nuts.
BTW Celery is usually not a good fit for long-running jobs: if you have many of them running in parallel within a production system, it gets really difficult to restart the Celery daemon, as it has to wait for all those jobs to finish (during which no new tasks can be processed). Why restart at all, you ask? Well, restarting is necessary to reload the code, as it is not recommended to use the autoreloader in a production system. This problem persists even when using the multiprocessing module, btw (as the author suggested), since on Linux Python uses fork() to create a new process, thereby just copying the whole memory of the parent process.
We solved the Celery + long running + code reloading problem by having each push of new code be associated with a new Celery queue. On push of new code, start the new queue, and SIGTERM the old one, which will wait for any long-running tasks to finish before exiting.
Hm that's an interesting idea, thanks for sharing this! How exactly do you perform the restart of the Celery service and where are your queues configured? I guess you don't specify them in /etc/defaults/celeryd?
Correct, we didn't use /etc/defaults/celeryd, or use a standard init/systemd/upstart/etc-based way of starting celeryd. Instead we used daemontools to add and remove celeryd processes with their own configuration. It took a bit of work, but ultimately let us do code updates with zero downtime.
Except under Linux fork() has COW semantics, so only the pages that actually get written to are copied, though I suspect that could be a large portion of a Python process anyway (reference counting alone touches most pages holding objects).
Yes, the operation is quite efficient; the problem is that fork() normally makes reloading the code (without resorting to special tricks) impossible, as the interpreter will usually have loaded all imports already before forking. The only way to circumvent this would be to delay the loading of the required modules until after the fork, roughly as sketched below.
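A minimal sketch of that "import after fork" idea, using plain multiprocessing (the "tasks" module and tasks.run() are placeholders, not anything Celery provides):

    import importlib
    from multiprocessing import Process

    def run_task(task_args):
        # Import the task code only after the fork: the child then picks up
        # whatever is on disk at that moment instead of inheriting the
        # parent's already-loaded modules.
        tasks = importlib.import_module("tasks")
        tasks.run(task_args)

    if __name__ == "__main__":
        # The parent deliberately never imports "tasks" at module level.
        p = Process(target=run_task, args=({"id": 1},))
        p.start()
        p.join()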
For launching a bunch of IO-bound "tasks", for example calling external services from Django views, I'd consider using Twisted (or Tornado, or asyncio). Your tasks would need to be either written in async style, or you'd need to spawn new processes from within Twisted (but built-in functionality makes this rather easy). Still, Twisted is rock-solid, doesn't leak and is capable of handling a lot of (IO-bound) tasks concurrently.
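For illustration only, a toy asyncio version of this idea (the coroutine names are made up and the sleep just stands in for a real service call; the async/await syntax needs Python 3.5+):

    import asyncio

    async def call_external_service(name):
        # Stand-in for an HTTP request or other IO-bound work.
        await asyncio.sleep(1)
        return "%s done" % name

    async def main():
        # One process runs many IO-bound "tasks" concurrently.
        results = await asyncio.gather(
            *(call_external_service("task-%d" % i) for i in range(100)))
        print(len(results), "tasks finished")

    asyncio.run(main())  # Python 3.7+; older versions use loop.run_until_complete()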
If your tasks are CPU bound you pretty much have no choice other than something based on multiprocessing. You can still use Twisted, but only in the second way. If the code of your tasks doesn't use C extensions you could use Jython with threads. This way you'd get parallelism without having to rewrite much code.
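The CPU-bound case, in its most basic multiprocessing form, looks something like this (cpu_heavy is a placeholder task):

    from multiprocessing import Pool

    def cpu_heavy(n):
        # Placeholder for an actual CPU-bound task.
        return sum(i * i for i in range(n))

    if __name__ == "__main__":
        # One worker process per CPU core by default; each item of the map
        # runs in its own process, sidestepping the GIL.
        with Pool() as pool:
            results = pool.map(cpu_heavy, [10 ** 6] * 8)
        print(results[:2])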
If you need your tasks parallelized and you want to run a lot of them concurrently then I'm afraid you're out of options in Pythonland. Personally I'd go for Erlang with ErlPort, but I know Erlang rather well.
On the other hand, Celery is a nice piece of code. I think in most cases you don't need anything else, or at least nothing drastically different, like the options above. Perhaps rq would be a good idea. I also encountered an interesting project called Pulsar (http://pythonhosted.org/pulsar/overview.html), but it seems to be usable only on 3.3 and above.
If you find one, let me know ;) I'm looking for something myself currently.
There's RQ (http://python-rq.org/) but it seems to have a similar design to Celery (just a simpler architecture), so it probably suffers from the same problem.
A good solution would be to have a series of workers that can launch new independent Python processes for each task, e.g. using the subprocess module.
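Roughly along these lines (run_task.py is a hypothetical task script, not an existing tool):

    import subprocess
    import sys

    def run_task(task_id):
        # Launch a fresh interpreter per task; when it exits, all of its
        # memory (including anything leaked by C extensions) goes back to
        # the OS.
        subprocess.check_call([sys.executable, "run_task.py", str(task_id)])

    for task_id in (1, 2, 3):
        run_task(task_id)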
I have long-running jobs (say, 5 minutes on average, up to an hour). I originally used Celery (after picloud shut down) but it just doesn't work well with those characteristics. Each worker reserved an extra job, so it was impossible to get good CPU utilisation.
I switched to rq and it's all been much easier. The behaviour is easy to understand and it's easy to inspect redis to see what's going on.
In terms of the code restart angle - I'm fairly sure you can effectively restart the workers. They run as a single process that forks to do work. Each copy you run only has a single worker, so you need to run multiple instances yourself. If you kill the parent it waits until the child has finished the job it is on before terminating.
I could be wrong about some of the details. I'd recommend giving it a shot. I must have run 100,000s of jobs through it now and I haven't had a single issue.
There are in fact bad outcomes of continued memory growth: first there is a catastrophic performance degradation when processes start getting swapped to disk, and then, when swap space runs out, Linux's OOM killer starts killing processes in an attempt to free up memory. Kind of surprised you didn't know this.
If you think you actually have a memory leak in Python, the first thing to do is to recompile the interpreter without its internal memory manager (pymalloc). This avoids the problem the author is addressing.
1. For example see:
- http://stackoverflow.com/questions/860878/tracking-actively-...
- https://mail.gnome.org/archives/gnome-list/1999-September/ms...
- http://bmaurer.blogspot.co.uk/2006/03/memory-usage-with-smap...