> when consumers are lagging behind, producer throughput falls off a cliff because lagging consumers introduce random reads

I am confused by this. The format of Kafka's log files is designed to allow reading and sending to clients directly using sendfile, in sequential reads of batches of messages. http://kafka.apache.org/documentation/#maximizingefficiency
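
For reference, that sendfile path is what FileChannel.transferTo does in the JVM. A rough sketch of the broker-side hand-off (segment path and destination host are made up):

    import java.io.IOException;
    import java.net.InetSocketAddress;
    import java.nio.channels.FileChannel;
    import java.nio.channels.SocketChannel;
    import java.nio.file.Path;
    import java.nio.file.StandardOpenOption;

    public class ZeroCopySend {
        public static void main(String[] args) throws IOException {
            // Hypothetical log segment and destination; Kafka does the equivalent internally.
            Path segment = Path.of("/var/kafka-logs/topic-0/00000000000000000000.log");
            try (FileChannel log = FileChannel.open(segment, StandardOpenOption.READ);
                 SocketChannel client = SocketChannel.open(new InetSocketAddress("consumer-host", 9092))) {
                long position = 0;
                long remaining = log.size();
                // transferTo maps to sendfile(2) on Linux: bytes move from the page cache
                // to the socket without being copied into user space. Sequential reads of
                // warm segments keep this path fast; a lagging consumer forces cold reads.
                while (remaining > 0) {
                    long sent = log.transferTo(position, remaining, client);
                    position += sent;
                    remaining -= sent;
                }
            }
        }
    }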



Kafka brokers handle both client connections and data storage. This creates contention: the leader for each partition has to serve the traffic and handle the IO. Consumers that aren't tailing the stream cause slowdowns because Kafka has to seek to their offsets in log segments that aren't cached in RAM.
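
Concretely, this is the shape of the fetch that hurts: a consumer (placeholder topic and group names here) seeking far back in the log and pulling cold segments off disk:

    import java.time.Duration;
    import java.util.List;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.TopicPartition;

    public class BackfillConsumer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "broker:9092");   // placeholder
            props.put("group.id", "lagging-group");          // placeholder group
            props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                TopicPartition tp = new TopicPartition("events", 0);  // placeholder topic
                consumer.assign(List.of(tp));
                // Jumping far behind the head of the log: the partition leader now has to
                // read old segments from disk (random IO) instead of serving from page cache.
                consumer.seek(tp, 0L);
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> r : records) {
                    System.out.println(r.offset() + ": " + r.value());
                }
            }
        }
    }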

Pulsar separates storage into a different layer (powered by Apache BookKeeper), which allows consumers to read directly from multiple nodes. There's much more IO throughput available for consumers picking up anywhere in the stream.
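
A rough sketch of what that looks like from the client side, with placeholder topic and subscription names; reads of the backlog are served out of the BookKeeper layer rather than by the same broker disks handling producer traffic:

    import org.apache.pulsar.client.api.Consumer;
    import org.apache.pulsar.client.api.Message;
    import org.apache.pulsar.client.api.PulsarClient;
    import org.apache.pulsar.client.api.SubscriptionInitialPosition;

    public class BacklogReader {
        public static void main(String[] args) throws Exception {
            // Hypothetical service URL, topic, and subscription names.
            try (PulsarClient client = PulsarClient.builder()
                    .serviceUrl("pulsar://localhost:6650")
                    .build()) {
                Consumer<byte[]> consumer = client.newConsumer()
                        .topic("persistent://public/default/events")
                        .subscriptionName("backfill")
                        // Start from the oldest message; these reads hit the BookKeeper
                        // storage nodes, not the broker's local disk.
                        .subscriptionInitialPosition(SubscriptionInitialPosition.Earliest)
                        .subscribe();
                Message<byte[]> msg = consumer.receive();
                consumer.acknowledge(msg);
            }
        }
    }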


Kafka works best when the data it is returning to consumers is in the page cache.

When consumers fall behind, they start to request data that may no longer be in the page cache, so fetches have to hit disk and everything slows down.
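
One way to see this coming before throughput drops is to watch consumer lag. A rough sketch (group and topic names are placeholders) comparing committed offsets against the log end offsets:

    import java.util.Map;
    import java.util.Properties;
    import java.util.Set;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.clients.consumer.OffsetAndMetadata;
    import org.apache.kafka.common.TopicPartition;

    public class LagCheck {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "broker:9092");          // placeholder
            props.put("group.id", "payments-consumers");            // placeholder group
            props.put("key.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer");
            props.put("value.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer");

            try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props)) {
                TopicPartition tp = new TopicPartition("payments", 0);  // placeholder topic
                Map<TopicPartition, OffsetAndMetadata> committed = consumer.committed(Set.of(tp));
                Map<TopicPartition, Long> end = consumer.endOffsets(Set.of(tp));
                long committedOffset = committed.get(tp) == null ? 0 : committed.get(tp).offset();
                // Lag = log end offset - committed offset. Large lag means fetches are
                // reaching back into segments that have likely fallen out of the page cache.
                System.out.println("lag: " + (end.get(tp) - committedOffset));
            }
        }
    }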



