> when consumers are lagging behind, producer throughput falls off a cliff because lagging consumers introduce random reads

I am confused by this. The format of Kafka's log files is designed to allow reading and sending to clients directly using sendfile, in sequential reads of batches of messages. http://kafka.apache.org/documentation/#maximizingefficiency
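
For reference, that sendfile path is what FileChannel.transferTo does in the JVM. A rough sketch of the broker-side hand-off (segment path and destination host are made up):

    import java.io.IOException;
    import java.net.InetSocketAddress;
    import java.nio.channels.FileChannel;
    import java.nio.channels.SocketChannel;
    import java.nio.file.Path;
    import java.nio.file.StandardOpenOption;

    public class ZeroCopySend {
        public static void main(String[] args) throws IOException {
            // Hypothetical log segment and destination; Kafka does the equivalent internally.
            Path segment = Path.of("/var/kafka-logs/topic-0/00000000000000000000.log");
            try (FileChannel log = FileChannel.open(segment, StandardOpenOption.READ);
                 SocketChannel client = SocketChannel.open(new InetSocketAddress("consumer-host", 9092))) {
                long position = 0;
                long remaining = log.size();
                // transferTo maps to sendfile(2) on Linux: bytes move from the page cache
                // to the socket without being copied into user space. Sequential reads of
                // warm segments keep this path fast; a lagging consumer forces cold reads.
                while (remaining > 0) {
                    long sent = log.transferTo(position, remaining, client);
                    position += sent;
                    remaining -= sent;
                }
            }
        }
    }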



Kafka brokers handle both client connections and data storage. This creates contention: the leader for each partition has to serve the traffic and handle the IO. Consumers that aren't tailing the stream cause slowdowns because Kafka has to seek to their offsets in log segments that aren't cached in RAM.
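
Concretely, this is the shape of the fetch that hurts: a consumer (placeholder topic and group names here) seeking far back in the log and pulling cold segments off disk:

    import java.time.Duration;
    import java.util.List;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.TopicPartition;

    public class BackfillConsumer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "broker:9092");   // placeholder
            props.put("group.id", "lagging-group");          // placeholder group
            props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                TopicPartition tp = new TopicPartition("events", 0);  // placeholder topic
                consumer.assign(List.of(tp));
                // Jumping far behind the head of the log: the partition leader now has to
                // read old segments from disk (random IO) instead of serving from page cache.
                consumer.seek(tp, 0L);
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> r : records) {
                    System.out.println(r.offset() + ": " + r.value());
                }
            }
        }
    }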

Pulsar separates storage into a different layer (powered by Apache BookKeeper), which allows consumers to read directly from multiple nodes. There's much more IO throughput available for consumers picking up anywhere in the stream.
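
A rough sketch of what that looks like from the client side, with placeholder topic and subscription names; reads of the backlog are served out of the BookKeeper layer rather than by the same broker disks handling producer traffic:

    import org.apache.pulsar.client.api.Consumer;
    import org.apache.pulsar.client.api.Message;
    import org.apache.pulsar.client.api.PulsarClient;
    import org.apache.pulsar.client.api.SubscriptionInitialPosition;

    public class BacklogReader {
        public static void main(String[] args) throws Exception {
            // Hypothetical service URL, topic, and subscription names.
            try (PulsarClient client = PulsarClient.builder()
                    .serviceUrl("pulsar://localhost:6650")
                    .build()) {
                Consumer<byte[]> consumer = client.newConsumer()
                        .topic("persistent://public/default/events")
                        .subscriptionName("backfill")
                        // Start from the oldest message; these reads hit the BookKeeper
                        // storage nodes, not the broker's local disk.
                        .subscriptionInitialPosition(SubscriptionInitialPosition.Earliest)
                        .subscribe();
                Message<byte[]> msg = consumer.receive();
                consumer.acknowledge(msg);
            }
        }
    }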


Kafka works best when the data it is returning to consumers is in the page cache.

When consumers fall behind, they start to request data that may no longer be in the page cache, so fetches have to hit disk and everything slows down.
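
One way to see this coming before throughput drops is to watch consumer lag. A rough sketch (group and topic names are placeholders) comparing committed offsets against the log end offsets:

    import java.util.Map;
    import java.util.Properties;
    import java.util.Set;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.clients.consumer.OffsetAndMetadata;
    import org.apache.kafka.common.TopicPartition;

    public class LagCheck {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "broker:9092");          // placeholder
            props.put("group.id", "payments-consumers");            // placeholder group
            props.put("key.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer");
            props.put("value.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer");

            try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props)) {
                TopicPartition tp = new TopicPartition("payments", 0);  // placeholder topic
                Map<TopicPartition, OffsetAndMetadata> committed = consumer.committed(Set.of(tp));
                Map<TopicPartition, Long> end = consumer.endOffsets(Set.of(tp));
                long committedOffset = committed.get(tp) == null ? 0 : committed.get(tp).offset();
                // Lag = log end offset - committed offset. Large lag means fetches are
                // reaching back into segments that have likely fallen out of the page cache.
                System.out.println("lag: " + (end.get(tp) - committedOffset));
            }
        }
    }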



