Enterprise Integration Zone is brought to you in partnership with:

Mitch Pronschinske is a Senior Content Analyst at DZone. That means he writes and searches for the finest developer content in the land so that you don't have to. He often eats peanut butter and bananas, likes to make his own ringtones, enjoys card and board games, and is married to an underwear model. Mitch is a DZone Zone Leader and has posted 2573 posts at DZone. You can read more from them at their website. View Full User Profile

Kafka 0.8 is Bringing Replication. Is it Ready?

  • submit to reddit

Kyle Kingsbury had a blog post last month that showed some interesting insight into the Apache Kafka, a distributed messaging system.  The developers of Kafka are planning to introduce replication to the 0.8 release, which should improve Kafka's durability and availability by duplicating each shard’s data across multiple nodes.

The post does an excellent job of explaining Kafka's in plain English.  Anyone who's still not clear about what Kafka is or how it works should spend a minute reading this post.  Kingsbury closes with some potential issues he sees in the current implementation of the replication feature and some recommendations to solve these issues:

Kafka’s replication claimed to be CA, but in the presence of a partition, threw away an arbitrarily large volume of committed writes. It claimed tolerance to F-1 failures, but a single node could cause catastrophe.

I made two recommendations to the Kafka team:

  1. Ensure that the ISR never goes below N/2 nodes. This reduces the probability of a single node failure causing the loss of commited writes.
  2. In the event that the ISR becomes empty, block and sound an alarm instead of silently dropping data. It’s OK to make this configurable, but as an administrator, you probably want to be aware when a datastore is about to violate one of its constraints–and make the decision yourself. It might be better to wait until an old leader can be recovered. Or perhaps the administrator would like a dump of the to-be-dropped writes which could be merged back into the new state of the cluster.

Kyle Kingsbury