Old implementation
...
The following section is taken from a readout on Kafka by volodymyrx.mytnyk@intel.com.
Kafka Overview
"Apache Kafka is a distributed commit log service that functions much like a publish/subscribe messaging system, but with better throughput, built-in partitioning, replication, and fault tolerance. Increasingly popular for log collection and stream processing" 0
Kafka core concepts:
"Kafka is run as a cluster on one or more servers.
The Kafka cluster stores streams of records in categories called topics.
Each record consists of a key, a value, and a timestamp" 2.
- "Producers – consume the data feed and send it to Kafka for distribution to consumers" 1.
- "Consumers – applications that subscribe to topics; for example, a custom application or any of the products listed at the bottom of this post" 1.
- "Brokers – workers that take data from the producers and send it to the consumers. They handle replication as well" 1.
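The roles quoted above can be illustrated with a toy in-memory model. This is a minimal sketch and not Kafka's client API: `Record`, `ToyBroker`, `ToyProducer`, and `ToyConsumer` are hypothetical names introduced here, the per-topic list and the offset bookkeeping are simplifying assumptions, and replication is left out entirely.

```python
import time
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Record:
    """A record as described above: a key, a value, and a timestamp."""
    key: Optional[bytes]
    value: bytes
    timestamp: float = field(default_factory=time.time)


class ToyBroker:
    """Hypothetical stand-in for a broker: one append-only log per topic."""

    def __init__(self) -> None:
        self._logs: Dict[str, List[Record]] = defaultdict(list)

    def append(self, topic: str, record: Record) -> int:
        """Store a record and return its offset (its position in the log)."""
        log = self._logs[topic]
        log.append(record)
        return len(log) - 1

    def fetch(self, topic: str, offset: int) -> List[Record]:
        """Return every record in the topic at or after the given offset."""
        return self._logs[topic][offset:]


class ToyProducer:
    """Sends records to a topic on the broker."""

    def __init__(self, broker: ToyBroker) -> None:
        self._broker = broker

    def send(self, topic: str, key: Optional[bytes], value: bytes) -> int:
        return self._broker.append(topic, Record(key=key, value=value))


class ToyConsumer:
    """Subscribes to one topic and remembers how far it has read."""

    def __init__(self, broker: ToyBroker, topic: str) -> None:
        self._broker = broker
        self._topic = topic
        self._offset = 0

    def poll(self) -> List[Record]:
        """Return the records that arrived since the previous poll."""
        records = self._broker.fetch(self._topic, self._offset)
        self._offset += len(records)
        return records


if __name__ == "__main__":
    broker = ToyBroker()
    producer = ToyProducer(broker)
    consumer = ToyConsumer(broker, "apachelogs")
    producer.send("apachelogs", b"host-1", b"GET /index.html")
    producer.send("apachelogs", b"host-2", b"GET /about.html")
    print([r.value for r in consumer.poll()])
    print(consumer.poll())  # nothing new has arrived since the last poll
```

The consumer tracks its own read position, so the broker in this sketch never deletes a record when it is read; a second consumer on the same topic would start from offset 0 and see the same records.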
...
Topics and partitions
"Partitions – the physical divisions of a topic, as shown in the graphic below. They are used for redundancy as partitions are spread over different storage servers" 1.
"Topics – categories for messages. They could be something like “apachelogs” or “clickstream”" 1.
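How a record ends up in a particular partition is not covered by the quotes above. One common approach, assumed here, is to hash the record key and take the result modulo the number of partitions, so that records with the same key always land in the same partition. The sketch below is illustrative only: `choose_partition` and `PartitionedTopic` are hypothetical names, and CRC32 is a stand-in hash chosen for this example rather than a claim about what any particular Kafka client uses.

```python
import zlib
from typing import List, Tuple


def choose_partition(key: bytes, num_partitions: int) -> int:
    """Map a key to a partition index (assumed scheme: hash modulo count)."""
    if num_partitions < 1:
        raise ValueError("a topic needs at least one partition")
    return zlib.crc32(key) % num_partitions


class PartitionedTopic:
    """Hypothetical topic split into several independent append-only logs."""

    def __init__(self, name: str, num_partitions: int) -> None:
        if num_partitions < 1:
            raise ValueError("a topic needs at least one partition")
        self.name = name
        self.partitions: List[List[Tuple[bytes, bytes]]] = [
            [] for _ in range(num_partitions)
        ]

    def append(self, key: bytes, value: bytes) -> Tuple[int, int]:
        """Store (key, value) and return (partition index, offset in it)."""
        index = choose_partition(key, len(self.partitions))
        self.partitions[index].append((key, value))
        return index, len(self.partitions[index]) - 1


if __name__ == "__main__":
    topic = PartitionedTopic("clickstream", num_partitions=3)
    for user, page in [(b"user-1", b"/home"), (b"user-2", b"/cart"),
                       (b"user-1", b"/checkout")]:
        partition, offset = topic.append(user, page)
        print(user, "-> partition", partition, "offset", offset)
```

In a real cluster each partition could live on a different server, which is what gives the spread described in the quote; in this sketch they are simply separate lists in one process.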
...
Kafka - collectd
Producer throughput:
Test: write 50 million small (100-byte) records as quickly as possible.
Test Case | Measurement |
1 producer thread, no replication | 821,557 records/sec (78.3 MB/sec) |
1 producer thread, 3x asynchronous replication | 786,980 records/sec (75.1 MB/sec) |
1 producer thread, 3x synchronous replication | 421,823 records/sec (40.2 MB/sec) |
3 producers, 3x asynchronous replication | 2,024,032 records/sec (193.0 MB/sec) |
Consumer throughput
Consume 50 million messages.
Test Case | Measurement |
Single Consumer | 940,521 records/sec (89.7 MB/sec) |
3x Consumers | 2,615,968 records/sec (249.5 MB/sec) |
End-to-end Latency | ~2 ms (median) |
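The MB/sec figures in both tables follow from the record rate and the 100-byte record size. The sketch below redoes that arithmetic under two assumptions that are not stated in the tables: that 1 MB means 1024 × 1024 bytes (this reproduces the published figures to one decimal place), and that the consumer test used the same 100-byte messages. The function and constant names are introduced here for illustration, and the run-time estimate assumes the measured rate is sustained for the whole run.

```python
RECORD_SIZE_BYTES = 100          # record size stated for the producer test
TOTAL_RECORDS = 50_000_000       # records written or consumed per test
BYTES_PER_MB = 1024 * 1024       # assumption: MB is treated as 2**20 bytes

# Record rates copied from the tables above.
MEASURED_RECORDS_PER_SEC = {
    "1 producer thread, no replication": 821_557,
    "1 producer thread, 3x asynchronous replication": 786_980,
    "1 producer thread, 3x synchronous replication": 421_823,
    "3 producers, 3x asynchronous replication": 2_024_032,
    "Single consumer": 940_521,
    "3x consumers": 2_615_968,
}


def mb_per_sec(records_per_sec: int,
               record_size: int = RECORD_SIZE_BYTES) -> float:
    """Convert a record rate into a data rate in MB/sec."""
    return records_per_sec * record_size / BYTES_PER_MB


def seconds_for_run(records_per_sec: int,
                    total_records: int = TOTAL_RECORDS) -> float:
    """Estimate how long the full run takes at a sustained rate."""
    return total_records / records_per_sec


if __name__ == "__main__":
    for test_case, rate in MEASURED_RECORDS_PER_SEC.items():
        print(f"{test_case}: {mb_per_sec(rate):.1f} MB/sec, "
              f"about {seconds_for_run(rate):.0f} s for the full run")
```

The same numbers also show how the results scale: three producers reach roughly 2.6 times the rate of one producer with asynchronous replication (2,024,032 / 786,980), and three consumers reach roughly 2.8 times the single-consumer rate (2,615,968 / 940,521).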
...
1 https://anturis.com/blog/apache-kafka-an-essential-overview/
...
2 Documentation (http://kafka.apache.org/documentation.html)
- Benchmarking Apache Kafka (https://engineering.linkedin.com/kafka/benchmarking-apache-kafka-2-million-writes-second-three-cheap-machines)
- Robust high performance C/C++ library with full protocol support (https://github.com/edenhill/librdkafka)
...