Core Concepts
ShuffleBench is a new benchmark focused on large-scale data shuffling operations in stream processing frameworks, addressing a critical gap in existing benchmarks.
Abstract
ShuffleBench is a novel benchmark for evaluating the performance of modern stream processing frameworks, with an emphasis on data shuffling operations. It measures throughput, latency, and scalability across four frameworks: Flink, Hazelcast, Kafka Streams, and Spark, and examines how factors such as record size, consumer count, and selectivity affect framework performance. Through detailed experiments and analysis, ShuffleBench offers a standardized approach for comparing stream processing implementations.
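The shuffling the abstract refers to can be pictured as hash-based repartitioning: each record is routed to a consumer determined by a hash of its key, and a selectivity parameter controls what fraction of records is forwarded at all. The sketch below is illustrative only; the function and parameter names are not part of ShuffleBench's actual API.

```python
import hashlib

def shuffle(records, num_consumers, selected_keys=None):
    """Toy hash-partitioning of (key, value) records across consumers.

    selected_keys: optional set of keys to forward; records with other
    keys are dropped, loosely modeling a selectivity < 1. (Illustrative
    parameter, not from ShuffleBench.)
    """
    partitions = [[] for _ in range(num_consumers)]
    for key, value in records:
        if selected_keys is not None and key not in selected_keys:
            continue  # record filtered out before the shuffle
        # Stable hash so the same key always reaches the same consumer
        digest = hashlib.sha256(key.encode()).digest()
        idx = int.from_bytes(digest[:4], "big") % num_consumers
        partitions[idx].append((key, value))
    return partitions

records = [("a", 1), ("b", 2), ("a", 3), ("c", 4)]
parts = shuffle(records, num_consumers=2)
# All records sharing a key land in the same partition
```

In a real framework this routing happens over the network (e.g. Flink's keyBy or a Kafka Streams repartition topic), which is why record size and consumer count dominate the cost.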
Stats
Flink achieves the highest throughput.
Hazelcast processes data streams with the lowest latency.
Spark's throughput can be increased at the cost of higher latency.
Quotes
"We propose ShuffleBench as a new stream processing benchmark focusing on large-scale data shuffling."
"Throughput results show Flink leading followed by Kafka Streams."
"Hazelcast processes data with very low latency compared to other frameworks."