Core Concepts
ShuffleBench introduces a new benchmark for evaluating stream processing frameworks' performance in large-scale data shuffling operations.
Abstract
Distributed stream processing frameworks help build scalable and reliable applications for continuous data streams.
ShuffleBench focuses on use cases in which the framework shuffles (re-distributes) data records so that aggregations can run on local state.
The benchmark provides metrics for latency, throughput, and scalability.
Flink achieves the highest throughput, while Hazelcast has the lowest latency.
The paper outlines the benchmark's design, task sample, and evaluation methods.
Experimental evaluations compare Flink, Hazelcast, Kafka Streams, and Spark.
Results show that Flink delivers the highest throughput, Hazelcast the lowest latency, and Spark a trade-off between throughput and latency.
Future work aims to extend the benchmark to additional qualities such as reliability.
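The shuffle-then-aggregate pattern the benchmark targets can be illustrated with a minimal, single-process sketch. Everything here is an assumption made for illustration and is not taken from ShuffleBench itself: the function names (`partition_for`, `shuffle`, `aggregate_locally`), the CRC32-based partitioning, and the use of a simple sum as the aggregation. Real frameworks perform the re-distribution over the network between worker instances.

```python
import zlib
from collections import defaultdict


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition using a stable hash (CRC32 is an arbitrary choice)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def shuffle(records, num_partitions):
    """Re-distribute (key, value) records so equal keys land in the same partition."""
    partitions = [[] for _ in range(num_partitions)]
    for key, value in records:
        partitions[partition_for(key, num_partitions)].append((key, value))
    return partitions


def aggregate_locally(partition):
    """Aggregate one partition using only state held by that partition."""
    state = defaultdict(int)  # hypothetical per-partition state: running sum per key
    for key, value in partition:
        state[key] += value
    return dict(state)


# Hypothetical input records; keys and values are made up for the example.
records = [("a", 1), ("b", 2), ("a", 3), ("c", 4), ("b", 5)]
partitions = shuffle(records, num_partitions=3)
results = [aggregate_locally(p) for p in partitions]
```

Because every record with a given key is routed to the same partition, each partition can compute its aggregates without consulting any other partition's state, which is what makes the aggregation state-local.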
Stats
The paper reports that Flink achieves the highest throughput and Hazelcast the lowest latency.
ShuffleBench provides metrics for latency, throughput, and scalability.
Quotes
"Flink achieves the highest throughput, while Hazelcast processes data streams with the lowest latency."