Core Concepts
Streaming workflows can significantly improve data throughput and reliability, enabling faster analysis for time-sensitive experiments.
Abstract:
Recent advancements in detector technology have increased data complexity.
High-performance computing (HPC) provides the resources needed to process these data efficiently.
A streaming workflow at NERSC bypasses storage I/O bottlenecks by sending data directly to compute nodes.
Introduction:
The transition to digital data acquisition poses challenges in managing large datasets.
NERSC's Superfacility Project integrates experimental and observational science (EOS) facilities with HPC resources.
Background:
The 4D Camera captures electron diffraction patterns rapidly for 4D-STEM analysis.
Large datasets from the camera require efficient processing methods.
Methods:
A ZeroMQ-based pipeline transfers data directly from the detector to compute nodes for on-the-fly processing.
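A minimal sketch of the sending side of such a pipeline, assuming Python with pyzmq; the endpoint address and the read_next_frame() helper are illustrative placeholders, not the actual NERSC implementation:

```python
import zmq

# Producer side: push raw detector frames directly over the network,
# bypassing intermediate writes to the shared file system.
ctx = zmq.Context()
sender = ctx.socket(zmq.PUSH)
sender.bind("tcp://*:5555")

def read_next_frame() -> bytes:
    # Stand-in for detector readout; returns one raw frame as bytes.
    return b"\x00" * 1024

for _ in range(1000):
    frame = read_next_frame()
    sender.send(frame)  # queued and sent without touching disk
```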
The ZeroMQ pipeline (PUSH/PULL) pattern ensures fair distribution of messages across the receiving nodes.
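A matching worker-side sketch, again assuming pyzmq; ZeroMQ's PUSH/PULL pattern fair-queues messages across all connected pullers, so running one such worker per node spreads frames roughly evenly. The endpoint address and process_frame() are illustrative placeholders:

```python
import zmq

# Worker side: each compute node runs one or more of these processes.
# ZeroMQ fair-queues pushed frames across all connected PULL sockets.
ctx = zmq.Context()
receiver = ctx.socket(zmq.PULL)
receiver.connect("tcp://producer-host:5555")

def process_frame(frame: bytes) -> int:
    # Stand-in for on-the-fly analysis of a single frame.
    return len(frame)

while True:
    frame = receiver.recv()  # blocks until the next frame arrives
    process_frame(frame)
```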
Results:
The streaming workflow achieves up to a 14-fold increase in data transfer speed compared to file-transfer methods.
Improved system predictability and reliability benefit time-sensitive experiments.
Related Work:
Various software packages leverage message queues and network data transfer in data acquisition (DAQ) systems.
Conclusions and Outlook:
The streaming workflow reduces processing delays and dependency on shared file systems at NERSC.
Stats
The new workflow achieves up to a 14-fold increase in data throughput compared to traditional file-based transfer methods.
The streaming pipeline significantly enhances system predictability, reducing delays for time-sensitive experiments.
For larger datasets, the streaming pipeline outperforms the file-transfer method, delivering shorter overall processing times.