Core Idea
This paper documents the technical foundations for extending the Topology ToolKit (TTK) to distributed-memory parallelism using the Message Passing Interface (MPI), enabling the analysis of large-scale datasets on supercomputers.
Abstract
This paper extends the Topology ToolKit (TTK) to distributed-memory parallelism using the Message Passing Interface (MPI). TTK is an open-source library that implements a substantial collection of algorithms for topological data analysis and visualization.
Key highlights:
- Formalization of the distributed model for input data representation and output distribution.
- Extension of TTK's internal triangulation data structure to support distributed datasets, including the computation of global simplex identifiers, ghost layers, and boundary information.
- Development of an interface between TTK and MPI, enabling the consistent combination of multiple topological algorithms within a single, distributed pipeline.
- Taxonomy of TTK's topological algorithms based on their communication needs, with examples of hybrid MPI+thread parallelizations.
- Detailed performance analyses showing parallel efficiencies ranging from 20% to 80%, with negligible computation time overhead from the MPI-specific preconditioning.
- Illustration of TTK's new distributed capabilities with an advanced analysis pipeline combining multiple algorithms, run on a dataset of 120 billion vertices distributed on 64 nodes (1536 cores).
- Roadmap for the completion of TTK's MPI extension, with generic recommendations for each algorithm communication category.
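The global identifiers and ghost layers mentioned in the highlights can be pictured with a small sketch. It is not TTK's API or the paper's algorithm: it assumes a 1D array of vertices split into contiguous blocks, simulates the ranks in a single process instead of using MPI, and the function name `decompose_1d` and the dictionary keys are hypothetical.

```python
def decompose_1d(n_vertices, n_ranks, ghost=1):
    """Split vertices 0..n_vertices-1 into one contiguous block per rank.

    Each block is padded with `ghost` layers of vertices owned by the
    neighboring ranks. Illustrative only; names are hypothetical.
    """
    blocks = []
    for rank in range(n_ranks):
        # Range of vertices owned by this rank (own_hi is exclusive).
        own_lo = rank * n_vertices // n_ranks
        own_hi = (rank + 1) * n_vertices // n_ranks
        # Extend the block by the ghost layer, clamped to the domain.
        lo = max(own_lo - ghost, 0)
        hi = min(own_hi + ghost, n_vertices)
        # Local index k maps to global identifier global_ids[k].
        global_ids = list(range(lo, hi))
        # A vertex is a ghost if it lies outside the owned range.
        is_ghost = [g < own_lo or g >= own_hi for g in global_ids]
        # A vertex is on the global boundary if it is a domain endpoint.
        on_boundary = [g == 0 or g == n_vertices - 1 for g in global_ids]
        blocks.append({
            "rank": rank,
            "global_ids": global_ids,
            "is_ghost": is_ghost,
            "on_boundary": on_boundary,
        })
    return blocks


blocks = decompose_1d(n_vertices=10, n_ranks=3)

# Collect the vertices each rank owns (ghosts excluded).
owned = [
    g
    for b in blocks
    for g, ghost_flag in zip(b["global_ids"], b["is_ghost"])
    if not ghost_flag
]
```

In this toy setting every global vertex is owned by exactly one rank, while ghost copies let a rank inspect its neighbors' border vertices without communication. Real meshes need communication to agree on such identifiers, which is the part the paper addresses.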
Statistics
The largest publicly available dataset used in the experiments contains 120 billion vertices.
The experiments were run on a compute cluster with 64 nodes, for a total of 1536 cores.
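As a quick reading aid for these figures, the sketch below derives the cores per node and shows how a parallel efficiency percentage is typically computed. It assumes the common strong-scaling definition (speedup divided by core count); the summary does not state which definition the paper uses, and the timings are hypothetical values chosen for illustration, not measurements from the paper.

```python
def parallel_efficiency(t_serial, t_parallel, n_cores):
    """Strong-scaling efficiency: speedup divided by the number of cores.

    Assumed definition; the summarized paper may define it differently.
    """
    speedup = t_serial / t_parallel
    return speedup / n_cores


# Figures taken from the summary: 64 nodes, 1536 cores in total.
n_nodes = 64
n_cores_total = 1536
cores_per_node = n_cores_total // n_nodes

# Hypothetical timings (seconds), for illustration only.
t_one_core = 100.0
t_sixteen_cores = 12.5
efficiency = parallel_efficiency(t_one_core, t_sixteen_cores, n_cores=16)
```

With these made-up timings the speedup is 8 on 16 cores, so the efficiency is 50%, which falls inside the 20% to 80% range reported in the highlights.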
Quotes
"Unlike previous work, this paper does not focus on the distributed computation of a specific topological object (such as merge trees or persistence diagrams). Instead, it documents the necessary building blocks for the extension to the distributed setting of a diverse collection of topological algorithms such as TTK."
"To support topological algorithms, a data structure must be available to efficiently traverse the input dataset, with possibly advanced traversal queries. TTK [8], [71] implements such a triangulation data structure, providing advanced, constant-time, traversal queries, supporting both explicit meshes as well as the implicit triangulation of regular grids (with no memory overhead)."