Core Concepts
The authors propose a novel approach that enhances data provenance and model transparency in federated learning systems at practical communication overhead, leveraging cryptographic techniques and efficient model management.
Abstract
The content discusses the challenges of ensuring data integrity and traceability in federated learning systems. It introduces a methodology that combines cryptographic hashing, model snapshots, and multithreading to improve the transparency, reproducibility, and trustworthiness of trained models across diverse scenarios.
The authors highlight the significant impact of their proposed methodologies on reducing training time overheads while maintaining data integrity. By optimizing baseline provenance features through multithreading and cryptographic hashing, they demonstrate improved efficiency in storing model snapshots and tracking data transformations.
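The paper does not spell out its snapshot format, but the core idea of tamper-evident snapshots can be sketched with standard hashing. The function and field names below are hypothetical, not the authors' API; the point is that a deterministic digest recorded per training round lets an auditor detect any later modification of stored weights.

```python
import hashlib
import json

def snapshot_digest(state: dict) -> str:
    """Compute a SHA-256 digest over a serialized model snapshot.

    `state` stands in for a model's parameters; serializing with
    sorted keys makes the digest deterministic across runs.
    """
    blob = json.dumps(state, sort_keys=True).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()

# A provenance log pairs each round with the digest of its snapshot.
round_3 = {"round": 3, "layer1.weight": [0.12, -0.07]}
digest = snapshot_digest(round_3)
print(digest[:16])  # short fingerprint stored in the provenance record
```

Because hashing is cheap relative to training, inserting such a digest step is consistent with the low overheads the authors report.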
Key datasets, including CIFAR-10, MNIST, and CelebA, are used to benchmark machine learning models such as ResNet-18 and the Vision Transformer. The experiments demonstrate the feasibility of implementing data provenance systems for enhanced transparency in federated learning.
Stats
Our solution mitigates overheads by almost 50% through multithreading.
Cryptographic hash insertion decreases overhead to 3% for the CIFAR-10 and MNIST datasets.
Multithreaded optimization reduces training time overhead by approximately 20%.
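The summary does not describe how the multithreaded optimization is implemented; one plausible sketch is to move snapshot hashing and storage off the training thread onto a background worker, so the training loop only enqueues state. All names here are illustrative assumptions, not the authors' code.

```python
import hashlib
import pickle
import queue
import threading

def snapshot_writer(q: "queue.Queue", log: list) -> None:
    """Background worker: hash each queued snapshot and record it."""
    while True:
        item = q.get()
        if item is None:  # sentinel: no more snapshots
            break
        round_id, state = item
        digest = hashlib.sha256(pickle.dumps(state)).hexdigest()
        log.append((round_id, digest))

snapshots: "queue.Queue" = queue.Queue()
provenance_log: list = []
worker = threading.Thread(target=snapshot_writer,
                          args=(snapshots, provenance_log))
worker.start()

# The training loop only enqueues; hashing happens concurrently,
# which is the kind of overlap that can hide provenance overhead.
for r in range(3):
    snapshots.put((r, {"w": [r * 0.1]}))

snapshots.put(None)
worker.join()
print(len(provenance_log))  # one digest per training round
```

Overlapping I/O and hashing with training in this way is one mechanism by which a multithreaded design could plausibly recover the roughly 20% overhead reduction reported above.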
Quotes
"The lack of auditability in FL systems has been a major point of criticism."
"Extensive experimental results suggest that integrating a database subsystem into federated learning systems can improve data provenance."