Key concepts
Vaegan is an approach that uses generative machine learning to produce synthetic data closely mirroring the distribution of real high-level synthesis (HLS) data, enabling more comprehensive system-level design space exploration.
Summary
The paper proposes Vaegan, an approach for generating synthetic data for system-level high-level synthesis (HLS) design space exploration (DSE). Existing HLS benchmarks and datasets are limited in diversity and may not cover complex, multi-component system-level scenarios.
Vaegan consists of three stages:
- Input Transformation: Diverse HLS data (directives, report estimation, post-synthesis, post-implementation) is transformed into a binary fixed-point format to enable handling both continuous and discrete data types.
- Generative Models: Vaegan explores two generative machine learning models, a Variational Autoencoder (VAE) and a Generative Adversarial Network (GAN), to generate synthetic HLS data. The authors find that the VAE-based MLPVAE model outperforms the GAN-based DCGAN model at generating high-fidelity synthetic data.
- Model Evaluation: The generated synthetic data is evaluated with metrics such as Maximum Mean Discrepancy (MMD), Sum of the Square of the Distances (SSD), Percentage Root-mean-square Difference (PRD), and Cosine Similarity (COSS) to check that it closely matches the distribution of real HLS data.
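To make the four evaluation metrics concrete, here is a minimal pure-Python sketch using common textbook definitions. The summary does not give the paper's exact formulas, so the function names, the RBF kernel, the `gamma` value, and the biased MMD estimator are all assumptions made for illustration.

```python
import math

def ssd(real, synth):
    """Sum of squared element-wise differences between two vectors."""
    return sum((r - s) ** 2 for r, s in zip(real, synth))

def prd(real, synth):
    """Percentage root-mean-square difference, relative to the real vector's energy."""
    return 100.0 * math.sqrt(ssd(real, synth) / sum(r * r for r in real))

def coss(real, synth):
    """Cosine similarity between two vectors."""
    dot = sum(r * s for r, s in zip(real, synth))
    norm_real = math.sqrt(sum(r * r for r in real))
    norm_synth = math.sqrt(sum(s * s for s in synth))
    return dot / (norm_real * norm_synth)

def rbf_kernel(a, b, gamma=1.0):
    """Gaussian (RBF) kernel; the kernel choice is an assumption."""
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))

def mmd_squared(real_set, synth_set, gamma=1.0):
    """Biased estimate of squared MMD between two sets of sample vectors."""
    def mean_kernel(set_a, set_b):
        total = sum(rbf_kernel(a, b, gamma) for a in set_a for b in set_b)
        return total / (len(set_a) * len(set_b))
    return (mean_kernel(real_set, real_set)
            + mean_kernel(synth_set, synth_set)
            - 2.0 * mean_kernel(real_set, synth_set))
```

Under these definitions, SSD, PRD, and MMD fall toward 0 as the synthetic data approaches the real data, while COSS rises toward 1.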
Experimental results show that Vaegan, particularly the MLPVAE model, generates synthetic HLS data that is 44.05% closer to the real data distribution than prior work. The authors also demonstrate Vaegan's practical usefulness through a case study on a wearable pregnancy monitoring system, where the synthetic data expands the Pareto frontier and uncovers new Pareto-optimal solutions not present in the original dataset.
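The idea of synthetic samples expanding a Pareto frontier can be sketched as follows. This is an illustrative sketch only: it assumes two objectives that are both minimized (for example latency and resource usage), and the function name and all numbers are hypothetical toy values, not data from the paper.

```python
def pareto_front(points):
    """Return the points not dominated by any other point (all objectives minimized)."""
    def dominates(a, b):
        # a dominates b if it is no worse in every objective and strictly better in one.
        no_worse = all(x <= y for x, y in zip(a, b))
        strictly_better = any(x < y for x, y in zip(a, b))
        return no_worse and strictly_better
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Hypothetical (latency, resource) pairs, for illustration only.
real_points = [(10, 5), (8, 7), (12, 4)]
synthetic_points = [(9, 5), (7, 9)]

front_real = pareto_front(real_points)
front_combined = pareto_front(real_points + synthetic_points)
new_solutions = [p for p in front_combined if p not in real_points]
```

In this toy example the synthetic point `(9, 5)` dominates the real point `(10, 5)`, and `(7, 9)` extends the frontier toward lower latency, so both appear in `new_solutions`.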
Statistics
The HLS data used in the experiments was obtained from the HLSDataset, which includes HLS directives, HLS report estimation, post-synthesis, and partial post-implementation data.
The dataset contains 9,557 samples without HLS directives and 3,717 samples with HLS directives for the FPGA part xc7v585tffg1157-3.
The input data was transformed into a 32-bit binary fixed-point format, with 20 bits for the integer portion and 12 bits for the fractional portion.
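The 32-bit layout described above (20 integer bits, 12 fractional bits) can be illustrated with a short sketch. The summary does not say how signs, rounding, or overflow are handled, so this version assumes unsigned values, round-to-nearest, and saturation at the largest representable code; the function names are hypothetical.

```python
def to_fixed_point_bits(value, int_bits=20, frac_bits=12):
    """Encode a non-negative number as a list of 0/1 bits, most significant bit first."""
    total_bits = int_bits + frac_bits
    # Scale by 2**frac_bits so the fractional part moves into the low bits.
    code = int(round(value * (1 << frac_bits)))
    # Assumption: clamp to the representable unsigned range instead of wrapping.
    code = max(0, min(code, (1 << total_bits) - 1))
    return [(code >> i) & 1 for i in range(total_bits - 1, -1, -1)]

def from_fixed_point_bits(bits, frac_bits=12):
    """Decode a most-significant-bit-first bit list back to a float."""
    code = 0
    for bit in bits:
        code = (code << 1) | int(bit)
    return code / (1 << frac_bits)

bits = to_fixed_point_bits(3.5)
restored = from_fixed_point_bits(bits)
```

With 12 fractional bits the resolution is 2^-12, so values such as 3.5 round-trip exactly, while others are quantized to the nearest multiple of 2^-12. Representing every field as a fixed-length bit vector is what lets continuous and discrete HLS quantities share one input format.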
Quotes
"Generating new data using existing HLS benchmarks can be cumbersome, given the expertise and time required to effectively generate data for different HLS designs and directives."
"Synthetic data offers numerous benefits for both ML-based and heuristic-based HLS DSE. For instance, it is cost effective and time efficient, and provides data diversity."