The author proposes a method for generating synthetic data that preserves the correlation structure of the original dataset while protecting privacy. The approach aims to strike an effective balance between data utility and disclosure risk.
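The summary does not specify the paper's mechanism, so the following is only a minimal, hypothetical sketch of correlation-preserving synthesis: a Gaussian copula is fitted to a numeric table and new rows are sampled from it, which roughly preserves the original correlation structure. Disclosure controls (e.g., noise addition or differential privacy) are deliberately omitted, and all function and variable names are illustrative.

```python
import numpy as np
from scipy import stats

def synthesize_gaussian_copula(data: np.ndarray, n_samples: int, seed: int = 0) -> np.ndarray:
    """Sample synthetic rows that roughly preserve the rank correlations of `data`.

    Minimal illustration only: a real method would add explicit disclosure controls.
    """
    rng = np.random.default_rng(seed)
    n, d = data.shape

    # 1. Map each column to normal scores via its empirical ranks.
    ranks = np.argsort(np.argsort(data, axis=0), axis=0) + 1
    u = ranks / (n + 1)                       # uniform pseudo-observations
    z = stats.norm.ppf(u)                     # normal scores

    # 2. Estimate the correlation structure in the Gaussian space.
    corr = np.corrcoef(z, rowvar=False)

    # 3. Sample correlated normals and map back through the empirical quantiles.
    z_new = rng.multivariate_normal(np.zeros(d), corr, size=n_samples)
    u_new = stats.norm.cdf(z_new)
    return np.column_stack([np.quantile(data[:, j], u_new[:, j]) for j in range(d)])

# Toy check: synthetic rows reproduce the correlation of two dependent columns.
rng = np.random.default_rng(1)
x = rng.normal(size=(500, 1))
real = np.hstack([x, 2 * x + rng.normal(scale=0.5, size=(500, 1))])
fake = synthesize_gaussian_copula(real, n_samples=1000)
print(np.corrcoef(real, rowvar=False)[0, 1], np.corrcoef(fake, rowvar=False)[0, 1])
```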
The author introduces a novel approach to generating synthetic data for system identification by leveraging knowledge transfer from similar systems. This method aims to enhance model generalization and robustness in scenarios with data scarcity.
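The knowledge-transfer scheme itself is not detailed in this summary; one common pattern, sketched below purely as an illustration, is to identify an ARX model on a data-rich source system and shrink the data-poor target system's parameters toward it with ridge-style regularization. Model orders, hyperparameters, and all names here are assumptions, not the author's method.

```python
import numpy as np

def fit_arx(y, u, na=2, nb=2, theta_prior=None, lam=0.0):
    """Least-squares ARX fit: y[k] = sum_i a_i*y[k-i] + sum_j b_j*u[k-j].

    If `theta_prior` is given, parameters are shrunk toward it with weight `lam`
    (a simple form of knowledge transfer from a similar source system).
    """
    k0 = max(na, nb)
    Phi = np.asarray([np.concatenate([y[k - na:k][::-1], u[k - nb:k][::-1]])
                      for k in range(k0, len(y))])
    Y = y[k0:]
    if theta_prior is None:
        theta_prior = np.zeros(Phi.shape[1])
    A = Phi.T @ Phi + lam * np.eye(Phi.shape[1])
    b = Phi.T @ Y + lam * theta_prior
    return np.linalg.solve(A, b)

rng = np.random.default_rng(0)
true_theta = np.array([1.5, -0.7, 0.5, 0.3])          # a1, a2, b1, b2

def simulate(theta, n):
    y, u = np.zeros(n), rng.normal(size=n)
    for k in range(2, n):
        y[k] = (theta[0]*y[k-1] + theta[1]*y[k-2]
                + theta[2]*u[k-1] + theta[3]*u[k-2] + 0.05*rng.normal())
    return y, u

y_src, u_src = simulate(true_theta + 0.05, 2000)       # similar source system, plenty of data
y_tgt, u_tgt = simulate(true_theta, 30)                # target system, scarce data
theta_src = fit_arx(y_src, u_src)
theta_plain = fit_arx(y_tgt, u_tgt)                            # target data only
theta_transfer = fit_arx(y_tgt, u_tgt, theta_prior=theta_src, lam=5.0)  # with transfer
print(theta_plain, theta_transfer)
```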
Quality-Diversity Generative Sampling (QDGS) improves both the fairness and the accuracy of classifiers by training them on balanced synthetic datasets.
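As a rough, hypothetical illustration of the quality-diversity idea (in the spirit of MAP-Elites, not necessarily QDGS's exact procedure), the loop below samples latent codes, scores them with placeholder quality and diversity functions, and keeps only the best code per diversity bin, yielding a set of samples balanced across the measured attribute.

```python
import numpy as np

rng = np.random.default_rng(0)
LATENT_DIM, N_BINS, N_ITERS = 8, 10, 5000

# Placeholder stand-ins for a real generative model and its scoring functions.
def diversity_measure(z):      # e.g., a semantic attribute of the decoded sample
    return np.tanh(z[0])       # mapped to [-1, 1]

def quality(z):                # e.g., realism / prompt-alignment score of the decoded sample
    return -np.linalg.norm(z[1:])

archive = {}                   # bin index -> (quality, latent code)
for _ in range(N_ITERS):
    z = rng.normal(size=LATENT_DIM)
    b = int(np.clip((diversity_measure(z) + 1) / 2 * N_BINS, 0, N_BINS - 1))
    q = quality(z)
    if b not in archive or q > archive[b][0]:
        archive[b] = (q, z)    # keep the highest-quality code in each bin

# Decode one high-quality sample per bin -> a dataset balanced over the measure.
balanced_latents = np.stack([archive[b][1] for b in sorted(archive)])
print(balanced_latents.shape)  # (N_BINS, LATENT_DIM) once every bin is filled
```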
Generative AI and large language models are reshaping synthetic data generation, helping to address data scarcity and privacy concerns while broadening what is possible in AI development.
DataTune, a method to transform existing datasets into a format aligned with the requirements of target tasks, can significantly improve the quality and diversity of synthetically generated data compared to direct language model generation.
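DataTune's actual pipeline is not described in this summary; the stub below merely sketches the general idea of prompting a language model to rewrite an existing example for a new target task instead of generating it from scratch. `call_llm` is a hypothetical placeholder, not a real API or part of DataTune.

```python
def call_llm(prompt: str) -> str:
    """Placeholder for whatever completion client is available."""
    raise NotImplementedError("plug in your LLM client here")

def transform_example(source_example: dict, target_task: str) -> str:
    """Rewrite an existing example so it serves a different target task."""
    prompt = (
        f"Target task: {target_task}\n"
        f"Existing example: {source_example}\n"
        "Rewrite this example so that it is a valid training instance for the "
        "target task. Keep the factual content; change only format and framing."
    )
    return call_llm(prompt)
```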
MALLM-GAN is a novel framework that leverages multi-agent large language models (LLMs) to generate synthetic tabular data, particularly addressing the challenge of limited sample sizes often encountered in healthcare and other domains.
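As a hedged caricature of the adversarial loop, with a simple parametric sampler standing in for the LLM generator and optimizer agents, the sketch below alternates between generating synthetic rows, training a discriminator to separate real from synthetic, and nudging the generation "prompt" toward the real data until the discriminator can no longer tell the two apart. It is not MALLM-GAN's implementation.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Stand-in for the LLM generator agent: in MALLM-GAN the "prompt" is text and the
# generator/optimizer are LLMs; here the prompt is reduced to (per-column mean,
# per-column scale) so the loop runs end to end.
def generator_stub(prompt, n):
    mean, scale = prompt
    return rng.normal(loc=mean, scale=scale, size=(n, len(mean)))

real = rng.normal(loc=[2.0, -1.0], scale=[1.0, 0.5], size=(200, 2))
prompt = (np.zeros(2), np.ones(2))          # initial, deliberately poor "description"

for rnd in range(8):
    synth = generator_stub(prompt, 200)
    X = np.vstack([real, synth])
    y = np.r_[np.ones(len(real)), np.zeros(len(synth))]
    disc = LogisticRegression().fit(X, y)   # discriminator: real vs. synthetic
    acc = disc.score(X, y)                  # ~0.5 once the two are indistinguishable
    # "Optimizer agent": refine the generation instructions using feedback;
    # here simply by moving the prompt toward the real data's statistics.
    prompt = (0.6 * prompt[0] + 0.4 * real.mean(axis=0),
              0.6 * prompt[1] + 0.4 * real.std(axis=0))
    print(rnd, round(acc, 3))
```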
This paper introduces an algorithm for generating large, realistic synthetic datasets of power injections in electric power grids, addressing the challenge of limited access to real-world operational data for training machine learning models in the power systems domain.
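The paper's algorithm is not spelled out in this summary; the sketch below shows only one plausible ingredient of such a generator: sampling bus injections with the mean and covariance of (hypothetical) historical data, then projecting each sample onto the zero-net-injection hyperplane so that generation and load balance. All numbers are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "historical" injections for 5 buses (positive = generation, negative = load).
hist = rng.multivariate_normal(
    mean=[50, 30, -20, -35, -25],
    cov=np.diag([25, 16, 9, 12, 8]),
    size=1000,
)

def sample_injections(history, n_samples, seed=1):
    """Sample bus injections with the historical mean/covariance, then project each
    sample onto the zero-net-injection hyperplane (generation = load, losses ignored).
    A minimal sketch, not the paper's algorithm."""
    rng = np.random.default_rng(seed)
    mu, cov = history.mean(axis=0), np.cov(history, rowvar=False)
    raw = rng.multivariate_normal(mu, cov, size=n_samples)
    # Orthogonal projection onto {p : sum(p) = 0}.
    return raw - raw.sum(axis=1, keepdims=True) / raw.shape[1]

synth = sample_injections(hist, 5000)
print(synth.sum(axis=1)[:3])   # ~0: each synthetic scenario is power-balanced
```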
This paper introduces a novel framework that combines large language models (LLMs) with a retrieval-reasoning approach to generate synthetic clinical trials labeled with binary success/failure outcomes. These synthetic trials can augment real datasets, enhance model training for clinical trial outcome prediction, and accelerate clinical research while upholding patient privacy.
Montessori-Instruct, a new data synthesis framework, enhances the training of large language models by optimizing the generation of synthetic training data to align with the specific learning preferences of student models, leading to significant performance improvements.
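One way to read "learning preferences of student models" is local data influence: score each candidate synthetic example by how much one training step on it reduces the student's loss on a probe set, and prefer high-influence examples when synthesizing data. The toy sketch below illustrates only that scoring idea on a logistic-regression "student"; it is not Montessori-Instruct's implementation, and all names and settings are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def loss_and_grad(w, X, y):
    """Binary logistic loss and its gradient."""
    p = 1.0 / (1.0 + np.exp(-X @ w))
    eps = 1e-9
    loss = -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))
    grad = X.T @ (p - y) / len(y)
    return loss, grad

# Tiny stand-in "student": logistic regression; the probe set plays the role of the
# student's reference data. Candidate synthetic examples have noisy labels.
d = 5
w_true = rng.normal(size=d)
X_probe = rng.normal(size=(200, d)); y_probe = (X_probe @ w_true > 0).astype(float)
cand_X = rng.normal(size=(100, d))
cand_y = (cand_X @ w_true + rng.normal(scale=2.0, size=100) > 0).astype(float)

w, lr = np.zeros(d), 0.5
base_loss, _ = loss_and_grad(w, X_probe, y_probe)

# Local data influence: probe-loss reduction after one gradient step on each candidate.
influence = []
for xi, yi in zip(cand_X, cand_y):
    _, g = loss_and_grad(w, xi[None, :], np.array([yi]))
    new_loss, _ = loss_and_grad(w - lr * g, X_probe, y_probe)
    influence.append(base_loss - new_loss)

# Keep the synthetic examples the student benefits from most (its "preferred" data).
top = np.argsort(influence)[-20:]
print(np.round(np.array(influence)[top][:5], 4))
```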
This paper introduces ERGAN, a novel framework leveraging ensemble learning and recurrent GANs to generate synthetic residential load data that accurately reflects real-world patterns while preserving diversity and statistical properties.
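ERGAN's exact architecture and ensemble strategy are not given here; the PyTorch sketch below shows a generic recurrent GAN for daily load profiles, with a note indicating where the ensemble of members would come in. Sequence length, network sizes, and training settings are illustrative assumptions.

```python
import torch
import torch.nn as nn

SEQ_LEN, NOISE_DIM, HIDDEN = 24, 16, 32     # e.g., 24 hourly readings per day

class Generator(nn.Module):
    """Maps a noise sequence to a synthetic daily load profile."""
    def __init__(self):
        super().__init__()
        self.rnn = nn.LSTM(NOISE_DIM, HIDDEN, batch_first=True)
        self.out = nn.Linear(HIDDEN, 1)

    def forward(self, z):                    # z: (batch, SEQ_LEN, NOISE_DIM)
        h, _ = self.rnn(z)
        return torch.sigmoid(self.out(h))    # (batch, SEQ_LEN, 1), loads scaled to [0, 1]

class Discriminator(nn.Module):
    """Scores whether a load profile looks real."""
    def __init__(self):
        super().__init__()
        self.rnn = nn.LSTM(1, HIDDEN, batch_first=True)
        self.out = nn.Linear(HIDDEN, 1)

    def forward(self, x):                    # x: (batch, SEQ_LEN, 1)
        h, _ = self.rnn(x)
        return self.out(h[:, -1])            # logit from the last time step

def train_one_member(real_loader, epochs=1):
    """Standard GAN updates for one ensemble member; an ensemble approach would train
    several such members (e.g., one per load cluster) and pool their samples."""
    G, D = Generator(), Discriminator()
    opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
    opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
    bce = nn.BCEWithLogitsLoss()
    for _ in range(epochs):
        for real in real_loader:
            b = real.size(0)
            z = torch.randn(b, SEQ_LEN, NOISE_DIM)
            fake = G(z)
            # Discriminator step: real -> 1, fake -> 0.
            d_loss = bce(D(real), torch.ones(b, 1)) + bce(D(fake.detach()), torch.zeros(b, 1))
            opt_d.zero_grad(); d_loss.backward(); opt_d.step()
            # Generator step: fool the discriminator.
            g_loss = bce(D(G(z)), torch.ones(b, 1))
            opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    return G

# Toy "real" profiles: a smooth daily shape plus noise, scaled to [0, 1].
t = torch.linspace(0, 1, SEQ_LEN)
real_profiles = (0.5 + 0.3 * torch.sin(2 * torch.pi * t)
                 + 0.05 * torch.randn(256, SEQ_LEN)).clamp(0, 1)
loader = torch.utils.data.DataLoader(real_profiles.unsqueeze(-1), batch_size=32, shuffle=True)
G = train_one_member(loader)
print(G(torch.randn(4, SEQ_LEN, NOISE_DIM)).shape)   # (4, 24, 1) synthetic profiles
```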