Large language models such as GPT-4 can sanitize tabular data so that sensitive user information becomes difficult to extract while useful features remain recoverable.
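The paper's actual prompting pipeline is not reproduced here; the sketch below only illustrates the general idea, assuming the OpenAI chat completions API. The prompt wording, the `sanitize_row` helper, and the example column names are illustrative assumptions, not the authors' method.

```python
# Hypothetical sketch: prompting an LLM to sanitize one row of tabular data.
# The instruction text and placeholder scheme are assumptions for illustration.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def sanitize_row(row: dict[str, str]) -> str:
    """Ask the model to mask directly identifying values while keeping
    feature-bearing fields (e.g., age, region) intact."""
    serialized = "; ".join(f"{k}={v}" for k, v in row.items())
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system",
             "content": ("Rewrite the record so that names, exact addresses, "
                         "and ID numbers are replaced with generic placeholders, "
                         "but keep coarse attributes useful for modeling.")},
            {"role": "user", "content": serialized},
        ],
    )
    return response.choices[0].message.content

print(sanitize_row({"name": "Jane Doe", "age": "34", "zip": "94110",
                    "diagnosis": "type 2 diabetes"}))
```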
This paper introduces Balanced and Entropy-based Mix (BEM), a novel data mixing approach that re-balances the class-wise distributions of both data quantity and prediction uncertainty to enhance long-tailed semi-supervised learning.
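BEM's precise algorithm is not given in this summary; the following is a minimal sketch of a frequency- and entropy-aware mixing step under stated assumptions: mixup partners are drawn with probability proportional to class rarity times predictive entropy, so scarce and uncertain samples appear more often in mixed pairs. The function name `bem_mix` and the weighting rule are hypothetical, not the paper's exact method.

```python
# Hypothetical sketch of entropy- and frequency-aware mixup for
# long-tailed data; the pairing rule below is an illustrative assumption.
import numpy as np

rng = np.random.default_rng(0)

def entropy(probs: np.ndarray) -> np.ndarray:
    """Shannon entropy of each row of predicted class probabilities."""
    return -(probs * np.log(probs + 1e-12)).sum(axis=1)

def bem_mix(x, y_onehot, probs, class_counts, alpha=1.0):
    """Mix each sample with a partner drawn with probability proportional
    to (1 / class frequency) * predictive entropy, biasing mixed pairs
    toward rare and uncertain samples."""
    freq = class_counts[y_onehot.argmax(axis=1)]          # per-sample class count
    weights = entropy(probs) / freq                       # rarity * uncertainty
    weights /= weights.sum()
    partner = rng.choice(len(x), size=len(x), p=weights)  # biased partner draw
    lam = rng.beta(alpha, alpha)                          # standard mixup coefficient
    x_mix = lam * x + (1 - lam) * x[partner]
    y_mix = lam * y_onehot + (1 - lam) * y_onehot[partner]
    return x_mix, y_mix
```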