Core Concepts
The paper presents a multi-task representation learning technique that uses data from multiple related domains to learn a domain-invariant latent space, which improves the detection of classes from unseen domains.
Abstract
The paper introduces a multi-task representation learning approach to improve the detection of out-of-distribution (OOD) intrusion classes. The key ideas are:
Cultivate a latent space from data spanning multiple source domains and cross-domains to improve generalization to OOD domains.
Disentangle the latent space by minimizing the mutual information between the input and the latent representation, which suppresses the spurious correlations specific to the samples of an individual domain.
Jointly optimize the classification loss, the multi-domain reconstruction loss, and the mutual invariance regularization in the latent space.
The authors evaluate the proposed method on multiple cybersecurity datasets, showing improved performance on both unseen in-distribution and OOD classes compared to contemporary domain generalization methods. The key is to leverage cross-domain data in a principled way and apply a mutual information-based regularization to learn a domain-invariant latent representation.
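As a rough illustration of the joint objective described above, the sketch below combines a classification term, a reconstruction term, and an invariance term into one weighted sum. Everything in it is an assumption made for illustration: the function names, the weights lam_rec and lam_inv, and the batch layout are hypothetical, and the invariance term is a simple domain-mean alignment penalty standing in for the paper's mutual-information regularizer, whose exact form is not given in this summary.

```python
import math

def cross_entropy(logits, label):
    """Softmax cross-entropy for one sample (logits: list of floats)."""
    peak = max(logits)  # subtract the max for numerical stability
    log_sum = peak + math.log(sum(math.exp(v - peak) for v in logits))
    return log_sum - logits[label]

def mse(x, x_hat):
    """Mean squared reconstruction error for one sample."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)

def domain_mean_gap(latents, domains):
    """Stand-in invariance penalty: spread of the per-domain latent means.

    This is NOT the paper's mutual-information regularizer. It is a simple
    proxy that is zero when every domain has the same mean latent vector.
    """
    by_domain = {}
    for z, d in zip(latents, domains):
        by_domain.setdefault(d, []).append(z)
    means = [[sum(col) / len(zs) for col in zip(*zs)] for zs in by_domain.values()]
    centre = [sum(col) / len(means) for col in zip(*means)]
    gaps = [sum((a - c) ** 2 for a, c in zip(mu, centre)) for mu in means]
    return sum(gaps) / len(gaps)

def joint_loss(batch, lam_rec=1.0, lam_inv=0.1):
    """Weighted sum of the three terms.

    batch: list of dicts with keys x, x_hat, z, logits, label, domain
    (a hypothetical layout; lam_rec and lam_inv are assumed weights).
    """
    n = len(batch)
    l_cls = sum(cross_entropy(s["logits"], s["label"]) for s in batch) / n
    l_rec = sum(mse(s["x"], s["x_hat"]) for s in batch) / n
    l_inv = domain_mean_gap([s["z"] for s in batch], [s["domain"] for s in batch])
    total = l_cls + lam_rec * l_rec + lam_inv * l_inv
    return total, (l_cls, l_rec, l_inv)
```

In a real model, x_hat, z, and logits would come from a decoder, an encoder, and a classifier head that share the latent space, and the total would be minimized by gradient descent. The point of the sketch is only that all three terms are computed on the same batch and optimized together.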
Stats
The authors use several cybersecurity datasets in their experiments:
CSE-CIC-IDS2018: SOLARIS, GOLDENEYE as source domains, INFILTRATION, BOTNET as cross-domains, and RARE, SLOWHTTPS, HOIC as OOD domains.
CICIoT 2023: BENIGN, DoS, DDoS as source domains, RECON as the cross-domain, and WEB, MIRAI as OOD domains.
CICIoMT 2024: BENIGN, DDoS, DoS as source domains, RECON, SPOOFING as cross-domains, and MQTT as the OOD domain.
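The splits listed above can be written down as a small configuration, which makes it easy to check that no class plays two roles and to look up which role a class has. This is a minimal sketch: the class names are copied from the list above, while the dictionary layout and the helper names (SPLITS, role_of, roles_are_disjoint, training_labels) are hypothetical. It also assumes that source and cross-domain classes are available during training while OOD classes are held out for evaluation.

```python
# Domain roles per dataset, using the class names from the list above.
SPLITS = {
    "CSE-CIC-IDS2018": {
        "source": ["SOLARIS", "GOLDENEYE"],
        "cross": ["INFILTRATION", "BOTNET"],
        "ood": ["RARE", "SLOWHTTPS", "HOIC"],
    },
    "CICIoT 2023": {
        "source": ["BENIGN", "DoS", "DDoS"],
        "cross": ["RECON"],
        "ood": ["WEB", "MIRAI"],
    },
    "CICIoMT 2024": {
        "source": ["BENIGN", "DDoS", "DoS"],
        "cross": ["RECON", "SPOOFING"],
        "ood": ["MQTT"],
    },
}

def role_of(dataset, label):
    """Return 'source', 'cross', or 'ood' for a class label in a dataset."""
    for role, labels in SPLITS[dataset].items():
        if label in labels:
            return role
    raise KeyError(f"{label!r} is not listed for {dataset!r}")

def roles_are_disjoint(split):
    """True if no class label appears under more than one role."""
    seen = [label for labels in split.values() for label in labels]
    return len(seen) == len(set(seen))

def training_labels(dataset):
    """Classes assumed visible at training time: source plus cross-domain."""
    split = SPLITS[dataset]
    return split["source"] + split["cross"]
```

Keeping the OOD classes out of training_labels is what makes the evaluation a test of generalization to unseen domains rather than of ordinary held-out accuracy.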
Quotes
"We consider the scenario where we know that each domain has its own spuriously correlated features which hurts the generalization performance of the model when tested on OOD domains."
"Our methodology jointly optimizes the classification loss, the multi-domain reconstruction loss, and the mutual invariance regularization in the latent space."
"We show that cross-domain data when added in a principled way, can improve generalization performance on the IN and OOD classes."