Core Concepts
This paper explores extending IEEE Standard 1671, the Automatic Test Markup Language (ATML), to support effective, near real-time operational test and evaluation (T&E) of machine learning (ML) applications, particularly in edge ML contexts. It does so by modeling ML-specific tests such as cross-validation, adversarial robustness testing, and drift detection.
Summary
This paper addresses the need for messaging standards to support operational test and evaluation (T&E) of machine learning (ML) applications, especially in edge ML contexts. It examines the suitability of the IEEE Standard 1671 (ATML) for this purpose and explores extending ATML to encompass the unique challenges of ML applications.
The paper models various ML-specific tests, including:
Cross-validation: The authors demonstrate how cross-validation can be described using ATML, including specifying the dataset to be used.
Adversarial robustness testing: The paper shows how adversarial robustness tests can be specified in ATML, including defining the adversarial perturbation parameters and the expected robustness score.
Drift detection: The authors present an ATML-based test description for monitoring data drift, including steps for comparing the current data distribution against a reference distribution and detecting significant drift.
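As an illustration of the kind of check a drift-detection test step would wrap, the following is a minimal sketch of comparing a current sample against a reference sample. It rests on assumptions that do not come from the paper: the two-sample Kolmogorov-Smirnov statistic is used here only as one common choice of comparison, and the function names `ks_statistic` and `drift_detected`, along with the `coefficient` parameter, are hypothetical. It does not reproduce the paper's ATML test description.

```python
from bisect import bisect_right
from math import sqrt


def ks_statistic(reference, current):
    """Two-sample Kolmogorov-Smirnov statistic (assumed metric, not from the paper).

    Returns the largest absolute gap between the two empirical CDFs.
    """
    ref = sorted(reference)
    cur = sorted(current)
    gap = 0.0
    # The largest gap between two step functions occurs at an observed value.
    for x in ref + cur:
        f_ref = bisect_right(ref, x) / len(ref)  # fraction of reference <= x
        f_cur = bisect_right(cur, x) / len(cur)  # fraction of current <= x
        gap = max(gap, abs(f_ref - f_cur))
    return gap


def drift_detected(reference, current, coefficient=1.36):
    """Flag drift when the statistic exceeds the large-sample critical value.

    coefficient=1.36 is the usual approximation for a 5% significance level;
    the appropriate threshold for a given application is an assumption here.
    """
    n, m = len(reference), len(current)
    critical = coefficient * sqrt((n + m) / (n * m))
    return ks_statistic(reference, current) > critical


if __name__ == "__main__":
    reference = list(range(100))
    shifted = [x + 50 for x in reference]
    print(drift_detected(reference, reference))  # identical samples
    print(drift_detected(reference, shifted))    # sample shifted by half its range
```

In an ATML-style workflow, the reference sample, the current sample, and the threshold would be the parameters carried by the test description, and the boolean outcome would be the reported result.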
The paper also looks beyond test descriptions to other ATML schemas, such as the Unit Under Test (UUT) description, Test Station description, Test Adapter description, and Test Results description, and considers how they apply to ML applications.
The authors conclude that ATML is a promising tool for enabling effective and near real-time operational T&E of ML applications, which is a critical aspect of AI lifecycle management, safety, and governance. While some minor extensions may be necessary, ATML can be adapted to address the unique challenges of ML testing.