Core Concept
Online learning systems can be formally modeled and designed with a systems-theoretic approach that considers both the structure and the behavior of the learning system over time. Doing so makes it possible to identify and control the key design parameters that determine the robustness and reliability of online learning.
Summary
This paper presents a systems-theoretic framework for modeling and designing online machine learning (OML) systems. The key insights are:
OML systems can be formally defined as systems that sequentially update their knowledge to improve prediction performance, where both the system structure (the input and output spaces) and the system behavior (the data distributions) may be non-stationary.
The system structure can be analyzed in terms of homogeneous, partially heterogeneous, and fully heterogeneous relationships between sequential system observations. Homomorphic mappings can be used to relate heterogeneous system structures.
The system behavior is characterized by concept drift, which can manifest as virtual drift (changes in the input distribution) or real drift (changes in the input-output relationship). Concept drift introduces complexity and variability that must be accounted for in the OML system design.
The knowledge base of the OML system can be updated using the observed input-output pairs, the learned model parameters, or by finding shared feature representations between sequential system observations.
The systems-theoretic perspective shifts the focus from algorithm design to identifying and controlling the key design parameters that govern the structure and behavior of the OML system, which is crucial for robust and reliable online learning.
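The three structural cases listed above can be made concrete with a small sketch. It is an illustration under stated assumptions, not the paper's formal construction: input spaces are reduced to sets of named features, output spaces are ignored, and a plain projection onto the shared features stands in for a structure-relating mapping. The function names and the feature names are hypothetical.

```python
def structural_relation(prev_features, next_features):
    """Classify how two consecutive input spaces relate, by feature-name overlap."""
    prev_set, next_set = set(prev_features), set(next_features)
    if prev_set == next_set:
        return "homogeneous"
    if prev_set & next_set:
        return "partially heterogeneous"
    return "fully heterogeneous"


def project(record, target_features, fill=0.0):
    """Map a record (dict of feature -> value) onto a fixed feature order.

    Features missing from the record get a default value; features that
    are not in target_features are dropped.
    """
    return [record.get(name, fill) for name in target_features]


# Hypothetical feature sets for three consecutive system observations.
space_t0 = ["amount", "procedure_code", "provider_age"]
space_t1 = ["amount", "procedure_code", "region"]
space_t2 = ["free_text_embedding"]

relation_01 = structural_relation(space_t0, space_t1)
relation_12 = structural_relation(space_t1, space_t2)

# Relate the two partially heterogeneous spaces through their shared features.
shared = sorted(set(space_t0) & set(space_t1))
vector = project({"amount": 120.0, "procedure_code": 7.0, "region": 3.0}, shared)
```

In the fully heterogeneous case there are no shared features to project onto, so a learned representation would be needed instead of this simple projection.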
The paper uses healthcare provider fraud detection as a running example to ground the theoretical discussion in a real-world OML challenge.
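The distinction between virtual and real drift, and the idea of updating knowledge through learned model parameters, can be illustrated with a toy simulation. This is a minimal sketch under assumptions that do not come from the paper: the data is synthetic and one-dimensional, the learner is a hand-written online logistic regression, and accuracy is measured test-then-train (each point is predicted before the model is updated on it). All names and parameter values are hypothetical.

```python
import math
import random


def make_batch(rng, n, x_mean, flip_labels):
    """Draw n (x, y) pairs with x ~ N(x_mean, 1) and y = 1 if x > 0.

    Changing x_mean shifts the input distribution only (virtual drift).
    Setting flip_labels inverts the input-output relationship (real drift).
    """
    batch = []
    for _ in range(n):
        x = rng.gauss(x_mean, 1.0)
        y = 1 if x > 0 else 0
        if flip_labels:
            y = 1 - y
        batch.append((x, y))
    return batch


class OnlineLogisticModel:
    """One-feature logistic regression updated one observation at a time."""

    def __init__(self, lr=0.5):
        self.w = 0.0
        self.b = 0.0
        self.lr = lr

    def predict_proba(self, x):
        z = max(-30.0, min(30.0, self.w * x + self.b))  # clamp to avoid overflow
        return 1.0 / (1.0 + math.exp(-z))

    def predict(self, x):
        return 1 if self.predict_proba(x) >= 0.5 else 0

    def update(self, x, y):
        # Single stochastic gradient step on the log loss.
        err = self.predict_proba(x) - y
        self.w -= self.lr * err * x
        self.b -= self.lr * err


def prequential_accuracy(model, batch):
    """Test-then-train: predict each point, then update the model on it."""
    correct = 0
    for x, y in batch:
        correct += int(model.predict(x) == y)
        model.update(x, y)
    return correct / len(batch)


rng = random.Random(0)
model = OnlineLogisticModel()

# Phase 1: stationary data.
acc_stationary = prequential_accuracy(model, make_batch(rng, 500, 0.0, False))
# Phase 2: virtual drift, inputs shift but the labeling rule is unchanged.
acc_virtual = prequential_accuracy(model, make_batch(rng, 500, 2.0, False))
# Phase 3: real drift, the labeling rule is inverted.
acc_real_early = prequential_accuracy(model, make_batch(rng, 30, 0.0, True))
acc_real_late = prequential_accuracy(model, make_batch(rng, 500, 0.0, True))

print(acc_stationary, acc_virtual, acc_real_early, acc_real_late)
```

In this toy setting the virtual drift leaves the decision boundary valid, so accuracy stays high, while the real drift makes the learned parameters wrong until enough new observations have been absorbed. Virtual drift is not harmless in general: it can still hurt a model that was poorly fit in the region the inputs move into.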