Core Concepts
The V-Model from Systems Engineering provides an effective approach to address the interdisciplinary collaboration challenges encountered when building machine learning-enabled software systems.
Abstract
This paper explores the application of the Systems Engineering V-Model to address the collaboration challenges in building machine learning (ML)-enabled software systems. The key insights are:
Requirement Engineering:
System-level requirements should be created and actively maintained so that they stay current as requirements change, with the participation of owners of ML and non-ML components.
The V-Model's clear system boundaries and responsibilities help ensure consistent requirements are defined across the system, subsystems, and components.
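One way to picture consistent requirements across levels is a traceability check: every subsystem or component requirement should point back to a requirement one level up. The following is only a minimal sketch under that assumption; the `Requirement` structure, the `untraced` helper, and the example requirement ids and texts are hypothetical and do not come from the paper.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Requirement:
    """A single requirement at one level of the decomposition (hypothetical structure)."""
    req_id: str
    level: str  # "system", "subsystem", or "component"
    text: str
    parent_id: Optional[str] = None  # the higher-level requirement this one refines


def untraced(requirements: list[Requirement]) -> list[str]:
    """Return ids of non-system requirements whose parent is missing or unknown."""
    known = {r.req_id for r in requirements}
    return [
        r.req_id
        for r in requirements
        if r.level != "system" and r.parent_id not in known
    ]


# Illustrative requirements only; the ids and texts are invented for this sketch.
reqs = [
    Requirement("SYS-1", "system", "Respond to a query within 200 ms"),
    Requirement("ML-1", "component", "Model inference takes at most 50 ms", "SYS-1"),
    Requirement("DATA-1", "component", "Features are refreshed daily"),
]

print(untraced(reqs))
```

Here `DATA-1` is reported because it does not trace to any system-level requirement, which is the kind of inconsistency a review involving both ML and non-ML owners would be expected to catch.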
Architecture, Design, and Implementation:
System-level architecture design with elements, interfaces, responsibilities, alternatives, and expected performances should be created and actively maintained.
Risks such as design changes or improvements due to uncertainty in ML components must be actively identified and mitigated.
The V-Model's enforcement of validation and verification (V&V) and risk management helps address these challenges.
Model Development:
Requirements and detailed design of ML components with interfaces, alternatives, and expected performances should be created, with participation of owners of external and internal components like data and infrastructure.
The V-Model's clear boundaries and responsibilities ensure these aspects are properly defined and reviewed.
Data Engineering:
Data should be treated as a separate component with standalone requirements, design synthesis, and system validation (data validation and monitoring).
The V-Model's component-level focus enables proper attention to data quality and evolution.
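Treating data as its own component implies that it carries its own requirements and its own validation step. The sketch below illustrates that idea in a hedged way: the `DataRequirement` structure, the `validate` helper, and the two example checks are assumptions made for illustration, not the paper's method or any particular library's API.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Callable

Rows = list[dict[str, Any]]


@dataclass(frozen=True)
class DataRequirement:
    """One standalone requirement on the data component (hypothetical structure)."""
    req_id: str
    description: str
    check: Callable[[Rows], bool]  # returns True when the batch satisfies the requirement


def validate(rows: Rows, requirements: list[DataRequirement]) -> list[str]:
    """Return the ids of the requirements that this batch of rows fails."""
    return [r.req_id for r in requirements if not r.check(rows)]


# Example requirements; the thresholds and field names are invented for this sketch.
REQUIREMENTS = [
    DataRequirement(
        "DATA-1",
        "every row has a non-missing label",
        lambda rows: all(row.get("label") is not None for row in rows),
    ),
    DataRequirement(
        "DATA-2",
        "a batch contains at least 3 rows",
        lambda rows: len(rows) >= 3,
    ),
]

batch = [{"text": "a", "label": 1}, {"text": "b", "label": None}]
print(validate(batch, REQUIREMENTS))
```

Running the same checks both before training and on incoming production data would cover the two activities named above, data validation and monitoring, with one set of requirements.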
Quality Assurance:
V&V at the system, subsystem, and component levels (ML, non-ML, data, infrastructure) should be enforced with identified owners.
The V-Model's consistency checks across system levels help discover issues for quality assurance.
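A consistency check of this kind can be sketched as a walk over the system decomposition that flags any element lacking an owner or a V&V plan. This is an illustrative sketch only: the `Element` structure, the `find_gaps` helper, and the example system, team names, and plans are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Element:
    """A system, subsystem, or component in the decomposition (hypothetical structure)."""
    name: str
    level: str  # "system", "subsystem", or "component"
    owner: Optional[str] = None
    vv_plan: Optional[str] = None  # short description of how this element is verified
    children: list[Element] = field(default_factory=list)


def find_gaps(element: Element) -> list[str]:
    """Depth-first walk that reports elements with no owner or no V&V plan."""
    gaps = []
    if element.owner is None:
        gaps.append(f"{element.name}: no owner")
    if element.vv_plan is None:
        gaps.append(f"{element.name}: no V&V plan")
    for child in element.children:
        gaps.extend(find_gaps(child))
    return gaps


# Invented example system; names, teams, and plans are placeholders.
system = Element(
    "recommender", "system", "team-platform", "end-to-end acceptance tests",
    children=[
        Element(
            "ranking", "subsystem", "team-ml", "offline and online evaluation",
            children=[
                Element("ranking-model", "component", "team-ml", None),
                Element("training-data", "component", None, "schema and freshness checks"),
            ],
        ),
    ],
)

for gap in find_gaps(system):
    print(gap)
```

The output lists the model component without a V&V plan and the data component without an owner, the two kinds of gap that the enforcement described above is meant to surface.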
Process:
The software development lifecycle for ML-enabled systems should follow layered decomposition of systems, subsystems, and components, with continuous in-process V&V and risk management.
The V-Model's clear boundaries and responsibilities address the ad-hoc nature of ML development processes.
Organization, Teams, and Responsibility:
Documentation at the system, subsystem, and component levels should be created, approved, and tracked, with consolidated terminology understood by all roles.
The V-Model's inclusive documentation and access control help bridge the knowledge gap between different roles.
Overall, the study found that, despite requiring additional effort, the characteristics of the V-Model align well with several of the collaboration challenges encountered when building ML-enabled systems. Future research should investigate new process models that leverage the V-Model's strengths.
Stats
"To train a new ML model, we need to backfill the data in the past few weeks to be used in training, and it requires non-trivial work and is burdensome." (P5)
"Engineers had to do multiple iterations to fix several data issues and finally get the correct data they needed for model training." (P3)
"The quality of data (team's wiki documents) was so low that the ChatGPT output using the low-quality data turned out to work poorly." (P3)
Quotes
"SDEs are more emphasized on the coding standards. Scientists are more focused on model accuracy and are less focused on coding standards and code comments, class interface definitions, etc. It's a problem for scientists that how to improve their coding standard in ML model development, and who will own and maintain the code of the model." (P6)
"it's not clear if the root cause of the ticket is in the ML component or non-ML component." (P1, P2)
"it would help a lot if there were metrics, monitoring, or tools on ML and non-ML components to distinguish whether the issue is from the ML model or not." (P1, P2)