Core Concepts
This article develops a novel constrained Markov decision process (CMDP) model for optimally selecting and dispatching information flows to multiple edge servers. The model accounts for the heterogeneous capacity constraints of the access network and edge servers, as well as the preferences of deployed applications. The authors propose a specialized primal-dual Safe Reinforcement Learning (SRL) algorithm, DR-CPO, that solves the resulting optimal admission control problem through reward decomposition, achieving higher reward and faster convergence than existing Deep Reinforcement Learning (DRL) solutions.
Abstract
The article presents a novel system model and solution approach for optimal flow admission control in edge computing environments. Key highlights:
System Model:
Flows belong to different classes and are generated according to Poisson processes, with each class having a specific utility for the applications deployed on edge servers.
Edge servers have limited computational capacity and the access network has limited bandwidth capacity, which must be accounted for in the admission control decisions.
Applications can be replicated and deployed on multiple edge servers.
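To make the arrival model concrete, the following is a minimal sketch of how flows from several classes, each generated by an independent Poisson process, could be simulated. The class names, rates, and horizon are hypothetical values chosen for illustration; they do not come from the article, and the sketch omits utilities and capacities.

```python
import random


def simulate_arrivals(rates, horizon, seed=0):
    """Merge independent Poisson arrival processes, one per flow class.

    rates:   dict mapping a class name to its arrival rate (flows per
             unit time); the names and values are hypothetical.
    horizon: length of the simulated time window.
    Returns a time-ordered list of (arrival_time, class_name) tuples.
    """
    rng = random.Random(seed)
    events = []
    for cls, rate in rates.items():
        t = 0.0
        while True:
            # Inter-arrival times of a Poisson process are exponential.
            t += rng.expovariate(rate)
            if t > horizon:
                break
            events.append((t, cls))
    events.sort()
    return events


# Hypothetical example: two flow classes with different arrival rates.
example_events = simulate_arrivals({"video": 2.0, "sensor": 0.5}, horizon=1000.0, seed=1)
```

An admission controller would process such a merged event stream one arrival at a time, deciding for each flow whether to admit it and to which edge server to dispatch it.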
Optimal Admission Control:
The admission control problem is formulated as a constrained Markov decision process (CMDP), where the objective is to maximize the expected discounted reward subject to the capacity constraints.
Structural properties of the optimal admission control policy are derived, showing that it requires randomization in at most M states, where M is the number of edge servers.
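For orientation, a discounted CMDP of the kind described above is commonly written in the generic form below. The notation is an assumption made for this sketch and is not taken from the article: \pi is the admission policy, \gamma the discount factor, r the reward, c_k the k-th capacity cost, d_k its bound, and K the number of constraints. Whether the article states its constraints in exactly this discounted form is not confirmed here.

```latex
\begin{aligned}
\max_{\pi} \quad & \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\, r(s_t, a_t)\right] \\
\text{s.t.} \quad & \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\, c_k(s_t, a_t)\right] \le d_k,
\qquad k = 1, \dots, K
\end{aligned}
```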
Safe Reinforcement Learning Algorithm:
The authors propose a specialized primal-dual Safe Reinforcement Learning (SRL) algorithm, called DR-CPO, that solves the CMDP problem by leveraging reward decomposition.
DR-CPO achieves 15% higher reward than existing DRL solutions while requiring only half as many learning episodes to converge.
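The primal-dual idea can be illustrated on a deliberately small problem. The sketch below is not DR-CPO and does not reproduce its reward decomposition; it is a generic Lagrangian dual subgradient method with a best-response primal step, applied to a hypothetical single-constraint admission problem. The function name, rates, utilities, and capacity are all assumptions made for illustration.

```python
def primal_dual_admission(rates, utilities, capacity, step=0.1, iters=2000):
    """Toy primal-dual loop for a single capacity constraint.

    Maximizes sum_i rates[i] * utilities[i] * x_i subject to
    sum_i rates[i] * x_i <= capacity, with x_i in [0, 1].
    All inputs are hypothetical; this is not the article's algorithm.
    Returns the average admission frequency per class and the final
    Lagrange multiplier.
    """
    mu = 0.0  # Lagrange multiplier: the "price" of using capacity
    admit_freq = [0.0] * len(rates)
    for _ in range(iters):
        # Primal step: best response to the current price. A class is
        # admitted only if its per-flow utility exceeds the price.
        admit = [1.0 if u > mu else 0.0 for u in utilities]
        load = sum(r * a for r, a in zip(rates, admit))
        # Dual step: projected subgradient ascent on the multiplier.
        mu = max(0.0, mu + step * (load - capacity))
        for i, a in enumerate(admit):
            admit_freq[i] += a / iters
    return admit_freq, mu


# Hypothetical example: two classes with equal rates, utilities 2 and 1,
# and capacity for 1.5 flows per unit time.
freq, price = primal_dual_admission([1.0, 1.0], [2.0, 1.0], capacity=1.5)
```

In this toy instance the high-utility class is admitted in every iteration, while the time-averaged admission of the low-utility class settles near one half, so the averaged policy is fractional for a single class only. A learning-based method replaces the explicit best-response step with policy updates estimated from experience.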
Load Balancing:
The authors also investigate the joint optimization of admission control and load balancing, proposing an iterative procedure that alternates between optimizing the admission control policy and the load balancing policy.
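The alternating structure of such a procedure can be sketched as a generic block-coordinate loop. Everything below is an assumption made for illustration: each policy is reduced to a single number, and the two "best response" functions are toy closed-form maximizers of a simple concave quadratic objective, not the article's admission control or load balancing optimizers.

```python
def alternate(best_admission, best_balancing, admission, balancing,
              tol=1e-9, max_rounds=100):
    """Alternate between two optimizers until the iterates stop moving.

    best_admission(balancing) returns the best admission parameter for a
    fixed balancing parameter, and best_balancing(admission) does the
    reverse. Both callables are hypothetical placeholders.
    Returns the final pair and the number of rounds performed.
    """
    rounds = 0
    while rounds < max_rounds:
        rounds += 1
        new_admission = best_admission(balancing)
        new_balancing = best_balancing(new_admission)
        change = abs(new_admission - admission) + abs(new_balancing - balancing)
        admission, balancing = new_admission, new_balancing
        if change < tol:
            break
    return admission, balancing, rounds


# Toy objective: f(a, b) = -(a - b)**2 - (a - 1)**2 - (b - 3)**2.
# Setting each partial derivative to zero gives the closed-form updates.
def toy_best_admission(b):
    return (b + 1.0) / 2.0


def toy_best_balancing(a):
    return (a + 3.0) / 2.0


a_star, b_star, n_rounds = alternate(toy_best_admission, toy_best_balancing, 0.0, 0.0)
```

For this toy objective the loop converges to a = 5/3 and b = 7/3. In general, alternating optimization of this kind is only guaranteed to reach a point where neither block can improve on its own, which need not be a joint optimum.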
Overall, the article combines a rigorous CMDP formulation of optimal flow admission control in edge computing with efficient learning algorithms designed to handle the complexity of that model.