Core Concepts
A QoE-aware resource allocation algorithm is proposed to accelerate split inference in NOMA-based edge intelligence by optimizing the tradeoff between inference delay, resource consumption, and user quality of experience (QoE).
Summary
The key highlights and insights from the content are:
- The authors identify two important observations regarding the relationship between quality of service (QoS) and quality of experience (QoE) in edge split inference:
- Users' QoE and inference delay follow a sigmoid-like curve, allowing strict QoS requirements to be relaxed to reduce resource consumption while maintaining acceptable QoE (see the sigmoid sketch following this list).
- Minimizing the sum of all users' inference delay does not guarantee high QoE performance for the overall system due to heterogeneous user requirements.
- Based on these observations, the authors formulate a joint optimization problem for edge split inference in a NOMA-based system that minimizes inference delay and resource consumption while maximizing QoE (a schematic objective is sketched after this list).
- To solve this complex optimization problem, the authors propose a gradient descent (GD)-based algorithm called Li-GD. The key idea is to use the optimal result of the previous layer's GD procedure as the initial value for the current layer's GD, which reduces the complexity caused by the discrete parameters (see the warm-start sketch after this list).
- The authors analyze the properties of the proposed Li-GD algorithm, proving its convergence, bounding its complexity, and estimating its approximation error.
- Experimental results demonstrate that the proposed ERA algorithm outperforms previous approaches in achieving the optimal tradeoff between inference delay, resource consumption, and QoE for edge split inference.
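To make the first observation concrete, here is a minimal sketch of a sigmoid-like QoE model; the logistic form and the parameters `q_max`, `d_mid`, and `k` are illustrative assumptions, not the paper's model:

```python
import math

def qoe(delay_ms: float, q_max: float = 5.0, d_mid: float = 200.0, k: float = 0.03) -> float:
    """Sigmoid-like QoE as a function of inference delay: nearly flat for
    small delays, dropping steeply around the inflection point d_mid."""
    return q_max / (1.0 + math.exp(k * (delay_ms - d_mid)))

# Relaxing a strict 80 ms target to 120 ms barely changes QoE (~4.87 -> ~4.58
# on a 5-point scale), so resources reserved to hit 80 ms can be released.
print(qoe(80.0), qoe(120.0))
```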
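For the joint formulation, a schematic of how the three goals can be combined into one weighted objective; the weights $\lambda_1, \lambda_2, \lambda_3$ and the exact functional forms below are illustrative, not the paper's notation:

```latex
\min_{\mathbf{p},\,\boldsymbol{\beta},\,\boldsymbol{\delta}}\;
  \lambda_1 \sum_{i} T_i(\boldsymbol{\delta},\mathbf{p},\boldsymbol{\beta})
  \;+\; \lambda_2 \sum_{i} E_i(\mathbf{p},\boldsymbol{\beta})
  \;-\; \lambda_3 \sum_{i} \mathrm{QoE}_i\!\left(T_i\right)
```

Here $T_i$ is user $i$'s end-to-end inference delay, $E_i$ its resource consumption, and the decision variables are the split points $\boldsymbol{\delta}$, transmit powers $\mathbf{p}$, and subchannel assignments $\boldsymbol{\beta}$.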
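The warm-start idea behind Li-GD, as a minimal sketch: each candidate split layer defines one continuous subproblem, and the GD run for layer l starts from the optimum found for layer l-1. The per-layer quadratic losses below are toy stand-ins for the paper's delay/resource/QoE objective:

```python
import numpy as np

def gd(loss_grad, x0, lr=0.01, steps=500):
    """Plain gradient descent from initial point x0."""
    x = x0.copy()
    for _ in range(steps):
        x -= lr * loss_grad(x)
    return x

def layered_gd(loss_grads, x_init):
    """Li-GD-style layered descent: the optimum of layer l-1's GD run
    seeds layer l's GD, so each continuous subproblem (one per
    candidate split layer) starts from a nearby point."""
    x = x_init
    results = []
    for grad in loss_grads:      # one gradient oracle per layer
        x = gd(grad, x)          # warm start from previous layer's optimum
        results.append(x.copy())
    return results

# Toy usage: quadratic losses whose minima drift layer by layer.
grads = [lambda x, c=c: 2.0 * (x - c) for c in np.linspace(0.0, 1.0, 5)]
print(layered_gd(grads, np.zeros(3))[-1])  # converges to ~[1, 1, 1]
```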
Key Statistics
The computation task of each layer in the main branch can be calculated as $f_l^{\delta} = m_{\delta 1} f_{\mathrm{conv}} + m_{\delta 2} f_{\mathrm{pool}} + m_{\delta 3} f_{\mathrm{relu}}$, where $m_{\delta 1}$, $m_{\delta 2}$, and $m_{\delta 3}$ denote the number of convolutional, pooling, and ReLU layers, respectively.
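A direct transcription of this formula as code; the per-layer costs and the numbers in the usage line are illustrative:

```python
def layer_computation(m_conv: int, m_pool: int, m_relu: int,
                      f_conv: float, f_pool: float, f_relu: float) -> float:
    """f_l^delta = m_d1 * f_conv + m_d2 * f_pool + m_d3 * f_relu."""
    return m_conv * f_conv + m_pool * f_pool + m_relu * f_relu

# e.g. 2 conv, 1 pooling, 2 ReLU layers with illustrative per-layer costs
print(layer_computation(2, 1, 2, f_conv=1.2e9, f_pool=3.0e7, f_relu=1.0e7))
```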
The received signal-to-interference-plus-noise ratio (SINR) of AP n for device i on subchannel m is given by: $\Upsilon_m^{n,i} = \dfrac{p_m^{n,i}\,\lvert h_m^{n,i}\rvert^2}{\sum \beta_m^{n,v}\, p_m^{n,v}\,\lvert h_m^{n,v}\rvert^2 + \sum\sum \beta_m^{l,t}\, p_m^{l,t}\,\lvert g_m^{l,t}\rvert^2 + \sigma^2}$.
The achievable data rate of user i by SIC on subchannel k of AP j is: $\Phi_k^{j,i} = \beta_k^{j,i}\,\dfrac{B_{\mathrm{down}}}{M}\,\log_2\!\bigl(1 + \Psi_k^{j,i}\bigr)$, where $\Psi_k^{j,i}$ is the corresponding SINR.
Quotes
"Even the AI has been widely used and significantly changed our life, deploying the large AI models on resource limited edge devices directly is not appropriate."
"The model split strategies between device, edge server and cloud has been investigated by previous works, and some excellent works have been proposed, such as [12-19]. These studies find the optimal model segmentation points or early-exist points to minimize inference delay and resource requirements while maintaining high inference accuracy by reinforcement learning [12], convex optimization [13-16], heuristic algorithm [17-19], etc."
"These algorithms mainly concentrate on improving and optimizing the system's Quality of Service (QoS), such as low inference latency and energy consumption, high inference accuracy, etc., ignore the effect of Quality of Experience (QoE) which is one of the critical items for users except for QoS."