The paper considers the problem of encoding and decoding a finite-alphabet Markov source without any delay, known as the zero-delay lossy coding problem. This problem can be formulated as a Markov decision process (MDP) whose state is the belief (the conditional distribution of the current source symbol given the quantizer outputs received so far) and whose action is the choice of quantizer.
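To make the state dynamics concrete, here is a minimal sketch of the belief (nonlinear filter) recursion; the transition matrix, the representation of a quantizer as a symbol-to-bin map, and all names are illustrative assumptions, not the paper's notation.

```python
import numpy as np

def belief_update(pi, P, quantizer, q):
    """One step of the belief (nonlinear filter) recursion.

    pi        : current belief over the n source symbols, shape (n,)
    P         : transition matrix, P[i, j] = Pr(X_{t+1} = j | X_t = i)
    quantizer : symbol-to-bin map, quantizer[i] = bin index of symbol i
    q         : bin index received at this step

    Conditions the belief on the received bin, then predicts one step
    forward through the Markov chain.
    """
    posterior = pi * (quantizer == q)   # zero out symbols outside bin q
    posterior = posterior / posterior.sum()
    return posterior @ P                # one-step prediction

# Example: a 2-symbol source, identity quantizer, bin 0 received.
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])
pi = belief_update(np.array([0.5, 0.5]), P, np.array([0, 1]), q=0)
# pi is now [0.9, 0.1]: certain that X_t = 0, then predicted one step forward.
```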
The key insights are:
The MDP formulation has an uncountable state space (the set of beliefs, i.e., probability distributions on the source alphabet), so exact dynamic programming and value iteration are infeasible without approximation.
The authors present a quantized Q-learning algorithm that computes a near-optimal coding policy by discretizing the belief space and running Q-learning on the resulting finite set of quantized beliefs (a minimal sketch follows this list).
The authors prove asymptotic optimality of the proposed algorithm, first for the discounted-cost problem and then for the average-cost problem, by relating the optimal solutions under the two criteria (see the vanishing-discount relation below).
The technical analysis involves showing unique ergodicity of the belief process under a memoryless (i.i.d.) exploration policy, which underpins the convergence proof of the Q-learning algorithm.
Simulations show that the learned policies outperform existing heuristic techniques for zero-delay coding.
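The following self-contained sketch illustrates the quantized Q-learning idea under stated assumptions: the belief simplex is discretized to a finite grid, quantizers are chosen i.i.d. uniformly during learning (a memoryless exploration policy, as in the insight above), and standard Q-learning runs on the quantized beliefs. The alphabet size, transition matrix, Hamming distortion, grid resolution, and step-size schedule are all illustrative choices, not the paper's exact construction.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)

# Illustrative source: 3 symbols, 2-bin quantizers (assumed, not from the paper).
n = 3
P = np.array([[0.8, 0.1, 0.1],
              [0.1, 0.8, 0.1],
              [0.1, 0.1, 0.8]])          # Markov transition matrix
actions = [np.array(a) for a in itertools.product(range(2), repeat=n)]
beta = 0.95                              # discount factor

def simplex_grid(n, m):
    """Beliefs whose entries are multiples of 1/m: a finite grid on the simplex."""
    return np.array([np.array(c) / m
                     for c in itertools.product(range(m + 1), repeat=n)
                     if sum(c) == m])

grid = simplex_grid(n, m=10)
Q = np.zeros((len(grid), len(actions)))
visits = np.zeros_like(Q)

def nearest(pi):
    """Quantize a belief to its nearest grid point: this is the learned state."""
    return int(np.argmin(np.linalg.norm(grid - pi, axis=1)))

def expected_cost(pi, quantizer):
    """Expected Hamming distortion when the decoder outputs the most
    likely symbol within the received bin."""
    return 1.0 - sum(pi[quantizer == b].max()
                     for b in range(2) if (quantizer == b).any())

x = 0                                    # true source state (hidden from decoder)
pi = np.ones(n) / n                      # initial belief
s = nearest(pi)
for t in range(100_000):
    a = rng.integers(len(actions))       # memoryless i.i.d. exploration
    quant = actions[a]
    cost = expected_cost(pi, quant)
    q = quant[x]                         # encoder sends the bin of X_t
    post = pi * (quant == q)             # belief update: condition on the bin,
    pi = (post / post.sum()) @ P         # then predict through the chain
    s2 = nearest(pi)
    visits[s, a] += 1
    alpha = 1.0 / visits[s, a]           # decaying step size (one simple choice)
    Q[s, a] += alpha * (cost + beta * Q[s2].min() - Q[s, a])
    s = s2
    x = rng.choice(n, p=P[x])            # source makes its Markov transition

greedy = Q.argmin(axis=1)                # learned quantizer for each grid belief
```

For beta close to 1, the greedy policy extracted from Q approximates the discounted-optimal coding policy; the paper's guarantees are asymptotic in the grid resolution, which a toy resolution of m = 10 does not pretend to reach.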
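On relating the two cost criteria: the standard tool (and plausibly the one intended here, though that is an assumption) is the vanishing-discount argument. Writing J^*_\beta(\pi) for the optimal \beta-discounted cost from belief \pi, the optimal average cost \rho^* satisfies

    \rho^* = \lim_{\beta \uparrow 1} (1 - \beta)\, J^*_\beta(\pi),

independently of the initial belief \pi under suitable ergodicity conditions, so a policy that is near-optimal for \beta close enough to 1 is also near-optimal for the average cost.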
Source: Liam Cregg et al., arXiv:2311.12609, https://arxiv.org/pdf/2311.12609.pdf