Core Concepts
A multi-agent and self-adaptive framework utilizing deep reinforcement learning to dynamically balance the trade-off between portfolio returns and risks under volatile financial market conditions.
Abstract
The paper proposes a multi-agent and self-adaptive framework, called MASA, to address the limitations of single-agent deep reinforcement learning (RL) approaches in portfolio management. The key aspects of the MASA framework are:
-
It employs two cooperating agents - an RL-based agent and a solver-based agent. The RL-based agent uses the TD3 algorithm to optimize the overall portfolio returns, while the solver-based agent adjusts the portfolio to minimize potential risks.
-
It integrates a market observer agent that provides estimated market trends as additional feedback to help the RL-based and solver-based agents quickly adapt to changing market conditions.
-
The multi-agent RL scheme of MASA aims to achieve a better balance between portfolio returns and risks compared to single-agent RL approaches, especially in highly volatile financial markets.
-
The MASA framework adopts a loosely-coupled and pipelining computational model, making it more resilient and reliable as the overall framework can continue to work even if any individual agent fails.
The paper evaluates the MASA framework on challenging datasets of the CSI 300, Dow Jones Industrial Average, and S&P 500 indexes over the past 10 years. The results demonstrate the potential strengths of the MASA framework in balancing portfolio returns and risks compared to various well-known RL-based approaches.
Stats
"The total value of a portfolio at time t is Ct = ∑N i=1 at,i × pct,i, where N is the number of assets in a portfolio, at,i is the weight of ith asset, and pct,i is the close price of ith asset at time t."
"The short-term portfolio risk σp,t at time t is defined as σp,t = σβ + σα,t, where σα,t = √A⊤t Σk At = ∥Σk At∥2, and the covariance matrix Σk∈RN×N between any two assets can be calculated by the rate of daily returns of assets in the past k days."
"The long-term portfolio risk Volp is defined as the strategy volatility that is the sampled variance of the daily return rates rp,t of a trading strategy over the whole trading period."
"The Sharpe Ratio (SR) is a performance indicator for evaluating a portfolio in terms of the total annualized returns Rp, risk-free rate rf and annualized long-term portfolio risk Volp."
Quotes
"Deep or reinforcement learning (RL) approaches have been adapted as reactive agents to quickly learn and respond with new investment strategies for portfolio management under the highly turbulent financial market environments in recent years."
"To overcome the above pitfall, a multi-agent and self-adaptive framework namely the MASA is proposed in this work in which two cooperating and reactive agents are utilised to implement a radically new multi-agent RL scheme so as to carefully and dynamically balance the trade-off between the overall returns of the newly revised portfolio and their potential risks especially when the concerned financial markets are highly turbulent."