Core Concepts
A deep reinforcement learning framework can outperform conventional portfolio management strategies by dynamically allocating weights to assets to maximize risk-adjusted returns.
Summary
The paper proposes a deep reinforcement learning (DRL) framework for portfolio management. The framework consists of an environment and an agent that interact to devise an optimal trading algorithm.
The environment provides the agent with the current state of the portfolio, which includes preprocessed asset prices, moving averages, and a correlation matrix. The agent's task is to assign weights to the assets in the portfolio so as to maximize the cumulative reward, where each step's reward is the portfolio's daily return.
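As a rough illustration of how such a state and reward could be assembled (a minimal sketch, assuming daily close prices in a `days × assets` array; the moving-average window and preprocessing details are assumptions, not the paper's exact specification):

```python
import numpy as np

def build_state(prices: np.ndarray, window: int = 20) -> dict:
    """Assemble a state from close prices (shape: days x assets).

    Mirrors the description above: preprocessed prices, moving
    averages, and a correlation matrix. The window size is assumed.
    """
    returns = prices[1:] / prices[:-1] - 1.0      # daily simple returns
    moving_avg = prices[-window:].mean(axis=0)    # per-asset moving average
    corr = np.corrcoef(returns[-window:].T)       # asset correlation matrix
    return {"prices": prices[-1], "moving_avg": moving_avg, "corr": corr}

def step_reward(weights: np.ndarray, todays_returns: np.ndarray) -> float:
    """Reward for one step: the portfolio's daily return under `weights`."""
    return float(weights @ todays_returns)
```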
The agent uses a combination of exploration (random weight assignment) and exploitation (using a neural network to predict optimal weights) to learn the optimal policy. A replay buffer stores past experiences, which are then sampled to train the neural network that approximates the Q-function.
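A minimal sketch of these two mechanisms, with the exploration rule taken from the quoted passage at the end of this summary (uniform weights in [-1, 1], normalized by the sum of absolute values); the epsilon-greedy switch and the buffer capacity are illustrative assumptions:

```python
import random
from collections import deque

import numpy as np

class ReplayBuffer:
    """Records experience tuples of (previous state, action, current
    state, reward), as described in the paper."""
    def __init__(self, capacity: int = 10_000):
        self.buffer = deque(maxlen=capacity)

    def push(self, prev_state, action, state, reward):
        self.buffer.append((prev_state, action, state, reward))

    def sample(self, batch_size: int):
        return random.sample(self.buffer, batch_size)

def explore(n_assets: int, rng: np.random.Generator) -> np.ndarray:
    """Exploration: uniform random weights in [-1, 1], divided by the
    absolute vector sum, per the quoted passage."""
    w = rng.uniform(-1.0, 1.0, size=n_assets)
    return w / np.abs(w).sum()

def choose_weights(state, predict, n_assets, epsilon, rng):
    """Epsilon-greedy trade-off: explore with probability epsilon,
    otherwise exploit. `predict` is a stand-in for the trained
    network's forward pass (an assumed interface)."""
    if rng.random() < epsilon:
        return explore(n_assets, rng)
    return predict(state)
```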
The performance of the DRL model is compared against conventional portfolio management strategies such as minimum-variance and maximum-return allocation. The DRL model outperforms these strategies in terms of risk-adjusted returns, as measured by the Sharpe ratio.
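For reference, the Sharpe ratio used for this comparison can be computed from a series of daily returns as follows (the 252-trading-day annualization factor and the zero default risk-free rate are common conventions assumed here, not stated in the paper summary):

```python
import numpy as np

def sharpe_ratio(daily_returns: np.ndarray,
                 risk_free_rate: float = 0.0,
                 periods_per_year: int = 252) -> float:
    """Annualized Sharpe ratio: mean excess return over its volatility."""
    excess = daily_returns - risk_free_rate / periods_per_year
    return float(np.sqrt(periods_per_year) * excess.mean() / excess.std(ddof=1))
```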
The key insights from the paper are:
- Eliminating the intermediate step of discrete buy, hold, or sell decisions and directly assigning weights to the assets can lead to better portfolio optimization.
- Incorporating a correlation matrix in the state representation can help the agent learn to diversify risk across the portfolio.
- The use of a replay buffer and a neural network to approximate the Q-function can help the agent learn the optimal policy more efficiently (see the training-step sketch after this list).
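A minimal sketch of how the Q-function approximation might be fitted from replay-buffer samples; the network architecture, discount factor, and one-step TD target are illustrative assumptions rather than the paper's specification, and `next_action` stands in for whatever rule the agent uses to pick weights at the next state:

```python
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Scores a (state, weight-vector) pair; architecture is assumed."""
    def __init__(self, state_dim: int, action_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, 64),
            nn.ReLU(),
            nn.Linear(64, 1),
        )

    def forward(self, state: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([state, action], dim=-1)).squeeze(-1)

def train_step(q_net, optimizer, batch, gamma: float = 0.99) -> float:
    """One gradient step fitting Q(s, a) toward r + gamma * Q(s', a')."""
    prev_state, action, state, reward, next_action = batch
    with torch.no_grad():
        target = reward + gamma * q_net(state, next_action)
    loss = nn.functional.mse_loss(q_net(prev_state, action), target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```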
Overall, the paper demonstrates the potential of DRL in portfolio management and provides a framework for further research in this area.
Statistics
The portfolio consists of 28 assets, including cryptocurrencies and ETFs.
The data is collected from Yahoo Finance, starting from January 1, 2010 for stocks and January 1, 2016 for cryptocurrencies.
The time-series data includes the following features: open price, high price, low price, close price, volume, and adjusted close price.
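A minimal sketch of pulling such data with the yfinance package (one common way to query Yahoo Finance; the tickers below are placeholders, not the paper's 28-asset universe):

```python
import yfinance as yf

# Placeholder tickers, not the paper's asset list.
stocks = yf.download(["SPY", "QQQ"], start="2010-01-01", auto_adjust=False)
crypto = yf.download(["BTC-USD", "ETH-USD"], start="2016-01-01", auto_adjust=False)

# Columns include Open, High, Low, Close, Adj Close, and Volume.
print(stocks[["Close", "Adj Close", "Volume"]].tail())
```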
Quotes
"The exploration process assigns weights through a uniform random function in the range -1 and 1 in a vector. The vector is divided by the absolute vector sum to make the sum of elements one."
"The replay buffer provides a means to record the experience. This experience is a set of the previous state, current actions, current state, and the reward."