Deep Reinforcement Learning

Variational Dynamic Model for Efficient Self-Supervised Exploration in Deep Reinforcement Learning

Modeling the multimodality and stochasticity of environment dynamics with a variational dynamic model (VDM) can make self-supervised exploration in deep reinforcement learning more efficient.

Efficient Batch Normalization in Deep Reinforcement Learning for Greater Sample Efficiency and Simplicity

CrossQ is a lightweight algorithm for continuous control tasks that makes careful use of Batch Normalization and removes target networks to surpass the current state of the art in sample efficiency, while maintaining a low update-to-data (UTD) ratio of 1.

Analysis of Deep RL with High Update Ratios

Even at high update ratios, deep reinforcement learning can be effective without resetting network parameters, provided the Q-values are handled correctly.

Stop Regressing: Training Value Functions via Classification for Scalable Deep RL

Using classification losses significantly improves the performance and scalability of deep reinforcement learning.

Training Value Functions for Scalable Deep RL Using Classification Instead of Regression

Value functions trained with categorical cross-entropy significantly improve performance and scalability in various domains, showcasing the potential of using classification instead of regression in deep RL.
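
To make the idea of training a value function with categorical cross-entropy more concrete, here is a minimal sketch in plain Python. It assumes a "two-hot" encoding, which is one common way to turn a scalar value target into a distribution over fixed bins; the summary above does not say which encoding the paper uses, so treat that choice, the bin range, and the helper names (`two_hot`, `cross_entropy`, `expected_value`) as illustrative assumptions rather than the authors' method.

```python
import math


def two_hot(target, v_min, v_max, num_bins):
    """Spread a scalar target over its two nearest bin centers (assumed encoding)."""
    if num_bins < 2:
        raise ValueError("num_bins must be at least 2")
    # Clip the target into the supported value range.
    target = min(max(target, v_min), v_max)
    width = (v_max - v_min) / (num_bins - 1)
    pos = (target - v_min) / width  # fractional bin index
    lower = min(int(math.floor(pos)), num_bins - 2)
    upper_weight = pos - lower
    probs = [0.0] * num_bins
    probs[lower] = 1.0 - upper_weight
    probs[lower + 1] = upper_weight
    return probs


def log_softmax(logits):
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def cross_entropy(logits, target_probs):
    """Categorical cross-entropy between predicted logits and a target distribution."""
    return -sum(t * lp for t, lp in zip(target_probs, log_softmax(logits)))


def expected_value(logits, v_min, v_max):
    """Recover a scalar value estimate as the mean of the predicted distribution."""
    width = (v_max - v_min) / (len(logits) - 1)
    probs = [math.exp(lp) for lp in log_softmax(logits)]
    return sum(p * (v_min + i * width) for i, p in enumerate(probs))


# Usage: the network would output `logits`; here they are uniform placeholders.
logits = [0.0] * 5
target_dist = two_hot(0.3, v_min=0.0, v_max=1.0, num_bins=5)
loss = cross_entropy(logits, target_dist)
```

In this setup the value head outputs one logit per bin instead of a single scalar, the loss is the cross-entropy against the encoded target, and a scalar value is read back as the expectation over bin centers.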

Snapshot Reinforcement Learning: Leveraging Prior Trajectories for Efficiency in Deep Reinforcement Learning

Increasing efficiency in deep reinforcement learning by leveraging prior trajectories.

Enhancing Exploration Timing with Value Discrepancy and State Counts in Deep Reinforcement Learning

Leveraging Value Discrepancy and State Counts optimizes exploration timing in Deep Reinforcement Learning.

CrossQ: Batch Normalization in Deep Reinforcement Learning for Sample Efficiency

CrossQ introduces a lightweight algorithm using Batch Normalization to improve sample efficiency in Deep RL.

CrossQ: Improving Sample Efficiency in Deep Reinforcement Learning with Batch Normalization

CrossQ introduces a lightweight algorithm for continuous control tasks that enhances sample efficiency by leveraging Batch Normalization and eliminating target networks.

Uni-O4: Unifying Online and Offline Deep Reinforcement Learning with Multi-Step On-Policy Optimization at ICLR 2024

Proposing Uni-O4 for seamless offline and online learning with on-policy optimization.
