Citation: ZHANG Longfei, FENG Yanghe, LIANG Xingxing, LIU Shixuan, CHENG Guangquan, HUANG Jincai. Sample strategy based on TD-error for offline reinforcement learning[J]. Chinese Journal of Engineering. doi: 10.13374/j.issn2095-9389.2022.10.22.001
[1] Vinyals O, Babuschkin I, Czarnecki W M, et al. Grandmaster level in StarCraft II using multi-agent reinforcement learning. Nature, 2019, 575(7782): 350 doi: 10.1038/s41586-019-1724-z
[2] Kiran B R, Sobh I, Talpaert V, et al. Deep reinforcement learning for autonomous driving: A survey. IEEE Trans Intell Transp Syst, 2022, 23(6): 4909 doi: 10.1109/TITS.2021.3054625
[3] Degrave J, Felici F, Buchli J, et al. Magnetic control of tokamak plasmas through deep reinforcement learning. Nature, 2022, 602(7897): 414
[4] Fawzi A, Balog M, Huang A, et al. Discovering faster matrix multiplication algorithms with reinforcement learning. Nature, 2022, 610(7930): 47 doi: 10.1038/s41586-022-05172-4
[5] Liang X X, Feng Y H, Huang J C, et al. Novel deep reinforcement learning algorithm based on attention-based value function and autoregressive environment model. J Softw, 2020, 31(4): 948
[6] Mnih V, Badia A P, Mirza M, et al. Asynchronous methods for deep reinforcement learning // International Conference on Machine Learning. New York, 2016: 1928
[7] Haarnoja T, Zhou A, Abbeel P, et al. Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor // International Conference on Machine Learning. Stockholm, 2018: 1861
[8] Fujimoto S, Hoof H, Meger D. Addressing function approximation error in actor-critic methods // International Conference on Machine Learning. Stockholm, 2018: 1587
[9] Hafner D, Lillicrap T, Fischer I, et al. Learning latent dynamics for planning from pixels // International Conference on Machine Learning. California, 2019: 2555
[10] Hafner D, Lillicrap T, Ba J, et al. Dream to control: Learning behaviors by latent imagination[J/OL]. arXiv preprint (2020-05-17) [2022-10-22]. https://arxiv.org/abs/1912.01603
[11] Hafner D, Lillicrap T, Norouzi M, et al. Mastering Atari with discrete world models[J/OL]. arXiv preprint (2022-02-12) [2022-10-22]. https://arxiv.org/abs/2010.02193
[12] Fujimoto S, Meger D, Precup D. Off-policy deep reinforcement learning without exploration // International Conference on Machine Learning. California, 2019: 2052
[13] Zhang L F, Zhang Y L, Liu S X, et al. ORAD: A new framework of offline reinforcement learning with Q-value regularization. Evol Intel, 2022: 1
[14] Mao Y H, Wang C, Wang B, et al. MOORe: Model-based offline-to-online reinforcement learning[J/OL]. arXiv preprint (2022-01-25) [2022-10-22]. https://arxiv.org/abs/2201.10070
[15] Fujimoto S, Gu S S. A minimalist approach to offline reinforcement learning. Adv Neural Inf Process Syst, 2021, 34: 20132
[16] Kumar A, Zhou A, Tucker G, et al. Conservative Q-learning for offline reinforcement learning // Proceedings of the 34th International Conference on Neural Information Processing Systems. Vancouver, 2020: 1179
[17] Fu J, Kumar A, Nachum O, et al. D4RL: Datasets for deep data-driven reinforcement learning[J/OL]. arXiv preprint (2021-02-06) [2022-10-22]. https://arxiv.org/abs/2004.07219
[18] Schaul T, Quan J, Antonoglou I, et al. Prioritized experience replay[J/OL]. arXiv preprint (2016-02-25) [2022-10-22]. https://arxiv.org/abs/1511.05952
[19] Liu H, Trott A, Socher R, et al. Competitive experience replay[J/OL]. arXiv preprint (2019-02-17) [2022-10-22]. https://arxiv.org/abs/1902.00528
[20] Fu Y W, Wu D, Boulet B. Benchmarking sample selection strategies for batch reinforcement learning[J/OL]. OpenReview.net (2022-01-29) [2022-10-22]. https://openreview.net/forum?id=WxBFVNbDUT6
[21] Lee S, Seo Y, Lee K, et al. Offline-to-online reinforcement learning via balanced replay and pessimistic Q-ensemble // Conference on Robot Learning. London, 2022: 1702
[22] Bellman R. A Markovian decision process. J Math Mech, 1957: 679
[23] Hessel M, Modayil J, Van Hasselt H, et al. Rainbow: Combining improvements in deep reinforcement learning // The Thirty-Second AAAI Conference on Artificial Intelligence. New Orleans, 2018: 3215
[24] Schulman J, Levine S, Abbeel P, et al. Trust region policy optimization // International Conference on Machine Learning. Lille, 2015: 1889
[25] Schulman J, Wolski F, Dhariwal P, et al. Proximal policy optimization algorithms[J/OL]. arXiv preprint (2017-08-28) [2022-10-22]. https://arxiv.org/abs/1707.06347
[26] Sutton R S, Barto A G. Reinforcement Learning: An Introduction. Cambridge: MIT Press, 2018
[27] Mnih V, Kavukcuoglu K, Silver D, et al. Human-level control through deep reinforcement learning. Nature, 2015, 518(7540): 529 doi: 10.1038/nature14236