Citation: XU Cong, LI Qing, ZHANG De-zheng, CHEN Peng, CUI Jia-rui. Research progress of deep reinforcement learning applied to text generation[J]. Chinese Journal of Engineering, 2020, 42(4): 399-411. doi: 10.13374/j.issn2095-9389.2019.06.16.030
[1] Sutton R S, Barto A G. Reinforcement Learning: An Introduction. 2nd Ed. Massachusetts: MIT Press, 2018
[2] Mnih V, Kavukcuoglu K, Silver D, et al. Human-level control through deep reinforcement learning. Nature, 2015, 518(7540): 529 doi: 10.1038/nature14236
[3] Silver D, Huang A, Maddison C J, et al. Mastering the game of Go with deep neural networks and tree search. Nature, 2016, 529(7587): 484 doi: 10.1038/nature16961
[4] LeCun Y, Bengio Y, Hinton G. Deep learning. Nature, 2015, 521(7553): 436 doi: 10.1038/nature14539
[5] Littman M L. Reinforcement learning improves behaviour from evaluative feedback. Nature, 2015, 521(7553): 445 doi: 10.1038/nature14540
[6] Li Y X. Deep reinforcement learning: an overview[J/OL]. arXiv Preprint (2017-09-15) [2019-06-16]. https://arxiv.org/abs/1701.07274
[7] Baroni M, Zamparelli R. Nouns are vectors, adjectives are matrices: representing adjective-noun constructions in semantic space // Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing. Cambridge, 2010: 1183
[8] Mitchell J, Lapata M. Vector-based models of semantic composition // Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics. Columbus, 2008: 236
[9] Su P H, Gašić M, Mrkšić N, et al. On-line active reward learning for policy optimisation in spoken dialogue systems // Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics. Berlin, 2016: 2431
[10] Vinyals O, Le Q. A neural conversational model[J/OL]. arXiv Preprint (2015-07-22) [2019-06-16]. https://arxiv.org/abs/1506.05869
[11] Wen T H, Vandyke D, Mrkšić N, et al. A network-based end-to-end trainable task-oriented dialogue system[J/OL]. arXiv Preprint (2017-04-24) [2019-06-16]. https://arxiv.org/abs/1604.04562
[12] Wen T H, Gašić M, Kim D, et al. Stochastic language generation in dialogue using recurrent neural networks with convolutional sentence reranking // Proceedings of the 16th Annual Meeting of the Special Interest Group on Discourse and Dialogue. Prague, 2015: 275
[13] Henderson M, Thomson B, Williams J. The second dialog state tracking challenge // Proceedings of the 15th Annual Meeting of the Special Interest Group on Discourse and Dialogue. Philadelphia, 2014: 263
[14] Eric M, Manning C D. Key-value retrieval networks for task-oriented dialogue[J/OL]. arXiv Preprint (2017-07-14) [2019-06-16]. https://arxiv.org/abs/1705.05414
[15] Lowe R, Pow N, Serban I V, et al. The Ubuntu Dialogue Corpus: a large dataset for research in unstructured multi-turn dialogue systems // Proceedings of the 16th Annual Meeting of the Special Interest Group on Discourse and Dialogue. Prague, 2015: 285
[16] Brown P F, Della Pietra V J, Della Pietra S A, et al. The mathematics of statistical machine translation: Parameter estimation. Comput Linguist, 1993, 19(2): 263
[17] Koehn P, Och F J, Marcu D. Statistical phrase-based translation // Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology. Edmonton, 2003: 48
[18] Zhang J J, Zong C Q. Deep neural networks in machine translation: an overview. IEEE Intell Syst, 2015, 30(5): 16 doi: 10.1109/MIS.2015.69
[19] Sutskever I, Vinyals O, Le Q V. Sequence to sequence learning with neural networks // Advances in Neural Information Processing Systems. Montréal, 2014: 3104
[20] Cho K, van Merriënboer B, Bahdanau D, et al. On the properties of neural machine translation: encoder–decoder approaches // Proceedings of SSST-8, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation. Doha, 2014: 103
[21] Luong M T, Pham H, Manning C D. Effective approaches to attention-based neural machine translation // Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing. Lisbon, 2015: 1412
[22] Wu Y H, Schuster M, Chen Z F, et al. Google’s neural machine translation system: bridging the gap between human and machine translation[J/OL]. arXiv Preprint (2016-10-08) [2019-06-16]. https://arxiv.org/abs/1609.08144
[23] He Z J. Baidu translate: research and products // Proceedings of the ACL 2015 Fourth Workshop on Hybrid Approaches to Translation (HyTra). Beijing, 2015: 61
[24] Cho K, van Merriënboer B, Gulcehre C, et al. Learning phrase representations using RNN encoder–decoder for statistical machine translation // Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing. Doha, 2014: 1724
[25] Xu K, Ba J L, Kiros R, et al. Show, attend and tell: Neural image caption generation with visual attention // Proceedings of the 32nd International Conference on Machine Learning. Lille, 2015: 2048
[26] Das A, Kottur S, Gupta K, et al. Visual dialog[J/OL]. arXiv Preprint (2017-08-01) [2019-06-16]. https://arxiv.org/abs/1611.08669
[27] Hodosh M, Young P, Hockenmaier J. Framing image description as a ranking task: Data, models and evaluation metrics. J Artif Intell Res, 2013, 47: 853 doi: 10.1613/jair.3994
[28] Young P, Lai A, Hodosh M, et al. From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions. Trans Assoc Comput Linguist, 2014, 2: 67
[29] Lin T Y, Maire M, Belongie S, et al. Microsoft COCO: Common objects in context // Proceedings of the European Conference on Computer Vision. Zurich, 2014: 740
[30] Van Hasselt H, Guez A, Silver D. Deep reinforcement learning with double Q-learning // Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence. Phoenix, 2016: 2094
[31] Schaul T, Quan J, Antonoglou I, et al. Prioritized experience replay[J/OL]. arXiv Preprint (2016-02-25) [2019-06-16]. https://arxiv.org/abs/1511.05952
[32] Wang Z, Schaul T, Hessel M, et al. Dueling network architectures for deep reinforcement learning // Proceedings of the 33rd International Conference on Machine Learning. New York, 2016: 1995
[33] Schulman J, Levine S, Moritz P, et al. Trust region policy optimization // Proceedings of the 32nd International Conference on Machine Learning. Lille, 2015: 1889
[34] Kandasamy K, Bachrach Y, Tomioka R, et al. Batch policy gradient methods for improving neural conversation models[J/OL]. arXiv Preprint (2017-02-10) [2019-06-16]. https://arxiv.org/abs/1702.03334
[35] Bhatnagar S, Sutton R S, Ghavamzadeh M, et al. Natural actor-critic algorithms. Automatica, 2009, 45(11): 2471 doi: 10.1016/j.automatica.2009.07.008
[36] Grondman I, Busoniu L, Lopes G A D, et al. A survey of actor-critic reinforcement learning: standard and natural policy gradients. IEEE Trans Syst Man Cybern Part C Appl Rev, 2012, 42(6): 1291 doi: 10.1109/TSMCC.2012.2218595
[37] Mnih V, Badia A P, Mirza M, et al. Asynchronous methods for deep reinforcement learning // Proceedings of the 33rd International Conference on Machine Learning. New York, 2016: 1928
[38] Lillicrap T P, Hunt J J, Pritzel A, et al. Continuous control with deep reinforcement learning[J/OL]. arXiv Preprint (2016-02-29) [2019-06-16]. https://arxiv.org/abs/1509.02971
[39] Kulkarni T D, Saeedi A, Gautam S, et al. Deep successor reinforcement learning[J/OL]. arXiv Preprint (2016-06-08) [2019-06-16]. https://arxiv.org/abs/1606.02396
[40] Xu C, Li Q, Zhang D, et al. Deep successor feature learning for text generation[J/OL]. Neurocomputing (2019-04-25) [2019-06-16]. https://doi.org/10.1016/j.neucom.2018.11.116
[41] Zhang J W, Springenberg J T, Boedecker J, et al. Deep reinforcement learning with successor features for navigation across similar environments[J/OL]. arXiv Preprint (2017-07-23) [2019-06-16]. https://arxiv.org/abs/1612.05533
[42] Bowling M, Burch N, Johanson M, et al. Heads-up limit hold’em poker is solved. Science, 2015, 347(6218): 145 doi: 10.1126/science.1259433
[43] Liu X, Xia T, Wang J, et al. Fully convolutional attention localization networks for fine-grained recognition[J/OL]. arXiv Preprint (2017-03-21) [2019-06-16]. https://arxiv.org/abs/1603.06765
[44] Zoph B, Le Q V. Neural architecture search with reinforcement learning[J/OL]. arXiv Preprint (2017-02-15) [2019-06-16]. https://arxiv.org/abs/1611.01578
[45] Theocharous G, Thomas P S, Ghavamzadeh M. Personalized ad recommendation systems for life-time value optimization with guarantees // Proceedings of the 24th International Joint Conference on Artificial Intelligence. Buenos Aires, 2015: 1806
[46] Cuayáhuitl H. SimpleDS: A simple deep reinforcement learning dialogue system // Dialogues with Social Robots. Singapore: Springer, 2017: 109
[47] He D, Xia Y C, Qin T, et al. Dual learning for machine translation // Advances in Neural Information Processing Systems. Barcelona, 2016: 820
[48] Zhang X X, Lapata M. Sentence simplification with deep reinforcement learning // Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing. Copenhagen, 2017: 584
[49] Narasimhan K, Kulkarni T D, Barzilay R. Language understanding for text-based games using deep reinforcement learning // Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing. Lisbon, 2015: 1001
[50] Williams R J, Zipser D. A learning algorithm for continually running fully recurrent neural networks. Neural Comput, 1989, 1(2): 270 doi: 10.1162/neco.1989.1.2.270
[51] Hochreiter S, Schmidhuber J. Long short-term memory. Neural Comput, 1997, 9(8): 1735 doi: 10.1162/neco.1997.9.8.1735
[52] He J, Chen J, He X, et al. Deep reinforcement learning with a natural language action space // Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics. Berlin, 2016: 1621
[53] Guo H. Generating text with deep reinforcement learning[J/OL]. arXiv Preprint (2015-10-30) [2019-06-16]. https://arxiv.org/abs/1510.09202
[54] Papineni K, Roukos S, Ward T, et al. BLEU: a method for automatic evaluation of machine translation // Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics. Philadelphia, 2002: 311
[55] Sutton R S, McAllester D A, Singh S P, et al. Policy gradient methods for reinforcement learning with function approximation // Advances in Neural Information Processing Systems. Denver, 2000: 1057
[56] Ranzato M A, Chopra S, Auli M, et al. Sequence level training with recurrent neural networks[J/OL]. arXiv Preprint (2016-05-06) [2019-06-16]. https://arxiv.org/abs/1511.06732
[57] Li J W, Monroe W, Shi T L, et al. Adversarial learning for neural dialogue generation[J/OL]. arXiv Preprint (2017-09-24) [2019-06-16]. https://arxiv.org/abs/1701.06547
[58] Lin C Y. ROUGE: A package for automatic evaluation of summaries // Proceedings of the ACL-04 Workshop on Text Summarization Branches Out. Barcelona, 2004: 8
[59] Rennie S J, Marcheret E, Mroueh Y, et al. Self-critical sequence training for image captioning[J/OL]. arXiv Preprint (2017-11-16) [2019-06-16]. https://arxiv.org/abs/1612.00563
[60] Vedantam R, Zitnick C L, Parikh D. CIDEr: Consensus-based image description evaluation // Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Boston, 2015: 4566
[61] Banerjee S, Lavie A. METEOR: An automatic metric for MT evaluation with improved correlation with human judgments // Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization. Ann Arbor, 2005: 65
[62] Wang L, Yao J L, Tao Y Z, et al. A reinforced topic-aware convolutional sequence-to-sequence model for abstractive text summarization // Proceedings of the 27th International Joint Conference on Artificial Intelligence. Stockholm, 2018: 4453
[63] Wu Y X, Hu B T. Learning to extract coherent summary via deep reinforcement learning // Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence. New Orleans, 2018: 5602
[64] Li J W, Monroe W, Ritter A, et al. Deep reinforcement learning for dialogue generation[J/OL]. arXiv Preprint (2016-09-29) [2019-06-16]. https://arxiv.org/abs/1606.01541
[65] Takanobu R, Huang M, Zhao Z Z, et al. A weakly supervised method for topic segmentation and labeling in goal-oriented dialogues via reinforcement learning // Proceedings of the 27th International Joint Conference on Artificial Intelligence. Stockholm, 2018: 4403
[66] Bahdanau D, Brakel P, Xu K, et al. An actor-critic algorithm for sequence prediction[J/OL]. arXiv Preprint (2017-03-03) [2019-06-16]. https://arxiv.org/abs/1607.07086
[67] Su P H, Budzianowski P, Ultes S, et al. Sample-efficient actor-critic reinforcement learning with supervised data for dialogue management[J/OL]. arXiv Preprint (2017-07-05) [2019-06-16]. https://arxiv.org/abs/1707.00130
[68] Wang Z Y, Bapst V, Heess N, et al. Sample efficient actor-critic with experience replay[J/OL]. arXiv Preprint (2017-07-10) [2019-06-16]. https://arxiv.org/abs/1611.01224
[69] Peters J, Schaal S. Natural actor-critic. Neurocomputing, 2008, 71(7-9): 1180 doi: 10.1016/j.neucom.2007.11.026
[70] Chen L, Su P H, Gašić M. Hyper-parameter optimisation of Gaussian process reinforcement learning for statistical dialogue management // Proceedings of the 16th Annual Meeting of the Special Interest Group on Discourse and Dialogue. Prague, 2015: 407
[71] Goodfellow I, Pouget-Abadie J, Mirza M, et al. Generative adversarial nets // Advances in Neural Information Processing Systems. Montréal, 2014: 2672
[72] Yu L T, Zhang W N, Wang J, et al. SeqGAN: Sequence generative adversarial nets with policy gradient // Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence. San Francisco, 2017: 2852
[73] Pfau D, Vinyals O. Connecting generative adversarial networks and actor-critic methods[J/OL]. arXiv Preprint (2017-01-18) [2019-06-16]. https://arxiv.org/abs/1610.01945
[74] Serban I V, Sankar C, Germain M, et al. A deep reinforcement learning chatbot[J/OL]. arXiv Preprint (2017-11-05) [2019-06-16]. https://arxiv.org/abs/1709.02349
[75] He D, Lu H Q, Xia Y C, et al. Decoding with value networks for neural machine translation // Advances in Neural Information Processing Systems. Long Beach, 2017: 177
[76] Mnih V, Badia A P, Mirza M, et al. Asynchronous methods for deep reinforcement learning // Proceedings of the 33rd International Conference on Machine Learning. New York, 2016: 1928
[77] Casanueva I, Budzianowski P, Su P H, et al. Feudal reinforcement learning for dialogue management in large domains // Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. New Orleans, 2018: 714
[78] Dayan P, Hinton G E. Feudal reinforcement learning // Advances in Neural Information Processing Systems. Denver, 1993: 271
[79] Xiong W, Hoang T, Wang W Y. DeepPath: a reinforcement learning method for knowledge graph reasoning // Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing. Copenhagen, 2017: 564
[80] Buck C, Bulian J, Ciaramita M, et al. Ask the right questions: active question reformulation with reinforcement learning[J/OL]. arXiv Preprint (2018-03-02) [2019-06-16]. https://arxiv.org/abs/1705.07830
[81] Feng J, Huang M L, Zhao L, et al. Reinforcement learning for relation classification from noisy data // Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence. New Orleans, 2018: 5779
[82] Zhang T Y, Huang M L, Zhao L. Learning structured representation for text classification via reinforcement learning // Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence. New Orleans, 2018: 6053