Citation: LIU Jian-wei, LIU Jun-wen, LUO Xiong-lin. Research progress in attention mechanism in deep learning[J]. Chinese Journal of Engineering, 2021, 43(11): 1499-1511. doi: 10.13374/j.issn2095-9389.2021.01.30.005
[1] Carrasco M. Visual attention: The past 25 years. Vision Res, 2011, 51(13): 1484 doi: 10.1016/j.visres.2011.04.012
[2] Walther D, Rutishauser U, Koch C, et al. Selective visual attention enables learning and recognition of multiple objects in cluttered scenes. Comput Vis Image Underst, 2005, 100(1-2): 41 doi: 10.1016/j.cviu.2004.09.004
[3] Bahdanau D, Cho K, Bengio Y. Neural machine translation by jointly learning to align and translate // Proceedings of the 3rd International Conference on Learning Representations. San Diego, 2015: 1
[4] Cho K, van Merriënboer B, Gulcehre C, et al. Learning phrase representations using RNN encoder-decoder for statistical machine translation [J/OL]. arXiv preprint (2014-6-3) [2020-12-16]. https://arxiv.org/abs/1406.1078
[5] Xu K, Ba J L, Kiros R, et al. Show, attend and tell: Neural image caption generation with visual attention // Proceedings of the 32nd International Conference on Machine Learning. Lille, 2015: 2048
[6] Luong T, Pham H, Manning C D. Effective approaches to attention-based neural machine translation // Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing. Lisbon, 2015: 1412
[7] Yang Z C, Yang D Y, Dyer C, et al. Hierarchical attention networks for document classification // Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. San Diego, 2016: 1480
[8] Zhang J M, Bargal S A, Lin Z, et al. Top-down neural attention by excitation backprop. Int J Comput Vis, 2018, 126(10): 1084 doi: 10.1007/s11263-017-1059-x
[9] Gehring J, Auli M, Grangier D, et al. Convolutional sequence to sequence learning // Proceedings of the 34th International Conference on Machine Learning. Sydney, 2017: 1243
[10] Vaswani A, Shazeer N, Parmar N, et al. Attention is all you need // Proceedings of the 31st International Conference on Neural Information Processing Systems. Long Beach, 2017: 6000
[11] Shen T, Zhou T Y, Long G D, et al. DiSAN: Directional self-attention network for RNN/CNN-free language understanding // Proceedings of the AAAI Conference on Artificial Intelligence. New Orleans, 2018: 32
[12] Lin Z H, Feng M W, Santos C N, et al. A structured self-attentive sentence embedding [J/OL]. arXiv preprint (2017-3-9) [2020-12-16]. https://arxiv.org/abs/1703.03130
[13] Shen T, Zhou T Y, Long G D, et al. Bi-directional block self-attention for fast and memory-efficient sequence modeling // Proceedings of the 6th International Conference on Learning Representations. Vancouver, 2018: 1
[14] Shen T, Zhou T Y, Long G D, et al. Reinforced self-attention network: A hybrid of hard and soft attention for sequence modeling // Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence. Stockholm, 2018: 4345
[15] Kim Y, Denton C, Hoang L, et al. Structured attention networks [J/OL]. arXiv preprint (2017-2-16) [2020-12-16]. https://arxiv.org/abs/1702.00887
[16] Chaudhari S, Mithal V, Polatkan G, et al. An attentive survey of attention models [J/OL]. arXiv preprint (2019-4-5) [2020-12-16]. https://arxiv.org/abs/1904.02874
[17] Mnih V, Heess N, Graves A. Recurrent models of visual attention // Proceedings of the 27th International Conference on Neural Information Processing Systems. Cambridge, 2014: 2204
[18] Chan W, Jaitly N, Le Q, et al. Listen, attend and spell: A neural network for large vocabulary conversational speech recognition // 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Shanghai, 2016: 4960
[19] Kiela D, Wang C H, Cho K. Dynamic meta-embeddings for improved sentence representations // Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. Brussels, 2018: 1466
[20] Maharjan S, Montes M, González F A, et al. A genre-aware attention model to improve the likability prediction of books // Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. Brussels, 2018: 3381
[21] Lu J S, Yang J W, Batra D, et al. Hierarchical question-image co-attention for visual question answering. Adv Neural Inf Process Syst, 2016, 29: 289
[22] Wang W, Pan S J, Dahlmeier D, et al. Coupled multi-layer attentions for co-extraction of aspect and opinion terms // Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence. San Francisco, 2017: 3316
[23] Ying H C, Zhuang F Z, Zhang F Z, et al. Sequential recommender system based on hierarchical attention networks // Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence. Stockholm, 2018: 3926
[24] De-Arteaga M, Romanov A, Wallach H, et al. Bias in bios: A case study of semantic representation bias in a high-stakes setting // Proceedings of the Conference on Fairness, Accountability, and Transparency. Atlanta, 2019: 120
[25] Lee J, Shin J H, Kim J S. Interactive visualization and manipulation of attention-based neural machine translation // Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing: System Demonstrations. Copenhagen, 2017: 121
[26] Liu S S, Li T, Li Z M, et al. Visual interrogation of attention-based models for natural language inference and machine comprehension // Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing: System Demonstrations. Brussels, 2018: 36
[27] Jain S, Wallace B C. Attention is not explanation [J/OL]. arXiv preprint (2019-2-26) [2020-12-16]. https://arxiv.org/abs/1902.10186
[28] Wiegreffe S, Pinter Y. Attention is not not explanation // Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). Hong Kong, 2019: 11
[29] Jetley S, Lord N A, Lee N, et al. Learn to pay attention [J/OL]. arXiv preprint (2018-4-6) [2020-12-16]. https://arxiv.org/abs/1804.02391
[30] Sharma S, Kiros R, Salakhutdinov R. Action recognition using visual attention [J/OL]. arXiv preprint (2015-12-12) [2020-12-16]. https://arxiv.org/abs/1511.04119
[31] Kataoka Y, Matsubara T, Uehara K. Image generation using generative adversarial networks and attention mechanism // 2016 IEEE/ACIS 15th International Conference on Computer and Information Science (ICIS). Okayama, 2016: 1
[32] Gregor K, Danihelka I, Graves A, et al. DRAW: A recurrent neural network for image generation // Proceedings of the 32nd International Conference on Machine Learning. Lille, 2015: 1462
[33] Parmar N, Vaswani A, Uszkoreit J, et al. Image transformer // Proceedings of the 35th International Conference on Machine Learning. Stockholm, 2018: 4052
[34] Huang P Y, Liu F, Shiang S R, et al. Attention-based multimodal neural machine translation // Proceedings of the First Conference on Machine Translation. Berlin, 2016: 639
[35] Zhang H, Goodfellow I, Metaxas D, et al. Self-attention generative adversarial networks // Proceedings of the 36th International Conference on Machine Learning. Long Beach, 2019: 7354
[36] Cohn T, Hoang C D V, Vymolova E, et al. Incorporating structural alignment biases into an attentional neural translation model // Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. San Diego, 2016: 876
[37] Feng S, Liu S, Yang N, et al. Improving attention modeling with implicit distortion and fertility for machine translation // Proceedings of the 26th International Conference on Computational Linguistics. Osaka, 2016: 3082
[38] Eriguchi A, Hashimoto K, Tsuruoka Y. Tree-to-sequence attentional neural machine translation // Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics. Berlin, 2016: 823
[39] Sankaran B, Mi H, Al-Onaizan Y, et al. Temporal attention model for neural machine translation [J/OL]. arXiv preprint (2016-8-9) [2020-12-16]. https://arxiv.org/abs/1608.02927
[40] Cheng Y, Shen S Q, He Z J, et al. Agreement-based joint training for bidirectional attention-based neural machine translation // Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence. New York, 2016: 2761
[41] Liu L, Utiyama M, Finch A, et al. Neural machine translation with supervised attention [J/OL]. arXiv preprint (2016-9-14) [2020-12-16]. https://arxiv.org/abs/1609.04186
[42] Britz D, Goldie A, Luong M T, et al. Massive exploration of neural machine translation architectures // Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing. Copenhagen, 2017: 1442
[43] Tang G B, Müller M, Rios A, et al. Why self-attention? A targeted evaluation of neural machine translation architectures // Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. Brussels, 2018: 4263
[44] Yin W P, Schütze H, Xiang B, et al. ABCNN: Attention-based convolutional neural network for modeling sentence pairs. Trans Assoc Comput Linguist, 2016, 4: 259 doi: 10.1162/tacl_a_00097
[45] Zhuang P Q, Wang Y L, Qiao Y. Learning attentive pairwise interaction for fine-grained classification. Proc AAAI Conf Artif Intell, 2020, 34(7): 13130
[46] Zhou P, Shi W, Tian J, et al. Attention-based bidirectional long short-term memory networks for relation classification // Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics. Berlin, 2016: 207
[47] Wang Y Q, Huang M L, Zhu X Y, et al. Attention-based LSTM for aspect-level sentiment classification // Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing. Austin, 2016: 606
[48] Ma Y, Peng H, Cambria E. Targeted aspect-based sentiment analysis via embedding commonsense knowledge into an attentive LSTM // Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence. New Orleans, 2018: 5876
[49] Zhang S C, Loweimi E, Bell P, et al. On the usefulness of self-attention for automatic speech recognition with transformers // Proceedings of the 2021 IEEE Spoken Language Technology Workshop (SLT). Shenzhen, 2021: 89
[50] Sarı L, Moritz N, Hori T, et al. Unsupervised speaker adaptation using attention-based speaker memory for end-to-end ASR // Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing. Barcelona, 2020: 7384
[51] Chorowski J, Bahdanau D, Serdyuk D, et al. An online attention-based model for speech recognition [J/OL]. arXiv preprint (2015-06-24) [2020-12-16]. https://arxiv.org/abs/1506.07503
[52] Bahdanau D, Chorowski J, Serdyuk D, et al. End-to-end attention-based large vocabulary speech recognition // Proceedings of the 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Shanghai, 2016: 4945
[53] Shen S, Lee H. Neural attention models for sequence classification: Analysis and application to key term extraction and dialogue act detection [J/OL]. arXiv preprint (2016-3-31) [2020-12-16]. https://arxiv.org/abs/1604.00077
[54] Liu B, Lane I. Attention-based recurrent neural network models for joint intent detection and slot filling [J/OL]. arXiv preprint (2016-9-6) [2020-12-16]. https://arxiv.org/abs/1609.01454
[55] Shen Y, Tan S, Sordoni A, et al. Ordered neurons: Integrating tree structures into recurrent neural networks [J/OL]. arXiv preprint (2018-10-22) [2020-12-16]. https://arxiv.org/abs/1810.09536
[56] Nguyen X P, Joty S, Hoi S C H, et al. Tree-structured attention with hierarchical accumulation [J/OL]. arXiv preprint (2020-2-19) [2020-12-16]. https://arxiv.org/abs/2002.08046
[57] Tsai Y H H, Srivastava N, Goh H, et al. Capsules with inverted dot-product attention routing [J/OL]. arXiv preprint (2020-2-19) [2020-12-16]. https://arxiv.org/abs/2002.04764