Clinical named entity recognition from Chinese electronic medical records using a double-layer annotation model combining a domain dictionary with CRF

GONG Le-jun; ZHANG Zhi-fei

doi:10.13374/j.issn2095-9389.2019.09.04.004

Volume 42 Issue 4

Apr. 2020

GONG Le-jun, ZHANG Zhi-fei. Clinical named entity recognition from Chinese electronic medical records using a double-layer annotation model combining a domain dictionary with CRF[J]. Chinese Journal of Engineering, 2020, 42(4): 469-475. doi: 10.13374/j.issn2095-9389.2019.09.04.004

GONG Le-jun^{1, 2}, ZHANG Zhi-fei^{1, 2}

1. School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing 210023, China
2. Jiangsu Key Lab of Big Data Security & Intelligent Processing, Nanjing 210023, China

Corresponding author: E-mail: glj98226@163.com
Received Date: 2019-09-04
Publish Date: 2020-04-01

Abstract

Electronic medical records (EMRs), written by professional medical personnel, constitute a large and important clinical resource, and how to exploit the latent information they contain has become a major research direction. Chinese EMRs are knowledge-intensive, and their data have considerable research value. However, owing to the linguistic features of Chinese, their entities are more complex and composite entities tend to be long; sentence components are often missing from the text, and the boundaries of clinical entities are frequently unclear. Moreover, because the text uses specialized technical language, annotating a corpus requires a great deal of manual effort. Chinese clinical named entity recognition is therefore a difficult problem. Considering these characteristics of Chinese EMRs, this paper proposed a double-layer annotation model that combines a domain dictionary with a conditional random field (CRF). A medical domain dictionary was constructed by statistical analysis and combined with CRF to perform labeling at two different granularities. The manually constructed medical domain dictionary recognizes registered (in-vocabulary) words with very high accuracy, whereas machine learning can automatically recognize unregistered (out-of-vocabulary) words; this work integrated the two to exploit both advantages. The proposed method recognizes diseases, symptoms, drugs, and operations in Chinese EMRs. On the test dataset, it achieved a macro-averaged precision (Macro-P) of 96.7%, a macro-averaged recall (Macro-R) of 97.7%, and a Macro-F1 of 97.2%, a substantial improvement over a single-layer model. A deep neural network with attention was also evaluated; it did not perform well because of the limited size of the domain dataset.
The experimental results demonstrate the effectiveness of the double-layer annotation model for named entity recognition in Chinese EMRs.
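The abstract does not give implementation details, so the following is only a minimal Python sketch of one plausible way to wire the two layers together. It assumes (1) that the dictionary layer tags registered words by forward maximum matching and (2) that its output is passed to the CRF layer as an extra per-character feature, so that unregistered words are left for the statistical model. The toy dictionary `DOMAIN_DICT` and the functions `dict_layer_tags` and `crf_features` are hypothetical; the paper's dictionary was built by statistical analysis of real records, and its exact way of combining the two labeling granularities may differ.

```python
from typing import Dict, List

# Hypothetical toy dictionary (entity string -> entity type), for illustration only.
# 高血压 = hypertension, 头痛 = headache, 阿司匹林 = aspirin.
DOMAIN_DICT: Dict[str, str] = {
    "高血压": "DISEASE",
    "头痛": "SYMPTOM",
    "阿司匹林": "DRUG",
}


def dict_layer_tags(text: str, lexicon: Dict[str, str]) -> List[str]:
    """Assumed first layer: forward maximum matching against the dictionary.

    Returns one BIO tag per character. Characters not covered by any
    dictionary entry are tagged "O" and left for the statistical layer.
    """
    max_len = max(map(len, lexicon), default=0)
    tags = ["O"] * len(text)
    i = 0
    while i < len(text):
        # Try the longest candidate first, so a long (composite) entry wins
        # over any shorter entry it contains.
        for size in range(min(max_len, len(text) - i), 0, -1):
            label = lexicon.get(text[i:i + size])
            if label is not None:
                tags[i] = "B-" + label
                for j in range(i + 1, i + size):
                    tags[j] = "I-" + label
                i += size
                break
        else:
            i += 1  # no dictionary entry starts here; move one character on
    return tags


def crf_features(text: str, dict_tags: List[str]) -> List[Dict[str, str]]:
    """Assumed second-layer input: per-character feature dicts that include
    the first-layer dictionary tag alongside simple character context."""
    feats = []
    for i, ch in enumerate(text):
        feats.append({
            "char": ch,
            "prev_char": text[i - 1] if i > 0 else "<BOS>",
            "next_char": text[i + 1] if i + 1 < len(text) else "<EOS>",
            "dict_tag": dict_tags[i],
        })
    return feats


if __name__ == "__main__":
    sentence = "患者头痛，服用阿司匹林"  # "The patient has a headache and takes aspirin"
    tags = dict_layer_tags(sentence, DOMAIN_DICT)
    print(list(zip(sentence, tags)))
```

On the sample sentence, 头痛 is tagged `B-SYMPTOM I-SYMPTOM`, 阿司匹林 is tagged `B-DRUG I-DRUG I-DRUG I-DRUG`, and every other character stays `O`. Feature dictionaries in this list-of-dicts form can be handed to a linear-chain CRF toolkit such as CRFsuite through a Python wrapper, which would then learn when to trust the dictionary tag and how to label words the dictionary does not cover.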
Keywords
- Chinese electronic medical records
- clinical named entity recognition
- medical domain dictionary
- conditional random field
- attention
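Macro-averaged scores give each entity type (here diseases, symptoms, drugs, and operations) equal weight, regardless of how often it occurs. The abstract does not state the matching criterion or the averaging convention, so the following minimal sketch assumes exact span-and-type matching and averages precision, recall, and F1 separately across types. The `Entity` tuple layout and the `macro_prf` function are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical entity representation: (sentence_id, start, end, entity_type).
# A prediction counts as correct only if all four fields match a gold entity.
Entity = Tuple[int, int, int, str]


def macro_prf(gold: Iterable[Entity], pred: Iterable[Entity]) -> Dict[str, float]:
    """Macro-averaged precision, recall, and F1 over entity types."""
    gold_set, pred_set = set(gold), set(pred)
    types = sorted({e[3] for e in gold_set | pred_set})
    if not types:
        return {"Macro-P": 0.0, "Macro-R": 0.0, "Macro-F1": 0.0}

    precisions: List[float] = []
    recalls: List[float] = []
    f1s: List[float] = []
    for t in types:
        g = {e for e in gold_set if e[3] == t}
        p = {e for e in pred_set if e[3] == t}
        tp = len(g & p)  # exact span-and-type matches
        prec = tp / len(p) if p else 0.0
        rec = tp / len(g) if g else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec > 0 else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)

    n = len(types)
    # Each entity type contributes equally to the average.
    return {
        "Macro-P": sum(precisions) / n,
        "Macro-R": sum(recalls) / n,
        "Macro-F1": sum(f1s) / n,
    }


if __name__ == "__main__":
    gold = [(0, 2, 4, "SYMPTOM"), (0, 7, 11, "DRUG"), (1, 0, 3, "DISEASE")]
    # The DRUG span is one character short, so under exact matching it is an error.
    pred = [(0, 2, 4, "SYMPTOM"), (0, 7, 10, "DRUG"), (1, 0, 3, "DISEASE")]
    print(macro_prf(gold, pred))
```

In the demo, SYMPTOM and DISEASE score 1.0 and DRUG scores 0.0, so all three macro scores are 2/3. Some papers instead compute Macro-F1 as the harmonic mean of Macro-P and Macro-R; applied to the reported figures, 2 × 0.967 × 0.977 / (0.967 + 0.977) ≈ 0.972, which matches the reported 97.2%, so the abstract's numbers alone do not reveal which convention was used.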
