Japanese word sense disambiguation system based on deep feature extraction

LEI Xuemei; WANG Daliang; TANAKA Takaaki; ZENG Guangping

doi:10.13374/j.issn1001-053x.2010.02.024



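1. School of Information Engineering, University of Science and Technology Beijing, Beijing 100083, China;
2. System Integration Company, China Telecom Corporation, Beijing 100035, China;
3. Natural Language Research Group, NTT Communication Science Laboratories, Kyoto 6190237, Japan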

Funding:

Supported by the National High Technology Research and Development Program of China (863 Program) (No. 2007AA01Z170)


Author information:
LEI Xuemei (1972-), female, Ph.D. candidate; ZENG Guangping (1962-), male, professor and doctoral supervisor, E-mail: zgping20012002@yahoo.com.cn

CLC number: TP391
Publication history:
- Received: 2009-05-01


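Abstract

The features used for word sense disambiguation (WSD) come from the context. Japanese shares linguistic characteristics with both Chinese and English, which makes feature extraction for Japanese more complex. Taking these characteristics into account and building on a WSD logic model, the system exploits the maximum entropy model's strength at fusing heterogeneous information and adopts a deep feature extraction method that introduces semantic and syntactic features for disambiguation. In addition, to avoid skewed sense assignment, the BeamSearch algorithm is used to tag the whole word sequence with senses. Experimental results show that, compared with methods that use only surface lexical features, the Japanese WSD system built in this paper improves disambiguation accuracy by 2% to 3%, and accuracy on verbs improves by 5%.

Keywords:
- natural language processing
- word sense disambiguation
- maximum entropy model
- feature extraction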
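The abstract names the maximum entropy model but not its feature templates or training procedure. The sketch below is a minimal, self-contained conditional maximum entropy classifier (equivalent to multinomial logistic regression) trained by batch gradient ascent. Its feature templates are assumptions for illustration only: surface features (the target word and its neighbours) plus two hypothetical "deep" features, namely the verb's を-marked object as a parser might supply it, and that object's class in a hypothetical thesaurus `THESAURUS`. The verb かける and the sense labels `call`/`wear` are toy data, not the paper's sense inventory or results.

```python
import math
from collections import defaultdict

# Hypothetical thesaurus mapping a noun to a coarse semantic class.
THESAURUS = {"電話": "COMM", "眼鏡": "WEAR", "サングラス": "WEAR"}


def extract_features(tokens, i, obj=None, thesaurus=None):
    """Binary features for the target word tokens[i] (hypothetical templates).

    Surface lexical features: the target word and its immediate neighbours.
    Deeper features (assumed): the object at index `obj`, as a parser might
    supply it, and that object's semantic class from `thesaurus`.
    """
    feats = ["w0=" + tokens[i]]
    if i > 0:
        feats.append("w-1=" + tokens[i - 1])
    if i + 1 < len(tokens):
        feats.append("w+1=" + tokens[i + 1])
    if obj is not None:
        feats.append("obj=" + tokens[obj])
        if thesaurus and tokens[obj] in thesaurus:
            feats.append("sem=" + thesaurus[tokens[obj]])
    return feats


class MaxEnt:
    """Conditional maximum entropy model: p(s | x) proportional to exp(sum_k w_k * f_k(x, s))."""

    def __init__(self, senses):
        self.senses = list(senses)
        self.w = defaultdict(float)  # one weight per (feature, sense) pair

    def prob(self, feats):
        scores = {s: sum(self.w.get((f, s), 0.0) for f in feats) for s in self.senses}
        m = max(scores.values())  # subtract the max for numerical stability
        exp = {s: math.exp(v - m) for s, v in scores.items()}
        z = sum(exp.values())
        return {s: e / z for s, e in exp.items()}

    def predict(self, feats):
        p = self.prob(feats)
        return max(p, key=p.get)

    def train(self, data, epochs=50, lr=0.5, l2=0.01):
        """Batch gradient ascent on the L2-regularised conditional log-likelihood."""
        for _ in range(epochs):
            grad = defaultdict(float)
            for feats, gold in data:
                p = self.prob(feats)
                for f in feats:
                    grad[(f, gold)] += 1.0    # empirical feature count
                    for s in self.senses:
                        grad[(f, s)] -= p[s]  # model-expected feature count
            for key in set(grad) | set(self.w):
                self.w[key] += lr * (grad[key] - l2 * self.w[key])


# Toy training data: (tokens, target index, object index, sense label).
train_set = [
    (["電話", "を", "かける"], 2, 0, "call"),
    (["眼鏡", "を", "かける"], 2, 0, "wear"),
    (["友達", "に", "電話", "を", "かける"], 4, 2, "call"),
    (["新しい", "眼鏡", "を", "かける"], 3, 1, "wear"),
]
data = [(extract_features(t, i, obj, THESAURUS), s) for t, i, obj, s in train_set]
model = MaxEnt(["call", "wear"])
model.train(data)

# The object is not adjacent to the verb, so only the deeper features decide.
print(model.predict(extract_features(["電話", "を", "急いで", "かける"], 3, 0, THESAURUS)))  # call
# The object noun is unseen in training; its thesaurus class WEAR decides.
print(model.predict(extract_features(["サングラス", "を", "かける"], 2, 0, THESAURUS)))  # wear
```

Because each cue is just another indicator feature, surface and deeper cues go into the same list without any independence assumption between them (unlike naive Bayes). This is one plausible reading of the "information fusion" strength the abstract attributes to the model.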
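The abstract says that BeamSearch tags the whole word sequence with senses to avoid skewed assignment, but it gives no scoring details. The sketch below assumes a left-to-right model in which each word's sense probability may depend on the senses already chosen. This model is represented by a hypothetical callback `local_prob`, and a path's score is taken as the sum of its log-probabilities. With `beam_size=1` the search reduces to greedy tagging. The hypothetical toy table shows a case where greedy commits to the locally likelier first sense `A` and ends with a lower-probability sequence than the beam finds.

```python
import math


def beam_search(candidates, local_prob, beam_size=3):
    """Tag a word sequence with senses, keeping the best `beam_size` partial paths.

    candidates[i] -- candidate senses of the i-th ambiguous word
    local_prob(i, history, sense) -- P(sense | context, senses already assigned);
        a hypothetical callback standing in for a trained model.
    Returns (log-probability, sense tuple) of the best complete path found.
    """
    beam = [(0.0, ())]  # (accumulated log-probability, senses so far)
    for i, senses in enumerate(candidates):
        expanded = []
        for logp, history in beam:
            for s in senses:
                p = local_prob(i, history, s)
                if p > 0:
                    expanded.append((logp + math.log(p), history + (s,)))
        # Keep only the highest-scoring partial sequences.
        expanded.sort(key=lambda h: h[0], reverse=True)
        beam = expanded[:beam_size]
    return beam[0]


# Toy two-word example (hypothetical probabilities): the second word's sense
# distribution depends on the sense chosen for the first word.
TABLE = {
    (0, ()): {"A": 0.6, "B": 0.4},
    (1, ("A",)): {"X": 0.55, "Y": 0.45},
    (1, ("B",)): {"X": 0.95, "Y": 0.05},
}


def toy_prob(i, history, sense):
    return TABLE[(i, history)][sense]


CANDIDATES = [["A", "B"], ["X", "Y"]]
print(beam_search(CANDIDATES, toy_prob, beam_size=1)[1])  # greedy: ('A', 'X'), p = 0.33
print(beam_search(CANDIDATES, toy_prob, beam_size=2)[1])  # beam:   ('B', 'X'), p = 0.38
```

A wider beam costs more local-model calls per word (at most `beam_size` times the number of candidate senses). In exchange, an early sense choice can still be revised once later words provide evidence.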