Applied Linguistics (Yuyan Wenzi Yingyong)


No. 3, Pages 103-109, 1999

A Package Scheme for Identifying Unlisted Words in Chinese Segmentation (Article written in Chinese)

CHEN Xiaohe

Abstract

Identifying unlisted words (words absent from the segmentation dictionary) is a problem peculiar to Chinese word segmentation. Because unlisted words are so varied and so numerous, they become a bottleneck when processing large corpora. After discussing various existing methods, the paper proposes a new package scheme: the text is segmented twice, and in the fragments the probability that the Chinese characters are words in their own right is compared with the probability that they form unlisted words. A preliminary open test gives quite encouraging results.
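The abstract does not describe the scheme's details. As a rough illustration only, the sketch below assumes a two-pass design of our own choosing: a first pass of forward maximum matching against a toy lexicon, then a second pass that, for each run of two or more single-character tokens (a "fragment"), compares the product of per-character probabilities of standing alone as words against the product of per-character probabilities of occurring inside an unlisted word. The lexicon, the probability tables, `DEFAULT_P`, and the function names `max_match` and `resegment` are all hypothetical, and the independence assumption behind the products is ours; this is not the author's package scheme.

```python
from math import prod

# Hypothetical toy lexicon; a real system would use a large dictionary.
LEXICON = {"我们", "研究", "方法"}
MAX_WORD_LEN = 2

# Hypothetical per-character estimates (NOT taken from the paper):
# P_AS_WORD[c]     = probability that c stands alone as a one-character word
# P_IN_UNLISTED[c] = probability that c occurs inside an unlisted word
P_AS_WORD = {"的": 0.9, "我": 0.3, "张": 0.1, "小": 0.2, "明": 0.1}
P_IN_UNLISTED = {"的": 0.05, "我": 0.1, "张": 0.6, "小": 0.5, "明": 0.7}
DEFAULT_P = 0.5  # hypothetical fallback for characters missing from the tables


def max_match(text):
    """First pass: forward maximum matching against LEXICON."""
    tokens, i = [], 0
    while i < len(text):
        # Try the longest candidate first; a single character always matches.
        for n in range(min(MAX_WORD_LEN, len(text) - i), 0, -1):
            piece = text[i:i + n]
            if n == 1 or piece in LEXICON:
                tokens.append(piece)
                i += n
                break
    return tokens


def resegment(tokens):
    """Second pass: merge a fragment of single characters into one token
    when, under the independence assumption, an unlisted word is more
    probable than a sequence of one-character words."""
    out, run = [], []

    def flush():
        if len(run) >= 2:
            as_words = prod(P_AS_WORD.get(c, DEFAULT_P) for c in run)
            as_unlisted = prod(P_IN_UNLISTED.get(c, DEFAULT_P) for c in run)
            if as_unlisted > as_words:
                out.append("".join(run))  # treat the fragment as one unlisted word
            else:
                out.extend(run)           # keep the characters as separate words
        else:
            out.extend(run)               # a lone character is not a fragment
        run.clear()

    for tok in tokens:
        if len(tok) == 1:
            run.append(tok)
        else:
            flush()
            out.append(tok)
    flush()
    return out
```

With these toy numbers, `max_match("张小明研究方法")` yields 张/小/明/研究/方法, and `resegment` then merges the fragment 张小明 into a single unlisted word (a personal name), because 0.6 × 0.5 × 0.7 = 0.21 exceeds 0.1 × 0.2 × 0.1 = 0.002. A run such as 的我 stays split, since the function-word reading dominates.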
