Applied Linguistics (Yuyan Wenzi Yingyong)


No. 2, Pages 88-91, 1999

On the Consistency of Word-segmented Chinese Corpora (Article written in Chinese)

SUN Maosong

Abstract

Large-scale word-segmented corpora are an important resource for research in both linguistics and computational linguistics. One criterion of a corpus's quality is its consistency. This paper discusses the major structural types that are likely to give rise to word segmentation inconsistencies, distinguishes between the concepts of “linguistic word” and “psychological word”, and argues that the basic unit of a segmented corpus is better taken to be the “psychological word”. We conclude that it is impossible to construct a fully consistent word-segmented corpus because of the fuzziness of the “psychological word”, and that the goal should instead be consistency under controlled conditions.
