Acta Psychologica Sinica


Vol. 33, No. 2, Pages 97-103, 2001

Putonghua Test: Feasibility of Tape-Recorded Marking, Reliability and Economic Efficiency (Article written in Chinese)

CHANG Lei, HAU Kit-Tai, HO Wai-Kit, WEN Jian-Bing, & WANG Yuguang

Abstract

This article reports two generalizability studies of the State Putonghua test. The first study examined the consistency between live and tape-recorded assessment of Putonghua. Twenty-five examinees participated. Eight raters were divided into four panels of two raters each. Five examinees were assigned to one of the four panels. The live assessments in the four panels took place simultaneously, and examinees were tape-recorded during the live assessment. Each examinee's tape was later assessed by all eight raters. The standard assessment instrument prepared by the State Language Commission was used. For the purpose of this study, all examinees received the same items. Items were rated on a three-point scale, where 0 = no credit, 1 = partial credit, and 2 = full credit. The objects of measurement were examinees (e), who were nested in panels (p). Raters (r), who were also nested in panels, were crossed with examinees. Items (i) and mode of assessment (m; recorded versus live) were crossed with the remaining conditions. The G study design was (e × r): p × i × m. Except for the mode facet, which was considered fixed, all facets and the objects of measurement were assumed to be random. This G study focused on the consistency between live and recorded assessment. The results indicated a relatively high degree of consistency. The signal-to-noise ratio reached 0.80, meaning that 80% of the absolute domain status differences in Putonghua were exchangeable between the two modes of assessment.

The purpose of the second study was to determine an efficient tape-recorded assessment procedure for future use. The objective was to find the number of raters and items that would maximize reliability while minimizing cost. The second study adopted a fully crossed design so that each variance component could be estimated separately. Tapes of 25 examinees were each rated by the same six raters on 50 single-word items. The G study design was e × r × i. Among the seven variance components, the largest was associated with the item facet, indicating the importance of sampling more items. Using two raters and 100 items would achieve satisfactory reliabilities of 0.90 and 0.84 for norm-referenced and domain-referenced use of the test, respectively.
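The trade-off between raters and items described above can be illustrated with a minimal D-study sketch for a fully crossed e × r × i random-effects design. It uses the standard generalizability-theory formulas for the relative (norm-referenced) coefficient and the absolute (domain-referenced) index of dependability. The abstract does not report the estimated variance components, so the values in `HYPOTHETICAL_VC` are made-up placeholders and are not expected to reproduce the reported 0.90 and 0.84; the function name `d_study` and the dictionary keys are likewise assumptions for illustration.

```python
from typing import Dict, Tuple


def d_study(vc: Dict[str, float], n_r: int, n_i: int) -> Tuple[float, float]:
    """Return (relative coefficient, absolute coefficient) for a crossed
    e x r x i random design with n_r raters and n_i items.

    vc maps each of the seven variance components to its estimate:
    'e', 'r', 'i', 'er', 'ei', 'ri', 'eri' (residual confounded with eri).
    """
    # Relative error: only components that interact with examinees.
    rel_error = vc["er"] / n_r + vc["ei"] / n_i + vc["eri"] / (n_r * n_i)
    # Absolute error: additionally includes rater and item main effects
    # and the rater-by-item interaction.
    abs_error = rel_error + vc["r"] / n_r + vc["i"] / n_i + vc["ri"] / (n_r * n_i)

    g_coef = vc["e"] / (vc["e"] + rel_error)  # norm-referenced use
    phi = vc["e"] / (vc["e"] + abs_error)     # domain-referenced use
    return g_coef, phi


# Hypothetical variance components (NOT taken from the article).
HYPOTHETICAL_VC = {
    "e": 0.05, "r": 0.002, "i": 0.30,
    "er": 0.003, "ei": 0.20, "ri": 0.01, "eri": 0.25,
}

if __name__ == "__main__":
    # Compare candidate designs by number of raters and items.
    for n_r in (1, 2, 3):
        for n_i in (50, 100):
            g_coef, phi = d_study(HYPOTHETICAL_VC, n_r, n_i)
            print(f"raters={n_r} items={n_i} relative={g_coef:.3f} absolute={phi:.3f}")
```

With a large item component, as the abstract reports, increasing the number of items reduces absolute error more than adding raters does, which is the kind of comparison such a table makes visible.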

Keywords: generalizability theory; reliability; Putonghua test


