Computerized Adaptive Testing as a means for Mathematics Assessment

Leung Chi Keung, Eddie
Department of Mathematics, Hong Kong Institute of Education

Introduction

Using information technology (IT) in education is a hot issue in Hong Kong recently. Many people connect using IT in education with computer-assisted instruction (CAI), computer-assisted learning (CAL) and seeking information from the internet. Indeed, using IT may also bring in other advantages such as a continuous and dynamic approach to testing.

Assessment is an essential part of local Primary and Secondary Mathematics teaching and learning. It provides useful information to a variety of users for a variety of purposes. The various purposes for which users require information from assessment can be summarized as formative, summative, evaluative, predictive, comparative and selective (Education Commission, 1990, p.64). There are many methods of assessments such as oral questioning, written work, portfolio, observation and use of tests. All these methods can be used formally or informally. Each method has its own advantages as well as limitations. One should choose the right form according to the specific objectives of the assessment.

In Hong Kong, the class sizes of primary and secondary schools are generally around 40. Assessment of student's mathematics knowledge based on observations on individual performance would be difficult and unreliable in such large classes. This makes written tests the most popular format of assessment used by the mathematics teachers for practical reasons such as validity and reliability. If a student succeeds on a written test, it would be fairly confident to believe that the student has possessed the skills and/or knowledge which the test aims to measure. The weakness of students who fail in the test would be reflected in their work and can be further identified by in-depth questioning. With the test results, the teacher can evaluate his or her teaching and devise planning for later stages.

What is CAT?

No one can deny that written tests are indispensable component of summative assessment, though this format cannot meet all the six Mathematics Assessment Standards proposed by the National Council for Teachers of Mathematics (1995). Yet most written tests meet the first and the most fundamental standard: "Assessment should reflect the mathematics that all students need to know and be able to do" (NCTM, 1995, 11). With the advances in computing technology and psychometrics, most paper and pencil (P&P) tests can be transformed into the format of computerized adaptive testing (CAT; see, e.g. Lord, 1980; Weiss, 1976; Wainer et al., 1990). One of the main advantages of CAT over P&P is that it enables more efficient and precise trait estimation (Owen, 1975; Weiss, 1982; Wainer, 1990). It must be emphasized that computerized adaptive testing is different from computerized administrated testing which usually refers to a mechanism that randomly select a test item or a subtest from a pool of items with regardless of the ability of the testee (e.g. see Beevers et al., 1995). In contrast, a CAT system adaptively selects an item according to the estimate of the ability of testee based on his or her responses to previous items. In other words, CAT is a dynamic system that can provide tailor-made tests for individuals. If one gets an item correct, then the next item would be more difficult. With the same token, the next item would be easier if one gets a wrong answer for the one right on the screen. Because of the adaptive nature of CAT, examinees always face items that closely match their own individually estimated ability. Consequently, individual test forms of CAT should be shorter as there are less inappropriate items for each individual. At the end of the test, no one is likely to get all answers wrong and scores zero mark; the less competent students would find some items that they could solve and hence retain their interest in the subject. Neither anyone is likely to get all answers correct and scores full mark; thus even the top students understand that there are rooms for improvement. It may happen that two testees get the same number of items correct, however they may have different scores that depend on the parameters like difficulty, discrimination and guessing of the items (Lord, 1980; Hambleton & Swaminathan, 1985). All the conversions of score on the same continuum are done by the statistical procedures of the system.

There are many technical issues on how to build, maintain and use a CAT system. Interested readers may refer to Wainer et al. (1990) to start with.

Is it the right time to establish CAT?

CAT systems have been successfully developed overseas in different areas such as French language proficiency (Burston, Harfouch & Monville-Burston, 1995), Japanese language proficiency (Brown & Iwashita, 1996), and ESL reading comprehension (Young, Shermis, Brutten & Perkins, 1996). Other large scale CAT systems such as Graduate Record Examination (GRE), Graduate Management Admission Test (GMAT) and National Council Licensure Examination for Nurses are run in the United States (see Chang & Ying, 1997). The two new directions in the local education policies: target-oriented curriculum (TOC) and using information technology in education, have pushed the author to advocate the establishment of computerized adaptive testing for upper primary to junior secondary mathematics.

As suggested by Clark et al. (1994, p. 11): "the TOC initiative would need to devise forms of assessment designed to measure students' learning against criteria embodied in standards, in order to measure what they were able to do and how well they could do it, and to highlight their strengths and weaknesses in order to inform future teaching and learning". Within the framework of TOC, attainment targets for individual topics of Mathematics at various stages will be laid down. More and more test items measuring students' achievement in these targets will be constructed with the effort of ED officials, teachers, publishers, tutorial centers, educational researchers and so on. Those items that satisfy the 3-parameter item response theory (IRT) models (Lord, 1980; Hambleton & Swaminathan, 1985) and passes the sensitivity test (Flaugher, 1990), can be calibrated and gathered to form a rich item bank that covers a wide range of abilities.

As the government is planning to equip schools with more computers and establish an intranet among schools, the CAT system can be developed and administered with the support and coordination by the government. If schools have sufficient resources and support, they may download the relevant item banks and establish their own CAT systems.

Is it worthwhile?

At least four parties: students, teachers, school administrators and officials of education department, will benefit from a well-developed CAT system coordinated by the government. Firstly, the pressure on teachers could be partially released. In the study of Leung, Man & Kong (1998), it is found that Mathematics teachers working in TOC schools have more pressure in setting test and examination papers. Teachers are not sure whether the test set by them can cover all the attainment targets. If there is a CAT system containing the item bank for measuring the skills and concepts concerning a certain topic (for example, fractions in the Stage 2 of TOC), then the teachers may simply help the pupils to activate the CAT system and let the computer do the rest. This would save teachers a lot of time on the preparation of tests and marking of scripts when performing summative assessments. Teachers can then utilize their energy on planning and preparation of other kinds of assessments, purposeful and meaningful learning activities for their pupils. In addition, all students experience the same set of examination questions in a formal examination, some of them may try to cheat or look over the shoulders. These kinds of misbehaviour may arouse discipline problems that teachers have to tackle. But if the examination is in CAT form, any two neighbouring students are unlikely to face the same set of questions, thus reducing the number of student misbehaviours.

Secondly, the students can have objective assessments. The computers recognize neither the names nor the faces of individual students, so no marks will be added or deducted by impression. Besides, the CAT system can cater for individual differences by delivering tailor-made tests. A competent student will not face too many simple questions that may lead to an underestimate of his or her proficiency if careless mistakes are made. On the other hand, less able students will not face too many questions that are difficult to them. Thus, their confidence and interests in the subject would not be seriously hampered. In addition, students would spend less time on individual test as the test generated by a CAT system would generally be about half of the length of its P&P counterpart. Furthermore, a well-developed system would be able to immediately issue individual reports on the performance of the testees. Hence, the strengths and weaknesses of the students would be identified. If a student has unsatisfactory result in the test, he or she can re-take the test at the time that he or she feels confident after revision or remedial teaching. Once the students are familiar with the testing procedures and environment, teachers need not accompany the students in their second and subsequent trials. Students just need to book the computers and inform the teachers in advance. If the item bank is rich, the test items at various attempts are very unlikely the same. Since the students themselves can determine the dates for subsequent attempts after failure, their motivation of learning may be stronger when their sense of ownership of learning increases.

Thirdly, the school administrators can have a clearer picture on the achievements of their students and the teaching effectiveness of their staff since the teachers do not know in advance what test forms will be generated by a CAT system. It is not an unusual practice that teachers give tips or similar quiz to their students once they know the test questions. There are several reasons for this kind of action: some teachers worry that the principals may invite them to explain if the performances of their classes are below average; some feel a higher sense of satisfaction if their classes apparently perform better than other classes; and some try to cover up the facts that they do not teach properly and so on. The information gathered from objective data would help the administrators to make better decisions and adjustments in school policies. Besides, schools will be aware that their achievements will be in comparison with other schools in an objective system. Then, they will develop clear and coherent educational goals and utilize their resources wisely to achieve the goals.

Last but not least, the Education Department can obtain more objective data by replacing some of the Hong Kong Attainment Tests in Mathematics with CAT systems. The delivery of the test and the marking can be done by the computers directly. On one hand, it saves teachers lots of time on marking and on the other hand, schools and teachers have less improper ways to boasting up their students' achievement. The officials can use the information to monitor the general standards of students' achievement and assess the effectiveness of new educational initiatives. This would lead to a better decision on resources allocation and future directions.

Conclusion

Computerized adaptive testing is one of the many methods of mathematics assessment. It may not be able to measure all kinds of intellectual ability of students such as communicative skills. However, it can help answer questions commonly asked by various parties such as "How good is my mathematics compared with the same age group?", "How good is my child at mathematics?" and "Are there any differences between students' mathematics achievement this year and the last year?". It provides objective measurement on students' knowledge in mathematics concepts and skills. Its implementation will certainly reduce the workload of mathematics teachers who may then spend more time on the planning and preparation of purposeful and meaningful learning activities for their students. The information gathered from objective data may lead to better decisions and adjustments on teaching- learning cycles, setting of educational goals and targets, resources allocation and professional support.

With the two recent directions in education policy: the implementation of TOC and using IT in education, it is the right time to start establishing a CAT system. CAT can replace many P&P tests of Mathematics. There are many technical issues involved in developing a CAT system. So it may take several years to put the first CAT in mathematics for the public use even if we start planning it now.

References




回《數學教育》第七期目錄
數學教育 第七期 EduMath 7 (12/98)