The Use of IRT for Computer Adaptive Tests: Service and Consulting

The Article Plan:

What is IRT?
1. Introduction – definition
2. Example – an explanation of the differences between the classical and IRT approaches to skill measurement.
Advantages of IRT implementation:
1. Exact analysis of test structure – the extra information about a test gained by using IRT.
2. Creating a computer adaptive testing (CAT) program.
What requirements have to be met in order to create a CAT?
Our services.

What is IRT?

1a) Introduction.

In psychometrics, IRT (Item Response Theory) is a paradigm for designing, analyzing and scoring tests. Tests may measure skills, attitudes, the intensity of opinions and other variables. In contrast to alternative, simplified approaches to test design and scoring (e.g. Classical Test Theory), IRT does not presuppose that every test item measures the same intensity of the measured phenomenon, or that every item is of the same quality. Below I explain those differences.

1b) Example

For the purposes of this explanation, I created an example of a cognitive test (a mathematical knowledge test) and an example of an affective test (a life satisfaction test).

Cognitive test: A subject is asked the 3 following questions. There are only 2 possible results for each question: either the subject gives a correct answer or not.

Mathematical Knowledge Test	Subject’s response:	Score
A. 2 + 2 = ?	4	1
B. 8+8*8=?	128	0
C. (3!-4)³⁺¹/(2⁴)=?	1	1
	Global index	2

Analyzing the results of the above test in accordance with the widespread classical approach, we would sum the points for the subject’s answers to get a global index. This way of calculating the global index requires the assumption that all items are identical with respect to their importance for the measured phenomenon. In other words, the response pattern of subject 1: A-1 B-1 C-0 (sum = 2) yields the same global index as the pattern of subject 2: A-1 B-0 C-1 (sum = 2). That’s why in the classical approach we assume, willy-nilly, that all items are equally important.
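The classical scoring rule described above can be sketched in a few lines of Python. This is only an illustration; the names subject_1, subject_2 and global_index are my own and do not come from any particular scoring package.

```python
# Item scores (1 = correct, 0 = incorrect) for the two response patterns
# discussed in the text.
subject_1 = {"A": 1, "B": 1, "C": 0}
subject_2 = {"A": 1, "B": 0, "C": 1}


def global_index(scores):
    """Classical global index: the plain sum of item scores."""
    return sum(scores.values())


print(global_index(subject_1))  # 2
print(global_index(subject_2))  # 2
```

Both patterns collapse to the same number, even though one subject solved the hardest item and the other did not.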

And how would this case look in the IRT approach?

First of all, by applying IRT we would be able to obtain 4 pieces of information about each test item. Those pieces are:

Item discrimination (“a”) – information about how well a given item splits subjects into those with a lower and those with a higher level of the measured phenomenon. Let’s focus on item A from the above mathematical test. If it turned out that this item is well suited for checking whether a subject had finished kindergarten or not (all of those who had finished answered correctly, in contrast to those who had not), then we would say that the item is characterized by high discrimination, that is, a high “a” parameter. No matter how difficult an item is, if it neatly splits subjects into groups with higher and lower levels of the measured phenomenon, it is a discriminative item, and this is what we look for.
Item difficulty (“b”) – information about how difficult a given item is. Thanks to IRT we can calculate a standardized difficulty score, which tells us at which intensity level of the measured phenomenon (knowledge or attitude) a subject has a 50% chance of answering the given question correctly.
The lower asymptote of a test item (“c”)
The upper asymptote of a test item (“d”)

The “c” and “d” parameters are seldom used, and explaining them would require more complicated descriptions; that’s why they are outside the scope of this article.
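For readers who prefer to see the parameters in action, the four of them are commonly combined in a single logistic response function. The sketch below shows that standard four-parameter logistic form in Python; the function name prob_correct and the example values are my own assumptions, not the output of any real test. Note that the 50% interpretation of “b” holds exactly only when c = 0 and d = 1.

```python
import math


def prob_correct(theta, a, b, c=0.0, d=1.0):
    """Four-parameter logistic model.

    theta: the subject's level of the measured phenomenon
    a: discrimination, b: difficulty,
    c: lower asymptote, d: upper asymptote
    """
    return c + (d - c) / (1.0 + math.exp(-a * (theta - b)))


# With c = 0 and d = 1, a subject whose level equals the item difficulty
# has a 50% chance of answering correctly.
print(prob_correct(theta=0.0, a=1.5, b=0.0))          # 0.5
# A lower asymptote of 0.25 raises that chance.
print(prob_correct(theta=0.0, a=1.5, b=0.0, c=0.25))  # 0.625
```

The larger “a” is, the more sharply the probability rises around “b”, which is what makes an item good at splitting subjects into lower and higher groups.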

IRT is equally well suited to affective tests (tests where there is no obviously correct answer, because most often subjects describe themselves or others in terms of a psychological state). In other words, the discrimination and difficulty parameters can equally be calculated for each item of the “life satisfaction test” below.

Affective test: The subject is asked to answer the 3 questions below on a 5-point Likert scale, where 5 denotes “I totally agree” and 1 denotes “I totally disagree”.

Questions from the “Life Satisfaction Test”	1	2	3	4	5
A. My life is perfect in every possible aspect.
B. When I wake up in the morning I’m full of positive energy.
C. I like my life.

To summarize, by using IRT we don’t presuppose that all items are the same; instead, we are able to describe each one in terms of discrimination and difficulty. How much benefit we can draw from this extra information is discussed further below.
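To make the contrast with the classical sum concrete, here is a minimal sketch of how the two response patterns from the mathematical test could lead to different estimated levels under IRT. It assumes a two-parameter model (only “a” and “b”) and item parameter values that I invented purely for illustration; real values would have to be estimated from data. The grid-search estimator is just one simple option among many.

```python
import math

# Hypothetical item parameters: (discrimination a, difficulty b).
ITEMS = {"A": (1.0, -2.0), "B": (1.5, 0.0), "C": (2.0, 1.5)}


def p_correct(theta, a, b):
    """Two-parameter logistic probability of a correct answer."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def log_likelihood(theta, responses):
    """Log-likelihood of a response pattern at a given level theta."""
    total = 0.0
    for name, score in responses.items():
        a, b = ITEMS[name]
        p = p_correct(theta, a, b)
        total += math.log(p) if score == 1 else math.log(1.0 - p)
    return total


def estimate_theta(responses):
    """Maximum-likelihood level, searched on a grid from -4.00 to 4.00."""
    grid = [i / 100.0 for i in range(-400, 401)]
    return max(grid, key=lambda t: log_likelihood(t, responses))


theta_1 = estimate_theta({"A": 1, "B": 1, "C": 0})  # subject 1
theta_2 = estimate_theta({"A": 1, "B": 0, "C": 1})  # subject 2
print(theta_1, theta_2)
```

With these assumed parameters, subject 2 receives the higher estimate, because the correct answer was given to the more discriminating item, although both subjects have a classical global index of 2.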

What are the advantages of IRT implementation?

2a) Exact analysis of test structure.

If we want to verify whether a test consists of items measuring the phenomenon at each intensity level, and simultaneously control their quality (discrimination), we have no other option than to apply IRT.

2b) Creating computer adaptive testing (CAT)

This is where the most interesting part begins.

Thanks to the theoretical foundations of IRT, we can create a test that selects for a given subject only those items that best fit his or her level of the measured phenomenon. What does this adjustment look like? I will discuss it using the example of an intelligence test.

Traditionally, intelligence tests begin with very simple questions, move through moderate ones, and after 20 to 25 minutes reach very hard ones. The procedure is designed this way to allow all possible subjects to be tested with one test. However, we lose a lot of time when highly intelligent subjects need to go through a lot of easy questions before finally facing some demanding ones at the end.

CAT resolves this problem in a very elegant way. A CAT calculates the subject’s most probable level, taking into account the pattern of responses so far. Customarily, the test begins with a moderately hard question; if the subject responds correctly, the estimated level rises, and the algorithm then chooses from the item base the item that best fits the newly calculated level. The item that best fits the calculated level is the one on which the subject has a 50% chance of answering correctly. This means that if the subject taking the test were a child, we would present him or her with question “A” from the mathematical knowledge test; a correct answer would tell us that the subject has probably finished kindergarten. There would be no sense in presenting question “C” to a child, because the easily predictable incorrect answer would carry very little information.
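The selection loop described above can be sketched as follows. This is a simplified illustration under several assumptions of my own, not a production algorithm: a two-parameter model, a level estimate computed as a posterior mean with a standard normal prior, a hypothetical bank of 13 equally discriminating items, and a simulated subject. Without guessing, the item with a 50% chance of a correct answer is the one whose difficulty is closest to the current estimate, so that is the selection rule used here.

```python
import math

GRID = [i / 10.0 for i in range(-40, 41)]  # candidate levels from -4.0 to 4.0


def p_correct(theta, a, b):
    """Two-parameter logistic probability of a correct answer."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def estimate_theta(administered):
    """Posterior mean of theta on GRID, assuming a standard normal prior."""
    weights = []
    for theta in GRID:
        w = math.exp(-0.5 * theta * theta)  # unnormalized N(0, 1) prior
        for (a, b), correct in administered:
            p = p_correct(theta, a, b)
            w *= p if correct else 1.0 - p
        weights.append(w)
    return sum(t * w for t, w in zip(GRID, weights)) / sum(weights)


def next_item(theta, bank, used):
    """Index of the unused item whose difficulty b is closest to theta."""
    candidates = [i for i in range(len(bank)) if i not in used]
    return min(candidates, key=lambda i: abs(bank[i][1] - theta))


def run_cat(bank, answer, test_length):
    """Administer test_length items; answer(i) is True for a correct response."""
    theta, used, order, administered = 0.0, set(), [], []
    for _ in range(test_length):
        i = next_item(theta, bank, used)
        used.add(i)
        order.append(i)
        administered.append((bank[i], answer(i)))
        theta = estimate_theta(administered)  # update after every response
    return theta, order


# Hypothetical bank: equal discrimination, difficulties from -3.0 to 3.0.
BANK = [(1.5, k / 2.0) for k in range(-6, 7)]

# Simulated subject who solves every item easier than b = 1.0.
theta, order = run_cat(BANK, lambda i: BANK[i][1] < 1.0, test_length=5)
print(order, round(theta, 2))
```

The test starts with the item of medium difficulty and then moves toward harder or easier items depending on the answers, so the very easy and very hard items in the bank are never shown to this subject.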

To summarize, thanks to CAT we are able to reduce measurement time by 50% while maintaining reliability.

CAT puts much less burden on subjects, which helps maintain their motivation throughout the whole test and thus makes the results less laden with measurement error.

CAT Requirements

Creating a CAT, however, is more demanding than creating a test in the classical approach. First of all, we need a large item base, from which the CAT algorithm will select items for presentation. Depending on the test type and the phenomenon being measured, the item base should contain 3 to 10 times as many items as the planned test length. If we are planning to test mathematical knowledge with 10 items, then we should prepare an item base consisting of 30 to 100 test items.

In the classical approach we can start using the test right after constructing the items. In contrast, before we include items in a CAT, we first need to examine them by asking a sufficiently large number of subjects to answer them. How large this number should be depends on many factors and is assessed individually for each test.

Our services.

If you are looking for a team of specialists dedicated to measurement issues and skilled in IRT and CAT projects, we would kindly like to offer our services. No matter what kind of phenomenon you’re dealing with, if you are interested in:

creating a test for:
- recruitment purposes
- measuring the attitude of respondents toward a certain object
- measuring subjects’ skills in a particular field of knowledge
or you already have an existing test which you would like to:
- convert into a CAT
- have its structure analyzed

Please contact us. We are confident of the potential benefits that CAT and IRT can bring you.