BETA

Written by: Desislava Dimitrova, New Bulgarian University, Sofia
e-mail: ddimitrova@nbu.bg

Dimitar Atanasov, New Bulgarian University, Sofia
e-mail: datanasov@nbu.bg

This presentation is based on our experience in developing a new test for Bulgarian as a foreign language.
There are four main stages in the test development process: development, test production, the live exam, and analysis and review. At the development stage, two features are crucial: decisions and task design. They are the key to theory-based validity, context validity and absence of bias. At the test production stage, the essential task is to gather information on appropriate language use. At the live exam stage, scoring validity should be established, and at the analysis and review stage it is very important to collect evidence of criterion-related validity, absence of bias and beneficial consequences.
The nature of language ability has long been the most difficult question in test development. Nowadays, however, Bachman and Palmer (2010, Language Assessment in Practice: Developing Language Assessments and Justifying Their Use in the Real World) argue that the “primary use of an assessment is to gather information to help us make decisions that will lead to beneficial consequences for stakeholders.”
So the test development process begins with an understanding of the consequences, the decisions and the interpretations to be made about the test taker’s language ability.
For the new test, we have described the chain of decisions as follows:

  • It will be a test for foreigners who do not live in Bulgaria but are learning the Bulgarian language for various reasons.
  • The test score will be used by test takers to track their own progress in learning and to give them a meaningful language profile of strengths and weaknesses.
  • The test score will be used by teachers for course design.
  • The test score will be interpreted in terms of communicative language competence, and the interpretation framework will be criterion-referenced, following the scales and descriptors of the CEFR.

Interpreting a test taker’s language ability in terms of the CEFR is something of an art, even for those familiar with the framework. The CEFR provides two kinds of interpretative scheme: the Descriptive Scheme and the Common Reference Levels.
The key elements of the Descriptive Scheme are the various components of communicative language competence, the language activities and the domains.
The Common Reference Levels provide a set of six defined criterion levels (A1 to C2) for use as common standards.
The first problem is deciding which of these elements are measurable components of the construct of language ability, which are assessment criteria, and which are tasks in the language test.
The second problem is deciding how many language tasks are enough for a valid interpretation of the test taker’s language ability.
We need to build a profile of communicative language ability and to give feedback about strengths and weaknesses; we need to gather information in an appropriate way and to collect evidence.
We adopt the principles of cognitive diagnostic assessment (CDA) to create a language profile of the test takers.
We use CDA to interpret the test results in terms of communicative language competence and to give learners feedback on their language profile, i.e. their strengths and weaknesses across the various components of communicative language competence: linguistic, sociolinguistic and pragmatic.
CDA is designed to measure specific knowledge structures and processing skills in students so as to provide information about their cognitive strengths and weaknesses.
CDAs have two major components: a) content analysis of test items to identify their relationships to cognitive attributes of interest, and b) psychometric modeling of these attributes and items. Attributes refer to a set of specific skills, knowledge, competencies, mental processes, and strategies that examinees should master or possess to answer an item correctly.
Following the principles of CDA and guided by the purposes of our test, we decided to create a Q-matrix on the basis of the CEFR framework. First, we chose the elements of linguistic and pragmatic language competence, defining six attributes that follow the CEFR descriptions. Then we chose the language activities from the CEFR list, interpreting them as products of language use.
Finally, we chose the evidence for each language activity, interpreting it as assessment criteria.
At this stage we design the test tasks in connection with the language activities from the CEFR. The six attributes are:

  • LC1: knowledge of, and ability to use, the vocabulary of a language, consisting of lexical elements and grammatical elements.
  • LC2: knowledge of, and ability to use, the grammatical resources of a language.
  • LC3: knowledge of, and skill in, the perception and production of the sound units (phonemes) of the language and their realisation in particular contexts.
  • LC4: knowledge of, and skill in, the perception and production of the symbols of which written texts are composed.
  • PK1: the ability of a user/learner to arrange sentences in sequence so as to produce coherent stretches of language.
  • PK2: the use of spoken discourse and written texts in communication for particular functional purposes.
Here is an example of a Q-matrix (a minimal illustrative sketch follows below). As mentioned above, two steps are needed to draw conclusions about a test taker’s language profile, based on the principles of CDA.
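The item-to-attribute assignments in the following sketch are purely hypothetical, not the actual matrix of our test; they only show the form a Q-matrix takes, i.e. a binary items-by-attributes table:

    # A minimal illustrative Q-matrix for the six attributes above.
    # The assignments are invented for illustration, not taken from
    # the actual Bulgarian test.
    import numpy as np

    attributes = ["LC1", "LC2", "LC3", "LC4", "PK1", "PK2"]

    # Rows = items, columns = attributes; Q[j, k] = 1 means that item j
    # requires mastery of attribute k to be answered correctly.
    Q = np.array([
        [1, 0, 0, 0, 0, 0],  # item 1: vocabulary only
        [1, 1, 0, 0, 0, 0],  # item 2: vocabulary and grammar
        [0, 0, 1, 0, 0, 0],  # item 3: phonology (listening)
        [0, 1, 0, 1, 0, 0],  # item 4: grammar and orthography
        [1, 1, 0, 0, 1, 0],  # item 5: coherent discourse
        [0, 1, 0, 0, 1, 1],  # item 6: functional language use
    ], dtype=float)

    for j, row in enumerate(Q, start=1):
        needed = [attr for attr, q in zip(attributes, row) if q]
        print(f"item {j}: requires {', '.join(needed)}")
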
First, we need to verify the content of the Q-matrix: whether the identified attributes really are the attributes that examinees should master or possess to answer an item correctly.
Second, we need a statistical method for estimating a set of attribute (or skill) mastery patterns based on an examinee’s responses to the test items. Such statistical models are called cognitively diagnostic psychometric models.
In our work we apply the model developed by D. Dimitrov and D. Atanasov. A demo version of this statistical package is available on the website of the Assessment Centre at NBU.
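According to its published description (see the references), Dimitrov’s Least Squares Distance Method models the probability of a correct item response as a product of attribute probabilities, so that ln P_j(theta) = sum_k q_jk ln P(A_k | theta), and recovers the attribute probabilities by least squares over a grid of ability levels. The sketch below is a minimal illustration of that step only, with invented 2PL item parameters; it is not the actual package:

    # An LSDM-style computation, assuming a 2PL IRT model; the item
    # parameters are invented, and Q is the illustrative matrix above.
    import numpy as np
    from scipy.optimize import lsq_linear

    def icc_2pl(theta, a, b):
        """2PL item characteristic curve: P(correct | theta)."""
        return 1.0 / (1.0 + np.exp(-a * (theta - b)))

    Q = np.array([[1, 0, 0, 0, 0, 0], [1, 1, 0, 0, 0, 0],
                  [0, 0, 1, 0, 0, 0], [0, 1, 0, 1, 0, 0],
                  [1, 1, 0, 0, 1, 0], [0, 1, 0, 0, 1, 1]], dtype=float)
    a = np.array([1.2, 0.9, 1.0, 1.4, 1.1, 0.8])    # discrimination
    b = np.array([-1.0, -0.3, 0.2, 0.5, 1.0, 1.5])  # difficulty

    theta_grid = np.linspace(-3, 3, 13)
    attr_prob = []
    for theta in theta_grid:
        log_p_items = np.log(icc_2pl(theta, a, b))
        # Solve ln P_j(theta) = sum_k q_jk * ln P(A_k | theta) for the
        # attribute log-probabilities, constrained to be <= 0.
        fit = lsq_linear(Q, log_p_items, bounds=(-np.inf, 0.0))
        attr_prob.append(np.exp(fit.x))

    attr_prob = np.array(attr_prob)  # rows: theta levels, columns: attributes
    print(np.round(attr_prob, 2))
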
We ran the statistical analysis on 34 items of the test, covering listening, grammar and reading for orientation.
We identified three types of cases (a simple fit check in this spirit is sketched after the list):

  1. The attributes are not sufficient to perform the item correctly. For this case we need to identify more attributes: if the examinee has mastered the supposed attributes, performance should be easier, but the IRT difficulty parameters show that most of these items are not so difficult. Our conclusion for this case is to change the content of the Q-matrix.
  2. The number of attributes is more than the examinee needs to perform the item correctly. We had supposed a more complicated performance, involving mastery of three or four attributes. Our conclusion for this case is the same: to change the number of attributes mapped to this group of items.
  3. The attributes are sufficient for successful item performance. This case is an example of a true relationship between attributes and item performance.
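
Continuing the sketch above, a crude check in the spirit of these three cases is to compare, for each item, the IRT-based probability with the probability reconstructed from the recovered attribute probabilities; a large mean absolute difference flags an item whose Q-matrix row may need revision (the 0.1 cutoff is arbitrary, for illustration only):

    # Item-fit check, reusing icc_2pl, a, b, Q, theta_grid and
    # attr_prob from the previous sketch.
    p_irt = np.array([icc_2pl(t, a, b) for t in theta_grid])  # theta x items
    p_model = np.exp(np.log(attr_prob) @ Q.T)                 # theta x items

    mad = np.abs(p_irt - p_model).mean(axis=0)  # one MAD value per item
    for j, m in enumerate(mad, start=1):
        verdict = "revise this Q-matrix row?" if m > 0.1 else "acceptable fit"
        print(f"item {j}: MAD = {m:.3f} ({verdict})")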

The revision of the Q-matrix should concern attributes 3, 4 and 6. Attributes 1, 2 and 5 are shown to be fully engaged in test performance from low to high ability. Attribute 4 is related only to low ability, and attribute 6 shows no relationship with language ability: it is present across the full range of ability, from the lowest to the highest level.
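The same output can illustrate this kind of reading of attribute behaviour: an attribute whose estimated probability rises over the ability grid is engaged with ability, while a flat profile suggests one that is not (the 0.2 cutoff is again arbitrary):

    # Attribute-trend check, reusing attributes, theta_grid and
    # attr_prob from the sketches above.
    for k, name in enumerate(attributes):
        low, high = attr_prob[0, k], attr_prob[-1, k]
        trend = "rises with ability" if high - low >= 0.2 else "flat: weakly related to ability"
        print(f"{name}: P(A | theta=-3) = {low:.2f}, P(A | theta=+3) = {high:.2f} -> {trend}")
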
In order to achieve useful feedback on the language profile of the test taker, we need to change the grain size of the attributes: we need a finer structure of description.
The CEFR gives us this point of view. One way could be to split the various components of communicative competence into a knowledge group, an understanding group and an applying group.
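One hypothetical way to encode such a refinement is to split each of the six attributes into knowledge, understanding and applying facets, which triples the number of Q-matrix columns:

    # A hypothetical finer-grained attribute set: each attribute is
    # split into knowledge / understanding / applying facets.
    attributes = ["LC1", "LC2", "LC3", "LC4", "PK1", "PK2"]
    facets = ["knowledge", "understanding", "applying"]

    fine_attributes = [f"{attr}.{facet}" for attr in attributes for facet in facets]
    print(len(fine_attributes))  # 18 finer-grained attributes
    print(fine_attributes[:3])   # ['LC1.knowledge', 'LC1.understanding', 'LC1.applying']
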
References:

  • Cyril J. Weir, Limitations of the Common European Framework for Developing Comparable Examinations and Tests. Language Testing, 22(3): 281-300, 2005.
  • Dimiter Dimitrov, Least Squares Distance Method of Cognitive Validation and Analysis for Binary Items Using Their Item Response Theory Parameters. Applied Psychological Measurement, 2007. Sage Publications.
  • Eunice Eunhee Jang, Demystifying a Q-Matrix for Making Diagnostic Inferences about L2 Reading Skills. Language Assessment Quarterly, 6: 210-238, 2009.
  • Lyle Bachman and Adrian Palmer, Language Assessment in Practice. OUP, 2010.
  • Yong-Won Lee, Cognitive Diagnosis Approaches to Language Assessment: An Overview. Language Assessment Quarterly, 6: 172-189, 2009.
  • Yong-Won Lee, Application of Three Cognitive Diagnosis Models to ESL Reading and Listening Assessments. Language Assessment Quarterly, 6: 239-263, 2009.