Written by: Helen Trugman, Holon Institute of Technology
This paper focuses on the role of testing at the third stage of the motivation process, the stage that deals with sustaining the learner’s initial interest. The paper examines several factors related to test design and how they influence students’ decisions about further effort. In particular, I analyze test characteristics that function as motivation-busters, i.e. that demotivate students and may even contribute to ‘learned helplessness’ (Seligman 1975). Finally, some practical recommendations are offered on how to avoid problems with test design and thus make learning a gratifying task for both students and teachers.
Motivation is a multifaceted phenomenon, defined in Williams and Burden (1997:120) as “a state of cognitive and emotional arousal, which leads to a conscious decision to act and gives rise to a period of sustained intellectual and/or physical effort in order to attain a previously set goal”. In other words, motivation is analyzed as a three-stage process that encompasses more than the simple arousal of interest: it also involves sustaining that interest and investing time and energy in the effort needed to achieve certain goals. In this paper I focus on the role of testing at the third stage, which deals with sustaining the learner’s initial interest, and examine which factors related to test design may influence the choices students make about further effort. In particular, I distinguish between motivation-boosters, i.e. test characteristics that make students persist in pursuing their goals, and motivation-busters, i.e. test characteristics that demotivate students and interfere with their decision to pursue their goals.
Motivation-boosters and motivation-busters are closely related to beneficial and harmful backwash, understood respectively as the positive and the negative effect of testing on the teaching process (see Hughes 1996). Sources of harmful backwash discussed in Hughes (ibid.), such as drilling the skills checked on the test or neglecting skills and topics not included in it, can be analyzed as motivation-busters that lead to student disappointment and discontent with the teaching process. Beneficial backwash, on the other hand, acts as a motivation-booster; it is achieved by designing tests that form an organic part of the teaching process. Specifically, a good test checks the skills and knowledge students have acquired in a course or, alternatively, helps to reveal which skills and knowledge they have not yet acquired, and thus forces the teacher to introduce them in class.
2. General background
In Israel, university students are required to take reading comprehension courses in English for Academic Purposes so that they can cope with the large volume of English-language material in their content courses. The goal of such a language course is to equip students with the reading skills and strategies needed to meet the academic requirements of their other courses. As discussed in Kirschner, Wexler, and Spector-Cohen (1992: 537), such conditions call for “extensive and varied testing in order to maintain student motivation and cooperation and to ensure some degree of standardization among courses with a large teaching staff”. Since university programs usually cover a broad array of subjects, no uniform tests for English programs exist, nor can they realistically be designed. Each institution therefore devises its own tests following the general guidelines of the reading comprehension program. This lack of uniformity in test format is often accompanied by a certain degree of freedom that university teachers enjoy in their choice of teaching techniques and materials. This freedom, though welcome and generally positive in academia, may inadvertently lead to negative backwash, since a teacher’s techniques may be at variance with the course objectives of a particular institution. The problem is compounded by the fact that many English teachers teach at several institutions and may therefore use the same materials or techniques across all of them. The discrepancy between the teacher’s materials and methodology, on the one hand, and the course objectives of a particular institution, on the other, may result in negative backwash and thus demotivate students. As an English program coordinator and teacher trainer, I have witnessed the negative impact of badly designed tests on student motivation.
The purpose of this paper is to present a number of factors that contribute to bad test writing, with the aim of raising teachers’ awareness of these stumbling blocks and, ultimately, making learning more gratifying for both students and teachers.
3. Factors to be considered in test-writing
3.1. Thematic relevance of the test
Testing should aim at checking what students have learned and whether they can apply that knowledge to new real-life tasks. It therefore seems only natural to test, for instance, a design student on reading materials thematically related to her major courses. Thematically unrelated test materials only complicate the student’s task by raising her anxiety level and increasing the likelihood of failure due to a general misunderstanding of the issue discussed. Hence, a poor choice of text for a reading comprehension test may function as a primary motivation-buster. In addition, it may send the wrong message about making further effort in class. Specifically, when students invest time and effort in acquiring topic-specific vocabulary in class yet have to deal with thematically unrelated articles on a test, they are left with the question: ‘What is the purpose of learning new words in class if I have to rely heavily on the dictionary during the test anyway?’ If we teachers admit that we cannot teach students all the words they need, why can we not also admit that students cannot know all the words, and give them a thematically relevant test? By carefully choosing the topics for unseen tests, the teacher will avoid antagonizing her students, who will then not feel cheated by the teacher and the system as a whole. Moreover, a test article thematically related to classroom discussion turns the test from a motivation-buster into a motivation-booster: familiar topics and vocabulary not only reduce students’ test anxiety but also demonstrate how worthwhile their efforts in class were, and hence inspire them to persist in learning.
3.2 Level of difficulty
It is obvious that a test that is more demanding and challenging than anything practiced in class will have an adverse impact on students’ confidence and, subsequently, on their motivation. Such a test will provoke a heated debate in class, with the teacher accused of wrongdoing and unfairness and, eventually, of being the source of students’ failures. Besides becoming embittered, students again get the wrong message: ‘No matter how hard I study in class, it does not prepare me for the test.’ Their self-esteem deflates, their desire to persist in learning gradually dissipates, and their motivation gets busted. Some students end up feeling so lacking in control over what happens to them that they lose all incentive to try to succeed. This condition was identified by Seligman (1975), who called it ‘learned helplessness’. Learners who do not perceive themselves to be in control of their own actions become easily demotivated, find it difficult to discriminate between appropriate and inappropriate responses, show symptoms of anxiety and depression (cf. Dweck and Wortman 1982), and give up trying altogether.
While failures and negative student feedback are to be expected when a test is more complex and difficult than classroom activities, negative backwash and student demotivation are not an obvious outcome in the opposite case, when a test is easier than classwork. Yet this discrepancy also has a harmful effect on sustaining students’ efforts, since it too sends the student a wrong message, though of a different kind: ‘You have already attained the level necessary to pass the test. Do not exert yourself! Relax! Lie back and enjoy life!’ Although students may be happy with high grades, they do not necessarily realize that the easiness of the test is responsible for them; rather, they tend to credit themselves with the achievement. This is a natural response to success: we tend to blame others for our failures and attribute our successes to ourselves (M. R. Leary 2004). Some students may become unjustifiably overconfident and lose interest in further study, which in turn may affect their performance on a subsequent test in unpredictable ways.
We may thus conclude that, to avoid these pitfalls, the test writer has to judge correctly the level of difficulty appropriate for each test, given its particular purpose and its timing in the course.
3.3 Concord between testing items and course objectives
Tests (whether progress or achievement tests) serve as an important measure of both teachers’ and students’ achievements, checking not only what has been taught in class but also what students have managed to learn. Final achievement tests should therefore be based on course objectives and should not involve an element of surprise or novelty for students. The final test is not devised for learning; it is devised for evaluating the knowledge and expertise students have acquired in the course, and hence how successful the course has been in accomplishing its objectives. Only tests based on course objectives promote a beneficial backwash effect on teaching. When, however, a teacher’s syllabus does not correspond to the course objectives, negative backwash is to be expected. In such cases a test may contain items that are specified in the course objectives yet have not been introduced in class by the teacher. A bigger problem arises, however, when the test writer includes items that are not specified in the course objectives and hence were not taught in class. This happens, for example, when the course objectives explicitly emphasize developing reading skills, yet the test contains tasks checking specific grammar or vocabulary items.
Moreover, the types of skills introduced and practiced in class may be at variance with those tested. For instance, if a teacher focuses only on intensive reading skills while a test also checks global reading skills, her students will be at a disadvantage. In one case, pre-intermediate students who had studied for only a month were given a test with a global reading question spanning four paragraphs of the text and requiring cross-paragraph mapping of the argument. No cue was provided to direct students to a specific section of the text and thus facilitate their task. Even if a teacher had expected such a question on the test, she could not possibly have prepared her students for this advanced task within four weeks of study. Nor could she have been expected to prepare them, since such skills were not stated as objectives of the pre-intermediate course at all.
In a similar vein, students who practice in class only items that check language recognition, such as multiple-choice or true/false questions, will be puzzled by test items that require language production skills, such as sentence completion, open-ended questions, flow charts, blank-filling exercises, etc.
We may conclude that, to avoid student disenchantment with a test and with a language course in general, a tripartite link is required: what is taught in class and what is tested must both correspond to the course objectives. This correspondence can be achieved by familiarizing teachers with the course objectives and the test format before the course starts, and by involving them directly in test writing and/or test reviewing during the course.
3.4 Test item design
This factor directly pertains to the second principle of test writing posited in Kirschner, Wexler and Spector-Cohen (1992:539): test questions should be as easy as possible for test takers to process. This principle is not universally accepted among English teachers or language testers. Test items sometimes contain language at a higher level of difficulty than that of the text, so that comprehension of the test items becomes an inherent part of the reading comprehension test. However, this is not a very common practice (cf. Heaton 1995:116). Nevertheless, clarity and logical coherence of test items are generally accepted as principles of good test writing, since they directly affect students’ performance. Ambiguous, unclear or illogical questions serve no purpose other than to confuse students and force them to guess the right answer rather than deduce it. In this section, I illustrate a number of common problems with test item design. All of them adversely affect students’ performance, resulting in failures or lower than average grades, and thus have grave ramifications for student motivation. Specifically, students are demotivated not only by the mere fact of getting a lower than expected grade, but also by the belief that no matter how hard they study, they cannot prepare for a test that is unpredictable and hard to understand. Students become convinced of their powerlessness and may develop symptoms of test anxiety or even become depressed.
The following common types of problems with the item design will be illustrated: 
a. ambiguous instructions
b. illogical instructions
c. imprecise or unclear instructions
d. true/false tasks based on compound propositions
e. unjustified inferences
f. multiple-choice questions with multiple or no answers
a. ambiguous instructions
Ambiguous instructions allow two or more interpretations of a test item. Such ambiguity often stems from sloppy formulation. The following examples illustrate the ambiguity of the instruction ‘Paragraph X is one of…’, which is inherently ambiguous because of its vague wording. It may refer to intra-paragraph relations, i.e. the organization of ideas within one paragraph, as in example 1.
1. Between two and five million Gypsies live in this world today. For centuries they’ve wandered about Europe, Africa, the Middle East, South America, and even in North America. Yet hardly anyone knows anything about these secret people. Read the history books—the Gypsies are never mentioned.
Q: Paragraph 1 is one of…
a) comparison c) contrast
b) example d) chronological order
In this example the connector of contrast ‘yet’ is supposed to signal the correct answer: Paragraph 1 has contrasting ideas, hence it is one of contrast. Alternatively, the same instruction can refer to inter-paragraph relations, i.e. involve identification of the function or purpose of a paragraph with respect to other paragraphs in the text, as shown in example 2.
1. A woman was near death from a special kind of cancer. One drug might have saved her life but it was available only from a druggist who charged 10 times what it cost him to make it. The woman’s husband, Marty, could only come up with half of what the druggist charged and the druggist refused to sell the drug more cheaply or to let Marty pay the balance later. Desperate, Marty broke into the druggist’s store to steal the drug. Should Marty steal the drug when that is the only way he can save the life of his dying wife?
2. During the past two decades, children and adults around the world have been asked this question as well as others presenting similar moral dilemmas. Following are some typical responses to the Marty dilemma.
Q: Paragraph 1 is one of…
a) chronological order c) example
b) comparison d) contrast
In example 2, students have to understand that the first paragraph of the text introduces the topic by providing an example of the moral dilemma; hence, (c) is the correct answer.
If students encounter the same instruction used in different senses within one test, they will almost certainly be confused: they may interpret both occurrences in the same way and get one of the answers wrong. A more serious problem arises when one and the same instruction can be interpreted in two ways for a single test item and the test writer has overlooked this possibility, as demonstrated in example 3.
8. Thus, the 45 studies that I have reviewed provide striking support for the universality of Kohlberg’s first four stages. However, his higher stages do not account for moral reasoning involving principles of collective or communalistic well-being. It appears that in other cultural groups and social classes, mature moral principles are held that are distinct from our own.
Q: Paragraph 8 is one of…
a) contrast c) example
b) summary d) illustration
The connector ‘however’ inside the paragraph signals that choice (a), ‘contrast’, is the right answer. However, the connector ‘thus’ at the beginning of the paragraph allows it to be interpreted as a summary of the discussion. As a result of this ambiguity, all students who took the instruction to refer to intra-paragraph relations answered incorrectly, since the intended answer was (b), which requires analysis of inter-paragraph relations.
b. illogical instructions
Lack of clarity in the test item can also stem from faulty logic. This kind of problem tends to confuse better students who are good at logical reasoning. Consider the following example.
Laura Ashley expanded her tiny operation not to maximize profits but to defend and promote traditional British values, which she felt, were under siege from sex, drugs, and miniskirts in the 1960s. From the beginning, she and Bernard exercised tight control over all aspects of the business, keeping design, manufacturing, distribution, and retailing in-house. The couple opened a central manufacturing and distribution center in Wales; and they proudly labeled their garments “Made in Wales”. They provided generous wages and benefits to their employees, thereby avoiding the labor unrest that crippled many British industries throughout the 1970s. They also established close relationships with their franchisees and customers, who grew fiercely loyal to the company’s products and the values they embodied.
Q: Name two causes and their effects for the success of Laura Ashley in the 1970’s.
This item aims to check students’ understanding of cause-effect relations within the text. Cause-effect is usually a binary relation, in which a cause (or a number of causes) leads to an effect (or a number of effects). In the above task, however, the cause-effect relations are not clearly defined: if we are to find the causes of the success of Laura Ashley’s company in the 1970s, then the company’s success is the effect. The illogical instruction ‘Name… effects for the success…’ further complicates elicitation of the right answer. A careful analysis of the passage reveals three interrelated factors that form a chain of causal relations, with the immediate result of the first cause functioning as the cause of the next event:
For this complex relationship to be understood properly, the question should be rephrased as ‘What two causes and their (immediate) effects ensured the success of Laura Ashley in the 1970s?’ An even better way to present this complex logical relationship is as a flow chart with the final result provided, as shown below.
c. imprecise or unclear instructions
I call imprecise those instructions that are carelessly written and not sufficiently explicit. Imprecise instructions cause confusion and make students waste time trying to figure out the test writer’s real intention. This often happens when a text allows for several answers, depending on their degree of generality. For instance, one of the most important skills taught in a reading course is distinguishing between general and supporting ideas, or between main ideas and the specific details exemplifying them. We teach students to pay attention to these distinctions and expect them to apply this knowledge on a test. However, when a question can be answered either in a general or in a specific way, it is important to indicate which kind of answer is required. To save students the time spent pondering which answer is better, it is advisable to provide additional instructions, such as ‘Give general ideas’ or ‘Give specific information’, if such confusion is to be expected. In the absence of such instructions, care should be taken to make the intention clear through the question format. The following example illustrates how a question format misled students through implied miscues.
15. Experts see more disturbing patterns. “It definitely diminishes self-esteem,” said Stanly Rosner, a psychologist in New Canaan, Conn., and co-author of “The Marriage Gap,” a study of the breakdown of contemporary marriages. “These people end up feeling like losers even though they may only be responding to external cues.” Repeated emotional upheaval, he said, is driving some of them to heavy drinking and drug use.
16. Children of parents who marry several times are another concern. “The youngsters are the saddest part,” said Dr. Robert Garfield, an assistant professor of psychiatry at Hahnemann University in Philadelphia who specializes in stepfamily issues.
17. “I see children not being able to concentrate, a sense that nothing lasts and a loss of faith in relationships,” Dr. Garfield said. “They never develop trust or long-term values. They become self-centered and cynical.” Other experts, including Dr. Rosner, maintain that such children will continue the pattern because it is so familiar.
18. Multiple marriages also aggravate sibling conflicts. “We are entering a period of interfamily feuds the likes of which you have never seen,” said William Selsberg, a lawyer in Stamford, Conn., “Who is entitled to get college money if there isn’t enough to go around? How do you equitably settle the claims of the children from the different marriages when the parent dies? What if the children from a former marriage are left out of the will? Estate planning is becoming impossible.”
19. The adults also suffer financially, gradually getting poorer. “You see their life style getting progressively worse each time they marry again,” said Mr. Felder, the Manhattan divorce lawyer. “Equitable distribution depletes the assets. They’re stuck with regular commitments to a spouse or children from a previous marriage. It’s a finite cup. Only so many people can take from that cup before it’s empty.”
Q: List five negative effects of multiple marriages mentioned by the experts.
If a student goes by the rules and looks for general ideas stated in topic sentences (italicized in the passage), he immediately encounters a problem: where is the fifth effect? Since no other general idea can be found in the text, the student faces a dilemma: Am I supposed to give specific examples or general ideas? If specific details are needed, which of those mentioned in the passage am I to cite? The passage mentions at least a dozen specific negative outcomes. The lack of precise instructions, coupled with a misleading question format, allows a very large number of combinations to be given as possible answers. Moreover, the teacher is forced to accept all of them, since the instructions specify neither the paragraphs on which the answers should be based nor the kind of answer expected. Asking for five answers thus serves as a miscue and leaves students confused and puzzled.
It is important to note that it is not always the weaker students who are misled by ambiguous, imprecise or illogical instructions like those exemplified above. On the contrary, better students, who can think of an alternative interpretation of a test item, seem to be confused by such instructions more often (cf. Hughes 1996:39). Test writers therefore have a moral duty to examine test items carefully for ambiguity and to let colleagues scrutinize them before administering a test.
d. true/false tasks based on compound propositions
True/false questions are not considered a very sophisticated task. In fact, they are a variation on multiple-choice questions, with a 50 percent chance of guessing the right answer. However, they can present a real problem if they are badly constructed. This may occur, for instance, when the proposition in a test item is compound and its truth value has to be determined by combining the truth values of its parts. Consider the following example:
These cues, which may be words, gestures, facial expressions, customs, or norms, are acquired by all of us in the course of growing up and are as much a part of our culture as the language we speak or the beliefs we accept. All of us depend for our peace of mind and our efficiency on hundreds of these cues, most of which we do not carry on the level of conscious awareness.
Q: True/ False: Culture cues are as important to us as our language, and we are aware that we are using these cues.
Support your answer with a quote from the text.
In this example, students are presented with a compound sentence and asked to decide whether it is true or false according to the text. In addition, they have to justify their answer. It can be inferred from the text that the first part of the proposition is true, while the second part is false. Determining the truth of such a sentence entails not only determining the truth of each member of the compound but also some knowledge of logic. Specifically, students have to know that a conjunction of a true and a false proposition necessarily yields a false proposition (i.e. – & + = –). Moreover, they usually puzzle over which part of the sentence has to be supported and, in many cases, give partial or incorrect support, thus losing all the points. Such true/false questions, which are in essence a combination of two independent true/false items, should therefore be avoided on tests.
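The logical step students are implicitly required to take can be made explicit with the standard truth table for conjunction. In this sketch, p and q are labels introduced here purely for illustration: p stands for the first clause of the culture-cue item above and q for the second, with truth values assigned according to the analysis of the text just given.

```latex
\begin{array}{cc|c}
p & q & p \land q \\ \hline
\text{T} & \text{T} & \text{T} \\
\text{T} & \text{F} & \text{F} \\
\text{F} & \text{T} & \text{F} \\
\text{F} & \text{F} & \text{F}
\end{array}
\qquad
\begin{aligned}
p &= \text{``culture cues are as important to us as our language''} = \text{T} \\
q &= \text{``we are aware that we are using these cues''} = \text{F} \\
p \land q &= \text{T} \land \text{F} = \text{F}
\end{aligned}
```

Only one of the four rows yields ‘true’, so a student must judge both clauses correctly and also know the conjunction rule to reach the correct answer; the item thus tests two reading judgments and a piece of formal logic at once.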
e. unjustified inferences
Another source of confusion lies in test items based on inferences. Inferring is an advanced reading skill that has to be developed and practiced extensively. Yet the borderline between inferring information from the text and plain guessing is not always clear-cut. Students often find it hard to distinguish between a logical conclusion that can be drawn from the text and an idea that cannot be judged true or false on the basis of the text. A particular problem emerges when students are asked to judge a sentence as true or false although the information in the original text does not allow them to draw such a conclusion, as in the following example:
Westerners tend to value a tough, individualistic and dominating leadership style including the ability to take independent decisions and have them successfully implemented. The higher a Japanese manager rises in a company, the more pains he will take to hide his ambition and capability and not to be seen as a forceful leader. Westerners who look for a decisive and charismatic Japanese boss are likely to be disappointed.
Q: True/False: Although Western leaders are dynamic and motivated, while their Japanese counterparts are modest, there is a reciprocal respect for these traits. Support your answer with a quote from the text.
Does it follow from the text that Japanese and Western leaders respect, or fail to respect, each other’s traits? Certainly not! The text conveys the idea that leadership styles in the two cultures, Western and Japanese, differ in a certain way, and that, unlike in the West, one would not find openly tough or individualistic Japanese leaders. However, no conclusion can be drawn about their mutual attitudes. Hence the question is intrinsically confusing, and there is no justification for either answer in the text. Note that the teacher will also have trouble justifying in class why the sentence is false, as the key prescribes. A heated argument with students about such questions can only trigger their disappointment, hostility towards the teacher, feelings of powerlessness and loss of interest in the language learning process. Motivation gets busted!
f. multiple-choice questions with multiple or no answers
As convincingly argued in Hughes (1996: 61), it is very difficult to write good distractors for multiple-choice questions. One of the common mistakes with this kind of item is the presence of several correct answers. This problem is illustrated in example 8.
The final question I considered was whether all instances of genuine moral reasoning in all cultures match one of Kohlberg’s stages. A number of studies, I found, have reported examples of clear moral judgments that were very difficult to score using Kohlberg’s model. In my study of Israel kibbutzniks, for example, I found that the cooperative working class values of the kibbutz – communalistic equality and happiness – were missing from Kohlberg’s model and scoring manual. Some kibbutzniks argued that Marty had every right to steal the drug because allocation of the drug should be in the hands of the community and used to promote the ideals of collective equality and happiness. Psychologists Anne Tietjen of the University of Washington and Lawrence Walker of the University of British Columbia had a similar finding in Papua, New Guinea. There, some village leaders placed blame for the Marty dilemma on the entire community…
Q: Paragraph 7…
a) lists reasons to support Kohlberg’s theory.
b) brings examples of different cultures concerning moral judgments.
c) summarizes the main idea of the whole article.
d) shows contrasting points of view to Kohlberg’s theory.
From the topic sentence of this paragraph it is clear that not all types of moral reasoning find a correlate in Kohlberg’s theory; some cultures present judgments distinct from those predicted by the theory. Therefore (d) seems to be a plausible answer. However, as the reader reads on, she finds two examples of such opposing views, from Israel and from Papua New Guinea. Hence (b) appears to be a correct answer as well. The student is left with an irresolvable dilemma and has to resort to guessing.
Another common fault with multiple-choice items is the lack of the correct answer, as demonstrated in example 9.
1. The problem is not an inability to take action but an inability to take appropriate action. There can be many reasons for the problem – ranging from managerial stubbornness to sheer incompetence – but one of the most common is a condition that I call active inertia. Active inertia is an organization’s tendency to follow established patterns of behavior – even in response to dramatic environmental shifts. Stuck in the modes of thinking and working that brought success in the past, market leaders simply accelerate all their tried-and-true activities. In trying to dig themselves out of a hole, they just deepen it.
2. Because active inertia is so common, it’s important to understand its sources and symptoms. After all, if executives assume that the enemy is paralysis, they will automatically conclude that the best defense is action. But if they see that action itself can be the enemy, they will look more deeply into all their assumptions before acting. They will, as a result, gain a clearer view of what really needs to be done and, equally important, what may prevent them from doing it. And they will significantly reduce the odds of joining the ranks of fallen leaders.
Q: The main idea of the text is:
a. to show how only successful companies are paralyzed when they are confronted with disruption in business conditions.
b. to illustrate the inability of organizations to take appropriate action in response to vast business changes.
c. to follow the successes and failures of several major organizations in various fields of business.
d. to compare and contrast the business methodology of good companies that go bad.
This task presumably checks students' ability to identify the main idea of a long academic text. However, all the choices provided express the purpose of the discussion, i.e. the goal the writer wants to achieve in his article, rather than its main idea (i.e. 'There are several reasons for companies to stagnate, with active inertia being the most common one.'). Moreover, none of the choices matches the actual purpose announced by the writer in the second paragraph of the excerpt: to better understand the sources and symptoms of active inertia in order to prevent it and avoid failure. We can therefore expect students to be frustrated by such a task and unable to make an informed choice. As in example 8, students are forced to resort to plain guessing, and a negative backwash effect is inevitable.
3.5 Test terminology and layout
Confusion can also stem from something as trivial as the use of unfamiliar terminology in test instructions. Since there are synonymous terms for nearly every language phenomenon, special care should be taken to use consistent terms in class and on tests. For instance, if students use terms such as 'general questions' and 'intensive reading questions' in class, but the test paper mentions 'global reading questions' and 'close reading questions' instead, students will waste time figuring out the instructions rather than answering the questions. To prevent confusion of this type, students should be exposed to all kinds of terminology and learn the synonymous terms before a test is administered. Alternatively, the department can agree on standard terminology and question formats to be used both in class and on tests. This will not only save students time and effort on a test but will also boost their confidence through their familiarity with the instruction terminology and question format.
I would like to conclude the discussion with some practical recommendations to help teachers, test-writers and course developers avoid the problems discussed in the paper. It is crucial for the teaching staff to be actively involved in test preparation at all its stages:
a) firstly, by evaluating the topic (factor 1) and the level of difficulty (factor 2) of the text intended for the unseen test;
b) secondly, by providing feedback on the suitability of test item types and their level of complexity (factor 3);
c) thirdly, by scrutinizing all test items for faults in the instructions and answer choices, and by eliminating ambiguity or imprecision (factor 4);
d) finally, by checking the test format for inconsistent instructions, confusing terminology and poorly designed layout (factor 5).
Peer review of tests is a time-consuming and tiring process, yet its results are gratifying: high-quality tests that reliably check both teachers' and students' work and help them sustain their interest over a long time. In this way tests stop being motivation-busters and start working as motivation-boosters, turning language learning into a fruitful and enjoyable task for both students and teachers.
- Dweck, C. S. and C. B. Wortman (1982) "Learned helplessness, anxiety and achievement motivation." In: H. W. Krohne and L. Laux (eds.) Achievement, Stress and Anxiety. London: Hemisphere.
- Heaton, J. B. (1995) Writing English Language Tests. (Longman Handbooks for English Teachers). London and New York: Longman Inc.
- Hughes, A. (1996) Testing for Language Teachers. (Cambridge Handbooks for Language Teachers). Cambridge: Cambridge University Press.
- Kirschner, M., C. Wexler and E. Spector-Cohen (1992) "Avoiding Obstacles to Student Comprehension of Test Questions." TESOL Quarterly, Vol. 26, No. 3, pp. 537-556.
- Leary, M. R. (2004) The Curse of the Self: Self-Awareness, Egotism, and the Quality of Human Life. Oxford: Oxford University Press.
- Seligman, M. (1975) Helplessness: On Depression, Development, and Death. San Francisco: Freeman.
- Williams, M. and R. L. Burden (1997) Psychology for Language Teachers: A Social Constructivist Approach. (Cambridge Language Teaching Library). Cambridge: Cambridge University Press.
List of texts used in the language tests cited in the paper:
- Example 1: “This Race of Strangers”
- Examples 2, 3, 8: "A question of morality" by John Snarey, Psychology Today, June 1987
- Examples 4, 9: "Why good companies go bad" by Donald N. Sull, Financial Times, 3 October 2005
- Example 5: "Repeated marriages - a growing trend" by Andrée Brooks
- Example 6: an excerpt from Oberg, K. (1958) Culture Shock and the Problem of Adjustment to a New Cultural Environment. Washington, DC: Department of State.
- Example 7: “Leadership”, downloaded from the Internet, © Nicholas Brealey Publishing
This does not necessarily apply to other types of tests, such as practice or mock tests (seen or unseen), which can contain test items unfamiliar to the students. Practice tests put the emphasis on learning rather than on the assessment of students' achievements; they can be administered at the beginning of a course and be self- or peer-graded. Hughes (1996: 10-11) also concedes that 'the content of these [final achievement tests] must be related to the courses with which they are concerned'; however, he acknowledges that 'the nature of this relationship is a matter of disagreement amongst language testers'.
See Kirschner, Wexler and Spector-Cohen (1992: 542) for a discussion of why grammar and vocabulary knowledge should not be tested on an integrative reading comprehension test.
All the examples cited in this paper have been taken from reading comprehension tests administered at institutions of higher education in Israel. The articles from which the test texts were taken are listed at the end of the paper. Italicization in reading passages and questions is mine.