ITEM ANALYSIS

 

Ratna Komala Dewi

Nuke Sari Nastiti

Item analysis is a method used in education to evaluate test items. It ensures that questions meet an appropriate standard and measures the effectiveness of individual test items. Item analysis aims to improve test items and to identify unfair or biased items. It is conducted because tests that are too difficult (with an insufficient floor) tend to frustrate students and deflate scores, while tests that are too easy (with an insufficient ceiling) reduce motivation and inflate scores.

Item analysis is carried out by computing four measures: the difficulty index, the discrimination index, the validity coefficient, and the effectiveness of distractors.

A. Difficulty Index

According to Wilson (2005),  item difficulty is the most essential component of item analysis.  Item difficulty is determined by the number of people who answer a particular test item correctly.  It is important for a test to contain items of various difficulty levels in order to distinguish between students who are not prepared at all, students who are fairly prepared, and students who are well prepared.

To compute the level of difficulty, we use the formula:

Difficulty Index (p) = C/T

p = Difficulty Index

C = the number of students who answer item X correctly

T = the number of total students who answer item X

For example:

There are 50 students who answer item X, 30 of whom answer it correctly. The difficulty index is therefore:

Difficulty Index (p) = 30 / 50 = 0.6

  • The highest possible value of p is 1.0 and the lowest is 0.
  • p always has a positive value.
  • The higher the value of p, the easier the item.
  • The lower the value of p, the harder the item.

According to Allen & Yen (1986), to avoid a test being too difficult or too easy, each item in a test should have a difficulty between 0.3 and 0.7. This range makes it possible to differentiate between individuals' levels of knowledge, ability, and preparedness.
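The computation is easy to sketch in code. Below is a minimal Python sketch; the `difficulty_index` helper is hypothetical, and the data mirror the worked example above:

```python
# A minimal sketch of the difficulty index p = C / T described above.
# Hypothetical data: 1 marks a correct answer, 0 an incorrect one.

def difficulty_index(responses):
    """Proportion of test takers who answered the item correctly."""
    return sum(responses) / len(responses)

# 50 students answer item X; 30 of them answer correctly (the example above).
item_x = [1] * 30 + [0] * 20
p = difficulty_index(item_x)
print(f"p = {p:.2f}")  # p = 0.60
# Allen & Yen (1986): items should fall between 0.3 and 0.7.
print("acceptable" if 0.3 <= p <= 0.7 else "too easy or too hard")
```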

 

B. Discrimination Index

Discrimination goes beyond determining the proportion of people who answer correctly and looks more specifically at who answers correctly. In other words, item discrimination determines whether those who did well on the entire test did well on a particular item. An item should in fact be able to discriminate between upper and lower scoring groups.

To compute the discrimination index, we first divide the test takers into two groups, an upper group and a lower group, and then use this formula:

Discrimination index (D) = Pu – Pl

D = Discrimination index

Pu = difficulty index of item X for the upper group

Pl = difficulty index of item X for the lower group

For Example:

Thirty students are divided into two groups: 15 students in the lower group and 15 in the upper group. In the upper group, 12 students answer item X correctly, whereas in the lower group only 6 students answer it correctly. The discrimination index is:

Answer:

Pu = 12/15 = 0.8;

Pl = 6/15 = 0.4;

D = Pu – Pl

D = 0.8 – 0.4

D = 0.4
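A minimal Python sketch of the same computation, using the hypothetical upper and lower groups from the example above:

```python
# A minimal sketch of the discrimination index D = Pu - Pl, with the
# hypothetical upper and lower groups from the worked example above.

def difficulty_index(responses):
    return sum(responses) / len(responses)

def discrimination_index(upper, lower):
    """Difference in item difficulty between upper and lower groups."""
    return difficulty_index(upper) - difficulty_index(lower)

upper = [1] * 12 + [0] * 3  # 12 of 15 upper-group students answer correctly
lower = [1] * 6 + [0] * 9   # 6 of 15 lower-group students answer correctly
print(f"D = {discrimination_index(upper, lower):.1f}")  # D = 0.4
```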

C. Validity Coefficient

The validity coefficient can be computed using correlation. Two correlation techniques are commonly used: the point-biserial technique and the biserial technique.
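As an illustration, the point-biserial technique correlates dichotomous item scores with total test scores. A minimal sketch using SciPy's `pointbiserialr`; the data are hypothetical:

```python
# A sketch of the point-biserial technique using SciPy; the arrays below are
# hypothetical. `item` holds 0/1 scores on one item, `total` the test totals.
from scipy.stats import pointbiserialr

item = [1, 0, 1, 1, 0, 1, 0, 1, 1, 0]   # dichotomous item scores
total = [9, 4, 8, 7, 5, 9, 3, 6, 8, 4]  # total test scores
r, p_value = pointbiserialr(item, total)
print(f"point-biserial r = {r:.2f}")  # higher r: the item agrees with the test
```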

D. The Effectiveness of Distractors

Multiple-choice tests have one question and several answer options. Among the options, only one answer is correct; the other options are wrong answers. Those wrong answers in multiple-choice tests are called distractors. A distractor is considered effective when enough students choose it. According to Fernandez (1984), a distractor can be called a good distractor when at least 2% of test takers choose it.
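A minimal Python sketch of checking each distractor against the 2% criterion; the answer data and option labels are hypothetical:

```python
# A minimal sketch of distractor analysis against the 2% criterion attributed
# to Fernandez (1984). The answer data and option labels are hypothetical.
from collections import Counter

choices = ["A"] * 60 + ["B"] * 30 + ["C"] * 9 + ["D"] * 1  # 100 test takers
key = "A"                                                   # the correct answer

counts = Counter(choices)
for option in "ABCD":
    if option == key:
        continue  # only the wrong options are distractors
    share = counts[option] / len(choices)
    verdict = "effective" if share >= 0.02 else "ineffective"
    print(option, f"{share:.0%}", verdict)
```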


CHARACTERISTICS OF A GOOD TEST

Ratna Komala Dewi

Nuke Sari Nastiti

A. Reliability

Reliability is linked to consistency: a reliable test is one that yields stable, consistent scores under every condition. The scores demonstrate consistency no matter who administers the test, and no matter when or where the test is administered.

Consider the following sets of hypothetical score data collected from five students (A, B, C, D, E) under different conditions: the same rater (Rater 1), different raters (Raters 1 and 2), and different times of administration, as shown in Tables 1 and 2.

Table 1

Students   Rater 1   Rater 2
A          8         8.2
B          8.6       8.5
C          9         9.1
D          8         8.2
E          9.4       9.3

 

Although the scores produced by different raters (Raters 1 and 2) exhibit different values, these scores can still be considered consistent between raters; the differences are slight. In Table 2, by contrast, the scores demonstrate considerable differences.

Table 2

Students   Rater 1   Rater 2
A          3         8
B          8.6       5
C          3.1       9
D          8         8
E          9.4       4

 

The reliability of test scores can be estimated using several approaches: test-retest, equivalent (parallel) forms, and internal consistency.

1. Test-Retest

The test-retest approach estimates the reliability of scores by administering the same test twice at different times. For example, the first test is administered on Friday the 13th and the scores are recorded; the second test, covering the same material, is conducted on Monday the 29th, and its results are also recorded. The administrator then correlates the two sets of results to estimate reliability. This approach has several disadvantages. First, it is not easy to create identical conditions on different testing occasions. Second, because the same subjects take the test twice, the test takers may become bored. Third, administering this kind of approach consumes much time.
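A minimal Python sketch of this approach, correlating two hypothetical sets of administration scores and using Pearson's correlation as the reliability estimate:

```python
# A minimal sketch of test-retest reliability: correlate the two
# administrations. The score lists are hypothetical.
import numpy as np

first = [8.0, 8.6, 9.0, 8.0, 9.4]   # scores from the first administration
second = [8.2, 8.5, 9.1, 8.2, 9.3]  # scores from the second administration

r = np.corrcoef(first, second)[0, 1]  # Pearson correlation coefficient
print(f"estimated test-retest reliability = {r:.2f}")
```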

2. Parallel Forms

The parallel-forms approach is similar to test-retest in that the test is administered twice at different times, but the two administrations are close together and use parallel forms: tests made equal in every aspect. For example, 25 junior high school students are tested twice with parallel forms, the first test on Friday and the second on Wednesday; the results of the two tests are then compared to estimate reliability. Because this approach is so similar to test-retest, it also shares similar drawbacks.

3. Internal Consistency Method

In internal-consistency reliability estimation, a single measurement instrument is administered to a group of people on one occasion. In effect, we judge the reliability of the instrument by estimating how well the items that reflect the same construct yield similar results; that is, how consistent the results are across different items for the same construct within the measure. A wide variety of internal-consistency measures can be used; one example is sketched below.
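One widely used internal-consistency measure is Cronbach's alpha. The following is a minimal Python sketch computed from its textbook formula, with a hypothetical students-by-items score matrix:

```python
# A minimal sketch of Cronbach's alpha from its textbook formula:
# alpha = k/(k-1) * (1 - sum(item variances) / variance(total scores)).
# `scores` is a hypothetical students-by-items matrix of 0/1 responses.
import numpy as np

scores = np.array([
    [1, 1, 1, 0],
    [1, 0, 1, 1],
    [0, 0, 1, 0],
    [1, 1, 1, 1],
    [0, 1, 0, 0],
])

k = scores.shape[1]                         # number of items
item_vars = scores.var(axis=0, ddof=1)      # variance of each item
total_var = scores.sum(axis=1).var(ddof=1)  # variance of students' total scores
alpha = (k / (k - 1)) * (1 - item_vars.sum() / total_var)
print(f"Cronbach's alpha = {alpha:.2f}")
```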

B. Validity

1. Face Validity

The concept of face validity relates more to what a test appears to measure than to what the test actually measures (Cohen et al., 1988:125). Example: a speaking test is constructed and claims to test speaking abilities.

2. Content Validity

Wiersma and Jurs (1990:184) define content validity as '… the extent to which the test is representative of a defined body of content consisting of topics and processes.' Example: a grammar test contains the grammatical points to be tested, such as infinitives, gerunds, modals, and tenses.

3. Empirical Validity

To examine the validity of a test being experimented with or developed, empirical, external verification may be sought. Example: when a paper-and-pencil communicative test of grammar has been constructed, its validity can be examined by seeking empirical evidence outside the test itself that supports the accuracy of the test being developed. There are two kinds of empirical validity: concurrent validity and predictive validity.

• Concurrent validity: how well the test or instrument estimates current performance on some valued measure other than the test itself.

• Predictive validity: how well a test or instrument predicts future performance or behavior on some valued measure other than the test itself.

4. Construct Validity

Construct validity is 'the extent to which test performance can be interpreted in terms of one or more psychological constructs' (Gronlund, 1985). In construct validity, we believe certain things exist but can only be probed through indicators as reflected in test scores. It does not deal with how we construct a test of validity.

5. Wash-back Validity

Also known as backwash, this is the influence of a test on teaching and learning. Pearson (1988): "Public examinations influence the attitudes, behaviors, and motivation of teachers, learners, and parents, and because examinations often come at the end of a course, this influence is seen working in a backward direction, hence the term, wash-back". Hughes (1989) identifies two major types of wash-back: positive wash-back (beneficial) and negative wash-back (harmful).

1. Positive Wash-back

Positive Wash-back deals with both Micro-level and Macro-level

• Micro-Level (Classroom Settings)

  • Tests induce teachers to cover their subjects more thoroughly.
  • Tests make students work harder (Alderson and Wall, 1993).
  • Tests encourage positive teaching-learning processes (Pearson, 1988).

 

• Macro-Level (Educational/Societal System)

  • Decision makers (govt.) use the power of high-stakes testing to achieve the goals of teaching and learning, such as the introduction of new textbooks and new curricula (Shohamy, 1992; Wall & Alderson, 1993; Cheng, 2005).

2. Negative Wash-back

Negative Wash-back deals with both Macro-Level and Micro-Level, too.

• Micro-Level (Classroom Settings)

  • Tests encourage teachers to adopt a "teaching to the test" curriculum (Shohamy, 1992).
  • Tests bring anxiety to both teachers and students and distort their performance (Shohamy, 1996).
  • Tests drive students to learn only the discrete points of knowledge that are tested (Madaus, 1988).
  • Tests lead students to form negative judgments toward tests and alter their learning motivation (Wiseman, 1961).

• Macro-Level (Educational/Societal System)

  • Decision makers overwhelmingly use tests to promote their political agendas and seize influence and control of educational systems (Shohamy, 1996)

APPROACHES IN LANGUAGE TESTING

 

Ratna Komala Dewi

Nuke Sari Nastiti

1. Discrete-point Testing Approach

Discrete-point tests are constructed on the assumption that language can be divided into its component parts, and that those parts can be tested successfully. The components are the skills of listening, speaking, reading, and writing, and the language units of phonology, morphology, lexicon, and syntax. Discrete-point tests aim to achieve high reliability by testing a large number of discrete items, but each question tests only one linguistic point.

2. Integrative Testing Approach

This approach involves the testing of language in context and is thus concerned primarily with meaning and the total communicative effect of discourse. It holds that communicative competence is so global that it requires the integration of all linguistic abilities. According to Oller (1983), if discrete items take language skills apart, integrative tests put them back together; whereas discrete items attempt to test knowledge of language one bit at a time, integrative tests attempt to assess a learner's capacity to use many bits all at the same time.

The fact that discrete-point and integrative testing only provided a measure of the candidate's competence, rather than of the candidate's performance, brought about the need for communicative language testing (Weir, 1990). By the mid-1980s, the language testing field had abandoned arguments about unitary competence and had begun to focus on designing communicative language tests (Brown, 2004).

3. Communicative Testing Approach

The communicative testing approach lays more emphasis on the notions and functions, such as agreeing, persuading, or inviting, that language conveys in communication. It is used to measure language learners' ability to use the target language in authentic situations. The approach holds that a student is successful in learning the target language if she or he can communicate, using knowledge and skills, through authentic listening, speaking, reading, and writing. Communicative language tests therefore have to reflect such situations as accurately as possible. An example of a communicative language test is role play: the teacher asks students to act out situations such as visiting a doctor or shopping at a market.

The principles of the communicative language testing approach can be described as follows (Anon, 1990):

  • Tasks in the test should resemble, as far as possible, those found in real life in terms of communicative use of language.
  • Test items should be contextualized.
  • Test items should address a definite audience, with a purposeful communicative intent (goal) envisioned.
  • Test instructions and scoring plans should focus on effective communication of meaning rather than on grammatical accuracy.

4. Performance testing approach

Any assessment can be considered a type of performance when a student is placed in some context and asked to show what they know or can do in that context. Performance-based assessment holds that students learn best when they are given a chance to perform and show what they know: following their own plan, collecting data, inferring patterns, drawing conclusions, taking a stand, or delivering a presentation. According to Brown (2004), in developing performance-based assessment, teachers should consider the following principles:

  • State the overall goal of the performance
  • Specify the objectives (criteria) of the performance in details
  • Prepare students for performance in stepwise progressions
  • Use a reliable evaluation form, checklist or rating sheet
  • Treat performances as opportunities for giving feedback and provide that feedback systematically

  • If possible, utilize self- and peer-assessments judiciously (wisely and carefully)

Strengths and Weaknesses of Language Testing Approach

1. Discrete-point Testing Approach

Strength

  • Tests of this approach can cover a wide range of material.
  • The test allows quantification of the students' responses.
  • In terms of scoring, the test is also reliable because of its objectivity; scoring is efficient and can even be performed by machine.

Weaknesses

  • Constructing discrete-point test items is potentially energy- and time-consuming.
  • The tests do not include the social context in which verbal communication normally takes place.
  • Success on the test does not readily imply the test taker's ability to communicate in real-life circumstances.

2. Integrative Testing Approach

Strength

  • The attention to meaning and the total communicative effect of discourse is very useful for pupils in testing.
  • This approach can view pupils' proficiency globally.
  • Tests such as dictation, writing, and cloze tests are relatively cheap and easy to construct.

Weaknesses

  • Although measuring integrated skills is valuable, teachers sometimes need to measure skills separately according to particular needs, such as writing only or speaking only.
  • The scoring is neither efficient nor highly reliable.

3. Communicative Testing Approach

Strength

  • The tests are more realistic for evaluating the students' language use, as students take on roles as though they were communicating in the real world.
  • It increases students' motivation, since they can see the real-world use of the language they learn in class.

Weaknesses

  • Not efficient (time- and energy-consuming).
  • The problem of extrapolation (Weir, 1990): we cannot guarantee that students who successfully accomplish the task in class will also be successful in real-life communication.

4. Performance Testing Approach

Strengths

  • Increases learning motivation (students tend to be more motivated and involved when they are allowed to perform according to their own plan, collect data, infer a pattern, draw conclusions, take a stand, or deliver a presentation).
  • Meaningful (it requires students to show what they can do through a project, performance, or observation, which gives them a richer learning experience than a paper-and-pencil test).
  • Authentic (since the materials and topics used in class are authentic, students can see how what they learn relates to their daily lives).
  • Challenges students' higher-order thinking (to prepare their best performance, students analyze the problem more deeply and find many learning sources by themselves).

Weaknesses

  • Time-consuming (for students: they need to prepare the performance, e.g., download information from the Internet or prepare costumes and props for a role play; for teachers: they need to provide guidance at every stage, for example, when assessing students' essay portfolios, the teacher must check every single paper week by week and check each revision again).
  • Expensive (students may need extra money to prepare the performance, such as costumes for a role play).
  • Challenges the teacher to match performance assessment to classroom goals and learning objectives.

KINDS OF TESTS

Ratna Komala Dewi

Nuke Sari Nastiti

 

A. Based on Purposes

There are many kinds of tests; each has a specific purpose and a particular criterion to measure. This paper explains five kinds of tests classified by purpose: proficiency tests, diagnostic tests, placement tests, achievement tests, and language aptitude tests.

1. Proficiency Test

The purpose of a proficiency test is to test global competence in a language: it tests overall ability regardless of any training the test takers have previously had in the language. Proficiency tests have traditionally consisted of standardized multiple-choice items on grammar, vocabulary, reading comprehension, and listening comprehension. One standardized proficiency test is the TOEFL.

2. Diagnostic Test

The purpose of a diagnostic test is to diagnose specific aspects of a language. These tests offer a checklist of features the teacher can use to discover difficulties. Diagnostic tests should elicit information on what students need to work on in the future; therefore the test will typically offer more detailed, subcategorized information on the learner. For example, a writing diagnostic test would first elicit a writing sample from the students; the teacher would then examine the organization, content, spelling, grammar, and vocabulary of their writing. Based on that identification, the teacher would know which of the students' needs deserve special focus.

3. Placement Test

The purpose of a placement test is to place a student into a particular level or section of a language curriculum or school. It usually includes a sampling of the material to be covered in the various courses in a curriculum. A student's performance on the test should indicate the point at which the student will find material neither too easy nor too difficult. Placement tests come in many varieties: assessing comprehension and production, responding through written and oral performance, multiple-choice, and gap-filling formats. One example of a placement test is the English as a Second Language Placement Test (ESLPT) at San Francisco State University.

4. Achievement Test

The purpose of achievement tests is to determine whether course objectives have been met, with skills acquired, by the end of a period of instruction. Achievement tests should be limited to the particular material addressed in a curriculum within a particular time frame. Achievement tests are summative because they are administered at the end of a unit or term of study; they analyze the extent to which students have acquired language that has already been taught.

 

 

5. Language Aptitude Test

The purpose of a language aptitude test is to predict a person's success prior to exposure to the foreign language. According to John Carroll and Stanley Sapon (the authors of the MLAT), language aptitude tests do not indicate whether or not an individual can learn a foreign language; rather, they indicate how well an individual can learn a foreign language in a given amount of time and under given conditions. In other words, this test is used to determine how quickly and easily a learner will learn a language in a language course or training program. Two standardized aptitude tests have been used in the United States:

  1. The Modern Language Aptitude Test (MLAT)
  2. The Pimsleur Language Aptitude Battery (PLAB)

 

B. Based on Response

There are two kinds of tests based on response: subjective tests and objective tests.

1. Subjective Test

A subjective test is a test in which the learner's ability or performance is judged by the examiner's opinion and judgment. Examples of subjective test formats are essays and short answers.

2. Objective Test

An objective test is a test in which the learner's ability or performance is measured against a specific set of answers: each item has only two possible outcomes, right or wrong. In other words, the score depends solely on the number of right answers. Types of objective tests include multiple-choice tests, true-or-false tests, matching, and problem-based questions.

Advantages and Disadvantages of Commonly Used Types of Objective Test

True or False
  • Advantages: Many items can be administered in a relatively short time. Moderately easy to write and easily scored.
  • Disadvantages: Limited primarily to testing knowledge of information. Easy to guess correctly on many items, even if the material has not been mastered.

Multiple Choice
  • Advantages: Can be used to assess a broad range of content in a brief period. Skillfully written items can measure higher-order cognitive skills. Can be scored quickly.
  • Disadvantages: Difficult and time-consuming to write good items. Possible to assess higher-order cognitive skills, but most items assess only knowledge. Some correct answers can be guessed.

Matching
  • Advantages: Items can be written quickly. A broad range of content can be assessed. Scoring can be done efficiently.
  • Disadvantages: Higher-order cognitive skills are difficult to assess.

 

Advantages and Disadvantages of Commonly Used Types of Subjective Test

Short Answer
  • Advantages: Many items can be administered in a brief amount of time. Relatively efficient to score. Moderately easy to write.
  • Disadvantages: Difficult to identify defensible criteria for correct answers. Limited to questions that can be answered or completed in a few words.

Essay
  • Advantages: Can be used to measure higher-order cognitive skills. Easy to write questions. Difficult for the respondent to get the correct answer by guessing.
  • Disadvantages: Time-consuming to administer and score. Difficult to identify reliable criteria for scoring. Only a limited range of content can be sampled during any one testing period.

 

 

C. Based on Orientation and The Way to Test

Language testing is divided into two types based on orientation: language competence tests and language performance tests. A language competence test involves components of language such as vocabulary, grammar, and pronunciation, while a performance test involves the basic skills in English: writing, speaking, listening, and reading. Language testing is also divided into two types based on the way of testing: direct testing and indirect testing. In direct testing, students' competence is elicited through one of the basic skills (speaking, writing, listening, or reading); in indirect testing, it is elicited without using the basic skills.

From the explanation above, language testing can be divided into four types based on orientation and the way of testing: direct competence tests, indirect competence tests, direct performance tests, and indirect performance tests.

                     Direct   Indirect
Competence/system    I        II
Performance          III      IV

1. Direct Competence Tests

A direct competence test focuses on measuring students' knowledge of a language component, such as grammar or vocabulary, where the elicitation uses one of the basic skills: speaking, listening, reading, or writing. For example, a teacher who wants to assess students' knowledge of grammar asks them to write a letter and examines the grammar in it.

 

 

2. Indirect Competence Test

An indirect competence test focuses on measuring students' knowledge of a language component, such as grammar or vocabulary, where the elicitation does not use one of the basic skills. The elicitation uses other means, such as multiple choice. For example, a teacher who wants to assess students' knowledge of grammar gives them a multiple-choice test.

3. Direct Performance Test

A direct performance test focuses on measuring students' skill in reading, writing, speaking, or listening, where the elicitation happens through direct communication. For example, a teacher who wants to assess students' writing skill asks them to write a letter or a short story.

4. Indirect Performance Test

An indirect performance test focuses on measuring students' skill in reading, writing, speaking, or listening, where the elicitation does not use the skill directly. For example, to measure students' listening skill, the teacher gives them some pictures and asks them to arrange the pictures into the correct order based on a story they listen to.

 

D. Based on Score Interpretation

There are two kinds of tests based on score interpretation: norm-referenced tests and criterion-referenced tests.

1. Norm-Referenced Test

Norm-referenced tests are designed to highlight achievement differences between and among students, producing a dependable rank order of students across a continuum of achievement from high achievers to low achievers (Stiggins, 1994). School systems might want to classify students in this way so that they can be properly placed in remedial or gifted programs. The content of norm-referenced tests is selected according to how well it ranks students from high achievers to low; in other words, the content is chosen by how well it discriminates among students. A student's performance on a norm-referenced test is interpreted in relation to the performance of a large group of similar students who took the test when it was first normed. For example, if a student receives a percentile-rank score of 34 on the total test, this means that he or she performed as well as or better than 34% of the students in the norm group. This type of information can be useful for deciding whether a student needs remedial assistance or is a candidate for a gifted program. However, the score gives little information about what the student actually knows or can do.
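A minimal Python sketch of the percentile-rank interpretation, locating a hypothetical student's score within a simulated norm group:

```python
# A minimal sketch of a percentile-rank interpretation: the share of a
# (simulated, hypothetical) norm group scoring at or below the student.
import numpy as np

norm_group = np.random.default_rng(0).normal(500, 100, 1000)  # hypothetical norms
student_score = 460

percentile = (norm_group <= student_score).mean() * 100
print(f"percentile rank = {percentile:.0f}")
```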

2. Criterion-Referenced Test

Criterion-referenced tests determine what test takers can do and what they know, not how they compare to others (Anastasi, 1988). Criterion-referenced tests report how well students are doing relative to a predetermined performance level on a specified set of educational goals or outcomes included in the school, district, or state curriculum. Educators may choose a criterion-referenced test when they wish to see how well students have learned the knowledge and skills they are expected to have mastered. This information may be used as one piece of evidence of how well the student is learning the desired curriculum and how well the school is teaching that curriculum. The content of a criterion-referenced test is determined by how well it matches the learning outcomes deemed most important; in other words, the content is selected on the basis of its significance in the curriculum. Criterion-referenced tests give detailed information about how well a student has performed on each of the educational goals or outcomes included in the test.

ITEM ANALYSIS (GROUP 7)

   Lies Nureni & Lukman Chamdani

Item analysis is a process that examines students' responses to individual test items (questions) in order to assess the quality of those items and of the test as a whole.

Purposes:

To improve test items which will be used again in the next test.

To identify unfair or biased items.

To avoid tests that are too difficult or too easy.

To increase instructors’ skills in test construction.

To identify specific areas of course content which need greater emphasis or clarity.

Methods of item analysis:

Qualitative analysis (conceptual): done by experts. Expert validation → get feedback → revise.

Quantitative analysis (empirical): tried out on the students.

After that, we can obtain information about:

  1. Level Difficulty Index

It is important for a test to contain items of various difficulty levels in order to distinguish between students who are not prepared at all, students who are fairly prepared, and students who are well prepared.

To compute the level of difficulty, we use this formula:

p = C/T

p = level of difficulty

C = the number of students who answer item X correctly

T = the number of students who answer item X

The highest score for p is 1.00 and the lowest is 0.

The difficulty of good items ranges from 0.3 to 0.7; overall, 0–0.3 is difficult, 0.31–0.6 moderate, and 0.61–0.7 easy.

  2. Discrimination Index

The discrimination index shows whether an item discriminates between students who have mastered a topic and those who have not. The formula:

D = Pu – Pl

D = discrimination index

Pu = level difficulty of item X from the upper group

Pl = level difficulty of item X from the lower group

  3. Item Validity / Correlation Index
  • Point-biserial technique
  • Biserial technique
  • Phi technique

An item is considered valid when the correlation coefficient is greater than .2.

  4. Effectiveness of Distractors

Multiple-choice tests have one question and several answer options. Among the options, only one answer is correct and the others are wrong. Those wrong answers in a multiple-choice test are called distractors. A distractor is considered effective when enough students choose it. According to Fernandez (1984), a distractor can be called a good distractor when at least 2% of test takers choose it.

SCORING, GRADING, AND TEST SCORE INTERPRETATION

  Lies Nureni & Lukman Chamdani

A. SCORING

Scoring is a process of utilizing a number to represent the responses made by the test takers. There are two types of scoring:

1. Based on the test takers' responses

a. Dichotomous

It entails viewing and treating the response as one of two distinct categories. The numbers utilized in this kind of scoring are zero (0) and one (1), and the test taker's performance is put into one of those two categories.

e.g., multiple-choice and true/false items

b. Continuous

It views and treats the test takers' responses as being graded in nature: the responses are considered to have gradations or degrees. Responses are scored as 0, 1, 2, 3, 4, 5, or on a 0–100 scale.

e.g. scoring for writing and speaking skills

 

2. Based on the methodologies

a. Holistic

Holistic scoring considers the test taker's response as a whole rather than as consisting of fragmented parts. Scoring is performed on the basis of the rater's general overall impression of the test taker's performance on the test.

b. Primary Trait

Primary-trait scoring focuses on one specific type of feature or trait that the test takers need to demonstrate. The key to primary-trait scoring is the specificity of the discourse to be exhibited by the test takers; thus there are criteria like organization of ideas, sentence structure, etc.

c. Analytic

Analytic scoring emphasizes individual points or components of the test taker's response. In an analytic scoring plan, the linguistic and non-linguistic features considered important are commonly evaluated separately as individual components. For example, to rate students' performance in speaking, several essential features are considered: grammar, vocabulary, comprehension, fluency, pronunciation, and task.
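A minimal Python sketch of analytic scoring for the speaking example above; the rubric values, the 1–5 scale, and the equal weighting are all hypothetical:

```python
# A minimal sketch of analytic scoring: each component is rated separately,
# then combined. The components follow the speaking example above; the 1-5
# scale, the ratings, and the equal weighting are hypothetical.
ratings = {
    "grammar": 4, "vocabulary": 3, "comprehension": 4,
    "fluency": 3, "pronunciation": 4, "task": 5,
}

analytic_score = sum(ratings.values()) / len(ratings)  # simple average
print(f"analytic score = {analytic_score:.2f} out of 5")
```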

 

B. GRADING

Grading is setting up a weighting system of criteria for determining a final grade in a course. There are four criteria for grading:

  1. It is essential for all components of grading to be consistent with an institutional philosophy and/or regulation.
  2. All of the components of a final grade need to be explicitly stated in writing to students at the beginning of a term of study, with a designation of percentages or weighting figures for each component (see the sketch after this list).
  3. If a grading system includes items such as improvement or motivation, it is important for the rater to recognize their subjectivity.
  4. Consider allocating relatively small weight to items such as oral participation, punctuality, and attendance, so that the grade primarily reflects achievement.
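A minimal Python sketch of such a weighting system; the components, weights, and scores are hypothetical:

```python
# A minimal sketch of a weighted grading system; the components, weights,
# and scores below are hypothetical. The weights sum to 1.0.
weights = {"midterm": 0.3, "final": 0.4, "homework": 0.2, "participation": 0.1}
scores = {"midterm": 82, "final": 75, "homework": 90, "participation": 95}

final_grade = sum(weights[c] * scores[c] for c in weights)
print(f"final grade = {final_grade:.1f}")  # 82*0.3 + 75*0.4 + 90*0.2 + 95*0.1
```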

 

C. TEST SCORE INTERPRETATION

Understanding score interpretation enables us to understand how a student's performance on a test compares to that of other students. Scores are typically described as follows:

  1. Average – These scores are in or around the 50th percentile. When a group of students is assessed, about 68% of them will fall within the average range. "Average" is another way of saying "typical for most children".
  2. Above average – These scores fall above the average range. Those at about the 85th percentile are considered high average. At the 98th percentile, students may be considered gifted in some programs.
  3. Below average – These scores fall below the average range.
  4. Borderline – These scores approach the lower 5th percentile and are suggestive of learning problems.
  5. Low – These scores fall below the 5th percentile and are suggestive of significant learning problems.