Effective Teaching Methods: Criteria for Selecting Tests


The two major criteria for selecting tests are reliability and validity. No matter what type of test one uses, it should be reliable and valid. Reliability means that the test yields similar results when it is repeated over a short period of time or when a different form is used. It refers to the consistency with which a measuring instrument measures whatever it measures. A reliable test can be viewed as consistent, dependable and stable. Validity means that the test actually measures what it is represented as measuring. It refers to the degree to which a measuring instrument measures what it purports to measure.

Test reliability can be improved by the following factors:

  • Increased number of test items: Reliability is higher when the number of items is increased, because the test involves a larger sample of the subject matter covered.
  • Heterogeneity of the learner group: Reliability is higher when test scores are spread over a range of abilities. Measurement errors are smaller, relative to the spread of scores, in a group that is heterogeneous in ability than in one that is homogeneous.
  • Moderate item difficulty: Reliability is higher when the test items are of moderate difficulty because this spreads the scores over a greater range than a test composed of mainly difficult or easy items.
  • Objective scoring: Reliability is greater when a test can be scored objectively. With subjective scoring, the same response can be scored differently on different occasions, even by the same scorer. A machine-scored test is more reliable than a hand-scored test because it is less subject to human error.
  • Limited time: A test in which speed is a factor is more reliable than a test that all students can complete in the time available.
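
The effect of test length on reliability can be illustrated with the Spearman-Brown formula, a standard psychometric estimate that the article itself does not name. The sketch below is only an illustration: the function name and the sample figures are made up, and the formula assumes the added items are comparable in quality to the existing ones.

```python
def spearman_brown(reliability: float, length_factor: float) -> float:
    """Estimate reliability after changing test length by `length_factor`.

    `reliability` is the current reliability coefficient (0 to 1).
    `length_factor` is new length / old length (2 means twice as many items).
    Assumes the new items are comparable to the existing ones.
    """
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    if length_factor <= 0:
        raise ValueError("length_factor must be positive")
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)


# Hypothetical example: a 20-item test with reliability 0.60 is doubled to 40 items.
print(round(spearman_brown(0.60, 2), 2))  # 0.75
```

Under these assumptions, doubling the test raises the estimated reliability from 0.60 to 0.75, while halving it (a factor of 0.5) would lower it.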


Depending on the purpose of the test and the teacher's knowledge of research, the teacher can consider different types of validity.

  • Content validity: When constructing a test for a particular subject, the teacher must check whether the items adequately reflect the specific content of that subject.
  • Curricular validity: A test that reflects the knowledge and skills presented in a particular school's curriculum has curricular validity. In such a test the items adequately sample the content of the curriculum the students have been studying. Many standardized tests have excellent content validity on a nationwide basis.
  • Criterion validity: Criterion validity is the extent to which a particular test correlates with some accepted and valid test or measure of the learners' performance. Example: A test for creativity is given to students. Scores on the test are compared with scores on another test or measure of creativity that is accepted as valid. If scores on the new test correlate highly with scores on the established measure, so that students who score high on one also score high on the other, then the new test is considered to have criterion validity.
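
The comparison described under criterion validity amounts to computing a correlation between two sets of scores. Here is a minimal sketch using the Pearson correlation coefficient, which is one common choice and not necessarily the only one. The function name and the scores are hypothetical and exist only to show the calculation.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of the same length (at least 2 scores)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products and sums of squares of deviations from the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("scores must vary for a correlation to be defined")
    return cov / sqrt(ss_x * ss_y)


# Made-up scores for six students on a new test and an established measure.
new_test = [12, 15, 9, 20, 17, 11]
established = [14, 16, 10, 19, 18, 12]

r = pearson_r(new_test, established)
print(f"correlation between new and established test: {r:.2f}")
```

A coefficient close to 1 would support criterion validity. How high is high enough is a judgment the article leaves open.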


A third criterion for selecting a test is usability. A test should be easy for students to understand, easy to administer and score, within budget limitations if it has to be purchased, suitable to the test conditions (for example, the time available) and appropriate in its degree of difficulty.

A test may be valid in content, but the questions may be so ambiguous or the directions so difficult to follow that a student who understands the material may give the wrong answer.

Factors Affecting Usability

  • Unclear directions: Directions that do not clearly indicate to the learners how to respond to the items, whether it is permissible to guess, and how to record the answers will tend to reduce validity.
  • Reading vocabulary and sentence structure are too difficult: Vocabulary and sentence structure that are too difficult or complicated for the learners taking the test will result in the test measuring reading comprehension and aspects of intelligence rather than the aspects of learner performance it is intended to measure.
  • Poorly constructed test items: Test items that unintentionally provide clues to the answer will tend to measure the learner's alertness in detecting clues as well as the aspects of the learner's performance that the test is intended to measure.
  • Ambiguity: Ambiguous statements in test items contribute to misinterpretation and confusion. Ambiguity sometimes confuses the stronger learners more than the weaker ones, causing the items to discriminate in a negative direction.
  • Test items inappropriate for the outcomes being measured: Attempting to measure understandings, thinking skills and other types of achievement with test forms that are appropriate only for measuring factual knowledge will invalidate the results.
  • Test is too short: A test is only a sample of the many questions that might be asked. If a test is too short to provide a representative sample of the performance we are interested in, validity will suffer accordingly.
  • Improper arrangement of items: Test items are typically arranged in order of difficulty, with the easiest items first. Placing difficult items early in the test may cause the learners to spend too much time on these and prevent them from reaching items they could easily answer.
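
The ambiguity factor above mentions items that discriminate in a negative direction. One way to check for this is an upper-lower discrimination index: the proportion of high scorers who answer an item correctly minus the proportion of low scorers who do. The sketch below is an assumption-laden illustration, not a method the article prescribes. It splits the group into halves by total score (other splits, such as top and bottom 27%, are also used), and the function name and data are hypothetical.

```python
def discrimination_index(item_correct, total_scores):
    """Upper-lower discrimination index for one test item.

    item_correct: list of 1 (correct) or 0 (incorrect), one per learner.
    total_scores: each learner's total test score, in the same order.
    Returns a value from -1 to 1; negative means weaker learners
    answered the item correctly more often than stronger learners.
    """
    if len(item_correct) != len(total_scores) or len(total_scores) < 2:
        raise ValueError("need matching lists with at least 2 learners")
    # Rank learner positions from highest to lowest total score.
    ranked = sorted(range(len(total_scores)),
                    key=lambda i: total_scores[i], reverse=True)
    half = len(ranked) // 2  # middle learner is ignored when the count is odd
    upper, lower = ranked[:half], ranked[-half:]
    p_upper = sum(item_correct[i] for i in upper) / half
    p_lower = sum(item_correct[i] for i in lower) / half
    return p_upper - p_lower


# Made-up data for six learners.
total_scores = [95, 88, 82, 60, 55, 40]
ambiguous_item = [0, 0, 1, 1, 1, 1]  # strongest learners mostly miss it
clear_item = [1, 1, 1, 0, 1, 0]      # strongest learners mostly get it

print(discrimination_index(ambiguous_item, total_scores))  # negative value
print(discrimination_index(clear_item, total_scores))      # positive value
```

An item with a negative index is a candidate for rewording or removal, since it works against the learners the test is meant to identify.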

