Wednesday, August 28, 2013

What are Cut Scores and How Do They Impact My Students?

By Saler Axel, RME Research Assistant

Many researchers and practitioners believe that tests are used for accountability in education now more than ever before. The media often report the percentage of students placed into particular performance standards on high-stakes tests, and the resulting impact on students, schools, districts, and states can be considerable. We frequently hear how consequential high-stakes tests can be, but what about the assessments developed by teachers?

Teachers are the most frequent users and producers of tests (Nunnally, 1964). Teachers’ assessments account for at least 75 percent of all educational measures (Nunnally, 1964). They are responsible for testing students individually and interpreting student-related measurement data (Nunnally, 1964; Torgerson & Adams, 1954). If created well, classroom tests can be more useful than a standardized exam, particularly as a measure of content (Worthen et al., 1993). This is great news for those of us who want to formatively gauge students’ understanding of classroom content in a well-constructed and accurate manner! (See Beth Richardson’s blog on test development guidance.)

Beth’s blog highlights the components of a well-developed assessment. After reading it, your next consideration might be: What are cut scores and performance standards? How do I interpret my students’ test scores? What do these assessments tell me about my students? And ultimately, how do these test scores impact my students?

What are cut scores and performance standards?

Examinees are often classified in a pass-fail or “mastery-proficiency-competency” (Berk, 1980) manner. You have likely used these categories before in your own teaching. Researchers call these categories performance standards. Cut scores are the points between each grouping. Performance standards are defined as qualitative distinctions between adjacent levels of what test takers know and what they can do at specified levels (Kane, 2001). Cut scores, defined as quantitative points on a performance continuum, serve as operational versions of the corresponding performance standard (Kane, 2001). When you combine the two concepts, the cut score is a statement of how much knowledge of the content domain an examinee needs to demonstrate to fall within a particular performance standard (Haertel, 1985; Jorgensen & McBee, 2003).

How do I interpret my students’ test scores?

When you administer an assessment to your students, you are testing their knowledge of a particular construct. For example, you might assess your first-grade students’ knowledge of addition properties, such as commutativity and associativity, and their ability to use those properties to add whole numbers. Imagine that this assessment includes three performance standards and two cut scores.

Students who score below cut score 1 are considered competent users of addition properties. Students who score between cut score 1 and cut score 2 are considered proficient users of addition properties. Students who score above cut score 2 are considered to have mastered addition properties.
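The classification above is just a thresholding rule, and it can be sketched in a few lines of code. This is a hypothetical illustration: the cut-score values (10 and 16) are invented for the example, and the handling of a score that lands exactly on a cut score is a convention you would need to decide for your own test.

```python
# Assumed cut scores for illustration only -- a real test's cut scores
# come from a formal standard-setting process.
CUT_SCORE_1 = 10
CUT_SCORE_2 = 16

def performance_standard(score):
    """Map a raw test score to one of three performance standards.

    Convention used here: a score exactly at a cut score falls into
    the higher category (at-or-above cut score 1 counts as proficient).
    """
    if score < CUT_SCORE_1:
        return "competent"
    elif score <= CUT_SCORE_2:
        return "proficient"
    else:
        return "mastery"

# Classify a few example scores
for score in (7, 12, 19):
    print(score, performance_standard(score))
```

The cut scores themselves (10 and 16) are the quantitative points on the performance continuum; the returned labels are the corresponding performance standards.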

What do these assessments tell me about my students?

Performance standards are similar to rubrics. They describe what concepts a student must understand, and what knowledge and skills the student must demonstrate, to place into a particular performance standard and receive a certain test score. Performance standards list characteristics of students’ skills. Using our prior example, a competent user of addition properties may be able to apply the properties when prompted but require scaffolded guidance to implement them. A proficient user may need prompting, but once reminded, needs no further assistance. A student who has mastered addition properties can apply them without prompting or scaffolded guidance.

How do these test scores impact my students?

If a student’s test score is inaccurately interpreted, the student may place into a performance standard that does not reflect his or her true knowledge and skill level. This can cause unintended consequences, which may include inaccurate course placement or even denied access to special instruction (AERA, APA, & NCME, 1999). As a result, when you take time to create a test, make sure that you have considered all intended, and possible unintended, consequences that may arise from your students placing into performance standards that do not accurately reflect their true knowledge and skills.

Questions for consideration

Reflect on a test that you have recently administered in your classroom.
  • Did you take the time to really consider the performance standard categories and what their impact on your students might be? 
  • How can you apply your new (or enhanced!) knowledge of performance standards to the next test you administer in your classroom? 
  • What types of things will you do to inform your instruction after calculating your students’ scores?
American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (1999). Standards for educational and psychological testing. Washington, DC: American Educational Research Association. 

Berk, R. A. (1980). Introduction. In R. A. Berk (Ed.), Criterion-Referenced Measurement: The State of the Art (pp. 3-9). Baltimore, MD: The Johns Hopkins University Press. 

Haertel, E. (1985). Construct validity and criterion-referenced testing. Review of Educational Research, 55(1), 23-46. 

Jorgensen, M. A., & McBee, M. (2003). The new NRT model. Retrieved from 

Kane, M. T. (2001). So much remains the same: Conception and status of validation in setting standards. In G. J. Cizek (Ed.), Setting performance standards: Concepts, methods, and perspectives (pp. 53-88). Mahwah, NJ: Lawrence Erlbaum Associates. 

Nunnally, J. C. (1964). Educational measurement and evaluation. New York, NY: McGraw-Hill Book Company. 

Torgerson, T. L. & Adams, G. S. (1954). Measurement and evaluation for the elementary-school teacher. New York, NY: The Dryden Press. 

Worthen, B. R., Borg, W. R., & White, K. R. (1993). Measurement and evaluation in the schools. New York, NY: Longman Publishing Group.

Thursday, August 22, 2013

How to Write a Smart Test

By Beth Richardson, RME High School Mathematics Coordinator

My career in education began as a high school math teacher. Throughout my teaching, I wrote countless math “questions” to check my students’ understanding, from daily bell-ringers to full-length tests. However, it wasn’t until I became immersed in the world of assessments that I learned some important components of a well-written test. First of all, the “questions” that make up a test are commonly referred to as items by researchers in the field of assessment, which is what I’ll call them from here on.

Test Math Knowledge in Different Ways

There are many different levels at which the brain engages with mathematical concepts. The book Adding It Up: Helping Children Learn Mathematics (2001) identifies five specific strands of thinking that together determine a person’s proficiency in math. Here’s a brief explanation of each, along with example items that assess the same skill (slope) at the different proficiency levels:
  • Conceptual Understanding – comprehension of mathematical concepts, operations, and relations
  • Procedural Fluency – carrying out procedures flexibly, accurately, efficiently, and appropriately
  • Strategic Competence – ability to formulate, represent, and solve mathematical problems 
  • Adaptive Reasoning – capacity for logical thought, reflection, explanation, and justification
  • Productive Disposition – habitual inclination to see math as sensible, useful, and worthwhile, coupled with a belief in one’s own efficacy. 
(Sample items: Conceptual, Procedural, Strategic, Adaptive)

As teachers, we can only test the first four of these in a traditional test setting. However, productive disposition is something you can learn about each of your students as you interact with them daily.

Multiple-Choice Tests

Basic Multiple-choice Item Components:
  • Skill: Comes from TEKS, district curriculum, etc.
  • Mathematical Proficiency Level: Procedural, Conceptual, Strategic, or Adaptive
  • Stem (Text/Graphic): Make sure the text and graphics you use are purposeful and relevant to the underlying mathematical skill/concept being assessed
  • 4 Response Options: 1 correct response and 3 well-thought-out distractors (no throwaway distractors!)
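The components listed above can be sketched as a simple data structure. This is a minimal illustration, not part of the original post: the field names and the sample slope item are invented for the example.

```python
from dataclasses import dataclass

@dataclass
class MultipleChoiceItem:
    skill: str               # e.g., from TEKS or a district curriculum
    proficiency_level: str   # Conceptual, Procedural, Strategic, or Adaptive
    stem: str                # question text; any graphic should be purposeful
    correct: str             # the one correct response
    distractors: list        # three plausible distractors, each with a rationale

# Hypothetical procedural item on slope
item = MultipleChoiceItem(
    skill="Determine the slope of a line from two points",
    proficiency_level="Procedural",
    stem="What is the slope of the line through (2, 3) and (6, 11)?",
    correct="2",
    distractors=["1/2",   # inverted rise over run
                 "-2",    # sign error when subtracting coordinates
                 "8"],    # used the change in y only
)
```

Note that each distractor corresponds to a common mistake, which is the subject of the next section.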
Write Plausible Distractors

For multiple-choice tests, the responses you provide are just as important as the question you ask.

Take the time to write distractors that are based on students’ common mistakes and misconceptions. To help ensure the distractors are plausible, write a rationale for each one. Also, avoid give-away distractors that do not relate to the item. Here’s an example of a spreadsheet that can be used when writing a test. The spreadsheet can easily be copied and modified to create multiple forms of the same test: the specific details in the stem change, but the same distractor rationales carry over. This lets you analyze the knowledge of all students even across different test forms. You can also use this spreadsheet for free-response items (ex: items 11 and 12).
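One way to picture the spreadsheet idea is as rows you could export to CSV. The item numbers, distractors, and rationales below are invented examples; the point is that the rationales stay fixed while the numbers in the stem change across forms, so the same error analysis applies to every form.

```python
import csv, io

# Two forms of the same hypothetical slope item: different numbers in the
# stem, identical distractor rationales.
rows = [
    {"item": 1, "form": "A",
     "stem": "Slope through (2, 3) and (6, 11)?", "correct": "2",
     "distractor_B": "1/2", "rationale_B": "inverted rise over run",
     "distractor_C": "-2",  "rationale_C": "sign error when subtracting",
     "distractor_D": "8",   "rationale_D": "used the change in y only"},
    {"item": 1, "form": "B",
     "stem": "Slope through (1, 2) and (4, 11)?", "correct": "3",
     "distractor_B": "1/3", "rationale_B": "inverted rise over run",
     "distractor_C": "-3",  "rationale_C": "sign error when subtracting",
     "distractor_D": "9",   "rationale_D": "used the change in y only"},
]

# Write the table out in spreadsheet-friendly CSV form
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=rows[0].keys())
writer.writeheader()
writer.writerows(rows)
print(buf.getvalue())
```

Because the rationale columns match across forms, a student who picks distractor C on either form can be flagged with the same misconception.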

Where to find the most common mistakes your students will make:

1) Your students: 
  • Daily: During class discussion or student activities, take note of how students explain and talk about concepts. 
  • Previous assessments: While grading homework, quizzes, and tests, take note of the most common errors your students make and the misconceptions your students have about particular operations or topics.

2) Research-based resources: IES Practice Guides, Adding It Up, and many more…

Summing it All Up:
When writing any assessment, it is important to include items that test students’ conceptual understanding, procedural fluency, strategic competence, and adaptive reasoning skills because each of these components is equally important in their overall math proficiency. When writing items for multiple-choice tests, make sure to be purposeful in the response options you include.

National Research Council. (2001). Adding it up: Helping children learn mathematics. J. Kilpatrick, J. Swafford, and B. Findell (Eds.). Mathematics Learning Study Committee, Center for Education, Division of Behavioral and Social Sciences and Education. Washington, DC: National Academy Press.

Siegler, R., Carpenter, T., Fennell, F., Geary, D., Lewis, J., Okamoto, Y., Thompson, L., & Wray, J. (2010). Developing effective fractions instruction for kindergarten through 8th grade (NCEE 2010-4039). Washington, DC: National Center for Education Evaluation and Regional Assistance, Institute of Education Sciences, U.S. Department of Education. Retrieved from