Questions Worth Asking: Research-Based Best Practices for Writing Multiple Choice Questions
By Hannah Jackson
Why do we still use multiple choice questions, anyway?
When you think of a test, what is the first thing that comes to mind?
A. A written essay question
B. A group project
C. A series of multiple choice questions
D. An oral exam
While all of these options are valid assessments (and I am quick to advocate for assessment that goes beyond a paper-and-pencil exam), you likely chose C. The multiple choice exam has been around for so long that your middle school science teacher used it, your kids take them in school today, and you probably assigned one on Isidore this term.
So, what’s the deal? Are multiple choice questions an inferior testing option? Well, it’s not that simple. Poorly written, they can measure only lower-level thinking skills and undermine the validity and reliability of your assessment data. Yet when written with research-supported guidelines, they can be a wonderful way to assess students!
The Benefits of Using Multiple Choice
- Multiple choice assessments are simple to administer, complete, and grade. Really, in the academic world we live in now, why wouldn’t you assess with something that saves you precious grading time?
- When done correctly, multiple choice assessment can test most higher-order thinking skills.
- Multiple choice does not discriminate against students with lower writing proficiency or lower English language proficiency.
- Multiple choice questions carry a certain perceived grading objectivity for students. With full acknowledgment that no assessment will ever be truly objective, students may feel this type of question is fairer.
“To call such a test ‘objective’ does not, however, mean that MCQ tests and the like assess candidates ‘objectively’, that is, without any prejudice or bias. Indeed, such a thing is not possible. No assessment method can be objective; all assessment methods are subjective to a greater or lesser extent.” (Scharf & Baldwin, 2007)
The most important aspect of any assessment is the extent to which the collected data is reliable and valid. Reliability refers to the consistency and dependability of the results. Validity refers to the test measuring what it is actually intended to measure. We’ll look at how to make multiple choice tests valid and reliable in the best practices a bit further along.
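Reliability can even be quantified. As an aside (this statistic is not discussed in the post itself), here is a minimal Python sketch of the Kuder-Richardson 20 coefficient, one common internal-consistency estimate for tests whose items are scored right or wrong; the 0/1 data layout and function name are illustrative assumptions.

```python
# A minimal sketch (an aside, not from the post) of Kuder-Richardson 20,
# a common internal-consistency estimate of reliability for tests whose
# items are scored right (1) or wrong (0). Assumes nonzero score variance.
def kr20(responses):
    """responses: one list per student, with a 0/1 score per question."""
    n_students, n_items = len(responses), len(responses[0])
    totals = [sum(student) for student in responses]
    mean = sum(totals) / n_students
    variance = sum((t - mean) ** 2 for t in totals) / n_students
    # p = share of students answering item i correctly; q = 1 - p
    pq = 0.0
    for i in range(n_items):
        p = sum(s[i] for s in responses) / n_students
        pq += p * (1 - p)
    return (n_items / (n_items - 1)) * (1 - pq / variance)

# Three students, four items; values near 1 suggest consistent results.
print(kr20([[1, 1, 0, 1], [1, 0, 0, 1], [0, 0, 0, 1]]))
```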
Anatomy of a Question
The parts of a multiple choice question each have a name. First, there is the stem: the question or prompt at the top. Next come the alternatives, generally three to five possible answers. The correct answer is the key, and the incorrect options are distractors. Some questions also have a lead-in, an informative or scenario-based prompt that precedes the stem.
Best Practices: Stems
- Specificity: The name of the game when writing your test questions is specificity. Questions should be specific, focused, and free of details extraneous to the content being tested. This is how we ensure validity. Ask yourself: does this question ask my students to use the skill required by the student learning objective? Or does it include extra information that really isn’t congruent with your learning objectives?
- Complete Questions: Write your question in question format; skip the fill-in-the-blank. Some studies suggest that a complete question format reduces cognitive load.
No: Harry Styles was born in ______.
Yes: In what year was Harry Styles born?
You want to avoid writing test questions that turn into a reading test. Unless you are testing students on reading comprehension, reading ability is not tied to your learning objective, and you shouldn’t be testing for it. Make questions clear, concise, and simple. The concept of “irrelevant difficulty” comes up often in discussions of assessment validity.
“‘Irrelevant difficulty’, as its name suggests, is used to describe a question which is made difficult for reasons that are unrelated to the trait that is the focus of the assessment. It is therefore also a threat to the validity of the assessment” (Coughlin and Featherstone, 2017).
- Be Positive: Another way to reduce cognitive load is to state questions in the positive. In the field or the workplace, we identify what we need rather than eliminating what is not true or not needed, and your questions should do the same. As Coughlin and Featherstone put it in their guide to writing test questions in the medical field, “[Negatively stated] questions do not mimic typical reasoning in the workplace: the normal approach is to look for the best, not the worst outcome for patients.”
Yes: Mary is studying animal cells and finds some dysfunction with the mitotic spindle assembly. Which organelle is related to her observations?
No: Which of the following functions does the centrosome not perform?
You will notice that the “good” example in the previous pair also asks students to apply their knowledge, while the poor question requires simple information recall.
- Write for Higher-Order Thinking Skills: While writing multiple choice questions, the use of situations, scenarios, or lead-ins can lift your question from knowledge recall (In what year was Harry Styles born?) to application, analysis, and evaluation of information. If you have painstakingly written your learning objectives to align with a Bloom’s Taxonomy higher-order thinking skill, then your test questions should assess students at the same order of thinking. Asking a student to recall specifics of a historical event does not tell you whether they can evaluate the cultural impact of that event, for example. In this case, a scenario-based question may prompt students to connect and analyze the information before selecting an answer.
“A stem that presents a problem that requires application of course principles, analysis of a problem, or evaluation of alternatives is focused on higher-order thinking and thus tests students’ ability to do such thinking” (Brame, 2013).
- Align Content to Learning Objectives: To ensure that the data really evaluates your student learning objectives, questions should be in line with the syllabus, the topic of your module, and the content you asked students to learn. Use your learning objectives to guide your way through the course, and come back to them as you write each and every assessment question.
Best Practices: Alternatives
When it comes to your alternatives, the reliability and validity (I said it again) of your test depend on how you write and format the choices.
- Keep Them Homogeneous: All alternatives should be homogeneous in nature and plausible. Ensure that they are about the same length and include the same level of detail; no one alternative should stand out as longer or more detailed than the others. Test-wise students who do not know the answer may choose the option with the most detail, which is often the correct one. Also, do not repeat specific language between your stem and your alternatives.
- All or None of the Above: Research recommends eliminating “All of the above” and “None of the above” as options. In a four-option question, if a student knows that one alternative is incorrect, they also know “All of the above” is incorrect, leaving them a fifty-fifty chance of guessing the correct choice. In turn, your assessment data for that question is no longer reliable, yet you are none the wiser as to whether the student guessed or really knew the material!
- No Overlaps: It is also recommended to avoid alternatives that combine two or more pieces of information per item or overlap one another (e.g., “Both A and B”). Students may fall back on a convergence strategy, guessing the alternative containing the terms or ideas that appear most often across the alternatives.
- 3 - 4 Choices: Research has found that three to four alternatives per question give the most reliable results.
- Order: Finally, your alternatives should be randomized, unless they can be listed in logical order. Values, dates, percentages, and the like should be listed in ascending or descending order.
- Evaluate and Update: At the end of a term, collect and review your assessment data. Be ready to evaluate and swap out distractors that are consistently chosen by only a few students or are selected more often than the correct choice. A sketch of this kind of review follows.
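To make that end-of-term review concrete, here is a minimal Python sketch. The data shapes, the function name, and the 5% threshold are illustrative assumptions, not values drawn from the cited research or from Isidore.

```python
from collections import Counter

# A hypothetical end-of-term distractor review: flag distractors that
# almost no one picks, and distractors that outdraw the correct answer.
def review_distractors(answers, key, alternatives, min_share=0.05):
    """answers: {question_id: [choice per student]};
    key: {question_id: correct choice};
    alternatives: {question_id: [all choices offered]}."""
    for qid, submitted in answers.items():
        counts = Counter(submitted)
        total = len(submitted)
        for alt in alternatives[qid]:
            if alt == key[qid]:
                continue  # only distractors are under review
            share = counts[alt] / total
            if share < min_share:
                print(f"Q{qid}: distractor {alt!r} chosen by {share:.0%} of students; consider replacing it")
            elif counts[alt] > counts[key[qid]]:
                print(f"Q{qid}: distractor {alt!r} outdraws the key; recheck the question")

# Example: question 1 was answered A, A, C, A by four students; B went unchosen.
review_distractors({1: ["A", "A", "C", "A"]}, {1: "A"}, {1: ["A", "B", "C"]})
```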
Best Practices: Grading and Feedback
Grading multiple choice exams is a gray area that needs more research. Number-Right Scoring is the most common method: the instructor simply totals the points from correctly answered questions. Another method is Negative Marking, in which points from incorrect answers are subtracted from points earned on correct answers; this is argued to reduce guessing, though the supporting data is weak. A related school of thought curbs guessing from the other direction, awarding partial points for omitted items rather than deducting points for incorrect guesses. These rules are sketched below.
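Here is a rough Python sketch of those three rules. The 1/(k-1) penalty and the 1/k omission reward are common choices in the scoring literature, but the exact values, names, and data shapes here are assumptions for illustration.

```python
# Sketch of three scoring rules; `None` marks an omitted answer.
# k is the number of alternatives per question; the 1/(k-1) penalty and
# 1/k omission reward are illustrative choices, not settings from the post.
def score(responses, key, method="number_right", k=4):
    right = sum(1 for q, a in responses.items() if a == key[q])
    wrong = sum(1 for q, a in responses.items() if a is not None and a != key[q])
    omitted = len(key) - right - wrong
    if method == "number_right":      # total of correct answers only
        return right
    if method == "negative_marking":  # wrong answers cost 1/(k-1) point
        return right - wrong / (k - 1)
    if method == "reward_omission":   # blanks earn the expected guessing score
        return right + omitted / k
    raise ValueError(f"unknown method: {method}")

answers = {1: "A", 2: "C", 3: None}  # the student omitted question 3
print(score(answers, key={1: "A", 2: "B", 3: "D"}, method="negative_marking"))
```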
Did you know you can offer partial credit on multiple choice exams? Partial credit grading can work in a few ways (sketched in code after this list):
- In liberal multiple choice testing, the student may select more than one response when they are unsure between alternatives.
- In elimination testing, the student crosses out alternatives that they know are incorrect.
- With confidence weighting, the student gives their answer as well as a value indicating how confident they are in their choice.
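The Python sketch below shows one plausible scoring rule for each method. The point values are illustrative assumptions, not settings drawn from Isidore or from the cited studies.

```python
# Hypothetical scoring rules for the three partial credit methods above.
# All point values are illustrative assumptions.

def score_liberal(selected: set, key: str) -> float:
    """Credit for including the key, scaled down by how many options were picked."""
    return 1 / len(selected) if key in selected else 0.0

def score_elimination(eliminated: set, key: str, n_alternatives: int) -> float:
    """Credit per distractor correctly crossed out; a penalty for
    eliminating the correct answer."""
    if key in eliminated:
        return -1.0
    return len(eliminated) / (n_alternatives - 1)

def score_confidence(answer: str, key: str, confidence: float) -> float:
    """Confidence in [0, 1] scales the reward for a right answer and
    the penalty for a wrong one."""
    return confidence if answer == key else -confidence

print(score_liberal({"A", "B"}, key="A"))          # 0.5: hedged between two options
print(score_elimination({"C", "D"}, "A", 4))       # ~0.67: two of three distractors gone
print(score_confidence("A", "A", confidence=0.8))  # 0.8: right, and fairly sure of it
```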
(While Isidore settings can accomplish a couple methods of grading multiple choice exams, come talk to us at the Center for Online Learning if you are looking to be extra adventurous.)
Alas, we don’t know which grading method is optimal for reducing guessing and putting a point value on student learning. One area that has been well-researched, however, is feedback.
Feedback delivered shortly after a question is answered is very valuable to student learning. In Isidore, instructors have the option to give question-level feedback, which is the most meaningful kind for correcting a student’s thinking. With question-level feedback given immediately after finishing the test, “[Students] can still remember why they thought and did what they thought and did, and can thus reflect on this” (Scharf & Baldwin, 2007). Your feedback can include resources or a textbook chapter for the student to review, or it can be as simple as saying, “The longest big cat gestation period is a lion’s, at 110 days.”
What now?
The more equipped we are to build worthwhile questions, the more valid and reliable our assessment data will be. While there are certainly many situations for which multiple choice is not the best assessment option, this humble question format is here to stay. Building stems and alternatives that give students the best opportunity to show their learning, writing meaningful question-level feedback, and designing questions that test higher-order thinking skills can transform the quality of the data you garner from a multiple choice exam.
Use this short guide as a checklist of best practices while you build your next assessment.
Sources:
- Boland, R. J., et al. “Writing Multiple-Choice Questions.” Academic Psychiatry, vol. 34, no. 4, 2010, pp. 310–316, https://doi.org/10.1176/appi.ap.34.4.310.
- Brame, Cynthia J. “Writing Good Multiple Choice Test Questions.” Vanderbilt University Center for Teaching, 2013, https://cft.vanderbilt.edu/guides-sub-pages/writing-good-multiple-choice-test-questions/.
- Catts, Ralph. “Q. How Many Options Should a Multiple-Choice Question Have? (a) 2. (b) 3. (c) 4. At-a-Glance Research Report.” ERIC, 1977, https://eric.ed.gov/?id=ED173354.
- Considine, Julie, et al. “Design, Format, Validity and Reliability of Multiple Choice Questions for Use in Nursing Research and Education.” Collegian, vol. 12, no. 1, 2005, pp. 19–24, https://doi.org/10.1016/s1322-7696(08)60478-3.
- Coughlin, P. A., and C. R. Featherstone. “How to Write a High Quality Multiple Choice Question (MCQ): A Guide for Clinicians.” European Journal of Vascular and Endovascular Surgery, vol. 54, no. 5, 2017, pp. 654–658, https://doi.org/10.1016/j.ejvs.2017.07.012.
- Downing, Steven M. “Validity Threats: Overcoming Interference with Proposed Interpretations of Assessment Data.” Medical Education, vol. 38, no. 3, 2004, pp. 327–333, https://onlinelibrary.wiley.com/doi/full/10.1046/j.1365-2923.2004.01777.x.
- Lesage, Ellen, et al. “Scoring Methods for Multiple Choice Assessment in Higher Education – Is It Still a Matter of Number Right Scoring or Negative Marking?” Studies in Educational Evaluation, vol. 39, no. 3, 2013, pp. 188–193, https://doi.org/10.1016/j.stueduc.2013.07.001.
- Scharf, Eric M., and Lynne P. Baldwin. “Assessing Multiple Choice Question (MCQ) Tests - a Mathematical Perspective.” Active Learning in Higher Education, vol. 8, no. 1, 2007, pp. 31–47, https://doi.org/10.1177/1469787407074009.
- Sullivan, Gail M. “A Primer on the Validity of Assessment Instruments.” Journal of Graduate Medical Education, vol. 3, no. 2, 2011, pp. 119–120, https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3184912/.
- Violato, Claudio. “Item Difficulty and Discrimination as a Function of Stem Completeness.” Psychological Reports, vol. 69, no. 7, 1991, pp. 739–743, https://doi.org/10.2466/pr0.69.7.739-743.
- Xu, Xiaomeng, et al. “Multiple-Choice Questions: Tips for Optimizing Assessment In-Seat and Online.” Scholarship of Teaching and Learning in Psychology, vol. 2, no. 2, 2016, pp. 147–158, https://doi.org/10.1037/stl0000062.