Floor Effect, Ceiling Effect and Computing Internal Consistency Reliability at Post-test

By AnnMaria De Mars, January 28, 2013

Very often, researchers (including me) use multiple-choice tests to collect data to determine whether or not an intervention has worked. Does the Dance Your Way to Math curriculum really result in higher test scores? Does Lollipop Spelling reduce the number of spelling errors? And on and on.

I remember being told that statistics meant to generalize to the population, like internal consistency reliability or test-retest reliability, should be computed using only the pre-test scores (in the case of internal consistency) or only the control group (in the case of both test-retest correlations and post-test internal consistency reliability). The reason, we are told, is that “something has been done” to the intervention group, which means that they are no longer representative of the population. While I agree with that reasoning in the case of test-retest correlation, I am not so convinced in the case of internal consistency.

Let’s talk about floor and ceiling effects for a minute.

A floor effect is when most of your subjects score near the bottom. There is very little variance because the floor of your test is too high. In layperson’s terms, your questions are too hard for the group you are testing. This is even more of a problem with multiple-choice tests. With other types of questions, if the subject doesn’t know, they aren’t likely to guess that the answer is, say, (a+b)(a-b), and so they get it wrong. On a multiple-choice test with four choices, a subject who guesses blindly will get an item correct 25% of the time. If there are a bunch of questions that are too hard, you have a bunch of people getting each one right just by chance. Combine low variance with a lot of random error and your internal consistency reliability is going to be in the toilet. So, let’s say you have exactly that on your pre-test. Then you test again after some time, and your control group, having had no training in the meantime, scores equally low: the problems are still too hard, and you still have random guessing and low variance.
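To make the guessing problem concrete, here is a rough simulation sketch in Python. Everything in it is an assumption made for illustration, not data from any real study: the function names, the 20-item test, the simple "either you know it or you guess among four choices" model, and the difficulty values. It computes Cronbach's alpha (one common measure of internal consistency) for a test that is far too hard and for one pitched at the group's level.

```python
import math
import random
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(scores[0])
    item_vars = [variance([row[j] for row in scores]) for j in range(k)]
    total_var = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


def simulate_test(n_people, n_items, difficulty, rng, n_choices=4):
    """Simulate 0/1 answers (hypothetical model): a subject either knows
    an item or guesses at random among n_choices options."""
    data = []
    for _ in range(n_people):
        ability = rng.gauss(0, 1)
        # Chance of actually knowing an item; all items share one difficulty.
        p_know = 1 / (1 + math.exp(-(ability - difficulty)))
        row = []
        for _ in range(n_items):
            if rng.random() < p_know:
                row.append(1)  # knows the answer
            else:
                row.append(1 if rng.random() < 1 / n_choices else 0)  # blind guess
        data.append(row)
    return data


rng = random.Random(2013)  # fixed seed so the run is repeatable
too_hard = simulate_test(2000, 20, difficulty=3.0, rng=rng)
well_targeted = simulate_test(2000, 20, difficulty=0.0, rng=rng)

alpha_too_hard = cronbach_alpha(too_hard)
alpha_well_targeted = cronbach_alpha(well_targeted)
print(f"alpha, test too hard:      {alpha_too_hard:.2f}")
print(f"alpha, test well targeted: {alpha_well_targeted:.2f}")
```

Under these assumptions the too-hard test should come out with a noticeably lower alpha, because most of the "correct" answers are lucky guesses rather than a reflection of what anyone knows.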

A ceiling effect is the opposite: all of your subjects score near the top. There is very little variance because the ceiling of your test is too low. In layperson’s terms, your questions are too easy for the group you are testing. Here you don’t have the problem of random guessing, but you do have low variance. Think back to Statistics 101: restriction of range attenuates correlations. Again, in layperson’s terms, if you correlate the height and weight of NBA players, for example, you find a much weaker relationship than you would in the general population, because they are ALL very tall and ALL very heavy. If you make the questions on your pre-test easier, that may give you better internal consistency reliability at pre-test, but since a good percentage of your subjects knew the answers at the beginning, by the end of your training maybe nearly all of them will, and then you run into a ceiling effect.
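Restriction of range is easy to see in a quick simulation. The sketch below is illustrative only: `x` and `y` are made-up standardized scores generated with an assumed population correlation of 0.7, not real heights and weights. It compares the correlation in the full sample with the correlation among only the top 5% on `x`.

```python
import math
import random


def pearson_r(xs, ys):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(101)  # fixed seed so the run is repeatable
true_r = 0.7              # assumed population correlation (hypothetical)

pairs = []
for _ in range(20000):
    x = rng.gauss(0, 1)
    # y shares variance with x so that corr(x, y) is about true_r
    y = true_r * x + math.sqrt(1 - true_r ** 2) * rng.gauss(0, 1)
    pairs.append((x, y))

# Keep only the top 5% on x: the "everyone is very tall" subgroup.
pairs.sort(key=lambda p: p[0])
top = pairs[-1000:]

r_full = pearson_r([p[0] for p in pairs], [p[1] for p in pairs])
r_top = pearson_r([p[0] for p in top], [p[1] for p in top])
print(f"r, full sample: {r_full:.2f}")
print(f"r, top 5% on x: {r_top:.2f}")
```

The relationship between `x` and `y` is built exactly the same way for everyone, yet the correlation in the restricted subgroup should come out well below the full-sample value. That is the attenuation at work.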

My suggestion is to compute internal consistency reliability at the beginning of your study for the whole group, and at post-test for the control and intervention groups separately. You may find that, having successfully avoided both floor and ceiling effects for the post-test intervention group, you get good internal consistency reliability for them.
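Here is a minimal sketch of that workflow in Python, run on simulated data. The record layout (`group`, `pre`, `post`), the helper names, the item difficulty, and especially the size of the training gain are all assumptions picked to illustrate the idea; your own data and results will look different.

```python
import math
import random
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(scores[0])
    item_vars = [variance([row[j] for row in scores]) for j in range(k)]
    total_var = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


def answer_items(ability, n_items, difficulty, rng):
    """Hypothetical response model: know the item, or guess among 4 choices."""
    p_know = 1 / (1 + math.exp(-(ability - difficulty)))
    # Short-circuit: the guess is only drawn when the subject does not know.
    return [1 if rng.random() < p_know or rng.random() < 0.25 else 0
            for _ in range(n_items)]


rng = random.Random(17)  # fixed seed so the run is repeatable
records = []
for i in range(2000):
    group = "intervention" if i % 2 == 0 else "control"
    ability = rng.gauss(0, 1)
    gain = 2.0 if group == "intervention" else 0.0  # assumed training effect
    records.append({
        "group": group,
        "pre": answer_items(ability, 20, 2.0, rng),          # test is hard at pre-test
        "post": answer_items(ability + gain, 20, 2.0, rng),  # same items at post-test
    })

# Pre-test: everyone together. Post-test: each group on its own.
alpha_pre = cronbach_alpha([r["pre"] for r in records])
alpha_post = {
    g: cronbach_alpha([r["post"] for r in records if r["group"] == g])
    for g in ("control", "intervention")
}

print(f"pre-test, whole group:   {alpha_pre:.2f}")
for g, a in alpha_post.items():
    print(f"post-test, {g:<12}: {a:.2f}")
```

In this made-up scenario the control group is still stuck near the floor at post-test, while the intervention group has moved into the range where the items discriminate, so its alpha should be the higher of the two.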

6 Comments

Qetelo says:

June 14, 2014 at 6:06 am

Makes good sense! One more question:
Can the same test suffer from both floor and ceiling effects? Is that possible? Please explain.
AnnMaria says:

June 16, 2014 at 2:09 am

The same test could not have both floor and ceiling effects for the same subjects. Most of the subjects could not score near the top and near the bottom. It could have floor effects for, say, 4th-graders and a ceiling effect for college students.
Donbila says:

March 15, 2017 at 1:45 pm

What is the significance of floor and ceiling effects?
Radhika says:

April 24, 2019 at 7:56 pm

OK, but what about in the context of health research, for example when administering a quality-of-life scale?

Thank you
Keren Ecija says:

June 19, 2020 at 7:49 am

How could the problems caused by these effects be overcome in experiments?
