statistics

Floor Effect, Ceiling Effect and Computing Internal Consistency Reliability at Post-test

By AnnMaria De Mars, January 28, 2013

Very often, researchers (including me) use multiple-choice tests to collect data to determine whether or not an intervention has worked. Does the Dance Your Way to Math curriculum really result in higher test scores? Does Lollipop Spelling reduce the number of spelling errors? And on and on.

I remember being told that statistics meant to be generalized to the population, like internal consistency reliability or test-retest reliability, should be computed using only the pre-test scores (in the case of internal consistency) or only the control group (in the case of both test-retest correlations and post-test internal consistency reliability). The reason, we are told, is that “something has been done” to the intervention group, which means that they are no longer representative of the population. While I agree with that reasoning in the case of test-retest correlation, I am not so convinced in the case of internal consistency.

Let’s talk about floor and ceiling effects for a minute.

A floor effect is when most of your subjects score near the bottom. There is very little variance because the floor of your test is too high. In layperson terms, your questions are too hard for the group you are testing. This is even more of a problem with multiple-choice tests. With other item types, a subject who doesn’t know the answer isn’t likely to guess that it is, say, (a+b)(a-b), so they get it wrong. On a multiple-choice item with four choices, a subject who guesses at random will get it correct 25% of the time. If there are a bunch of questions that are too hard, you have a bunch of people getting each one right just by chance. Combine low variance with a lot of random error and your internal consistency reliability is going to be in the toilet. So, let’s say you have exactly that on your pre-test. Then you test again after some time, and your control group, having had no training in the meantime, scores equally low: the problems are still too hard, and you still have random guessing and low variance.
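To make the floor-effect argument concrete, here is a minimal simulation sketch. Everything in it is an assumption chosen for illustration, not data from a real study: the logistic link between ability and knowing an item, the difficulty values, and the helper names `cronbach_alpha` and `simulate_test`. It uses Cronbach's alpha as the internal consistency measure and lets subjects who don't know an item guess among four choices.

```python
import math
import random
import statistics


def cronbach_alpha(rows):
    """Cronbach's alpha; rows holds one list of 0/1 item scores per subject."""
    k = len(rows[0])
    item_vars = [statistics.variance([r[i] for r in rows]) for i in range(k)]
    total_var = statistics.variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def simulate_test(difficulty, n_subjects=1000, n_items=20, n_choices=4, seed=1):
    """Simulate right/wrong answers; subjects who don't know an item guess."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n_subjects):
        ability = rng.gauss(0, 1)
        # Assumed logistic link between ability and knowing an item.
        p_know = 1 / (1 + math.exp(-(ability - difficulty)))
        row = []
        for _ in range(n_items):
            knows = rng.random() < p_know
            lucky_guess = rng.random() < 1 / n_choices
            row.append(1 if knows or lucky_guess else 0)
        rows.append(row)
    return rows


# Same simulated subjects, two tests: one matched to them, one far too hard.
alpha_targeted = cronbach_alpha(simulate_test(difficulty=0.0))
alpha_too_hard = cronbach_alpha(simulate_test(difficulty=3.0))
print(f"well-targeted test: alpha = {alpha_targeted:.2f}")
print(f"too-hard test:      alpha = {alpha_too_hard:.2f}")
```

Under these assumptions, the too-hard test should come out with a noticeably lower alpha, because most of its correct answers are lucky guesses rather than knowledge.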

A ceiling effect is the opposite: all of your subjects score near the top. There is very little variance because the ceiling of your test is too low. In layperson terms, your questions are too easy for the group you are testing. Here you don’t have the problem of random guessing, but you do have low variance. Think back to Statistics 101: restriction of range attenuates correlations. Again, in layperson terms, if you correlate the height and weight of NBA players, you find a much weaker relationship than you would in the general population, because they are ALL very tall and ALL very heavy. If you make the questions on your pre-test easier, that may give you better internal consistency reliability at pre-test, but since a good percentage of your subjects knew the answers at the beginning, by the end of your training maybe nearly all of them will, and then you run into a ceiling effect.
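The restriction-of-range point can be checked with a small simulation. This is only a sketch with made-up numbers: the population correlation of 0.7, the cutoff of two standard deviations, and the names `pearson_r`, `heights` and `weights` are all assumptions for illustration, not real NBA data.

```python
import math
import random


def pearson_r(xs, ys):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(42)
true_r = 0.7  # assumed population correlation, chosen for illustration
heights, weights = [], []
for _ in range(20000):
    h = rng.gauss(0, 1)  # standardized "height"
    w = true_r * h + math.sqrt(1 - true_r ** 2) * rng.gauss(0, 1)
    heights.append(h)
    weights.append(w)

r_everyone = pearson_r(heights, weights)

# Keep only people more than two standard deviations above average height.
tall = [(h, w) for h, w in zip(heights, weights) if h > 2.0]
r_tall_only = pearson_r([h for h, _ in tall], [w for _, w in tall])

print(f"everyone:  r = {r_everyone:.2f} (n = {len(heights)})")
print(f"tall only: r = {r_tall_only:.2f} (n = {len(tall)})")
```

The underlying relationship is identical in both samples. Only the range of heights differs, and that alone should be enough to pull the correlation down in the tall-only group.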

My suggestion is to compute internal consistency reliability at the beginning of your study for the whole group, and at post-test for the control and intervention groups separately. You may find that, having successfully avoided both floor and ceiling effects for the post-test intervention group, you get good internal consistency reliability for them.
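As a rough sketch of that suggestion, the snippet below computes Cronbach's alpha once for everyone at pre-test and then separately for each group at post-test. The data are simulated, and the test difficulty, the size of the training gain, and the names `answer_items` and `responses` are hypothetical choices made only so the example runs on its own.

```python
import math
import random
import statistics
from collections import defaultdict


def cronbach_alpha(rows):
    """Cronbach's alpha; rows holds one list of 0/1 item scores per subject."""
    k = len(rows[0])
    item_vars = [statistics.variance([r[i] for r in rows]) for i in range(k)]
    total_var = statistics.variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def answer_items(ability, rng, difficulty=3.0, n_items=20, n_choices=4):
    """One subject's 0/1 scores on a hard test, with guessing when unsure."""
    p_know = 1 / (1 + math.exp(-(ability - difficulty)))
    return [
        1 if rng.random() < p_know or rng.random() < 1 / n_choices else 0
        for _ in range(n_items)
    ]


rng = random.Random(7)
responses = defaultdict(list)  # (phase, group) -> list of item-score rows
for subject in range(1000):
    group = "intervention" if subject % 2 == 0 else "control"
    ability = rng.gauss(0, 1)
    responses[("pre", "all")].append(answer_items(ability, rng))
    gain = 3.0 if group == "intervention" else 0.0  # assumed training effect
    responses[("post", group)].append(answer_items(ability + gain, rng))

# One alpha for the whole sample at pre-test, one per group at post-test.
alphas = {key: cronbach_alpha(rows) for key, rows in responses.items()}
for (phase, group), alpha in alphas.items():
    print(f"{phase:>4} / {group:<12} alpha = {alpha:.2f}")
```

In this made-up scenario the test is too hard at pre-test, so the control group stays on the floor at post-test while the intervention group moves into the range the test measures well. Reporting the three values side by side makes that visible.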

6 Comments

Qetelo says:

June 14, 2014 at 6:06 am

Makes good sense! One more question:
Can the same test suffer from both floor and ceiling effects? Is that possible? Please explain.
AnnMaria says:

June 16, 2014 at 2:09 am

The same test could not have both floor and ceiling effects for the same subjects, because most of the subjects cannot score both near the top and near the bottom. It could, however, have a floor effect for, say, 4th-graders and a ceiling effect for college students.
Pingback: Standardized testing: Solving your reliability problem : AnnMaria's Blog
Donbila says:

March 15, 2017 at 1:45 pm

What is the significance of floor and ceiling effects?
Radhika says:

April 24, 2019 at 7:56 pm

OK… but what about in the context of health research, for example when administering a quality-of-life scale?

Thank you
Keren Ecija says:

June 19, 2020 at 7:49 am

How could the problems caused by these effects be overcome in experiments?
