Logistic regression is based on logarithms. Ordinary Least Squares regression and analysis of variance use the actual values of the dependent and independent variables in an equation. Logistic regression does not; it models the logarithm of the odds of the dependent variable.
What is a log, anyway?
Let’s start with the very basics. First we learned to add:
5+5+5+5 = 20
After about eight years of age, we realized that was pretty inefficient, so we started multiplying:
5 x 4 = 20
We got a few years older, thought, “Why stop there?” and got into exponents:
5 x 5 x 5 x 5 became 5 to the fourth power = 625
Then we got into logarithms, where the log to the base 5 of 625 = 4.
Think about this. Really think about it. Go to the Wikipedia page on logarithms, which has a good explanation, and read it.
Calculate the logs of several numbers to different bases, just for the heck of it. I have noticed that, so often, students skip over topics like logarithms, thinking, “I don’t need to know that.”
This is wrong on a whole lot of fronts, one of them being that it is a really bad habit to get into. I can’t count how many newspaper reports I have read about people losing their homes that included the statement,
“Mr & Mrs John Q Public said that they did not understand the mortgage papers, that they just trusted the real estate agent, the banks or the ad they watched on TV at 2 a.m.”
So, whose fault is that? Understand what you are doing! Start with logarithms. It’s as good a place as any.
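If you want to walk through the progression from adding to logarithms on a computer, here is a small Python sketch of my own. The function name `log_by_repeated_multiplication` is made up for illustration; it simply counts how many times the base has to be multiplied by itself to reach the number, which is all a whole-number logarithm is. For numbers that are not exact powers, Python’s `math.log(x, base)` does the work.

```python
import math

def log_by_repeated_multiplication(number, base):
    """Count how many times `base` must be multiplied by itself to reach `number`.

    Only works when `number` is an exact whole-number power of `base`.
    """
    power, exponent = 1, 0
    while power < number:
        power *= base
        exponent += 1
    if power != number:
        raise ValueError(f"{number} is not a whole-number power of {base}")
    return exponent

print(5 + 5 + 5 + 5)                            # adding: 20
print(5 * 4)                                    # multiplying: 20
print(5 ** 4)                                   # exponent: 625
print(log_by_repeated_multiplication(625, 5))   # logarithm: 4

# For other numbers, use math.log(x, base); floating-point results
# can be off in the last decimal place, so round for display.
for number, base in [(625, 5), (1000, 10), (100, 2), (81, 3)]:
    print(f"log base {base} of {number} = {math.log(number, base):.4f}")
```

Try changing the pairs in the last loop; that is the “just for the heck of it” exercise.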
I was going to write about the log of odds ratios and explain logarithms. Here is a very, very odd fact I have noticed in most of social science: people in doctoral programs are often thrust into statistics courses for which they really don’t have the basic mathematical foundation. This is because if they were really into mathematics, they probably would have gone into that in the first place, but they didn’t. They majored in history or liberal studies or something else. They became teachers or social workers or counselors. Along the way, they took the one (count ‘em, ONE) mathematics course required to get a college degree. Further dismaying news: that one mathematics course has changed dramatically since I was in college. Now you can get a degree with a C- in College Algebra, whatever that is. It definitely does not involve logarithms. When I went to college, Algebra was something you were supposed to have had in high school. But I digress. This whole post is a digression, so now I have digressed squared.
So, what did I do all day if not write about logarithms?
Downloaded a dataset from the Interuniversity Consortium for Political and Social Research, which is a site I truly love. What a great idea! Finished with your data? Upload it to the Internet and let anyone else use it for whatever they can find.
Spent an hour (I am embarrassed to confess this) trying to find the error in my SPSS syntax, and could not figure out why the HELL it kept saying “file not found” when I could clearly see the file there. Finally, as a last resort, I went to the c:\ prompt, listed the files and realized that Windows, designed by Machiavelli, hides file extensions by default, so my file was actually named college.txt.txt. AAAGH!
Monir, the travel lady, stopped by and took care of my reservations and registration for SAS Global Forum.
Worked on the PowerPoint for my Enterprise Guide class next week. For once, did not waste time rotating the charts in space to see how my bar chart would look sideways (oh, don’t pretend you never did it!)
Enterprise Guide runs pretty slow on my old computer with only 1 GB of RAM, and EVERYTHING runs really slow with datasets the size of some I have been using. My Mac desktop only has 512 MB of RAM (I know, I am deprived), and I was kind of tired of using my laptop and the Unix server for everything.
Today, Justin, a.k.a. our hardware guy, came by and told me he had a new computer for me. Like everyone else, we are watching the budget, but somehow he came up with one. So far, I have installed SPSS 17 and run a factor analysis with 191,000 subjects, and it ran in less time than it took me to type this sentence. I am very happy. Installing the applications I need took a good bit of time, but it will be well worth it in the end in saved time and aggravation. I decided to try something different, so I installed SeaMonkey as my browser and downloaded OpenOffice and GIMP instead of Microsoft Office and Photoshop.
Speaking of sea monkeys, I thought this would be a good example of how statistics can be applied to everything. Even though I work about a block from the museums and Exposition Park, I have only been there once in the last year. So, Tuesday, I walked over to the science museum shop and bought a sea monkey kit. My idea was that I would have it on my desk, collect data and use it in different statistical analyses.
This could also be an example of creativity in analysis, in trying to come up with different variables. My original thought was that I could begin with the number of sea monkeys hatched. Unfortunately, statistics for the day are:
Specks floating around that could possibly be sea monkeys – somewhat less than a zillion. I would give a close approximation as 1,000.
Matter that can definitely be distinguished as sea monkeys – 0.
I probably should re-read that code on break point analysis before the meeting tomorrow. I should finish editing the on-line ethics course. Instead, though, I am sitting here with Jenn watching episodes of Numbers on DVD.
She wanted to know, “Is that really true? Can you tell that someone cheated by random numbers? That doesn’t make sense.”
I told her that it was exactly true. It’s like the Sherlock Holmes story “Silver Blaze” and the curious incident of the dog in the night-time: what was significant was what you DIDN’T see. Sometimes, the telling evidence in an evaluation is that relationships don’t exist where they should, because the numbers were just made up and entered in the database, because, after all, who would ever know?
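Here is a minimal Python sketch of that idea using invented data. Everything in it is an assumption for illustration: the variable names, the model linking hours of tutoring to score gains, and the noise levels. In the “real” data, gains go up with hours attended; in the “made-up” data, someone typed in plausible-looking gains with no regard to attendance, and the correlation that should be there quietly disappears.

```python
import random
import statistics

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

rng = random.Random(42)  # fixed seed so the example is repeatable
hours = [rng.randint(0, 40) for _ in range(200)]

# Hypothetical "real" data: gains rise with hours of tutoring, plus noise.
real_gain = [0.5 * h + rng.gauss(0, 5) for h in hours]

# Hypothetical "made-up" data: plausible-looking gains with no link to hours.
fake_gain = [rng.gauss(10, 6) for _ in hours]

print(f"hours vs. real gains:    r = {pearson_r(hours, real_gain):.2f}")
print(f"hours vs. made-up gains: r = {pearson_r(hours, fake_gain):.2f}")
```

A near-zero correlation is not proof of cheating by itself; it is a reason to go look harder at where the numbers came from.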
Yes, I am the F-word – a feminist. I was at a faculty meeting this weekend and one of the presenters began by saying, pointing to a colleague in the audience,
“I am sure Dr. Y knows more about this than me.”
Several times in her presentation on analysis of assessment data she would pause and make comments such as,
“Well, I am not very good at statistics, but this is pretty easy to understand.”
I was a bit annoyed at her self-deprecating manner. I wanted to walk up to her and say,
“You understand this perfectly well and I know Dr. Y, who is very smart and competent, but no more so than you.”
Even more annoying was another presenter, also a woman, also very competent, who gave a very good presentation on assessment. Near the end of it, she said,
“You don’t have to use numbers. For those of you who don’t do math, you can put your students in categories as having exceeded criterion, met criterion or failed. You can just put it in bullet points.”
For those of you who don’t do math …. ????
What the hell? This is a university faculty meeting; 99% of the people in the room have graduate degrees and at least three-fourths of them have Ph.D.’s.
Since when has it become acceptable to not be competent, particularly in math??? Would that same presenter have started a sentence with,
“For those of you who can’t read, I have recorded this presentation as a podcast?”
There may be some people who can’t read because they are visually impaired or have a learning disability, but we consider this a disability, not a lifestyle choice.
This particular department is overwhelmingly female, and I could not help but wonder whether the same sort of statements would be made in a predominantly male department. In my admittedly non-random and non-representative experience, the answer is, “No.”
So, first of all, to all of you women (and men) who say you aren’t good at math: cut it out! The idea that some people are naturally good at math and some aren’t is a lot of nonsense. It’s a lot like swimming. You aren’t born knowing how to swim and, yes, very few people will become Olympic swimmers, but the vast majority of people can learn to dive in a pool and swim a few laps. It just takes time and effort to practice.
Let’s start with the phi coefficient. I blatantly stole this table from the Children’s Mercy Hospital website because I thought it was very well-explained and easy to understand – until I realized that it wasn’t and I only understood it because I already knew exactly how to calculate a phi coefficient. However, not one to let any act of larceny go to waste, I used it anyway.
The formula for Phi is

$$\phi = \frac{ad - bc}{\sqrt{(a+b)(c+d)(a+c)(b+d)}}$$
Notice that Phi compares the product of the diagonal cells (a*d) to the product of the off-diagonal cells (b*c). The denominator is an adjustment that ensures that Phi is always between -1 and +1.
Let me explain this a little better. We have two categorical variables: gender, coded 1 = female, 2 = male, and “Did you eat today?”, coded 0 = no, 1 = yes.
In our table below, you can see that there is zero correlation between gender and if you ate today, as males and females are both equally likely to have had something to eat.
| Gender \ Ate today? | NO | YES | TOTAL |
|---|---|---|---|
| Female | 10 | 90 | 100 |
| Male | 10 | 90 | 100 |
| Total | 20 | 180 | 200 |
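When we subtract (10*90) – (10*90), the two products are obviously the same, so we get zero. There is zero relationship. In the formula above, a, b, c and d are the numbers in each cell, reading across the rows: a and b are the female NO and YES counts, and c and d are the male NO and YES counts.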
So, we have mathematically shown that there is no relationship between gender and whether one eats or not. Let’s try another question, “Did you do the dishes?” This time, we get the following results:
| Gender \ Washed dishes? | NO | YES | TOTAL |
|---|---|---|---|
| Female | 10 | 90 | 100 |
| Male | 90 | 10 | 100 |
| Total | 100 | 100 | 200 |
Let’s look at the phi coefficient again.
Numerator: ad – bc = 10*10 – 90*90 = 100 – 8,100 = -8,000
Denominator: (a+b)(c+d)(a+c)(b+d) = 100*100*100*100 = 100,000,000, and the square root of that is 10,000
So, our phi coefficient is -8,000/10,000, or -.80. That is a pretty high correlation, considering that the coefficient ranges from -1 to +1. A negative coefficient means that those who are lower on one variable (1 = female, 2 = male) are more likely to be higher on the other variable (0 = did not do the dishes, 1 = washed dishes).
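The arithmetic above is easy to check in a few lines of Python. This is a sketch; `phi_coefficient` is a name I made up, and it takes the four cell counts in the same a, b, c, d order as the tables (a and b across the first row, c and d across the second).

```python
import math

def phi_coefficient(a, b, c, d):
    """Phi for a 2 x 2 table laid out as:

                 NO   YES
        row 1     a     b
        row 2     c     d

    Raises ZeroDivisionError if any row or column total is zero.
    """
    numerator = a * d - b * c
    denominator = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return numerator / denominator

print(phi_coefficient(10, 90, 10, 90))  # "Ate today?" table: 0.0
print(phi_coefficient(10, 90, 90, 10))  # "Washed dishes?" table: -0.8
```

Swapping the two rows (putting males first) flips the sign but not the size of phi, which is why the coding of the variables matters when you interpret a negative coefficient.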
So, our conclusion is that, while women are no more likely than men to eat each day, they are significantly more likely to do the dishes, at least in data that I just made up to prove it. My daughter, Maria, tells me that any married woman knows that without the need for statistics.
Why did I just go into this in such detail, and all about one coefficient? Because I think a big part of the reason many people don’t learn math is that we so often assume we can “just skip over this”. In fact, the reason I liked the Mercy Hospital site is that it did not start out with

$$\phi = \frac{n_{11}n_{22} - n_{12}n_{21}}{\sqrt{n_{1+}\,n_{2+}\,n_{+1}\,n_{+2}}}$$

and assume that everyone knew what marginal distributions and array subscripts meant, because, I can guarantee you, they don’t.
Sheila Tobias wrote a really interesting book about teaching and learning science, titled They’re Not Dumb, They’re Different.
Maybe, but I guarantee you that part of the problem is that they’re not clairvoyant. No one was born knowing that n10 means the number in the cell where the row value =1 and the column value = 0. It doesn’t help that at other times that same cell would be represented as n11 as the first row and first column.
If you can make that switch in your mind easily, it is no doubt because you, like me, have looked at thousands of matrices and had that notation explained to you so long ago that, as with learning to swim, you can’t even remember learning it. The secret to being good at math is the same as being good at swimming: practice!
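To make the notation problem concrete, here is a small Python sketch (my own illustration) that stores the dishes table two ways: keyed by the variables’ actual codes, and as a grid indexed by position. Python adds a third convention on top of the two in the text, since it counts positions from 0, which rather proves the point.

```python
# The "washed dishes" table, stored two ways.

# 1. Keyed by the variables' actual codes: (gender code, washed code).
#    Gender: 1 = female, 2 = male; washed dishes: 0 = no, 1 = yes.
by_value = {(1, 0): 10, (1, 1): 90,
            (2, 0): 90, (2, 1): 10}

# 2. A nested list indexed by position. Python counts from 0, so the first
#    row and first column are [0][0]; textbook notation counting from 1
#    calls the same cell n11.
by_position = [[10, 90],
               [90, 10]]

# n10 in value notation (gender = 1, washed = 0) and n11 in 1-based positional
# notation name the same cell: females who did not wash the dishes.
assert by_value[(1, 0)] == by_position[0][0] == 10

# The marginal totals are just row sums and column sums of the same cells.
row_totals = [sum(row) for row in by_position]
col_totals = [sum(col) for col in zip(*by_position)]
print(row_totals, col_totals)  # [100, 100] [100, 100]
```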
Completely random fact – in my misspent youth, I was the first American to win the world championships in judo. If you type “judo blog” into Google, the first of 3,000,000+ pages that comes up is mine. And my most recent judo blog was on outliers and practice. Rather unusual when the two halves of my split personality come together.
As to odds ratios, I have more to say about those, but it is 1:30 a.m. and I have to get up in 7 1/2 hours to go to work, so that will have to wait until another day.
I hate SQL. This is probably completely irrational, like that guy I turned down for a date in junior high school who, my mom always tells me, founded a very successful company and is making piles of money. No wait, that wasn’t irrational; he always tried to copy off me in Algebra, plus he was just plain boring. I think that is my problem with SQL, too: boredom. There is only so much left join, right join, outer, inner and dataset.variable I can tolerate before my brain tries to escape through my right ear just to get away from the monotony. I have met people who love SAS. I have met people who love SPSS. I have even met people who love Stata. Nobody loves SQL. They are just with it for the money.
What practical use is Mokken’s H, really? Yes, it is true that the maximum phi is determined by the marginal distributions, and if you get a phi of .20, for certain distributions that might be the maximum you can get, but so what? Maybe I was scarred in my youth by reading some of the articles on bias in mental testing in which authors determined to prove that intelligence was genetic corrected correlations for attenuation, sometimes to as high as 1.20, and then averaged the corrected correlations!
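For anyone who has not met it, Spearman’s correction for attenuation divides an observed correlation by the square root of the product of the two tests’ reliabilities. The numbers in this Python sketch are invented, not taken from any of those articles; they just show how underestimated reliabilities can push a “corrected” correlation past 1.0, which no real correlation can be.

```python
import math

def correct_for_attenuation(r_xy, reliability_x, reliability_y):
    """Spearman's correction for attenuation: r_xy / sqrt(r_xx * r_yy)."""
    return r_xy / math.sqrt(reliability_x * reliability_y)

# Invented example: observed r = .60, reliabilities estimated at .50 and .60.
corrected = correct_for_attenuation(0.60, 0.50, 0.60)
print(f"corrected r = {corrected:.2f}")  # 1.10, an impossible correlation

# A value above 1.0 says something is wrong with the inputs (for example,
# reliabilities that were underestimated), not that the relationship is
# "better than perfect". Averaging such values only hides the problem.
if corrected > 1.0:
    print("warning: corrected correlation exceeds 1.0")
```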
From a purely theoretical standpoint, though, it’s completely different. If you are interested in the analysis of binary data (and how could you not be?), you’ll like this paper by David Armstrong at the University of Oxford. I like it because he is very sensible. He doesn’t take a stance like “You should never use phi, never analyze binary data in a factor analysis,” etc. He takes a very measured view, which I like because so few things in the world are always true, except brain-dead obvious facts like you should not correct correlations to be above 1.0! (Clearly, I have still not gotten over that.) I have several SPSS workshops coming up. I think I will import the data from our evaluation of after-school programs to illustrate just how much the phi and tetrachoric coefficients move around when the marginal distributions change a lot. It’s a tough job, but somebody has to do it.
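Here is a minimal Python sketch of why the marginal distributions matter so much for phi. It assumes the standard result that, for two 0/1 items where proportions p_x ≤ p_y score 1, the largest possible phi is sqrt(p_x(1 - p_y) / (p_y(1 - p_x))). Dividing the observed phi by that maximum gives the scalability coefficient H for a single item pair, as used in Mokken scaling. The function names are mine, this covers one item pair only (not a full Mokken scale analysis), and the table is invented: a rare item (10% score 1) paired with a common one (50%).

```python
import math

def phi_max(p_x, p_y):
    """Largest positive phi allowed by the proportions scoring 1 on each item."""
    lo, hi = sorted((p_x, p_y))
    return math.sqrt(lo * (1 - hi) / (hi * (1 - lo)))

def pairwise_h(a, b, c, d):
    """Observed phi and H = phi / phi_max for one item pair.

    Layout: rows are item X = 0/1, columns are item Y = 0/1,
    so d counts people who scored 1 on both items.
    """
    n = a + b + c + d
    p_x = (c + d) / n   # proportion scoring 1 on X
    p_y = (b + d) / n   # proportion scoring 1 on Y
    phi = (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return phi, phi / phi_max(p_x, p_y)

# Invented table: everyone who passed the rare item also passed the common one.
phi, h = pairwise_h(50, 40, 0, 10)
print(f"phi = {phi:.2f}, H = {h:.2f}")  # phi = 0.33, H = 1.00

# The ceiling on phi moves as the marginals move (p_y fixed at .50).
for p in (0.1, 0.3, 0.5):
    print(f"p_x = {p:.1f}: max phi = {phi_max(p, 0.5):.2f}")
```

The design choice here is to report phi and H side by side: a phi of .33 looks modest, but H = 1.00 says it is the strongest relationship these marginals allow.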
I can’t see a lot of experienced SAS programmers switching to Enterprise Guide. The people I can see using it are those who use SQL, Access or Excel, or who are just starting to use statistics in their education or profession.
Hello, my name is Captain Obvious…. All that Data Step stuff you were missing and could not find in SAS Enterprise Guide? It was cleverly hidden in the menu under the word DATA.
“Must be a new meaning of the word ‘filter’ with which I was previously unfamiliar.”
Okay, maybe not so obvious is the fact that you need to go under the Data menu to Filter and Query to add two datasets together. I thought filter meant to hold back certain elements. Oh well, I guess it makes as much sense as going to the start menu to shut down your computer.
So, if you want to compute variables, recode variables, add tables or join tables, go to Data > Filter and Query.
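In Query Builder terms, joining tables means an SQL join. For comparison, here is the same idea in plain SQL, run through Python’s built-in sqlite3 module rather than SAS; the tables, names and scores are invented. The only difference between the two queries is INNER versus LEFT, and it decides whether Ben, who has no score, shows up at all.

```python
import sqlite3

con = sqlite3.connect(":memory:")  # throwaway in-memory database
con.executescript("""
    CREATE TABLE students (id INTEGER, name TEXT);
    CREATE TABLE scores   (id INTEGER, score INTEGER);
    INSERT INTO students VALUES (1, 'Ana'), (2, 'Ben'), (3, 'Cy');
    INSERT INTO scores   VALUES (1, 88), (3, 92);
""")

# INNER JOIN: keep only students who appear in both tables.
inner = con.execute("""
    SELECT s.name, sc.score FROM students AS s
    INNER JOIN scores AS sc ON s.id = sc.id ORDER BY s.id
""").fetchall()

# LEFT JOIN: keep every student; a missing score comes back as None (NULL).
left = con.execute("""
    SELECT s.name, sc.score FROM students AS s
    LEFT JOIN scores AS sc ON s.id = sc.id ORDER BY s.id
""").fetchall()

print(inner)  # [('Ana', 88), ('Cy', 92)]
print(left)   # [('Ana', 88), ('Ben', None), ('Cy', 92)]
```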
Did I mention that I hate SQL ?
When I read textbooks, whether in mathematics or other fields, they are usually as boring as watching a light bulb flicker. Searching the Internet for Algebra problems can get to be pretty depressing. (Whether someone who spends her spare time looking for Algebra problems might already have mental health issues is a separate question, not to be discussed at this time.)
Seriously, though, I don’t believe math is inherently boring. Today, I am doing a repeated measures Analysis of Variance. The question I want to answer is how far you can go from the original plan for a training program before it ceases to be effective. No one would imagine that if, instead of teaching Algebra on-line for an entire semester, you walked up to a group of students with a flat piece of slate and a rock, scratched out the Associative Property:
(a + bX) + cY = a + (bX + cY)
then went out for beer for the rest of the semester, that the students would learn as much as in our full-semester, state-of-the-art course. Where is the dividing line, though? How many days could you skip? COULD you replace the computers with sharp rocks and flat pieces of slate and learn just as much? One way to test for this would be to check the significance of the interaction effect between type of class and time of testing (before versus after the course), that is, whether the improvement in test scores differs by type of class.
I could go into great detail about what we are actually doing, and I probably will next time, but for now I am going to lament the sad state of Algebra. Here are a few examples of Algebra problems:
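With two groups and a pretest and posttest, there is a handy shortcut: the F test for the class-type by time interaction in a repeated measures ANOVA is the square of an ordinary pooled-variance t test comparing the two groups’ gain scores. Here is a Python sketch with invented scores; the group names and numbers are mine, not from our study, and getting a p-value would still need a t or F table or a statistics package.

```python
import math
import statistics

def pooled_t(group_a, group_b):
    """Independent-samples t statistic using the pooled variance."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = ((n_a - 1) * statistics.variance(group_a)
                  + (n_b - 1) * statistics.variance(group_b)) / (n_a + n_b - 2)
    standard_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (statistics.mean(group_a) - statistics.mean(group_b)) / standard_error

# Invented (pretest, posttest) scores for five students in each version of the class.
full_course = [(52, 78), (48, 70), (60, 85), (55, 74), (45, 69)]
slate_and_rock = [(50, 55), (47, 50), (58, 60), (53, 59), (46, 49)]

gains_full = [post - pre for pre, post in full_course]
gains_slate = [post - pre for pre, post in slate_and_rock]

t = pooled_t(gains_full, gains_slate)
df = len(gains_full) + len(gains_slate) - 2
# With two groups and two testing occasions, the interaction F
# has 1 and df degrees of freedom and equals t squared.
print(f"t({df}) = {t:.2f}, interaction F(1, {df}) = {t ** 2:.2f}")
```

With more than two testing occasions, or more than two kinds of class, this shortcut no longer applies and you need the full repeated measures analysis.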
The DeVry University page has questions about how much things cost if apples are fifteen cents and oranges are thirty-five cents or what the area of a circle is when r is increased by three.
The Broome Community College page asks you to factor 16x – 8.
This GRE practice site is a little better. It asks questions that are mildly interesting, such as calculating total income from investments with different rates of return.
There are thousands of sites like those above, and these reflect nearly every Algebra textbook in America. One thing these all have in common is that I don’t much like them. We are asking students to apply a formula to a neat little problem. There are several reasons these are not the way I think we should teach Algebra.
- Most real problems are messy. It is not immediately apparent which formula you should use.
- Students are learning procedures rather than understanding mathematics. When a problem looks like this, apply the first formula. When it looks like that, apply the second formula. But why? I think there is a big difference between learning rules and thinking. A really big difference.
- In life, you have to ask your own questions most of the time. Someone else doesn’t give them to you.
- Questions that can be answered in 15 seconds aren’t the kind that really promote thinking.
Compare those with questions like these:
“Find a function that expresses where a child sits on a seesaw in terms of her weight.”
“If you woke up in the morning and everything was twice as big, how could you know?”
Part of learning Algebra, I think, should be requiring students to come up with questions as well as answers. Questions could be either useful ones, such as about the effectiveness of changing course design, or simply interesting, like how you could know if the whole world doubled in size. You see, I absolutely believe that Algebra can be both interesting and useful. Unfortunately, the way it is generally taught, it is neither.
I was reading a book this week, Mathematics for the Intelligent Non-mathematician. If it were a person, this book would be your grandmother: not terribly exciting, but pleasant to spend time with, and if you paid attention you were likely to learn something.
Since I use mathematics for my living, you might reasonably wonder why I would be reading this book. The answer is that I believe in considering different perspectives. I’ve never really quite “got” the whole humanities thing. When I took history in school, I was secretly thinking, “They’re all dead. Get over it.” In English class, I was the kind that made teachers throw up their hands in despair. They wanted me to discuss, “The deep meaning of Moby Dick, what do you think it is really about?”
What did I think it was really about? I thought it was about a big white whale, for crying out loud, because it said that on the first page and about seven hundred more times throughout the book. The title? That’s the name of the whale, hello? Apparently, that was not the correct answer, and you are supposed to say that it is a metaphor for the universal struggle of man against the sea, or man against himself, or for man’s domination of marmots.
As you might guess, the second I had the opportunity for classes in college like Accounting, Calculus and Statistics where the questions had actual answers, like 42, I jumped at the chance. This isn’t to say that I made A’s in all of those classes initially, as that would have interfered with my plan of going to parties at night and sleeping through the morning. This plan was ended through a talk with the dean and some threatening words about losing my scholarship and having to find $20,000 under a mattress. Heck, I didn’t even own a mattress, much less $20,000 to find under it.
So, here I am thirty years after graduation looking at mathematics from a more naive point of view, which brought out a couple of points I had never really given much thought.
The first is that mathematics is the most general thing in the world. You cannot apply psychology to rocks or biology to building a space shuttle or oceanography to orthopedic surgery. However, as the author said, you can count devils or angels, whales or stars. In fact, when I went from being an industrial engineer to studying for my Ph.D. in Educational Psychology I used the exact same equations I had applied to predict which cruise missile would fail testing before launch to predict which child with a disability would die within the next five years. (Yeah, I wasn’t a lot of fun at parties back then.)
The second interesting point was one that is obvious after someone else states it, i.e., some ideas in mathematics are more important than others. For example, it is a fact that the digits of any multiple of nine add up to a multiple of nine, and if you keep adding the digits you eventually get nine itself, e.g., 2 x 9 = 18 and 1 + 8 = 9, or 11 x 9 = 99, 9 + 9 = 18 and 1 + 8 = 9. This is not a key fact on which a lot of mathematics is based. So, this led me to thinking about the ideas in mathematics that I think are crucial and wondering what other people think.
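Here is a quick Python check of the nines rule, a sketch of my own: the digit sum of a multiple of nine is always a multiple of nine, and repeating the sum until one digit is left (the “digital root”) always lands on nine. The case of 99 shows why the first sum can be 18 rather than 9.

```python
def digit_sum(n):
    """Add up the decimal digits of a non-negative integer."""
    return sum(int(ch) for ch in str(n))

def digital_root(n):
    """Keep adding the digits until only one digit is left."""
    while n >= 10:
        n = digit_sum(n)
    return n

for k in [2, 11, 37, 111]:
    m = 9 * k
    print(m, digit_sum(m), digital_root(m))
# 18 9 9
# 99 18 9
# 333 9 9
# 999 27 9
```

A cute fact, and easy to check, which is exactly why it is not the kind of idea a lot of other mathematics rests on.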
I always thought that the basic properties of real numbers, such as the commutative property,
A x B = B x A or A + B = B + A, were among the most fundamental ideas in mathematics.
A second really important idea was the distributive property:
A(B + C) = AB + AC
and the associative property is a third:
(4A + 2B) + 11C = 4A + (2B + 11C)
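Checking examples is not a proof, but it is a good way to build intuition about these properties. This Python sketch spot-checks all three on random integers (integers, so rounding cannot get in the way) and shows that subtraction, by contrast, is not commutative.

```python
import random

rng = random.Random(0)  # fixed seed so the run is repeatable
for _ in range(1000):
    a, b, c = (rng.randint(-100, 100) for _ in range(3))
    assert a * b == b * a and a + b == b + a                       # commutative
    assert a * (b + c) == a * b + a * c                            # distributive
    assert (4 * a + 2 * b) + 11 * c == 4 * a + (2 * b + 11 * c)    # associative
print("all three properties held for 1,000 random triples")

# Subtraction is not commutative: 5 - 3 is 2, but 3 - 5 is -2.
print(5 - 3 == 3 - 5)  # False
```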
Once a student understands these properties, it opens up an enormous number of problems that he or she can now solve.
And that is why I like teaching Algebra.
Silence is one of the most under-used teaching techniques. As Julia learns mathematics, I notice major differences in the way my husband and I respond to her. After I ask her a question, I wait for an answer. The period at the end of that sentence is deliberate. I don’t do anything else. I don’t give her any prompts or hints. If she whines that she can’t get it, I tell her to keep thinking about it. If she comes up with the wrong answer, I tell her that it’s wrong and she should try again. Almost always, she can find the mistake she made.
Dennis, like most people, will try to help her if she doesn’t answer right away, by giving her a hint. Often that makes it more difficult to solve the problem because she now has the original problem to solve plus trying to figure out how the hint relates to it, not an easy task for a fourth-grader. Alternatively, he will give her the answer and then tell her to try the next problem, which is always just like the previous problem, that being the way math textbooks in America are structured. Since she could not figure out the previous problem, she is not going to get this one, either.
Dennis has degrees in Mathematics and Physics from UCLA. He was an excellent student in math and he acts the way his teachers acted in school. Paradoxically, this is not the way he learned mathematics. He had taught himself Calculus by the eighth grade from books he checked out of the public library.
My three recommendations for anyone who wants to be a better math teacher:
- Give students fewer problems.
- Give them the time to solve those problems on their own.
- Be quiet and let them do it.
Sites I liked today on teaching Algebra
Purple Math - I especially liked their “how do I really do this stuff” lessons. Readable and easy to understand. Also recommended for adults who know they once knew, e.g., what a negative exponent was. Those of you who have not had a math class in years can peruse this site for lots of those moments when you smack your forehead and say, “Oh, yeah, THAT’S what that is.”
Teaching College Math Technology Blog – offers thoughts on demonstrations, learning activities and the use of technology.
The Wolfram Demonstrations Project is way cool - I say this being fully aware of the fact that if there is such a thing as a visual learner, I am not it. You can download the Mathematica Player for free and run any one of their demonstrations. Be aware that even with high-speed access, the player takes a long time to download. Be patient.
When I look at the wealth of resources, from the straightforward, readable pages on Purple Math to the high-tech demonstrations of the Wolfram Project, it is hard to believe that every math class in this country is not an amazing place to learn. One reason they are not is that, after teachers have finished teaching, tutoring students after school, grading papers and preparing for the next day’s lesson, they just don’t have time. In the summer, far too many are painting houses, teaching summer school or working other second jobs just to make ends meet.
I really do think one solution for teachers, just like for Julia, is providing time and silence. If we paid our teachers for those two months in the summer to come in and work on making their mathematics classes better, I wonder how our schools would change for the better.
Yes, this is a little-known fact, but I discovered quadratic equations back in the 1970s when I was in high school. Well, no, I was not the first person to discover them, but it happened like this…
My math teacher was a conscientious objector (the Vietnam War was going on), and his alternative service was to teach mathematics at our high school, which gives you some idea of the type of class we were: his only worse option was to go to a foreign country on the other side of the world and get shot at. One day, he walked into class, drew a picture of Mickey Mouse on the board and said:
“Figure out the equations to tell a computer how to draw this. I am going to the teachers’ lounge to have a cup of coffee.”
And he left.
Since we (like you) did not get credit for the course unless we actually finished the work, we were all pretty motivated to get the answer. Plus (don’t you dare laugh) none of us had ever actually gotten to touch a computer. You see, back then, computers were these hugely expensive things that took up an entire room. So, the idea that we could actually write instructions to one was kind of cool.
The picture, or something pretty close to it, is reproduced right here. I solved it using probably a lot more quadratic equations than I should have, because my picture (or, actually, the one the computer printed) did not come out looking so much like this, but it was recognizable as Mickey Mouse.
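If you want to try the assignment yourself, stop reading here. Otherwise, here is one minimal way to start, as a Python sketch with made-up coordinates: three circles, each a quadratic relation (x - h)^2 + (y - k)^2 ≤ r^2, one for the head and one for each ear. The x values step by 0.5 because printed characters are roughly twice as tall as they are wide; a respectable Mickey would need more equations for the face.

```python
def inside_circle(x, y, cx, cy, r):
    """True when (x, y) satisfies (x - cx)^2 + (y - cy)^2 <= r^2."""
    return (x - cx) ** 2 + (y - cy) ** 2 <= r ** 2

# Made-up layout: (center x, center y, radius) for the head and two ears.
CIRCLES = [(0, 0, 10), (-11, 10, 6), (11, 10, 6)]

def draw_mickey():
    lines = []
    for y in range(16, -11, -1):          # top of the ears down to the chin
        row = ""
        for i in range(-36, 37):          # x from -18 to 18 in steps of 0.5
            x = i / 2
            row += "#" if any(inside_circle(x, y, *c) for c in CIRCLES) else " "
        lines.append(row.rstrip())
    return "\n".join(lines)

print(draw_mickey())
```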
Sam has already taken a shot at it and published a comment on The Julia Group forum. You can take a look at it, add to what she said, or just start on your own without anyone else’s opinion.
I tried to find a podcast or video for this lesson, but all I found were the same old boring-as-watching-paint-dry things with a person talking in a voice like the teacher from the old Charlie Brown videos, and slow PowerPoint presentations with equations against a blue background. The use of color was the only thing that let me know it wasn’t done in the 1920s. If you find a decent video or mp3 file, please let me know!