For the first time in two years, an application for a technical position arrived in my email from a person under 30 who was an American citizen. This isn’t because I don’t look for people. I have talked to lots of young people I know who are pretty good with computers and asked if they would be interested in learning about statistical software. We would train them. Nope. They want to go to law school (lots of them) or get an MBA (lots of them), with the odd few who want to be teachers, journalists or artists.
Last night, I was reading a data mining book that had NO equations and I had one of those mental stumbling blocks, you know, like when you can’t remember the name of your youngest child? Well, that happens to ME all the time, anyway. I doubt it is due to all the drugs in college because I’ve always had that problem. [Not that I ever personally did any drugs, of course. I am referring to second-hand smoke.]
Out of the blue, for no reason, I was not 100% sure of the definition of the inverse of a matrix. So I asked my husband,
“Hey, the inverse of a matrix is the matrix you multiply it by to get the identity matrix, right?”
“Yes, but sometimes there is no matrix you can multiply by to get the identity matrix. Then the inverse is undefined. That usually doesn’t happen unless your variables are correlated.”
I guess he added the part after “Yes” just in case a whole section of my memory had been wiped out. Of course, the whole problem with multicollinearity in regression is obvious once you know this: if one of your predictors is perfectly correlated with (an exact linear combination of) the others, the matrix you need to invert has no inverse, so you cannot solve the normal equations to get your coefficients.
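To make that definition concrete, here is a minimal Python sketch (standard library only; the helper names `transpose`, `matmul`, `identity` and `inverse` are hypothetical, written just for illustration). It inverts a matrix by Gauss-Jordan elimination using exact fractions, checks that A times its inverse gives the identity matrix, and shows that when one made-up predictor is exactly twice another, the matrix X′X has no inverse.

```python
from fractions import Fraction

def transpose(m):
    """Swap rows and columns: an n x k matrix becomes k x n."""
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    """Multiply a (n x p) by b (p x k); columns of a must equal rows of b."""
    if len(a[0]) != len(b):
        raise ValueError("columns of the first matrix must equal rows of the second")
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def identity(n):
    """n x n matrix with ones on the diagonal and zeros elsewhere."""
    return [[Fraction(1) if i == j else Fraction(0) for j in range(n)] for i in range(n)]

def inverse(m):
    """Gauss-Jordan elimination on [m | I]; returns None if m is singular."""
    n = len(m)
    # Augment m with the identity, using exact fractions to avoid rounding error.
    aug = [[Fraction(x) for x in row] + id_row for row, id_row in zip(m, identity(n))]
    for col in range(n):
        # Look for a row at or below the diagonal with a nonzero pivot.
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            return None  # no pivot: the matrix is singular, so no inverse exists
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        # Clear this column in every other row.
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

# An invertible matrix: A times its inverse gives the identity matrix.
A = [[2, 1], [1, 1]]
A_inv = inverse(A)
print(matmul(A, A_inv) == identity(2))  # → True

# Hypothetical data: intercept column, x1, and x2 = 2 * x1 (perfectly correlated).
X = [[1, 1, 2], [1, 2, 4], [1, 3, 6], [1, 4, 8]]
XtX = matmul(transpose(X), X)
print(inverse(XtX))  # → None, because X'X cannot be inverted
```

The exact `Fraction` arithmetic is a deliberate choice for a teaching example: with floating point, a "singular" matrix often shows up as a tiny nonzero pivot rather than an exact zero.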
I sat in a graduate course today taught by a very knowledgeable professor, surrounded by graduate students at a selective university in a course they paid a lot of money to take. Several times, he said something like this:
“What is regression? You have some X’s and there is a black box and then you get a predicted Y.”
I am looking at his drawing on the board and thinking to myself, no, it is not a black box. When I looked at his black box, this is what I saw:

$$\mathbf{b} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y}$$

And I thought:
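A. You take the X matrix and transpose it. You know you need to transpose it because you can only multiply two matrices if the number of columns in the first equals the number of rows in the second. X has one row per observation and one column per variable, so X times X is undefined unless X happens to be square. You multiply the transposed matrix by X (the original matrix), which gives a square matrix with one row and one column per variable.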
B. You then take the inverse of the result from step A.
C. Then you multiply the inverse of the product of the transposed X matrix and the original X matrix by the transpose of X.
D. You multiply that by the Y vector,
and that gives you the vector of regression coefficients.
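Steps A through D can be sketched in Python as below. This is a minimal illustration under stated assumptions, not a production regression routine: it uses only the standard library and exact fractions, the helper names are hypothetical, and the data (x = 1, 2, 3, 4 and y = 2, 5, 6, 9) are made up. The first column of X is all ones so that the first coefficient is the intercept.

```python
from fractions import Fraction

def transpose(m):
    """Step A helper: an n x k matrix becomes k x n."""
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    """Multiply a (n x p) by b (p x k); needs columns of a == rows of b."""
    if len(a[0]) != len(b):
        raise ValueError("columns of the first matrix must equal rows of the second")
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def inverse(m):
    """Step B helper: Gauss-Jordan elimination with exact fractions."""
    n = len(m)
    # Augment m with the identity matrix.
    aug = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            raise ValueError("matrix is singular (e.g. perfectly collinear predictors)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

# Hypothetical data: a column of ones (intercept) and one predictor x.
X = [[1, 1], [1, 2], [1, 3], [1, 4]]
Y = [[2], [5], [6], [9]]  # Y as a column vector

Xt = transpose(X)                 # X' is 2 x 4
XtX = matmul(Xt, X)               # A. X'X is 2 x 2
XtX_inv = inverse(XtX)            # B. (X'X)^-1
XtX_inv_Xt = matmul(XtX_inv, Xt)  # C. (X'X)^-1 X' is 2 x 4
b = matmul(XtX_inv_Xt, Y)         # D. (X'X)^-1 X'Y is 2 x 1

print([float(v[0]) for v in b])  # → [0.0, 2.2]
```

Here the intercept comes out as 0 and the slope as 2.2, the same values you get from the familiar one-predictor formulas. In practice, statistical packages usually solve the normal equations with a decomposition such as QR or Cholesky rather than forming the inverse explicitly, because explicit inversion is less numerically stable.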
Here is a really good explanation of least squares estimates in matrix notation, by the way. Thanks to Pennsylvania State University.
I do not blame the professor at all for not saying any of that, because he has two problems with this course, neither of which has ANYTHING to do with his competence as a professor or with the ability of the students. I know because I have watched this problem grow and grow over the past 25 years.
1. We are cramming a ludicrous amount into courses with names like “research methods” or “data mining” or “statistics”. The poor soul teaching this one must cover data mining, data warehousing and business analytics in a single course. That is impossible. Because students are often working full-time while going to graduate school, and because schools have gotten more and more expensive, there is a lot of pressure to cut the number of courses. So, what used to be three courses is now one. When I learned multiple regression, it was a course all by itself. The normal equations, above, are not basic, but they are not incredibly difficult, either. Certainly the vast majority of graduate students could learn to transpose a matrix and multiply the result. When I was in graduate school, we had the luxury of spending an entire three-hour class just going over these equations, and even some of the next week’s class for students who had questions. When we put too much into a course, it is impossible to cover ANY of it in depth. I have seen the same problem in my children’s math textbooks from fifth grade on up. We wised up with the youngest one and now spend time at home making sure she understands not just the definitions and rules of, say, plane geometry, but also how she can apply them. We fool ourselves that we are rigorous by cramming 42 topics into one textbook, but all that happens is that people learn a little bit about a lot of things and a lot about nothing. I’m not joking here; I think this is why so many people want to go into management and “see the big picture” and will tell you, “I’m not a detail person”. Writing code that runs takes details, from something as simple as ending a statement with a semicolon to knowing the difference in SPSS between the syntax rules for batch processing and those for interactive mode. Details matter.
2. Again, because people want to “get out and get it over with”, we are requiring fewer and fewer prerequisites. Many colleges no longer require any mathematics beyond algebra, if that! As I said before, I think College Algebra is an oxymoron. You should have learned algebra in high school. Certainly, many students never learned matrix algebra. When I was in graduate school, the professor could write equations in matrix notation because we were supposed to have learned it as undergraduates, and the majority of us had. There was an entire course in descriptive statistics, and if you didn’t have it as an undergraduate, guess what, you had to take it. And if that meant you didn’t finish your graduate degree as soon as you would have liked, oh well. If you somehow hadn’t learned it, there was a teaching assistant you could go to for help understanding the class.
So... we don’t give our students the prerequisites at the lower level, and at the upper level we cram three times as much into a course as they could really hope to comprehend in that short a time. In the end, they don’t know very much about math, and they are convinced that they aren’t any good at it because they don’t have the talent and math is hard. The truth is that math isn’t all that hard. It just takes time, like anything else, and we have no idea whether they could be good at it if we gave them the time and really tried to teach it to them, starting with,
“The identity matrix has all ones in the diagonal and zeroes in the off-diagonal elements.”
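As a tiny illustration of that definition, the Python sketch below (standard library only; `identity` and `matmul` are hypothetical helper names, and the matrix A is made up) builds an identity matrix and checks that multiplying a matrix by it, on either side, leaves the matrix unchanged. That property is exactly why a matrix times its inverse is required to equal the identity.

```python
def identity(n):
    """n x n matrix with ones on the diagonal and zeros everywhere else."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

def matmul(a, b):
    """Multiply a (n x p) by b (p x k)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

A = [[2, 1, 0], [1, 3, 4], [5, 0, 6]]
I = identity(3)
print(I)                  # → [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
print(matmul(A, I) == A)  # → True
print(matmul(I, A) == A)  # → True
```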
Here is my modest proposal to fix all of this:
1. Teach LESS material in each math class, that is, fewer topics.
2. Require students to take MORE classes.
3. Do NOT let students waive or skip prerequisites unless they test out of them. (Do let students test out of classes, by the way. I always encourage that.)
4. Don’t write the mathematics out of courses. Leave it in. If you do #1 through #3, students WILL understand it.