In my copious spare time, of which I have none, I occasionally get the urge to actually read technical books from beginning to end.

I think my life took the path of most grown-ups in my field. You get a degree, or two or three or four. Perhaps during the course of that, but certainly at the end, you get what my mother refers to as “a real job”, which is a job outside of a university. In the course of this real job, they require you to do stuff – produce reports, answer questions, write research designs – whatever your real job happens to be. In producing these reports, answering questions and so on, you read PARTS OF the manual. The operative phrase in this sentence being “parts of”. You read the part that tells you how to obtain a Wald statistic using Stata – but you skip the part on what a Wald statistic actually is because you have a meeting at 2:30. You read the article on odds ratios in logistic regression but you skip the part on parallel processing for maximum likelihood methods because you have a report due tomorrow.

So, maybe you have been just skipping over very useful features in software and not having the time to notice. I am sure I must have mentioned this book before,
Programming and Data Management for IBM SPSS Statistics 18: A Guide for IBM SPSS Statistics and SAS© Users. It is very well-written and very free. One of the smartest things SPSS has done is make a ton of its documentation available for free, based, I think, on the reasonable notion that the better people can use its software the more likely they are to buy it. Also, as for the title, it should be noted that 90% of the book is how to use SPSS and the other 10% is how to use SPSS if you know SAS pretty well. I’ve actually found that section extremely useful.

Anyway, as for aggregate, which you might think I was going to discuss because it is in the title: Aggregate is an incredibly cool feature in SPSS that you may never have noticed. My friend works in an Emergency Room in a large city. She is quite concerned that some people are using the ER for primary care or even just for attention. One evening she said to a patient:

“You have a serious problem because I KNOW YOUR NAME! Do you know what the definition of the word ‘emergency’ is? No one should be in the emergency room so often that they and the staff are on a first-name basis!”

Let’s say you work in this ER. You have a database with client records and most clients come once, some of them come more than once. You’d like to attach a variable to each client that is “Number of Visits”. You could then do all kinds of analyses, say, pulling out all the patients with 10 or more visits this year and seeing how many visits that represents. Or, you might want to know how much total time these chronic emergencies take up.

Here is what you do:

Go to the DATA menu and select AGGREGATE

For BREAK VARIABLE select “ClientID” or whatever your variable is named.
Check the button next to NUMBER OF CASES. The default name is N_BREAK but I changed it to “Visits” because that was a lot more obvious.
Check the button next to “ADD AGGREGATED VARIABLES TO ACTIVE DATASET” .
Click OK.

Now I want to know how many total visits were from “chronic emergencies” and how many total minutes they took up in my ER. First, I select out these folks by

From the DATA menu choose
SELECT CASES
Click the button next to IF CONDITION IS SATISFIED
In the pop-up window, enter Visits > 9
Click Continue
Click OK

Go to ANALYZE
Then DESCRIPTIVE STATISTICS
Then select DESCRIPTIVES
Move Length of Visit under Variables
Click on the OPTIONS button
Click the button next to SUM
Click CONTINUE
Click OK

If you prefer syntax to pointing and clicking, here you go:

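* Count the visits for each client and attach the count to every record.
AGGREGATE
  /OUTFILE=* MODE=ADDVARIABLES
  /BREAK=Client
  /Visits=N.
* Keep only clients with 10 or more visits.
COMPUTE filter_$=(Visits > 9).
VARIABLE LABEL filter_$ 'Visits > 9 (FILTER)'.
VALUE LABELS filter_$ 0 'Not Selected' 1 'Selected'.
FORMAT filter_$ (f1.0).
FILTER BY filter_$.
* Summarize visit length, including the total, for the selected cases.
DESCRIPTIVES VARIABLES=Length_Of_Contact
  /STATISTICS=MEAN SUM STDDEV MIN MAX.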

There are plenty of other ways you could aggregate your data and add the counter to each record but this way is just so simple it is worth remembering.
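If you don't have SPSS handy, the same idea (count the records per client, attach that count to every record, then total up the chronic visitors) can be sketched in plain Python. This is only an illustration under my own assumptions: the records, the field names `client` and `minutes`, and the function names are all made up for the example, not taken from any real ER database.

```python
from collections import Counter

def add_visit_counts(records, key="client"):
    """Return copies of the records with a 'visits' count attached to each one."""
    counts = Counter(r[key] for r in records)
    return [{**r, "visits": counts[r[key]]} for r in records]

def chronic_summary(records, min_visits=10, length_key="minutes"):
    """Total visits and total minutes for clients with at least min_visits visits."""
    chronic = [r for r in records if r["visits"] >= min_visits]
    return len(chronic), sum(r[length_key] for r in chronic)

# Hypothetical data: client A came 12 times, B once, C twice.
records = (
    [{"client": "A", "minutes": 20} for _ in range(12)]
    + [{"client": "B", "minutes": 45}]
    + [{"client": "C", "minutes": 30} for _ in range(2)]
)

with_counts = add_visit_counts(records)
n_visits, total_minutes = chronic_summary(with_counts)
print(n_visits, total_minutes)  # 12 visits, 240 minutes from "chronic emergencies"
```

The first function plays the role of AGGREGATE with MODE=ADDVARIABLES, and the second does the filter plus the SUM from DESCRIPTIVES.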

TRUE CONFESSION: I hadn’t used aggregate in well over a year. Someone asked me how to do this and I was thinking of the LAG function and Proc summary using SAS and there was this dim memory that there was some other way to do it. So, I just started reading the data management book from page one. A lot of it I skipped over, of course. The second chapter, on programming tips and best practices, is either new or I had skipped it when I read the book originally. It was good enough to warrant mentioning randomly, which I just did. Anyway, some time after the second chapter I came across the mention of aggregate and it all came back to me.

I may have told this story before and forgotten it, so I am telling it again which is kind of the point. A few years ago, I was writing a proposal on increasing parental involvement in special education. I said to my incredibly helpful research assistant that someone must have published articles on this, I mean, it isn’t exactly an esoteric topic, so please run through a few databases of scientific articles and bring me the references. She came in less than an hour later, laughing, with a list of articles. She said,

“Yes, you’re right. Someone had done research on this. Four of the first twelve references to pop up were by someone named Rousey.”

This is funny because that was my name before I remarried and not only had someone done research on it but that someone was me! Come on, a couple of them were from 1990. Can you remember what YOU were doing in 1990? If you’re the age of a lot of people I meet at conferences and my helpful young research assistants you were probably drinking juice from a box in Ms. Campbell’s kindergarten class.

So, now you know one of the main reasons I write this blog. I’ll vaguely recall something about crontab or aggregate or a geometry column in Proc GMAP and remember that I used that a year or two ago. If I write a post about it, maybe it will be helpful to someone else, and, in 2012, when I need to do the same thing again, I can search my blog and well, what do you know!

For the first time in two years, an application came in my email for a technical position from a person under 30 who was an American citizen. This isn’t because I don’t look for people. I have talked to lots of young people I know who are pretty good with computers and asked if they would be interested in learning about statistical software. We would train them. Nope. They want to go to law school (lots of them), get an MBA (lots of them) with the odd few who want to be teachers, journalists or artists.

Last night, I was reading a data mining book that had NO equations and I had one of those mental stumbling blocks, you know, like when you can’t remember the name of your youngest child? Well, that happens to ME all the time, anyway. I doubt it is due to all the drugs in college because I’ve always had that problem. [Not that I ever personally did any drugs, of course. I am referring to second-hand smoke.]

Just out of the blue for no reason I was not 100% sure of the definition of an inverse of a matrix. So I asked my husband,

“Hey, the inverse of a matrix is the matrix you multiply it by to get the identity matrix, right?”

He answered,

“Yes, but sometimes there is no matrix you can multiply by to get the identity matrix. Then the inverse is undefined. That usually doesn’t happen unless your variables are correlated.”

I guess he added the part after “Yes”, just in case a whole section of my memory had been wiped out. Of course the whole problem with multicollinearity in regression is obvious if you know this: when one predictor is an exact linear combination of the others, you cannot invert the X′X matrix, so you cannot solve the normal equations to get your coefficients.
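Here is a tiny sketch of that point, using two made-up predictors and no intercept to keep the matrix 2 x 2. When the second column is exactly twice the first, the determinant of X′X is zero, which means there is no inverse. The data and function names are my own, chosen just for illustration.

```python
def xtx(rows):
    """Compute X'X for a two-column X given as a list of (x1, x2) rows."""
    s11 = sum(x1 * x1 for x1, _ in rows)
    s12 = sum(x1 * x2 for x1, x2 in rows)
    s22 = sum(x2 * x2 for _, x2 in rows)
    return [[s11, s12], [s12, s22]]

def det2(m):
    """Determinant of a 2 x 2 matrix; zero means the matrix has no inverse."""
    (a, b), (c, d) = m
    return a * d - b * c

collinear = [(1, 2), (2, 4), (3, 6)]    # second column = 2 * first column
independent = [(1, 2), (2, 1), (3, 5)]  # no exact linear relationship

print(det2(xtx(collinear)))    # 0: X'X cannot be inverted
print(det2(xtx(independent)))  # 59: X'X can be inverted
```

In real data the correlation is rarely perfect, so the determinant is tiny rather than exactly zero, and the software may still produce an answer, just a very unstable one.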

I sat in a graduate course today taught by a very knowledgeable professor, surrounded by graduate students at a selective university in a course they paid a lot of money to take. Several times, he said something like this:

“What is regression? You have some X’s and there is a black box and then you get a predicted Y.”

I am looking at his drawing on the board and thinking to myself, no, it is not a black box. When I looked at his black box, this is what I saw:

[Image: the normal equations, b = (X′X)⁻¹X′Y]

And I thought

A. You take the X matrix and transpose it. You know you need to transpose it because you can only multiply two matrices if the number of columns in the first equals the number of rows in the second. You multiply that (the transposed matrix) by X (the original matrix).

B. You then take the inverse of the result from step A.

C. Then you multiply the inverse of the product of the transposed X matrix and the original X matrix by the transpose of X.
D. You multiply that by the Y vector

and that gives you the vector of regression coefficients.
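For anyone who wants to open the black box themselves, steps A through D can be sketched in plain Python. This is a minimal illustration and not how any statistical package actually does it (real software uses more numerically stable methods). The helper names and the toy data are my own assumptions, and I use exact fractions so the arithmetic is easy to check by hand.

```python
from fractions import Fraction

def transpose(m):
    """Swap rows and columns."""
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    """Multiply a by b; columns of a must equal rows of b."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def inverse(m):
    """Invert a square matrix by Gauss-Jordan elimination; raise if singular."""
    n = len(m)
    # Build the augmented matrix [m | identity] with exact fractions.
    aug = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            raise ValueError("matrix is singular; no inverse exists")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [v - factor * pv for v, pv in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def ols(X, y):
    """Regression coefficients from the normal equations."""
    Xt = transpose(X)               # step A: transpose X ...
    XtX = matmul(Xt, X)             # ... and multiply it by X
    XtX_inv = inverse(XtX)          # step B: invert the result
    step_c = matmul(XtX_inv, Xt)    # step C: multiply by the transpose of X
    b = matmul(step_c, [[v] for v in y])  # step D: multiply by the Y vector
    return [row[0] for row in b]

# Toy data: a column of ones for the intercept, and y = 1 + 2x exactly.
X = [[1, 0], [1, 1], [1, 2], [1, 3]]
y = [1, 3, 5, 7]
coeffs = ols(X, y)
print([float(c) for c in coeffs])  # intercept 1.0, slope 2.0
```

Note that `inverse` raises an error when X′X is singular, which is exactly the multicollinearity problem.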

Here is a really good explanation of least squares estimates in matrix notation, by the way. Thanks to Pennsylvania State University.

I do not blame the professor at all for not saying any of that because he has two problems with this course, neither of which has ANYTHING to do with his competence as a professor or with the ability of the students. I know because I have experienced this problem growing and growing over the past 25 years.

1. We are cramming a ludicrous amount into courses with names like “research methods” or “data mining” or “statistics”. The poor soul teaching this course must cover data mining, data warehousing and business analytics in one course. That is impossible. Because students are often working full-time while going to graduate school and because schools have gotten more and more expensive, there is a lot of pressure to cut the number of courses. So, what used to be three courses is now one. When I learned multiple regression, it was a course all by itself. The normal equations, above, are not basic but not incredibly difficult, either. Certainly the vast majority of graduate students could learn to transpose a matrix and multiply the result. When I was in graduate school we had the luxury of spending an entire three-hour class just going over these equations and even some of the next week’s class for students who had questions. When we put too much into a course it is impossible to cover ANY of it in-depth. I have seen the same problem in my children’s math textbooks from fifth grade on up. We wised up with the youngest one and now spend time at home making sure she understands not just the definitions and rules of, say, plane geometry, but also how she can apply those. We fool ourselves by saying we are rigorous by cramming 42 topics into one textbook but all that happens is that people learn a little bit about a lot of things and a lot about nothing. I’m not joking here, I think this is why so many people want to go into management and “See the big picture” and will tell you, “I’m not a detail person”. Writing code that runs – that takes details, something as simple as ending a statement with a semi-colon, or knowing the difference in SPSS between the rules for batch processing versus interactive. Details matter.

2. Again, because people want to “get out and get it over with” we are requiring fewer and fewer prerequisites. Many colleges no longer require any mathematics beyond algebra – if that! As I said before, I think College Algebra is an oxymoron. You should have learned algebra in high school. Certainly, many students never learned matrix algebra. When I was in graduate school, the professor could write equations in matrix notation because we were supposed to have learned it as undergraduates and the majority of us did. There was an entire course in descriptive statistics and if you didn’t have it as an undergraduate, guess what, you had to take it. And if it meant that you didn’t finish your graduate degree as soon as you would have liked, oh well. If you hadn’t learned it somehow, there was a teaching assistant and you went to him or her to help you understand the class.

So …. we don’t give our students the prerequisites at the lower level, and at the upper level we cram three times as much in a course as they could really hope to comprehend in that short a time. In the end, they don’t know very much about math and they are convinced that they aren’t any good at it because they don’t have the talent and math is hard. The truth is that math isn’t all that hard, it just takes time, like anything else, and we have no idea if they could be good at it if we gave them the time and really tried to teach it to them, starting with,

“The identity matrix has all ones in the diagonal and zeroes in the off-diagonal elements.”

Here is my modest proposal to fix all of this:
1. Have LESS material taught in each math class, that is, fewer topics.
2. Require MORE classes of students.
3. Do NOT let students waive or skip prerequisites unless they test out of them. (Do let students test out of classes, by the way. I always encourage that.)
4. Don’t write the mathematics out of courses. Leave it in. If you do #1-3, students WILL understand it.

I try very hard not to laugh at a student in class, no matter what he or she says. One day, I was talking about the Kinsey Report finding that, by their mid-twenties, one out of every four or five males has engaged in a homosexual experience to the point of orgasm. I pointed out that, given there were five males in the class in that exact age group, the odds were that one of them had had a homosexual experience. One student’s hand shot up, a starter on the college football team.

“Yes, you have a question?”

In the exact tone five-year-olds use when playing tag, he called out,

“I just wanted to say – NOT IT!”

Okay, I admit it, I laughed.

I cannot possibly be the only person annoyed by articles such as the one in the LA Times this week talking about the professional women who want to give up their jobs and be a “wingspouse”. WTF?

That term seems synonymous with rich housewife. To quote the article, this is the spouse who doesn’t deliver the message but who watches the room to see how the message is received. Lovely. If I have a choice, I want to be the person who comes up with the ideas and communicates them, not the one who sees if other people like them.

Also, money is good. You know those studies that show how the average housewife does $100,000+ in work each year? (Here is a funny blog about that, by the way, from a woman who calculated how much her dog was worth. And an even funnier one from a blog called ninepounddictator in response to an assertion that the author doesn’t spend time with her child because she has a nanny. Guess I’m not the only one questioning those numbers.) I’ve always tried to reconcile that with the $12 an hour I paid for a wonderful nanny. Even with the social security and other taxes it’s not within shouting distance of a professional salary. Having been left a single parent twice, once through divorce and once widowed, I am very happy for me AND my kids that I have my own salary, insurance and 401(k).

What about those women who say it’s difficult to work, be married and care for children? Well, to quote one of my adult daughters

Being a grown-up is hard.

This isn’t to say I haven’t made sacrifices in my career for my family. I think that is part of being a good mother AND a good father. I once attended a retirement party where a man was praised for never missing a day in 30 years and I thought

Is it really possible that there was not one day he needed to be with his family?

Still, I found it amusing that one of the bloggers cited in the article telling women how to be good wives is actually single, works full time and was raised by a single, working mother.

I am writing this on my iPhone watching my youngest daughter and her friends at the Santa Monica Pool. I love her and I like her friends. Seeing that she gets exercise and has good social experiences is important. She is a good kid and I feel lucky. Do I feel like this is a full life? Hell no! I am bored out of my mind and freezing my ass.

As soon as she is done I’m going to go home, make her a hot lunch and sit down at my computer to write a program I have been thinking about to solve a client’s problem of analyzing an enormous quantity of data they have collected.

My husband is home working on his own program while doing the laundry. (I mean, hey, you throw it in the washer & you have 45 minutes to work.)

No, I’ve never made a table centerpiece in my life, our idea of a dinner party is to invite people we like to The Lobster on Santa Monica Pier at sunset and I rented all my kids’ Halloween costumes from Ursula’s Costumes in Venice.

Quite contrary to the LA Times article, I intend to work MORE in the future, not less.

Recently, I have been approached about additional work that would require more hours and more travel. I asked my sixth grader her opinion and she replied

Well, that’s kind of what you do anyway isn’t it? I think you should do it if you want to.

Followed by

If you’re making more money can I get an iPhone?

She IS in 6th grade, after all.

Most interesting to me was that this LA Times article cites a “Pew Research Center study from October that found only 37% of mothers working outside the home want to be working full-time.” So, nearly two-thirds of working mothers DON’T want to be working full-time? The really interesting thing about this survey is that when I went to the Pew Research Center site and looked at surveys in all of 2009 (October and other months), there was no such survey listed. A search of the site on “working mothers” didn’t turn up this survey, either.

I am very disappointed in the LA Times, which I normally really like. This was the same paper that also this week had a headline saying children in day care engage in more risk-taking behavior. However, if you actually read their article the researchers stated that the results were really that teens who had been in day care for long hours as toddlers answered, on the average, yes to one more item on a 30-item questionnaire than teens who had not been in day care. There was no discussion of how valid this questionnaire was, whether it actually related to behavior (people do lie on questionnaires or interpret questions differently). There was also no discussion of actual items on which children who were in day care might differ. It is very likely that “Questioning authority figures” could be considered risk-taking or rebellious behavior. Personally, I’m okay if my children score higher on that. On the other hand, if the item was “Holding up liquor stores”, then I’m not so complacent.

Let’s assume there really was a study done by the Pew Research Center and let’s assume that they really did find that X% of working mothers didn’t want to work full time. Just let me add here –

NOT IT!

On my way back from Tunisia via Paris I ended up in a redneck dive bar somewhere in Georgia reading the New York Times on my Kindle while the lady next to me asked the very drunk waitress if she knew who had won at NASCAR this weekend.

This sounds like the beginning of a joke, but it isn’t.

Yes, it is your fault, Delta Airlines and U.S. Border Patrol, if you’re listening, which I am sure you are not.

The first concern came when I looked at my ticket and saw that I had an hour and twenty minutes between arriving from Paris and leaving for Los Angeles. This did not go well. The passport control computer had some problem which resulted in hundreds of people being stuck waiting for an hour or more to get their passports checked. By the time we got through, we had all missed our flights and the same hundreds of people were sent to Delta Airlines which very unsympathetically said it was not their fault that people missed their flights because it was the federal computers and we all needed to pay for our own hotel rooms and fly out in the morning.

Why is it that we are eager to invest in mortgages, securities and more, with companies assuring us that they can predict the future well, but then other companies, with a lot fewer unknowns, swear that they cannot predict problems and “It is not our fault”?

Being the good statistician I am, I started asking people who had missed their flights and were getting re-booked (in line, on the shuttle to the hotel and at the hotel) how long their layover between flights had been. Every one of those people had a layover between one hour and ninety minutes. No one who had a two hour or longer layover was in the very large group of people who missed their flights.

You know how the airport tells you to show up two hours early for international flights? That’s a good idea. You should do that. I have never missed a flight when I came two hours early, although I have sometimes made the plane by just ten minutes or less.

If Delta had a rule in their computer system that did not allow passengers to have flights closer than two hours together when making an international connection, problems like those that happened today would occur far more rarely. They could even have a manual override on that so if you chose to cut it closer, on your head be it. The extraordinarily UNhelpful customer service person at Delta said to me,

“But it was not our fault. Why should Delta pay for your hotel room? If everything had gone right, if there hadn’t been any computer error, then all of these people would have made their flights.”

That is probably true, however, a system that allows for no margin of error, that assumes “everything will go right” is a bad system and it IS their fault. Of course, if Delta implemented my system, they would sell fewer flights. The problem currently is that people like me assumed Delta personnel knew what they were doing, especially given that Atlanta is their hub, and that if, even though the norm is two hours for international flights, they allowed an hour, they must have some knowledge we didn’t.

In fact, it appears that they were willing to accept the risk that hundreds of passengers would miss flights because hey, it didn’t cost them anything but a little aggravation.
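The rule I have in mind is not complicated. Here is a rough sketch of it, with the caveat that I have no idea how Delta's booking system is actually built. The function name, the two-hour constant and the override flag are all my own assumptions for the sake of illustration.

```python
from datetime import datetime, timedelta

# Assumed minimum layover for an international connection (my suggestion, not an airline rule).
MIN_INTERNATIONAL_LAYOVER = timedelta(hours=2)

def connection_allowed(arrival, departure, international, override=False):
    """Decide whether a connecting itinerary may be booked.

    International connections shorter than the minimum layover are refused
    unless the passenger explicitly overrides the rule (on their head be it).
    """
    layover = departure - arrival
    if layover < timedelta(0):
        return False  # the connecting flight leaves before the first one lands
    if international and layover < MIN_INTERNATIONAL_LAYOVER:
        return override
    return True

# Hypothetical itinerary with an hour and twenty minutes between flights.
arrival = datetime(2010, 1, 1, 14, 0)
tight = datetime(2010, 1, 1, 15, 20)
roomy = datetime(2010, 1, 1, 16, 30)

print(connection_allowed(arrival, tight, international=True))                 # False
print(connection_allowed(arrival, tight, international=True, override=True))  # True
print(connection_allowed(arrival, roomy, international=True))                 # True
```

The override matters because it moves the risk to the person who chose to take it, instead of quietly handing it to everyone.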

I started out my career decades ago as an industrial engineer. Every industrial engineer knows there are two types of hours, standard hours and actual hours. A standard hour is how long it takes to make a widget. You do a time study and figure that it takes five minutes to weld it, an hour for it to be in the cooling area, and another 15 minutes to sand the edges. So, the part takes an hour and twenty minutes in standard hours. Only a complete moron would base their factory schedule or any other plans on that. You see, sometimes, the machine breaks. You run out of parts. The guy who is supposed to be doing the welding is in the bathroom for 10 minutes or out sick for the day. Often the actual hours it takes to get a part done, allowing for machine downtime, operator sick days, parts shortages, and other problems is about double the standard hours.
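To put numbers on the widget example: the standard time is just the sum of the steps, and the actual time is the standard time divided by how much of the clock you really get to use. The 50% figure below is only a stand-in for "about double" and the function names are mine; a real plant would measure its own downtime.

```python
def standard_minutes(steps):
    """Standard time: the sum of the time-study figures for each step."""
    return sum(steps.values())

def actual_minutes(standard, efficiency):
    """Actual time once downtime, sick days and shortages eat into the clock."""
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be between 0 (exclusive) and 1")
    return standard / efficiency

# The widget from the time study: weld, cool, sand.
steps = {"weld": 5, "cool": 60, "sand": 15}

std = standard_minutes(steps)
act = actual_minutes(std, efficiency=0.5)  # assumed: only half the time is productive
print(std, act)  # 80 minutes standard, 160.0 minutes actual
```

Schedule on the 80 and you will miss your dates about as reliably as those passengers missed their flights.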

What about Passport Control? They were the second group that said,
“Hey, it’s not our fault, it’s the computer.”

That has got to be the biggest bullshit excuse on earth when used by anybody. I don’t mean it was the fault of the poor guy at the customs booth, but I definitely think if your computer doesn’t work it IS your organization’s fault. You chose to save money by not having a back-up system, by not having enough IT people on staff, by not paying your programmers well enough, and so you didn’t have a system in place to anticipate this.

If the problem is that there was a failure in accessing the passport database, you can’t tell me there isn’t a back up of that database made nightly. No one thought of having a system where you can switch to the back up?

I can’t say what the specific problem with the computer in U.S. Customs was, but I am skeptical that the problem was unforeseeable and unsolvable, just based on my own decades of experience with computer systems. I find it more likely that there was a decision made somewhere that weighed the costs of possible failure against the costs of back-ups and alternative systems. Since the costs of, e.g., lost revenue from flights or paying extra programmers would be borne by the company/agency and the costs of staying overnight, missing flights, etc. are borne by consumers, there is an incentive to short-change on customer service.

Computers aren’t delivered by God and programmed by archangels. Organizations make choices in the programs they use, back-ups they purchase and people they hire. If you cut corners to the extent there is no allowance for error then, yes, it IS your fault.

Ronda and the camels


Normally, walking along the beach in the morning with my daughter, I do not expect a random person to come up to us with the question,
“Would you like to ride my camel?”

However, I was not taken nearly as much by surprise as my daughter because I had been to this same beach twice already and although it was deserted both times and, I have to admit, generally much cleaner than the beach at home, there were a couple of piles of, well, feces. While distinguishing types of animal feces is not a skill frequently called for in statisticians in Los Angeles, I was pretty certain these weren’t from any small domestic animals or drunken tourists.

I knew there had to be a producer of large feces around here somewhere

So, when the gentleman walked up leading a family of camels, that explained a lot.

After a few hours of camel-riding and sunbathing, I was bored, so I came back up to my hotel room to work on my second paper for WUSS. I already submitted a paper on statistics with Enterprise Guide but I wanted to write something on data visualization, just because, and I figured having a deadline would force me to make some progress.

Now, I knew this was around here somewhere …

Creating a bar graph in Enterprise Guide with bar height = means of a second variable

I usually use TASKS > GRAPH > BAR CHART to create a bar chart and I had yet to spot how to create a bar chart which shows the average of one variable for each value of a second variable. In this case, I wanted to see what is the average income for respondents based on the percentage of African-Americans in their neighborhood.

My original reason for using this was to create a bad example and show that you should NOT have 100 categories. As you will see, it did not work out as expected. In fact, it so did not work out as expected that I tried again with percent African-American residents rounded to the nearest 10% because I wanted to look at these data again.

I was sure there had to be a way to create a bar chart by means, and when I had plenty of time to look for it, I found two. In the BAR CHART task, select your column to chart, then under “sum of” select the variable for which you want the means. Next, click the ADVANCED option for the bar chart task. You’ll see an option for “Statistic used to calculate bar”. From the drop-down menu, select average.
[You can also use the bar chart wizard. In step 2, select a variable from the drop-down menu next to bar height. Then click on the sum symbol (the thing that looks like a deformed E) and a window will pop up that lets you select average as the statistic.]

So, I get the chart below and I know it is not supposed to be like that.

Average Income by Neighborhood Percentage African-American


As can be seen from this graph, there is a curvilinear relationship between the percentage of African-American residents in a neighborhood and income (measured on a scale from 1 = < 30K a year to 8 = > 250K a year).

While this may be true, I don’t think it is. My first thought is that there are probably a small number of respondents who came from neighborhoods that are 70-100% African-American because this was a random sample of around 1,100 people and there aren’t that many completely segregated neighborhoods in the country.

I take a look at a pie chart

Pie Chart of % African-American in Neighborhood


and it confirms my suspicions – those bars to the right which are forming that curvilinear pattern are based on a very small sample. All of those bars from 40% on up, COMBINED, comprise less than 7.5% of the total sample.
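The lesson I take from this is to never look at a bar chart of means without also looking at how many people are behind each bar. Outside of Enterprise Guide, that check can be sketched like this. The data here are invented purely to show the mechanics (they are not my survey data), and the function name and the cutoff of 3 are arbitrary choices of mine.

```python
from collections import defaultdict

def mean_and_n_by_group(rows):
    """Return {group: (mean, n)} for a list of (group, value) pairs."""
    groups = defaultdict(list)
    for group, value in rows:
        groups[group].append(value)
    return {g: (sum(v) / len(v), len(v)) for g, v in sorted(groups.items())}

# Made-up rows: (percent African-American rounded to nearest 10, income category 1-8).
rows = [(0, 4), (0, 5), (0, 3), (0, 4), (10, 3), (10, 4), (10, 5), (90, 8)]

summary = mean_and_n_by_group(rows)
for pct, (mean, n) in summary.items():
    flag = "  <- based on very few respondents" if n < 3 else ""
    print(f"{pct:>3}%  mean={mean:.1f}  n={n}{flag}")
```

In this toy example the 90% bar would tower over the others, and it rests on exactly one person.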

I have major commitments today – going to the beach, eating breakfast and watching my daughter at training camp, which is the reason we are here in Tunisia.

I am going to look at this more later. I actually did a lot more last night and that is the part that troubles me a bit.

I really looked into this because the results were unexpected. I KNOW I should always examine every aspect of the data carefully, but the truth is, I know that I do more testing, more exploration when the results are not what I expected to find. I wonder to what extent we all do this and how much that contributes to us confirming what we already expected to find, because when we do, we don’t keep looking for other explanations.

I had the pleasure of attending a lecture Rand Wilcox gave on the state of research. He was far more amusing than I expected from a statistician (perhaps this reflects low self-esteem on my part). He made the very valid point that all statisticians learn in the infancy of their careers that the general linear model makes certain assumptions, like normal distribution, measurement without error (give me a break!), homoscedasticity. In fact, there is a very well-written summary in an electronic journal in an article entitled Four assumptions of Multiple Regression that Researchers Should Always Test. (The authors being far less given to snarky comments than I am, there was no parenthetical addition, “But you never do, do you?”). That was one of Wilcox’s points, that SO often analyses are conducted by people who never test the simplest assumptions.

My favorite comment, though, was

“Anyone who thinks they know all of statistics is certifiably insane.”

This is becoming more and more true. I remember chugging along happily in graduate school doing two- and three-way ANOVAs. Then, all of a sudden, if you were going to do an ANOVA with say, ten schools and compare the impact of whole language versus phonics, you had to do a mixed model and specify curriculum type as a fixed effect and school as a random effect. If you did a regular two-way Analysis of Variance it was WRONG (beatings with bamboo sticks for you.) If you switched from a two-way fixed effects model in this case to a mixed model it was more correct. However, did your results turn out dramatically different? Well, actually, no. Slightly different.

Over the years, I have seen the number of statistical software procedures grow dramatically, from those written by Stata users to SPSS add-ons to whole categories of SAS procedures, e.g. Bayesian. What I have NOT seen is a practical increase in the usefulness of our predictions.

From terrorist attacks to volcano eruptions to financial market crises to mortgage prices to unemployment rates, our predictions are so-so in the short-term (as they often amount to no more than – pretty much like now since all predictors are pretty much like now) and not very helpful at all in the long-run and most effective when viewed in reverse. For example, Mashable tells me that if, instead of paying $3,000 for a G4 Powerbook back in 2002, I had invested it in Apple stock, I would now have $94,000. Am I the only one who is thinking,

“This prediction would have been a lot more useful in 2002?”

Another thing I have NOT noticed is more understanding of statistics by the general public (unless the words “more” and “understanding” mean the exact opposite of what I think they mean). This commentary by Bill Maher uses the tea partiers as an example, but would apply to just about any group in America (and a whole lot of the rest of the world, too). Maher points out the complete impossibility of cutting taxes, maintaining services and reducing the deficit all at the same time. He notes that Americans want to cut spending rather than increase taxes but when asked what they want to cut spending on, their usual answer is “Nothing”.

Let’s talk about cutting federal spending; 14% of the budget goes to Medicare, 20% to Social Security and 7% to veterans and federal retirees – to be blunt, 41% of the budget is going to old people, of which we are getting more as the population as a whole ages. Another 6% is going to interest payments on our national debt, which we can’t exactly decide not to pay and another 20% is defense spending. So, now we are up to 67%, or two-thirds of the budget. (These statistics are from the Center on Budget and Policy Priorities. ) And yet, every time I turn on the radio or television, I am bombarded with commentators telling me that the problem is with government “pork”, welfare, that we need smaller government. And yet, again, those same people are not arguing that we should decrease social security, Medicare or defense spending. Just how much do they think government spending can be reduced by cutting the 2% that is spent on scientific and medical research? (Answer: At most, 2%. It wasn’t a trick question.)

As statisticians, we are getting better and better at impressing each other with how smart we are. Maybe we are even getting better at impressing the general public, when they think about us at all.

Many years ago, I decided that my role as a teacher was not to leave the class impressed with how much _I_ know but knowing and understanding more themselves.

I’m not sure we’ve made progress in that direction.