# How to compute a standard deviation and control chart when you don’t have raw data

Filed Under Software, statistics, Technology | 1 Comment

It ought to be easier than this and perhaps I could have found an easier way if I had more patience than the average ant or very young infant. However, I don’t.
Here was the problem. I wanted control charts for two different variables, satisfaction with care, surveyed at discharge, and satisfaction with care 3 months after discharge.
The data was given in the form of the number of patients out of a sample of 500 who reported being unsatisfied. PROC SHEWHART does not have a WEIGHT statement. You could try using the WEIGHT statement in PROC MEANS but that won’t work. It will give you the correct means if you have the number unsatisfied (undisc = 1)  and the number satisfied (undisc =0) out of 500, but the incorrect standard deviation because the N will be 2, according to SAS.
So, here is what I did and it was not elegant but it did work.
1. I created two data sets, named q4disc and q4disc3, keeping the month of discharge and the number dissatisfied at discharge and dissatisfied 3 months later, respectively.
2. I read in the 3 values I was given, month of sample, number unsatisfied at discharge and number unsatisfied 3 months later.
3. Now, I am going to create a data set of raw data based on the numbers I have. First, in a do loop, for as many as people said they were unsatisfied, I set the value of undisc (unsatisfied at discharge) to to 1 and output a record to the q4disc dataset.
4. Next, in a do loop for 500- the number dissatisfied, I set undisc = 0 and output a record to the same dataset.
5. Now, repeat steps 3 & 4 to create a data set of the values of people unhappy 3 months after discharge.
6. Following the programming statements are the original data.

So, now, I have created two data sets of 6,000 records each with three variables. Doesn’t seem that efficient of a way to do it but now I have the data I need and it didn’t take long and doesn’t take up much space.

data q4disc (keep = undisc month) q4disc3 (keep = undisc3 month) ;
input month \$ discunwt disc3unwt ;
Do I = 1 to discunwt ;
undisc = 1 ;
output q4disc ;
end ;
Do J = 1 to (500-discunwt) ;
undisc = 0 ;
output q4disc;
end ;
Do k = 1 to disc3unwt ;
undisc3 = 0 ;
output q4disc3 ;
End ;
Do x = 1 to (500 -disc3unwt) ;
undisc3 = 1 ;
output q4disc3;
end;
datalines ;
JAN 24 17
FEB 44 24
MAR 36 15
APR 18 8
MAY 16 11
JUN 19 7
JUL 17 11
AUG 18 9
SEP 27 10
OCT 26 15
NOV 29 12
DEC 26 11
;
RUN ;
proc shewhart data=WORK.Q4disc;
xschart undisc * month /;
run;
According to SAS

“The XSCHART statement creates and charts for subgroup means and standard deviations, which are used to analyze the central tendency and variability of a process.”

For the three months after discharge variable, just do another PROC SHEWHART with q4disc3 as the dataset and undisc3 as the measurement variable.

OR , once you have the dataset created, you can get the chart using SAS Studio by selecting the CONTROL CHARTS task

Either way will give you this result:

Support my day job AND get smarter. Buy Fish Lake for Mac or Windows. Brush up on math skills and canoe the rapids.

For random advice from me and my lovely children, subscribe to our youtube channel 7GenGames TV

# Practical Advice to American Airlines on How Not to Suck

Let me begin by acknowledging that anyone who can afford to fly to Trinidad and Tobago to learn more about the culture and ends up pitching a reality show is in the extreme of privileged and fortunate people on this planet.

I also realize that anyone who travels as much as I do is bound to have some bad experiences as the law of averages catches up with her.

This is not one of those rants about how Airline X sucks and I will never fly them again. First of all, I fly American a lot and they are usually quite good. Secondly, you are usually lying when you say that because in many cases, you have no choice. If I’m flying into Devils Lake, North Dakota, there is one airline and that’s it.

HOWEVER, since we are negotiating for said reality show to be filmed in Tobago, it’s likely that my family, my staff and will be flying into Port of Spain on American Airlines a lot in the near future, I want to offer them some specific advice for how not to suck.

1. Don’t lie. I don’t know when it became your corporate policy to lie, but I highly recommend you stop it. I even suspect lying is making you some money in the short-term. Stop it anyway. Two different American Airlines employees told people they would have to sleep on cots set up in an auditorium because there were no hotel rooms. They gave excuses like there were a lot of festivities in the city. They said they had been trying all day and there were just no hotel rooms. Several people in front of me went off to the auditorium unhappily. I said, “I don’t believe it. There is not one hotel room in the city of Miami? There is no fucking way I am sleeping on a cot. I don’t believe that it is legal for an airline to have your flight delayed by hours and then tell you to sleep on the floor with no compensation.” After I pushed the issue for quite a while, someone finally admitted that, yes, I could get a hotel, pay for it myself and get reimbursed through customer service. After I questioned repeatedly the truth of every hotel room in Miami being booked, they finally admitted that no, it was only the hotels with which American had contracts that were booked. It took me 30 seconds to find a nice hotel 4 miles away using the Travelocity app on my phone. Immediately, 2 other people in line did the same thing. Now, I’m sure that it saved American a lot of money that people slept on cots instead of nice hotel rooms that were not deeply discounted to American. Still, don’t lie to people. That’s bad.
2. Don’t waste people’s time. I don’t know at what point American ran out of those discounted hotel rooms but there were a lot of people in front of me and behind me getting the same story. We waited in line for over an hour. If they write slow, it would have taken 145 seconds for an American Airlines employee with a marker and a piece of cardboard to put up a sign saying: Out of rooms at contract hotels. Your choices are a) Wait in line for food vouchers and sleep on a cot in the auditorium or b) Get your own room and mail in receipts for reimbursement. Hell, they could have made an announcement. Airlines announce something every 3.3 seconds anyway. All the people who wanted to get their own room could have left at that point and not wasted their time. The people who did want to go for the cot could have saved waiting behind all those other people.
3. Take responsibility. It was 100% the airlines fault that we missed our flights. The flight from Miami to Port of Spain was hours late, so the flight back was hours late. Why did it become MY responsibility to find a hotel room, pay for the hotel room and write American Airlines to get paid back? Why did I have to wait in line for 2 hours in Trinidad to get my ticket re-booked and another hour in Miami to be told to sleep on a cot? Call in extra employees to work. YOU fucked up. Why should your customers who pay you have to wait in line for a total of 3-4 hours? That’s not right. Call in more staff to handle customers. Figure out how to pay for people’s hotels. Give out visa gift cards. You can buy them at fucking Wal-Mart, for God’s sake. As the gentleman in front of me said about himself, “I have money and I’m going to go stay at an expensive hotel, no thanks to you people. What about all of these other people who can barely afford to travel – you have older people, families with small children – they shouldn’t be sleeping in a gym. That’s not right.”
4. Don’t get self-righteous when your passengers are angry because you have lied to them, wasted their time and failed to take responsibility. When the American employee told me we would have to sleep on a cot in an auditorium, I told him, “There is no fucking way I’m sleeping in an auditorium. Are you fucking kidding me? I KNOW that the airline can’t have passengers delayed overnight and just say that’s too bad and I fucking GUARANTEE you that there are hotel rooms in Miami.” He told me not to swear and he threatened call security on the gentleman in front of me. If your job is to deny responsibility, waste people’s time and lie to them, don’t be surprised when they get mad at you.

On the other hand Travelocity and your app, you rock. Hotel Colonnade in Coral Gables, we will be back for more margaritas and we loved the family loft.

And American, you can do better. I’m counting on you. We have a lot of travel coming up for that reality show and me launching myself at the next lying bastard would make good reality TV but probably not look to good on my permanent record.

Oh, and by the way, I was supposed to get in last night and I’m still sitting on the plane waiting for a gate at LAX.

# Charts with CDC Data- A step by step example

Perhaps you have watched the Socrata videos on how to do data visualization with government data sets and it is still not working for you. Here is a step by step example of answering a simple question.

### Is the prevalence of alcohol use among youth higher in rural states than urban ones?

You can click on a link below to go directly to that step.

Second, I selected Chronic Disease Indicators for my health area of interest.

Third, I selected ALCOHOL – which brought me to the screen showing all the columns of data and a bunch of choices.

Fourth, I clicked  on FILTER on the right of the screen and then select a column to filter by.

Chrome did not give me a scroll bar so the furthest option I could get was Topic. I switched over to Firefox and was able to get this menu where I selected Question and Alcohol use among youth. You have to type in the value that you want. Make sure it is spelled exactly the same as in the data set.

Fifth, since I wanted to compare urban and rural states, I clicked Add a New Filter Condition and then selected California, New York, North Dakota and Wyoming with LocationDesc as the filter condition. Make sure the box next to each condition you want is clicked on.

Sixth, I looked at my data, saw there was no data for California and I was sad. Not every state participates in every data set.

7. So, I decided to compare urban, eastern states wth rural midwest/west and I selected New York, New Jersey, Massachusetts, Wyoming, North Dakota and Montana  All had data so I was good to go.

In case you were wondering,  I based my choice on the listing of states by urbanization , New Jersey is #2, MA 5 and NY 13
On the other extreme, Wyoming is 39, North Dakota is 42,  and Montana 47 so I thought this was a pretty good split.

8. I clicked on visualize on the right, selected Column as the type of chart, Location Desc as the label data, DataValueAlt as the data value, and there was my chart

Note: I could not select DataValue. My guess is that was a string variable. I had to select DataValuealt, which was the exact same value

9.Just to make it more obvious, I went in and sorted on data value, which caused the chart to be recreated automatically.

You can see below the chart it created. It’s pretty clear that in these data there is no relationship between urbanization and alcohol use among youth.

New York and New Jersey where the lowest and highest prevalence, respectively. I was hoping to see a pattern with more rural states higher, but it seemed to be pretty unrelated.

Perhaps you would prefer to download the data set for import into some other tool, say, Excel or SAS. The first three steps are the same, into you find the data set you want.

This next step is not required , but the data sets can be pretty big, so I’d suggest filtering on at least one major variable first. For example, you can click the three rows next to any column, say, Question, and then select the question that interests you, say Binge Drinking.

Next, click the EXPORT button at the top right of the screen. Select the format in which you want your file to be downloaded. That’s it!

Have kids? Know anyone who has kids? Like kids? Own a computer? Fish Lake will teach fractions and Native American history, with no whining and all for under ten bucks.

# Excel for regression analysis: What a surprise!

I wouldn’t normally consider Excel for analysis, but there are four reasons I’ll be using it sometimes for the next class I’m teaching. First of all, we start out with some pretty basic statistics, I’m not even sure I’d call them statistics, and Excel is good for that kind of stuff. Second, Excel now has data analysis tools available for the Mac – years ago, that was not the case. Since my students may have Mac or Windows, I need something that works on both.  Third, many of the assignments in the course I will be teaching use small data sets – and this is real life. If you are at a clinic, you don’t have 300,000,000 records.Four, the number of functions and ease of use of functions in Excel has increased over the years.

For example,

TRANSPOSE AN ARRAY IN EXCEL

Select all of the data you want and select COPY

Click on the cell where you want the data copied and select PASTE SPECIAL from the edit menu. Click the bottom right button next to TRANSPOSE and click OK. Voila. Data transposed.

PERFORMING A REGRESSION ANALYSIS

Once you have your data in columns (and if it isn’t, see TRANSPOSE above), you just need to

1. Add the Analysis Pack. You only need to do this once and it should be available with Excel forever more.  To do that, go to TOOLS and select EXCEL ADD-INS. Then click the box next to Analysis ToolPak and click OK.
2.  Now, go to TOOLS, select DATA ANALYSIS and then pick REGRESSION ANALYSIS

You just need to select the range for the Y variables, probably one column, select the range for the X variables, probably a column adjacent to it, and click OK. You may also select confidence limits, fit plots, residuals and more.

So, yeah, for simple analyses, Excel can be super-simple.

Believe it or not, this is what I do for fun. In my day job, I make video games that teach math and social studies.

You can check out the games we make here.