Day eight of the 20-day blogging challenge was to write about a professional read – a book, article or blog post that has had an impact on me. To be truthful, I would have to say that the SAS documentation has had a profound impact on me. SAS documentation is extremely well-written (to be fair, so is SPSS’s), in contrast to most operating system documentation, which reads as if feces-flinging monkeys were somehow given words instead, flung them onto a page, and the page became a manual. But I digress – more than usual. It’s not reasonable to suggest that someone read the entire SAS documentation, which runs to several thousand pages by now. Instead, I’d recommend Jennifer Waller’s paper on arrays and do-loops. This isn’t the paper where I first learned about arrays – that was before pdf files, and I have met Jennifer Waller; she was probably barely in elementary school at the time. It’s a good paper, though, and if you are interested in arrays, you should check it out.

Here is what I did today, why and how. I wanted to score a dataset that had hundreds of student records. I had automatically received the raw score for each student, the percent correct and the answer they gave for each multiple-choice question. I wanted more than that. I wanted to know, for each question, whether or not they got it correct, so that I could do some item analyses, test reliability and create subtests. This is a reasonable thing for a teacher to want to know – did my students do worse on the regression questions, say, than the ones on probability, or vice versa? Do the data back up that the topics I think are the hardest are the ones that my students really score worst on? Of course, test reliability is something that would be useful to know; most teachers just assume it rather than actually assess it. So, that’s what I did and why. Here is how.
filename sample "my-directory/data2013.csv" ;
libname mydata "mydirectory" ;
data mydata.data2013 ;
infile sample firstobs = 2 dsd missover ;
input group_type $ idnum $ raw pct_correct qa qb qc q1-q70 ;
run ;

** These statements read in the raw data, which was an Excel file I had saved as a csv file ;
** The first line was the header and I forgot to delete it so I used FIRSTOBS = 2 ;
*** That way, I started reading at the actual data. ;
*** The dsd option specifies comma-delimited data and treats two consecutive commas as a missing value. dlm="," would have handled the delimiter part equally well ;
*** Missover tells SAS to set any remaining variables to missing if a line runs out of values, rather than jumping to the next line to keep reading ;

Data scored ;
set mydata.data2013 ;
array ans{70} q1-q70 ;
array correct{70} c1-c70 ;
array scored{70} sc1-sc70 ;

*** Here I created three arrays. One is the actual responses ;
*** The second array is the correct answer for each item ;
*** The third array is where I will put the scored right or wrong answers ;

if _N_ = 1 then do i = 1 to 70 ;
correct{i} = ans{i} ;
end ;

*** If it is the first record (the answer key) then c1-c70 will be set to whatever the value for the correct answer is ;

else do i = 1 to 70 ;
if ans{i} = correct{i} then scored{i} = 1 ;
else scored{i} = 0 ;
end ;

**** If it is NOT the first record, then if the answer = the correct answer from the key, it is 1, otherwise 0 ;

Retain c1-c70 ;

**** We want to retain the correct answers that were in the key for all of the records in the data set ;
**** Since we never put a new value in c1-c70 after the first record, they keep the correct answers from the key ;

raw_c = sum(of sc1-sc70) ;
*** This sums the scored items to get the raw score ;

pct_c = raw_c/70 ;
*** This gives a percentage score ;
run ;

proc means data=scored ;
var sc1-sc10 c1-c10 ;
run ;

*** This is just a spot check. Does the mean for the scored items fall between 0 and 1? Is the minimum 0 and the maximum 1? ;
*** The correct answers should have a standard deviation of 0 because every record should be the same ;
*** Also, the mean of each correct answer variable should be a whole number from 1 to 5, whatever the keyed answer was ;

proc corr data = scored ;
var raw_c pct_c raw pct_correct ;
run ;

*** Spot check 2. The raw score I calculated, the percent score I calculated ;
*** The original raw score and percent score, all should correlate 1.0 ;

data mydata.scored ;
set scored ;
if idnum ne "KEY" ;
drop c1-c70 q1-q70 ;
run ;

*** Here is where I save the data set I am going to analyze. I drop the answer key as a record. I also drop the 70 correct answer fields and the original answers, just keeping the scored items ;
proc corr alpha nocorr data= mydata.scored ;
var sc1-sc70 ;
run ;

*** Here is where I begin my analyses, starting with the Cronbach alpha value for internal consistency reliability ;
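Since I mentioned creating subtests, a minimal sketch of that next step might look like the lines below – assuming, purely for illustration, that items 1 through 10 were the probability questions and items 11 through 25 were the regression questions. Your actual groupings would come from your own test blueprint.

data subtests ;
set mydata.scored ;
prob_sub = sum(of sc1-sc10) ;   * hypothetical probability subtest score ;
regr_sub = sum(of sc11-sc25) ;  * hypothetical regression subtest score ;
run ;

proc means data = subtests mean std min max ;
var prob_sub regr_sub ;
run ;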

I want to point something out here, because it is where I think professional statisticians are perhaps distinguished from others: it’s second nature to check and verify. Even though this program should work perfectly – and it did – I threw in reality checks at a couple of different points. Maybe I spelled a variable name wrong, maybe there was a problem with data entry.

One thing I did NOT do was write over that original data. Should I decide I need to look at what the actual answers were – say, to see whether students were selecting chi-square instead of t-test (my hypothetical correct answer), which would alert me to some confusion – I still can.

Incidentally, for those who think that all of the time they save grading is taken up by entering individual scores, I would recommend having your students take tests on the computer if you possibly can. I was at a school today where we had a group of fourth graders taking two math tests using Google Chrome to access the test and type in answers. They had very little difficulty with it. I wrote the code for one of those tests, but the other was created using SurveyMonkey and it was super easy.

I’d love to include pictures or video of the kids in the computer lab but the school told me it was not allowed )-:

The question for Day 3 is:

“What is a website that you cannot live without? Tell about your favorite features and how you use it in your teaching and learning.”

The first part is easy. Oh my God, I love, love, LOVE stackoverflow, a site where all of your programming questions are answered. It’s free, and you don’t have to register. You can just go there and search for an answer to why your css is not properly aligning 5 pixels from the left margin of the container, or whatever is bothering you at the moment. Normally, when I type a question into Google, one of the first few hits will be on stackoverflow.com and I go read whatever it is. Even if my question isn’t answered, I’ll learn something, and I can usually search the site or look at the related topics in the sidebar and find what it is I was trying to learn.

I can’t really say that I use stackoverflow for teaching, except indirectly. One of The Julia Group companies, 7 Generation Games, makes games to teach kids math, and many of the problems I encounter are related to game development.

There are sites I use for teaching, and I was going to list more here, but I peeked ahead and saw this question comes up again in the 20-day challenge, so I’ll save those for later. There are a few other good sites, including a couple of blogs, that I like for statistics, SAS and SPSS, but to answer the first part of the question – what site, if I woke up tomorrow and it wasn’t there, would have me screaming NO-O-O-O!!! and searching for the nearest lake to drown myself in? Definitely stackoverflow.com.

 

[Photo: a lake for drowning in]

Today I’m on day two of the 20-day blogging challenge, the brainchild of Kelly Hines and a great way to find new, interesting bloggers. The day-two prompt was to share an organizational tip from your classroom, one thing that works for you.

The latest tool I’ve been using is LiveBinders. Remember when you were in college and had a binder full of notes, handouts from the professor, maybe even copies of tests to study for the final? Well, LiveBinders appears to be designed more for clipping websites and including media from the web, but personally I am using it to create binders for teaching statistics. I’ve just started with one, but I’m sure this will eventually split off into several binders.

I’m always writing notes to myself, but I have them everywhere – I used Google Notebook until they got rid of it, then Evernote, and I’ve got notepads on my laptop, desktop, iPad, phone and even paper notebooks around the place. I even have a PadsX program The Invisible Developer wrote years ago just for me (yes, he loves me).

Still, I’m thinking livebinders is going to be really useful for me to organize all of these notes into one spot.

Why do I want to do that, you might ask?

Well, statistics is a big field, and I have taught a lot of it, from advanced multivariate statistics to psychometrics to biostatistics and a lot of special topics courses. It seems to me that we often assume students have a solid grasp of certain concepts, such as variance or standardization, when I’m sure many of them do not. As I read books and articles, I’m trying to note what these assumptions are. My next step is to have pages in the binders where students can get greater explanation of, say, what a confidence interval really means. Right now, I feel that universities are trying to cut costs by combining information into fewer and fewer courses. We say that students learned Analysis of Variance in a course, but did they really? The basic statistics sequence I took in graduate school started with a descriptive statistics class (I tested out of that), which ended with a brief introduction to hypothesis testing and a discussion of z-scores, t-tests and correlation. The inferential statistics course reviewed hypothesis testing, t-tests and correlation, then focused on regression and ANOVA. The multivariate statistics course covered techniques like cluster analysis, canonical correlation and discriminant function analysis. Psychometric statistics covered factor analysis and various types of reliability and validity. These four courses were the BASICS, what everyone in graduate school took. (People like me who specialized in applied statistics took a bunch more classes on top of that.) Oh, yes, and each class came with a three-hour computer lab AFTER the three-hour lecture, to teach you enough programming so you could do the analyses yourself. Now, many textbooks try to include all of this in one course, which is just a joke and ends up with students concluding that they “are just not very good at math”.

I can’t change the curriculum, but what I at least can do is provide some type of resource so that every time a student feels he or she needs to back up and understand some concept, there is an explanation of it available.

I plan to have this done by the time I teach Data Mining in August.

Suggestions for what to include are welcome.

I don’t use AMOS for structural equation modeling all that often and every time I do I have to look up all of the steps again.

1. Install SPSS and AMOS. Fortunately, it seems to work on Windows 8. Yay! You can either open AMOS by double-clicking on it or you can open it directly from the ANALYZE menu in SPSS

2. Go to FILE > DATA FILES, click on FILENAME and then go to wherever the SPSS file is saved. If you haven’t opened the file from SPSS and want to look at it to be sure you have the right data, clicking on the View Data tab opens SPSS and the data file.

3. Click on the RECTANGLE (top left corner) and draw a box for each observed variable.

4. Double-click on each box to give it a variable name and label

5. Click on the single arrow to draw paths, the double arrow to draw covariances

6. Include another term for the error variance

7. Set the regression parameter of one of the paths to 1

8. Click on View > Analysis Properties and select Output. If you don’t do this, you won’t get much output and you will be disappointed. At a minimum here select standardized estimates, but you probably want squared multiple correlations and maybe some other stuff too.

9. Select Calculate Estimates

At this point, you may get the dreaded error … Path is not of a legal form.

[Screenshot: the "Path is not of a legal form" error]

10. Here is what you need to do – save your file. The AMOS manual says you should be prompted to save your file, but I wasn’t (neither on Windows 7 nor on Windows 8). However, saving the file solved the problem.

My assumption is that AMOS writes output to a path relative to where your AMOS file is saved and if you haven’t saved the file, it causes this error.

So, hurray, hurray, it runs and you are looking at the exact same model you were a minute ago. Where are the estimates?

 

[Screenshot: AMOS path diagram without estimates]

11. Click the SECOND button in the top middle pane and, presto change-o, your estimates appear on the path diagram. You can also select TEXT OUTPUT under the VIEW menu for some tables.

 

I’ll finish up this project and several months from now when I’m using AMOS again I’ll be glad I wrote this post.

 

 

I was reading the PowerPoints that came with a textbook – you know, in the instructor’s packet – and I was already thinking this book was a little too focused on computation over comprehension for my liking when I came to the following learning objective:

“Compute an Analysis of Variance by hand.”

Are you fucking kidding me? I have given this a lot of thought and I have come to the conclusion, “Just, no”.

You know why? Because this is the year 2013 and we have computers. Now, I’m not saying you cannot compute an ANOVA by hand if that makes you happy. I’m also not saying you should be like my friend from graduate school who answered the question on her comps

“What is the multiple R-squared and how do you get it?”

With

“The multiple R-squared is the square of the multiple R and the computer gives it to you.”

I can tease her about this now because she passed her exams the second time around and earned tenure over a decade ago. Contrary to what you think at moments like that, not only WILL you live it down, you will go on to laugh about it.

There will be those who say, “What if your computer doesn’t work?” In that case, I think I’d have more pressing issues on my mind, like getting my computer to work. For one thing, I’m going to assume that you are not just finding sums of squares due to your complete absence of a social life but rather are part of some organization that has an interest in sums of squares, and also probably has more than one piece of hardware. In my case, if one computer doesn’t work, I have two more in my office and four more upstairs. Of course, one each is currently occupied by The Spoiled One and The Invisible Developer, but I’m pretty certain if it came right down to it, I could wrestle a computer away from almost anyone in this group and that includes the dog. (She’s a Dogo Argentino, in case you wondered.)

[Photo: the lovely family]

Take tonight, for example. I am very, very annoyed because my class is using the SAS Web Editor and for some unknown reason the site has been down for the past 10 hours. Apparently, SAS has concluded that no one would ever do homework late at night or on weekends so there is no point in having the On-Demand for Academics available.

I do have SAS on my desktop, but that would involve switching over to Boot Camp. I also have SPSS but again, that would require restarting in Windows, which I don’t feel like doing because I’m in the middle of writing a lecture. I installed Office 2010 on my laptop and was dismayed to find that there is no longer a data analysis tool pack for the Mac – yes, I do know it quit shipping with VBA in 2008 – and the third-party stat pack doesn’t do much.

So, what is the conclusion? Well, I guess I’ll see if the SAS Web Editor is up tomorrow. If not, I’ll finish the class that ends this month and go on to finally learn R. I thought the Web Editor was a great idea but you can’t run a program in the cloud that goes down for 14 hours and no one in your organization seems to notice. One of the reasons I have stuck with SAS is that they do have really cool statistical procedures, their model selection procedures are a neat idea and there is generally an enormous legacy of good stuff. I thought perhaps by moving to a web-based model SAS could recover some of the market share it has been losing, maybe even have both something students could use while in school and a product they could use once they graduated by paying a monthly use fee like Adobe has for its Creative Suite.

Contrast this with pair.com, which we use for things like email, our MySQL databases and running our PHP scripts. I love pair. They have 24/7 support, and not by some person reading out of a manual but by a person who can actually help you. Downtime on pair over the last several years (that we’ve noticed) hasn’t been more than two hours, total, and when we called them, they were already aware of it and able to fix it in under 30 minutes.

In fact, we’re already migrating away from SAS; for small clients that can’t afford a SAS license and only require basic statistics, we’re writing their applications in PHP and MySQL.

There are two points here.

First, nowhere in this situation did I think,

“You know what I need to do? I need to start computing statistics by hand, using a pencil and a piece of paper, like I did when I was in graduate school in 1978.”

Second, using SAS is becoming as laborious as computing statistics by hand. Yes, it’s great if you have it installed on your desktop (and that is often a whole kettle of fish in itself), but it often costs thousands of dollars per seat. The Web Editor is a great idea but if it isn’t available, it’s not so great.

Here are your choices – use something that costs thousands of dollars, use something that’s free but doesn’t always work when you need it, or use something that’s free and that you can download to your desktop. I don’t know that I’m ready to give up on SAS completely yet, but I have to admit that I see why so many universities have gone to R.

 

This month, I’m teaching biostatistics for National University, and so far I am really enjoying it. There is just one really minor problem, though. While I received a copy of the textbook, I did not receive a copy of the instructor’s manual with answers to the homework problems. Since I am going to grade 20 people based on whatever I get, I need to be 100% correct in everything, and it is taking up my time to compute cumulative incidence for the population, cumulative incidence for people with hypertension, population attributable risk – and I am busy.

So, check this out, and all of you epidemiologists, I am sure this is old hat to you …. I had a table that gave me the number of people who were and were not hypertensive and whether or not they had a stroke in the five years they were followed. I wanted cumulative incidence for those with hypertension, those without and the population attributable risk.

And here we go …..

DATA stroke ;
INPUT  Event_E Count_E Event_NE Count_NE;
DATALINES ;
18  252  46  998
;

proc stdrate data=stroke
refdata=stroke
method=indirect(af)
stat=risk
;
population event=Event_E  total=Count_E;
reference  event=Event_NE total=Count_NE;
run;

 

All I need to do is create a data set where I give the number of people who were exposed (in this case, who had hypertension) and who had the event – a stroke, in my example – and the total number of exposed people. Then, the number not exposed (that is, not hypertensive) who had the event, and the total number not exposed.

I just invoke PROC STDRATE, giving it the name of my data set and specifying that I want risk as the statistic.

In my POPULATION statement, I specify that for the population of interest, people with hypertension, the number who had the event was found in the variable Event_E and the total number was in Count_E .

In my REFERENCE statement, I give the number who had the event and the total number for people who were not exposed to the risk factor.

That’s it.
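If you want to double-check the arithmetic by hand – this is my own check, using one common definition of attributable risk, not part of the PROC STDRATE example above – a short data step will do it:

data check ;
ci_e   = 18/252 ;                 * cumulative incidence, hypertensive, about .071 ;
ci_ne  = 46/998 ;                 * cumulative incidence, not hypertensive, about .046 ;
ci_pop = (18 + 46)/(252 + 998) ;  * cumulative incidence, whole population, about .051 ;
par    = ci_pop - ci_ne ;         * population attributable risk, as a risk difference ;
paf    = par/ci_pop ;             * attributable fraction, roughly 10 percent ;
run ;

proc print data = check ;
run ;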

[Output: table showing cumulative incidence and risk]

After a fine, productive evening of coding PHP and javascript respectively, The Invisible Developer and I were discussing how to find a developer. We’re making good progress on 7 Generation Games and we’re pretty happy coding our own stunts. We did have someone come in to pinch hit last year when we were running behind schedule and he was great (thanks, Eric Gebhart).

A lot of start-ups we know are not as fortunate as we are and they are looking for developers and having trouble finding them. Some have even tried to poach The Invisible Developer away from me, but they have found it impossible to compete with my offer of paying him six figures, letting him work at home in his underwear and having sex with him. (Every time I say this, The Spoiled One puts her fingers in her ears and chants, “La la la, I can’t hear you!” )

[Photo: unhappy camper]

If you are looking for a software developer of your very own, here are a few suggestions from me and additions from The I.D. Stop thinking about what YOU need and start thinking about what you might offer. Yes, you might be able to go on some random website and find someone willing to code for minimum wage. I guarantee you that the best people (isn’t that who you want by your side to change the world?) are not there. They are already working on other projects.

1. Pay decent money: I don’t care WHAT stupid article you read that said technical people are motivated by more than salary. Yes, there is a threshold. If you came in and offered us each a million dollars to work for you today doing something like creating a replacement for SQL or a new operating system, neither of us would be interested. The key point is, though, that we are already making enough money to live by the beach and shop at Bloomingdales with The Spoiled One. If people can’t pay their bills on what you are paying, they will either:

  • Quit your project for one that does pay enough that they can afford housing, food, clothes and Chardonnay, or
  • Take another job to pay the bills and work on yours in their spare time, which will be very limited.

 

2. Have interesting work: The definition of ‘interesting’ is a personal one, but anyone who is really good got that way because they were continually learning. In selling to your potential developers, talk about how they will have the opportunity to choose the language, IDE, libraries, hardware, etc. they use to develop. Talk about the new things they could learn. Certainly, there will be some limits. At the moment we don’t develop for Linux, although we’d love to, because it’s not compatible with Unity. Both the I.D. and I have left six-figure jobs for other jobs because they weren’t fun. Note that I did not say we left to work for free. Fun only matters after the rent and kids’ tuition are paid (see #1).

3. Have perks: Like interesting, this is a personal definition. For some people, it is having flexible hours so they can spend time with their children. For others, it might be telecommuting. At The Julia Group, we know we can’t match Microsoft or Google in salaries and other financial incentives. We can offer you the flexibility to set your own hours, work from home, maybe buy you the exact hardware and software to your specifications.

4. Address a need the developer is passionate about:  It seems most start-ups looking for a developer start here but I don’t know a lot of developers who do. This isn’t to say that I don’t know some great people who would like to have an impact on the world, but they first would like to pay the rent (Maslow’s hierarchy, anyone?). I know developers who are passionate about climate change, education, inequality – but really not all that many. I mean, they do care about those things but they aren’t any more likely to quit their day jobs and devote their lives to them than the guy who runs the car lot down the street from me. Your mileage may vary. I’m sure people I know personally are not a representative, random sample.

The Invisible Developer added this:

5. Hang out where the developers hang out: If you are looking for someone to create iPhone apps, there are iPhone developer forums. Lurk there and see who is asking beginner questions and who is answering them. In many forums, people will post if they are available and looking for work. If you’ve read a number of their posts, you might have an idea of whether you want to contact them or not.

6. Learn to code, or at least learn a little bit about coding: I’m not saying you need to create your own operating system from scratch, but you ought to know the difference between a jpeg file and a website design, and have some idea about how long it should take to code a web form (not very long) versus a really good 3-D adventure game (the rest of your natural life – just kidding, sort of).

I was wrong.

Somewhere along the line, I got the idea that women did CSS and HTML and men did “real” coding like PHP, SQL, Python, Perl, javascript etc. Since life had taught me that predominantly male fields always paid better than female ones (construction workers get paid more than licensed practical nurses, for example), I decided to ignore CSS beyond the minimal amount to get by because seriously, real programming has arrays and functions.

Getting the pages in the game to all look alike was one of those tasks I put off until later, and maybe we would just hire someone to do it. Well, guess what, later has arrived. So, I spent a day reading Stylin’ with CSS (which rocks, by the way). The main motivating factor was that I had to do some test questions for our game that match up to the type of items on the new computerized exams testing the Common Core curriculum. This means that I needed things draggable and droppable – no problem with jquery – but I also needed them laid out very specifically on a number line. I could have done this with the canvas tag, but really, css proved the perfect solution.

Not only was I able to use margins, relative positioning and float to get my objects on the page to show up exactly how I wanted them, but I was also able to do it so that I am pretty sure it will look the same in most browsers on most computers and not just my lovely cinema display using Firefox.

On top of this, I learned about the acronym tag which I could not believe I did not know existed before now. (Yes, I know it is an HTML tag but it was in a book on CSS I happened to be reading.)

In short, you do this:

<acronym title="What you want to show up when you hover">The thing you hover over</acronym>

In our games, we use many words from the students’ tribal language. For example, in the game we are going to be piloting on the Turtle Mountain reservation this spring, the page reads “Nookomis says …” and hovering over “Nookomis” shows “Grandmother.”

In 100 places in the past, I had used javascript so that if you clicked on or hovered over something it would show the word, and here there was a simple little tag all along.

How was it possible I did not know this? Because I had the stupid idea that CSS was a woman thing and if you want to make money in life and be taken seriously you hire someone at a low salary to do the things that women do and you concentrate in other areas.

I was wrong about CSS and that was the second thing involving stereotypes about women that I was completely wrong about this week. The other one was a book on women in fitness. You can read about that here.

I am old. I remember punched cards, COBOL, dumb terminals and having to walk over to the computer center and load tapes on to the drive if I wanted to use large data sets – large back then meaning 100,000 records or more with a few hundred variables. We thought that was pretty big data.

By the time I finished graduate school in 1990, almost everyone I knew who still programmed using COBOL was over 40. They had learned it in college, or picked it up somewhere along the line and stuck with it. I didn’t know anyone who was learning COBOL. It was pretty clearly on the way out. Java, C++, PHP, Perl and javascript were all taking up the attention of the cool kids on the block. SAS was a relatively new, cool thing if you were into statistics, while BMDP was on the way out. BMDP – that was another thing no one under 40 seemed to use.

So …. when I went to the Western Users of SAS Software conference this year, I was struck by the fact that I seemed to be about the median age. There were A LOT of people older than me. Most of the younger people were the student scholarship winners and junior professional award winners.

This does not bode well for SAS, and it made me a bit sad because, as I said in a prior post, the model selection procedures were cool and, from a statistical perspective, there is a lot of good stuff in SAS.

I used to go to the user group meetings and they would give you a book (yes, on paper, children) that had macros written by SAS users. I think that was the first time I saw the parallel analysis criterion code for factor analysis – a macro I used in my dissertation and in one of the first articles I published.

Tonight, I was looking for a way to do power analysis for a repeated measures ANCOVA and I could not find one for SAS – not with PROC POWER, not with PROC GLMPOWER, nor with any user-written macros. It may exist – I looked several other places as well, found a paper on how to do it using SPSS syntax (although that code did not work!), and someone else wrote a procedure in R that I didn’t try.

SAS used to be the place for the cutting edge. What happened?

One reason is that everyone used to use either SAS or SPSS at universities and that isn’t the case any more. A second is that SAS is really expensive, so universities who do not have a license aren’t inclined to get one.

This all sounds like the death knell is tolling for SAS and it is just a matter of time until it follows COBOL and Blackberry as one of those things that people ask, “Why are you using that?”

I think there is still some possibility for SAS to turn things around – although whether they will or not remains to be seen.

The smartest thing SAS has done in years is to come out with SAS On-Demand for Academics. This makes SAS free for university students and professors. It’s perfect for on-line courses because you can upload your data to the class website and all of your students can access it.

Now the next thing SAS needs to do is start making that available at a reasonable cost once students graduate. Instead of charging them thousands of dollars a year for a license, they can charge $50 a month like Adobe does for its design package or Google does for its apps. (Yes, Google apps for business are cheaper than $50 a month but they don’t do all that much.)

New graduates aren’t going to pay several thousand dollars for a license because they don’t have that kind of money. They might shell out $50 plus occasional extra charges to access some high performance computing capabilities.

SAS already has millions of lines of code and tens of thousands of pages of good documentation. It’s some good stuff.

Think about this – years ago, the Mac was considered a better computer than Windows but over-priced. Many people thought Apple would go under. Instead, they came out with the iPhone and the iPad and they are wildly successful.

The Web Editor and other cloud products could become the SAS version of the iPad.

Here’s to hoping they don’t fuck it up.

 

What the hell is WUSS? Why would *I* ever go to anything with the name WUSS? It’s not just so you can get the cool hats which double as lunch bags, although that’s one possible incentive.

It’s also not just so you can get your own original, signed illustration of the difference between ordinary least squares and maximum entropy methods from Don from SAS – although if you are lucky, that is an added bonus.

[Illustration: ordinary least squares vs. maximum entropy]

I was very pleasantly surprised to learn more than I expected at WUSS this year. I was aware that the GLMSELECT procedure is available to select the best-fitting model, but I have not actually used it. Funda Gunes, from SAS, gave a great talk on model selection methods. To summarize the last hour – you create 1,000 or so bootstrapped samples, run models with each of those, and select the average coefficient estimates from the 1,000 models. This is the best model not in the stepwise regression sense of giving you the highest explained variance, but in the sense of being most likely to correctly reflect the population values. That is a GROSS over-simplification, but I highly recommend, if you have any interest in model selection techniques, that you download and read her paper, which should be available from the conference proceedings that will eventually be published on the WUSS site.
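For what it’s worth, my understanding is that the MODELAVERAGE statement in PROC GLMSELECT will do this kind of resampled model averaging for you. The lines below are only my sketch, with a made-up data set name and variables, not code from Funda’s paper, so read her paper and the documentation before relying on it.

proc glmselect data = mydata.study ;
model y = x1-x20 / selection = lasso ;
modelaverage nsamples = 1000 ;  * run the selection on 1,000 resampled data sets and average the estimates ;
run ;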

A second good paper on model selection was by Scott Leslie – pretty much the polar opposite of Funda’s on the technical side – in which he showed a series of ROC curves to illustrate the gradual (or sometimes substantial) improvement in a model as new predictors were added. He ended with a discussion of what might be better predictors of adherence to a prescribed medication regimen and how you would get that data.

In Kechen Zhao’s presentation, I learned about using PROC GENMOD to compare four different model types – logistic, log-binomial, Poisson and modified Poisson. He discussed relative risk as the measure of interest versus odds ratios, and the fact that logistic regression in particular can produce substantially different estimates than the other models. This is worth a whole post in itself that I will try to get to next week.
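Until I get to that post, here is roughly the kind of setup I think he was describing – my own sketch, with a hypothetical data set and variable names (cohort, stroke, hypertension, idnum), not his code – comparing a log-binomial model with a modified Poisson model with robust standard errors, both of which estimate relative risk directly:

* Log-binomial: binomial distribution with a log link, so exp(coefficient) is a relative risk ;
proc genmod data = cohort descending ;
model stroke = hypertension / dist = binomial link = log ;
estimate 'RR for hypertension' hypertension 1 / exp ;
run ;

* Modified Poisson: Poisson distribution, log link, robust (sandwich) standard errors via GEE ;
proc genmod data = cohort ;
class idnum ;
model stroke = hypertension / dist = poisson link = log ;
repeated subject = idnum / type = ind ;
estimate 'RR for hypertension' hypertension 1 / exp ;
run ;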

As an added icing on the cake, in a session by Marie Bowman-Davis I learned about a public use data set, the California Health Interview Survey. (I did not know these data were available for public use and they are obviously a great resource for teaching.)

Despite all of these good things, I left the conference a bit concerned about the future of SAS – the average age of attendees at the conference was probably over 50. More about why that is and why that’s a problem later, since this post is already long enough and I have actual work to do.

 

 
