Jun

26

I’ve been asked several times what made me change my mind about SAS Enterprise Guide. SAS EG and my husband have a lot in common. For one thing, neither made a great first impression.

When I first met Dennis and saw that he was the same size as me (which my third daughter says is only the perfect height if your aim is to be shipped in a box), the first thought that went through my mind was,

“Oh my God, I’m dating a munchkin!”

When I first used Enterprise Guide, probably at version 1, my first thought was,

“Who would ever use this $#@ ?”

Here is another similarity between my husband and SAS EG: the more time I spent with both, the more I realized,

“Hey, you are really brilliant.”

Since I have four daughters, I have given a lot of talks about men, types of men, things to look for and what to avoid. Contrary to my friend who believed that men only come in three types, I think they are more complicated than that.

There are brilliant men who are great to be around because they are just so deep and insightful, you never get bored. At the same time, they are really high maintenance. They expect you to be always perfectly dressed, ready to go to Lago’s at the drop of a hat, take their laundry to the dry cleaners and be interested in their favorite sports teams. That’s SAS. Brilliant, high maintenance, but worth it.

Then there are men who are just as brilliant but a lot more comfortable to be around. They can glance at the third iteration and tell you the final equation before the computer finds the solution. At the same time, they are happy to stay home, drink beer and watch the Daily Show. If you want to go to Lago, that’s fine, too. Brilliant, easy to get along with, but able to rise to any challenge.

How is that SAS Enterprise Guide? First of all, as I said the other day, the time it takes to do the data cleaning and checking can be cut to a FRACTION of what it is with SAS. That is a lot of the hours that go into a research project.

Second, there is the 80-20 rule: 80% of your research projects are going to use 20% of all possible techniques (ANOVA, linear regression, logistic regression), and those are extremely easy to do in SAS EG.

[Photo: baby]

Third, and really important - think of the children! SAS EG would make a good dad because good dads make things easy for the kids to understand. EG has a much gentler learning curve than does SAS. This is one reason I think it was a brilliant move for the company. As a consultant, most people come to me interested in learning SPSS because it is easy to get started with the pointing and the clicking. Lately, the big interest in classes on campuses has been with SAS Enterprise Guide. EG fits with how people are used to using computers. We have a whole younger generation that expects to use a computer to solve just about every problem in life and certainly does not expect to learn programming.

Fourth, and equally important, let’s say you WANT to clean up nice and go to Chinois on Main - well Enterprise Guide includes a code window (under the PROGRAM menu option in 4.2) where you can write code to your heart’s delight. Although Enterprise Guide is relatively easy and comfortable to use, it combines that with the limitless range of SAS.

Does that sound like an ad? Not quite. There are still two significant disadvantages. While SAS EG is easy to learn relative to programming in SAS, it is still not exactly intuitive. If you have been using SAS for a long time, you’ll probably find EG a piece of cake. If not, it’s kind of like Dreamweaver: compared to coding HTML from scratch it is easy; compared to typing it is hard. For those people who want statistical analysis to be like the genie in Aladdin -

“Computer, bring me a repeated measures Analysis of Variance.”

well this isn’t quite it. But, it IS closer.

The other really bad thing about SAS EG is something it also has in common with my husband - it’s difficult to get to know. (Hint to women: If you like the guy, ignore your friends who tell you that anyone who is 42 years old and has never been married must be gay. See photo of baby above as evidence of not-gayness.)

We spent frustrating months trying to get the installation working at our site. We finally (I think) have a usable method for individuals but SAS EG is still not working error-free in our labs. That’s my project for next week.  So, yes, it is brilliant, flexible, comfortable and has limitless possibilities. However, if they don’t fix that installation mess, SAS EG may end up like my husband and be forty-two years old before it gets a really great user base.

Jun

18

At the JMP seminar on Monday, when Dick De Veaux said that 65-70% of time in all research projects is spent on data cleaning, everyone in the audience groaned in agreement.

One of the biggest problems I run into is recoding those simple textboxes. For example, we often want to look at data for one reservation or tribe. Now, one would THINK that the answer to a question like:

If you are an enrolled tribal member, in what tribe?

would be SIMPLE? I mean, you know what tribe you’re in, right? Most likely it is on the sign on every **^ building on the reservation.

[Photo: Spirit Lake Tribe sign]

No. People need to abbreviate because it takes too much time to write, for example, “Turtle Mountain”, so they have to abbreviate it as TM because that extra 4 seconds they saved could be so profitably spent driving slowly on the back roads in front of me when I need to be somewhere. Other people, helpfully, put TMT (for Turtle Mountain Tribe), or the legal name, “Turtle Mountain Band of Chippewa Indians”, which is abbreviated by some as TMBCI. Still others put Ojibwe, which is also sometimes spelled Ojibwa, which is the name for Chippewa in Ojibwe (or is it Ojibwa). I can go on for another several paragraphs on this, because there are also Red Lake, White Earth and more reservations of the same tribe.

So … (hyperventilating here), in the old days, if I wanted to do an analysis of just the respondents from the Chippewa tribe,  I would do something like :

1. Do a frequency distribution to find all the zillion permutations of this ONE question.

2. Be restrained by the wiser, kinder members of our office staff from beating the participants in our research with sticks.

3. Explain for the 59th time in the staff meeting why tribe cannot be a multiple choice item (there are 562 federally recognized tribes. We cannot have a multiple choice item with 562 choices.)

4. Write SAS statements that look something like this:

Tribe = upcase(Tribe) ;

Tribe_s = substr(Tribe,1,4) ;

if Tribe_s in ("TMBC", "TURT", "CHIP", "OJIB")
   or Tribe in ("RED LAKE", "WHITE EARTH") then chippewa = 1 ;

Except that the IF statement would be much, much, much longer. This is a common type of problem and you can find lots of solutions in many SAS Global Forum papers. Here is my new one, no beatings required.

In SAS Enterprise Guide, go to TASKS, select DATA, select FILTER & SORT. Click on the thing that looks like a filter.

[Screenshot: selecting values IN A LIST]

From the drop-down variable list, I pick TRIBE. From the next list, I select IN A LIST. I click on the three dots and a list of all values appears. I can hold down the shift key and select several in a row, all the Chip, Chippewa, Chippewa Tribe and so on. After clicking OK, I can go back and select more values to add to the list.

Done. Pointing and clicking and no clever uses of SUBSTR, UPCASE or other nifty functions required. (Note to self: Find out what new careers the SAS function users have now.)
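For anyone who likes to see what an IN A LIST filter amounts to in code, here is a minimal sketch in Python rather than SAS. The `tribe` field name, the sample records and the list of spellings are all made up for illustration; in practice the list would come from the frequency distribution of your own data.

```python
# Hypothetical spellings picked out of a frequency distribution of the textbox.
CHIPPEWA_VALUES = {
    "TM", "TMT", "TMBCI", "TURTLE MOUNTAIN",
    "CHIP", "CHIPPEWA", "CHIPPEWA TRIBE",
    "OJIBWE", "OJIBWA", "RED LAKE", "WHITE EARTH",
}


def is_chippewa(raw_tribe):
    """Upper-case the answer, squeeze stray blanks, then test list membership."""
    cleaned = " ".join(raw_tribe.upper().split())
    return cleaned in CHIPPEWA_VALUES


def filter_chippewa(records):
    """Keep only the records whose 'tribe' answer is in the selected list."""
    return [r for r in records if is_chippewa(r.get("tribe", ""))]


# Made-up survey records, just to show the call.
sample = [
    {"id": 1, "tribe": "tm"},
    {"id": 2, "tribe": "Spirit Lake"},
    {"id": 3, "tribe": " Turtle  Mountain "},
    {"id": 4, "tribe": "Ojibwa"},
]
selected = filter_chippewa(sample)
```

The design choice is the same one the point-and-click version makes: match against an explicit list of observed values instead of guessing at prefixes.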

So, now, with a few points and clicks, by going to TASKS, selecting DESCRIBE, then SUMMARY TABLES, I can produce this table that tells me if you are on one of the Chippewa reservations surveyed, your internet usage is related to your years of education and age. The relationship between internet usage and age seems to be curvilinear here.

[Screenshot: summary table results]

My suspicion is that older people are slower to adopt new technologies. However, technology adoption is also enabled by money, and those with more money tend to have more education (which is somewhat related to age; you don’t have a lot of 18-year-old college graduates). I can begin to examine some multivariate relationships now.

Notice what is going on here! In about 12 seconds I have motored through the data cleaning part, combining the frequency distributions, recoding and selection, and am now delving into data analysis.

I would not go so far as to say that this is better than sex (hence explaining the four children) but it is definitely way cool and makes me happy.

In the interest of full disclosure, I must say this. If you have never used SAS before, it will take you longer than 12 seconds. Here is why :

  1. The filter on filter and query is three blanks followed by a box with three dots. The reaction of an experienced SAS programmer, especially one who ever used the analyst application,  is to recognize that as an IF statement with the first box as the variable (click arrow for drop down list of variables), the second box as the operation (click arrow for drop down list of operations) and the third box as whatever you want to select (click the three dots for more). The reaction of the rest of humanity is going to be WTF?
  2. In creating computed columns, which I did to recode the internet usage variables, I immediately knew that the format I wanted was $CHARw.  and that I needed a length of 10.

So, SAS EG is a great thing for any researcher who is a SAS programmer. It is also a great thing for any researcher who wants to be a SAS programmer. I won’t lie to you and say it will be completely easy and painless, but it is true that less beating of subjects with sticks will be required.

Jun

12

The model is non-significant, therefore my theory is supported.

Huh?

Just when you thought it was safe to get back into statistics… It took you two years of graduate school but now you have it down. P-value low = good, relationship detected, publication, tenure, Abercrombie & Fitch models at your feet.

P-value = high, no relationship, no publications, no money, dating the creepy guy next door.

Enter Hosmer to screw things up.

There are a whole bunch of reasons you might want to do a logistic regression (no, I’m serious). If you want to predict a categorical dependent variable like death, drop-out or watching Afghan Star. If you were going to do a propensity score match you would start with logistic regression. If you plain can’t think of anything else to do with your evenings.

The first thing would be to see if your dependent had a relationship with your grouping variable or you really are wasting your time. Okay, now that is settled, you have found that people seen in hospitals with Intensive Care Units are more likely to die than those seen at other hospitals.

You also want to see if the variables on which they differ have anything to do with the outcome. For example, I ran an analysis where I coded their favorite colors of pants - blue, brown, white, black or green pants (seriously, who buys green pants?) . People who went into intensive care were more likely to own green pants.  To test if this is significant, I run a logistic regression with  death as the outcome variable and pants color as the predictor.

In SPSS you go to ANALYZE > REGRESSION > BINARY LOGISTIC

So, the Hosmer and Lemeshow Test is statistically significant with a chi-square of 349.06, df = 4 and  p < .001. Is that exciting? Do I immediately publish an article on “The American Apparel Effect” and how poor fashion taste is dangerous to your health?

Not so fast. You see, Hosmer & Lemeshow tests the Goodness of Fit of the model predictions to the observed data. If you reject the hypothesis that your model fits the data, that is bad!
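To make the "high p-value is good here" idea concrete, here is a rough Python sketch of how a Hosmer and Lemeshow style statistic can be computed from predicted probabilities. It is not what SPSS runs internally; the grouping rule (equal-sized groups after sorting), the function names and the little chi-square tail routine are my own assumptions, and it will fail if a group's expected count is exactly 0 or equal to the group size.

```python
import math


def chi2_sf(x, df):
    """Upper-tail chi-square probability via the incomplete gamma series."""
    a, z = df / 2.0, x / 2.0
    if z <= 0:
        return 1.0
    term = 1.0 / a
    total = term
    n = 1
    while term > 1e-16 * total:
        term *= z / (a + n)
        total += term
        n += 1
    lower = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return min(1.0, max(0.0, 1.0 - lower))


def hosmer_lemeshow(outcomes, probabilities, groups=10):
    """Compare observed and expected event counts within sorted risk groups."""
    pairs = sorted(zip(probabilities, outcomes))
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        size = len(chunk)
        observed = sum(y for _, y in chunk)
        expected = sum(p for p, _ in chunk)
        # Events and non-events combined into one term per group.
        stat += (observed - expected) ** 2 / (expected * (1 - expected / size))
    df = groups - 2
    return stat, df, chi2_sf(stat, df)
```

A small statistic with a large p-value says the predicted and observed counts agree, which is the outcome you want from a goodness-of-fit test.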

In my next logistic regression, I used age over 65 as a dichotomous variable. My second variable was the Dr. MechOth scale. Dr. MechOth (not her real name) was a friend of mine when I was a young Assistant Professor who occasionally hung out in bars. Dr. MechOth rated all men on a 1 to 3 scale, where 1 = “Yes”, 2 = “Maybe if I was drunk” and 3 = “I couldn’t get drunk enough”.

The results of the Hosmer & Lemeshow test shown below, with chi-square = 4.52, df = 3, p > .20, show that the data fit the model somewhat, although it could be better.

[Screenshot: Hosmer and Lemeshow test results]

Does this mean that in logistic regression high p-values are always a good thing? Nope, that would be too easy for you to remember. In fact, no sooner have we inverted our understanding of p-values than it is time to do it again. When interpreting the COEFFICIENTS, a low p-value is a good thing. So, which of Dr. MechOth’s groups one is in, and being really, really old, are related to probability of death.

[Screenshot: logistic regression coefficients]

Sadly, my original hypothesis about death by green pants is not supported and all I have discovered is that if you are really, really old and no one would go home from a bar with you if you were the last person on earth, you are more likely to keel over dead from natural causes or suicide, whichever comes first, than hot, young people.

I do not think I will be winning the Nobel Prize for Medicine any time soon. I wonder if that guy next door likes Cup-A-Noodle soup.

Jun

3

Lately I have been on a roll looking at relatively less common statistical techniques, proportional hazards, survival analysis, etc.

In keeping with that, I have been taking a look at propensity score matching, fondly known as PSM by - well, by no one, actually.

The problem to be solved ….

Think about some of these comparisons:

In all of these cases, and probably a lot more you can think of, there are very likely differences in certain “outcome” variables, whether it be survival in the case of hospital patients, academic achievement of students or annual income of TV versus Internet users. However, all of these comparisons also begin with groups who are already different.

For example …

You have two groups, say people who are treated at a hospital with a specialized unit for terminally ill patients and patients from another hospital without any such specialized unit.  Your outcome variable of interest is whether the patient lived or died.

The simplest way to test this is a chi-square. You compare the percentage of people who survived at St. George of Money Hospital versus Heart of Despair County Hospital.  There is a problem with that, though.  A simple comparison will almost always show WORSE outcomes for hospitals with special units for patients who are terminally ill, seriously burned, extremely premature births, etc. The reason is probably obvious  - if you get sicker patients, they are less likely to live.  If your interest is in knowing whether having a specialized unit increases your chances of survival, you would want to compare similar groups.
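For reference, the naive chi-square comparison is simple enough to sketch in a few lines of Python. The counts below are invented purely for illustration, the function name is mine, and this version applies no continuity correction.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square for the 2x2 table [[a, b], [c, d]], df = 1."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 degree of freedom the upper tail reduces to erfc.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Made-up counts: rows are hospitals, columns are survived / died.
stat, p_value = chi_square_2x2(80, 20, 90, 10)
```

With these invented counts the hospital with more deaths looks significantly worse, which is exactly the misleading answer you get when the two groups of patients were different to begin with.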

It isn’t as simple as just controlling for severity of condition, though. There are other variables. For example, people who are better educated, who have private insurance and who live in urban areas may all be more likely to be patients at more “elite” hospitals. Some of those factors may be related to survival as well. What we’d really like is to compare a group of people from St. Money’s that is similar to patients from Despair.

In short, certain types of people have a greater propensity to be admitted to one type of place than the other.

Enter propensity score matching — to the sounds of trumpets and wearing a cape.

In fact, the first step is to do a logistic regression analysis and I will admit that it is not strictly necessary to wear a cape while doing so but it would probably be more comfortable than this business suit from Filene’s that I am wearing.

Using SPSS, go to the ANALYZE menu, select REGRESSION, then select BINARY LOGISTIC. Your dependent variable will be the hospital to which the patient was admitted. Covariates are the variables, such as education, severity of illness and insurance, that you want to control. For variables that are categorical, e.g., insurance, which could be private, public (a.k.a. MediCal, if it hasn’t disappeared in the latest round of state budget cuts) or none, click on the CATEGORICAL button and move those over to the “Categorical covariate” window.

Here’s the really important part  — click on SAVE and select PREDICTED PROBABILITIES - that is your propensity score.

This is what you are going to match on. Hence the name.

This is step one. I would say it gets easier after this point - but it doesn’t.
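As a preview of what "matching on" the saved probabilities can look like, here is a minimal Python sketch of greedy one-to-one nearest-neighbor matching with a caliper. This is only one of several common matching schemes and not necessarily the one any given package uses; the function name, the caliper of 0.05 and the patient ids are all hypothetical.

```python
def greedy_match(treated, control, caliper=0.05):
    """Pair each treated id with the closest unused control id.

    treated and control map an id to its propensity score (the saved
    predicted probability). A pair is kept only if the two scores
    differ by no more than the caliper.
    """
    available = dict(control)
    pairs = []
    for t_id, t_score in sorted(treated.items(), key=lambda item: item[1]):
        if not available:
            break
        c_id = min(available, key=lambda c: abs(available[c] - t_score))
        if abs(available[c_id] - t_score) <= caliper:
            pairs.append((t_id, c_id))
            del available[c_id]  # match without replacement
    return pairs


# Made-up scores for three patients at one hospital and four at the other.
treated = {"t1": 0.30, "t2": 0.70, "t3": 0.95}
control = {"c1": 0.28, "c2": 0.72, "c3": 0.50, "c4": 0.10}
matches = greedy_match(treated, control)
```

Patients with no sufficiently close counterpart (here, the one with a score of 0.95) simply go unmatched, which is the price of comparing like with like.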

May

20

Event history models of all types have a few characteristics that make them unique. First of all, forget that whole symmetry thing around zero.

Here our dependent variable of interest is time to event. We are interested in how long a person lives, remains sober, stays with a given company, or, in a study of my parenting skills, goes without threatening to skin a child alive and tack her hide upside the door as a warning to her sisters.

Regardless of the specific nature of the event, we are interested in TIME, which by definition must be positive. You cannot have negative duration.

Let’s take death as an example of an event. We will define death operationally as the time of death written on the death certificate. As our beginning point, let’s take attack by weasels. Some people might die right after a weasel attack, if, say, attacked by a particularly large weasel, or a whole sneak of weasels. (Yes, the correct term for a collection of weasels is a ‘sneak’.) If you don’t believe me, look it up.

Others might linger for a while and then die, with their bodies unable to combat those severe weasel-bite wounds. Some additional number may die from complications of infections due to weasel bites and so on.

The dependent variable we are interested in is T, where T represents the time from the biting weasel onslaught to death. At each time period, there is a baseline hazard rate. Remember this term, because it is important.

The baseline hazard rate is a constant for a given time period. Weasel attack survival may be like this. Say 5% of weasel attacks are the sneak variety and the victim dies within 24 hours. However, of those who survive, only 1% die within the next 24 hours, and 0.2% catch some type of nosocomial infection and die within the following 48 hours. In an exponential model, the baseline hazard rate is a constant - period - because we assume that the rate of an event does not change with time. For other models, the baseline hazard rate is a constant for a given time interval. The Weibull model, for example, allows for a monotonic hazard rate, i.e., it can be increasing or decreasing but only in one direction.

The baseline hazard rate for that second period is .01, so h(2) = .01
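To see how period-by-period hazards add up, here is a small Python sketch that turns the weasel numbers into the proportion still alive at the end of each period. It assumes each hazard applies only to those who survived the previous period (so the 0.2% is read as a share of survivors), and the function name is made up.

```python
def survival_curve(hazards):
    """Proportion still alive after each period, given per-period hazards."""
    surviving = 1.0
    curve = []
    for hazard in hazards:
        surviving *= 1.0 - hazard  # only the survivors remain at risk
        curve.append(surviving)
    return curve


# First 24 hours, next 24 hours, following 48 hours.
weasel_curve = survival_curve([0.05, 0.01, 0.002])
```

With a constant hazard in every period, as in the exponential model, the same function would produce a curve that shrinks by the same proportion each time.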

However, one can, and usually will, have covariates. I mean a person is more than the sum of his episodes of attack-by-weasel, right? So, while the hazard rate may be .01, it may increase if a person has other pre-existing conditions, such as old age. A 99-year-old weasel attack victim may have a greater hazard rate than a 17-year-old victim. Other factors may have a negative relationship with hazard, for example, having been vaccinated for rabies.

Thus, the Weibull model can be expressed as a log-linear function

log(T) = b0 + b1X1 + b2X2 + σε

where the last part is a stochastic disturbance term, stochastic disturbance sounding better to say than ‘error’ and less likely to draw the attention of malpractice attorneys and hedge fund investors.

What makes the Weibull model different is that it also includes a shape parameter. The covariates alter the scale value but the shape (if it is increasing, decreasing or flat) remains constant.
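The effect of the shape parameter is easy to see numerically. The sketch below uses the standard Weibull hazard, h(t) = (k/λ)(t/λ)^(k-1), with shape k and scale λ; the function name and the example values are mine, and t must be greater than zero. In the log-linear form above, the shape corresponds to 1/σ.

```python
def weibull_hazard(t, shape, scale):
    """Weibull hazard at time t > 0 for the given shape and scale."""
    return (shape / scale) * (t / scale) ** (shape - 1)


# Shape below 1: hazard falls over time. Shape 1: flat, which is the
# exponential model. Shape above 1: hazard rises over time.
falling = [weibull_hazard(t, 0.5, 2.0) for t in (1, 2, 4)]
flat = [weibull_hazard(t, 1.0, 2.0) for t in (1, 2, 4)]
rising = [weibull_hazard(t, 2.0, 2.0) for t in (1, 2, 4)]
```

Changing the scale stretches or squeezes the time axis, but whether the hazard goes up, goes down or stays flat depends only on the shape.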

A Weibull model can work over defined ranges but may not always be the best pick. Think mortality, for example. There is actually a relatively high mortality rate in the first year of life - being born is a risky business - but then mortality drops until age 14 after which your risk of death goes up again until, well, until you die.

May

12

What is an event history model?

Think of it like this - you are interested in whether something happens, what predicts whether it happens and how long until it happens. Let’s take a common one, like, say, death.

An event history model could predict the duration from diagnosis of tuberculosis to death. In this model you have two groups, those who died during the study and those who were still alive at the end of the study. You could use a simple logistic regression model. I guess this says something about me that I use simple and logistic regression model adjacent to one another in the same sentence.

Logistic regression fails to use a critical piece of information, that is, how long the person survived.

Some terms to know when thinking about event history analysis:
1. There are various types. Survival analysis is a special case of event history analysis. In this case, the curve eventually reaches zero - in the end, there are no survivors, everybody dies. Also, survival analysis does not have a recidivism rate. You only die once. Related to this, it is a final point. You don’t die and then come back. I know every Christian from that original Mormon guy to Father Mike says you do, but it has never happened in the duration of any statistical study in which I have been involved. In mathematical terms, it would be said that the survivor function S(t) is a non-increasing function: it can go down or stay flat, but it never goes back up.

2. Some observations are censored. That does not mean they have been running around your study naked (although they could be, there is nothing to prevent a censored subject from going naked). Censored subjects have not experienced the event by the time the study ended or you lost track of them. (If you had kept their clothes, that might have prevented them from running off, but it is too late now. You should have thought of that sooner.) If your study is of the use of illegal drugs, some people will not have used drugs at all by the time the study ends. If your study lasts 700 days and Joseph goes out and does massive amounts of cocaine on day 700, while Mary is at church singing hymns all day for all 700 days of the study, it wouldn’t make any sense to consider Joe as having just one day less of cocaine-free lifestyle. In fact, it is very plausible that Mary will continue drug-free throughout the rest of her life for another 7,000 days or more, and, with behavior like this, she may even come back from the dead and live drug-free hymn-singing some more. You could drop Mary out of the study as “missing data” , since there is no data on when she began using illegal drugs. That’s an unsatisfactory solution also, though. Not only is she not really missing data but the data you do have is usually the outcome you are most interested in - the not-drug-taking, not-dead, not-incarcerated people.

3. Some event history models allow for multiple episodes of the event, whether your variable of interest might be drug use, incarceration, military intervention, or its don’t-try-this-at-home counterpart of domestic violence.
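To show how censored subjects stay in the analysis rather than being dropped, here is a minimal Python sketch of the Kaplan-Meier (product-limit) estimate of the survivor function. Kaplan-Meier is a standard technique that I am using as an illustration, not something the list above prescribes; the function name and the tiny data set are invented.

```python
def kaplan_meier(durations, observed):
    """Product-limit estimate of S(t) at each time an event was observed.

    durations: time each subject was followed.
    observed:  True if the event happened at that time, False if censored.
    """
    event_times = sorted({t for t, e in zip(durations, observed) if e})
    surviving = 1.0
    curve = []
    for t in event_times:
        # Censored subjects still count as at risk up to the time they left.
        at_risk = sum(1 for d in durations if d >= t)
        events = sum(1 for d, e in zip(durations, observed) if d == t and e)
        surviving *= 1.0 - events / at_risk
        curve.append((t, surviving))
    return curve


# Four made-up subjects: events at times 2 and 5, censored at times 3 and 7.
km = kaplan_meier([2, 3, 5, 7], [True, False, True, False])
```

The subject censored at time 3 contributes to the denominator at time 2 and then quietly leaves the risk set, which is how the Marys of the study get credit for the time they were observed.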

May

10

Whether you are a statistician, SPSS guru, SAS programmer or professor and world-renowned expert on re-incarceration, odds are great that you are susceptible to bubble-vision. You work, breathe and socialize within one or two very narrow bubbles.


This is bad and unhealthy. You’ll miss much of life that is beautiful, exciting, dramatic, interesting, tragic and delightfully fun. You’ll also focus too much on things that are not particularly important because you are looking only at whether your colleague in the Study of Very Important Flagellum Department unfairly criticized your latest conference presentation, who voted for you as Treasurer of the SVIF Society and what that editor of the Journal of SVIF said about your latest article submitted.
[Photo: Julia’s face]

Be like Julia (the eponym for The Julia Group), live life large, interested and happy. In the interest of that goal, here are some interesting links to follow that relate to the world outside of my personal bubble:

The Disease Management Care Blog is unfortunately named because, contrary to what you might think, it is far more interesting than a rectal exam. The latest post was on Comparative Effectiveness Research. I don’t wholly agree with the point cited that CER doesn’t take into account co-existing conditions, personal preferences, etc. It may not in all cases but that is no reason it couldn’t. The author discusses both sides of the issue of CER funding: whether we are spending too much on it or too little, and does it do any good in the end? These are pretty general questions of life.

I love the New York Times because their coverage is intelligent and thought-provoking. This series on social class in America is even more so than usual. My family certainly lives in a different class than the one I grew up in. When Julia was about four, I asked her if someone she had mentioned was her friend’s mother and she answered contemptuously,
“No, she’s him’s ‘anny !”

After all, who could be so dumb as to not know it is your NANNY that takes you to the park, not your mommy. Your mommy is probably working on a documentary or writing a blog on statistics or at the hospital delivering a baby.

When I was eight years old, I walked a mile home from school with my brothers and sister. During the summers, we watched ourselves, made ourselves lunch and solved our own fights, by means best not shared with my mother to this day. Let me just say that the broken front window, the broken down bathroom door and the scars on my second brother’s forehead - none of those were me. My oldest brother’s broken finger or the drainpipe inexplicably pulling away from the second floor, well, I plead the fifth.

Their discussion of class was fascinating to me in part because, being over-involved in judo (I am the president of the United States Judo Association) in my copious spare time, of which I have none, I meet people from all possible strata of American society, most of whom haven’t a clue what a stratum is. Some are absolutely infuriated that I do not do as I am told. What the New York Times articles highlighted was the class differences in the value placed on doing what one is told versus finding the right answer. It never even occurred to me that blind obedience could even be considered a virtue.

Wiki-books is an interesting concept. Free textbooks. Not great in quantity, but hey, if you want to contribute, go ahead, or read whatever happens to be there. Every now and then I go just to read at random. Today, I read How to Do Nothing. As anyone who has ever met me can tell you, it is a textbook I sorely need to read.

Speaking of the judo association, another good site to check out is the page on Social Capital from bettertogether.org . This Internet thing is pretty cool. Where else could you read original research by people from Harvard University while sitting in your massage chair? Or find 150 ways to increase your social capital.

Right now, I think I am going to do #86, log off and go to the park, even though I am not, in fact, a nanny.

May

4

FINALLY got a few minutes to download the latest version. For some reason the download I received was for the planned installation as opposed to the basic installation.

In 25 words or less, basic installation is for stand-alone installs on a single machine, which we have hundreds of users doing. The planned installation would be used if you had a meta-data repository, SAS on a server distributed to client machines or some other configuration which we did not have.

So, I have logged in as SAS administrator, downloaded the download manager, applied the order number and key, created a software depot and — nothing.

After slogging through several documents, I realized that we had been sent the wrong thing. Either that, or one of the right things, telling us how to use this for a non-planned installation, had been omitted. Got through right away to the lovely Angie McKinley from SAS who sent me a link on how to skip the planning part and voila! My deployment deploys and I now have SAS 9.2 v2 and Enterprise Guide 4.2 on a computer running Windows XP.

By the way, since I am taking this incredibly stupid required course on Workplace Harassment Prevention let me just specify that I do not actually know what Angie McKinley looks like and the lovely is referring to her helpfulness and is not in any way a reflection of ageist/sexist/gender-specificist/racist/lookist stereotypical intent. Come on, I am Hispanic, female and over 40. I believe as a group we are mostly accused of harassing our children for not calling often enough. (“Yes, I know you are covering the World Cup. So, what, they don’t have phones in South Africa?”)

SAS 9.2, which I am testing in between clicking on the stupid harassment training, is so far working well. Opened up an xlsx file no problem. Tried Enterprise Guide 4.2 and

[Photo: mudskipper]

Hey wait a minute …. something looks different here…

First of all, there is no longer a DATA menu. Instead, under TASKS, there is FILTER & SORT. There is also a QUERY BUILDER, which is where you now create new variables, a.k.a. computed columns. Okay, so having just completed the docs on Enterprise Guide on my personal pages, I will need to go recreate them. This does not motivate me to do my little happy dance.

Other than having to redo a few pages I just finished, though, I cannot complain about EG 4.2, personally. With the FILTER & SORT and Query Builder, it looks more Access-ish.

So, what have we got here… a combination of SAS, SQL, Access, Excel and something that looks like the new ODS Graphics. SPSS users will find it WAY easier to move to Enterprise Guide than they would to SAS. Kind of like Esperanto, it has bits of everything to make it a little familiar to anyone who has experience with just about any fringe of data management and statistical software package. Except, unlike Esperanto, I think it will catch on. (You see, I used the Esperanto reference here rather than some breeding analogy so that no one could feel harassed. Except for maybe celibate people who speak Esperanto, but AFAIK they are not a protected class.)

Apr

24

Here is how the Wald statistic works: You divide the maximum likelihood coefficient estimate by its standard error and square the result.

If you wanted to be really specific about it, what you are dividing is the difference between the obtained coefficient estimate and your hypothesized value. I would say, though, that 99% of the time the hypothesis you are testing is zero, that is, that the independent variable has zero effect on the outcome variable. Since the coefficient estimate minus zero is the coefficient estimate, it is actually simpler, although somewhat less accurate, to state it the way that I just did.
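In code, the whole thing is a couple of lines. This Python sketch uses made-up numbers (a coefficient of 0.5 with a standard error of 0.25) and a function name of my own; the p-value comes from comparing the squared ratio to a chi-square distribution with 1 degree of freedom.

```python
import math


def wald_test(estimate, std_error, hypothesized=0.0):
    """Wald chi-square (df = 1) and its p-value for one coefficient."""
    z = (estimate - hypothesized) / std_error
    wald = z ** 2
    # Upper tail of a chi-square with 1 df, written in terms of erfc.
    p_value = math.erfc(math.sqrt(wald / 2))
    return wald, p_value


# Hypothetical coefficient and standard error.
wald, p_value = wald_test(0.5, 0.25)
```

Leaving `hypothesized` at its default of zero gives the shortcut version described at the start of this post.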

In my experience, people who use discriminant function analysis and logistic regression usually differ in their intent. Discriminant function analysis attempts to sort people into two (or more) groups. Logistic regression predicts the probability of an individual being in a specific group.

People who use discriminant function analysis are often interested in predicting, for example, who will drop dead of a heart attack and who won’t. If they find that 80% of those who drop dead can be predicted correctly, and 77% of those who don’t can also be predicted correctly using a combination of education, the Selye Stress Scale and how many times a year the patient eats liver with onions, then they are happy. (Topic for future research - why would anyone eat liver? It tastes totally gross.)
[Photo: Kenny and Edward]

People who use logistic regression are often almost as interested in the relative effects of the predictors as they are in the overall model. So, they are happy to know that the pseudo-R-squared is .35 but they are at least as interested in knowing that the coefficient for Stress is positive and substantially higher than education, while the coefficient for liver (no matter how gross it may taste) is non-significant.

From a statistical standpoint, the major difference between discriminant function analysis and logistic regression is that discriminant function analysis makes a lot of assumptions about the distribution of the independent (i.e., predictor) variables, specifically that these are normally distributed and linearly related to the dependent variable. Logistic regression does not make these assumptions.
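To make the "predicts the probability" part concrete, here is a rough sketch of what a fitted logistic regression does with one person's data. The intercept, the coefficients and the patient's values below are all invented for illustration; they are not results from any study. The `classify` function simply applies a cutoff to the probability. It is a loose stand-in for sorting people into groups, not an implementation of discriminant function analysis.

```python
import math

def predicted_probability(intercept, coefficients, values):
    """Probability of group membership from a logistic model.

    Computes the linear predictor (intercept plus the sum of each
    coefficient times its value) and passes it through the logistic
    function.
    """
    logit = intercept + sum(b * x for b, x in zip(coefficients, values))
    return 1.0 / (1.0 + math.exp(-logit))

def classify(probability, cutoff=0.5):
    """Turn a probability into a yes/no group assignment."""
    return probability >= cutoff

# Hypothetical model: predictors are education (years), a stress score,
# and liver dinners per year. All numbers are invented.
coefficients = [-0.10, 0.08, 0.01]
patient = [16, 30, 4]

p = predicted_probability(intercept=-2.0, coefficients=coefficients, values=patient)
print(f"Predicted probability: {p:.3f}, assigned to group: {classify(p)}")
```

The probability is the thing the logistic regression person cares about, along with the size and sign of each coefficient. The yes/no answer at the end is closer to what the discriminant function analysis person is after.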

So, for the person on SAS community who said that for the next Los Angeles Basin SAS Users Group they would like a discussion of non-parametrics so easy a hamster could understand it (BYOH - bring your own hamster) - this was the best I could do on a Friday afternoon.

And yes, I do know that is not a hamster, but all I had hanging around was a guinea pig named Edward G. Robinson and a spare cockatiel.

Apr

20

I will soon finish reading thousands of pages of grants and then spend a few days on grant reviews. A grant I have been working on is almost done. The semester is almost over. I have two articles under review that I submitted to journals. So… the question is, what’s next?

I thought about trying to make the deadline for the Western Users of SAS Software conference, but there was just no time. Besides, I have done dozens of conference papers. Like most people my age, I don’t even list them all on my resume. I just pick a dozen or so sample topics, each of which I have probably done five or ten times.

Here is what I am thinking about:
1. Writing a final article on on-line education for people with disabilities on American Indian reservations. This was one of the craziest ideas ever, since it included individuals with mental retardation and reading disabilities. How do you have a web-based course with people who can’t read, for crying out loud? And no, we did not use videos. When the results came in, we were jumping up and down excited. The data collection was completed over a year ago and I still haven’t written this up. Hey, I did two other articles, taught classes, reviewed grants, etc.

2. Writing an article or two on the ten years of data on training teachers of English language learners. This includes some really interesting qualitative data on what makes the best teachers. There is also the standard stuff on GPA and test scores. The main question is - what characteristics are shared by those teachers who are the best of the best, the type we remember 20 years later?

3. Writing up data on an after-school tutoring program for hundreds of kids, which at first glance seemed to have failed but which I think actually sort of succeeded. The data were a total mess when I received them, but what I THINK happened was that many of the kids went to tutoring only rarely, and those who did attend at least X hours showed improvement. The most interesting question here is to find X.

4. Analyzing qualitative data from interviews of 30 Native American parents of children with disabilities about how they first found out about their child’s diagnosis and the experiences they had with school personnel and other professionals.

5. Doing something completely different and working on a design I am interested in right now using a combination of social network analysis and proportional hazards models to predict the movement from casual use through abuse to compulsion for youth using alcohol and other drugs.

6. Writing a book on SAS Enterprise Guide as a tool for researchers.

Because I am clearly all over the map here and I have a lot of data that is not being used, I think what I might do is write the book and use each of 1-5 as an example problem. That way, I will have the first draft of part of each article written along with the book. It will also show how you can apply EG to lots of different research problems.

This undoubtedly makes me sound as if my research interests are all over the map, and they are. This doesn’t even include the evaluation reports I am being paid to do. Still, reading these grants, I recognize the names of some of the same people who have been doing the same type of research for the past 15 or 20 years. Some people might call it having a passion for the topic. I call it boring. I don’t care if I was on the French Riviera studying the impact of cocaine on beach-side sexual behavior of porn stars and my covariate was the quality of champagne sipped by the researcher while watching. I’d still be bored with it way before 15 years.

Disclaimer: I don’t know if porn stars actually vacation on the French Riviera, so if you go there and are disappointed, don’t blame me.

