# Excel statistics functions – simple answers to simple questions

Filed Under Software, statistics | 2 Comments

I have colleagues who hate Excel with a passion. Why, they demand to know, would ANYONE use Excel for statistics when there are so many options that are so much better? Actually, I don’t find the Excel add-on for statistics that terrible, but that isn’t even the topic of this post.

I use Excel because sometimes:

1. The data sent by the client is in Excel
2. I can use Excel to answer the question in less time than it takes me to open another application.

Here is an example from today: the client needed to know the average weekly income for a few categories. They were also concerned that the employees doing data entry might, in some cases, have inadvertently entered weekly or monthly income instead of hourly. Relevant fact – the organization did not have any CEOs making thousands of dollars per hour.

First, find the averages:

=AVERAGE(J2:J139)*40

This takes the average of all cells in that column from the second to the 139th and then multiplies that average hourly wage by 40. Click on the corner of that cell and drag across to get the average for each of the columns.

Second, find the standard deviation:

=STDEV(J2:J139)

Click on the corner of that cell and drag across to get the standard deviation for each of the columns.

Now, if your standard deviation is something like \$2 or \$4 per hour, you’re fine. If it is \$43 per hour, then someone entered the weekly salary for that column. For the one column where that was the case, I sorted it and, of course, was immediately able to spot the person with the incorrect value.
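If the data were not already sitting in Excel, the same two-step check could be sketched in a few lines of Python. This is only an illustration, not what I actually ran: the category names, the wage values and the \$10 cutoff are all made up, and `summarize` and `largest_entry` are hypothetical helper names.

```python
from statistics import mean, stdev

HOURS_PER_WEEK = 40

def summarize(wages_by_category, sd_threshold=10.0):
    """Return {category: (weekly_mean, sd, suspicious)} for lists of hourly wages."""
    summary = {}
    for category, wages in wages_by_category.items():
        weekly_mean = mean(wages) * HOURS_PER_WEEK  # same idea as =AVERAGE(range)*40
        sd = stdev(wages)                           # sample SD, like =STDEV(range)
        # sd_threshold is an assumed cutoff, not a rule from the post
        summary[category] = (weekly_mean, sd, sd > sd_threshold)
    return summary

def largest_entry(wages):
    """Return (position, value) of the largest wage, the first place to look after sorting."""
    position = max(range(len(wages)), key=lambda i: wages[i])
    return position, wages[position]

# Made-up data: the third "clerical" entry looks like a weekly figure typed as hourly.
wages_by_category = {
    "clerical": [12.0, 13.5, 480.0, 11.0, 14.0],
    "technical": [22.0, 25.0, 21.5, 24.0, 23.5],
}
summary = summarize(wages_by_category)
suspect_position, suspect_value = largest_entry(wages_by_category["clerical"])
```

The flagged column is the one to sort and eyeball; the code only points you at it, the same way the big standard deviation did in the spreadsheet.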

The whole process took me about a minute to give them the means for the different categories and say, “Oh, by the way, record 47 was incorrect, I fixed it.”

Obviously, no one sends me a data set just to get the means for a few columns and this was just one of 60 different questions they needed answered. The objection to Excel I have heard is that it is all some people know and so they use it for everything – “When your only tool is a hammer, every problem looks like a nail.”

That may be, but sometimes, you really DO just need to pull out a nail.

And that realization many years ago is how I overcame my prejudice against Excel.

# It must be a new meaning of the word “qualified”

Filed Under Dr. De Mars General Life Ramblings | 7 Comments

“Arthur: If I asked you where the hell we were, would I regret it?
Ford: We’re safe.
Arthur: Oh good.
Ford: We’re in a small galley cabin in one of the spaceships of the Vogon Constructor Fleet.
Arthur: Ah, this is obviously some strange use of the word safe that I wasn’t previously aware of.”

This is one of my favorite parts of one of my favorite books, Hitchhiker’s Guide to the Galaxy. It states beautifully a feeling I often have when someone is using words in English, which is a language I know well, and yet when I interpret each of those words in the usual way, I come to a conclusion that makes no sense at all.

For example, I have been repeatedly told that all of the innovations in tech fields happen with young people. I even read an article recently in which an IT manager from a company in India was quoted as saying that they bring in people at age 20 and after 15 years, by age 35, they are of less value than the new 20-year-olds coming in. This caused me to ask myself,

What the fuck? In India, does “people” actually mean “cars”?

Because I can understand that if you got a 20-year-old car and used it for 15 years then it would probably be ready for the junk heap. (Note: I am by no means suggesting that saying stupid things is limited to managers in India. That just happened to be the article I was reading. I am fairly certain that saying stupid things and pretending they are correct is an international phenomenon.)

Now, as Wendy said in Peter Pan, I am ever so much more than twenty. In looking back at what I knew when I was twenty, and at the students not much over twenty who I get the fun of interacting with on a regular basis, I am pretty darn certain that I, and the overwhelming majority of other technical people I know over 35, are one hell of a lot more qualified. All of us have worked with multiple operating systems and programmed in multiple languages, allowing us to see possibilities beyond the one language our young friends might know relatively well. All of us are very, very good in at least one aspect of a field, having 20+ years of experience in programming in Objective-C, statistical analysis, etc.

At age 19, I graduated from Washington University in St. Louis with my BSBA, so I think it is fair to say that I was relatively intelligent. I knew a bit of Fortran, a bit of Basic, had a couple of courses in Calculus and statistics, a bunch of economics courses, a smattering of finance and other good stuff. What I know now dwarfs what I knew then. Of course, computers have changed dramatically as well. Back then, we started out with punched cards, which we fed through card readers. By the end of my undergraduate career we had progressed to dumb terminals. I worked with Vax mini-computers and IBM mainframes for years before DOS came out, and the Apple II. I won’t bore you with what a long strange trip it’s been.

My point, and I assure you that I really do have one, is that everything I have learned in educational psychology suggests that people learn MORE when they have a context within which to make it meaningful. This is why it is always easier to teach students multiple regression than to teach them the idea of a simple linear regression, because you can relate to and extrapolate from what they learned previously.

So why, when it comes to programming or developing new technology, should everything that has been documented about how people learn go out the window? Now it is BETTER if you don’t have 20 or 30 years of related technical experience to connect to new knowledge?

Perhaps I need someone under 35 to explain to me how this works?

# Statistics Education and the Common Core

Filed Under statistics | 2 Comments

Most people probably have been thinking about Christmas preparations and not so much the Common Core Standards. I’ve actually been thinking about these standards a lot lately. The more I think about them, the less comfortable I feel.

Let me explain, first of all, that the common core standards are an effort to get states to agree that they will have the same things they are trying to teach at each grade. They also start with a good idea, addressing the common complaint that U.S. mathematics education K-12 is a “mile wide and an inch deep”. There are so many different topics kids are supposed to learn, they touch on each one before skipping off to learn about calendars or measurement of volume.

The problem is that statistics doesn’t come up AT ALL until the sixth grade. Now, prior to this, in some states, at least, kids in elementary school would learn some basic ideas of probability – like to differentiate between outcomes that are certain, probable, unlikely and impossible. I think most kids could understand this concept before the seventh grade – which is the first mention of probability.

I suspect if I asked my four-year-old granddaughter whether she thought she got any presents in red wrapping paper, and showed her a picture of this tree, she would guess that she did, which shows an intuitive sense of probability. If I asked her if she thought she got any presents in yellow wrapping paper, I’m pretty sure she would get a puzzled look and tell me, “No.”

I’m not suggesting one, non-random, non-representative child is the basis for national standards. I AM agreeing completely with the American Statistical Association post that stated,

“Instead of the K–12 standards document clarifying and providing a pathway to the statistics standards in the college and career readiness document, much of the statistics content that should be in elementary school and middle school has been pushed to high school.”

As someone who has been teaching statistics since 1985, it is hard for me to accept that knowledge is going to emerge full-blown, like Athena from the head of Zeus.

Now maybe people who came up with these standards went along the Piagetian route and said that children were concrete thinkers and they weren’t going to be able to understand abstract concepts until they were teenagers.

Maybe.

The curriculum seems disjointed to me. Kids learn about bar graphs in second and third grades and then pick it up again in sixth grade with a discussion of distributions. I understand the argument that the old standards often had kids doing the same thing year after year. I understand the desire to have children learn fewer things and learn them well, to attain “deep understanding”. What is still troubling me, though, is the thought that if they are going to get that deep understanding, then I surely hope they are going to have a LOT of time spent on statistics in sixth and seventh grades to present those topics that students have never seen before and don’t see again for years. Probability is covered in seventh grade and then eighth grade picks up bivariate relationships with no discussion of probability.

To be fair, I don’t recall hearing about the Central Limit Theorem until I was in college. Yes, my high school days pre-dated AP Statistics by a good two decades. I took Calculus, Analytic Geometry and Matrix Algebra in high school because those were the courses offered (it was a small school). I don’t know that I learned them all that well, but I think that had a whole lot more to do with the fact that I was more interested in skipping school and hanging out at the local fast food joint after doing various illegal things in the parking lot than with any inappropriateness in the standards or teacher quality.*

The standards are not all bad. I do like very much, for example, the concept of integrating what students learn in mathematics with other subjects, like in this example:

Use the equation of a linear model to solve problems in the context of bivariate measurement data, interpreting the slope and intercept. For example, in a linear model for a biology experiment, interpret a slope of 1.5 cm/hr as meaning that an additional hour of sunlight each day is associated with an additional 1.5 cm in mature plant height.

You can take a gander at the common core standards here. Tell me what you think.

* True story: My math teacher was a conscientious objector to the Vietnam War and teaching in my urban high school for “troubled youth” was his alternative service. I’m not sure there were not days he would rather have been in the jungle, but that was our fault, not his.

# Parenting: Nobody really knows what they’re doing

Filed Under Dr. De Mars General Life Ramblings | 5 Comments

These days, I cannot turn around without seeing a billboard, poster or article with my third daughter’s name on it. She was the first woman signed by the UFC, and will be the first woman to headline a pay per view event in mixed martial arts.

Interestingly, earlier this year, I wrote a post with the tongue-in-cheek title “Why American Mothers are Superior”, in response to an article in the Wall Street Journal about why Chinese mothers are superior, profiling Amy Chua and her book that lectured American parents on why their children are so inferior and they are not getting into Harvard because they don’t have that Chinese mothering, so there!

I got a lot of haters, both here and on Tech Crunch, where the post was also featured. Many of them asked who did I think I was, after all Dr. Chua’s daughters are much more accomplished, what did mine ever do? Besides graduating from NYU at 20, winning emerging journalist of the year, teaching at Tufts, graduating from USC, getting a scholarship to a top prep school, winning a world title – not much.

Now that the four darling daughters are all doing well, everyone is emailing me and asking for my advice. I even have a book of my own coming out in 2013. Never fear, it’s not on mothering. It’s not even on statistics. It’s on matwork for judo and mixed martial arts.

As darling daughter #3 sometimes goes around the house singing,  “How you like me now?”

So, what is different? Seriously, if my oldest daughter wins the Pulitzer next year, does that make me qualified to tell everyone else how to run their life and if their kid doesn’t get into Harvard or win the Olympics they suck as a parent?

The truth is, nobody really knows for sure if they are doing the right thing. My daughters often tease me because I always wear a medal with St. Jude, the patron saint of desperate causes. Every parent feels desperate sometimes, I think.

With the economy tightening, more and more people see getting into a “good” college as the ticket to a “good” life. I’m very empathetic. The Spoiled One is attending boarding school because we thought it was a great opportunity for her, in no small part because  the academics are stellar. When she made First Honors the first quarter (3.6 – 3.99 GPA), I wanted to know why she did not make a 4.0.

Unlike her roommate, who is, coincidentally, from China, Spoiled-a-roo studies less and plays more. On Tuesday, I watched her and a couple dozen of her friends play soccer for two hours. No referees, no adults on the field except two coaches who were there to chaperone. The kids switched sides when they felt the teams were uneven, screamed and cheered themselves when they made goals. I watched for a while but I was told, “Mom, parents don’t watch scrimmages. We’re PLAYING.”

I made her bring her books home, and I have been speaking to her in Spanish every day because I think Spanish is one of her poorer subjects (as in she might possibly get a B+ in it), and I am NOT happy about that. On the other hand, she went to the Posada at church last night and tonight she went to Sky Zone with her friends and jumped around on a trampoline for an hour.

Was that right? Should she have been practicing soccer drills or studying geometry? God knows she has that home situation that is supposed to be what kids need to succeed in mathematics and science: a female role model in a technical career, two parents with graduate degrees ready to help her with any homework problem, a dad who has been buying her electronic kits and science books since she was in preschool.

If she does not get all A’s this semester, beginning of her freshman year, then maybe she won’t get into Harvard or Stanford. I realize that I don’t care. If it means my child does not have a childhood, does not get to play, then maybe that’s not the place for her to be. Or maybe I am ruining her chances to be the first female president.

I think there are some near guarantees of how to NOT raise a child – that is, don’t beat your children, don’t beat your spouse, don’t be an alcoholic, don’t be homeless. There are some obvious things to do – read to them, hug them, tell them you love them. Beyond that, there is not yet the killer prediction equation that predicts success with 100% accuracy.

In large part, I believe, that is because the definition of success varies.

As one of my favorite poets, e.e. cummings, said,

“It takes courage to grow up and turn out to be who you really are.”

# I Purely Love Open Access Journals

Filed Under Open data | 8 Comments

In my copious spare time, of which I have none, I teach in the doctoral program at a nearby university. They want me to use the library and keep up on research, both because it looks nice in the alumni newsletter and also so that when students ask me questions about current technologies or findings, I don’t shrug and say, “Your guess is as good as mine.”

That sort of thing makes students wonder whether getting a PhD is really worth going into debt for the remainder of their lives.

So, the university kindly pays money to a whole bunch of different publishers just so usually ungrateful people like me can engage in such use.

The authors of the research would also like people like me to use their research. They’d like to be cited, because that helps make them look good to funding agencies and tenure review committees. They’d like to think that their Uncle Bob was wrong, that they are not wasting their lives studying something as practical as how many angels can dance on the head of a pin, and that people who are actually teaching school or designing products will use their work to make the world a better place.

What’s the problem? The problem is that between the authors of the research, who probably did not get paid, and the university library, which paid for access, there are a number of barriers thrown up by publishers. Here is what happened yesterday:

2. Go to the library web page and search ejournals for the articles I need
3. Find article, click link to go to year
4. Click link to go to issue
5. Click article
6. Get taken to publisher page
8. Read half of the article – get called away for meeting
9. Come back to find out I have been logged out due to being away from the computer. Go through steps 1-7 again.
10. After answering a couple of calls from clients and students, find I have been logged out due to inactivity. Go through steps 1-7 again.
11. Finish first article, go to second article, which is published by different publisher
12. Find out that even though I have a university id that I have now logged in with twice (not counting the two previous times I was logged out) I need to register for an account with this publisher and log into that
13. Register, log in, read 20 pages. Eat lunch. Come back to find I have been logged out and now need to log into the campus account, go to the library web page, go back to the article and log into the publisher account AGAIN

There was more, but you get the idea. If I can, I download the resource to my computer, but often the number of pages I can download is limited. The crazy thing is that all of this is required from someone who has paid access to the articles.

DIRECTORY OF OPEN ACCESS JOURNALS TO THE RESCUE !

I recalled reading a draft of an article my brother had written and when I told him I’d rather blog because then people could at least read it, he mentioned he was publishing it in an open access journal.

My first stop was the Directory of Open Access Journals and now I am in love.

It was amazing. First of all, they had one journal that had lots of articles that were exactly what I wanted. The Journal of Research in Rural Education, if you are wondering. I got the wild impression from this journal’s website that they actually wanted me to be able to read the articles. Here is the unbelievably crazy thing that happened. I was able to search on the terms I was interested in, 150 results were returned, and when I clicked on a link — IT OPENED WITH THE ARTICLE.

When I went to eat dinner and came back, an amazing feat of technological innovation had occurred – THE ARTICLE I HAD BEEN READING WAS STILL THERE! Apparently, unlike the other publishers, the folks at JRRE are not concerned that a band of roaming article sneak thieves prowling the rough neighborhood of Ocean Park will break into my office while I am having my jambalaya and read research on mathematics education in rural contexts without paying \$9 per article.

The effect these overly zealous firewalls have had on me personally is a definite preference for anything that is open access. I did request a few articles via interlibrary loan and I’ll probably order a book or two. Given that the university is less than 10 miles away, it’s faster for me to pick up the material in bulk in print than to read it online – which is just nuts!

There are a few articles and books I wanted because they were very specifically related to the work I am doing. However, for 90% of it, one article on fidelity of implementation measures is as good as another. So, the result will be that the work in the open access journals will get used and cited and the rest will not. I suspect we are seeing the beginning of a trend here.

# Can online learning make you more productive?

I’ve always been a bit skeptical of online education. I think I’m a good instructor. I know my subject extremely well and put a lot of time into preparing lectures, class activities and assignments. Having done some online classes, I have found it harder to gauge if students are confused or bored. Those hand-raises on GoToMeeting just aren’t enough for me.

Two things started to change my view. The first is using videos to teach my statistics students how to use SAS Enterprise Guide and SAS Web Editor. We no longer have the luxury of six hours for a statistics class – a three hour lecture followed by a three hour lab. So, I have been making videos. A number of students have reported they found those helpful for reasons ranging from having English as a second language, to being able to stop and rewind the video to hear something over.

The second is personal. I saw this awesome TED talk a while back about trying something new for 30 days. Last month, I decided to try blogging every day for 30 days. I do two blogs, this one and one on judo and just matwork and grappling in general. I think I managed to blog all but one day when work was just too hectic. This month, I decided to try something else and I was torn between exercising every day and learning more about JavaScript/jQuery every day. I exercise pretty regularly, plus, I really wanted to get better at JavaScript, so I decided on that.

However, I am really busy at the moment and it has been getting harder and harder to find the time to exercise. I’ve been spending all day sitting at my desk writing proposals, grading papers, writing programs.

The Rocket Scientist has been listening to lots of videos from the World Wide Developers Conference. I thought that was a good idea but I just have a hard time sitting still watching anything. I probably watch a total of five hours of TV a week and half of that while riding the exercise bike.

Then it dawned on me – I could watch YouTube videos on the TV in the living room.

So, now every morning I get up and ride the exercise bike for 40 minutes while watching one video after another.

There is another argument in favor of on-line learning. Even if it is not optimal compared to in class learning – and I must admit a lot of the videos move pretty slowly – it allows you to learn any time anywhere (yeah, I know I did not coin that phrase).

More than that – I really have a hard time sitting still and passively learning, which has made it difficult for me to just watch videos.

So, now I am literally not sitting still, I’m riding a bike. I’m curious if this method would help students with Attention Deficit Hyperactivity Disorder.

It’s worth a try. Has anyone tried it?

# Should we go back to teaching programming?

Filed Under Dr. De Mars General Life Ramblings | 5 Comments

“I teach statistics to people who don’t want to learn it.”

This is my cocktail party response to people who ask what I do for a living. Even though I usually only teach one course a year, it is a quick way to answer the question and get back to drinking.

Having faced up to the fact that students in education, social science, business – really, any major but statistics – really don’t want to learn statistics, I thought years ago, when all the point-y click-y interfaces came in, heralded by SPSS and shortly followed by SAS Enterprise Guide, the Excel statistics add-in, Statistica, Stata and other offerings too numerous to mention, that it was a good thing because it made my students’ lives easier. Now, at least, they did not need to learn programming to get their statistics.

Having used both SAS Enterprise Guide and SAS syntax with the web editor this semester and looked at them side by side, I have come to think that it is not that much harder to teach programming. Enterprise Guide DOES make it easier to understand what is going on, such as which variable is the dependent variable, because it’s labeled.

Today, I read this in an article with the obvious title “Teach US kids to write computer code”, and it really made me think:

“Programming a computer is not like being the mechanic of an automobile. We’re not looking at the difference between a mechanic and a driver, but between a driver and a passenger. If you don’t know how to drive the car, you are forever dependent on your driver to take you where you want to go. You’re even dependent on that driver to tell you when a place exists.”

I’ve been teaching since 1985. For seven years of that, I was full-time, tenure track – the five years before and fifteen years after that I taught as an adjunct. In all of that time, social science, education and business majors have not – generally – been that excited about learning to code. I took my first two programming courses as an undergraduate business major, because they were required. God bless whoever at Washington University in St. Louis back in 1975 decided that would be a good thing for students to know.

They definitely did not have the concept of students as customers, but as students. Some faculty committee decided that whether students wanted to learn programming or not, they should, because it was good to know.

# Local SAS User Groups, mostly in Tweets

Go to your local users group. If you don’t know if you have a local users group in your area, check the sascommunity.org page that lists bunches of them. There are six in California listed on their site and I heard of two others that started very recently that aren’t listed.

LABSUG is the Los Angeles Basin SAS Users Group and it is pretty typical. It only meets once a year, organized by the FABULOUS Kim Le Bouton. If you live around LA, you should go. It is super-cheap at \$35 for early registration, you get to meet about 100 people who are interested in SAS and statistics and the speakers are good.

If you are a hyper-critical type of person, well you can find something to criticize, but don’t sit next to me. Since there is only one session at a time, you may find some sessions too advanced for you and others too basic. I have two suggestions in this case:

1. Try to benefit in some way. Even if it is way too advanced, you can probably glean something. Then, when a year or so down the road you run into that concept or procedure again, it won’t be completely unfamiliar. If it is too basic, if you have forgotten more SAS/ statistics than most people will ever know, there may be something in there you have forgotten. If you are that advanced, you probably present or teach a lot yourself. Personally, I’m always on the look out for good tips, from references to visuals to organization, that help get a point across and keep the audience from falling asleep.
2. Do something else during that talk. Come late or leave early – the agenda is published in advance. If you need to step out and check your text messages, send an email to the office or catch up on work, no one is going to get upset. It’s not middle school. You don’t have to go to every class. We’re all adults and understand everyone has multiple responsibilities.

On the flip side, although not every talk will meet everyone’s interest or need, almost everyone will find at least ONE topic that is useful. It’s something for everybody and the great advantage of local users groups is their accessibility to everyone. You don’t need thousands of dollars in your travel budget.

What you missed, in tweets

(Not only is this an extremely lazy way to do a blog post, but it also accomplishes the main purpose of this blog which is to remind me of stuff I thought and then forgot. For example, the COMPARE statement and looking up what a segment is.)

In reverse chronological order…

Ods HTML gpath=”something” – will save your graphs in the specified directory. nice

I’m thinking of making a bubble chart that looks like soap bubbles because #maturityIsOverrated

Ods journal style good for graphs that are going to be printed in black and white

With Compare statement with sgscatter you can, for example, have side by side plots of your experimental and control groups

I created each of these plots with just 3 statements – & 1 of the statements was “run” – Lora Delwiche

Lora Delwiche just made everyone in the room a believer in SGPLOT

Renato at LABSug worth knowing the difference between if-then & SELECT statement

Take-away from GTL presentation – you can make any kind of graph you can imagine- whether you should or not is a different issue

If I was doing a talk on graphics I would interleave program statement slides with slides of what this does on the graph

Proc gproject projects data into a Cartesian coordinate space – who knew?

I don’t know what segment does in the maps data set. Must find out

I think if I needed a graph as fancy as some of those in the GTL examples I’d have an artist draw it vs use SAS

I understand the R comment – GTL looks more like “real programming” than typical SAS code. not sure that is good

Interesting population pyramid example at LABsug comparing population distribution of Qatar & US convinced me of use of GTL

Just overheard someone comment that GTL looked like R

Use proc sgrender to put data and template together

GTL = graphics template language to make spiffy graphs A reason to get SAS 9.3

Ods graphics editor – stand alone free install from SAS website? Must check on this

Off to LABSUG I’d call my mom on the drive in but if she hears from me at this hour she’ll wonder who died

If you want to learn about LABSUG, you can find out more on the sascommunity.org site Los Angeles Basin SAS Users Group page

# Semi-programming as a way to simplify life

Filed Under Dr. De Mars General Life Ramblings | 3 Comments

More about the Los Angeles Basin SAS Users Group (LABSUG) later, but I did want to mention one tangential point from the first presentation. It was on the Graphics Template Language (GTL). The first example was pretty cool, looking at the population pyramid by gender and year for the United States, then for Qatar at the same time. This is obviously hundreds of numbers to plot – age by gender for two countries for two years.

You can read how to make this here, in the paper presented at SAS Global Forum this year, Off the Beaten Path.

So, this plot was cool, but there were others that I looked at the plot and thought, “Gee, I’d just have an artist draw that.”

My take-away from that session was that you COULD graph anything anyway with GTL. Whether it was worth the effort or not is another story.

This brings me to “semi-programming” which is a term I just made up. (Not to be confused with semi-infinite programming, which is actually a thing, or quasi-programming, which is what I was going to call it until I found out that was another actual thing.)

Semi-programming is when it is simpler to program half the solution and do the other half some other way.

SEMI-PROGRAMMING EXAMPLE #1

Let’s say your boss wanted you to create a bubble plot of the relative sizes of all the 14 most popular animal species in the Santa Barbara Zoo and have the bubble size be relative to the mean weight of animals of that species. I’m sure you could do something with SAS or some other program with a gazillion programming statements to draw capybaras and parrots and other species on your chart. Or, you could do the bubble plot, find the 14 animal species photos on wikipedia, paste them on to the plot, dragging each photo to be the right size to just fit over that bubble. You could probably do the whole thing with three statements, a few minutes on google and some copying and pasting and be done in half an hour.

Spare me the question,

“What if you have to do it again?”

My initial reaction is, then you quit because your boss is an ass if she regularly asks you to do stuff like that. Seriously though, you could create a bunch with the semi-programming method in the time it would take you to do ONE purely programming.

SEMI-PROGRAMMING EXAMPLE #2

Not convinced? I had another example today. I want to merge my pretest and posttest groups. Because the subjects are very valuable, I don’t want to lose a single one. Unfortunately, our subjects are children and due to very strict concerns about confidentiality, we have no personally identifying information. We merge the data sets by username, by grade, by school. In a few cases, the kids did the test twice. It’s online. They accidentally submitted the test, then realized they had skipped a question or two, so they opened it again and continued on the test. (There’s a reason we want them to be able to do that.)

So, we have two identical records for that child.

Sometimes, though, there are two usernames because kids at two different schools had similar usernames and one mis-typed it. So, your username might be MightyMan and mine is MightyMean and I accidentally left out the “e”. So, it is NOT the same kid twice. One way to see if it is a real duplicate versus an error is to sort the data sets by school, grade and then username and merge them – but what if the student didn’t enter the name of the school, which a few did not? Or entered the wrong school? (Yes, that happened.)

I thought about all kinds of complicated solutions to this until it occurred to me that there were probably no more than 5 or 6 records that were really a problem. So, here is my semi-programming solution.

1. Write a program that does this:

• If there is no duplicate (it is the only student, username, grade combination) output that to one data set
• If there is a duplicate, output those records to the other data set

2. Look at the 5 or 6 records that are duplicates. Identify which are the same student twice, in which case, delete the incomplete one. If it is a case of an error, see that MightyMean is really at school number 2, and enter an e in the username.

Since I expect no more than half a dozen of these, it will probably take me less than a minute to fix them manually, after I have my little program to spit out the problem children.
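Step 1 above can be sketched in a few lines. This is a minimal illustration in Python, not the SAS program I actually ran, and the field names (`username`, `grade`, `school`, `answered`) and the sample records are made up for the example. It assumes a record counts as a possible duplicate when its username and grade match another record, leaving school out of the key since some kids skipped it or got it wrong.

```python
from collections import defaultdict


def split_duplicates(records, key_fields=("username", "grade")):
    """Return (unique, duplicates) based on the combination of key_fields."""
    groups = defaultdict(list)
    for rec in records:
        # Group records that share the same key combination.
        key = tuple(rec[field] for field in key_fields)
        groups[key].append(rec)

    unique, duplicates = [], []
    for recs in groups.values():
        # A key seen once is clean; a key seen more than once needs a human look.
        if len(recs) == 1:
            unique.extend(recs)
        else:
            duplicates.extend(recs)
    return unique, duplicates


# Hypothetical sample data: one child who submitted twice, one who did not.
records = [
    {"username": "MightyMan", "grade": 5, "school": "1", "answered": 20},
    {"username": "MightyMan", "grade": 5, "school": "", "answered": 18},
    {"username": "ParsleyFan", "grade": 4, "school": "2", "answered": 20},
]

unique, duplicates = split_duplicates(records)
for rec in duplicates:
    print(rec)  # the handful of records to review by hand
```

The second half of the job, deciding which of those few records to delete or correct, stays manual. That is the whole point.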

# Logistic regression using SAS On-Demand with SAS Enterprise Guide – a movie and a rant

Filed Under Software, statistics, Technology | 1 Comment

If you have a mad desire to do logistic regression with SAS On-Demand with SAS Enterprise Guide, here is a movie that shows how to do it. It is a .avi file so you may want to just download it and run it on your PC.

Here is why the movie is not all that good (Grrr): SAS On-Demand does not run on a Mac. Unfortunately, QuickTime does screen capture video in the Mac version, but on Windows only the professional version does that. I used Debut Video Capture on Windows, which I actually paid for. I made one movie, made a mistake in the middle of it and the guinea pigs were raising a ruckus because they wanted parsley. You could hear them squeaking all through it. So, on the second try, when I was doing logistic regression, the sound track was about 15 or 20 seconds ahead of the video! As you were listening to the video, you were seeing something different on the screen! That was annoying. So … this third video is a bit sparse.

I also ran tasks before I did the video so I did not have to wait forever for them to run. I ran it on this old, old Windows machine we use for testing because I did not want to take the time to re-boot my shiny new 12GB RAM Mac into Windows. That was stupid. It would have been quicker to re-boot the Mac than to re-do the movie twice. Also, my Mac has a wired Internet connection so it is much faster all the way around.

Lessons learned today, in addition to logistic regression:

1. When using SAS On-Demand, use the fastest computer

2. When using SAS On-Demand, use the fastest Internet connection

3. Get the Windows version of QuickTime to replace Debut Video Capture (there are other reasons I don’t like it, chief among them being that the default format is .avi and if you change it to some other format, it does NOT remember that)

I have had SO much more of a positive experience with the SAS Web Editor – runs on a Mac, faster, no install problem – that I wonder why I ever used SAS Enterprise Guide to begin with. Actually, if you are running it in your office or home with a good Internet connection on a good computer, it’s not too bad. Not only is the lack of programming attractive to many students, but from a learning-statistics standpoint it is kind of nice that it is in your face with the “Dependent variable”, “Classification variable”, “Quantitative variable” distinctions.

Most of all, though, I remembered how clunky SAS Enterprise Guide for the desktop was when it first came out and now I find it very useful, so I am HOPING this will be the direction for SAS On-Demand EG as well. Personally, the single biggest improvement I hope for is that it starts to run on the Mac. The simplest way for that to happen would be if it just ran as a client like the web editor does. Here’s hoping.
