As I mentioned yesterday, banging away at 7 Generation Games has led to less time for blogging and a whole pile of half-written posts shoved into cubbyholes of my brain. So, today, I reached into the random file and coincidentally came out with a second post on open data …


The question for Day 11 of the 20-day blogging challenge was,

“What is one website that you can’t do without? Tell about your favorite features and how you use it in teaching.”

Well, I’m a big open data fan and I am a big believer in using real data for teaching. I couldn’t limit it to one. Here are four sites that I find super-helpful:

The Inter-university Consortium for Political and Social Research has been a favorite of mine for a long time.  From their site,

“An international consortium of more than 700 academic institutions and research organizations, ICPSR provides leadership and training in data access, curation, and methods of analysis for the social science research community.

ICPSR maintains a data archive of more than 500,000 files of research in the social sciences.”

I like ICPSR, but the data are often a little dated. Generally, researchers don’t hand over their data to someone else to analyze until they have used it as much as their interest (or funding) allows. On the other hand, it comes with good codebooks and often a bibliography of published research. As such, it’s great for students learning statistics and research methods, particularly in the social sciences.

For newer data, my three favorites are the U.S. census site, CHIS and CDC.

The census.gov data resources section is enough to make you drool when it comes to open data. They have everything from data visualization tools to enormous data files. Whether you are teaching statistics, research methods, economics or political science – and whether you’re teaching middle school or graduate school – you can find resources here.

 

Yes, that’s nice, but what if you are teaching courses in health care – biostatistics, nursing, epidemiology? Whatever your flavor of health-related interests, and whether you want your data and statistics in any form from raw data to publication, the Centers for Disease Control and Prevention Data & Statistics section is your answer.

Last only because it is more limited in scope is the California Health Interview Survey site, where you can download public use files for analysis (my main use) as well as get pre-digested health statistics.

It all makes me look forward to diving back into teaching data mining this fall.

From the random file — I’ve been super-busy working on our new startup, 7 Generation Games, and Darling Daughter Number Three had to defend her world title again, which distracted me a bit. So I have a bunch of half-written posts that I thought I’d just put up at random, for the same reason I do everything else on this blog: the hell of it.

902q798q467453q965pq86-34q9e’w5wi34ytrsghsf.ksfbcmn  - random!

I spend some time playing with other people’s data for a whole lot of reasons – for students to analyze as a learning experience, because I’m interested in a problem addressed by the data, and to create presentations for elementary schoolchildren showing what one can learn from statistics.

Here are a few tips that may make your life easier:

Read the user’s guide. Most of all, check to see whether this is a random sample. If you are just using the data to teach your students how to compute a t-test, then it really doesn’t matter whether it is a completely random sample or not. However, if you are going to draw any conclusions from the results, make sure you know whether the data should be weighted, stratified, or just really not used to generalize to the population at all (a sketch of how weights and strata enter a SAS analysis follows this paragraph). If your sample consists of actuaries who are also equestrian competitors, I’m afraid not too much generalization should occur. (Don’t write and tell me about your horse, Beau, and how the two of you are exactly representative of the state of Vermont. You’re not, and I don’t care anyway.)
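Here is a minimal sketch of what that looks like in SAS, assuming a public use file that comes with survey design variables. The data set name, the weight, strata and cluster variables, and the analysis variable are all made up for illustration – check the user’s guide for the real names.

proc surveymeans data = mydata.pubfile mean stderr ;
weight wt_final ;  * the sampling weight supplied with the file ;
strata stratum_id ;  * stratification variable from the survey design ;
cluster psu_id ;  * primary sampling unit ;
var bmi ;  * whatever variable you are actually estimating ;
run ;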

Much of the open data I work with comes in very large data sets, and I spend several hours trying to get a feel for the data before I do much with it. If I’m going to use the same data set for a course with a lot of students, I’d like it to have lots of variables, many of them numeric, so the students can combine them into scales, do a factor analysis or put them to other quantitative uses without all ending up using the same few numeric variables. They can have a little individuality in their research question and design.

One way to find the number of numeric variables in a data set using SAS:

data testmiss ;
set in._500family ;
array allnums {*} _numeric_ ;  * one array holding every numeric variable in the data set ;
x = dim(allnums) ;  * x is the number of numeric variables ;
run ;
proc means data = testmiss ;
var x ;
run ;
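For a quicker first look, a plain PROC CONTENTS also lists every variable with its type, so you can see at a glance how many are numeric versus character:

proc contents data = in._500family ;  * variable names, types and lengths for the whole data set ;
run ;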

+++ Equally Random +++


If you buy the beta for Spirit Lake now for $9.99, you’ll get our version 2.0 for free in May. It will be good. I’ve been working on the newest game, Fish Lake, for the last two weeks, but soon I’m going to swap with The Invisible Developer and do nothing but work on Spirit Lake for another few weeks.

 

I read this recently in a PowerPoint that came with a research methods textbook:

If answering the study question adequately requires the use of elaborate analytic techniques, invite a statistical expert to serve as a collaborator and as a coauthor on the resulting paper.

 

I was so non-plussed that I looked up the word non-plussed in the Merriam-Webster dictionary:

a state of bafflement or perplexity  … to be at a loss as to what to say, think, or do

 

Um, this is a research methods COURSE, for graduate students, no less, and your advice to them in doing their research is that if it gets too hard they should find someone else to do it for them? This is not limited to one text or one school, either. At two universities where I have worked, both of which are well-respected and grant doctoral degrees, doctoral and post-doctoral students are asked before beginning their research,

Do you have a statistician?

In teaching statistics, I have been asked by doctoral students from multiple disciplines,

Why are we learning this when we are just going to have a statistician do it for us?

Obviously, research no longer means what I thought it meant. I thought that the process of research was that you formulated a question that interested you, you read the scientific literature on that question, generated a hypothesis, collected data from a sample, analyzed that data, evaluated your results and wrote a conclusion. Now, not only is it acceptable, but encouraged to have someone else analyze your data and tell you what it means. I find that perplexing.

This is not how it was when I was in graduate school. Back then, data were analyzed using computer software that ran on mainframe computers, or sometimes mini-computers. A mini-computer was not like an iPad mini. It was taller than me. Some of our data came on tapes, which we had to walk across campus and load onto the tape drives ourselves, as we were lowly graduate students and it was assumed we had nothing better to do with our time. Fortunately, I was at the University of California by then and no longer at the University of Minnesota, where crossing campus could require skis. I did some consulting, writing code for the data analysis for my fellow students, enough that the dean called me into his office, asked me what I was doing and gave me strict instructions, along the lines of

You can write their programs for them, since this is not a computer science Ph.D., but that is all. You are not to comment or assist in any way with research design, writing their data collection instruments, choosing what analysis to do nor interpreting that analysis. When a person receives a Ph.D. from this university it is supposed to mean that they know how to conduct research, not that they know where to find someone to pay to do their research for them.

It seems that the tables have turned quite a bit. Even my least quantitatively oriented classmate back in the 1980s was probably equivalent to the average “statistical consultant” today. That is, they passed at least four graduate level statistics courses that required both a paper and a final exam with questions like, “How is Analysis of Variance related to stepwise discriminant function analysis?” This was true whether your Ph.D. was in education, business or psychology, because it was assumed, for example, that if you were going to place students in special education because they scored two standard deviations below the mean you should have a definite understanding of what a standard deviation was, what a normal distribution was and where two standard deviations fell on that distribution. Furthermore, as a superintendent or other school administrator, it was expected that you could evaluate the research literature (hence being skeptical of stepwise methods of all types).

It seems to me that what is required for a doctoral degree has been significantly watered down.

This chart shows the growth in doctorate-granting institutions from 1920 to 1999. The trend has continued. When I entered graduate school for my Ph.D. in 1985, there were 337 doctorate-granting institutions in the country. Now there are 418 – a growth of 24% over the past 29 years, on top of what had been, as you can see from the chart, a pretty steep growth rate for the 25 years or so prior to that time.

Who is teaching all of these new doctoral students? Well, in many instances, it is a horde of very part-time adjuncts. I don’t think adjuncts are necessarily poor teachers – in fact, I make it a point to teach at least one course a year myself – but I am aware of doctoral programs that are run with only ONE full-time faculty member. Given the paucity of human resources, it is no surprise that there is no one around to individually mentor the students in their research. Now, we are entering an era where those students who are graduating with very little research experience are themselves teaching doctoral students. It is a case of the very near-sighted leading the blind.

All of this is making me wonder where they are going to find those statisticians and how well-trained they are really going to be. I just finished with what will probably be my last student project for the next few years – no reflection on that student, or the other four students I worked with over the past two years, all of whom were a perfect delight – but my schedule is completely booked through October, 2015. Almost all of the really good statisticians I know are in the same boat.

I don’t have an answer to any of this. I am non-plussed.

drinking for science

There are multiple reasons that I haven’t gotten around to Day 10 of the 20-day blogging challenge. Part of it is that I have been really busy, and the other part is that I read this topic,

“Share ideas that your classroom uses for brain breaks and/or indoor recess”

and I thought

I got nothin’

Anyone who knows me well can tell you that I am NOT a very fun person. I like to think that I have some good qualities, but playfulness is not among them. Ph.D., world champion, founded or co-founded a few companies, publish scientific articles – does this sound like I spend a lot of time playing frisbee in the park? No, I didn’t think so. About the closest I come to this in class is on the first day, having everyone introduce themselves and talk about their research interests – which is not really very close, I must admit.

For the last SAS assignment of the Public Health Research Methods course, I decided to make a video and upload it to YouTube. For one of the dependent variables, I used how often in a year a person engaged in binge drinking, defined as 4 or more drinks per day. I’ve probably had four drinks in a day a few times in my LIFE, so I was surprised to find that, across more than 40,000 respondents, people reported doing this an average of 2.4 times per year.
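For what it’s worth, getting that number out of SAS is a one-proc job. A rough sketch – the data set and variable names here are made up for illustration, not the actual survey file:

proc means data = mydata.health_survey n mean median maxdec = 1 ;
var binge_times_per_year ;  * times per year the respondent reported binge drinking ;
run ;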

Today has been a really frustrating day. Yesterday, after a margarita at dinner, I came home and was working on our newest game, Fish Lake, and everything was progressing smoothly. Today, for both The Invisible Developer and me, it has been just beating our heads against the wall. For example, I have this PHP script that ran intermittently today – I have three records written to the database – and all of the rest of the times, it failed with an error. The I.D. has been having similar problems.

I took a break and made a video on how to do simple statistics with SAS, to test the hypothesis that I could do a screen recording with QuickTime, write a program using SAS On-Demand in Firefox, record the audio in GarageBand and drink Chardonnay all at the same time, because Vons had a half-price sale on wine over $20 a bottle and, well, you know – science.

You can determine if my hypothesis – whatever the hell it was – was supported. Bizarrely, the equals signs do not show up in the video. How weird is that?

It’s been a really productive two weeks in North Dakota, installing our game on two reservations, in both tribal schools and public schools. But I didn’t write this post to talk about that. Rather, in keeping with some of the really useful posts I’ve read about start-up failures, I wanted to share with you the one thing that didn’t go right this week.

I just spoke to the Chief Marketing Officer for our 7 Generation Games start-up, and she told me we did not get accepted to the playco lab accelerator. She felt bad about that, since she really does think we are a terrific company, we already have traction with games installed in schools and paying customers, and the fact that she lives in the Bay Area meant it would have been very convenient for her.

The Invisible Developer and I had mixed feelings on this. On the one hand, we think our company is awesome and going to be incredibly successful.

We’re really pleased with the work we do, and an accelerator (or anybody) saying they don’t want us gives us kind of the same reaction as when somebody calls your baby ugly – how dare you!

On the other hand, we’re just coming back from two weeks away from our cocoon-like home offices by the beach, and The ID has said approximately 2,982 times that he doesn’t like to travel. When I told him that Maria had called and said we were not accepted, he did a reasonably good job of hiding his glee.

I don’t mind travel, but I am mindful of the sage advice I received from Jenny Q. Ta of sqeeqee to never give up half a percent of your company before you have to, and we are not at the point where we need outside money. Although having some validation from an outside group might be nice, and access to a network within the accelerator might have helped our marketing, we still have 19 months of SBIR funding, as well as our own funds from The Julia Group. All of that being said, yes, it does bother me that we didn’t get it, because I think we’re awesome and I want everyone on earth to share that view. Also, whenever I read these articles about people who cannot find start-ups outside Silicon Valley / with female / minority founders and who are supposedly really looking, I think, “Gee, we must really suck because we are all of those things and they don’t want us.”

Then reality sets in. I am a statistician, after all, and not too many people know regression lines better than me. (Yes, you may be a far better statistician than me, but 99.99% of the population has no clue what a residual error is, and the fact that I just made that percentage up makes it no less true. Try to parse that statement for a moment.)

Years ago, I was listening to (okay, eavesdropping on) a “top executive” at a Fortune 500 company who was discussing his next career move. He said,

“It has to be perceived as bigger, better and then I’m still on the path to CEO. If it’s seen as smaller, worse, then I’m fucked.”

Maybe that is true in his career path, but in my experience as a statistician, small business owner and human being, life seems more like a regression equation. Even though you may have a straight line in one direction or another, there are ups and downs. Take, for example, the regression line I just happened to have lying around, with 100 data points:

r = .70

Overall, the trend here is very positive – about .70, to be precise. If you looked at either of the two low points shown with arrows, you’d say, holy shit, the trend is really going down, I’m failing. In fact, though, if you compare them with the initial low point, you can see that each of these new lows is higher than the previous one.

In my life, I have seen far more trends like this one (if you are lucky) and really none where every single point falls on the regression line.
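If you want to see this for yourself, here is a small simulation in the same spirit: 100 points with an underlying upward trend buried in plenty of noise. The data are simulated, not the points from my plot.

data trend ;
call streaminit(2014) ;  * fix the seed so the run is repeatable ;
do t = 1 to 100 ;
y = 0.5*t + rand('normal', 0, 15) ;  * a steady climb with a lot of random ups and downs on top ;
output ;
end ;
run ;
proc corr data = trend ;
var t y ;  * the overall correlation comes out strongly positive ;
run ;
proc reg data = trend ;
model y = t ;  * the fitted slope is positive even though plenty of individual points dip ;
run ;
quit ;

Pick any local low point and it looks like a disaster; fit the whole series and the trend is unmistakably up.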

Statistics imitates life. Or maybe it’s the other way around. How about that?

Today I’m getting around to day nine of the 20-day blogging challenge while I wait for The Invisible Developer to get out of the shower where he is curled in a fetal position whining about having to go outside when it is 14 below zero. Actually, he is probably just taking a shower, but lots of whining has taken place this week, let me tell you.

Today’s question is what I did this week in teaching that I would do again, or what I would not do again. I think I’ll answer both. Coincidentally (or maybe not, since I’ve been working on a course re-design to incorporate SAS programming), both of those things have to do with SAS. Short version: if your data are in a form amenable to SAS, it is a godsend for teaching statistics. If your data are not in a very SAS-compatible format, it just blows. If, God forbid, you are limited to using SAS On-Demand, as I am this week because I have yet to receive the Windows 8 compatible version from the university, and I am in North Dakota working on my laptop, well then, your life is about to suck, I am sorry to say.

The thing I would totally do again, if I were teaching an epidemiology course, is PROC STDRATE. I love everything about this procedure. The documentation explains the procedure in very plain language, which I did not have to rewrite at all for the students; I just included the overview in my livebinder.

“Two commonly used event frequency measures are rate and risk:

  • A rate is a measure of the frequency with which an event occurs in a defined population in a specified period of time. …

  • A risk is the probability that an event occurs in a specified time period. “

The documentation also includes data sets that can be used as examples, and they are easily typed or copied and pasted into your SAS program. Further, these data are very similar in format to the types of data that students will usually come across. Most important, this is one of the most useful procedures for students beginning to learn epidemiology, providing a lot of statistics in one: population attributable risk, population attributable fraction, standardized morbidity ratio and more. It will save loads of time over computing statistics on a calculator to answer homework questions – which I think is just silly, because it is 2014 and we have computers. Also, the syntax is relatively easy.

You can read one example of using STDRATE for crude risks, reference risk and attributable fractions here.
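For flavor, here is a minimal sketch of what a direct standardization call looks like. The data set and variable names are placeholders I made up, not the documentation example:

proc stdrate data = study_pop refdata = ref_pop
method = direct
stat = rate(mult=100000) ;
population event = deaths total = person_years ;  * events and person-time in the study population, by stratum ;
reference event = deaths total = person_years ;  * the same variables in the reference population ;
strata age_group ;  * the standardization stratum, e.g. age group ;
run ;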

So, that was the good part. What I would never do again, if I had any choice at all, is

a) Use SAS to create maps or, really, analyze in any way data that was not either already in a SAS data set or in a very easy-to-read format (e.g., no missing data, no variable-length variables), and

b) Use the SAS Enterprise Guide version of SAS On-Demand for anything, ever

There are some significant drawbacks to the SAS Web Editor as well, but they pale in comparison with the slowness of SAS Enterprise Guide in the on-demand version. With some programs, you could maybe get a cup of coffee while waiting for them to run; with the on-demand version of SAS EG, you can drive to Starbucks, wait in line, buy your coffee, drive back to the office, park, take the elevator to your floor and STILL get there just about when your cross-tabulation has completed. It’s ridiculous, which is sad, because if it ran ten times faster it would be a really great tool. It’s terrific on my desktop.

Someone on Twitter commented that they hated SAS because it did not play well with open data. Ain’t that the truth! The exception is if you can get your data into a SAS data set format; then it’s wonderful. Well, I was using HIV prevalence data from gapminder.org – great site, by the way – and it took me an HOUR to get it read by the SAS Web Editor. You can only upload csv files or SAS files to the web editor, so I couldn’t use PROC IMPORT to read in the Excel file. The data I had used country names as the ID, and those didn’t match the IDs in the SAS map files – it’s a long, sad story with the moral that if I had the option of not using SAS for maps I would certainly be looking into it right now, and if I never have to use SAS Enterprise Guide again (which only seems to have the US map in the On-Demand version anyway) it will be too soon.

Yes, in the end, I did get my world HIV map. The computer will not defeat me!


 

Day eight of the 20-day blogging challenge was to write about a professional read – a book, article or blog post that has had an impact on me. To be truthful, I would have to say that the SAS documentation has had a profound impact on me. SAS documentation is extremely well-written (to be fair, so is SPSS’s), in contrast to most operating system documentation, which is written as if feces-flinging monkeys were somehow given words instead, which they flung onto a page which then became a manual. But I digress – more than usual. It’s not reasonable to suggest that someone read the entire SAS documentation, which runs to several thousand pages by now. Instead, I’d recommend Jennifer Waller’s paper on arrays and do-loops. This isn’t the paper where I first learned about arrays – that was before pdf files, and I have met Jennifer Waller, who was probably barely in elementary school at the time. It’s a good paper, though, and if you are interested in arrays, you should check it out.

Here is what I did today, why and how. I wanted to score a data set that had hundreds of student records. I had automatically received the raw score for each student, the percent correct and the answer they gave for each multiple-choice question. I wanted more than that. I wanted to know, for each question, whether or not they got it correct, so that I could do some item analyses, test reliability and create subtests. This is a reasonable thing for a teacher to want to know – did my students do worse on the regression questions, say, than on the ones about probability, or vice versa? Do the data back up that the topics I think are the hardest are the ones my students really score worst on? Of course, test reliability is something that would be useful to know and that most teachers just assume but don’t actually assess. So, that’s what I did and why. Here is how.
filename sample "my-directory/data2013.csv" ;
libname mydata "mydirectory" ;
data mydata.data2013 ;
infile sample firstobs = 2 dsd missover ;
input group_type $ idnum $ raw pct_correct qa qb qc q1 - q70 ;

** These statements read in the raw data, which was an Excel file I had saved as a csv file ;
** The first line was the header and I forgot to delete it, so I used FIRSTOBS = 2 ;
*** That way, I started reading at the actual data ;
*** The dsd specifies comma-delimited data. dlm="," would have worked equally well ;
*** Missover instructs it to leave values missing if there are none left on the line, rather than skipping to the next line ;

Data scored ;
set mydata.data2013 ;
array ans{70} q1- q70 ;
array correct{70} c1 - c70 ;
array scored{70} sc1 - sc70 ;

*** Here I created three arrays. One is the actual responses ;
*** The second array is the correct answer for each item ;
*** The third array is where I will put the scored right or wrong answers ;

if _N_ = 1 then do i = 1 to 70 ;
correct{i} = ans{i} ;
end ;

*** If it is the first record (the answer key) then c1 – c70 will be set to whatever the value for the correct answer is ;

else do i = 1 to 70 ;
if ans{i} = correct{i} then scored{i} = 1 ;
else scored{i} = 0 ;
end ;

**** If it is NOT the first record, then if the answer = the correct answer from the key, it is 1 , otherwise 0 ;

Retain c1 - c70 ;

**** We want to retain the correct answers that were in the key for all of the records in the data set ;
**** Since we never put a new value in c1 – c70, they will stay the correct answers ;

raw_c = sum(of sc1 - sc70) ;
*** This sums the raw score ;

pct_c = raw_c/70 ;
*** This gives a percentage score ;

proc means data=scored ;
var sc1-sc10 c1 -c10 ;

*** This is just a spot check. Does the mean for the scored items fall between 0 and 1? Is the minimum 0 and the maximum 1 ;
*** The correct answers should have a standard deviation of 0 because every record should be the same ;
*** Also, the mean should either be 1, 2, 3, 4 or 5 ;

proc corr data = scored ;
var raw_c pct_c raw pct_correct ;

*** Spot check 2. The raw score I calculated, the percent score I calculated ;
*** The original raw score and percent score, all should correlate 1.0 ;

data mydata.scored ;
set scored ;
if idnum ne "KEY" ;
drop c1-c70 q1-q70 ;

*** Here is where I save the data set I am going to analyze. I drop the answer key as a record. I also drop the 70 correct answer fields and the original answers, just keeping the scored items ;
proc corr alpha nocorr data = mydata.scored ;
var sc1 - sc70 ;
run ;

*** Here is where I begin my analyses, starting with the Cronbach alpha value for internal consistency reliability ;

I want to point something out here, which is where I think professional statisticians are maybe distinguished from others: it’s second nature to check and verify. Even though this program should work perfectly – and it did – I threw in reality checks at a couple of different points. Maybe I spelled a variable name wrong; maybe there was a problem with data entry.

One thing I did NOT do was write over the original data. Should I decide I need to look at what the actual answers were – say, to see whether students were selecting chi-square when t-test was the correct answer (my hypothetical example) – I still can, and a pattern like that would alert me to some confusion.

Incidentally, for those who think that all of the time they save grading is taken up by entering individual scores, I would recommend having your students take tests on the computer if you possibly can. I was at a school today where we had a group of fourth graders taking two math tests, using Google Chrome to access the tests and type in answers. They had very little difficulty with it. I wrote the code for one of those tests, but the other was created using SurveyMonkey, and it was super easy.

I’d love to include pictures or video of the kids in the computer lab but the school told me it was not allowed )-:

Amazingly, given my current schedule, I have made it to Day 4 of the 20-day blogging challenge. This was the brainchild of Kelly Hines as a way to get herself to blog more regularly. Today’s prompt was:

Share a topic/ idea from class this week. What’s one thing you did with students this week that you will (or will not) do again. Why?

I’m not teaching a course right now, but I am revising the curriculum for the biostatistics course. The topic students had the most trouble with was hypothesis testing. Even though all of them had a previous course in statistics, many had it back when they were undergraduates, and let’s be honest, how much do you remember from any class you took five or six years ago?

One thing I would do differently is go back to an idea I had when I very first started teaching statistics. I noticed that some students only had a vague idea what an exponent was. A few times I got asked why I wrote that V thing next to numbers (students who had never heard of a square root). I could go on, but you get my point. Students in a GRADUATE program. It turns out you can get a degree in some fields with very, very, very little mathematics. I was teaching the first course in the statistics sequence, and I started each term with a 20-item algebra test on the first day of class. It was not part of the course grade, but I told students that if they did not score over 85% they were going to have great difficulty in the course. The questions were things like “find A when A² = 9” or “identify the coordinates of a given point on a plot.”

Usually, I would have one or two students who scored below 60%. Almost always, those students dropped the class, which was, I think, for the best. HOWEVER, important point coming up here … I did not tell them they couldn’t pass the class. I told them that it would be very much to their benefit to take a course in algebra and come back the following term. I would show them some of the problems from later in the course and emphasize that these would be much, much easier if they went and took another math class first. Most students saw my point, dropped the class and took a prerequisite course. Even though it wasn’t an official university prerequisite, it was prerequisite information. The few students who did not drop knew what they were getting themselves into, planned to meet with me during office hours every single week, and allocated their time for a LOT of extra studying.

The course I am teaching now is not as basic as that one, but I do think the students could benefit from some assessment of their understanding of basic concepts. Do you know what a z-score is? A normal curve? A percentile? Yes, I give quizzes, but I don’t mean exactly that. I mean a test of basic concepts, like what a probability of .45 means.

So, that is what I am ruminating on tonight. What are the absolute basics of statistics that you need to comprehend before forging ahead?

 

Today I’m on day two of the 20-day blogging challenge, the brainchild of Kelly Hines and a great way to find new, interesting bloggers. The second day’s prompt was to share an organizational tip from your classroom, one thing that works for you.

The latest tool I’ve been using is LiveBinders. Remember when you were in college, having a binder full of notes, handouts from the professor, maybe even copies of tests to study for the final? Well, LiveBinders appears to be designed more for clipping websites and including media from the web, but personally I am using it to create binders for teaching statistics. I’ve just started with one, but I’m sure this will eventually split off into several binders.

I’m always writing notes to myself, but I have them everywhere – I used Google Notebook until they got rid of it, I use Evernote, and I’ve got notepads on my laptop, desktop, iPad and phone, and even paper notebooks around the place. I even have a PadsX program The Invisible Developer wrote years ago just for me (yes, he loves me).

Still, I’m thinking livebinders is going to be really useful for me to organize all of these notes into one spot.

Why do I want to do that, you might ask?

Well, statistics is a big field, and I have taught a lot of it, from advanced multivariate statistics to psychometrics to biostatistics and a lot of special topics courses. It seems to me that we often assume students have a solid grasp of certain concepts, such as variance or standardization, when I’m sure many of them do not. As I read books and articles, I’m trying to note what these assumptions are. My next step is to have pages in the binders where students can get a fuller explanation of, say, what a confidence interval really means. Right now, I feel that universities are trying to cut costs by combining information into fewer and fewer courses. We say that students learned Analysis of Variance in a course, but did they really? The basic statistics I took in graduate school consisted of a descriptive statistics class (I tested out of that), which ended with a brief introduction to hypothesis testing and a discussion of z-scores, t-tests and correlation. The inferential statistics course reviewed hypothesis testing, t-tests and correlation, then focused on regression and ANOVA. The multivariate statistics course covered techniques like cluster analysis, canonical correlation and discriminant function analysis. Psychometric statistics covered factor analysis and various types of reliability and validity. These four courses were the BASICS, what everyone in graduate school took. (People like me who specialized in applied statistics took a bunch more classes on top of that.) Oh, yes, and each class came with a three-hour computer lab AFTER the three-hour lecture, to teach you enough programming so you could do the analyses yourself. Now, many textbooks try to include all of this in one course, which is just a joke, and ends up with students concluding that they “are just not very good at math”.

I can’t change the curriculum, but what I at least can do is provide some type of resource, so that every time a student feels he or she needs to back up and understand some concept, there is an explanation of it.

I plan to have this done by the time I teach Data Mining in August.

Suggestions for what to include are welcome.

I came across this really interesting post on the 20-Day Blogging Challenge for teachers. I’m not sure how likely I am to be able to finish it in January since it is already the sixth and January is a really busy month for me, but we will see.

The first prompt is “Tell about a favorite book to share or teach. Provide at least one example of a cross-curricular lesson.”

One book that I like and have been reading lately is the IBM SPSS Amos 22 User’s Guide, by James Arbuckle. Unlike most documentation, it isn’t just a list of which statements to use when. It gives a good discussion of structural equation modeling from the very basics. (Here’s a link to a free download of the guide for Amos 21. It’s pretty much the same.) The nice thing about it is that, if you have Amos installed, it comes with the data used in the examples, so you can compare your results to the book.

For no reason, I just decided to see how close the covariance estimates you get with Amos are to the actual covariances. I ran the correlation procedure in SPSS and requested covariances, using one of the Amos example data sets. Then I ran the same analysis in Amos. The estimates were all really close to, but not identical to, the actual values; for example, the sample covariance of recall1 and recall2 was 2.622 and the estimated covariance was 2.556.
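(If you wanted to make the same spot check in SAS rather than SPSS, PROC CORR with the COV option prints the sample covariance matrix. The data set name below is just a placeholder for wherever you have read in the Amos example data; recall1 and recall2 are the variables from that example.)

proc corr data = recall_data cov ;
var recall1 recall2 ;  * sample covariances to compare against the Amos estimates ;
run ;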

As far as a cross-curricular lesson – I think this might be useful if I had a chance to discuss maximum likelihood methods versus ordinary least squares. I just finished teaching a course in biostatistics and even though we did discuss logistic regression and I had a few students use logistic regression for their analysis projects, we did not have nearly enough time to delve into it in depth. I’m teaching a data mining course in August, but it is going to be using SAS Enterprise Miner, so while the concepts in the book might apply in some instances – he covers a lot of territory – it won’t be the same software.

As I was reading this book, though, I was thinking about the diversity of students in almost every class that I have ever taught. It would be fun to teach a course in SEM, but I know that some students are still struggling with the concept of variance. So … my decision for the day is to start this week on some short instructional videos that can supplement the limited class time that we have. I think I’m going to start with the very basics – what is variance and what is covariance.

 
