# Genetic Algorithm: SAS Global Forum 2015 Keeps on Giving

Do you have a bunch of sites bookmarked with articles you are going to go back and read later? It’s not just me, is it?

One of my (many) favorite things at SAS Global Forum this year was the app. It included a function for emailing links to papers you found interesting. Perhaps the theory is that you would email these links to your friends to rub it in that their employer did not like you well enough. I emailed links to myself to read when I had time. Finally catching up on coding, email and meetings, today, I had a bit of time.

I was reading a paper by Lisa Henley

A Genetic Algorithm for Data Reduction.

It’s a really cool and relatively new concept – from the 1970s – compared to 1900 for the Pearson chi-square, for example.

In brief, here is the idea. You have a large number of independent variables. How do you select the best subset? One way to do it is to let the variables fight it out in a form of natural selection.

Let’s say you have 40 variables. Each “chromosome” will have 40 “alleles” that will randomly be coded as 0 or 1, either included in the equation or not.

You compute the equation with these variables included or not and assess each equation based on a criterion, say, Akaike Information Criterion or the Root Mean Square Error.

You can select the “winning” chromosome/ equation either head to head, whichever has the higher AIC/ RMSE , although there are other methods of determination, like giving those with the higher criterion a higher probability of staying.

You do this repeatedly until you have your winning equation. Okay, this is a bit of a simplification but you should get the general idea. I included the link above so you could check out the paper for yourself.

Then, while I was standing there reading the paper, the ever-brilliant David Pasta walked by and mentioned the name of another paper on use of Genetic Algorithm for Model Selection that was presented at the Western Users of SAS Software conference a couple of years back.

I don’t have any immediate use for GA in the projects I’m working on at this moment. However, I can’t even begin to count the number of techniques I’ve learned over the years that I had no immediate use for and then two weeks later turned out to be exactly what I needed.

Even though I knew the Genetic Algorithm existed, I wasn’t as familiar with its use in model selection.

You’ll never use what you don’t know – which is a really strong argument for learning as much as you can in your field, whatever it might be.

I need to apply genetic algorithm on 768 observations with 8 variables each. I need to apply it several times producing different variables required each time. Can you help me how to do that?