{"id":1531,"date":"2011-07-14T17:54:51","date_gmt":"2011-07-14T22:54:51","guid":{"rendered":"http:\/\/www.thejuliagroup.com\/blog\/?p=1531"},"modified":"2011-07-14T18:02:23","modified_gmt":"2011-07-14T23:02:23","slug":"sas-and-spss-give-different-results-for-logistic-regression-but-not-really","status":"publish","type":"post","link":"https:\/\/www.thejuliagroup.com\/blog\/sas-and-spss-give-different-results-for-logistic-regression-but-not-really\/","title":{"rendered":"SAS and SPSS Give Different Results for Logistic Regression but not really"},"content":{"rendered":"<p>When people ask me what type of statistical software to use, I run through the advantages and disadvantages, but always conclude,<\/p>\n<p><em>&#8220;Of course, whatever you choose is going to give you the same results. It&#8217;s not as if you&#8217;re going to get an F-value of 67.24 with SAS and one of 2.08 with Stata. Your results are still going to be significant, or not, and the explained variance is going to be the same.&#8221;<\/em><\/p>\n<p>There are actually a few cases where you will get different results, and last week I ran across one of them.<\/p>\n<blockquote><p><strong>A semi-true story<\/strong><\/p>\n<p>While I was under the influence of alcohol\/drugs that caused me to hallucinate about having spare time during the current decade, I agreed to give a workshop on categorical data analysis at the <a href=\"http:\/\/www.wuss.org\/why.html\">WUSS conference this fall<\/a>. After I sobered up (don&#8217;t you know, all of my saddest stories begin this way), I realized it would be a heck of a lot easier to use data I already had lying around than go to any actual, you know, effort.<\/p><\/blockquote>\n<p>I had run a logistic regression with SPSS with the dependent variable of marriage (0 = no, 1 = yes) and independent variable of career choice (computer science or French literature). 
There were no problems with missing data, sample size, quasi-complete separation, because like all data that has no quality issues, I had just completely made it up. I thought I would just re-use the same dataset for my SAS class.<\/p>\n<p>So, here we have the SPSS syntax<\/p>\n<p><code>LOGISTIC REGRESSION VARIABLES Married<\/code><\/p>\n<p><code>\/METHOD=ENTER Career<\/code><\/p>\n<p><code>\/CONTRAST (Career)=Indicator<\/code><\/p>\n<p><code>\/CRITERIA=PIN(0.05) POUT(0.10) ITERATE(20) CUT(0.5).<\/code><\/p>\n<p><a href=\"http:\/\/www.thejuliagroup.com\/blog\/wp-content\/uploads\/2010\/12\/logistic.jpg\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-892\" title=\"logistic\" src=\"http:\/\/www.thejuliagroup.com\/blog\/wp-content\/uploads\/2010\/12\/logistic.jpg\" alt=\"\" width=\"561\" height=\"115\" srcset=\"https:\/\/www.thejuliagroup.com\/blog\/wp-content\/uploads\/2010\/12\/logistic.jpg 561w, https:\/\/www.thejuliagroup.com\/blog\/wp-content\/uploads\/2010\/12\/logistic-300x61.jpg 300w\" sizes=\"auto, (max-width: 561px) 100vw, 561px\" \/><\/a><a href=\"http:\/\/www.thejuliagroup.com\/blog\/?p=891\">As I went on at great boring length in another post<\/a>, if you take e to the parameter estimate B, (\u00a0 Exp(B) in other words) you get the odds ratio for computer scientists getting married versus French literature majors, which are 11 to 1.<\/p>\n<p>Also, I don&#8217;t show it here but you can just take my word for it,<\/p>\n<p><strong>the Cox &amp; Snell R-square was .220 and the Nagelkerke R-square was .306 . <\/strong><\/p>\n<p>If you are familiar with Analysis of Variance and multiple regression, you can think of these as two different approximations of the R-squared and <a href=\"http:\/\/www.ats.ucla.edu\/stat\/mult_pkg\/faq\/general\/psuedo_rsquareds.htm\">read more about pseudo R-squared values on the UCLA Academic Technology Services page. 
<\/a><\/p>\n<p>So, I run the same analysis with SAS, same data set, and again I just accept whatever the default is for the program.<br \/>\n<code>proc logistic data = in.marriage ;<br \/>\nclass cs ;<br \/>\nmodel married = cs \/ expb rsquare;<\/code><\/p>\n<p><a href=\"http:\/\/www.thejuliagroup.com\/blog\/logistic1.html\">If you look at the results, <\/a>you see there is an<\/p>\n<p>R-squared value of .220 and something called a Max-rescaled R-squared of .306.<\/p>\n<p>Okay, so far so good, but what is this?<\/p>\n<p><a href=\"http:\/\/www.thejuliagroup.com\/blog\/wp-content\/uploads\/2011\/07\/saslogdefault.jpg\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-1532\" title=\"Output From SAS Logistic regression\" src=\"http:\/\/www.thejuliagroup.com\/blog\/wp-content\/uploads\/2011\/07\/saslogdefault.jpg\" alt=\"Output from logistic regression\" width=\"668\" height=\"324\" srcset=\"https:\/\/www.thejuliagroup.com\/blog\/wp-content\/uploads\/2011\/07\/saslogdefault.jpg 668w, https:\/\/www.thejuliagroup.com\/blog\/wp-content\/uploads\/2011\/07\/saslogdefault-300x145.jpg 300w\" sizes=\"auto, (max-width: 668px) 100vw, 668px\" \/><\/a><\/p>\n<p>The parameter estimates for both the intercept and our predictor variable are completely different and, in fact, the relationship with career choice is NEGATIVE, but the Wald chi-square and significance level for the independent variable are exactly the same. (This is what we care about most.)<\/p>\n<p>The odds ratio is different, but wait, isn&#8217;t this just the inverse? That is, .091 is 1\/11, so SAS is just saying we have 1:11 odds instead of 11:1.<\/p>\n<p>Difference number 1: By default, SAS models the probability of the lower value of the dependent variable, in this case NOT being married.<\/p>\n<p>That&#8217;s easy to fix. 
I do this:<\/p>\n<p><code>Title \"Logistic - Default Descending\" ;<br \/>\nproc logistic data = in.marriage descending ;<br \/>\nclass cs ;<br \/>\nmodel married = cs \/ expb rsquare;<\/code><\/p>\n<p>This is a little better. The two R-squared values are still the same, the odds ratio is now the same as in SPSS, and at least the relationship between the CS variable and marriage is now positive. <a href=\"http:\/\/www.thejuliagroup.com\/blog\/logistic2.html\">You can see the results here<\/a> or the most relevant table is pasted below if you are too lazy to click or you have no arms (in which case, sorry for my insensitivity about that and if you lost your arms in the war, thank you for your service &lt;&#8211; Unlike everything else in this blog, <em>I meant that<\/em>.)<\/p>\n<p><a href=\"http:\/\/www.thejuliagroup.com\/blog\/wp-content\/uploads\/2011\/07\/logisticSAS2.jpg\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-1533\" title=\"logisticSAS2\" src=\"http:\/\/www.thejuliagroup.com\/blog\/wp-content\/uploads\/2011\/07\/logisticSAS2.jpg\" alt=\"Output with DESCENDING option on Proc Logistic statement\" width=\"702\" height=\"320\" srcset=\"https:\/\/www.thejuliagroup.com\/blog\/wp-content\/uploads\/2011\/07\/logisticSAS2.jpg 702w, https:\/\/www.thejuliagroup.com\/blog\/wp-content\/uploads\/2011\/07\/logisticSAS2-300x136.jpg 300w\" sizes=\"auto, (max-width: 702px) 100vw, 702px\" \/><\/a>BUT, the parameter values are still not the same as what you get from SPSS and Exp(B) still does not equal the odds ratio.<\/p>\n<p>Since actual work is calling me, I will give you the punch line, with thanks very much to David Schlotzhauer of SAS:<\/p>\n<blockquote><p>&#8220;If the predictor variable in question is specified in the CLASS statement with no options, then the odds ratio estimate is not computed by simply exponentiating the parameter estimate as discussed in this usage note:<\/p>\n<p><a 
href=\"http:\/\/support.sas.com\/kb\/23087\">http:\/\/support.sas.com\/kb\/23087<\/a><\/p>\n<p>If you use the PARAM=REF option in the CLASS statement to use reference parameterization rather than the default effects parameterization, then the odds ratio estimate is obtained by exponentiating the parameter estimate.\u00a0 For either parameterization the correct estimates are automatically provided in the Odds Ratio Estimates table produced by PROC LOGISTIC for any variable not involved in interactions.&#8221;<\/p><\/blockquote>\n<p>So, the SAS code to get the exact same results as SPSS is this (notice the PARAM = REF option on the CLASS statement):<\/p>\n<p><code>Title \"Logistic Param = REF\" ;<br \/>\nproc logistic data = in.marriage descending ;<br \/>\nclass cs \/ param = ref ;<br \/>\nmodel married = cs \/ expb rsquare;<\/code><\/p>\n<p><a href=\"http:\/\/www.thejuliagroup.com\/blog\/logistic3.html\">You can see the output here.<\/a><\/p>\n<p>Did you notice that the estimate with PARAM = REF (the same estimate SPSS produces by default) is exactly double the estimate you get by default with the DESCENDING option? That can&#8217;t be a coincidence, can it?<\/p>\n<p>If you want to know why, read the section on odds ratios in the LOGISTIC Procedure chapter of the SAS\/STAT User&#8217;s Guide. You&#8217;ll find your answer at the bottom of page 3,952 (&lt;&#8212; I did not make that up. It&#8217;s really on page 3,952).<\/p>\n<p>&nbsp;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>When people ask me what type of statistical software to use, I run through the advantages and disadvantages, but always conclude, &#8220;Of course, whatever you choose is going to give you the same results. It&#8217;s not as if you&#8217;re going to get an F-value of 67.24 with SAS and one of 2.08 with Stata. 
Your&#8230;<\/p>\n","protected":false},"author":5,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_kad_post_transparent":"","_kad_post_title":"","_kad_post_layout":"","_kad_post_sidebar_id":"","_kad_post_content_style":"","_kad_post_vertical_padding":"","_kad_post_feature":"","_kad_post_feature_position":"","_kad_post_header":false,"_kad_post_footer":false,"_kad_post_classname":"","_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[9,11],"tags":[],"class_list":["post-1531","post","type-post","status-publish","format-standard","hentry","category-software","category-statistics"],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/www.thejuliagroup.com\/blog\/wp-json\/wp\/v2\/posts\/1531","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.thejuliagroup.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.thejuliagroup.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.thejuliagroup.com\/blog\/wp-json\/wp\/v2\/users\/5"}],"replies":[{"embeddable":true,"href":"https:\/\/www.thejuliagroup.com\/blog\/wp-json\/wp\/v2\/comments?post=1531"}],"version-history":[{"count":7,"href":"https:\/\/www.thejuliagroup.com\/blog\/wp-json\/wp\/v2\/posts\/1531\/revisions"}],"predecessor-version":[{"id":1540,"href":"https:\/\/www.thejuliagroup.com\/blog\/wp-json\/wp\/v2\/posts\/1531\/revisions\/1540"}],"wp:attachment":[{"href":"https:\/\/www.thejuliagroup.com\/blog\/wp-json\/wp\/v2\/media?parent=1531"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.thejuliagroup.com\/blog\/wp-json\/wp\/v2\/categories?post=1531"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.thejuliagroup.com\/blog\/wp-json\/wp\/v2\/tags?post=1531"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}