I was at a conference recently when someone asked me what I was passionate about and I answered, “Data”. He seemed very disappointed and wandered off in search of someone more interesting who would give a better answer like curing AIDS or ending world hunger or solving global warming.

I could have added, “and math” but I don’t think that would have improved the situation.

In fact, three issues I care deeply about are poverty, education and inequality, all in equal measure because I think the three are inextricably related and they are all related to – yes, mathematics and data.

If the world ran more efficiently, we could produce more of everything at a lower price, and, other things being equal, people could have a better standard of living on the same income.

If people were better educated, their personal productivity would be higher and their standard of living would be higher, other things being equal. If they were more literate and had a better grasp of data and mathematics, they would question the inequality in the world and ask why it is acceptable in America to give hundreds of millions in tax breaks to people whose companies needed government bailouts to the tune of billions of dollars while cutting benefits of people making $40,000 a year. They would take fewer things on faith and demand proof.

At the same conference, someone gave a figure that there were currently 260 open bids on social media services to government. (These aren’t the exact terms or number; I changed the details because my point is not to call out a specific person.) That figure was live-streamed on the Internet, the statement was made by a person in authority and it was cited in a couple of other presentations the same day; someone even intended to cite it in an on-line publication.

The only problem is, it was completely wrong, off by a factor of about 100. There were more like two open bids, if any. If you went to that site and searched on “social” and “media” and “government” you would get hits like:

Need temporary file clerk. Must be able to reproduce electronic media like CD-ROMs, have good social skills and obtain clearance to work in government facilities.
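To make that concrete, here is a minimal sketch in Python of why a search like that can overcount. I don't know how the site's search actually works, so this is an assumption: I'm guessing it returns any listing that contains all of the search words anywhere in the text. The function names `keyword_hit` and `phrase_hit` are made up for illustration.

```python
import re

def keyword_hit(text, keywords):
    """Return True if every keyword appears somewhere in the text (AND search)."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return all(k in words for k in keywords)

def phrase_hit(text, phrase):
    """Return True only if the words appear together as an exact phrase."""
    return phrase in text.lower()

# The sample listing quoted above.
listing = ("Need temporary file clerk. Must be able to reproduce electronic "
           "media like CD-ROMs, have good social skills and obtain clearance "
           "to work in government facilities.")

# Assumed behavior of the site's search: all three words, anywhere.
print(keyword_hit(listing, ["social", "media", "government"]))  # True

# What the speaker presumably meant: bids about "social media".
print(phrase_hit(listing, "social media"))  # False
```

The file clerk job counts as a hit under the first test and not under the second, which is how a count in the hundreds can shrink to a handful once you actually read the listings.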

How did I know that wasn’t correct? I have some experience with government contracts and that just did not fit with the bids I had seen coming out. So, I went and checked.

This is one reason I am very concerned about the decline of traditional news outlets such as the Los Angeles Times and New York Times. You can call them dead tree media all you want but one thing both have that a lot of bloggers (and don’t even get me started on Facebook and MySpace) don’t is FACT-CHECKERS.

The amount of information on the Internet that is just plain wrong is mind-boggling. Yes, there are plenty of books and radio talk shows as well that are pure fiction masquerading as historical, political, economic and scientific truths.

Why do we accept this? Why don’t we challenge it? Why, in a room full of people with iPads, laptops and smartphones, was I the only one who checked this incredibly inflated number? Was it because it just happened to fit with what people wanted to hear?

I don’t know the answer to any of those questions but I do know this. It doesn’t have to be that way. It can start with us. When I started this blog I wrote it just for the hell of it – well, I still do – but then I realized that other people were reading it and might take things seriously if I said that PROC MI was to make up numbers to replace the ones missing (I was kidding and it doesn’t do that. You can read a serious explanation of multiple imputation here).

I now specifically state if I am kidding about something because it is not always obvious to other people. I try to specify when I am stating my opinion, when I am using a hypothetical example to make a point and when I am stating a fact, e.g., this is what the Tukey post hoc test does.

One thing I am not very good about is calling out specific errors, although I do respond and appreciate it when someone else does that to me. In the post above on multiple imputation, a commenter pointed out that it wasn’t a completely random sample. I said he was correct and explained why I did it the way I did for the example.

The more appealing thing to some might be to start running around like a bunch of fifteen-year-olds and calling out every blog and tweet that has a factual error. That’s not a terrible idea and if you actually are fifteen years old or just feel like it, far be it from me to stop you. Since I cannot pass a candy store without going in and buying one of everything, I’m the last person to give lectures on maturity.

Or, you could start with being more careful on your own writing, on what you post, what you tweet. Maybe you could make a major effort to analyze data and publish your results on-line. I’m going with that option because it happens to be more comfortable for me.

Either way, I think we need to quit being so cavalier about,

“Oh yeah, tons of what is on the Internet is bullshit.”

I’m passionate about having data accurate and available for evaluation because I believe that the world would be better off if we had all of the facts, if what we believed to be true actually was.

Many of our social and economic problems, both national and personal, are a result of believing the wrong data, which is related to not understanding how to evaluate data. (Pity the poor person who went out and invested in social media stocks based on the booming market figures.) This is compounded by the fact that some people have a vested interest in promoting false data.

How do we get from wrong numbers to poverty, poor education and inequality? There’s a reason the title of this post is Part 1, you know!


