Sunday, December 9, 2012

Numbers

When I was a teenager, some of my friends chose social science rather than science to avoid maths courses in college. The stereotype is that social science subjects, such as history, literature, sociology and political science, do not require quantitative skills but qualitative ones, like writing reports and communicating with people. However, this no longer looks true. Economics, together with its loyal partner, statistics, has come to dominate social science methodology. The first time I read a political science paper inundated with regressions, I thought I had found the wrong paper, but now I'm used to mathy papers of this kind. Numbers are widely applied in social science research: developing indices to evaluate the quality of a democracy or dictatorship, evaluating policy impacts, using numbers to show demographic changes in history, and so on. Recently I even found a piece of "poem-making" software which analyzes Chinese poems from the Tang Dynasty, identifies the most popular words and phrases, and reorganizes them into new poems. Some people claimed that this sort of software will put an end to social science, which sounds like paranoid sleep-talk from those with little idea of arts and literature.

It's true that in the information era, the traditional ways of studying social science may not be sufficient. Case studies, which used to be widely applied, are now considered biased samples, and a causal link between two events is less convincing unless other factors have been rigorously excluded. Interviewees can lie, interviewers can be biased, and it looks as if the only reliable source in research is data. The development of data-processing software also makes it easier to do research with large data sets. Therefore social science scholars and students, no matter how difficult it is to quantify their research objects, are trying to build databases and use statistical models to reach conclusions. I won't say it's wrong - I've spent the last few years learning these skills - but there are several things that should be kept in mind in data work, especially for policy students.

One concern is that data can "lie" too. If you've worked with Stata, you may have noticed that conclusions can be very different depending on the functional form of the regression, the control variables you include, and whether or not you cluster or stratify. From time to time, we need to use common sense and logic to choose the specification most likely to be true. However, if we come across something we're not familiar with, how can we decide whether we've handled the data in the right way? It's quite common for people to take different stances on the same issue even when they use the same data set. Moreover, data analysis always requires a few assumptions on which our conclusions are built. Because so many variables (measurable or not) exist in the real world, it's sometimes very hard to examine whether those assumptions hold. Tons of arguments arise in this field, and researchers are still fighting each other whenever a new variable or piece of evidence emerges.
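
To make the point about control variables concrete, here is a minimal sketch in Python on made-up data (nothing here comes from a real study). The variable names and the numbers are my own assumptions: `z` is a hidden factor that drives both `x` and `y`, while `x` has no effect on `y` at all. Regressing `y` on `x` alone suggests a strong relationship; controlling for `z` makes it vanish. The second estimate uses the standard trick of regressing the residuals on each other, which gives the same coefficient as including `z` in the regression.

```python
import random


def slope(x, y):
    """OLS slope from regressing y on x (with an intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def residuals(x, y):
    """What is left of y after regressing it on x (with an intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    b = slope(x, y)
    return [(yi - mean_y) - b * (xi - mean_x) for xi, yi in zip(x, y)]


# Synthetic data (an assumption for illustration only).
random.seed(0)
n = 5000
z = [random.gauss(0, 1) for _ in range(n)]        # hidden common cause
x = [zi + random.gauss(0, 1) for zi in z]         # x depends on z
y = [2.0 * zi + random.gauss(0, 1) for zi in z]   # y depends on z, not on x

# Specification 1: y on x only. The omitted z biases the slope upward.
naive = slope(x, y)

# Specification 2: y on x, controlling for z (via residual-on-residual regression).
controlled = slope(residuals(z, x), residuals(z, y))

print(f"slope without control: {naive:.2f}")
print(f"slope with control:    {controlled:.2f}")
```

Same data, two defensible-looking regressions, two opposite stories. In real work we rarely know which `z` we have left out, which is exactly why the arguments never end.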

Another concern is that when numbers are large, we can easily be misled. If you think about 0.001% of a population, you may picture only a few people; but in China that means 13,900 people, which is not a small group. A number by itself is not enough to display the full picture. On the contrary, numbers can be cunningly used to hide the facts.
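
The arithmetic behind that example is easy to check. Below is a small sketch; the population of 1.39 billion is my assumption, back-solved from the 13,900 figure above rather than taken from an official source, and the town of 100,000 is a hypothetical comparison.

```python
from fractions import Fraction


def people_in_share(percent: str, population: int) -> int:
    """Number of people represented by `percent` percent of `population`."""
    # Fraction avoids floating-point rounding on tiny percentages.
    return int(Fraction(percent) / 100 * population)


china = 1_390_000_000   # assumption: population implied by the 13,900 figure
small_town = 100_000    # hypothetical town, for comparison

print(people_in_share("0.001", china))       # 13900
print(people_in_share("0.001", small_town))  # 1
```

The same percentage is one person in a small town and a crowd of nearly fourteen thousand in China, so a share should always be read next to its base.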

In addition, obsession with numbers is almost as bad as ignoring them. It's important to see a policy's impact on large groups, and therefore to examine its effectiveness by looking at the aggregate benefits received by the population, but single cases are vital too. If you think about how policy changes come about, such as the abolition of racial segregation in the US, or how big events happen, such as the start of WW1, a single case can make all the difference. There are a lot of psychological studies on cases vs. numbers, and case studies tend to impress audiences more. This is not surprising: after reading an article or report, which do you remember, the numbers or the stories?

People talk a lot about big data these days, and sometimes I can't help wondering what I look like in those companies' eyes - maybe a few dummy variables to identify my race, gender, consumption preferences, etc., and a few logit regressions to find out which coupons can induce a new purchase record from me - simple and straightforward.
