Using the General Social Survey

Gene Expression
By Razib Khan
July 8, 2010 11:01 PM



I've mentioned this before, but I thought it would be useful to repeat it. Many of my social science related posts use Berkeley's web interface to the General Social Survey. People regularly ask me in the comments for details about the variables, or for a more explicit elaboration of the methods. First, this is a weblog, not a venue for me to publish scholarly papers. Most of the GSS related posts are meant to be "quick & dirty," and to stimulate further exploration by readers. Unfortunately, follow-ups rarely happen. One can speculate why, but that's how it is. Nevertheless, I thought I would quickly go over how to use the GSS in a basic fashion. To start, here's the URL: This is the database from 1972 to 2008. You'll meet a screen like this:


The page is cluttered, but basically the right side is where you enter the row and column variables that you want to cross or compare. The left side allows you to explore the variables. "Search" and "Selected" are pretty straightforward, and you can browse the list of variables in the menu at the bottom left. The easiest thing to do is just look at frequencies of X, Y, and Z against particular categories A, B, and C (e.g., educational attainment vs. sex). But you can do more: if you select "analysis" at the top left, you get more options:


I've been looking at mean values a lot. Sometimes the mean is obvious because the variables are quantitative. But if you're talking about a dichotomous response, it is "recoded" numerically (e.g., 0 vs. 1), so you have to keep in mind that the mean is just a representation of the underlying data. There are correlations and regressions too. You can do a lot with the GSS, but the more complicated or detailed your analysis gets, the less appropriate it is for a "quick & dirty" post. I've been shying away from presenting regressions because to do it right you have to be careful, and if you just throw out a bunch of betas, people aren't going to replicate your analysis and might put more stock in the model than they should (and it's not hard to massage the betas you get just by manipulating the set of variables). Here's a quick example of a query:


WORDSUM will output the % in the sample who score 0, 1, 2, etc. out of 10 on the WORDSUM vocabulary test. I wanted to check it against highest education attained, DEGREE. I decided to combine those without high school diplomas, those with high school diplomas, and those with some college into one category, and label it "No College." Next I combined those with bachelor's and graduate degrees into one category. Then I controlled for sex, so it will output the row and column variables twice, once for males and once for females. Finally I constrained the data set to non-Hispanic whites who were surveyed after 1999 to the present (2008 in this survey). Here's the outcome for males:
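If you'd rather poke at this offline, the same query can be roughly sketched in Python with pandas. This is not what the web interface runs internally, just an approximation of the steps described above: recode DEGREE into two groups, filter the sample, then tabulate WORDSUM by the recoded degree separately for each sex. The column names (`wordsum`, `degree`, `sex`, `race_eth`, `year`), the category labels, and the helper `wordsum_by_degree` are all hypothetical, and the handful of rows are invented only so the code runs. They are not GSS results.

```python
import pandas as pd

# Toy rows invented purely for illustration; these are NOT GSS data.
# Column names and category labels are hypothetical stand-ins.
rows = [
    (6, "high school",    "male",   "non-hispanic white", 2004),
    (9, "graduate",       "male",   "non-hispanic white", 2006),
    (7, "bachelor",       "female", "non-hispanic white", 2002),
    (5, "lt high school", "female", "non-hispanic white", 2008),
    (6, "junior college", "male",   "non-hispanic white", 2000),
    (8, "bachelor",       "male",   "hispanic",           2004),  # filtered out
    (4, "high school",    "female", "non-hispanic white", 1996),  # filtered out
]
df = pd.DataFrame(rows, columns=["wordsum", "degree", "sex", "race_eth", "year"])

# Recode DEGREE: bachelor's and graduate degrees become "College",
# everything else becomes "No College".
COLLEGE_DEGREES = {"bachelor", "graduate"}
df["college"] = df["degree"].map(
    lambda d: "College" if d in COLLEGE_DEGREES else "No College"
)

# Filter: non-Hispanic whites surveyed after 1999.
subset = df[(df["race_eth"] == "non-hispanic white") & (df["year"] >= 2000)]


def wordsum_by_degree(data, sex):
    """Column percentages of WORDSUM scores within each degree group, for one sex."""
    part = data[data["sex"] == sex]
    return pd.crosstab(part["wordsum"], part["college"], normalize="columns") * 100


# The control variable (sex) yields one table per value.
males = wordsum_by_degree(subset, "male")
females = wordsum_by_degree(subset, "female")

print(males.round(1))
print(females.round(1))
```

Each column of the resulting tables sums to 100, which mirrors asking for column percentages in the web form. With a real extract you would swap the toy rows for your own data and adjust the labels to match its codebook.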

