Trying to extract demographic data from Twitter


June Po and Renan Escalante joined us for the afternoon of the second day. They were trying to get demographic data from Twitter feeds, specifically the R-Shief #f29 dataset, initially by looking for keywords that could be linked to race/ethnicity. For example, they began by searching for “Black,” but found that this approach mostly returned false positives (e.g. “Blackberry”). Their next step was to look for co-occurrence of “race” and “Black,” but that didn’t work either. The third approach was to look for hashtags containing race-related terms, but that didn’t produce many results. In the future they’re considering other approaches, such as looking at Twitter usernames and at retweet (RT) networks, among others.
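
To make the three attempts concrete, here is a minimal Python sketch of what each kind of matching might look like. It is not the code June and Renan used; the sample tweets and all function names are invented for illustration, and nothing here comes from the #f29 dataset. It shows why a plain substring search catches “Blackberry,” how whole-word matching avoids that, and why co-occurrence can still be noisy.

```python
import re

# Hypothetical sample tweets, made up for illustration only.
SAMPLE_TWEETS = [
    "My Blackberry died again",
    "Proud to be Black and marching today #f29",
    "Talking about race and what it means to be Black here",
    "Race you to the park! Wearing my black jacket",
    "Solidarity from all of us #blackandproud #f29",
]

def substring_match(text, keyword):
    """Naive search: true if keyword appears anywhere, even inside a longer word."""
    return keyword.lower() in text.lower()

def whole_word_match(text, keyword):
    """Match keyword only as a standalone word (case-insensitive)."""
    pattern = r"\b" + re.escape(keyword) + r"\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None

def cooccurs(text, keywords):
    """True if every keyword appears as a standalone word in the same tweet."""
    return all(whole_word_match(text, k) for k in keywords)

def hashtags(text):
    """Return the tweet's hashtags, lowercased and without the leading '#'."""
    return [tag.lower() for tag in re.findall(r"#(\w+)", text)]

def hashtags_containing(text, terms):
    """Return hashtags that contain any of the given terms as a substring."""
    return [t for t in hashtags(text) if any(term.lower() in t for term in terms)]

if __name__ == "__main__":
    for tweet in SAMPLE_TWEETS:
        print(
            substring_match(tweet, "Black"),
            whole_word_match(tweet, "Black"),
            cooccurs(tweet, ["race", "Black"]),
            hashtags_containing(tweet, ["black"]),
            "|",
            tweet,
        )
```

In this toy sample, whole-word matching drops the “Blackberry” tweet, but the co-occurrence check still flags the tweet about racing to the park in a black jacket, which has nothing to do with ethnicity. That is one plausible reason the second approach ran into trouble: keyword rules cannot tell which sense of a word is meant.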