Collective Social Behavior in Health-Related Problems

The recent availability of large-scale population data from web searches and social media now allows us to study collective social behavior on a global scale. Using various sources of large-scale social data, such as electronic health records, social media, web searches, public forums, and mobile application data, we are working to understand the causes of, and solutions for, various health-related problems. These range from understanding global patterns of human reproduction and interest in sex to uncovering adverse drug reactions.

Much recent research aims to identify evidence for Drug-Drug Interactions (DDI) and Adverse Drug Reactions (ADR) in the biomedical scientific literature. In addition to this "Bibliome", the universe of social media provides a very promising source of large-scale data that can help identify DDI and ADR in ways that were not previously possible.

Given the large number of users, analysis of social media data may help identify under-reported, population-level pathology associated with DDI, thus contributing to improvements in population health. Moreover, tapping into this data allows us to infer drug interactions with natural products, including cannabis, which constitute a class of DDI that biomedical research has so far explored very poorly.

Our goal is to determine the potential of Instagram for public health monitoring and surveillance of DDI, ADR, and behavioral pathology at large. Most social media analysis focuses on Twitter and Facebook, but Instagram is an increasingly important platform, especially among teens, offering unrestricted access to public posts, a high availability of posts with geolocation coordinates, and images to supplement textual analysis.

Network of term proximity built from Instagram timelines that mentioned a drug used to treat depression. Term co-occurrence computed at a one-week window resolution. Only the largest connected component and edges with weight >= 0.05 are shown. Nodes are colored according to their correlation with Principal Component 4.
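The caption above describes a term-proximity network built from one-week co-occurrence windows, thresholded at weight 0.05 and restricted to its largest connected component. The page does not define the edge weight, so the minimal Python sketch below assumes one: for each pair of terms, the number of (user, week) windows containing both terms divided by the number containing either (a Jaccard-style proximity). The functions `weekly_windows`, `proximity_edges`, and `largest_component`, and the `(user_id, day_index, terms)` input format, are hypothetical; term extraction and the Principal Component 4 coloring are not shown.

```python
from collections import defaultdict
from itertools import combinations


def weekly_windows(posts, window_days=7):
    """Group posts into per-user, one-week windows (assumed windowing).

    posts: iterable of (user_id, day_index, set_of_terms).
    Returns the set of terms seen in each (user, week) window.
    """
    windows = defaultdict(set)
    for user, day, terms in posts:
        windows[(user, day // window_days)].update(terms)
    return list(windows.values())


def proximity_edges(windows, min_weight=0.05):
    """Assumed weight: windows containing both terms / windows containing either."""
    occurrences = defaultdict(int)
    co_occurrences = defaultdict(int)
    for terms in windows:
        for term in terms:
            occurrences[term] += 1
        for a, b in combinations(sorted(terms), 2):
            co_occurrences[(a, b)] += 1
    edges = {}
    for (a, b), both in co_occurrences.items():
        weight = both / (occurrences[a] + occurrences[b] - both)
        if weight >= min_weight:  # keep only edges at or above the threshold
            edges[(a, b)] = weight
    return edges


def largest_component(edges):
    """Restrict an edge dict to its largest connected component (iterative DFS)."""
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    seen, best = set(), set()
    for start in neighbors:
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node not in component:
                component.add(node)
                stack.extend(neighbors[node] - component)
        seen |= component
        if len(component) > len(best):
            best = component
    # Both endpoints of an edge share a component, so checking one suffices.
    return {(a, b): w for (a, b), w in edges.items() if a in best}
```

Windowing per user is an assumption based on the caption's mention of timelines; keying windows by week alone would instead pool co-occurrences across users.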

Yearly classification results for the US House of Representatives. Classification is based on 3,000 textual features extracted from House floor speeches.

Discourse Polarization in the US Congress

Congressional politics in the United States has become increasingly polarized across the aisle in recent decades. However, common estimates of polarization, based on roll-call votes or bill cosponsorship data, tell us little about lawmakers' agendas and values.

We address this issue by studying U.S. House floor speeches using text mining and machine learning techniques. Our results show that predicting party affiliation from textual features becomes more accurate for more recent speeches, suggesting an intensification of polarized discourse. Moreover, polarization is more pronounced in some topics and less so in others.
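The page does not say which classifier produced the yearly results, only that it used 3,000 textual features. As an illustrative stand-in (not the authors' method), the Python sketch below trains a multinomial Naive Bayes model per year on the 3,000 most frequent tokens and reports held-out accuracy for each year; comparing these per-year accuracies is one way to express the trend described above. The function names, the lowercase whitespace tokenization, and the per-year train/test split are all assumptions.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    return text.lower().split()  # assumption: lowercase whitespace tokens


def top_k_vocabulary(texts, k=3000):
    """The k most frequent tokens across the training speeches."""
    counts = Counter(tok for text in texts for tok in tokenize(text))
    return {tok for tok, _ in counts.most_common(k)}


def train_naive_bayes(texts, parties, vocab, alpha=1.0):
    """Multinomial Naive Bayes with Laplace smoothing, restricted to vocab."""
    token_counts = defaultdict(Counter)
    for text, party in zip(texts, parties):
        token_counts[party].update(t for t in tokenize(text) if t in vocab)
    model = {}
    for party, n_docs in Counter(parties).items():
        denom = sum(token_counts[party].values()) + alpha * len(vocab)
        log_lik = {t: math.log((token_counts[party][t] + alpha) / denom) for t in vocab}
        model[party] = (math.log(n_docs / len(parties)), log_lik)
    return model


def predict(model, text):
    tokens = tokenize(text)

    def log_posterior(party):
        log_prior, log_lik = model[party]
        # Out-of-vocabulary tokens contribute 0 for every party, so they are ignored.
        return log_prior + sum(log_lik.get(t, 0.0) for t in tokens)

    return max(model, key=log_posterior)


def yearly_accuracy(speeches_by_year, k=3000):
    """speeches_by_year: {year: (train, test)}, each a list of (text, party)."""
    accuracy = {}
    for year, (train, test) in sorted(speeches_by_year.items()):
        texts, parties = zip(*train)
        vocab = top_k_vocabulary(texts, k)
        model = train_naive_bayes(texts, parties, vocab)
        hits = sum(predict(model, text) == party for text, party in test)
        accuracy[year] = hits / len(test)
    return accuracy
```

Naive Bayes is used here only because it fits in a few dependency-free lines; any text classifier evaluated the same way per year would serve the same illustrative purpose.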

We also show that building knowledge networks from feature relations offers a preliminary path toward the study of policy agendas and values. These findings will facilitate future analyses of framing devices in political communication, such as "dog whistles".

Detecting Conflict in Social Unrest Using Instagram

Public protests and civil disobedience have been a recurring means of changing the political status quo through social activism. With the introduction of mobile communication and the adoption of social media, it has become possible to obtain and measure real-time, large-scale quantitative data about social unrest.

Occasionally, social activism can degenerate into unrest and violence. In such conflict situations, protests can transition from peaceful to violent, including riots that damage property and clashes with police. Here we ask whether the build-up of tension in such protests can be identified, and ultimately predicted, using social media data. We collected Instagram data related to the 2014 clashes in Ferguson and Hong Kong.

We collected public Instagram posts that matched our event-specific hashtags through the service's API, keeping only posts geolocated within the protest area. Posts were curated and annotated for traces of violence or tension build-up. We divided the geographical area into a two-dimensional grid of rectangular cells and aggregated the data into 15-minute intervals. We analyzed this space-time data using Singular Value Decomposition (SVD). Our goal was to identify the time and space singular vectors most correlated with tension build-up or the onset of violence.
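The pipeline above (grid binning, 15-minute aggregation, SVD, correlation with annotated tension) can be sketched in dependency-free Python. The sketch assumes posts arrive as `(unix_seconds, lat, lon)` tuples, that the protest area is an axis-aligned latitude/longitude box, and that tension is available as a per-bin series (for example, the number of posts annotated as violent in each bin). It computes the leading singular triplets by power iteration with deflation, standing in for a library routine such as `numpy.linalg.svd`; the correlation measure (absolute Pearson r) and all function names are assumptions, not the authors' code.

```python
import math
import random


def space_time_matrix(posts, bbox, n_rows, n_cols, t0, bin_seconds=900):
    """Count posts per (15-minute bin, grid cell).

    posts: list of (unix_seconds, lat, lon) with unix_seconds >= t0.
    bbox: (lat_min, lat_max, lon_min, lon_max) of the protest area.
    Returns one row per time bin, each holding n_rows * n_cols cell counts.
    """
    lat_min, lat_max, lon_min, lon_max = bbox
    n_bins = max(int((ts - t0) // bin_seconds) for ts, _, _ in posts) + 1
    matrix = [[0.0] * (n_rows * n_cols) for _ in range(n_bins)]
    for ts, lat, lon in posts:
        if not (lat_min <= lat < lat_max and lon_min <= lon < lon_max):
            continue  # drop posts outside the protest area
        row = min(int((lat - lat_min) / (lat_max - lat_min) * n_rows), n_rows - 1)
        col = min(int((lon - lon_min) / (lon_max - lon_min) * n_cols), n_cols - 1)
        matrix[int((ts - t0) // bin_seconds)][row * n_cols + col] += 1
    return matrix


def top_singular_triplets(matrix, rank, iters=300, seed=0):
    """Leading (sigma, u_time, v_space) triplets via power iteration + deflation."""
    rows, cols = len(matrix), len(matrix[0])
    a = [row[:] for row in matrix]
    rng = random.Random(seed)
    triplets = []
    for _ in range(rank):
        v = [rng.gauss(0.0, 1.0) for _ in range(cols)]  # random start vector
        for _ in range(iters):
            u = [sum(a[i][j] * v[j] for j in range(cols)) for i in range(rows)]  # A v
            w = [sum(a[i][j] * u[i] for i in range(rows)) for j in range(cols)]  # A^T A v
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                return triplets  # residual is zero: no further singular vectors
            v = [x / norm for x in w]
        av = [sum(a[i][j] * v[j] for j in range(cols)) for i in range(rows)]
        sigma = math.sqrt(sum(x * x for x in av))
        u = [x / sigma for x in av]
        triplets.append((sigma, u, v))
        for i in range(rows):  # deflate: A <- A - sigma * u v^T
            for j in range(cols):
                a[i][j] -= sigma * u[i] * v[j]
    return triplets


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((p - mx) * (q - my) for p, q in zip(x, y))
    sx = math.sqrt(sum((p - mx) ** 2 for p in x))
    sy = math.sqrt(sum((q - my) ** 2 for q in y))
    return cov / (sx * sy) if sx and sy else 0.0


def most_tension_correlated(triplets, tension):
    """Index of the time singular vector with the largest |Pearson r| vs. tension."""
    scores = [abs(pearson(u, tension)) for _, u, _ in triplets]
    return max(range(len(scores)), key=scores.__getitem__)
```

Rows of the matrix are time bins and columns are grid cells, so each `u` is a time profile and each `v` a spatial pattern over cells. Singular vectors are only defined up to sign, which is why the selection compares absolute correlations.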

Our results indicate that the exact location of the main gatherings can be pinpointed solely by calculating cell density. Furthermore, some singular vectors characterize the dynamics of increasing social conflict well. In this paper, we describe and visualize the dynamics of social conflict in Ferguson and Hong Kong.
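Locating the main gatherings by cell density needs only post counts per grid cell. In the minimal sketch below, density is taken as each cell's share of all posts inside the area; since every cell spans the same latitude and longitude extent, ranking by this share is the same as ranking by raw count per cell. The `densest_cells` function, its `(lat, lon)` input, and the bounding-box convention are hypothetical.

```python
from collections import Counter


def densest_cells(posts, bbox, n_rows, n_cols, top=3):
    """Rank grid cells by their share of posts inside the bounding box.

    posts: iterable of (lat, lon); bbox: (lat_min, lat_max, lon_min, lon_max).
    Returns [((center_lat, center_lon), share), ...] for the `top` densest cells.
    """
    lat_min, lat_max, lon_min, lon_max = bbox
    cell_h = (lat_max - lat_min) / n_rows
    cell_w = (lon_max - lon_min) / n_cols
    counts = Counter()
    for lat, lon in posts:
        if lat_min <= lat < lat_max and lon_min <= lon < lon_max:
            row = min(int((lat - lat_min) / cell_h), n_rows - 1)
            col = min(int((lon - lon_min) / cell_w), n_cols - 1)
            counts[(row, col)] += 1
    total = sum(counts.values())
    return [
        ((lat_min + (row + 0.5) * cell_h, lon_min + (col + 0.5) * cell_w), n / total)
        for (row, col), n in counts.most_common(top)
    ]


print(densest_cells([(0.1, 0.1)] * 3 + [(0.8, 0.8)], (0.0, 1.0, 0.0, 1.0), 2, 2, top=1))
# → [((0.25, 0.25), 0.75)]
```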

Our work demonstrates that current geotagged social media posts can be an accurate data source for predicting tension build-up and, ultimately, violence in social unrest situations. This method could be useful for journalists, human-rights agencies, and government organizations.

(left) Instagram posts over Hong Kong. (right) Instagram posts over Ferguson (MO). Violent posts shown in red.

Funding

Project partially funded by

Project Members

Luis Rocha

Johan Bollen

Lang Li

Ian B Wood

Rion Brattig Correia

Selected Project Publications