We have been developing a simple friendship mobility model that captures the essence of how a group of friends travels together or moves from one location to another. The idea is pretty simple, but the results are powerful.

First, we collected data from location-based social networks, specifically GoWalla. GoWalla provides not only the mobility traces of individual users, recorded through a process known as checking in, but also the friendship topology and other cool stuff. Second, we used a Markov model in which check-ins represent states and the transition probabilities of going from one state to another are defined empirically by the training dataset. Each user therefore gets his or her own empirical Markov model, and the complete Markov model consists of all of them. Once we have the complete Markov model, we use Miller’s coordinate system to convert latitude/longitude into a Cartesian system that approximately preserves distances.
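To make the two steps above concrete, here is a minimal sketch, not our actual code. It assumes each user's check-ins arrive as an ordered list of venue names, and it assumes that "Miller's coordinate system" means the Miller cylindrical projection. The names `transition_probabilities`, `miller_xy`, and `EARTH_RADIUS_KM`, as well as the sample traces, are made up for illustration.

```python
import math
from collections import Counter, defaultdict

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption for this sketch


def transition_probabilities(checkins):
    """Estimate P(next venue | current venue) from one user's ordered check-ins."""
    counts = defaultdict(Counter)
    for current, nxt in zip(checkins, checkins[1:]):
        counts[current][nxt] += 1
    return {
        state: {nxt: c / sum(followers.values()) for nxt, c in followers.items()}
        for state, followers in counts.items()
    }


def miller_xy(lat_deg, lon_deg, radius=EARTH_RADIUS_KM):
    """Miller cylindrical projection of a latitude/longitude pair to (x, y)."""
    lat = math.radians(lat_deg)
    lon = math.radians(lon_deg)
    x = radius * lon
    y = radius * 1.25 * math.log(math.tan(math.pi / 4 + 0.4 * lat))
    return x, y


# Hypothetical check-in sequences for two users (venue names are made up).
traces = {
    "alice": ["cafe", "office", "cafe", "gym", "cafe", "office"],
    "bob": ["office", "gym", "office"],
}
# In this sketch the "complete" model is simply one empirical chain per user.
complete_model = {user: transition_probabilities(seq) for user, seq in traces.items()}
```

Keeping one small dictionary per user mirrors the idea that every user gets his or her own empirical model; how the friendship topology couples those per-user chains is not shown here.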

 

 

This is how a set of friends travels together. The implications? Huge. For instance, take the popular Random Waypoint Model, which has dominated simulations in the networking research community, and replace it with the Friendship Mobility Model. Or study the migration of a population during a natural disaster, e.g., the recent nuclear disaster in Japan. Enough said.

The paper, “Using Location-Based Social Networks to Simulate Human Mobility for Mobile Networks,” is currently under review for NetSciCom 2012, held in conjunction with InfoCom. More results will be presented, and the datasets and code will be made publicly available under an agreement with the IRB at RPI, protocol #1125, entitled “Data Infrastructure for Complex Social Networks.”

Of course, we aren’t the only ones, or the first, to study human mobility. Check out the awesome work of these folks: http://www.barabasilab.com/ http://cs.stanford.edu/people/jure/

SCNARC has recently invested time and resources in building a data infrastructure for complex social networks (DICS). The mission of DICS is to provide a multi-scale databank consisting of petabytes of information collected in real time, ranging from political and social events to natural disasters. In the first round of investment, a group of graduate students designed the hardware of the DICS cluster to optimize the performance of crawling and indexing the Internet. The prototype consists of 48 cores, 192 TB of RAM, and 100 TB of storage. Working closely with the IRB, DICS plans to disseminate the data by providing direct access to 385 internal collaborators under the NS-CTA alliance and to the scientific community. More importantly, the data from DICS will be used to support hypotheses and mathematical models of how technology impacts social interaction, seen through the lenses of computer scientists, physicists, and social scientists.

 

 

Contributions: W. Babitt, K. Gilvaz, T. Nguyen, B. Szymanski

Construction of Complex Information Networks

An information campaign network is an overlay network built from the data of multiple social networks to capture the spread of information. We propose to construct and visualize a complex information campaign network (ICN) to model how far and wide a particular message, idea, or opinion spreads in social and geographic spaces. Social space refers to the social network viewed as a graph, where actors are nodes and ties are edges. Geographic space is a social space annotated with the geographic information of the actors and edges. Our goals are to provide a model that captures the essence of the information campaign network, variables that quantify the impact of propagation, and a visualization tool that lets researchers, scientists, and the general population watch the events that interest them propagate in real time on mobile computing devices.
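As a rough illustration of the two spaces, here is a small sketch of one way to store them together. Everything in it is an assumption for the sake of example: the `CampaignNetwork` class, the hop count as a "how wide" measure in social space, and the great-circle distance as a "how far" measure in geographic space are hypothetical stand-ins, not the variables of our model.

```python
import math
from collections import deque
from dataclasses import dataclass, field


@dataclass
class CampaignNetwork:
    """Social space (actors and ties) plus coordinates for geographic space."""
    ties: dict = field(default_factory=dict)       # actor -> set of neighbouring actors
    locations: dict = field(default_factory=dict)  # actor -> (lat, lon) in degrees

    def add_tie(self, a, b):
        self.ties.setdefault(a, set()).add(b)
        self.ties.setdefault(b, set()).add(a)

    def hops_from(self, source):
        """Breadth-first hop counts from the source actor (reach in social space)."""
        hops = {source: 0}
        queue = deque([source])
        while queue:
            actor = queue.popleft()
            for neighbour in self.ties.get(actor, ()):
                if neighbour not in hops:
                    hops[neighbour] = hops[actor] + 1
                    queue.append(neighbour)
        return hops


def haversine_km(p, q, radius=6371.0):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * radius * math.asin(math.sqrt(a))


# Hypothetical three-actor chain along the equator (actors and positions are made up).
icn = CampaignNetwork()
icn.add_tie("a", "b")
icn.add_tie("b", "c")
icn.locations = {"a": (0.0, 0.0), "b": (0.0, 1.0), "c": (0.0, 2.0)}

hops = icn.hops_from("a")
farthest_km = max(haversine_km(icn.locations["a"], icn.locations[x]) for x in hops)
```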

Dots = Networks
Silver: All geotagged tweets from Twitter in dataset.
Blue: Tweets on Obama
Purple: Tweets on 9/11
Orange: Tweets about the recent earthquake in New Delhi
Lines:
Green: Positive Sentiments
Red: Negative Sentiments
Yellow: Neutral Sentiments

For each tweet, classify its sentiment as some s in S = {Positive, Negative, Neutral}.
Given a network n in N = {Obama, 9/11, Earthquake}:
pos(n) = { all the tweets in n s.t. the sentiment is positive }
neg(n) = { all the tweets in n s.t. the sentiment is negative }
net(n) = { all the tweets in n s.t. the sentiment is neutral }

Connect tweets with similar sentiments using a link with a distinct color assigned to each sentiment.
Thus the network is partitioned into cliques of similar sentiment.
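
The partition and linking steps above can be sketched in a few lines. This is only an illustration under assumptions: the tweets are taken to be already classified, the function names and the sample tweet ids are made up, and the link colors simply follow the legend (green, red, yellow).

```python
from itertools import combinations

SENTIMENTS = ("positive", "negative", "neutral")
LINK_COLORS = {"positive": "green", "negative": "red", "neutral": "yellow"}


def partition_by_sentiment(tweets):
    """Split one network's tweets into pos(n), neg(n), and net(n)."""
    groups = {s: [] for s in SENTIMENTS}
    for tweet_id, sentiment in tweets:
        groups[sentiment].append(tweet_id)
    return groups


def sentiment_links(groups):
    """Link every pair of tweets sharing a sentiment: one clique per sentiment."""
    links = []
    for sentiment, ids in groups.items():
        for a, b in combinations(ids, 2):
            links.append((a, b, LINK_COLORS[sentiment]))
    return links


# Hypothetical, already-classified tweets for one network (ids and labels are made up).
earthquake = [(1, "negative"), (2, "negative"), (3, "neutral"), (4, "negative"), (5, "positive")]
groups = partition_by_sentiment(earthquake)
links = sentiment_links(groups)
```

Note that a clique over k tweets has k(k-1)/2 links, so for a large network a real visualization would probably need to sample or bundle the links rather than draw all of them.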

First, the results are quite interesting because the overall sentiment of each network is very consistent with its subject. For example, 9/11 and the earthquake in Delhi are emotionally distressing events, so it makes sense for the mood of the tweets to be negative or at least neutral. If sentiments were randomly and equally distributed within a network, we would expect roughly equal shares of positive, negative, and neutral tweets; the 3 networks in the video violate this property.
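
One simple way to check the "equally distributed" assumption is a Pearson chi-square goodness-of-fit test against equal expected counts. To be clear, this is a technique swapped in for illustration, not the analysis behind the video, and the counts below are hypothetical rather than taken from our dataset.

```python
CHI2_CRITICAL_DF2_05 = 5.991  # chi-square critical value for 2 degrees of freedom, alpha = 0.05


def chi_square_vs_uniform(counts):
    """Pearson chi-square statistic of observed counts against equal expected counts."""
    total = sum(counts.values())
    expected = total / len(counts)
    return sum((observed - expected) ** 2 / expected for observed in counts.values())


# Hypothetical sentiment counts for one network (made up for this example).
counts = {"positive": 10, "negative": 70, "neutral": 20}
statistic = chi_square_vs_uniform(counts)
rejects_uniform = statistic > CHI2_CRITICAL_DF2_05
```

With three sentiment classes there are two degrees of freedom, which is why the 5.991 threshold is used; a statistic above it means the equal-shares assumption is rejected at the 5% level.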

Second, in each of the information networks relating to Obama, 9/11, and Delhi, one majority sentiment or opinion dominates the contrasting sentiments of the rest. That is, the majority of the population believes in X and a very tiny population believes in the opposite of X, analogous to the tipping point. There will be an app for this on the iPad and other mobile computing devices.

Code and data will also be available soon. Please email if you cannot wait.

Updates

Please stay tuned for updates within the next two weeks. Here is a list of the things I’m planning to blog about.

1) A massively distributed system to index the Social Web from hardware infrastructure to parallel algorithmic design. We are talking about 30,000 CPUs and half a petabyte of data.

2) Visualization tools to study massive amounts of social data.

3) Datasets, datasets, and more datasets! We are planning to release terabytes of social data in the future.

4) I’m going to be at MIT NetMob 2011. I’ll write about a few of the ongoing research problems and the progress presented there.

If you want, I think it is possible to receive automatic updates from me through the subscription. I promise I won’t spam!

I was cordially invited to speak at the Boston Higher Education Resource Center (HERC) on May 12, 2011. Georgiana Chevry invited me to present my research project on social networks to her high school students. The Passport program is an academic enrichment program for juniors and seniors to “enter and complete college successfully.” I accepted the opportunity because I remember how confused I was when I was in high school.

The challenge for me as a presenter was to make the topic intellectually interesting for them. My ultimate objective was therefore to spark their curiosity in science by getting them to interrupt the presentation and ask me the questions they were “dying” to have answered. Luckily, social networking is ubiquitous: every student in the Passport program has a Facebook account, a Twitter account, or both. I knew I could spark some interest from the group when I presented a study on using massive amounts of tweets to predict the stock market. I told them that a $120 million hedge fund was created based on the idea.

Out of nowhere, a student who is planning to major in Spanish and international affairs asked, “How?” Instead of giving out the answer, I decided to go around the group and ask them, “How?” One student proposed taking “surveys” of the tweets. I said that taking surveys is too time-consuming and costly, and asked: how long do you think it takes to survey a billion tweets? Another student proposed developing a program that can do it automatically. A light bulb went on. Now they understood the power of computation.

Then I got into the purpose, methodology, and conclusions of the project. At the end of the presentation, I presented some open problems and brainstormed them with the students. For instance, how do you infer the ethnicity of a user on Twitter? One student proposed looking at his or her profile picture and taking a guess. This was a valid answer, but I asked her if she wanted to do that for 100,000 profiles. Another student proposed using last names, but a third responded with … “women could get married and change their names.” Now they were thinking. I didn’t tell them this was the core idea behind using latent Dirichlet allocation in probabilistic machine learning.

When it comes to teaching, I truly believe in feminist pedagogy. That is, the students are the center of the classroom, and they should be able to learn from each other and from themselves by thinking critically and asking questions. My job was only to facilitate the discussion.
