headshot sky

Tauhid Zaman

KDD Career Development Professor in
Communications and Technology
Assistant Professor
Operations Management
MIT Sloan School of Management

Email: zlisto@mit.edu

Twitter, Instagram, Snapchat, github, DraftKings: @zlisto



Research Interests

My research focuses on solving operational problems involving social network data using probabilistic models, network algorithms, and modern statistical methods.  Some of the topics I study in the social networks space include predicting the popularity of content, finding online extremists, and geo-locating users.

My broader interests cover data driven approaches to investing in startup companies, non-traditional choice modeling, algorithmic sports betting, and biometric data.

See below for details on my various research projects, ranked by a mix of recency and how cool I think they are at the moment.




Finding Extremists in Online Social Networks

Extremist groups such as ISIS use social media to recruit, spread propaganda, and incite violence.  How can one monitor social networks to shut down these users?  Our work develops simple machine learning methods to predict if new users will belong to existing extremist groups using their network neighborhoods.  We also develop a network search model based on Polya’s Urns that lets one monitor the network to see if suspended users return with new accounts.  We tested our method on real Twitter accounts of ISIS members and found we can find the returning users much faster than other search methods.This work is done with Lieutenant Colonel Christopher Marks.

[Theory Paper and Application Paper]

(Originally these were 2 papers, then the reviewers didn’t like either of them because one had not enough theory and one had not enough application, so now they are combined into one paper.  But I still like having them in two separate PDFs).

isis timeline network

Picking Winners: A Framework For Venture Capital Investment

We develop a data-driven approach to build portfolios of startup companies.  Our framework involves building portfolios to maximize the probability of at least one company “winning” which means having an IPO or an acquisition.  We call this the “Picking Winners” portfolio.  We develop a model based on first passage times random walks for the companies’ funding round time series.  We train our model on data for 24,000 companies.  We find with our approach we can achieve higher exit rates than many top venture capital firms.

This work is done with David Scott Hunter.
Media mentions
Vice: “MIT Researchers Offer Algorithm for Picking ‘Winning’ Startups

The Wallstreet Journal: “Data-Driven Portfolios Power ‘Home-Run’ Exits in MIT Study“[Paper]


Picking Winners with Integer Programming in Daily Fantasy Sports

We develop an integer programming formulation for picking entries into daily fantasy sports contests on sites like DraftKings. Our “Picking Winners”
portfolio of entries aims to have one entry achieve a huge number of points and win the contest. This is because nearly all the money
goes to the winner. Our algorithm creates lineups with high variance (known as stacking) and then forces the lineups
to not share too many entries (diversify the portfolio). We were able to win multiple contests in hockey and baseball
using our approach.

This work is done with David Scott Hunter.
The Julia code to implement our algorithm:




dkhockey wins.jpg

Predicting Performance Using Biometric Data

We show that you can predict who will choke under pressure using biometric data measured in low stress environments.  We conduct an experiment where subjects perform arithmetic questions for money.  They answer the questions in low stress and high stress rounds.  High stress is induced by time constraints and higher monetary reward.  We measure their electrodermal activity (EDA), which is a measure of how much they sweat.  EDA is a known physiological indicator of mental arousal.  Basically, if your mind is aroused, you sweat.  The arousal can be due to stress or excitement.  We find that subjects whose EDA decreases in the low stress round do poorly in the high stress round.  Unlike other studies in this area, ours is truly predictive, in that we forecast a future outcome using historical biometric data.  Our results suggest the potential of biometric data for job training and hiring processes.

This work is done with Carter Mundell and Juan Pablo Vielma.

Media mentions

The Wallstreet Journal: “Does the Amount You Sweat Predict Your Job Performance?



Choice-Time Models

What can how fast you swipe on a face in Tinder tell us about how much you like (or dislike) that face?  Turns out quite a bit.  We develop a “choice-time-engagement” model that uses choice and time data to infer how much you like something, and how engaged you are with the decision you make.  Think about apps like Tinder.  A lot of times, people aren’t engaged and swipe blindly.  It would be nice to measure this engagement, and maybe count their swipes less when trying to figure out the quality of users on the site.  Our choice-time-engagement model can denoise online survey data using choice and time data to get better estimates of true user preferences.  This is useful for not only dating apps, but nearly any app where people vote on stuff.To demonstrate that our model works, we tested it on some online surveys.  One is a Mathematicians survey, where the subjects vote on their preferred mathematicians in head to head comparisons.  The resulting response times and preferences are shown in the figure on the left.  As can be seen, Newton vs. Pascal is a fast decision, with Newton winning most of the time.  On the other hand, Newton vs Gauss takes 4.3 sec to decide, and its nearly 50/50.

This is joint work with Zhengli Wang.


math poll

Building a set of Location Specific Social Media Users

We present an algorithm to build a set of social media users starting from a small seed set.  Our method uses an expand-classify approach where we classify the set of users as in the target location or not, and then expands the set by collecting the neighbors of the people in the target location.  Our classification algorithm uses a factor graph model and we show that classification can be easily done by solving a min-cut problem.  We find that we can collect several thousand users from Twitter in a few hours.  Testing on geo-tagged content, we find that our method is fairly accurate.

This work is done with Lieutenant Colonel Christopher Marks.



Predicting the Popularity of Tweets

We developed a probabilistic model for the spread of an individual tweet in Twitter. By observing the times of the retweets of a tweet, we are able to predict the total number of retweets the tweet will receive.
This is joint work with Emily Fox and Eric Bradlow.

The data for this work can be found on our Data page.

You can try a demo of our predictor called Twouija at twouija.com.



The Superstar Model

We present the Superstar Model for a growing network.
This model is motivated by the observation that in real social networks, many users
have degree on the order of the network size (think of Justin Bieber). Traditional
network growth models like preferential attachment can’t capture this behavior. Our
model is designed to model these “superstars”. We show how you can predict the degree
distribution of the network using only the degree of the superstar.

The picture on the right is an example superstar network. This network is all retweets
about the 2010 BET Awards. You can see Justin Bieber and Lady Gaga are  superstars here.
This is joint work with  Shankar Bhamidi and J. Michael Steele .

[Paper ]


Finding Rumor Sources

Imagine a rumor spreads in a network and all we know is the source. Then can we find the source using only the network structure? It turns out that we can and there is a network centrality we created called rumor centrality which does this. For regular tree networks under a certain rumor spreading model it is an exact maximum likelihood estimator for the source. We have proven that it performs well on essentially any sparse network, showing the rumor centrality is a universally good source estimator. Rumor centrality is also a rather interesting mathematical object, having connections with linear extensions of partial orders, random walks, Markov chains, and Nash equilibria of certain network games.Related publications can be found here.This work was done in collaboration with Devavrat Shah.The figure on the right shows how rumor centrality finds the source. A simulated rumor was spread on this network, and the each node’s normalized rumor centrality was calculated, with red=1 and blue = 0. As can be seen, rumor centrality narrows down the set of likely rumor sources. In this figure, the true source is the red node in the center.

This work is done with  Devavrat Shah.

[Paper 1 and Paper 2]


Learning Community Structure

What defines a community in a social network? Is it the community’s leaders? Or is it the less important followers in a community? We have found that these followers actually provide the identifiable signal for a community in a network. These followers have no friends outside of their community, and in this way provide the community with an identity. Using this observation, we created the leader-follower algorithm (LFA) which finds communities in networks by searching for these followers. The LFA has linear run-time and so can be applied to incredibly large networks. It can find overlapping communities and learns the number of communities naturally from the network structure. We have also proven that the LFA has good performance on a wide class of networks. These strengths give the LFA and advantage over other community detection methods such as spectral clustering or inference based methods. The figure on the right shows the communities found by the LFA in my own Facebook network, with appropriate social labels for each community found.

This is joint work with  Devavrat Shah.

[Paper ]