Devin has ~7,000 tweets from 2013 to 2018 (6 years). This project analyzes those tweets after removing retweets and replies. Each tweet is split into a bag of words, and a term frequency analysis finds Devin's top words. The same analysis is performed on 23 other Twitter users, for a total of 50,000 analyzed tweets. We trained a model to predict which user Devin tweets most similarly to, in order to test whether that similarity can predict related goals, aspirations, interests, etc. We also built a digram Markov chain from Devin's tweets in order to generate possible future tweets.
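As a rough illustration of the bag-of-words term frequency step, here is a minimal sketch. The stop-word list, the tokenizing rule, and the sample tweets are all made up for the example; they are not the project's actual data or settings.

```python
import re
from collections import Counter

# Hypothetical stop-word list for illustration; the project's real list is not given.
STOP_WORDS = {"the", "a", "to", "i", "and", "of", "is", "it"}

def top_words(tweets, n=10):
    """Return the n most frequent non-stop-word terms across all tweets."""
    counts = Counter()
    for tweet in tweets:
        # Lowercase, then treat runs of letters, digits, and apostrophes as words.
        words = re.findall(r"[a-z0-9']+", tweet.lower())
        counts.update(w for w in words if w not in STOP_WORDS)
    return counts.most_common(n)

# Made-up sample tweets, not taken from any real account.
sample_tweets = [
    "i love pizza",
    "pizza is the best",
    "love love love this song",
]
print(top_words(sample_tweets, n=2))
```

Running the same function over each user's tweets would give one top-word list per user, which is the kind of output the comparison between users starts from.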
We wanted to see whether it was possible to detect similar people based on their tweets. We also wanted to see whether
it was possible to accurately guess someone's online personality from their IRL personality.
We thought it would be possible, and we hypothesized that Devin would tweet most similarly to Chrissy Teigen and Typical Girl.
We explored different ways to analyze all 50,000 tweets.
Ultimately, we used
unigrams, bigrams, and a combination of both to detect similarities in language.
We found that they produced very different results. Our bigram graph had only 8 comparisons in total,
suggesting that Devin's tweets shared no bigram similarity with the remaining 15 Twitter users,
which differs from what the unigram graph produced. Our combined unigram and bigram graph had more comparisons, as expected.
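To make the three feature sets concrete, here is a small sketch that counts unigrams, bigrams, or both, and then lists the terms two users share. The function names and the two sample sentences are hypothetical, and counting shared terms is only one possible way to compare users; the write-up does not say exactly how the comparisons were computed.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) in a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def features(text, mode="both"):
    """Count unigrams, bigrams, or both for a single piece of text."""
    tokens = text.lower().split()
    counts = Counter()
    if mode in ("unigram", "both"):
        counts.update(ngrams(tokens, 1))
    if mode in ("bigram", "both"):
        counts.update(ngrams(tokens, 2))
    return counts

def shared_terms(a, b):
    """Return the terms that appear in both feature sets."""
    return set(a) & set(b)

# Made-up text standing in for two users' tweets.
user_a = "i love this song so much"
user_b = "i love my dog so much"

for mode in ("unigram", "bigram", "both"):
    overlap = shared_terms(features(user_a, mode), features(user_b, mode))
    print(mode, len(overlap))
```

In this toy case the two users share fewer bigrams than unigrams. Bigrams are generally sparser than single words, which may be one reason a bigram-only comparison finds fewer matches.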
For extraction, we used Tweepy, a Python client for Twitter's API, to pull the tweets. After extracting and cleaning the tweets,
we ran K Nearest Neighbors, with an optimal k of 9, to find Devin's most similar tweeters.
For the visuals, we used the Google Visualization API to format and graph the data.
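The K Nearest Neighbors step could look something like the sketch below. It assumes each labeled tweet is turned into a word-count vector, that similarity is measured with cosine similarity, and that the k closest tweets vote on an author. Those choices, the function names, and the toy data are all assumptions for illustration; the project's actual features and distance measure are not described here.

```python
import math
from collections import Counter

def vectorize(text):
    """Turn text into a word-count vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def knn_predict(query, labeled_tweets, k=9):
    """Vote among the k labeled tweets most similar to the query."""
    q = vectorize(query)
    ranked = sorted(
        labeled_tweets,
        key=lambda item: cosine(q, vectorize(item[0])),
        reverse=True,
    )
    votes = Counter(author for _, author in ranked[:k])
    return votes.most_common(1)[0][0]

# Made-up (tweet, author) pairs with hypothetical author names.
labeled = [
    ("love this song", "user_a"),
    ("love my new song", "user_a"),
    ("this song is great", "user_a"),
    ("game tonight go team", "user_b"),
    ("great game by the team", "user_b"),
    ("team won the game", "user_b"),
]
# k=3 only because the toy data set is tiny; the project reports k=9.
print(knn_predict("i love this song", labeled, k=3))
```

Running a prediction for each of Devin's tweets and tallying the predicted authors would give a ranking of her most similar tweeters.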
To predict some tweets that Devin may make in the future, we built a digram Markov chain from thousands of Devin's tweets.
Here are some of our favorites:
"of course right when i saw u hitting on a final to pass the class instead of studying ur the most beautiful person 2 me~"
"idiot if you mix it with water. i tried calling an uber off a yacht"
"yo i relate to how k selected organisms are iteroparous really it's fine you can be a blues fan today"
"most people have dreams of them flying or travelling, etc. i dream about the person who set my phone alarm to 4:45 without me noticing... alright that was so trippy"
"kinda feels like i'm not ignoring u, i'm just tryna play mortal kombat"
"12 hours until i met you. worst decision of my life"
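Tweets like the ones above can be generated with a short Markov chain. The sketch below assumes "digram" means the chain's state is the previous two words, and the next word is picked at random from the words that followed that pair in the source tweets. The function names are hypothetical and the input is placeholder text, not Devin's tweets.

```python
import random
from collections import defaultdict

def build_chain(tweets):
    """Map each pair of consecutive words to the words seen after it."""
    chain = defaultdict(list)
    starts = []
    for tweet in tweets:
        words = tweet.split()
        if len(words) < 3:
            continue  # too short to contribute a transition
        starts.append((words[0], words[1]))
        for i in range(len(words) - 2):
            chain[(words[i], words[i + 1])].append(words[i + 2])
    return chain, starts

def generate(chain, starts, max_words=30, rng=random):
    """Walk the chain from a random starting pair until it dead-ends."""
    state = rng.choice(starts)
    output = list(state)
    while len(output) < max_words and state in chain:
        next_word = rng.choice(chain[state])
        output.append(next_word)
        state = (state[1], next_word)
    return " ".join(output)

# Placeholder input; real use would pass in the cleaned tweets.
chain, starts = build_chain(["a b c d", "b c e"])
print(generate(chain, starts))
```

Because each step only looks at the last two words, the output can jump from one tweet's topic to another's mid-sentence, which is where the odd combinations come from.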
This project was a lot harder than we thought it would be.
Cleaning the data ultimately removed a lot of the language (tweets built around images, videos, etc. left little text behind),
which made it more difficult to accurately detect similarities between tweets.
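A small sketch shows how cleaning can shrink the usable text. It assumes retweets and replies can be spotted from the start of the tweet text and that attached media shows up as a link inside the text; the exact rules the project used are not listed, so the patterns and the function name here are hypothetical.

```python
import re

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")

def clean_tweet(text):
    """Return cleaned text, or None if the tweet should be dropped."""
    # Drop retweets and replies (assumed detectable from the text prefix).
    if text.startswith("RT @") or text.startswith("@"):
        return None
    text = URL_RE.sub("", text)      # remove links, including media links
    text = MENTION_RE.sub("", text)  # remove @mentions
    text = " ".join(text.lower().split())  # lowercase and collapse whitespace
    return text or None  # nothing left means nothing to analyze

# Made-up examples with placeholder links.
print(clean_tweet("Best day ever https://t.co/abc123"))
print(clean_tweet("https://t.co/abc123"))
```

The second example is a tweet that was only a media link: after cleaning, nothing is left, so it contributes no words to the analysis.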
Overall, our original hypothesis was incorrect. We initially thought that Devin would tweet
similarly to Chrissy Teigen and Typical Girl because of the way she talks, but it appears that
her online presence most closely resembles Justin Bieber's. We also found that it is very difficult to
determine related goals, aspirations, interests, etc. through Twitter language because many people have
an online presence that is very different from who they really are in person.