Social Graphs and Interactions

Lecture 4
Page last edited by Sune Lehmann Jørgensen (sljo) 23/09-2014

Overview

This week is all about getting a functional bot up and running, one that does everything except generate lots of content. Additionally, we will work on NLPP1e chapter 6, which will serve as an appetizer for the class's next theme: Machine Learning. We'll play around with the ML algorithms that are built into NLTK and become acquainted with important concepts that we'll be exploring further next week.

 

 

Reading

  • NLPP1e chapter 6.1 and 6.3.
  • If you're still feeling anything less than 100% sure of your Python skills, I highly recommend (but do not require) working through all of chapter 4, which contains a lot of useful info on good Python programming. Section 4.1 on assignment, one of the less intuitive things in Python, is particularly well worth a read.
  • Sune's slides (which will be downloadable from the CN group).

Exercises

 

NLPP1e

  1. What is supervised classification? Explain in your own words.
  2. Work through the example on Gender Identification in 6.1. Make it run in your own IPython Notebook and make sure you know what's going on in the code. Include your solution to the "your turn" modification and explain your choice of features. Do this before moving on.
  3. Now, work through the section Choosing the Right Features. What is overfitting? Why does the function gender_features2 overfit the training data? How does the notion of a development set help with overfitting?
  4. Finally, work through Document Classification in 6.1 on your own. Let's say we label every tweet with more than 10 retweets as "good" and every tweet with 10 or fewer as "bad". Can we use this method to figure out which features make some tweets popular? How?
  5. Describe in your own words the evaluation metrics from chapter 6.3: accuracy, precision and recall, confusion matrices, and cross-validation.
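
As a companion to exercise 5, here is a minimal sketch of the four evaluation ideas in plain Python 3, with no NLTK dependency. It is not the book's code: the function names (confusion_counts, accuracy, precision_recall, k_fold_splits) and the "good"/"bad" labels below are made up for illustration, and the fold assignment (every k-th item) is just one simple way to split the data.

```python
from collections import Counter


def confusion_counts(gold, predicted):
    """Count how often each (gold_label, predicted_label) pair occurs."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    return Counter(zip(gold, predicted))


def accuracy(gold, predicted):
    """Fraction of items whose predicted label equals the gold label."""
    correct = sum(g == p for g, p in zip(gold, predicted))
    return correct / len(gold)


def precision_recall(gold, predicted, positive):
    """Precision and recall with one label treated as the positive class."""
    counts = confusion_counts(gold, predicted)
    tp = counts[(positive, positive)]
    # False positives: predicted positive, but the gold label is something else.
    fp = sum(n for (g, p), n in counts.items() if p == positive and g != positive)
    # False negatives: gold label is positive, but we predicted something else.
    fn = sum(n for (g, p), n in counts.items() if g == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def k_fold_splits(items, k):
    """Yield k (train, test) pairs; each item is in exactly one test fold."""
    for i in range(k):
        test = items[i::k]
        train = [x for j, x in enumerate(items) if j % k != i]
        yield train, test


# Made-up gold labels and classifier output, purely for illustration.
gold = ["good", "good", "bad", "bad", "bad", "good"]
pred = ["good", "bad", "bad", "good", "bad", "good"]

print(accuracy(gold, pred))                  # 4 of 6 labels match
print(precision_recall(gold, pred, "good"))  # precision and recall for "good"
print(confusion_counts(gold, pred))
```

In cross-validation you would train and score a classifier once per (train, test) pair from k_fold_splits and average the scores, so that every item is used for testing exactly once.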

Twitter

  • Today the simple goal is to build the most realistic bot you can. Follow the steps in today's lecture slides, and please document and explain your choices for each step. The bot must fulfill the following requirements:
  1. The code must run on your AWS machine.
  2. Your bot must have a realistic profile that scores well (i.e. as human) on the http://truthy.indiana.edu/botornot/ site.
  3. The bot must have a "location", and each tweet must have a geo-tag close to that location.
  4. You must have implemented the reciprocal follow routine using cron. I recommend a "sophisticated" (not all at once, not totally random) timetable for this.
  5. Your bot must retweet precisely one tweet per day using cron (e.g. the most retweeted tweet you've seen in the past 24 hours). Again, the timing of the retweet must follow a sophisticated/circadian timetable.
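
One possible way to organize the decision logic behind requirements 3 to 5 is sketched below in plain Python 3. It deliberately makes no Twitter API calls, so fetching tweets, following, and retweeting are left to whatever client you use. Everything here is an assumption rather than the method from the slides: the helper names, the tweet dictionaries (with created_at already parsed into a datetime), the 0.01 degree jitter (roughly 1 km in latitude), the batch size, and the 8 to 22 posting window peaking at 18.

```python
import random
from datetime import datetime, timedelta


def pick_most_retweeted(tweets, now, window_hours=24):
    """Return the most retweeted tweet created within the window, or None."""
    cutoff = now - timedelta(hours=window_hours)
    recent = [t for t in tweets if cutoff <= t["created_at"] <= now]
    if not recent:
        return None
    return max(recent, key=lambda t: t["retweet_count"])


def jitter_location(lat, lon, max_offset_deg=0.01, rng=random):
    """Return a point inside a small square centred on (lat, lon)."""
    return (
        lat + rng.uniform(-max_offset_deg, max_offset_deg),
        lon + rng.uniform(-max_offset_deg, max_offset_deg),
    )


def pick_follow_batch(followers, following, batch_size=3, rng=random):
    """Pick a few not-yet-followed followers, so we never follow all at once."""
    candidates = sorted(set(followers) - set(following))
    return rng.sample(candidates, min(batch_size, len(candidates)))


def seconds_until_post(rng=random, earliest_hour=8, latest_hour=22, peak_hour=18):
    """Seconds after midnight at which to act, biased toward peak_hour."""
    hour = rng.triangular(earliest_hour, latest_hour, peak_hour)
    return int(hour * 3600)


# Made-up data, purely for illustration.
now = datetime(2014, 9, 23, 12, 0)
sample_tweets = [
    {"id": 1, "created_at": now - timedelta(hours=30), "retweet_count": 500},
    {"id": 2, "created_at": now - timedelta(hours=5), "retweet_count": 40},
    {"id": 3, "created_at": now - timedelta(hours=1), "retweet_count": 12},
]

best = pick_most_retweeted(sample_tweets, now)
print(best["id"])  # tweet 1 is older than 24 hours, so tweet 2 is chosen
print(jitter_location(55.78, 12.52))
print(pick_follow_batch(["ann", "bob", "cat", "dan"], ["bob"], batch_size=2))
print(seconds_until_post())
```

With helpers like these, a single cron entry that starts the script at midnight could sleep for seconds_until_post() seconds before retweeting, which spreads the activity over waking hours instead of firing at the same minute every day.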