Social Graphs and Interactions

Lecture 3
Page last edited by Sune Lehmann Jørgensen (sljo) 17/09-2014

Goals

  • Finish up exercises from last time
  • Read a real scientific paper and think about it!
  • Work on NLPP1e chapters 2 and 3
  • Get your Twitter bot up and running on Amazon EC2

Reading

As last time, the reading for today is from Natural Language Processing with Python, 1st edition (NLPP1e). Please take a look at:

  • NLPP1e Sections 2.1, 2.2, 2.3
  • NLPP1e Sections 3.1, 3.2, 3.3, 3.4, 3.5, 3.6, 3.7, 3.9, 3.10

We will also read about Twitter bots: read the first 4 pages of the article "Rise of the social bots" (see exercise B).
Finally, you will need to grab the Twitter recipes from Mining the Social Web, 2nd edition (MTSW2e).


Exercises

A) Natural language processing exercises.

  • Do exercises 16, 17 and 18 in NLPP1e Section 2.8.
  • Work through exercises 6, 8 (use your function to get the text from this page and evaluate the results), 25 (a) and (b), and 30 in NLPP1e Section 3.12.
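Exercise 8 involves writing a function that gets the text of a web page with the HTML markup removed. NLPP1e is written for Python 2 (`urllib.urlopen`), so here is a minimal Python 3 sketch using only the standard library. The names `strip_html` and `url_to_text` are our own, and parsing with `html.parser` is just one option (BeautifulSoup is a common alternative). This version drops the contents of `<script>` and `<style>` and joins text fragments with spaces, so a word split across inline tags, such as `<b>Hel</b>lo`, will come out with a space in it.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class _TextExtractor(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True by default: &amp; -> &
        self.parts = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.parts.append(data)


def strip_html(html):
    """Return the visible text of an HTML string with whitespace normalized."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    # Join fragments with spaces, then collapse runs of whitespace.
    return " ".join(" ".join(parser.parts).split())


def url_to_text(url):
    """Download a page and return its text with the markup removed."""
    with urlopen(url) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        html = response.read().decode(charset, errors="replace")
    return strip_html(html)
```

Usage: `text = url_to_text(url)` with `url` set to this page's address; the resulting string can then be tokenized and inspected with NLTK as in chapter 3.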

B) "Rise of the social bots". After you have read the article, answer the following questions:

  • What is the definition of a social bot? What is the Turing test? When can a social bot claim to have passed the Turing test?
  • Name what you think are the three most important problems with social bots. There are many issues, so explain why the ones you chose are important.
  • In the section "Act like a human, think like a bot" the authors explain potential strategies used by sophisticated bots. Create a list summarizing all of those strategies, and write a short evaluation of how effective you think each one is. Give reasons for your answers.

C) Twitter exercises.

C1) Set up an Amazon EC2 account to run your bot 24/7. We do not require an IPython notebook for the bot code on Amazon EC2. IPython notebooks are for handing in the assignments; the bot itself should run as scripts on EC2 (a script is just a file with a .py extension that contains your code).

  • Set up the EC2 account with Python, etc. Instructions here.
  • Set up cron to collect the content of your own timeline.
  • Start by reading about cron here: http://en.wikipedia.org/wiki/Cron. What does cron do? Which file do you need to change in order to start scheduling a script? How do you schedule a program to run every 15 minutes? How would you add randomness to the timings?
  • Set up a script that collects all tweets in your own timeline. (Hint: you can use example 9.21 in MTSW2e to get started.) If you store all of the metadata attached to each tweet (there is a lot of it, cf. MTSW2e pp. 22-25), you will fill up your 8 GB of Amazon storage very quickly. Therefore, extract only summary information from each tweet's JSON: the values for the keys ["text"], ["coordinates"], ["retweet_count"], ["id"], ["created_at"] and ["user"]["id"]. (Note that the last one, ["user"]["id"], is nested two levels deep; it is not a missing comma.)
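For the scheduling questions, here is one possible setup, sketched under assumptions: the script name `collect_timeline.py`, the interpreter and home-directory paths, and the 5-minute jitter bound are all hypothetical. The crontab line uses the standard `*/15` step syntax in the minute field. Cron itself fires at fixed times, so one simple way to add randomness is to have the script sleep for a random number of seconds before doing its work.

```python
# collect_timeline.py: hypothetical cron entry point with a random start delay.
#
# To run it every 15 minutes, add a line like the following to your crontab
# (open it for editing with `crontab -e`); both paths are assumptions:
#
#   */15 * * * * /usr/bin/python3 /home/ec2-user/collect_timeline.py
#
# Cron fires at exactly :00, :15, :30 and :45; sleeping a random number of
# seconds first spreads the actual requests out.
import random
import time

# Hypothetical choice: up to 5 minutes of jitter, well under the 15-minute period.
MAX_JITTER_SECONDS = 5 * 60


def random_delay(max_seconds=MAX_JITTER_SECONDS, rng=random):
    """Return a uniformly random delay between 0 and max_seconds seconds."""
    return rng.uniform(0, max_seconds)


def collect():
    """Placeholder: fetch and store your timeline here."""


def main():
    time.sleep(random_delay())
    collect()
```

To use it from cron, end the file with a call to `main()` (or an `if __name__ == "__main__":` guard that calls it). Keeping the jitter well below the cron period means consecutive runs cannot overlap.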
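The field extraction above can be sketched as follows, assuming the tweets have already been fetched as Python dicts (for example with the MTSW2e recipe) and that summaries are appended to a JSON Lines file, one object per line. The function names and the `user_id` output key are our own choices. Because consecutive cron runs will see overlapping timelines, the sketch also skips tweet ids that are already stored.

```python
import json


def summarize_tweet(tweet):
    """Keep only the fields C1 asks for from one tweet's JSON (a dict)."""
    return {
        "text": tweet.get("text"),
        "coordinates": tweet.get("coordinates"),
        "retweet_count": tweet.get("retweet_count"),
        "id": tweet.get("id"),
        "created_at": tweet.get("created_at"),
        # Two levels deep: the author's id is tweet["user"]["id"].
        "user_id": (tweet.get("user") or {}).get("id"),
    }


def load_seen_ids(path):
    """Rebuild the set of stored tweet ids from an existing output file."""
    try:
        with open(path, encoding="utf-8") as f:
            return {json.loads(line)["id"] for line in f if line.strip()}
    except FileNotFoundError:
        return set()


def append_new_summaries(tweets, path, seen_ids):
    """Append summaries of unseen tweets to `path`, one JSON object per line.

    `seen_ids` is updated in place. Returns the number of tweets written.
    """
    written = 0
    with open(path, "a", encoding="utf-8") as out:
        for tweet in tweets:
            summary = summarize_tweet(tweet)
            if summary["id"] in seen_ids:
                continue  # already stored by an earlier run
            seen_ids.add(summary["id"])
            out.write(json.dumps(summary) + "\n")
            written += 1
    return written
```

In each cron run, call `load_seen_ids(path)` once and pass the result to `append_new_summaries`. For a very long-running bot, the `since_id` parameter of the timeline API is a lighter way to avoid refetching old tweets.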

C2) Try the "reciprocal follow" routine (but stick to at most 50 follows per day to avoid getting banned). The routine goes like this and can be automated using cron:

  1. Search for spam accounts: look for accounts with the term "followback" in their description.
  2. Follow 50 new accounts per day. (And keep track of who you've tried so you don't follow them again).
  3. Wait 24 hours.
  4. Unfollow whoever did not follow you back.
  5. Repeat.
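The bookkeeping behind the routine above (remember everyone you have tried, cap follows per day, and after 24 hours unfollow those who did not reciprocate) can be sketched without any Twitter calls. The search, follow and unfollow requests are left out because they depend on the client library you use; the state layout, file handling and function names here are hypothetical. All user ids are kept as strings so they survive a JSON round trip unchanged.

```python
import json
import time

DAILY_LIMIT = 50             # at most 50 new follows per day (step 2)
WAIT_SECONDS = 24 * 60 * 60  # step 3: wait 24 hours


def load_state(path):
    """Load every account ever tried plus pending follows (id -> timestamp)."""
    try:
        with open(path, encoding="utf-8") as f:
            return json.load(f)
    except FileNotFoundError:
        return {"tried": [], "pending": {}}


def save_state(state, path):
    with open(path, "w", encoding="utf-8") as f:
        json.dump(state, f)


def pick_new_accounts(candidates, state, limit=DAILY_LIMIT):
    """Step 2: choose up to `limit` search results never tried before."""
    tried = set(state["tried"])
    chosen = []
    for user_id in map(str, candidates):
        if user_id not in tried and user_id not in chosen:
            chosen.append(user_id)
            if len(chosen) == limit:
                break
    return chosen


def record_follows(state, user_ids, now=None):
    """Remember who was followed and when, so nobody is followed twice."""
    now = time.time() if now is None else now
    for user_id in map(str, user_ids):
        state["tried"].append(user_id)
        state["pending"][user_id] = now


def due_for_unfollow(state, follower_ids, now=None):
    """Step 4: accounts followed at least 24 h ago that did not follow back."""
    now = time.time() if now is None else now
    followers = {str(f) for f in follower_ids}
    due = []
    for user_id, followed_at in list(state["pending"].items()):
        if now - followed_at >= WAIT_SECONDS:
            del state["pending"][user_id]  # resolved either way
            if user_id not in followers:
                due.append(user_id)
    return due
```

A daily cron job would then load the state, unfollow the ids from `due_for_unfollow`, follow the ids from `pick_new_accounts`, call `record_follows`, and save the state again.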

C3) Extra credit. Consider making your bot say something. It is risky for the bot to say too much (frequent posting will make many humans unfollow you), so perhaps start with one retweet per day. It's best if you retweet humans, not bots.
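A minimal sketch of the two constraints above: retweet at most once per calendar day, and prefer tweets that look human-written. The "human" test here is a crude, hypothetical heuristic of our own: it skips tweets that are themselves retweets (they carry a `retweeted_status` field) and authors whose profile description contains "followback", the spam marker used in C2. The actual retweet call and the storage of `last_retweet_date` are left to your bot.

```python
import datetime


def already_retweeted_today(last_retweet_date, today=None):
    """True if the bot's last retweet happened on `today` (compares dates)."""
    today = datetime.date.today() if today is None else today
    return last_retweet_date == today


def pick_retweet_candidate(tweets, blocked_terms=("followback",)):
    """Return the first tweet that looks human-written, or None.

    Hypothetical heuristic: skip retweets and skip authors whose profile
    description contains one of `blocked_terms` (case-insensitive).
    """
    for tweet in tweets:
        if "retweeted_status" in tweet:
            continue  # already a retweet; consider the original author instead
        description = ((tweet.get("user") or {}).get("description") or "").lower()
        if any(term in description for term in blocked_terms):
            continue
        return tweet
    return None
```

Real bots can look very human, so treat this filter as a starting point and read the "Act like a human, think like a bot" section with it in mind.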
