Page last edited by Sune Lehmann Jørgensen (sljo) 17/09-2014
Goals
- Finish up exercises from last time
- Read a real scientific paper and think about it!
- Work on NLPP1e chapters 2 and 3
- Get your Twitter bot up and running on Amazon
Reading
As last time, the reading for today is Natural
Language Processing with Python (1st edition) (NLPP1e).
Please take a look at
- NLPP1e Chapter 2.1, 2.2, 2.3
- NLPP1e Chapter 3.1, 3.2, 3.3, 3.4, 3.5, 3.6, 3.7, 3.9, 3.10.
We will also read about Twitter bots. Read the first 4 pages
of
Finally, you will need to grab the Twitter recipes from
MTSW2e.
Exercises
A) Natural language processing exercises.
- Do exercises in NLPP1e 2.8: 16, 17, 18
- Work through exercises in NLPP1e 3.12: 6, 8 (use your function
to get the text from this page and evaluate the results), 25a,b,
3.30.
B) "Rise of the social bots". After you've read the
article, answer the following questions:
- What is the definition of a social bot? What is the Turing
test? When can a social bot claim to have passed the Turing
test?
- Mention what you think are the three most important problems
with social bots. There are many issues, so you have to explain why
the ones you chose are important.
- In the section "Act like a human, think like a bot" the authors
explain potential strategies used by sophisticated bots. Create a
list summarizing all of those strategies and write a short
evaluation of how effective you think each one is. Provide reasons
for your answers.
C) Twitter exercises.
C1) Set up an Amazon EC2 account to run your bot 24/7.
We do not require an IPython notebook for the code you use
for the bot on Amazon EC2. IPython notebooks are for handing in the
assignments, but the bot itself should be run via scripts on EC2. A
script is just a file with a .py extension that contains your
code.
- Set up the EC2 account with Python, etc. Instructions here.
- Set up cron to collect the content of
your own timeline.
- Start by reading about cron here: http://en.wikipedia.org/wiki/Cron.
What does cron do? Which file do you need to change in order to
start scheduling a script? How do you schedule a program to run
every 15 minutes? How would you add randomness to the timings?
- Set up a script that collects all tweets in your own timeline.
(Hint: You can use example 9.21 in MTSW2e to get started.) If
you store all of the metadata connected to each tweet (it's a lot,
cf. MTSW2e pp. 22-25) you will fill up your 8 GB of Amazon storage
very quickly. Therefore, extract only summary information from each
tweet's JSON. Extract the values corresponding to the following keys:
["text"], ["coordinates"], ["retweet_count"], ["id"],
["created_at"], ["user"]["id"]. (Note that the last one
["user"]["id"] is actually 2 levels deep, it's not a missing
comma).
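The extraction step described above could be sketched as follows. This is a minimal sketch, not the MTSW2e recipe: it assumes each tweet has already been parsed into a Python dict, and the function names `summarize_tweet` and `append_summaries`, the flattened key `user_id`, and the one-JSON-object-per-line output file are illustrative choices rather than anything prescribed by the exercise.

```python
import json


def summarize_tweet(tweet):
    """Keep only the summary fields from one parsed tweet (a dict)."""
    # "user" may be missing or None in odd cases, so fall back to {}.
    user = tweet.get("user") or {}
    return {
        "text": tweet.get("text"),
        "coordinates": tweet.get("coordinates"),
        "retweet_count": tweet.get("retweet_count"),
        "id": tweet.get("id"),
        "created_at": tweet.get("created_at"),
        # ["user"]["id"] is nested two levels deep; it is stored here
        # under the hypothetical flat key "user_id".
        "user_id": user.get("id"),
    }


def append_summaries(tweets, path):
    """Append one JSON object per line to the file at `path`."""
    with open(path, "a") as f:
        for tweet in tweets:
            f.write(json.dumps(summarize_tweet(tweet)) + "\n")
```

Appending one line per tweet keeps the file easy to grow from repeated cron runs and easy to read back line by line later.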
C2) Try the "reciprocal follow" routine (but stick to fewer than
50 follows per day to avoid getting banned). The reciprocal follow routine
goes like this and can be automated using cron:
- Search for spam accounts by searching for accounts with the term
"followback" in the description.
- Follow new accounts, staying below 50 per day. (And keep track of who you've
tried so you don't follow them again.)
- Wait 24 hours.
- Unfollow whoever did not follow you back.
- Repeat
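The bookkeeping behind this routine could be sketched as below. This is only a sketch under stated assumptions: it deliberately leaves out all Twitter API calls, and the names `DAILY_LIMIT`, `pick_new_accounts`, and `accounts_to_unfollow` are hypothetical. It assumes you already have lists of account ids from your own search and follower lookups.

```python
# Stay below the 50-per-day ceiling mentioned above (assumed value).
DAILY_LIMIT = 49


def pick_new_accounts(candidates, already_tried, limit=DAILY_LIMIT):
    """Return up to `limit` candidate ids that have not been tried before."""
    picked = []
    for account_id in candidates:
        # Skip accounts tried on earlier days and duplicates in this batch.
        if account_id not in already_tried and account_id not in picked:
            picked.append(account_id)
        if len(picked) >= limit:
            break
    return picked


def accounts_to_unfollow(followed_yesterday, current_followers):
    """Return the ids followed yesterday that did not follow back."""
    followers = set(current_followers)
    return [a for a in followed_yesterday if a not in followers]
```

The set of tried accounts has to persist between cron runs, for example in a file on disk, otherwise the bot will follow the same accounts again.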
C3) Extra credit: Consider making your bot
say something. It is dangerous for the bot to say too much
(frequent posting will make many humans unfollow you), but maybe
start the bot on one retweet per day. It's best if you retweet
humans, not bots.
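One way to hold the bot to one retweet per day, even if cron runs the script more often, is to record the date of the last retweet in a small file. The sketch below assumes that approach. The state file and the function names `already_retweeted_today` and `record_retweet` are hypothetical, and the retweet call itself is left out.

```python
import datetime
import os


def already_retweeted_today(state_path, today=None):
    """Return True if the state file records a retweet for `today`."""
    today = today or datetime.date.today()
    if not os.path.exists(state_path):
        return False
    with open(state_path) as f:
        return f.read().strip() == today.isoformat()


def record_retweet(state_path, today=None):
    """Overwrite the state file with today's date after a retweet."""
    today = today or datetime.date.today()
    with open(state_path, "w") as f:
        f.write(today.isoformat())
```

A script run from cron would check `already_retweeted_today` first, retweet only if it returns False, and then call `record_retweet`.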