Page last edited by Sune Lehmann Jørgensen (sljo) 23/09-2014
Overview
This week, it's all about getting a functional bot up and
running, doing everything except generating lots of content.
Additionally, we will work on NLPP's chapter 6, which will serve as
an appetizer for the class's next theme: Machine Learning. We'll
play around with the ML algorithms built into NLTK and
become acquainted with important concepts that we'll explore
further next week.
Reading
- NLPP1e sections 6.1 and 6.3.
- If you feel anything less than 100% sure of your Python skills,
I highly recommend working through all of chapter 4, which contains
a lot of useful information on good Python programming (this is
optional). In particular, section 4.1 on assignment, one of the less
intuitive aspects of Python, is well worth a read.
- Sune's slides (which will be
downloadable from the CN group).
Exercises
NLPP1e
- What is supervised classification?
Explain in your own words.
- Work through the example on Gender Identification in 6.1 and
make it run in your own IPython Notebook, making sure you understand
what's going on in the code. Include your solution to the "Your Turn"
modification, and explain your choice of features. Do this before
moving on.
- Now, work through Choosing the
Right Features. What is overfitting? Why does the
function gender_features2 overfit the training
data? How does the notion of a development set help with
overfitting?
- Finally, work through Document Classification in 6.1 on your own.
Let's say we label every tweet with more than 10 retweets as "good"
and every tweet with 10 or fewer as "bad". Can we use this method to
figure out which features make some tweets popular? How?
- Describe in your own words the evaluation metrics from section 6.3:
Accuracy, Precision and Recall, Confusion Matrices, and
Cross-Validation.
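The supervised-classification and overfitting questions can be made concrete with a minimal pure-Python sketch. It deliberately avoids NLTK (use `nltk.NaiveBayesClassifier` for the actual exercise) and instead trains a trivial "majority label per feature value" classifier on a handful of made-up names. The names, their labels, and the helpers `last_letter_features`, `whole_name_features`, `train` and `accuracy` are illustrative assumptions, not the book's code. `whole_name_features` is an extreme, hypothetical case of overfitting: it simply memorizes each training name.

```python
from collections import Counter, defaultdict


def last_letter_features(name):
    # One general feature, in the spirit of the book's gender_features
    return {"last_letter": name[-1].lower()}


def whole_name_features(name):
    # Deliberately over-specific: effectively memorizes each training name
    return {"name": name.lower()}


def train(labeled, extract):
    # Count how often each label co-occurs with each (feature, value) pair
    counts = defaultdict(Counter)
    overall = Counter()
    for item, label in labeled:
        overall[label] += 1
        for feature_value in extract(item).items():
            counts[feature_value][label] += 1
    default = overall.most_common(1)[0][0]

    def classify(item):
        # Sum the label counts of all known feature values; unseen inputs get the default label
        votes = Counter()
        for feature_value in extract(item).items():
            votes.update(counts.get(feature_value, Counter()))
        return votes.most_common(1)[0][0] if votes else default

    return classify


def accuracy(classify, labeled):
    # Fraction of labeled examples the classifier gets right
    return sum(classify(item) == label for item, label in labeled) / len(labeled)


# Invented toy data, split into a training set and a development set
train_set = [("Anna", "female"), ("Maria", "female"), ("Julia", "female"), ("Emma", "female"),
             ("Peter", "male"), ("Oscar", "male"), ("Victor", "male"), ("Jakob", "male")]
dev_set = [("Sofia", "female"), ("Alexander", "male"), ("Clara", "female"), ("Oliver", "male")]

for extract in (last_letter_features, whole_name_features):
    clf = train(train_set, extract)
    print(extract.__name__, accuracy(clf, train_set), accuracy(clf, dev_set))
```

Running it prints `last_letter_features 1.0 1.0` and `whole_name_features 1.0 0.5`: both feature sets are perfect on the training data, but the memorizing one falls back to a default guess for every unseen name. A development set exposes that gap before you ever touch the final test set.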
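For the retweet question, one possible setup (a sketch under stated assumptions, not the only answer) is to label each tweet by a hypothetical `retweet_count` field, turn its text into "contains(word)" features as in the book's document-classification example, and then rank the features by how much more often they occur in "good" tweets than in "bad" ones. The tweets are invented, and the add-one smoothed ratio is a crude stand-in for training an NLTK classifier and inspecting `show_most_informative_features()`.

```python
from collections import Counter

THRESHOLD = 10  # more than 10 retweets counts as "good", everything else as "bad"

tweets = [  # invented example data; real tweets would come from the Twitter API
    {"text": "Free pizza at the library today", "retweet_count": 25},
    {"text": "Breaking: free concert tonight", "retweet_count": 40},
    {"text": "Free tickets for the concert", "retweet_count": 12},
    {"text": "Had a sandwich for lunch", "retweet_count": 1},
    {"text": "Tired today, going to bed", "retweet_count": 0},
    {"text": "Lunch was okay I guess", "retweet_count": 3},
]


def label(tweet):
    return "good" if tweet["retweet_count"] > THRESHOLD else "bad"


def document_features(text):
    # Bag-of-words features in the style of the book's contains(word) features
    return {"contains({})".format(word): True for word in text.lower().split()}


labeled = [(document_features(t["text"]), label(t)) for t in tweets]

# Count in how many good / bad tweets each feature appears
good, bad = Counter(), Counter()
for features, lab in labeled:
    (good if lab == "good" else bad).update(features.keys())

n_good = sum(1 for _, lab in labeled if lab == "good")
n_bad = len(labeled) - n_good


def good_ratio(feature):
    # Add-one smoothed ratio of P(feature | good) to P(feature | bad)
    return ((good[feature] + 1) / (n_good + 2)) / ((bad[feature] + 1) / (n_bad + 2))


# Highest ratio first; ties broken alphabetically so the order is deterministic
ranked = sorted(set(good) | set(bad), key=lambda f: (-good_ratio(f), f))
print(ranked[:3])
```

On this toy data `contains(free)` ranks first, but `contains(the)` also lands near the top, a reminder that a word correlating with popularity in a small sample is not necessarily what makes tweets popular.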
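The evaluation metrics from section 6.3 boil down to simple counting. The sketch below computes them by hand on invented gold/predicted labels so the definitions are visible; for real work NLTK already offers `nltk.classify.accuracy` and `nltk.ConfusionMatrix`. The helper names are hypothetical, and `k_fold_splits` uses interleaved folds (every k-th item), which assumes the data has been shuffled beforehand.

```python
from collections import Counter


def confusion_matrix(gold, predicted):
    # Count how often each (gold label, predicted label) pair occurs
    return Counter(zip(gold, predicted))


def accuracy(gold, predicted):
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)


def precision_recall(gold, predicted, positive):
    pairs = list(zip(gold, predicted))
    tp = sum(g == positive and p == positive for g, p in pairs)  # true positives
    fp = sum(g != positive and p == positive for g, p in pairs)  # false positives
    fn = sum(g == positive and p != positive for g, p in pairs)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def k_fold_splits(items, k):
    # Yield (training, test) pairs; every item ends up in exactly one test fold
    for i in range(k):
        test = items[i::k]
        training = [x for j, x in enumerate(items) if j % k != i]
        yield training, test


gold = ["good", "good", "bad", "bad", "bad", "good"]
predicted = ["good", "bad", "bad", "good", "good", "good"]

print(confusion_matrix(gold, predicted))
print(accuracy(gold, predicted))
print(precision_recall(gold, predicted, positive="good"))
for training, test in k_fold_splits(list(range(10)), k=5):
    print(test)
```

Treating "good" as the positive class, 2 of the 4 predicted "good" labels are correct (precision 0.5) and 2 of the 3 truly "good" items are found (recall about 0.67), while accuracy is 3/6 = 0.5.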
Twitter
- Today the goal is simply to build the most realistic bot you
can. Follow the steps in today's lecture slides, and document &
explain your choices for each step. The bot must fulfill the
following requirements:
- The code must run on your AWS machine.
- Your bot must have a realistic profile that scores well (i.e., as
human) on the BotOrNot site, http://truthy.indiana.edu/botornot/.
- The bot must have a "location", and each tweet must have a
geo-tag close to that location.
- You must implement the reciprocal-follow routine using cron. I
recommend a "sophisticated" schedule for this (not all at once, not
totally random).
- Your bot must retweet precisely one tweet per day using cron
(e.g., the most retweeted tweet you've seen in the past 24 hours).
Again, the timing of the retweet must follow a sophisticated,
circadian schedule.
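One way to satisfy the geo-tag requirement is to keep a fixed home location and add a small random offset to each tweet's coordinates. In this sketch the home coordinates, the 3 km radius, and the helper names are hypothetical. It uses a flat-Earth (equirectangular) approximation, which is adequate for offsets of a few kilometers, and a haversine distance to check the result. How the coordinates are attached to a tweet depends on your Twitter library, so check its documentation for the exact latitude/longitude parameters.

```python
import math
import random

HOME_LAT, HOME_LON = 55.676, 12.568  # hypothetical home location (roughly central Copenhagen)
EARTH_RADIUS_KM = 6371.0


def jittered_location(lat, lon, max_km, rng=random):
    # Random bearing, and a distance drawn so points are spread evenly over the disc
    distance = max_km * math.sqrt(rng.random())
    bearing = rng.uniform(0, 2 * math.pi)
    # Small-distance approximation: convert the north/east offsets in km into degrees
    dlat = distance * math.cos(bearing) / EARTH_RADIUS_KM
    dlon = distance * math.sin(bearing) / (EARTH_RADIUS_KM * math.cos(math.radians(lat)))
    return lat + math.degrees(dlat), lon + math.degrees(dlon)


def distance_km(lat1, lon1, lat2, lon2):
    # Haversine great-circle distance, used here to sanity-check the jitter
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


lat, lon = jittered_location(HOME_LAT, HOME_LON, max_km=3)
print(round(lat, 5), round(lon, 5), round(distance_km(HOME_LAT, HOME_LON, lat, lon), 2))
```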
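For the "sophisticated" cron schedules, one common pattern (an assumption here, not something the slides prescribe) is to let cron run a script frequently and have the script decide, based on a circadian activity profile, whether to act on this particular run. The hourly weights, the crontab line, and the helper names below are hypothetical. `should_act_now` scales the per-run probability so that, on average, the bot performs `expected_per_day` actions, spread over the day in proportion to the weights.

```python
import random
from datetime import datetime

# Hypothetical relative activity per hour of day (0-23): silent at night, peaking in the evening
HOURLY_WEIGHTS = [1, 0, 0, 0, 0, 1, 2, 4, 5, 5, 4, 4,
                  5, 4, 4, 4, 5, 6, 7, 8, 8, 7, 5, 3]


def pick_daily_time(weights=HOURLY_WEIGHTS, rng=random):
    # Sample an hour in proportion to its weight, then a uniform minute within that hour
    hour = rng.choices(range(24), weights=weights, k=1)[0]
    return hour, rng.randrange(60)


def should_act_now(now, expected_per_day, runs_per_hour=4, weights=HOURLY_WEIGHTS, rng=random):
    # Meant to be called from a frequent cron job, e.g. the hypothetical crontab line
    #   */15 * * * * python3 /home/ubuntu/bot/follow_back.py
    # Per-run probability chosen so the expected number of actions per day is expected_per_day
    share_of_day = weights[now.hour] / sum(weights)
    p = expected_per_day * share_of_day / runs_per_hour
    return rng.random() < min(p, 1.0)


print(pick_daily_time())
print(should_act_now(datetime.now(), expected_per_day=20))
```

For the once-per-day retweet, where "on average" is not good enough, `pick_daily_time` can instead be called once per day (for example by a cron job shortly after midnight) and the retweet made by a later cron run that sees the chosen time has passed and nothing has been retweeted yet today.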
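Selecting the day's retweet candidate can be as simple as filtering the tweets the bot has seen down to the last 24 hours and taking the one with the highest retweet count. The dictionary fields (`id`, `created_at` as an already-parsed `datetime`, `retweet_count`) are assumptions about how you store tweets, and all timestamps are assumed to be in the same time zone.

```python
from datetime import datetime, timedelta


def most_retweeted_last_24h(seen_tweets, now):
    # seen_tweets: hypothetical list of dicts with "id", "created_at" (datetime) and "retweet_count"
    cutoff = now - timedelta(hours=24)
    recent = [t for t in seen_tweets if t["created_at"] >= cutoff]
    if not recent:
        return None  # nothing suitable seen today
    return max(recent, key=lambda t: t["retweet_count"])


now = datetime(2014, 9, 23, 12, 0)
seen = [
    {"id": 1, "created_at": now - timedelta(hours=30), "retweet_count": 500},  # too old
    {"id": 2, "created_at": now - timedelta(hours=5), "retweet_count": 42},
    {"id": 3, "created_at": now - timedelta(hours=1), "retweet_count": 7},
]
best = most_retweeted_last_24h(seen, now)
print(best["id"])
```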