Formalia: Please read http://dtu.cnwiki.dk/02822/page/666/assignments-grading carefully before proceeding. That page contains information about formatting (including size restrictions), group sizes, and many other aspects of handing in the assignment. If you fail to follow these simple instructions, it will negatively impact your grade!
Due date and time: The assignment is due
on Monday March 18th at 23:59.
Exercise 1a. Make sure you have read Programming Collective Intelligence Chapter 6, pages 117-141 (excluding pages 127-131 and 138-139). Answer the following questions in your own words.
- Explain the concept of classification. Come up with an example
where one can use a classifier. What kind of features are
meaningful to use in this example?
- What are the features we use for document classification?
- What is naïve about the naïve Bayes classifier? Is the
assumption reasonable in the example you came up with above?
- Give an example of the use of Bayes' theorem (a sketch of the calculation follows this list). Explain how we calculate/set each term in the Category and Document setting using the training data and the category prior. Does the category prior have the same type of effect as the assumed probability (defined on page 122)?
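To make the Bayes'-theorem terms concrete, here is a minimal sketch of the calculation in the Category and Document setting. This is not the book's docclass module; the class and variable names are our own, and assumed_prob corresponds to the assumed probability defined on page 122.

    class NaiveBayes:
        def __init__(self, assumed_prob=0.5, weight=1.0):
            self.word_cat_count = {}  # (word, category) -> training count
            self.cat_count = {}       # category -> number of training documents
            self.assumed_prob = assumed_prob  # the "assumed probability" (p. 122)
            self.weight = weight              # how strongly it is weighted

        def train(self, words, category):
            self.cat_count[category] = self.cat_count.get(category, 0) + 1
            for w in set(words):
                key = (w, category)
                self.word_cat_count[key] = self.word_cat_count.get(key, 0) + 1

        def word_prob(self, word, category):
            # Pr(word | category), smoothed towards the assumed probability
            # so that rarely seen words do not get extreme estimates.
            basic = (self.word_cat_count.get((word, category), 0)
                     / self.cat_count[category])
            n = sum(self.word_cat_count.get((word, c), 0)
                    for c in self.cat_count)
            return (self.weight * self.assumed_prob + n * basic) / (self.weight + n)

        def classify(self, words):
            # Pr(category | document) is proportional to
            # Pr(category) * product of Pr(word | category) over the words.
            total = sum(self.cat_count.values())
            best, best_p = None, 0.0
            for cat in self.cat_count:
                p = self.cat_count[cat] / total  # the category prior
                for w in set(words):
                    p *= self.word_prob(w, cat)
                if p > best_p:
                    best, best_p = cat, p
            return best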
Exercise 1b. Make sure you have read Programming Collective Intelligence Chapter 7, pages 142-166 (excluding pages 151-153). Answer the following questions in your own words.
- The decision tree is a classifier just like the naïve Bayes classifier we studied last week. What is the main attraction of the decision tree according to the book?
- Describe the process of building the tree from training data. Describe how to choose which variable to split on. Describe Gini impurity and entropy (a small sketch of both follows this list). Describe the recursive tree building process.
- What is overfitting? How can pruning of the tree as described
in the book cure overfitting?
- What are missing values? How does the decision tree deal with
missing values? How would you deal with missing values in the naïve
Bayes classifier?
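For reference in the split-criterion question above, here is a small sketch of the two impurity measures. The function names are our own; the book's treepredict module defines equivalents.

    from collections import Counter
    from math import log2

    def gini_impurity(rows):
        # Probability of labelling a randomly drawn item incorrectly if
        # labels are assigned according to the class frequencies in rows.
        counts = Counter(row[-1] for row in rows)  # last column = class label
        total = len(rows)
        return 1.0 - sum((c / total) ** 2 for c in counts.values())

    def entropy(rows):
        # Expected number of bits needed to encode the class label.
        counts = Counter(row[-1] for row in rows)
        total = len(rows)
        return -sum((c / total) * log2(c / total) for c in counts.values())

    # A pure node scores 0 on both; a 50/50 split is maximally impure.
    mixed = [['a', 'yes'], ['b', 'no'], ['c', 'yes'], ['d', 'no']]
    print(gini_impurity(mixed), entropy(mixed))  # 0.5 1.0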
Exercise 1c. Make sure you have read Programming
Collective Intelligence Chapter 10 pages 226-249. Answer the
following questions in your own words.
- Explain, in your own words, the principle of how the two matrices in non-negative matrix factorization are fitted (no math needed; a sketch of the update rules follows this list). What difference is it that we minimize? What is the additional constraint that we use in non-negative matrix factorization? How many features should we use relative to the dimensions of the data matrix (more or fewer)? Hint: do we risk overfitting?
- Explain at a high level what we hope to get out of applying the algorithm to the news corpus (pages 227-229).
- Explain at a high level what we hope to get out of applying the algorithm to the stock data (pages 243-245).
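As orientation for the fitting question above, here is a compact sketch of multiplicative update rules of the kind the book uses, assuming the squared-difference cost of Chapter 10 (numpy-based; the variable names are our own):

    import numpy as np

    def factorize(V, k, iterations=100):
        n, m = V.shape
        W = np.random.rand(n, k)  # weights: each row of V as a mix of k features
        H = np.random.rand(k, m)  # features: k non-negative feature vectors
        for _ in range(iterations):
            # Multiplicative updates keep both factors non-negative while
            # decreasing the squared difference ||V - W H||^2.
            H *= (W.T @ V) / (W.T @ W @ H + 1e-9)
            W *= (V @ H.T) / (W @ H @ H.T + 1e-9)
        return W, H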
Exercise 2. Naïve Bayes for classification of Flickr
data.
- Choose the categories you want to classify. Use as categories
either a few geographical locations (for example Zealand and
Jutland) or type of landscape (nature and city, etc.). In the first
case you can use a bounding box to validate the category and in the
second case you need to manually open and inspect the photos.
- Use features based on tags (or descriptions). Discuss the features you have chosen (a possible setup is sketched after this list). Do you expect them to be informative? Why?
- Does the naïve Bayes classifier perform as expected? How many examples from each category do you need to get stable results?
- There exist many methods for extracting features from images
such as local color and shape features. We will not use them in
this exercise. If we had them, would they be useful for the
classification problem you have set up?
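One possible way to set this up, assuming the book's docclass.py (the Chapter 6 code) is importable and that you have already downloaded the tags for each photo. The tag strings and category names below are made-up placeholders.

    import docclass

    def tag_features(tag_string):
        # Each photo's space-separated tags become binary features.
        return {tag.lower(): 1 for tag in tag_string.split()}

    cl = docclass.naivebayes(tag_features)

    training = [
        ('beach forest lake', 'nature'),    # placeholder examples
        ('street cafe building', 'city'),
    ]
    for tags, category in training:
        cl.train(tags, category)

    print(cl.classify('harbour building tram', default='unknown'))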
Exercise 3. Decision tree for classification of Flickr
data.
- Choose the categories you want to classify. You can reuse the categories (and the way you validated them) from Exercise 2.
- Use the same features as in Exercise 2 to train the decision tree classifier (a possible setup is sketched after this list).
- Interpret the trees you get in the same way as in the
book.
- Validate the trained model on test data. Discuss some cases where the model worked and some cases where it didn't. Compare with the predictions you get using the naïve Bayes classifier. Are they generally in agreement?
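A possible setup, again assuming the book's treepredict.py (the Chapter 7 code) is importable. The tag vocabulary and example rows are placeholders; in practice the rows would come from your downloaded Flickr data.

    import treepredict

    vocabulary = ['beach', 'forest', 'street', 'building']  # hypothetical tags

    def features(tag_string):
        # Fixed-length 0/1 row: does the photo carry each vocabulary tag?
        tags = set(tag_string.split())
        return [1 if tag in tags else 0 for tag in vocabulary]

    rows = [  # feature row + category label, as treepredict expects
        features('beach forest lake') + ['nature'],
        features('street cafe building') + ['city'],
        features('forest lake') + ['nature'],
        features('street building tram') + ['city'],
    ]

    tree = treepredict.buildtree(rows)
    treepredict.prune(tree, 0.1)   # optional pruning, as described in the book
    treepredict.printtree(tree)    # interpret the tree as in the book
    print(treepredict.classify(features('beach forest'), tree))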
Exercise 4. Finding independent features in Flickr
data.
- Use features based on tags (or descriptions, which will need to be downloaded using flickr.photos_search and the "extras" option). Represent the data in a matrix and apply the non-negative matrix factorization algorithm (a possible pipeline is sketched after this list).
- Interpret the features that you get. Do they make sense?
- Can we use the topics we get for unsupervised classification? That is, inspect whether the documents that have a high weight in a certain topic all belong to the same category (this could be a geographical location or something else). Can you see similarities with the decision trees in some of the feature vectors?
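A possible pipeline, assuming the Python flickrapi package and the book's nmf.py (the Chapter 10 code). The API key/secret and search tag are placeholders, and the photos_search call follows the flickrapi naming used in the exercise text; check your flickrapi version, as newer releases use flickr.photos.search instead.

    import flickrapi
    import numpy as np
    import nmf  # the book's Chapter 10 module

    flickr = flickrapi.FlickrAPI('YOUR_API_KEY', 'YOUR_API_SECRET')
    rsp = flickr.photos_search(tags='denmark', extras='tags', per_page='100')
    photos = rsp.find('photos').findall('photo')

    # One tag list per photo; the vocabulary is the union of all tags seen.
    tag_lists = [photo.get('tags').split() for photo in photos]
    vocabulary = sorted({t for tags in tag_lists for t in tags})

    # Data matrix: one row per photo, one 0/1 column per tag.
    V = [[1 if tag in tags else 0 for tag in vocabulary]
         for tags in tag_lists]

    # weights: photos x topics; features: topics x tags.
    weights, features = nmf.factorize(np.matrix(V, dtype=float), pc=10, iter=50)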
Exercise 5. Summing up what machine learning can be
used for.
- Very briefly, one can say that machine learning is about learning from data. With the applications we have worked with in these exercises in mind, discuss the advantages of learning from data. Given that you want to solve these tasks, can you come up with alternative non-machine-learning approaches?
- Which of the three methods we have learned about are predictive (that is, used for classification), and which aim at learning something about the features in the data? [Hint: a method may achieve both.]
- What are the key differences between non-negative matrix factorization (an unsupervised approach) and the classifiers we have studied previously (supervised), according to the book (as discussed at the beginning of Chapter 10)?