Social Data Modeling

Lecture 1
Page last edited by Vedran Sekara (vese) 10/02-2013

Learning objectives.

  • Get Python installed on your system.
  • Feel comfortable writing small scripts in Python.
  • Start playing with APIs

Before you get started: don't forget to fill out the questionnaire.

http://bit.ly/02822

Program

Exercise 1.  First we have to get your system in shape for analyzing text.

  • Get Python up and running on your system. Important: we use Python 2.7, so don't install Python 3.x - it will only cause you pain and suffering. Also, on Windows make sure you install 32-bit Python, even if you're on a 64-bit system (the NumPy installers only work with 32-bit Python). See below for tips on all of Exercise 1.
  • Make sure you have NumPy, SciPy, and matplotlib up and running.
  • Add the NLTK module to your Python distribution.
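To check that the installation worked, a small script along these lines may help. It is a sketch, not an official course tool: it prints the Python version, whether the interpreter is 32- or 64-bit (relevant for the Windows advice above), and the version of each package it can import. The names REQUIRED and check_modules are just illustrative choices.

```python
from __future__ import print_function

import importlib
import struct
import sys

# Packages the course relies on
REQUIRED = ["numpy", "scipy", "matplotlib", "nltk"]


def check_modules(names):
    """Map each module name to its version string, or to None if it fails to import."""
    status = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except Exception:  # ImportError, or e.g. a broken binary install
            status[name] = None
        else:
            # Not every module defines __version__
            status[name] = getattr(module, "__version__", "unknown version")
    return status


if __name__ == "__main__":
    print("Python", sys.version.split()[0])
    # Size of a pointer in bits: 32 for a 32-bit interpreter, 64 for a 64-bit one
    print("Interpreter:", struct.calcsize("P") * 8, "bit")
    for name, version in sorted(check_modules(REQUIRED).items()):
        print(name, "->", version if version else "MISSING")
```

Anything reported as MISSING still needs to be installed (see the setup tips further down).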

 

Exercise 2. You already know how to program (it's a course requirement), but how's your Python doing? Going through the following exercises will help you feel more comfortable in a new environment. Write a script that does the following:

  • Create a list a that contains the numbers from 1 to 990, incremented by one, using the range function.
  • Show that you understand slicing in Python by extracting a list b with the numbers from 42 to 79 from the list created above.
  • Using def, define a function that takes as input a number x and returns x multiplied by (x plus eight), that is f(x) = x(x+8). Apply this function to every element of the list b using a for loop.
  • Write the output of your function to a text file with one number per line.
  • Show that you know about strings by typing in and understanding everything in the example at http://learnpythonthehardway.org/book/ex6.html (if you feel this is too complex, try completing exercises 0-5 first).
  • Learn about JSON by reading the Wikipedia page. Why is JSON superior to XML? (... or why not?)
  • Use the json module (http://docs.python.org/library/json.html). First use urllib2 (http://docs.python.org/howto/urllib2.html) to download this file, then load the JSON as a Python object and use pprint to make it look good when written to the terminal.
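A minimal sketch of the list, slicing, function, and file-writing steps, written so that it runs under Python 2.7 as well as Python 3 (hence the list() around range). The filename output.txt is an arbitrary choice.

```python
def f(x):
    """Return x multiplied by (x + 8), i.e. f(x) = x(x+8)."""
    return x * (x + 8)


# Numbers 1..990: range stops *before* its second argument, hence 991.
# list() makes this a real list in Python 3 too (in 2.7 range already returns one).
a = list(range(1, 991))

# a[0] is 1, so the number n sits at index n - 1; the slice end is excluded,
# so a[41:79] holds the numbers 42..79.
b = a[41:79]

# Apply f to every element of b with an explicit for loop.
results = []
for x in b:
    results.append(f(x))

# One number per line.
with open("output.txt", "w") as handle:
    for value in results:
        handle.write("%d\n" % value)
```

The first line of output.txt should be 2100 (42 times 50) and the last 6873 (79 times 87).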
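For the JSON step, one possible shape of the download-and-print script is below. It is a sketch: it assumes the file is served as UTF-8, and DATA_URL is a placeholder for the address of the file linked in the exercise, so nothing is downloaded until you fill it in. On Python 2.7 urlopen comes from urllib2; the fallback import covers Python 3, where it moved to urllib.request.

```python
from __future__ import print_function

import json
from pprint import pprint

try:  # Python 2.7, as used in the course
    from urllib2 import urlopen
except ImportError:  # Python 3 moved urlopen to urllib.request
    from urllib.request import urlopen

DATA_URL = None  # placeholder: set to the address of the file from the exercise


def parse_json_bytes(raw):
    """Decode a downloaded body (assumed UTF-8) and parse it as JSON."""
    return json.loads(raw.decode("utf-8"))


def fetch_json(url):
    """Download url with urlopen and return the parsed JSON object."""
    response = urlopen(url)
    try:
        return parse_json_bytes(response.read())
    finally:
        response.close()


if __name__ == "__main__":
    if DATA_URL:
        # pprint indents nested lists and dicts so the structure is readable
        pprint(fetch_json(DATA_URL))
```

Keeping the parsing in its own function makes it easy to try out on a short string before touching the network.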

Note: Experience shows that some of you will find Python difficult. This year, we're taking this into account and setting up extra help sessions for anyone who wants to participate. We will offer help with the concrete exercises and work through the book/course http://learnpythonthehardway.org/book/ with you guys to give you a better chance at following all the cool stuff we'll be doing in Python.

 

Exercise 3. Use APIs to control the internet

  • What is a Web API? Describe it in your own words (you may want to ask Wikipedia or Google if you don't already know what an API is).
  • What does it mean that a web service is "RESTful"? Describe it in your own words.
  • Write a Python script that generates the URL to draw a simple pie chart with the Google Chart API. Make the pie with 5 slices occupying the following fractions: [4%, 7%, 14%, 25%, 50%].
  • Use Google's Static Maps API to draw a straight line from your house to DTU building 101.
  • Next week we will be working with data from a photo sharing service (and social network) called Flickr. In order to mine the contents of that dataset, you'll need to get an API Key. Request one here (go for the non-commercial one).
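For the pie chart, a minimal sketch of the URL builder follows. It assumes the Image Charts endpoint https://chart.googleapis.com/chart with cht=p (pie), chs (width x height in pixels), chd=t:... (data in text encoding, where values are read on a 0-100 scale, so the percentages can be used as-is) and the optional chl (slice labels). Google has since deprecated this API, so the URL may no longer render an image; the point is how the query string is assembled.

```python
from __future__ import print_function

try:  # Python 2.7
    from urllib import urlencode
except ImportError:  # Python 3
    from urllib.parse import urlencode

CHART_BASE = "https://chart.googleapis.com/chart"


def pie_chart_url(percentages, size="400x200", labels=None):
    """Build an Image Charts URL for a 2D pie chart from a list of percentages."""
    params = [
        ("cht", "p"),  # chart type: pie
        ("chs", size),  # width x height in pixels
        ("chd", "t:" + ",".join(str(p) for p in percentages)),  # text-encoded data
    ]
    if labels:
        params.append(("chl", "|".join(labels)))  # one label per slice
    return CHART_BASE + "?" + urlencode(params)


if __name__ == "__main__":
    print(pie_chart_url([4, 7, 14, 25, 50]))
```

urlencode percent-escapes the colon and commas in chd (as %3A and %2C); the server decodes them again, so the chart is unaffected.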
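The Static Maps task follows the same pattern. This sketch assumes the endpoint https://maps.googleapis.com/maps/api/staticmap with a size parameter and a path parameter made of style settings followed by the points, all separated by "|". The coordinates in the example are placeholders only: look up your own address and DTU building 101. Newer versions of the service require an API key, which the sketch accepts as an optional key argument.

```python
from __future__ import print_function

try:  # Python 2.7
    from urllib import urlencode
except ImportError:  # Python 3
    from urllib.parse import urlencode

STATIC_MAP_BASE = "https://maps.googleapis.com/maps/api/staticmap"


def line_map_url(start, end, size="640x400", key=None):
    """Build a Static Maps URL with a straight path between two (lat, lng) points."""
    points = ["%f,%f" % start, "%f,%f" % end]
    params = [
        ("size", size),
        # Path style first (red, 5 px wide), then the points, all joined by "|"
        ("path", "|".join(["color:0xff0000ff", "weight:5"] + points)),
    ]
    if key:
        params.append(("key", key))
    return STATIC_MAP_BASE + "?" + urlencode(params)


if __name__ == "__main__":
    # Placeholder (latitude, longitude) pairs: replace both with real coordinates.
    home = (55.6761, 12.5683)
    dtu_building_101 = (55.7860, 12.5220)
    print(line_map_url(home, dtu_building_101))
```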

Exercise 4. Play around with the Flickr API and figure out how to download the metadata for a picture.

  • First, in a browser: what is the API address that gives you the metadata of a picture?
  • [Hint 1: Take a look at some example Flickr API queries to get a feel for how the API works and what the requests look like.]
  • [Hint 2: Each photo has a unique id that you can query through the API, e.g. http://api.flickr.com/services/rest/?format=json&photo_id=[INSERT PHOTO ID HERE]&method=flickr.photos.getInfo&api_key=[INSERT API KEY HERE] ]
  • Use the API JSON output to find the URL for the picture with id=3273728483. What does it depict?
  • Try to download a picture based on the information in the metadata you received from the API calls. Use urllib to retrieve the picture.
  • [Hint 3: Each photo has a server, farm, id, and secret; use these to construct a URL: http://farm[ farm number ].static.flickr.com/[ server number ]/[ picture id ]_[ secret id ].jpg]
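For the metadata part, a sketch like the following builds the getInfo request from Hint 2 and parses the reply. Two details are assumptions worth checking against the Flickr API documentation: the nojsoncallback=1 parameter, which asks Flickr for plain JSON instead of JSON wrapped in a jsonFlickrApi(...) call, and the https scheme, since Flickr has since moved its API to HTTPS (the hint uses http). API_KEY is a placeholder for the key you requested in Exercise 3.

```python
from __future__ import print_function

import json
from pprint import pprint

try:  # Python 2.7
    from urllib import urlencode
    from urllib2 import urlopen
except ImportError:  # Python 3
    from urllib.parse import urlencode
    from urllib.request import urlopen

# Same endpoint as Hint 2, but over HTTPS (assumption: plain http may be refused today)
REST_ENDPOINT = "https://api.flickr.com/services/rest/"
API_KEY = None  # placeholder: put your own Flickr API key here (as a string)


def get_info_url(photo_id, api_key):
    """Build the flickr.photos.getInfo request URL for one photo."""
    params = [
        ("method", "flickr.photos.getInfo"),
        ("format", "json"),
        ("nojsoncallback", "1"),  # ask for bare JSON, without the jsonFlickrApi(...) wrapper
        ("photo_id", photo_id),
        ("api_key", api_key),
    ]
    return REST_ENDPOINT + "?" + urlencode(params)


def parse_flickr_json(text):
    """Parse a Flickr JSON reply, stripping a jsonFlickrApi(...) wrapper if present."""
    text = text.strip()
    prefix = "jsonFlickrApi("
    if text.startswith(prefix) and text.endswith(")"):
        text = text[len(prefix):-1]
    return json.loads(text)


def fetch_photo_info(photo_id, api_key):
    """Download and parse the metadata for one photo."""
    response = urlopen(get_info_url(photo_id, api_key))
    try:
        return parse_flickr_json(response.read().decode("utf-8"))
    finally:
        response.close()


if __name__ == "__main__":
    if API_KEY:
        pprint(fetch_photo_info("3273728483", API_KEY))
```

Printing the reply with pprint makes it easier to find the fields you need; the farm, server, id, and secret for Hint 3 should sit under the "photo" key.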
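Once you have the metadata, the URL pattern from Hint 3 can be filled in mechanically. The sketch below assumes photo is the dictionary found under the "photo" key of the getInfo reply, and uses urlretrieve (in urllib on Python 2.7, in urllib.request on Python 3) to save the image. The function names and the default filename are illustrative choices.

```python
try:  # Python 2.7
    from urllib import urlretrieve
except ImportError:  # Python 3
    from urllib.request import urlretrieve


def photo_source_url(photo):
    """Fill in the Hint 3 pattern from a getInfo 'photo' dictionary."""
    return "http://farm%s.static.flickr.com/%s/%s_%s.jpg" % (
        photo["farm"], photo["server"], photo["id"], photo["secret"])


def download_photo(photo, filename=None):
    """Save the picture to disk (default name: <photo id>.jpg) and return the filename."""
    filename = filename or "%s.jpg" % photo["id"]
    urlretrieve(photo_source_url(photo), filename)
    return filename
```

A typical call would be download_photo(info["photo"]), where info is the parsed getInfo reply.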

 

Reading:

There is no specific reading for today - you'll find the information needed to complete the exercises by searching online. The assignments also have useful links embedded (I recommend using them!).

 

Setting up Python (tips & tricks)

A. Adding Python to your system 

  • If you're on Linux, you're in luck. Just use apt-get or a similar package manager to install what you need.
  • If you're on Windows or Mac, a good option is Enthought's free distribution. It includes NumPy, SciPy, and matplotlib.
  • [advanced option] On Mac, you can also consider the SciPy Superpack, MacPorts, or Homebrew.
  • [advanced option] On Windows, you can also install standard Python and then add the required packages via easy_install and pip.

B. Installing NLTK

Once your Python stack is up and running, you'll need to add Natural Language Toolkit (NLTK) to your system.

Links

Some example APIs

Visualization

Fun links (pointing forward)

Support: +45 45 25 74 43