Page last edited by Vedran Sekara (vese) 12/02-2013
Learning Objectives (Specific)
- Create (simple) visualizations using Python's matplotlib.
- Download real-world data for visualization.
- Work with (messy) real-world data to create
visualizations.
- Extend your work with APIs to acquire Flickr data for later use.
Also try out a convenient Python module for communicating with the
API.
- Create (advanced) geo-visualizations using Python's matplotlib
& basemaps.
Lecture
I'm traveling this week, so today's lecture is hosted by Vedran,
who's a grad student in my lab. We have a number of interesting
elements on the program.
- Edward Tufte's rules for visualization (by guest
lecturer Lasse Mølgaard, DTU Compute, Cognitive Systems Section).
Reading for this part is:
http://neuroscience.telenczuk.pl/wp-content/uploads/2010/10/vis_handout.pdf. (Note
that this pdf also has useful notes on matplotlib).
- Journalism in the age of data. You'll be watching the
first part of a great documentary on the role of visualization in
modern media (http://datajournalism.stanford.edu).
I recommend you check out the remaining parts - lots of info on
good tools, etc.
- Work on exercises. This one goes without saying. The
exercises are the center of the class, and as always, if you work
your way through the exercises, you'll be well on your way to
mastering the curriculum. Today, you'll begin to visualize some
data on your own ... and continue the work on the APIs from last
time; this will sharpen your ability to interact with Web APIs to
find cool data sets for visualization.
Group work
Exercise 1. Play with Python's matplotlib (http://matplotlib.sourceforge.net/)
and real data.
- Go to the Guardian Data Store, http://www.guardian.co.uk/news/datablog+society/alcohol and
find the data on alcohol consumption across the globe: The per
capita recorded alcohol consumption (litres of pure alcohol) among
adults (older than 15 years). The data accompanies the article
"Boozers of the world" from 9 March 2009. Download the data to
your computer, clean it, and read it into Python. This part is
painful & more difficult than one might think. You'll have to
decide how to do this: will you get the data ready for Python via
Excel, via Google Docs, or by importing it with a Python tool? And how do you
handle missing values, etc.? This will be a difficult step for most
of you, but that's part of the learning experience - real data is
usually messy and requires cleanup.
- First, find the top 5 drinking countries using Python (I
recommend sorting the list of countries in descending order, check
out http://wiki.python.org/moin/HowTo/Sorting).
- What number on the list is Denmark?
- Generate a simple line-plot of the countries'
consumption in descending order using the
matplotlib.pyplot (henceforth abbreviated as "plt")
command plot.
- Create a barchart of the same data, using the plt
command bar. What is the problem with this
visualization?
- Revise your barchart so as to emphasize the difference between
the countries that drink the most and the countries that drink the
least. To achieve this goal, construct a barchart of the top
15 and bottom 15 alcohol-consuming countries, with country
names on the x-axis. Something like this, but not
necessarily as fancy! Note: assignment continues below image.
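To get you unstuck, the cleaning, ranking, and top/bottom selection steps above could be sketched roughly as follows. This is one possible approach, not the intended solution. It is written for Python 3 and assumes that you exported the spreadsheet to CSV, that the columns are called "Country" and "Litres" (hypothetical names), and that missing values show up as empty cells or markers such as "..". The sample rows are invented placeholders, not the Guardian figures.

```python
import csv
import io

# Invented placeholder rows (NOT the real Guardian figures); the column
# names "Country" and "Litres" are assumptions about your exported CSV.
SAMPLE_CSV = """Country,Litres
Aland,12.5
Borduria,
Cascara,3.1
Denmark,9.0
Elbonia,..
Freedonia,"1,5"
Genovia,0.4
"""


def read_consumption(handle):
    """Return (country, litres) pairs, skipping rows without a usable number."""
    rows = []
    for record in csv.DictReader(handle):
        # Assumption: some exports use a decimal comma, so normalise it.
        raw = (record.get("Litres") or "").strip().replace(",", ".")
        try:
            litres = float(raw)
        except ValueError:
            continue  # empty cell, "..", "n/a", etc.: treat as missing
        rows.append((record["Country"].strip(), litres))
    return rows


def rank_descending(rows):
    """Sort by consumption, heaviest drinkers first."""
    return sorted(rows, key=lambda pair: pair[1], reverse=True)


def position_of(ranked, country):
    """1-based position of a country in the ranked list."""
    names = [name for name, _ in ranked]
    return names.index(country) + 1


def top_and_bottom(ranked, n):
    """First n and last n entries, for the revised barchart."""
    return ranked[:n] + ranked[-n:]


def plot_bars(selection):
    """Barchart with country names on the x-axis (needs matplotlib)."""
    import matplotlib.pyplot as plt  # imported here so the rest runs without it
    names = [name for name, _ in selection]
    values = [value for _, value in selection]
    positions = range(len(selection))
    plt.bar(positions, values)
    plt.xticks(positions, names, rotation=90)
    plt.ylabel("Litres of pure alcohol per capita")
    plt.tight_layout()
    plt.show()


ranked = rank_descending(read_consumption(io.StringIO(SAMPLE_CSV)))
top5 = ranked[:5]
denmark_rank = position_of(ranked, "Denmark")
selection = top_and_bottom(ranked, 2)  # use n=15 on the real data
```

With the real file you would pass an open file handle instead of the io.StringIO wrapper and then call plot_bars(top_and_bottom(ranked, 15)). Dropping rows with missing values is only one choice; whether that is appropriate is part of what you should decide in the cleanup step.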

Exercise 2. Continue playing around with the
Flickr API
- Instead of constructing URLs for each query (as we did last
time), it is possible to greatly streamline the process of
accessing the Flickr API using the Python module flickrapi.
Download and install the package (for example, use
easy_install to get the package).
Usage:
import flickrapi
flickr = flickrapi.FlickrAPI(api_key, cache=True)
photos = flickr.photos_search(tags=['Monty', 'Python'], format='json')
- "photos_search" is only one of many available methods; others
can be found here. Just
remember to substitute the dot with an underscore when
calling the method via the flickrapi package!
- Download metadata for 2x500 pictures that are all taken in
Denmark. First get 500 tagged with words related to the coastline,
e.g. tags=['beach','water','coast','coastline','sea',...]. Next
get metadata for 500 images tagged with words related to "inland
nature",
e.g. tags=['forest','nature','grass','green','landscape',...]
- hint: Use page and bbox as
additional arguments when you search photos. bbox
specifies the geographic bounding box (find one for Denmark using
this page) and
page specifies the page number of the search results (limited to 250
pictures per page). Save the data to a text file or pickle
it.
- Download the GPS locations for all the pictures using the
method photos_geo_getLocation. Again,
don't forget to save the data.
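The paging logic from the hint could look roughly like the sketch below (written for Python 3). It is deliberately decoupled from the network: collect_photos takes any callable that behaves like flickr.photos_search, so the example runs offline with a stand-in called fake_search (a hypothetical helper, not part of flickrapi). Two things are assumptions you should check against your own setup: that the reply may arrive wrapped as jsonFlickrApi(...), and the bounding box numbers, which are a rough hand-picked approximation of Denmark.

```python
import json
import pickle

# Rough, hand-picked box for Denmark as "min_lon,min_lat,max_lon,max_lat".
# The numbers are an approximation for illustration; look up a proper box.
DENMARK_BBOX = "8.0,54.5,15.2,57.8"


def parse_flickr_json(raw):
    """Decode a Flickr JSON reply, tolerating a jsonFlickrApi(...) wrapper."""
    if isinstance(raw, bytes):
        raw = raw.decode("utf-8")
    raw = raw.strip()
    prefix = "jsonFlickrApi("
    if raw.startswith(prefix) and raw.endswith(")"):
        raw = raw[len(prefix):-1]
    return json.loads(raw)


def collect_photos(search_page, tags, wanted=500, per_page=250):
    """Request page after page until `wanted` photo records are gathered."""
    photos = []
    page = 1
    while len(photos) < wanted:
        reply = parse_flickr_json(
            search_page(tags=tags, bbox=DENMARK_BBOX, page=page,
                        per_page=per_page, format="json"))
        batch = reply["photos"]["photo"]
        if not batch:
            break  # no more results for this query
        photos.extend(batch)
        if page >= int(reply["photos"]["pages"]):
            break  # that was the last page
        page += 1
    return photos[:wanted]


def save_photos(photos, path):
    """Pickle the metadata so it can be reloaded in later exercises."""
    with open(path, "wb") as handle:
        pickle.dump(photos, handle)


def fake_search(page, per_page, **ignored):
    """Offline stand-in for flickr.photos_search: 3 pages of dummy ids."""
    start = (page - 1) * per_page
    items = [{"id": str(i)} for i in range(start, start + per_page)]
    body = {"photos": {"page": page, "pages": 3, "photo": items}, "stat": "ok"}
    return "jsonFlickrApi(%s)" % json.dumps(body)


coast = collect_photos(fake_search, tags=["beach", "coast", "sea"])
```

For the real download you would pass flickr.photos_search in place of fake_search and then call save_photos(coast, "coast.pickle") (the file name is just an example). The same function works for the inland-nature tags.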
Exercise 3. Play with basemaps (http://matplotlib.org/basemap/)
to plot geographical Flickr data from last week.
- Use basemaps to create a map of Denmark (examples: http://matplotlib.org/basemap/users/examples.html)
- Extract latitude and longitude coordinates from all the
downloaded Flickr pictures (hint: the returned
JSON structure is a dictionary, you can extract the coordinates by
using ['photo']['location']['latitude'] and
...]['longitude'] as keys).
- Plot the location of each picture onto the map you've just
created. Start with the coastline locations. What does the
distribution of points resemble?
- Now (using a new symbol for the datapoints), plot the
inland-nature points. Can you tell the difference?
- We will be working with data from Flickr throughout the course.
So for next week, download metadata for 10,000 pictures (use
tags=['Denmark','Danmark']). Since Flickr has a max cap of 3600 API
requests per hour, you need to build a timer into your Python
script: see http://docs.python.org/2/library/time.html
and read about the function called sleep.
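A cap of 3600 requests per hour works out to one request per second on average. As a rough budget: 10,000 pictures at 250 per page is only 40 search requests, but if you also fetch the location with one request per picture, that is 10,000 more, or about 2.8 hours at one request per second. A minimal sketch of such a timer is shown below. The Throttle class and throttled_calls helper are hypothetical names for illustration, and the fake clock exists only so the example runs instantly.

```python
import time

MAX_REQUESTS_PER_HOUR = 3600
MIN_INTERVAL = 3600.0 / MAX_REQUESTS_PER_HOUR  # seconds between calls


class Throttle(object):
    """Wait long enough between calls to stay under a requests-per-hour cap."""

    def __init__(self, min_interval=MIN_INTERVAL,
                 clock=time.time, sleep=time.sleep):
        self.min_interval = min_interval
        self.clock = clock
        self.sleep = sleep
        self.last_call = None

    def wait(self):
        """Block until min_interval has passed since the previous call."""
        now = self.clock()
        if self.last_call is not None:
            remaining = self.min_interval - (now - self.last_call)
            if remaining > 0:
                self.sleep(remaining)
                now = self.clock()
        self.last_call = now


def throttled_calls(func, arguments, throttle):
    """Call func once per argument, pausing between calls as needed."""
    results = []
    for argument in arguments:
        throttle.wait()
        results.append(func(argument))
    return results


# Offline demonstration with a fake clock, so nothing actually sleeps.
fake_now = [0.0]
slept = []


def fake_sleep(seconds):
    slept.append(seconds)
    fake_now[0] += seconds


demo = Throttle(clock=lambda: fake_now[0], sleep=fake_sleep)
squares = throttled_calls(lambda x: x * x, [1, 2, 3], demo)
```

In your own script you would create Throttle() with the defaults, which use time.time and time.sleep, and call wait() right before each API request.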
For inspiration, the figure below shows an example of a map with
~18000 Flickr locations plotted.
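If you are unsure how to get from the downloaded location data to points on a map, the steps in Exercise 3 could be sketched roughly like this. The extraction part follows the keys given in the hint; the sample replies are invented, and the map corners in plot_on_denmark are a rough guess at Denmark's extent rather than an official bounding box. Plotting requires matplotlib and the basemap toolkit, so those imports sit inside the function and the extraction can be tried without them.

```python
def extract_coordinates(replies):
    """Return parallel lists (lats, lons) from getLocation-style replies."""
    lats, lons = [], []
    for reply in replies:
        try:
            location = reply["photo"]["location"]
            lat = float(location["latitude"])
            lon = float(location["longitude"])
        except (KeyError, TypeError, ValueError):
            continue  # photo without usable GPS data
        lats.append(lat)
        lons.append(lon)
    return lats, lons


def plot_on_denmark(coast, inland):
    """Draw both point sets on a Mercator map (needs matplotlib + basemap)."""
    import matplotlib.pyplot as plt
    from mpl_toolkits.basemap import Basemap
    # Map corners are a rough guess at Denmark's extent (an assumption).
    m = Basemap(projection="merc", resolution="i",
                llcrnrlon=7.5, llcrnrlat=54.4,
                urcrnrlon=15.5, urcrnrlat=58.0)
    m.drawcoastlines()
    m.drawcountries()
    # Different colour and symbol per category: blue dots, green triangles.
    for (lats, lons), style in ((coast, "bo"), (inland, "g^")):
        x, y = m(lons, lats)  # project lon/lat to map coordinates
        m.plot(x, y, style, markersize=3)
    plt.show()


# Invented sample replies for illustration; real ones come from the API.
sample_replies = [
    {"photo": {"id": "1",
               "location": {"latitude": "55.68", "longitude": "12.57"}}},
    {"stat": "fail"},  # e.g. a photo with no location information
    {"photo": {"id": "3",
               "location": {"latitude": 57.05, "longitude": 9.92}}},
]
lats, lons = extract_coordinates(sample_replies)
```

Calling plot_on_denmark(extract_coordinates(coast_replies), extract_coordinates(inland_replies)) would then draw both categories on one map, where coast_replies and inland_replies stand for the lists of parsed replies you saved earlier.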
