Wiki
02806 Course Wiki

Assignment 3
Page last edited by Sune Lehmann Jørgensen (sljo) 01/04-2014

Formalia: Please read   http://dtu.cnwiki.dk/02806/page/1159/assignments-project carefully before proceeding. This page contains information about formatting (including restrictions on group size, etc), and many other aspects of handing in the assignment.   If you fail to follow these simple instructions, it will negatively impact your grade! 

 

Due date: The due date is April 7th 23:59. Hand in via CampusNet upload of relevant .ipynb files, etc.

 

1) A couple of quick questions to test your network knowledge:

  • What are network components, and how does the discovery of America show that these are important?
  • What is a small-world network?
  • What is triadic closure?
  • What do sociologist mean when they talk about the strength of weak ties?

2) Human paths vs shortest paths. The file paths_finished.tsv contains 51318 completed paths. You can read about the format in the file header. In this exercise, we compare shortest paths between start and end nodes of human searches as given  by networkx, vs the actual path lengh given by searching humans. Note that humans sometimes hit the "back" button while searching the wikispeedia graph. In order to correct the path with the back button, you may use code similar to:

 

        # path contains the sequence of pages
        currentpath = [path[0]]
        for j in range(1, len(path)):
            if path[j] == '<':
                currentpath.pop()
            else:
                currentpath.append(path[j])
                u, v = currentpath[-2], currentpath[-1]
                # add u, v to the graph

 

Answer the following questions.

  • What is the average length of human and computer paths respectively? Plot a histogram of both.
  • Plot the two quantities against each other. Is there a correlation between the two? Can you think of a better way of visualizing the correlation?
  • Interpret your results.

3) Growing networks. We can think of the full network as a substrate where the human searching unfolds. We will now add human searches in batches of 5000 at a time to study the search dynamics on the network. Keep track of how many times each node was visited and how many times each link was traversed. For each of your 10 timesteps, answer the following questions:

  • How many nodes were touched by a path? 
  • How many links are in the network?
  • Which are the 5 most visited nodes? (highest node weight)
  • Which are the 5 highest in/out degree nodes? (most neighbors)
  • Which are the 5 most frequently traversed links? The network is directed - look at the "other direction" for each of these five links. Is it as likely to go the other way?
  • Define betweenness centrality (wiki link) in your own words. Which 5 nodes have the highest betweenness centrality   (networkx link). Compare betweenness centrality to node weight (you choose how to do this) - is this related to exercise 2? 

4) Interpret your results from the previous exercise to gain an understanding of the growing network. What happens as we add paths? Are we converging to a steady state or does the network keep changing?

  • Plot the total number of nodes in the graph for each of the 10 time-steps. (y-axis: number of nodes, x-axis: time-step index from 1 to10) and plot the total number of links in the graph for each of the time-windows. Based on these two plots, does it seem like we would get something out of adding another 10000 searches?
  • What is the correlation of node degree in the discovered network with degree in the full network for each of the 10 time steps. Does this change over time?
  • Evaluate the stability of the network measures (node weight, in/out-degree, link-weight) based on your top 5 lists. Are these rankings stable?
  • [Optional & Advanced] How do your discoveries relate to the findings in  this paper?

5) Visualize an aspect of the dynamics using Gephi. (Include the relevant visualizations and explanations in the iPython notebook).

Support: +45 45 25 74 43