Page last edited by Sune Lehmann Jørgensen (sljo) 01/04-2014
Formalia: Please read http://dtu.cnwiki.dk/02806/page/1159/assignments-project carefully before proceeding. This
page contains information about formatting (including restrictions
on group size, etc), and many other aspects of handing in the
assignment. If you
fail to follow these simple instructions, it will negatively impact
your grade!
Due date: The due date is April 7th 23:59.
Hand in via CampusNet upload of relevant .ipynb files, etc.
1) A couple of quick questions to test
your network knowledge:
- What are network components, and how
does the discovery of America show that these are important?
- What is a small-world network?
- What is triadic closure?
- What do sociologist mean when they talk
about the strength of weak ties?
2) Human paths vs shortest
paths. The
file paths_finished.tsv contains 51318
completed paths. You can read about the format in the file header.
In this exercise, we compare shortest paths between start and end
nodes of human searches as given
by networkx, vs the actual path lengh given by
searching humans. Note that humans sometimes hit the "back" button
while searching the wikispeedia graph. In order to correct the path
with the back button, you may use code similar to:
# path contains the
sequence of pages
currentpath =
[path[0]]
for j in range(1,
len(path)):
if path[j] == '<':
currentpath.pop()
else:
currentpath.append(path[j])
u, v = currentpath[-2], currentpath[-1]
# add u, v to the graph
Answer the following questions.
- What is the average length of human and
computer paths respectively? Plot a histogram of both.
- Plot the two quantities against each
other. Is there a correlation
between the two? Can you think of a better way of visualizing the
correlation?
- Interpret your results.
3) Growing networks. We can think of the
full network as a substrate where the human searching unfolds. We
will now add human searches in batches of 5000 at a time to study
the search dynamics on the network. Keep track of how many times
each node was visited and how many times each link was traversed.
For each of your 10 timesteps, answer the following questions:
- How many nodes were touched by a
path?
- How many links are in the network?
- Which are the 5 most visited nodes?
(highest node weight)
- Which are the 5 highest in/out degree
nodes? (most neighbors)
- Which are the 5 most frequently
traversed links? The network is directed - look at the "other
direction" for each of these five links. Is it as likely to go the
other way?
- Define betweenness
centrality (wiki link) in your own
words. Which 5 nodes have the
highest betweenness centrality
(networkx link). Compare betweenness centrality to node
weight (you choose how to do this) - is this related to exercise
2?
4) Interpret your results from the previous
exercise to gain an understanding of the growing network. What
happens as we add paths? Are we converging to a steady state or
does the network keep changing?
- Plot the total number of nodes in the
graph for each of the 10 time-steps. (y-axis: number of nodes,
x-axis: time-step index from 1 to10) and plot the total number of links in the graph
for each of the time-windows. Based on these two plots, does it
seem like we would get something out of adding another 10000
searches?
- What is
the correlation of node degree in the discovered network with
degree in the full network for each of the 10 time steps. Does this
change over time?
- Evaluate
the stability of the network measures (node weight, in/out-degree,
link-weight) based on your top 5 lists. Are these rankings
stable?
- [Optional
& Advanced] How do your discoveries relate to the findings
in
this paper?
5)
Visualize an aspect of the dynamics using Gephi. (Include the
relevant visualizations and explanations in the iPython
notebook).
This page will be permanently deleted and cannot be recovered. Are you sure?
|