Social Data Modeling

Lecture 8
Page last edited by Sune Lehmann Jørgensen (sljo) 02/04-2013

Program

Today's lecture is about network analysis. More specifically, we'll focus on simple models for network structure. And because I love networks, we'll also spend a bit of time covering networks more generally.

 

Reading

Today's reading. 

 

Exercise 1. Reading and NetworkX

  • If you don't already know the module, work through the NetworkX tutorial, trying out the various commands on your own.
  • Answer the following questions in your own words (look the answers up online, e.g. on Wikipedia): What does clustering mean to a network scientist? What does it mean that a network is a "small world"? (Hint: remember to include clustering in your answer.) What is the network degree distribution? What is a power law? Name a few examples of networks with power-law degree distributions.
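
As an illustration of what clustering measures: the local clustering coefficient of a node is the fraction of pairs of its neighbours that are themselves linked, and the network's average clustering is the mean of that value over all nodes. The sketch below computes both by hand on a made-up four-node graph (a triangle with one pendant node), using the convention that nodes with fewer than two neighbours get 0. The graph and function names are illustrative; NetworkX's `nx.clustering` and `nx.average_clustering` should give the same numbers for this graph.

```python
from itertools import combinations

def local_clustering(adj, node):
    """Fraction of pairs of neighbours of `node` that are linked to each other."""
    nbrs = adj[node]
    k = len(nbrs)
    if k < 2:
        return 0.0  # convention: fewer than two neighbours means clustering 0
    links = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
    return links / (k * (k - 1) / 2)

def average_clustering(adj):
    """Mean local clustering over all nodes (zeros included)."""
    return sum(local_clustering(adj, n) for n in adj) / len(adj)

# Triangle 0-1-2 plus a pendant node 3 attached to node 2
edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

print(local_clustering(adj, 2))   # 1 of 3 neighbour pairs linked: about 0.333
print(average_clustering(adj))    # (1 + 1 + 1/3 + 0) / 4: about 0.583
```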

Exercise 2. The Barabási-Albert (BA) model in NetworkX

  • Use NetworkX to code up the BA model. Start with 5 nodes connected at random. Add one node at a time and have each new node connect to 3 existing nodes. Keep track of the age of each node (e.g. by naming the nodes by the time step in which they were introduced). Hint: The trickiest part of coding up this model is choosing a node with probability proportional to its degree. First, you will need to be able to choose things at random: use this module. The easiest way I can think of is to create a list in which each node occurs as many times as its degree, and simply pick a random element from that list (but maybe you can find a better way). [NetworkX has a built-in function for creating BA networks; you may not use it here. You must write the code on your own.]
  • Once you've generated a network of 300 nodes, use NetworkX and matplotlib to plot the network.
  • Now, create a new network of 5000 nodes and plot its degree distribution (use both log-log and linear scales). Hint: see pp. 26-28 of http://www.stanford.edu/class/cs224w/nx_tutorial.pdf for an example of plotting a network degree distribution.
  • Fit the data from the 5000-node network to find the slope of the straight line in the log-log plot. [Hint: linear regression in Python.] Then generate 200 networks and fit each one: what are the mean and the variance of the slopes across all 200 networks? Does that answer correspond to what you expected to find according to the theory?
  • Age. Calculate the average degree as a function of node age for your 200 networks. (What is the average degree of the oldest nodes, of the second-oldest nodes, of the third-oldest nodes, etc.?) Create a plot with node age on the x-axis and average degree on the y-axis. Is this the picture you would expect to see in real-world networks? Justify your answer.
  • Do you think the BA model is a good model for real-world networks? Explain the reasons for your answer.
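
The list-based preferential attachment described in the first hint can be sketched as follows. This is a minimal version under stated assumptions, not the official solution: it uses Python's standard `random` module, keeps the graph in a plain dict mapping each node to a set of neighbours (so it runs without NetworkX), and assumes "5 nodes connected at random" means the seed nodes are joined as a random tree, so every seed node starts with degree at least 1. The function name `ba_network` and its parameters are hypothetical.

```python
import random

def ba_network(n, m=3, n0=5, seed=None):
    """Grow a Barabasi-Albert network as a dict: node -> set of neighbours.

    Nodes are labelled by the time step at which they joined (smaller = older).
    """
    if n0 < m:
        raise ValueError("need at least m seed nodes to pick m distinct targets")
    rng = random.Random(seed)
    adj = {i: set() for i in range(n0)}
    # Assumption: seed nodes form a random tree, so each has degree >= 1
    for i in range(1, n0):
        j = rng.randrange(i)
        adj[i].add(j)
        adj[j].add(i)
    # Each node appears once per edge end it holds, so a uniform pick from
    # this list picks a node with probability proportional to its degree.
    stubs = [v for v in adj for _ in adj[v]]
    for new in range(n0, n):
        targets = set()
        while len(targets) < m:          # redraw until m distinct targets
            targets.add(rng.choice(stubs))
        adj[new] = set(targets)
        for t in targets:
            adj[t].add(new)
            stubs.extend((new, t))       # both ends of the new edge gain degree
    return adj

g = ba_network(300, seed=1)
print(len(g), sum(len(nb) for nb in g.values()) // 2)   # 300 nodes, 889 edges
```

The edge count is fixed by construction: 4 tree edges among the seed nodes plus 3 edges for each of the 295 added nodes. To hand the result to NetworkX, you can build `G = nx.Graph()` and call `G.add_edges_from((u, v) for u in g for v in g[u])`.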

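For the plotting steps, here is a hedged sketch using NetworkX and matplotlib. It assumes your network is stored as a dict of neighbour sets (adapt the first lines if you use a NetworkX graph directly), draws the network with NetworkX's default spring layout, and plots the degree counts from `nx.degree_histogram` on linear and log-log axes, dropping empty bins so the log axes are well defined. The function name and the output file name are illustrative, and the G(n, m) random graph at the bottom is only a placeholder input, not a BA network. For the 5000-node network, drawing the network itself is slow and you may prefer to plot only the degree distribution.

```python
import matplotlib
matplotlib.use("Agg")                  # write image files; no screen needed
import matplotlib.pyplot as plt
import networkx as nx

def plot_network_and_degrees(adj, path):
    """Draw a network and its degree distribution (linear and log-log) to `path`."""
    G = nx.Graph()
    G.add_nodes_from(adj)
    G.add_edges_from((u, v) for u in adj for v in adj[u])
    counts = nx.degree_histogram(G)    # counts[k] = number of nodes with degree k
    # Keep only k >= 1 with non-zero counts so log-log axes are well defined
    ks = [k for k, c in enumerate(counts) if k > 0 and c > 0]
    cs = [counts[k] for k in ks]

    fig, (ax0, ax1, ax2) = plt.subplots(1, 3, figsize=(15, 4.5))
    nx.draw(G, ax=ax0, node_size=10, width=0.3)
    ax0.set_title("Network")
    ax1.plot(ks, cs, "o")
    ax1.set(xlabel="degree k", ylabel="number of nodes", title="Linear scale")
    ax2.loglog(ks, cs, "o")
    ax2.set(xlabel="degree k", ylabel="number of nodes", title="Log-log scale")
    fig.tight_layout()
    fig.savefig(path)
    plt.close(fig)
    return ks, cs

# Placeholder input: a random G(n, m) graph standing in for your own BA network
demo = nx.gnm_random_graph(300, 900, seed=1)
plot_network_and_degrees({v: set(demo[v]) for v in demo}, "degrees.png")
```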
 
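The fitting hint can be read as an ordinary least-squares line through the points (log k, log count) of the degree histogram. The sketch below does exactly that with the standard library only, plus a helper that returns the mean and the sample variance (`statistics.variance`, n - 1 in the denominator) of a list of slopes, one per network. Both function names are hypothetical. Keep in mind that this naive log-log regression is precisely the kind of fit whose reliability the Goldstein et al. reading questions; sparse high-degree bins with a count of 1 can pull the slope noticeably.

```python
import math
import statistics
from collections import Counter

def loglog_slope(degrees):
    """Least-squares slope of log(number of nodes) against log(degree)."""
    hist = sorted(Counter(d for d in degrees if d > 0).items())
    xs = [math.log(k) for k, _ in hist]
    ys = [math.log(c) for _, c in hist]
    if len(xs) < 2:
        raise ValueError("need at least two distinct degrees to fit a line")
    x_bar, y_bar = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx

def slope_summary(slopes):
    """Mean and sample variance of the fitted slopes (one slope per network)."""
    return statistics.mean(slopes), statistics.variance(slopes)

# Toy degree sequence whose counts 16, 4, 1 at k = 1, 2, 4 lie exactly on k^-2
print(loglog_slope([1] * 16 + [2] * 4 + [4]))   # approximately -2.0
```

For the 200-network part, collect one `loglog_slope` value per network from your own generator and pass the list to `slope_summary`.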

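If nodes are named by the time step at which they joined, the age question reduces to averaging the degree of node t over all networks, for each t. A minimal sketch, assuming every network is a dict of neighbour sets labelled 0, 1, 2, ... (smaller label = older node); the function name and the two toy networks are illustrative. Note that in the BA setup the 5 seed nodes all exist from the start, so giving them labels 0 to 4 is only a convention.

```python
def average_degree_by_age(networks):
    """Entry t is the mean degree of the t-th node to join, over all networks.

    Assumes each network maps node -> set of neighbours, with nodes labelled
    by join time 0, 1, 2, ... (smaller label = older node).
    """
    n = min(len(g) for g in networks)
    return [sum(len(g[t]) for g in networks) / len(networks) for t in range(n)]

# Two tiny hand-made networks on nodes 0, 1, 2
g1 = {0: {1, 2}, 1: {0}, 2: {0}}   # degrees 2, 1, 1
g2 = {0: {1}, 1: {0, 2}, 2: {1}}   # degrees 1, 2, 1
print(average_degree_by_age([g1, g2]))   # [1.5, 1.5, 1.0]
```

To plot it with matplotlib, something like `plt.plot(range(len(avg)), avg)` puts join order on the x-axis (oldest first); use `n - 1 - t` instead of `t` if you prefer "time since joining" as the age.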
Advanced reading (will help you answer some of the questions above)

  • Adamic and Huberman (2000). Power law distribution of the World Wide Web. Science, 287:2115. Download here. Helps with the age question.
  • Goldstein, Morris, Yen (2004). Problems with fitting to the power-law distribution. Eur. Phys. J. B 41, 255–258. Download here. Helps with the question about fitting power laws.
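
The alternative to least squares on a log-log plot that is usually recommended in the power-law fitting literature is a maximum-likelihood estimate of the exponent. As a hedged sketch, assuming a continuous power law p(x) ∝ x^(-alpha) for x ≥ x_min, the estimator is alpha = 1 + n / Σ ln(x_i / x_min), summed over the n values with x_i ≥ x_min. Applied to integer degrees this is only an approximation (the discrete case needs a different formula), and the choice of x_min is left to you. The slope of a log-log degree histogram corresponds roughly to -alpha. The function name is hypothetical; the check draws synthetic data from a known power law by inverse-transform sampling.

```python
import math
import random

def powerlaw_alpha_mle(data, x_min):
    """Continuous MLE of alpha for p(x) ~ x^(-alpha), using only x >= x_min."""
    tail = [x for x in data if x >= x_min]
    # Raises ZeroDivisionError if every value equals x_min
    return 1.0 + len(tail) / sum(math.log(x / x_min) for x in tail)

# Synthetic check: invert the CDF F(x) = 1 - (x / x_min)^(-(alpha - 1))
rng = random.Random(42)
alpha_true, x_min = 2.5, 1.0
sample = [x_min * (1.0 - rng.random()) ** (-1.0 / (alpha_true - 1.0))
          for _ in range(100_000)]
print(powerlaw_alpha_mle(sample, x_min))   # should land close to 2.5
```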
