Welcome to Social Network Analysis

This is the first in a series of lectures and tutorials that I've prepared for COMM-645 at the USC Annenberg School of Communication.

These talks are meant to be introductory, self-standing lectures that tackle the basics of social network analysis. Hope you enjoy.

Welcome to COMM-645! Today we are going to demonstrate some of the capabilities of R for social network analysis. Don't worry about running this code yet; just follow along and watch how the script, the console window and the environment interact.

Base R runs from the command line with text commands. For this class we will use RStudio, a program that sits on top of R and makes it easier to use. Let's take a quick tour of how RStudio works so you can follow this demo.

My window has four panes. The source pane is where you write code; it is basically a text editor like Notepad. Code you write in the source pane does not run automatically, so you can tweak it at your own pace. Once you are ready to run your code, send a line to R by placing your cursor on it and clicking the Run button or pressing CTRL/Command-Enter.

Running code sends it from the source pane to the console. The console is where your code actually runs, and any results are displayed there. You can also type code straight into the console and run it with the Enter key. Because the console takes one command at a time, it is generally best to write most of your commands in the source pane and use the console only when you are mucking around or doing calculations you don't need to reproduce.

Next up is the environment/history window. The environment shows all of the variables, or objects, that you have created in R. Pretty much anything can be stored as an object, from a single digit or letter to a massive network with millions of people in it. Each object is assigned a name, which you can then reference later in your code. As an example, let's assign the number 2 to the name "two" and watch what happens.

two<-2

1+two

## [1] 3

rm(two)

As you can see, two appeared in the environment, and typing two (without quotation marks) into any piece of R code now substitutes the number 2. The rm command, short for remove, deletes the object from the environment, so rerunning 1+two afterwards produces an error.
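If you want to check this from code rather than by watching the environment pane, base R's exists() reports whether a name is defined, and tryCatch() lets you capture the "object not found" error instead of having it stop your script. A minimal sketch:

```r
two <- 2
exists("two")    # TRUE: "two" is now in the environment
rm(two)
exists("two")    # FALSE: rm() removed it

# Referring to a removed object is an error; tryCatch() captures the
# error message instead of stopping the script
result <- tryCatch(1 + two, error = function(e) conditionMessage(e))
result
```

In an English-language R session the captured message reads "object 'two' not found", the same text the console prints when you rerun the addition directly.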

Beside the environment there is a history tab, which will show you all of the commands that you have typed in your session.

Finally there is the utility window, which should have several tabs along the top, such as Files, Plots, Packages and Help. Files is a browser that lets you look through data on your computer and set the "working directory", the folder where R looks for data and other files.
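The demo below reads lemis.graphml by file name alone, so that file has to be in the working directory. You can check the working directory from code too. Here is a short sketch using base R's getwd(), list.files() and file.exists(); the folder in the commented-out setwd() line is a hypothetical placeholder, not a path from this course.

```r
getwd()                          # print the current working directory
# setwd("~/COMM645")             # hypothetical folder: point this at your own data
list.files()                     # files R can open by name alone
file.exists("lemis.graphml")     # TRUE only if the demo's data file is here
```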

Plots is a generic area that will display graphs or networks.

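Help is an easy window for looking up R commands. You can search it directly, type ? followed by a function name (for example ?lm) to open that function's help page, or type ?? followed by a term to search all of the installed help pages.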

??lm

Last is the Packages tab, which takes a bit more explaining.

Packages

R is a statistical programming language built from the ground up for managing, plotting and examining data. Base R has a lot of built-in functionality, such as handling datasets, basic calculations and popular statistics like regression or chi-squared tests.

data(iris)

head(iris)

##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1          5.1         3.5          1.4         0.2  setosa
## 2          4.9         3.0          1.4         0.2  setosa
## 3          4.7         3.2          1.3         0.2  setosa
## 4          4.6         3.1          1.5         0.2  setosa
## 5          5.0         3.6          1.4         0.2  setosa
## 6          5.4         3.9          1.7         0.4  setosa

demoLM<-lm(Sepal.Length~Petal.Length+Petal.Width, data=iris)

summary(demoLM)

## 
## Call:
## lm(formula = Sepal.Length ~ Petal.Length + Petal.Width, data = iris)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.18534 -0.29838 -0.02763  0.28925  1.02320 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   4.19058    0.09705  43.181  < 2e-16 ***
## Petal.Length  0.54178    0.06928   7.820 9.41e-13 ***
## Petal.Width  -0.31955    0.16045  -1.992   0.0483 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4031 on 147 degrees of freedom
## Multiple R-squared:  0.7663, Adjusted R-squared:  0.7631 
## F-statistic:   241 on 2 and 147 DF,  p-value: < 2.2e-16
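The fitted model is itself an object, so you can pull out individual pieces with ordinary base R functions when you want a single number rather than the whole printout. A short sketch (it refits the same model so it runs on its own):

```r
data(iris)
demoLM <- lm(Sepal.Length ~ Petal.Length + Petal.Width, data = iris)

coef(demoLM)                           # named vector of coefficient estimates
coef(summary(demoLM))[, "Pr(>|t|)"]    # p-value column of the coefficient table
summary(demoLM)$r.squared              # multiple R-squared
```

These return the same values shown in the summary above, for example 0.54178 for Petal.Length and an R-squared of about 0.7663.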

In this class we are especially interested in social network analysis, which isn't supported out of the box. Therefore we have to extend R with packages, chunks of code that extend or add abilities to R just like an app will extend your phone.

There are thousands of R packages (a full list is available on CRAN), and to install one all you do is call install.packages() like so.

install.packages('igraph')
install.packages('statnet')
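Because install.packages() downloads a package every time it is called, a common pattern is to install only what is missing. The sketch below uses base R's requireNamespace() to test whether each package is available. The helper name missing_packages() is our own invention, and the interactive() guard is an added assumption that simply keeps a non-interactive script from trying to download anything.

```r
# Hypothetical helper: returns the package names that are not installed yet
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

to_install <- missing_packages(c("igraph", "statnet"))
# Only download when someone is at the keyboard (e.g. in RStudio)
if (length(to_install) > 0 && interactive()) {
  install.packages(to_install)
}
```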

A package only needs to be installed once per computer, but it has to be loaded with the library command in every new R session before you can use it.

library(igraph)

## 
## Attaching package: 'igraph'
## 
## The following objects are masked from 'package:stats':
## 
##     decompose, spectrum
## 
## The following object is masked from 'package:base':
## 
##     union

Now everything is ready to demonstrate R and social network analysis. Once again don't worry about following along, just sit back and watch how the various moving parts interact.

SNA Demo

The first thing we need is the igraph package. It and the sna package (part of statnet) are going to be the main tools we use in this class. We already loaded igraph in the previous section, so let's move along and read a network into R.

In this case we will be looking at a graph drawn from Victor Hugo's novel Les Misérables. Each node represents a character, and an edge links any two characters who appear together in the same chapter (loosely, who share a scene). We'll be reading the graph from the GraphML format, a specialized file format for holding network data.

lemis <- read_graph('lemis.graphml', format = 'graphml')  # read_graph() is the current name for read.graph()

lemis

## IGRAPH D-W- 77 254 -- 
## + attr: label (v/c), r (v/n), g (v/n), b (v/n), x (v/n), y (v/n),
## | size (v/n), id (v/c), Edge Label (e/c), weight (e/n), Edge Id
## | (e/c)
## + edges:
##  [1]  2-> 1  3-> 1  4-> 1  4-> 3  5-> 1  6-> 1  7-> 1  8-> 1  9-> 1 10-> 1
## [11] 12-> 1 12-> 3 12-> 4 12->11 13->12 14->12 15->12 16->12 18->17 19->17
## [21] 19->18 20->17 20->18 20->19 21->17 21->18 21->19 21->20 22->17 22->18
## [31] 22->19 22->20 22->21 23->17 23->18 23->19 23->20 23->21 23->22 24->12
## [41] 24->13 24->17 24->18 24->19 24->20 24->21 24->22 24->23 25->12 25->24
## [51] 26->12 26->24 26->25 27->12 27->17 27->25 27->26 28->12 28->24 28->25
## + ... omitted several edges

So we can see that "lemis" is a network with 77 nodes and 254 edges (the D and W in the header mean it is directed and weighted). To find each node's degree, that is, the number of edges attached to it, we need just one command.

deg<-degree(lemis)
deg

##  [1] 10  1  3  3  1  1  1  1  1  1  1 36  2  1  1  1  9  7  7  7  7  7  7
## [24] 15 11 16 11 17  4  8  2  4  1  2  6  6  6  6  6  3  1 11  3  3  2  1
## [47]  1  2 22  7  2  7  2  1  4 19  2 11 15 11  9 11 13 12 13 12 10  1 10
## [70] 10 10  9  3  2  2  7  7

This gives us a number for each character, in the order they are stored in the network, showing how many other characters each one shares a scene with. To make it more readable, we can attach the character names and sort the list.

names(deg)<-V(lemis)$label
sort(deg, decreasing=TRUE)

##          Valjean         Gavroche           Marius           Javert 
##               36               22               19               17 
##       Thenardier          Fantine         Enjolras       Courfeyrac 
##               16               15               15               13 
##          Bossuet          Bahorel             Joly    MmeThenardier 
##               13               12               12               11 
##          Cosette          Eponine           Mabeuf       Combeferre 
##               11               11               11               11 
##          Feuilly           Myriel        Grantaire        Gueulemer 
##               11               10               10               10 
##            Babet       Claquesous        Tholomyes        Prouvaire 
##               10               10                9                9 
##     Montparnasse       Bamatabois        Listolier          Fameuil 
##                9                8                7                7 
##      Blacheville        Favourite           Dahlia          Zephine 
##                7                7                7                7 
##     Gillenormand MlleGillenormand           Brujon     MmeHucheloup 
##                7                7                7                7 
##            Judge     Champmathieu           Brevet       Chenildieu 
##                6                6                6                6 
##      Cochepaille     Fauchelevent         Simplice   LtGillenormand 
##                6                4                4                4 
##   MlleBaptistine      MmeMagloire        Pontmercy          Anzelma 
##                3                3                3                3 
##           Woman2        Toussaint       Marguerite         Perpetue 
##                3                3                2                2 
##           Woman1   MotherInnocent        MmeBurgon           Magnon 
##                2                2                2                2 
##     MmePontmercy        BaronessT           Child1           Child2 
##                2                2                2                2 
##         Napoleon     CountessDeLo         Geborand     Champtercier 
##                1                1                1                1 
##         Cravatte            Count           OldMan          Labarre 
##                1                1                1                1 
##           MmeDeR          Isabeau          Gervais      Scaufflaire 
##                1                1                1                1 
##     Boulatruelle          Gribier        Jondrette      MlleVaubois 
##                1                1                1                1 
##   MotherPlutarch 
##                1

So we see that Valjean shares scenes with more characters than anyone else, as those of you familiar with the story would expect.
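Under the hood, degree is just a count of how many edge endpoints land on each node. Here is a minimal base-R sketch on a hypothetical five-node toy network (nodes A through E, not the Les Misérables data), stored as an edge list like the one igraph printed above. It assumes each pair of nodes is listed at most once.

```r
# Hypothetical toy network stored as an edge list (one row per edge)
toy_edges <- data.frame(
  from = c("A", "A", "A", "B", "D"),
  to   = c("B", "C", "D", "C", "E"),
  stringsAsFactors = FALSE
)

# Each edge adds 1 to the degree of both of its endpoints,
# so degree = how often a node appears in either column
toy_deg <- table(c(toy_edges$from, toy_edges$to))
sort(toy_deg, decreasing = TRUE)

# Sanity check: degrees always add up to twice the number of edges
sum(toy_deg) == 2 * nrow(toy_edges)
```

Node A touches three edges, so it tops the list, just as Valjean tops the real one.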

Networks can also generate "centrality metrics", measures that try to capture whether certain members of the network are more important or influential than others. A great example is the Kevin Bacon game, where you pick any actor and try to reach Kevin Bacon in six or fewer hops through shared movies. In network terms he has high closeness centrality: from Kevin Bacon's position in a network of movie stars it takes only a few steps to reach any other part of it. For our Les Misérables data we can find the Kevin Bacon of the story with the following command.

close<-closeness(lemis, mode='all')
names(close)<-V(lemis)$label
sort(close, decreasing=TRUE)

##         Gavroche          Valjean     Montparnasse           Javert 
##      0.004366812      0.004255319      0.004065041      0.004032258 
##        Gueulemer       Thenardier       Claquesous            Babet 
##      0.003952569      0.003846154      0.003831418      0.003759398 
##           Mabeuf       Bamatabois          Bossuet        Toussaint 
##      0.003703704      0.003610108      0.003597122      0.003584229 
##     MmeHucheloup    MmeThenardier          Eponine        Grantaire 
##      0.003546099      0.003533569      0.003508772      0.003508772 
##          Cosette       Marguerite           Brujon          Fantine 
##      0.003484321      0.003401361      0.003401361      0.003389831 
##         Enjolras        Prouvaire           Marius           Woman1 
##      0.003355705      0.003355705      0.003300330      0.003289474 
##           Woman2   MotherInnocent        Pontmercy          Labarre 
##      0.003246753      0.003246753      0.003236246      0.003225806 
##           MmeDeR          Isabeau          Gervais      Scaufflaire 
##      0.003225806      0.003225806      0.003225806      0.003225806 
##   LtGillenormand         Simplice     Fauchelevent          Feuilly 
##      0.003184713      0.003154574      0.003125000      0.003105590 
##          Bahorel        Tholomyes           Brevet       Chenildieu 
##      0.003086420      0.003067485      0.003067485      0.003067485 
##      Cochepaille     Gillenormand     Boulatruelle MlleGillenormand 
##      0.003067485      0.003067485      0.002985075      0.002976190 
##             Joly          Anzelma           Magnon       Courfeyrac 
##      0.002949853      0.002915452      0.002906977      0.002906977 
##        BaronessT       Combeferre     MmePontmercy         Perpetue 
##      0.002881844      0.002801120      0.002777778      0.002710027 
##        MmeBurgon           Child1           Child2            Judge 
##      0.002666667      0.002645503      0.002645503      0.002506266 
##     Champmathieu      MlleVaubois        Jondrette   MlleBaptistine 
##      0.002506266      0.002433090      0.002222222      0.002173913 
##      MmeMagloire          Gribier   MotherPlutarch        Listolier 
##      0.002173913      0.002127660      0.002020202      0.002008032 
##          Fameuil      Blacheville          Zephine           Dahlia 
##      0.002008032      0.002004008      0.001912046      0.001908397 
##        Favourite           Myriel         Napoleon     CountessDeLo 
##      0.001904762      0.001851852      0.001626016      0.001626016 
##         Geborand     Champtercier         Cravatte           OldMan 
##      0.001626016      0.001626016      0.001626016      0.001626016 
##            Count 
##      0.001449275

Here we can see that while Valjean has the most connections, Gavroche has a higher closeness centrality, so on average the rest of the network is closer to him than to anyone else.
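To see what these closeness numbers mean, here is a minimal base-R sketch of the definition igraph uses by default: one divided by the sum of shortest-path distances from a node to every other node. Because it sums over all other nodes, the score shrinks as a network grows, which is why the Les Misérables values are so small. The sketch runs a breadth-first search on a hypothetical five-node toy network and treats every edge as one step. That is an assumption worth flagging: when a graph has a weight edge attribute, as lemis does, igraph's closeness() uses it as edge length by default, so this sketch illustrates the idea rather than reproducing the numbers above. It also assumes the network is connected.

```r
# Hypothetical toy network as an undirected adjacency list
toy_adj <- list(
  A = c("B", "C", "D"),
  B = c("A", "C"),
  C = c("A", "B"),
  D = c("A", "E"),
  E = "D"
)

# Breadth-first search: number of hops from `start` to every node
bfs_hops <- function(adj, start) {
  hops <- setNames(rep(Inf, length(adj)), names(adj))
  hops[start] <- 0
  queue <- start
  while (length(queue) > 0) {
    v <- queue[1]
    queue <- queue[-1]
    for (w in adj[[v]]) {
      if (is.infinite(hops[w])) {   # w has not been reached yet
        hops[w] <- hops[v] + 1
        queue <- c(queue, w)
      }
    }
  }
  hops
}

# Closeness = 1 / (sum of hop distances to all other nodes);
# the node's distance to itself is 0, so including it changes nothing
toy_close <- sapply(names(toy_adj), function(v) 1 / sum(bfs_hops(toy_adj, v)))
sort(toy_close, decreasing = TRUE)
```

Node A is one hop from B, C and D and two hops from E, so its distances sum to 5 and its closeness is 1/5, the highest in the toy network; E, out on the edge, scores only 1/9.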

Similarly, betweenness centrality captures how many of the shortest paths between pairs of characters pass through a given character. In other words, which character is the bridge that connects otherwise disconnected groups? I'm sure everyone has a friend (or is someone) who brings otherwise unconnected people together at a party or get-together. In network terms, these folks have high betweenness centrality.

btw<-betweenness(lemis, directed=FALSE)
names(btw)<-V(lemis)$label
sort(btw, decreasing=TRUE)

##          Valjean         Gavroche           Javert           Myriel 
##     1293.6140693      812.6849387      551.1907287      504.0000000 
##       Thenardier          Fantine           Mabeuf       Bamatabois 
##      367.0057359      325.9865440      253.0330087      227.4785714 
##          Cosette           Marius        Tholomyes     Montparnasse 
##      212.8580447      205.5187229      187.5952381      141.8520022 
##    MmeThenardier       Claquesous        Grantaire MlleGillenormand 
##      129.8511905      120.6288059      102.9285714      102.8803030 
##     MmeHucheloup          Bossuet     Fauchelevent        MmeBurgon 
##       97.0595238       79.0863095       75.0000000       75.0000000 
##        Gueulemer     Gillenormand          Eponine        Pontmercy 
##       72.3816198       68.9583333       66.2004690       64.3137446 
##   LtGillenormand            Babet        Toussaint       Marguerite 
##       43.0125000       41.6359848       32.5222222       25.6547619 
##          Bahorel           Magnon     MmePontmercy        BaronessT 
##       21.1654762       13.8833333       13.5000000       10.9744048 
##          Feuilly         Enjolras           Brujon         Simplice 
##       10.9321429        7.8682900        4.6929293        3.6166667 
##       Courfeyrac          Anzelma             Joly         Napoleon 
##        1.8214286        0.7694805        0.5000000        0.0000000 
##   MlleBaptistine      MmeMagloire     CountessDeLo         Geborand 
##        0.0000000        0.0000000        0.0000000        0.0000000 
##     Champtercier         Cravatte            Count           OldMan 
##        0.0000000        0.0000000        0.0000000        0.0000000 
##          Labarre           MmeDeR          Isabeau          Gervais 
##        0.0000000        0.0000000        0.0000000        0.0000000 
##        Listolier          Fameuil      Blacheville        Favourite 
##        0.0000000        0.0000000        0.0000000        0.0000000 
##           Dahlia          Zephine         Perpetue      Scaufflaire 
##        0.0000000        0.0000000        0.0000000        0.0000000 
##           Woman1            Judge     Champmathieu           Brevet 
##        0.0000000        0.0000000        0.0000000        0.0000000 
##       Chenildieu      Cochepaille     Boulatruelle           Woman2 
##        0.0000000        0.0000000        0.0000000        0.0000000 
##   MotherInnocent          Gribier        Jondrette      MlleVaubois 
##        0.0000000        0.0000000        0.0000000        0.0000000 
##       Combeferre        Prouvaire   MotherPlutarch           Child1 
##        0.0000000        0.0000000        0.0000000        0.0000000 
##           Child2 
##        0.0000000

Valjean wins out again as the biggest bridge between various other parts of the story.
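Betweenness can also be computed by hand. Below is a compact base-R sketch of Brandes' algorithm for an undirected, unweighted graph: a breadth-first search from every node that counts shortest paths, followed by a backward pass that hands each node its share of the paths running through it. It uses the same hypothetical five-node toy network as the closeness sketch. Like closeness(), igraph's betweenness() uses a weight edge attribute by default when one is present, so this unweighted version illustrates the idea rather than reproducing the Les Misérables numbers.

```r
# Hypothetical toy network as an undirected adjacency list
toy_adj <- list(
  A = c("B", "C", "D"),
  B = c("A", "C"),
  C = c("A", "B"),
  D = c("A", "E"),
  E = "D"
)

# Brandes-style betweenness for an undirected, unweighted graph
betweenness_sketch <- function(adj) {
  nodes <- names(adj)
  zero <- setNames(numeric(length(nodes)), nodes)
  cb <- zero
  for (s in nodes) {
    # 1. BFS from s: count shortest paths (sigma) and record predecessors
    hops <- setNames(rep(-1, length(nodes)), nodes)
    sigma <- zero
    pred <- setNames(vector("list", length(nodes)), nodes)
    hops[s] <- 0
    sigma[s] <- 1
    queue <- s
    visited <- character(0)   # nodes in the order BFS reached them
    while (length(queue) > 0) {
      v <- queue[1]
      queue <- queue[-1]
      visited <- c(visited, v)
      for (w in adj[[v]]) {
        if (hops[w] < 0) {              # first time w is reached
          hops[w] <- hops[v] + 1
          queue <- c(queue, w)
        }
        if (hops[w] == hops[v] + 1) {   # edge v-w lies on a shortest path
          sigma[w] <- sigma[w] + sigma[v]
          pred[[w]] <- c(pred[[w]], v)
        }
      }
    }
    # 2. Walk back from the farthest nodes, passing each node's share
    #    of shortest paths on to its predecessors
    delta <- zero
    for (w in rev(visited)) {
      for (v in pred[[w]]) {
        delta[v] <- delta[v] + sigma[v] / sigma[w] * (1 + delta[w])
      }
      if (w != s) cb[w] <- cb[w] + delta[w]
    }
  }
  cb / 2   # undirected: every pair was counted once from each end
}

toy_btw <- betweenness_sketch(toy_adj)
sort(toy_btw, decreasing = TRUE)
```

A lies on every shortest path between {B, C} and {D, E}, giving it a score of 4, while D is the only way to reach E from A, B or C and scores 3. The nodes that bridge nothing score 0, just like the many zeros in the Les Misérables list.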

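Finally, let's plot the network. First we scale each node by its degree, so the more connections a node has, the bigger it is. Next we shrink the arrowheads, color the edges light gray, make each edge wider in proportion to its weight, and compute a force-directed (Fruchterman-Reingold) layout before plotting.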

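V(lemis)$size <- degree(lemis) * 0.6          # node size grows with degree
E(lemis)$arrow.size <- 0.2                    # small arrowheads
E(lemis)$color <- "gray80"                    # igraph reads edge color from the "color" attribute
E(lemis)$width <- 1 + E(lemis)$weight / 12    # heavier edges are drawn wider
l <- layout_with_fr(lemis)                    # Fruchterman-Reingold layout
plot(lemis,
  vertex.label.cex = 0.75,
  vertex.label.color = "black",
  vertex.label.family = "Helvetica",
  layout = l)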


In summary, today we've learned a bit about R: how to navigate RStudio, what packages are, and that with fewer than 20 lines of code you can load network data, calculate powerful and informative network statistics and produce visualizations. Next time we will cover the same territory, but with you following along on your own computers, so make sure you've followed the R installation guide on Blackboard before then.