datavisualisation – ahoi data

Whereabouts of observations between multiple latent class models. Supplementary plot for LCA with poLCA.

2015-08-18 by Niels

I got the idea for the following plot, and some of the code, from a Stack Overflow question in which user D.L. Dahly tried to show how observations in "a model with class=(i) are distributed by the model with class = (i+1)". My contribution is to use the DiagrammeR package instead of igraph; it generates an appealing plot with little code.

The plot visualizes how the classification of observations (persons) in a latent class analysis changes over a sequence of LC models with a growing number of classes. I ran five models with one to five classes. The plot starts at the top with the loglinear independence model, which has only one class. In the 2-class LCA the sample then splits into one class with 146 and one class with 436 observations. Ellipses two and three are classes 1 and 2 from the two-class model. In the next row of ellipses (four, five and six) you find classes 1, 2 and 3 of the three-class model. Ellipses seven, eight, nine and ten are classes 1, 2, 3 and 4 from the four-class model, and the bottom row holds the classes of the five-class model. The line width of the ellipses and of the arrows is scaled to the number of observations.

Here is the R code for it:


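# first: estimate 5 latent class models
library("poLCA")
# var1:varx is a placeholder: list your own manifest variables inside cbind(), separated by commas
f<-with(mydata, cbind(var1:varx)~1)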
lc1<-poLCA(f, data=mydata, nclass=1, na.rm = FALSE, nrep=30, maxiter=3000) #Loglinear independence model.
lc2<-poLCA(f, data=mydata, nclass=2, na.rm = FALSE, nrep=30, maxiter=3000)
lc3<-poLCA(f, data=mydata, nclass=3, na.rm = FALSE, nrep=30, maxiter=3000)
lc4<-poLCA(f, data=mydata, nclass=4, na.rm = FALSE, nrep=30, maxiter=3000) 
lc5<-poLCA(f, data=mydata, nclass=5, na.rm = FALSE, nrep=30, maxiter=3000)

#---------------------------------
# PLOT
#---------------------------------
library("DiagrammeR")
library("V8")
# This code stems from D.L. Dahly 

# build dataframe with predicted class for each observation
x1<-rep(1, length(lc1$predclass))   # predclass is a vector, so use length(), not nrow()
x2<-lc2$predclass
x3<-lc3$predclass
x4<-lc4$predclass
x5<-lc5$predclass
results <- cbind(x1, x2, x3, x4, x5)
results <-as.data.frame(results)
results

# avoid double naming of classes (because each LCA named their classes 1,2,...,k)
N<-ncol(results) 
n<-0
for(i in 2:N) {
  results[,i]<- (results[,i])+((i-1)+n)
  n<-((i-1)+n)
}

# Make a data frame for the edges and counts
# cross-tabulations and their frequencies
g1<-plyr::count(results,c("x1","x2"))
g2<-plyr::count(results,c("x2","x3"))
colnames(g2)<-c("x1","x2","freq")
g3<-plyr::count(results, c("x3","x4"))
colnames(g3)<-c("x1", "x2","freq")
g4<-plyr::count(results,c("x4","x5"))
colnames(g4)<-c("x1","x2","freq")
edges<-rbind(g1,g2,g3,g4)

# Make a data frame for the class sizes
h1<-plyr::count(results,c("x1"))
h2<-plyr::count(results,c("x2"))
colnames(h2)<-c("x1","freq")
h3<- plyr::count(results,c("x3"))
colnames(h3)<-c("x1","freq")
h4<-plyr::count(results,c("x4"))
colnames(h4)<-c("x1","freq")
h5<-plyr::count(results,c("x5"))
colnames(h5)<-c("x1", "freq")
nodes<-rbind(h1,h2,h3,h4,h5)
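
The renumbering loop and the cross-tabulation are easier to follow on a small example. The following is a minimal sketch in base R with made-up class assignments for six observations (the data frame `toy` and its values are assumptions for illustration, not results from the models above). It uses cumulative offsets instead of the running counter `n`, which should give the same unique IDs, and `table()` instead of `plyr::count()`.

```r
# Made-up class assignments for six observations under models with 1, 2 and 3 classes
toy <- data.frame(x1 = rep(1, 6),
                  x2 = c(1, 1, 2, 2, 2, 1),
                  x3 = c(1, 2, 3, 3, 2, 1))

# The model in column k has k classes, so its IDs must start after 1 + 2 + ... + (k - 1)
offsets <- cumsum(seq_len(ncol(toy)) - 1)
toy_unique <- toy
for (i in seq_along(offsets)) {
  toy_unique[[i]] <- toy[[i]] + offsets[i]
}

# Edge list between the 2-class and the 3-class model:
# cross-tabulate the two columns and drop the empty cells
tab <- as.data.frame(table(from = toy_unique$x2, to = toy_unique$x3),
                     stringsAsFactors = FALSE)
toy_edges <- tab[tab$Freq > 0, ]

print(toy_unique)
print(toy_edges)
```

After the shift, every class has an ID that is unique across all models, so it can serve directly as a node ID in the graph.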

Now we feed the edge counts and the class sizes into DiagrammeR:

#dataframe for nodes - columns: node, label, type, attributes (like color and stuff)
colnames(nodes)<-c("node","label")

#scale nodes
nodes <- scale_nodes(nodes_df = nodes,
                     to_scale = nodes$label,
                     node_attr = "penwidth",
                     range = c(2, 5))

#dataframe for edges - columns: edge from, edge to, label, relationship, attributes 
colnames(edges)<-c("from", "to", "label")
edges$relationship<-c("given_to")

#scale edges
edges <- scale_edges(edges_df = edges,
                     to_scale = edges$label,
                     edge_attr = "penwidth",
                     range = c(1, 5))

nodes <- scale_nodes(nodes_df = nodes,
                     to_scale = nodes$penwidth,
                     node_attr = "alpha:fillcolor",
                     range = c(5, 90))

nodes
nodes$label2<-nodes$label
nodes$label<-paste0(nodes$node)

# Additional label outside of the ellipses
# nodes$label<-paste0(nodes$node, "',xlabel=","'",nodes$label2) 

# Group-number
#nodes$xlabel<-paste0("(n=",nodes$label2,")")

#plot stuff 
lca_graph<-create_graph(nodes,
                        edges,
                        node_attrs = c("fontname = Helvetica",
                                       "color = darkgrey",
                                       "style = filled",
                                       "fillcolor = lightgrey",
                                       "alpha_fillcolor = 0.5"),
                        edge_attrs = c("fontname = Helvetica",
                                       "fontsize=10"),
                        graph_attrs=c("layout=dot",
                                      "overlap = false",
                                      "fixedsize = true",
                                      "directed=TRUE"))
                                      
render_graph(lca_graph)

That's it. DiagrammeR uses an algorithm to avoid overlapping. I tried some improvements to the plot, but decided to stick with this solution because it's already pretty nice. The only thing I miss in the plot are the class sizes. I tried to attach them with the "xlabel" attribute in DiagrammeR, but the plot became too messy. You can try it yourself by uncommenting this line:

nodes$label<-paste0(nodes$node, "',xlabel=","'",nodes$label2)

But I didn't like it much.
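
For readers who want to experiment with class sizes outside the ellipses, one option is to write the Graphviz DOT source by hand, where `xlabel` is an ordinary node attribute. The following is only a sketch under assumptions: the two small tables and the helper `rescale()` are made up for illustration, and the call to `DiagrammeR::grViz()` is left commented out so that the snippet runs without the package.

```r
# Made-up node sizes and edge counts (not the results from the models above)
nodes_toy <- data.frame(node = 1:3, size = c(10, 4, 6))
edges_toy <- data.frame(from = c(1, 1), to = c(2, 3), freq = c(4, 6))

# Hypothetical helper: map counts linearly onto a line width between lo and hi
rescale <- function(x, lo, hi) {
  if (diff(range(x)) == 0) return(rep((lo + hi) / 2, length(x)))
  lo + (x - min(x)) / diff(range(x)) * (hi - lo)
}

# One DOT statement per node; xlabel places the class size outside the ellipse
node_lines <- sprintf("  %d [label = \"%d\", xlabel = \"n=%d\", penwidth = %.1f];",
                      nodes_toy$node, nodes_toy$node, nodes_toy$size,
                      rescale(nodes_toy$size, 2, 5))

# One DOT statement per edge, labelled with the number of observations
edge_lines <- sprintf("  %d -> %d [label = \"%d\", penwidth = %.1f];",
                      edges_toy$from, edges_toy$to, edges_toy$freq,
                      rescale(edges_toy$freq, 1, 5))

dot <- paste(c("digraph lca {",
               "  node [shape = ellipse, style = filled, fillcolor = lightgrey];",
               node_lines, edge_lines, "}"),
             collapse = "\n")
cat(dot, "\n")

# DiagrammeR::grViz(dot)   # uncomment to render the diagram
```

Writing the DOT text directly keeps every attribute visible in one place, at the cost of more string handling than the data-frame interface.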

Plot with background in ggplot2: Visualising line-ups from Hurricane-festival 1997 – 2015

2015-06-04 by Niels

The Hurricane Festival is taking place again this June, so it could be interesting to look at its development over the years: in this case, the number of bands in each year.

First, I gathered some data from Wikipedia and put it in a CSV file. You can access the data here: Hurricane Festival Bands 1997-2015.

A simple bar plot is the best way to show this data. To make it a little more appealing, I want to use a custom font and a wallpaper from the Hurricane Festival website. But before I start plotting, I need to get the data into shape.

#These packages will be needed
library("dplyr")
library("tidyr")
library("ggplot2")
library("jpeg")
library("grid")
library("extrafont")

# read in the data
hurricane<-read.csv("Gesamt_1997_2015.csv", header=FALSE, sep=";")
colnames(hurricane)<-c("bands","year")

With some dplyr magic we count the bands per year:

# Group by year and count the number of bands in each year
plot.df<- hurricane %>% group_by(year) %>% summarise(count=n())
plot.df$year<-as.factor(plot.df$year)
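
To see what this aggregation produces, here is a small cross-check in base R on made-up rows (the band names and years in `toy` are placeholders, not taken from the CSV file). `table()` counts the rows per year, which should match what `group_by(year)` followed by `summarise(count=n())` returns.

```r
# Made-up line-up data: one row per band and year
toy <- data.frame(bands = c("Band A", "Band B", "Band C", "Band A", "Band D"),
                  year  = c(1997, 1997, 1998, 1998, 1998))

# Number of bands per year; the year column comes back as a factor
per_year <- as.data.frame(table(year = toy$year), responseName = "count")
print(per_year)

# The same idea, grouped by band: number of appearances per band
per_band <- as.data.frame(table(bands = toy$bands), responseName = "count")
print(per_band[order(-per_band$count), ])
```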

I want to use the font "Open Sans". You can download the font here: Open Sans. Then you have to import it into R using the "extrafont" package.

font_import(paths="/Open_Sans/")
loadfonts(device="win") # otherwise I get errors on my Windows PC
fonts()

As the title of this blog entry suggests, I also want to use a JPG background for my graph. I downloaded the Hurricane Festival wallpaper from http://www.hurricane.de/de/interaktiv/downloads/.

Now we can plot the data:

# Import the Wallpaper
img <- readJPEG("wallpaper-hurricane-1920-1080.jpg")

# start plotting
plot<-ggplot(plot.df,aes(x=year,y=count)) + 
  annotation_custom(rasterGrob(img, width=unit(1,"npc"), height=unit(1,"npc")), 
                    -Inf, Inf, -Inf, Inf) +
  scale_y_continuous(expand=c(0,0), limits = c(0,max(plot.df$count)*1.05))+
  geom_bar(stat="identity",fill="white",width=0.8)+ 
  geom_text(aes(label=count), vjust=1.5,colour="black") +
  theme_bw() +
  theme(text=element_text(family="Open Sans"),
        plot.title = element_text(size = rel(1.5), face = "bold", vjust = 1.5),
        axis.line=element_blank(),
        axis.text.y=element_blank(),
        #axis.title.x=element_blank(),
        axis.title.y=element_blank(),
        axis.ticks.y = element_blank()) +
  ggtitle("How many Bands had each Hurricane-Festival in the years 1997-2015")+
  labs(x="@Niels_Bremen")
plot

With "alpha=0.85" the bars become slightly transparent, so you can see a bit more of the background image.

plot<-ggplot(plot.df,aes(x=year,y=count)) + 
  annotation_custom(rasterGrob(img, width=unit(1,"npc"), height=unit(1,"npc")), 
                    -Inf, Inf, -Inf, Inf) +
  scale_y_continuous(expand=c(0,0), limits = c(0,max(plot.df$count)*1.05))+
  geom_bar(stat="identity",fill="white",width=0.8, alpha=0.85)+ 
  geom_text(aes(label=count), vjust=1.5,colour="black") +
  theme_bw() +
  theme(text=element_text(family="Open Sans"),
        plot.title = element_text(size = rel(1.5), face = "bold", vjust = 1.5),
        axis.line=element_blank(),
        axis.text.y=element_blank(),
        #axis.title.x=element_blank(),
        axis.title.y=element_blank(),
        axis.ticks.y = element_blank()) +
  ggtitle("How many Bands had each Hurricane-Festival in the years 1997-2015?")+
  labs(x="@Niels_Bremen")
plot

By the way, this is what the plot would look like with ggplot2 defaults. Not that bad for just one line of code:

ggplot(plot.df,aes(x=year,y=count)) +geom_bar(stat="identity")

Anyway, here are some further graphics.
Here, I tried to draw one point for each time a band has played at the Hurricane Festival. I tried to use an icon (a PNG file) of a hand instead of a point, but haven't figured out how to do it.

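# count.bands is not defined in this post; assumed here: number of appearances per band
count.bands<- hurricane %>% group_by(bands) %>% summarise(count=n())

#only the Top29 Bands
plot.df2<-count.bands %>%
  filter(min_rank(desc(count)) <= 29) %>%
  arrange(desc(count))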

#Preparing data for the plot
plot.df3 <- data.frame(band = rep(plot.df2$bands, plot.df2$count),
                      count = unlist(lapply(plot.df2$count, seq_len)))

#load font
font_import(paths="e:/Blog/Hurricane/Open_Sans/")
loadfonts(device="win")
fonts()

#background
img <- readJPEG("e:/Blog/Hurricane/wallpaper-hurricane-800x450.jpg")

#plotting
ggplot(plot.df3, aes(x = count, y=reorder(band,count))) +
  annotation_custom(rasterGrob(img, width=unit(1,"npc"), height=unit(1,"npc")), 
                    -Inf, Inf, -Inf, Inf)+
  geom_point(colour="white") + 
  scale_x_continuous(limits=c(1, 7), breaks=seq(1,7, by=1)) +
  theme_bw()  +
  ggtitle("Most frequent bands at Hurricane-Festival (1997-2015)")+
  theme(text=element_text(family="Open Sans"),
        plot.title = element_text(size = rel(1.5), face = "bold", vjust = 1.5),
        axis.title.y=element_blank(),
        axis.ticks.y = element_blank(),
        axis.title.x=element_blank())
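
The data preparation for this plot turns one row per band into one row per appearance. A small sketch with made-up counts (the band names are placeholders) shows what `rep()` and `seq_len()` do here; `sequence()` from base R is an equivalent shortcut for the `unlist(lapply(...))` call.

```r
# Made-up appearance counts for three bands
toy <- data.frame(bands = c("Band A", "Band B", "Band C"),
                  count = c(3, 1, 2),
                  stringsAsFactors = FALSE)

# Repeat each band name once per appearance and number the appearances 1..count
expanded <- data.frame(band  = rep(toy$bands, toy$count),
                       count = unlist(lapply(toy$count, seq_len)))
print(expanded)

# sequence() builds the same running index in one call
stopifnot(all(expanded$count == sequence(toy$count)))
```

Each row of `expanded` becomes one point in the plot, so a band with three appearances gets points at x = 1, 2 and 3.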

As a last plot, I made a word cloud of all bands that ever played at Hurricane.

library(wordcloud)
library(RColorBrewer)

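#Plot and save wordcloud image
# note: the column count2 is not created anywhere in this post; count.bands$count holds the raw frequencies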
png('e:/wordcloud_hurricane.png', width=1200,height=1200,res=260)
wordcloud(count.bands$bands, count.bands$count2,
          scale = c(1.4, .2),
          min.freq=1,
          max.words=Inf, 
          random.order=FALSE, 
          rot.per=0, 
          colors="Darkgreen",
          bg = "transparent")
dev.off()