• about me
  • statistics
    • example data
  • twitter-mining (german)
  • raspberry pi (german)
  • links
  • Datenschutz
ahoi data

analysis, visualisation and playing around with data

latent class analysis

Whereabouts of observations between multiple latent class models. Supplementary plot for LCA with poLCA.

18. August 2015 by Niels

I got the idea for the following plot and some of the code from a Stackoverflow question, where User D.L. Dahly tried to show how observations in „a model with class=(i) are distributed by the model with class = (i+1)“. I contribute through the idea of not using igraph, but the DiagrammeR-package, which generates an appealing plot with little code.

The plot tries to visualize how classifications of observations (persons) in a latent class analysis change over a sequence of LC-models with growing number of classes. I ran five models with one to five classes. The plot starts on top with the loglinear independence model that only has one class. The sample then splits in the 2-class LCA in a class with 146 and a class of 436 observations. Ellipse two and three are the classes 1 and 2 from the latent class model with two classes. In the next line of ellipses (four,five and six) you find the classes 1,2 and 3 of the latent class model with three classes. Ellipses seven, eight, nine, ten are classes 1,2,3 and 4 from the 4-class latent class model. The thickness of the ellipses and the arrows is according to the amount of observations.

whereabouts

Here is the R-code for it:


# first: estimate 5 latent class models 
f<-with(mydata, cbind(var1:varx)~1)
lc1<-poLCA(f, data=mydata, nclass=1, na.rm = FALSE, nrep=30, maxiter=3000) #Loglinear independence model.
lc2<-poLCA(f, data=mydata, nclass=2, na.rm = FALSE, nrep=30, maxiter=3000)
lc3<-poLCA(f, data=mydata, nclass=3, na.rm = FALSE, nrep=30, maxiter=3000)
lc4<-poLCA(f, data=mydata, nclass=4, na.rm = FALSE, nrep=30, maxiter=3000) 
lc5<-poLCA(f, data=mydata, nclass=5, na.rm = FALSE, nrep=30, maxiter=3000)
#---------------------------------
# PLOT
#---------------------------------
library("DiagrammeR")
library("V8")
# This code stems from D.L. Dahly 

# build dataframe with predicted class for each observation
x1<-rep(1, nrow(lc1$predclass))        
x2<-lc2$predclass
x3<-lc3$predclass
x4<-lc4$predclass
x5<-lc5$predclass
results <- cbind(x1, x2, x3, x4, x5)
results <-as.data.frame(results)
results

# avoid double naming of classes (because each LCA named their classes 1,2,...,k)
N<-ncol(results) 
n<-0
for(i in 2:N) {
  results[,i]<- (results[,i])+((i-1)+n)
  n<-((i-1)+n)
}

# Make a data frame for the edges and counts
# cross-tabulations and their frequencies
g1<-plyr::count(results,c("x1","x2"))
g2<-plyr::count(results,c("x2","x3"))
colnames(g2)<-c("x1","x2","freq")
g3<-plyr::count(results, c("x3","x4"))
colnames(g3)<-c("x1", "x2","freq")
g4<-plyr::count(results,c("x4","x5"))
colnames(g4)<-c("x1","x2","freq")
edges<-rbind(g1,g2,g3,g4)

# Make a data frame for the class sizes
h1<-plyr::count(results,c("x1"))
h2<-plyr::count(results,c("x2"))
colnames(h2)<-c("x1","freq")
h3<- plyr::count(results,c("x3"))
colnames(h3)<-c("x1","freq")
h4<-plyr::count(results,c("x4"))
colnames(h4)<-c("x1","freq")
h5<-plyr::count(results,c("x5"))
colnames(h5)<-c("x1", "freq")
nodes<-rbind(h1,h2,h3,h4,h5)

Now, we use the data from edges and counts, as well as class sizes in DiagrammeR:

#dataframe for nodes - columns: node, label, type, attributes (like color and stuff)
colnames(nodes)<-c("node","label")

#scale nodes
nodes <- scale_nodes(nodes_df = nodes,
                     to_scale = nodes$label,
                     node_attr = "penwidth",
                     range = c(2, 5))

#dataframe for edges - columns: edge from, edge to, label, relationship, attributes 
colnames(edges)<-c("from", "to", "label")
edges$relationship<-c("given_to")

#scale edges
edges <- scale_edges(edges_df = edges,
                     to_scale = edges$label,
                     edge_attr = "penwidth",
                     range = c(1, 5))

nodes <- scale_nodes(nodes_df = nodes,
                     to_scale = nodes$penwidth,
                     node_attr = "alpha:fillcolor",
                     range = c(5, 90))

nodes
nodes$label2<-nodes$label
nodes$label<-paste0(nodes$node)

# Additional label outside of the ellipses
# nodes$label<-paste0(nodes$node, "',xlabel=","'",nodes$label2) 

# Group-number
#nodes$xlabel<-paste0("(n=",nodes$label2,")")

#plot stuff 
lca_graph<-create_graph(nodes,
                        edges,
                        node_attrs = c("fontname = Helvetica",
                                       "color = darkgrey",
                                       "style = filled",
                                       "fillcolor = lightgrey",
                                       "alpha_fillcolor = 0.5"),
                        edge_attrs = c("fontname = Helvetica",
                                       "fontsize=10"),
                        graph_attrs=c("layout=dot",
                                      "overlap = false",
                                      "fixedsize = true",
                                      "directed=TRUE"))
                                      
render_graph(lca_graph)

That´s it. DiagrammeR uses an algorithm to avoid overlapping. I tried some improvements of the plot, but decided to stick with this solution, because it´s already pretty nice. The only thing i miss in the plot are class-sizes. I tried to attach them with the „xlabel“-attribute in DiagrammeR, but the plot became to messy. You can try it yourself, by uncommenting this part:

nodes$label<-paste0(nodes$node, "',xlabel=","'",nodes$label2) 

But i didn´t like it much.

Posted in: datavisualisation, DiagrammeR, latent class analysis, latent variable, lavaan, LCA, plot, poLCA, rstats, Uncategorized Tagged: DiagrammeR, latent class analysis, poLCA, R

Neue Beiträge

  • Multiple Imputation in R. How to impute data with MICE for lavaan.
  • Whereabouts of observations between multiple latent class models. Supplementary plot for LCA with poLCA.
  • Example for a latent class analysis with the poLCA-package in R
  • How to plot correlations of rating items with R
  • Plot with background in ggplot2: Visualising line-ups from Hurricane-festival 1997 – 2015

Neue Kommentare

    Schlagwörter

    AMOS DiagrammeR graphviz latent class analysis lavaan lavaan.survey LISREL Mining MPLUS Onyx OpenMX path diagrams pip poLCA Python R Raspberry Pi Rest API rstats SEM semPlot sentiment analysis SPSS STATA Streaming API streamR structural equation modeling survey weights textmining TikZ Twitter twitter-API twitter-mining visualisation Visualisierung wordcloud

    Meta

    • Anmelden
    • Beitrags-Feed (RSS)
    • Kommentare als RSS
    • WordPress.org

    Copyright © 2019 ahoi data.

    Omega WordPress Theme by ThemeHall