## How to apply survey weights in structural equation modeling (SEM) with lavaan

The R package *lavaan* is my favourite tool for fitting structural equation models (SEM). Its biggest advantages: it's free, it's open source, and its range of functions is growing steadily.

Before *lavaan*, I used Mplus, which still has the widest functionality of all SEM tools and is the most sophisticated software for latent variable modeling. The Muthéns and their Mplus team offer incredibly good support and documentation. The only problem is that the software isn't free, and without a license you can't get any of the support.

For me, one drawback of *lavaan* is that it can't fit latent class models or mixture models ... yet! Yves Rosseel is planning to add this in the next two years.

*lavaan* stands for "*la*tent *va*riable *an*alysis". The package is available via CRAN and has a good tutorial on the lavaan project homepage. Models are specified via syntax. Thankfully, the *lavaan* syntax is kept pretty simple. At least, it's a lot easier than the syntax of LISREL (the first and original SEM software). But it's not as easy as drawing a path model in AMOS, the SPSS module. Anyway, once you get to slightly more complex models, you'll find working with syntax a lot more efficient. If you don't like working with syntax, I recommend having a look at Onyx, a graphical interface for structural equation modeling by Andreas Brandmaier. It's a free tool in which you can draw your SEM as a path diagram and generate the *lavaan* syntax from it.

But when you do SEM, the syntax will be the least complicated thing you have to learn, so I don't think that will be a problem at all.

**Install lavaan**

If you want to use survey weights, you have to install three packages: lavaan, survey and lavaan.survey. *lavaan* is the package used for modeling, and the survey package converts your data into a survey design object. Once you have specified and fitted the model as a *lavaan* fit object and generated a survey design object from your data, you pass these two objects to the *lavaan.survey* function, which calculates the weighted model.

First, you install the packages:

```r
# Install lavaan
install.packages("lavaan", dependencies = TRUE)
library(lavaan)

# Install lavaan.survey
install.packages("lavaan.survey")
library(lavaan.survey)

# Install the survey package
install.packages("survey")
library(survey)
```

**Generate the survey-design object**

After the packages and the data are loaded, a svydesign object is generated from our data. It's no surprise that with `id = ~ID` the column `ID` in the data frame is used as the id variable. `weights = ~weight_trunc` names the column that holds the survey weights, and `data = data` selects the data frame.

```r
library(survey)  # load the survey package

# Read the data
data <- read.csv(file = "data.csv", header = TRUE, sep = ",")

# If necessary, recode the missing-value code 9 to NA.
# Note: this replaces every 9 in every column, including ID and weights.
data[data == 9] <- NA

# Generate the survey design object
svy.df <- svydesign(id = ~ID, weights = ~weight_trunc, data = data)
```
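To get a feeling for what the design object does with the weights, here is a minimal sketch on simulated data. The data frame `toy` and its values are made up for illustration; only the column names are borrowed from the example above. The weighted mean computed through the design object should agree with a weighted mean computed by hand.

```r
library(survey)

set.seed(1)

# Hypothetical toy data: an ID, one survey item and a made-up weight column
toy <- data.frame(
  ID           = 1:200,
  F09_a        = rnorm(200),
  weight_trunc = runif(200, min = 0.5, max = 2)
)

# Same kind of design object as above, built from the toy data
toy.design <- svydesign(ids = ~ID, weights = ~weight_trunc, data = toy)

# Weighted mean via the design object ...
weighted_by_design <- coef(svymean(~F09_a, toy.design))

# ... and the same quantity computed by hand
weighted_by_hand <- weighted.mean(toy$F09_a, toy$weight_trunc)

all.equal(unname(weighted_by_design), weighted_by_hand)
```

If the two numbers differ, the weight column was most likely not picked up the way you intended.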

**Specifying the model**

I'll use a simple structural equation model with two latent variables, measured by three and two indicator variables. The exogenous latent variable `latent_a` is measured by three indicators (x1-x3, here `F09_a`, `F09_b` and `F09_c`), the endogenous latent variable `latent_b` by two (y1-y2, here `F12_a` and `F12_b`). The variable `latent_b` is regressed on (predicted by) `latent_a`.

```r
library(lavaan)

model_1 <- '
  # measurement model
  latent_a =~ F09_a + F09_b + F09_c
  latent_b =~ F12_a + F12_b

  # regressions
  latent_b ~ latent_a
'

lavaan.fit <- sem(model_1,
                  data = data,
                  estimator = "MLR",  # robust fit, usable when you have missing data
                  missing = "ml",     # FIML for missing data
                  mimic = "Mplus")

# You can run the (unweighted) model at this point and inspect it
summary(lavaan.fit, fit.measures = TRUE, standardized = TRUE)
```

Normally, I would use MLM as the estimator to get robust estimates (robust against non-normality of the endogenous variable), but in this case I chose MLR, because FIML is not available with MLM.

FIML (Full Information Maximum Likelihood, requested with `missing = "ml"`) is regarded as equally efficient as multiple imputation in handling item nonresponse. Still, it can be a good idea to do multiple imputation anyway, because bootstrapping the standard errors is only available with the ML estimator. On the other hand, an advantage of FIML is that you don't have to model missingness explicitly, because FIML uses the SEM you have already specified.

When using the lavaan.survey package, you can't use FIML (yet). If you have missing values, you have to do a multiple imputation of your data, and instead of MLR, lavaan.survey uses MLM as its default estimator.

**Fitting the model**

When the model is fitted with *lavaan.survey*, the covariance matrix is estimated with the *svyvar* function of the survey package. The *lavaan* model then uses this weighted covariance matrix with the MLM estimator (the default) to fit the model. MLM is not compatible with `missing = "fiml"`, so if your data has missing values you have to do multiple imputation first and pass your imputed data frames as a list to the svydesign function, which turns them into a survey design object that can be used as data in lavaan.survey. The resulting parameters, fit indices and statistics will be adjusted for the sampling design. Also, if MLM is used, the chi-square (likelihood-ratio) test statistic will be transformed to a Satorra-Bentler corrected chi-square. [This information stems from the lavaan.survey documentation]. In *lavaan*, you can choose the form of your output. Because I worked a lot with Mplus, I prefer the Mplus-style output.
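The imputation route can be sketched roughly as follows. This is only an illustration under loud assumptions: the toy data are simulated, and the helper `fill_in()` is a hypothetical, deliberately crude stand-in for a real imputation routine (in practice you would use a dedicated imputation package). The part that matters is the last step, where the list of completed data frames is wrapped with `imputationList()` from the mitools package and handed to `svydesign()`.

```r
library(survey)
library(mitools)

set.seed(2)

# Hypothetical toy data with some missing values in one indicator
toy <- data.frame(
  ID           = 1:100,
  F09_a        = rnorm(100),
  weight_trunc = runif(100, min = 0.5, max = 2)
)
toy$F09_a[sample(100, 10)] <- NA

# Crude stand-in for a real imputation method (assumption, illustration only):
# fill each missing value with a random draw from the observed values
fill_in <- function(d) {
  miss <- is.na(d$F09_a)
  d$F09_a[miss] <- sample(d$F09_a[!miss], sum(miss), replace = TRUE)
  d
}

# Five "imputed" data frames, collected in a plain list
imputed <- replicate(5, fill_in(toy), simplify = FALSE)

# Wrap the list and build one survey design per imputed data set
svy.imp <- svydesign(ids = ~ID, weights = ~weight_trunc,
                     data = imputationList(imputed))

class(svy.imp)
```

The resulting object would then take the place of the single-data-set design in the call to lavaan.survey.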

```r
library(lavaan.survey)

# Fit the model using weighted data
# (by passing the survey design object svy.df we generated above)
survey.fit <- lavaan.survey(lavaan.fit, svy.df, estimator = "ML")

# Inspect the output
summary(survey.fit, fit.measures = TRUE, standardized = TRUE, rsquare = TRUE)

# If you're interested in descriptive statistics,
# you can access the missing data patterns of the FIML fit
inspect(lavaan.fit, "patterns")

# and the coverage of the covariance matrix (like in Mplus)
inspect(lavaan.fit, "coverage")
```
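If you want to try the whole workflow without your own data, the following sketch runs it end to end on the `HolzingerSwineford1939` data set that ships with lavaan. Note the assumptions: that data set has no survey weights, so the weight column `w` is randomly generated and purely hypothetical, and the model is a stand-in with the same structure as the one above (two latent variables, one regression), not the model from this post.

```r
library(lavaan)
library(survey)
library(lavaan.survey)

set.seed(3)

# Example data shipped with lavaan, plus a made-up weight column (assumption)
hs   <- HolzingerSwineford1939
hs$w <- runif(nrow(hs), min = 0.5, max = 2)

# Stand-in model: two latent variables and one regression
model <- '
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
  textual ~ visual
'

# Step 1: ordinary (unweighted) lavaan fit
fit.unweighted <- sem(model, data = hs, estimator = "ML", meanstructure = TRUE)

# Step 2: survey design object holding the weights
hs.design <- svydesign(ids = ~1, weights = ~w, data = hs)

# Step 3: refit the model on the weighted data
fit.weighted <- lavaan.survey(fit.unweighted, hs.design, estimator = "ML")

# Compare the parameter estimates of both fits
coef(fit.unweighted)
coef(fit.weighted)
```

Because the weights here are random noise, the two sets of estimates should only differ slightly; with real survey weights the differences can be more substantial.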

**Results**

I wouldn't have expected that using weights in a SEM analysis with lavaan is so easy to accomplish.

Here are the fit-indices of the weighted SEM.

```
lavaan (0.5-17) converged normally after  24 iterations

  Number of observations                           577

  Estimator                                         ML
  Minimum Function Test Statistic               11.664
  Degrees of freedom                                 4
  P-value (Chi-square)                           0.020

Model test baseline model:

  Minimum Function Test Statistic              955.394
  Degrees of freedom                                10
  P-value                                        0.000

User model versus baseline model:

  Comparative Fit Index (CFI)                    0.992
  Tucker-Lewis Index (TLI)                       0.980

Loglikelihood and Information Criteria:

  Loglikelihood user model (H0)              -3675.100
  Loglikelihood unrestricted model (H1)      -3669.268

  Number of free parameters                         16
  Akaike (AIC)                                7382.200
  Bayesian (BIC)                              7451.926
  Sample-size adjusted Bayesian (BIC)         7401.132

Root Mean Square Error of Approximation:

  RMSEA                                          0.058
  90 Percent Confidence Interval          0.021  0.097
  P-value RMSEA <= 0.05                          0.314

Standardized Root Mean Square Residual:

  SRMR                                           0.022
```

...and so on. I won't show the full results here.

It's common to show the parameter estimates in a path diagram. In my next blogging session I'll demonstrate how to draw path diagrams of a lavaan model with semPlot (project homepage).