The R-Package *lavaan* is my favourite tool for fitting structural equation models (SEM). Its biggest advantages: It´s free, it´s open source and its range of functions is growing steadily.

Before *lavaan*, i used MPLUS, which still has the widest functionality of all SEM-Tools and is the most sophisticated software for latent variable modeling. The Muthéns and their MPLUS-team offer incredibly good support and documentation. The only problem is, that the software isn´t free and without a license you can´t get any of the support.

For me, one drawback of *lavaan *is, that it can´t model latent class models or mixture models …yet! Yves Rosseel is planning to add this in the next two years.

*lavaan *stands for „*la*tent *va*riable *an*alysis“. The package is available via CRAN and has a good tutorial on the lavaan project homepage. Models are specified via syntax. Thankfully, the *lavaan*-syntax is kept pretty simple. At least, it´s a lot easier than the LISREL-syntax (the first, and original SEM-software). But it´s not as easy as drawing a path-model in AMOS, the SPSS-module. Anyway, once you get to a little more complex models, you´ll find working with syntax a lot more efficient. If you don´t like working with syntax, i recommend having a look at Onyx – a graphical interface for structural equation modeling by Andreas Brandmaier. It´s a free tool in which you can draw your SEM as a path diagram and generate the *lavaan*-syntax from it.

But, when you do SEM-models the syntax will be the least complicated thing you had to learn, so i don´t think that will be a problem at all.

**Install lavaan**

If you want to use survey weights, you have to install lavaan, the survey package and lavaan.survey. *Lavaan *is the package used for modeling and the survey-package converts your data into an survey-design-object. After you specified the model in a *lavaan *fit object and you have generated a survey-design-object from your data, these two objects are passed to the *lavaan.survey* function, which will calculate the weighted model.

First, you install the packages:

1 2 3 4 5 6 7 8 9 10 11 |
#Install lavaan install.packages("lavaan", dependencies=TRUE) library(lavaan) #install lavaan.survey install.packages("lavaan.survey") library(lavaan.survey) #Install survey-package install.packages("survey") library(survey) |

**Generate the survey-design object**

After the packages and the data are loaded, a svydesign-object is generated from our data. It´s not a suprise, that with „id=~ID“ the column „ID“ in the dataframe will be used as id-variable. With „weights= ~weights_trunc“ the column which holds the survey-weights is defined and with „data=data“ the dataframe is chosen.

1 2 3 4 5 6 7 8 9 10 |
library("survey") #load survey package data<- read.csv(file = "data.csv", header=T, sep=",") #read data #if necessary - recode missing value "9" to NA df[df== 9] <- NA #generate survey-design object svy.df<-svydesign(id=~ID, weights=~weight_trunc, data=data) |

**Specifying the model**

I´ll use a simple structural equation model with two latent variables, measured by three and two indicator-variables. The exogenous latent variable „latent_a“ is measured by x1-x3, the endogenous latent variable „latent_b“ is measured by y1-y2. The variable „latent_b“ is regressed on (predicted by) „latent_a“.

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 |
library(lavaan) model_1 <- '# measurement model latent_a =~ F09_a + F09_b + F09_c latent_b =~ F12_a + F12_b # regressions latent_b ~ latent_a ' lavaan.fit <- sem(model_1, data=data, estimator="MLR", # robust fit / when you have missing data missing = "ml", #fiml for missing data mimic="Mplus") #you can run the model (unweighted) at this point and inspect it summary(lavaan.fit,fit.measures=TRUE, standardized=TRUE) |

Normally, i would use MLM as estimator to get robust estimates (robust against non-normality of the endogenous variable), but in this case i chose MLR, because FIML is not available with MLM.

FIML (Full Information Maximum Likelihood algorithm- defined with missing=“ml“) is regarded as equally efficiant to multiple imputation in handling item-nonresponse. But, it can be a good idea to do multiple imputation anyway, because bootstrapping the standard errors is only available with ML-estimator. On the other Hand, it´s an advantage that with FIML it´s not necessary to explicitly model missingess, because FIML uses the already specified SEM.

When using the lavaan.survey-package, you can´t use fiml (yet). You have to do a multiple imputation for your data, if you have missings, and instead of MLR lavan.survey uses MLM as default.

**Fitting the model**

When the model is fitted with *lavaan.survey*, the covariance-matrix will be estimated using the *svyvar-object* generated by the survey-package . The *lavaan *model uses this weighted covariance-matrix with the MLM-estimator to fit the model. MLM is not compatible with missing=“fiml“, so if your data has missings you have to do multiple imputation first and pass your imputed dataframes as a list to the svydesign-package so it becomes a svy.design-object which can be used as data in lavaan.survey. The resulting parameters, fit indices and statistics will be adjusted for the sampling design. Also, if MLM is used, the chi-square (likelihood-ratio) test-statistic will be transformed to a Satorra-Bentler corrected chi-square. [This information stems from the lavaan.survey documentation]. In *lavaan*, you can choose the form of your output. Because i worked a lot with MPLUS, i prefer the MPLUS-Output.

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 |
library(lavaan.survey) #Fit the model using weighted data (by passing the survey-design object we generated above) survey.fit <- lavaan.survey(lavaan.fit, survey.design, estimator="ML") #inspect output summary(survey.fit, fit.measures=TRUE, standardized=TRUE, rsquare=TRUE) # if you´re interested in descriptive statistics # you can access the missing data patterns inspect(fit, 'patterns') # and the coverage of the covariance matrix (like in MPLUS) inspect(fit, 'coverage') |

** Results **

I wouldn´t have expected that using weights in a SEM-analysis with lavaan is so easy to accomplish.

Here are the fit-indices of the weighted SEM.

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 |
lavaan (0.5-17) converged normally after 24 iterations Number of observations 577 Estimator ML Minimum Function Test Statistic 11.664 Degrees of freedom 4 P-value (Chi-square) 0.020 Model test baseline model: Minimum Function Test Statistic 955.394 Degrees of freedom 10 P-value 0.000 User model versus baseline model: Comparative Fit Index (CFI) 0.992 Tucker-Lewis Index (TLI) 0.980 Loglikelihood and Information Criteria: Loglikelihood user model (H0) -3675.100 Loglikelihood unrestricted model (H1) -3669.268 Number of free parameters 16 Akaike (AIC) 7382.200 Bayesian (BIC) 7451.926 Sample-size adjusted Bayesian (BIC) 7401.132 Root Mean Square Error of Approximation: RMSEA 0.058 90 Percent Confidence Interval 0.021 0.097 P-value RMSEA <= 0.05 0.314 Standardized Root Mean Square Residual: SRMR 0.022 |

…and so on. I don´t show the whole results.

It´s common to show the parameter-estimates in a path-diagram. In my next blogging-session i´ll demonstrate how to draw path diagrams of a lavaan-model with SEMPLOT (Project Homepage).

Tania

Hi,

Thank you for this, it is very useful. However, I am getting the following error when fitting the weighted model:

„lavaan::duplicationMatrix‘ is deprecated.

Use ‚lav_matrix_duplication‘ instead.“

Do you know how to tackle this?

Thanks!

Niels

Hi Tania,

you don´t have to worry about this warning-message. There was a lavaan-update that changed the name of some functions and Daniel Oberski hasn´t adapted his lavaan.survey-package, yet. I asked him once on twitter and he told me, that it doesn´t affect your analysis. Just ignore the message, i´m sure it will be gone with the next update of the lavaan.survey-package. Best wishes, Niels

Tania

Thank you Niels!

Tania

Hi Niels,

I have another issue, and I thought maybe you could help me. I ran my model not using weights and I got very good fit and significant coefficients which made a lot of theoretical sense. Then, I ran my model using weights and the results changed completely (horrible fit, many coefficients not significant). I cannot understand why I have these many changes, any idea? Is lavaan.survey only intended for some type of weights? could it be that my weights are not suitable or appropriate? I am grateful for any suggestions or ideas! Thanks!

Niels

Hi Tania,

i´m afraid without more information about your data and your weighting-variable (like strata or sth else), it´s hard to guess where it goes wrong. My first idea would be, that you have missing values. The unweighted analysis used FIML and in the weighted analysis FIML is not (!) available, so lavaan.survey used listwise deletion instead. This information loss can lead to very different data. Is N different between both models? My second idea would be, that you haven´t created your svydesign-object adequately for your data. Perhaps you should visit the lavaan google-group and post your problem there. I´ll have a look on it, too. ==> https://groups.google.com/forum/#!forum/lavaan

Tania

Thank you Niels. The weights I am using, which were not calculated by me, but from the Agency that developed the survey, are calculated in four stages: selection of county, selection of area, selection of home and selection of subject. I don’t think that the problem are missing cases since I ran the non weighted model with the same cases as in the weighted model (since I only have a 6% of missing data I decided not to impute data). I think that the second option could be the case, where I did not specify the weight correctly. I copy it here in case you see anything wrong:

svy.df<-svydesign(id=~quest,

weights=~peso_kid,

data=kids_m)

survey.fit17d <- lavaan.survey(model17d.fit,

svy.df,

estimator = "MLM")

Is there something else I should specify? I should mention that my exogenous variables are nominal and ordinal, and all my exogenous variables are continuous but non-normal (reason for which I used MLM). I will post my problem also at the lavaan group.

Thank you very much!!

Niels

It seems, you only apply individual weights and omit the strata (stages) right now. The data should contain at least one variable for the strata, so you can define in it with „strata=variable“ in the svydesign-call. Perhaps, you´ll finde some answers for your special case here:

http://faculty.washington.edu/tlumley/old-survey/index.html and

http://www.ats.ucla.edu/stat/r/faq/svy_r_scpsu.htm

Or try the r-help mailing-list: https://stat.ethz.ch/mailman/listinfo/r-help

Best wishes

Niels

Tania

Thank you very much Niels for all your help!

Best,

Niels

For anybody reading this, who has a similar problem: It turned out, the weight could be specified correctly with the „probs“-argument in the survey-call.