Abstract: How Numbers Misbehave: Multivariate Regressions and Structural Equation Modeling (Society for Social Work and Research 14th Annual Conference: Social Work Research: A WORLD OF POSSIBILITIES)

11419 How Numbers Misbehave: Multivariate Regressions and Structural Equation Modeling

Saturday, January 16, 2010: 10:30 AM
Seacliff B (Hyatt Regency)
* noted as presenting author
Judith C. Baer, PhD, Rutgers University, Associate Professor, New Brunswick, NJ
Mi Sung Kim, MSW, Rutgers University, Doctoral Student, New Brunswick, NJ
Research models provide the empirical basis for understanding the organization of social phenomena in the world. For this and many other reasons it is important that models approximate the real world, and that our statistical output be as precise as possible.

The purpose of this paper is to compare and contrast two analytical strategies: the commonly used OLS regression and structural equation modeling (SEM). Multiple regression models are common in the published literature, but biases in multiple regression estimates are either not well recognized or ignored in many areas of social work research. For example, OLS multivariate regression compounds measurement error and, in particular, increases Type I error. Because SEM paths are not additive but are estimated simultaneously, and because error terms are modeled in the analyses, problems with Type I error and measurement error are reduced. Moreover, SEM has greater flexibility and can handle non-normal data, multicollinearity, and categorical data. To illustrate differences between the two strategies, we present outcome differences for the same model estimated with each procedure.
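
One way measurement error can raise Type I error in a multiple regression can be sketched with a small simulation. This is an illustration only, not the analysis reported here: the variable names, effect sizes, sample size, and reliability values are all hypothetical. A predictor `x` truly affects the outcome, a correlated covariate `z` has no direct effect, and only an error-laden version of `x` is observed. Under these assumptions, the regression tends to attribute part of the effect of `x` to `z`.

```python
import numpy as np

def ols_t_stats(X, y):
    """OLS coefficients and t statistics; X must already contain an intercept column."""
    n, k = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ y
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - k)          # residual variance estimate
    se = np.sqrt(sigma2 * np.diag(xtx_inv))   # coefficient standard errors
    return beta, beta / se

def false_positive_rate(reliability, n=500, reps=1000, rho=0.6, seed=0):
    """Share of simulated samples in which z looks significant although its true effect is zero.

    reliability = var(true x) / var(observed x); 1.0 means x is measured without error.
    All parameter values are hypothetical choices for illustration.
    """
    rng = np.random.default_rng(seed)
    err_sd = np.sqrt((1.0 - reliability) / reliability)
    hits = 0
    for _ in range(reps):
        x = rng.standard_normal(n)
        z = rho * x + np.sqrt(1 - rho**2) * rng.standard_normal(n)  # correlated with x
        y = 0.5 * x + rng.standard_normal(n)                        # z has no direct effect
        x_obs = x + err_sd * rng.standard_normal(n)                 # x observed with error
        X = np.column_stack([np.ones(n), x_obs, z])
        _, t = ols_t_stats(X, y)
        hits += abs(t[2]) > 1.96   # approximate two-sided 5% test on the z coefficient
    return hits / reps

if __name__ == "__main__":
    for rel in (1.0, 0.8, 0.6):
        print(f"reliability={rel:.1f}  false positive rate for z={false_positive_rate(rel):.3f}")
```

With perfectly measured `x`, the rejection rate for `z` should stay near the nominal 5%; as reliability drops, it climbs well above that level. A latent-variable model that represents the measurement error explicitly is one way to address this, which is the motivation given above for SEM.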


This was a secondary analysis of the Fragile Families and Child Wellbeing Study data. Weights were used to make the sample nationally representative of both unwed and wed births (n = 2,973; N = 863,162; mothers' age in years: M = 29.78, jackknife SE = .16; children's age in months: M = 34.45, jackknife SE = .11). Analyses consisted of SEM with Mplus 5.0 and multiple regressions with Stata. The probability of the mother's generalized anxiety disorder (GAD) was measured by the Composite International Diagnostic Interview - Short Form (CIDI-SF; Kessler et al., 1998); types of parenting involvement by activities representing the mother's engagement with the child; poverty by indicators such as ability to pay utilities and rent and receipt of free food; the child's anxiety via maternal reports of the child's internalizing symptomatology; and parental stress by difficulties in parenting (e.g., feeling worn out). Preliminary analyses were conducted and some items were excluded. Both methods used weights and jackknife procedures.
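
The abstract does not say which jackknife variant was used. As a rough sketch of the general idea only, the simplest delete-one jackknife standard error for a weighted mean can be written as follows. The function names and the simulated ages and weights are hypothetical and are not drawn from the study data.

```python
import numpy as np

def weighted_mean(values, weights):
    """Weighted mean of values."""
    return np.sum(weights * values) / np.sum(weights)

def jackknife_se(values, weights):
    """Delete-one jackknife standard error of the weighted mean."""
    n = len(values)
    idx = np.arange(n)
    # Recompute the estimate n times, leaving out one observation each time.
    replicates = np.array([
        weighted_mean(values[idx != i], weights[idx != i]) for i in range(n)
    ])
    return np.sqrt((n - 1) / n * np.sum((replicates - replicates.mean()) ** 2))

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    ages = rng.normal(30.0, 6.0, size=200)       # simulated ages, not study data
    weights = rng.uniform(0.5, 2.0, size=200)    # simulated sampling weights
    print(weighted_mean(ages, weights), jackknife_se(ages, weights))
```

With equal weights this reduces to the familiar standard error of the mean, s / sqrt(n). Complex survey designs usually rely on replicate weights or cluster-level deletion instead of deleting single observations.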


Fit indices for the SEM model were good: WRMR = 1.209, RMSEA = .000. A comparison of the findings showed that some paths that were significant in the regression model were no longer significant in the SEM model. Moreover, the mediating path between parenting stress, negative parenting, and child anxiety and depression, which was significant in the regression model, was no longer significant. The probit coefficient for the relationship between poverty and generalized anxiety disorder was .886 (p < .01) in the regression model and .027 (p < .01) in the SEM model, reflecting over-estimation in the regression probit.
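
For readers unfamiliar with RMSEA, the common textbook formula helps explain how a value of .000 can arise. The sketch below assumes that formula; software may differ in detail (for example, using n instead of n - 1, or applying adjustments under weighted estimators), and the chi-square and degrees-of-freedom inputs are hypothetical, since the abstract does not report them.

```python
import math

def rmsea(chi_square, df, n):
    """Textbook RMSEA: sqrt(max(chi_square - df, 0) / (df * (n - 1)))."""
    return math.sqrt(max(chi_square - df, 0.0) / (df * (n - 1)))

if __name__ == "__main__":
    # Hypothetical inputs for illustration only.
    print(rmsea(10.0, 12, 2973))   # chi-square below df, so the index is truncated at 0
    print(rmsea(24.0, 12, 2973))   # chi-square above df gives a positive value
```

Under this formula, RMSEA is exactly zero whenever the model chi-square does not exceed its degrees of freedom.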


The two models showed similar findings in general; for example, both supported a relation between poverty and mother's GAD. However, the SEM strategy, which included confirmatory factor analysis, resulted in a more parsimonious model than the multiple regression strategy. The larger number of significant paths in the multiple regression model suggests that this strategy increased the probability of Type I error. Furthermore, the multiple regression model could not account for measurement error, resulting in biased estimates. These findings point to important considerations for planning analyses and indicate the need for caution in interpreting multivariate regression models.