Methods: We discuss three issues in this workshop. (1) Odds ratios versus predicted probabilities - although odds ratios are very commonly used to interpret logit and similar models, they are rarely sufficient for understanding a model's results. Logistic regression is inherently nonlinear: the relation between a predictor and the predicted probability (the logistic cumulative distribution function) is approximately linear only when the probability lies between roughly 0.2 and 0.8. Outside this range, reading an odds ratio as if it implied a constant change in probability is misleading. "We strongly prefer methods of interpretation that are based on predicted probabilities" (Long & Freese, 2014, Regression Models for Categorical Dependent Variables Using Stata, p. 227). (2) Wald test versus likelihood ratio (LR) test - significance tests based on the Wald test do not always agree with those based on the LR test; as such, researchers must employ both methods to draw conclusions about statistically significant predictors (Guo, 2013, Maximum Likelihood Estimator: the Untold Stories, Caveats, and Tips for Application). (3) Single- versus multiple-parameter tests - to address important research questions, a test of a single parameter is often insufficient, whereas a test involving multiple parameters, based on so-called linear contrasts, is highly recommended (Hosmer et al., 2008, Applied Survival Analysis: Regression Modeling of Time-to-Event Data).
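To illustrate issue (1), the following minimal Python sketch applies one fixed odds ratio to several baseline probabilities and reports the resulting change in predicted probability. The odds ratio of 2 and the baseline probabilities are hypothetical values chosen only for illustration; they are not taken from any of the studies discussed in this workshop.

```python
import math

def inv_logit(z):
    """Map a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical treatment coefficient on the log-odds scale (odds ratio = 2).
beta_treat = math.log(2.0)

# Hypothetical baseline (untreated) probabilities, inside and outside 0.2-0.8.
baselines = [0.02, 0.20, 0.50, 0.80, 0.98]

changes = []
for p0 in baselines:
    logit0 = math.log(p0 / (1.0 - p0))    # baseline log-odds
    p1 = inv_logit(logit0 + beta_treat)   # probability after doubling the odds
    changes.append(p1 - p0)
    print(f"baseline={p0:.2f}  treated={p1:.3f}  change={p1 - p0:+.3f}")
```

The odds ratio is identical in every row, yet the change in probability is largest near a baseline of 0.5 and shrinks toward zero as the baseline approaches 0 or 1, which is why predicted probabilities convey information that the odds ratio alone does not.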
Results: (1) A study evaluating the effectiveness of a social emotional learning (SEL) intervention program showed that results based on predicted probabilities overcame the limitations of odds ratios and clearly revealed that children in the SEL group had higher probabilities of "getting better" than children in the control group. (2) A study of the determinants of the timing of adopting nonpharmaceutical mitigation interventions against the COVID pandemic in the United States confirmed that minority and vulnerable populations suffered most severely from the pandemic, an important finding supported by both the Wald and LR tests. (3) A study testing research hypotheses about the adverse impacts of welfare reform on the hazard rates of reunification for children placed in foster care indicated that multi-parameter tests produced stronger results than single-parameter tests alone.
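Agreement between the Wald and LR tests, as in result (2), is not guaranteed. The following minimal Python sketch shows how the two statistics can diverge in the simplest one-parameter case, a test of a single binomial proportion against a null value. The counts and the null value are hypothetical and unrelated to the studies above; a regression application would compute both statistics from fitted models instead.

```python
import math

def chi2_sf_1df(x):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))

# Hypothetical data: k successes in n trials, tested against H0: p = p_null.
n, k, p_null = 10, 1, 0.35
p_hat = k / n  # maximum likelihood estimate of p

# Wald statistic: squared distance from the null, scaled by the variance
# estimated at the MLE.
wald = (p_hat - p_null) ** 2 / (p_hat * (1.0 - p_hat) / n)

# LR statistic: twice the log-likelihood difference between the MLE and the null.
lr = 2.0 * (k * math.log(p_hat / p_null)
            + (n - k) * math.log((1.0 - p_hat) / (1.0 - p_null)))

print(f"Wald = {wald:.3f}, p = {chi2_sf_1df(wald):.4f}")
print(f"LR   = {lr:.3f}, p = {chi2_sf_1df(lr):.4f}")
```

With these made-up counts the Wald statistic exceeds the 5% critical value of 3.841 while the LR statistic does not, so the two tests lead to different conclusions about the same parameter.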
Conclusion and Implications: Whenever possible, researchers running nonlinear models should exercise caution and take remedial measures to ensure that findings are robust and that the statistical analysis is rigorous.