Methods: I carry out two atheoretical but statistically sound ‘studies.’ Study 1 assesses random variables in multiple regression models, and Study 2 assesses random variables for Granger-causal relationships with U.S. industrial production. In Study 1 (n=1,000), I randomly generated 20 variables with common value ranges: a dependent variable Y (range 1-100); X1-X4 (range 1-100); X5-X14 (range 1-10); and X15-X19 (range 0-1). I first carried out a ‘rigorously’ controlled OLS regression of Y on X1-X19. Then, to avoid ‘overcontrolling, problems with available degrees of freedom, modeling irrelevant X-variables, and statistical artifacts,’ I evaluated two reduced models that retained only X-variables below the p<.50 and p<.20 significance thresholds, respectively. In Study 2, I used Federal Reserve data on U.S. industrial production, presented as a quarterly time series (n=368 quarters). To this dataset I added randomly generated variables X1-X10 (range 1-10). Using the combined dataset, I estimated vector autoregressive (VAR) models of time-lagged effects and used Wald tests to establish Granger causality (i.e., showing that X precedes Y, X predicts future values of Y, and bidirectionality is ruled out).
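A minimal Python sketch of the Study 1 procedure follows, assuming NumPy is available. It is not the author's code: the seed, the integer draws, treating X15-X19 as binary 0/1 indicators, and selecting the reduced models from full-model p-values are all assumptions, and the helper names (`fit_ols`, `t_pvalue`, `f_pvalue`, `betai`) are hypothetical. The p-values come from the regularized incomplete beta function, evaluated by continued fraction, so no statistics package is required. Because the reduced models are chosen using p-values from the same data they are then tested on, their overall F-tests tend to be biased toward significance.

```python
import math
import numpy as np


def _betacf(a, b, x, max_iter=1000, eps=3e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c, d = 1.0, 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Each iteration applies the even and then the odd partial numerator.
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:  # last factor ~1 means converged
            break
    return h


def betai(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return math.exp(log_front) * _betacf(a, b, x) / a
    return 1.0 - math.exp(log_front) * _betacf(b, a, 1.0 - x) / b


def t_pvalue(t, df):
    """Two-sided p-value for a t statistic with df degrees of freedom."""
    return betai(df / 2.0, 0.5, df / (df + t * t))


def f_pvalue(f, d1, d2):
    """Upper-tail p-value P(F > f) for an F(d1, d2) statistic."""
    return betai(d2 / 2.0, d1 / 2.0, d2 / (d2 + d1 * f))


def fit_ols(y, X):
    """OLS with intercept; returns (overall F-test p-value, slope p-values)."""
    n, k = X.shape
    A = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    df_resid = n - k - 1
    sse = float(resid @ resid)
    sst = float(((y - y.mean()) ** 2).sum())
    sigma2 = sse / df_resid
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(A.T @ A)))
    p_slopes = [t_pvalue(float(b / s), df_resid) for b, s in zip(beta[1:], se[1:])]
    f_stat = ((sst - sse) / k) / sigma2
    return f_pvalue(f_stat, k, df_resid), p_slopes


rng = np.random.default_rng(2024)  # arbitrary seed (assumption)
n = 1000
y = rng.integers(1, 101, n).astype(float)               # Y: 1-100
X = np.column_stack(
    [rng.integers(1, 101, n) for _ in range(4)]         # X1-X4: 1-100
    + [rng.integers(1, 11, n) for _ in range(10)]       # X5-X14: 1-10
    + [rng.integers(0, 2, n) for _ in range(5)]         # X15-X19: 0/1 (assumed binary)
).astype(float)
names = [f"X{j}" for j in range(1, 20)]

p_full, p_coef = fit_ols(y, X)
print(f"Full model (X1-X19): p(F) = {p_full:.3f}")
for cutoff in (0.50, 0.20):
    keep = [j for j, p in enumerate(p_coef) if p < cutoff]
    if not keep:
        print(f"p < {cutoff:.2f}: no predictors retained")
        continue
    p_red, p_red_coef = fit_ols(y, X[:, keep])
    kept = ", ".join(f"{names[j]} (p={p:.3f})" for j, p in zip(keep, p_red_coef))
    print(f"Reduced model, p < {cutoff:.2f}: p(F) = {p_red:.3f}; {kept}")
```

Rerunning with different seeds shows how often some random X-variable clears a conventional threshold and survives the reduction step.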
Results: In Study 1, the full model of X1-X19 was marginally significant (overall F-test, p=.12). We can have approximately 88 percent confidence that this suite of variables predicts Y better than the mean. X5 and X18 were statistically significant factors (p=.076 and p=.001, respectively), and X15 was marginally significant (p=.105). These relationships with Y held in bivariate regressions. Each of the reduced models was statistically significant (p=.049 and p=.0004, respectively), and X5 and X18 remained significant. In Study 2, I found that, among other significant relationships, time-lagged values of X10 significantly predicted industrial production (p=.03). However, industrial production did not significantly predict X10 (p=.22), and the role of other variables was at least partially accounted for, meeting and surpassing the Granger-causal criteria.
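The Granger logic behind the Study 2 result (lagged X10 predicting industrial production, plus the reverse check) can be sketched as a bivariate F-test comparing nested OLS regressions. This is a simplification, not the study's method: the study estimated multivariable VAR models with Wald tests, whereas this sketch tests one pair of series at a time, uses a hypothetical lag length of 4, and substitutes a simulated AR(1) series for the Federal Reserve data. For linear restrictions in OLS, this F statistic equals the Wald statistic divided by the number of restrictions. NumPy is assumed, and the helper names (`granger_test`, `f_pvalue`, `betai`) are hypothetical.

```python
import math
import numpy as np


def _betacf(a, b, x, max_iter=1000, eps=3e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c, d = 1.0, 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h


def betai(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return math.exp(log_front) * _betacf(a, b, x) / a
    return 1.0 - math.exp(log_front) * _betacf(b, a, 1.0 - x) / b


def f_pvalue(f, d1, d2):
    """Upper-tail p-value P(F > f) for an F(d1, d2) statistic."""
    return betai(d2 / 2.0, d1 / 2.0, d2 / (d2 + d1 * f))


def _ssr(y, A):
    """Sum of squared OLS residuals from regressing y on the columns of A."""
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return float(resid @ resid)


def granger_test(y, x, lags):
    """F-test of H0: lags 1..lags of x add nothing to predicting y beyond y's own lags."""
    T = len(y)
    target = y[lags:]
    # Column k-1 holds the series lagged k periods, aligned with target.
    own = np.column_stack([y[lags - k:T - k] for k in range(1, lags + 1)])
    other = np.column_stack([x[lags - k:T - k] for k in range(1, lags + 1)])
    const = np.ones(T - lags)
    restricted = np.column_stack([const, own])
    unrestricted = np.column_stack([const, own, other])
    df_resid = len(target) - unrestricted.shape[1]
    ssr_r, ssr_u = _ssr(target, restricted), _ssr(target, unrestricted)
    f = max(((ssr_r - ssr_u) / lags) / (ssr_u / df_resid), 0.0)  # guard rounding
    return f, f_pvalue(f, lags, df_resid)


# Placeholder for the Federal Reserve series: a simulated, stationary AR(1)
# series, NOT real industrial-production data.
rng = np.random.default_rng(7)  # arbitrary seed
T = 368
ip = np.zeros(T)
for t in range(1, T):
    ip[t] = 0.3 * ip[t - 1] + rng.normal()
x10 = rng.integers(1, 11, T).astype(float)  # random X10 in 1-10

lags = 4  # hypothetical lag length; the abstract does not report one
f_xy, p_xy = granger_test(ip, x10, lags)  # does X10 "Granger-cause" ip?
f_yx, p_yx = granger_test(x10, ip, lags)  # reverse direction
print(f"X10 -> ip: F = {f_xy:.2f}, p = {p_xy:.3f}")
print(f"ip -> X10: F = {f_yx:.2f}, p = {p_yx:.3f}")
```

Running the test in both directions mirrors the study's bidirectionality check. Because both series here are independent by construction, any p below .05 in either direction is a false positive.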
Conclusion/Implications: ‘Best fit’ models show that X5, X15, and X18 alone predict Y. X5 and X18 are independently significant predictors and thus ‘are good candidate targets for intervention’ aimed at improving Y outcomes. In modeling U.S. industrial production, we find that X10 is a Granger-causal factor, meaning that leveraging it could partially alleviate the U.S. unemployment burden. These results and implications are all statistically well grounded but clearly fallacious: the variables have no substantive meaning and are significantly associated (even ‘causally’) purely by chance. Even among random variables, where statistical significance can arise only by chance, statistically significant but spurious findings occur easily. These results highlight that careful, theory-driven social work research is vital to producing sound policy and practice recommendations.
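How easily spurious significance appears can be quantified with the standard family-wise error calculation. Assuming the 19 coefficient tests in Study 1 behave like independent tests of true null hypotheses (an approximation, since tests within one regression are not exactly independent), the chance that at least one reaches p < alpha is 1 - (1 - alpha)^19. The sketch below computes this and checks it by simulating Uniform(0, 1) null p-values; the function names are hypothetical.

```python
import random


def familywise_rate(k, alpha):
    """P(at least one of k independent true-null tests gives p < alpha)."""
    return 1.0 - (1.0 - alpha) ** k


def simulate_familywise(k, alpha, reps=20_000, seed=1):
    """Monte Carlo check of familywise_rate using Uniform(0, 1) null p-values."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        # Under a true null, a continuous test statistic's p-value is Uniform(0, 1).
        if min(rng.random() for _ in range(k)) < alpha:
            hits += 1
    return hits / reps


K = 19  # number of random predictors in Study 1
for alpha in (0.05, 0.10):
    print(f"alpha = {alpha:.2f}: analytic {familywise_rate(K, alpha):.3f}, "
          f"simulated {simulate_familywise(K, alpha):.3f}")
```

The analytic rate is roughly 0.62 at alpha = .05 and roughly 0.86 at alpha = .10 (the threshold implied by treating p = .076 as significant), so one or two ‘significant’ random predictors is the expected outcome rather than a surprise.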