Methods: We used 14 years of data (2006-2020) from the Health and Retirement Study, a large, representative sample of older adults in the United States funded by the National Institute on Aging and the U.S. Social Security Administration. The total sample consisted of 15,385 older adults, of whom 76% identified as White, 14% as Black, and 9% as Hispanic. Guided by minority stress theory, we examined risk and protective factors from a socio-environmental point of view: sociodemographic characteristics (age, gender, marital status), economic factors (education, income, assets), a multimorbidity-weighted health index (diabetes, hypertension, cancer, lung and heart diseases, stroke, psychiatric problems, arthritis, obesity), discrimination (major lifetime discrimination, everyday discrimination, and attribution), and perceived neighborhood conditions (social cohesion, physical order). We used growth curve models and machine learning (ML) to examine the association between socio-environmental factors and cognitive functioning.
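As a minimal sketch of what a growth curve model of this kind might look like, a two-level linear growth model for the cognitive score of respondent i at wave t could be written as below. The abstract does not state the exact specification, so every symbol here is an illustrative assumption: y_{ti} is the cognitive score, time_{ti} is the time since baseline, and x_i stands for the socio-environmental covariates (some of which may in practice be time-varying).

```latex
\begin{aligned}
\text{Level 1:}\quad y_{ti} &= \pi_{0i} + \pi_{1i}\,\mathrm{time}_{ti} + \varepsilon_{ti},
  & \varepsilon_{ti} &\sim N(0, \sigma^{2}) \\
\text{Level 2:}\quad \pi_{0i} &= \beta_{00} + \boldsymbol{\beta}_{0}^{\top}\mathbf{x}_{i} + u_{0i} \\
\pi_{1i} &= \beta_{10} + \boldsymbol{\beta}_{1}^{\top}\mathbf{x}_{i} + u_{1i},
  & (u_{0i}, u_{1i})^{\top} &\sim N(\mathbf{0}, \mathbf{T})
\end{aligned}
```

Under these assumptions, the coefficients in \boldsymbol{\beta}_{0} describe differences in baseline cognitive functioning and those in \boldsymbol{\beta}_{1} describe differences in the rate of change, which is where group inequities and protective factors such as education would appear.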
Results: Preliminary analyses show both convergent and divergent results across the two methods. For instance, large cognitive health inequities persisted under both statistical approaches: Black and Hispanic respondents had lower scores than White respondents, and education consistently operated as a protective factor for cognitive functioning across race and ethnicity. However, there were also important differences between the two methods in how environmental factors (neighborhood characteristics), major lifetime discrimination, and everyday discrimination were related to cognitive functioning. Gender-by-race/ethnicity interactions also showed divergent patterns of association with cognitive health across the two statistical approaches.
Conclusions and Implications: ML is often referred to as a statistical method for “lazy scientists.” Indeed, large datasets and ML enable scientists to investigate phenomena from an atheoretical, “black-box” perspective. Yet doing so raises ethical concerns about how to model and operationalize complex social phenomena. Both ML and growth curve modeling also offer important methodological tools that can adjust for episodic events occurring at the national level (e.g., the Great Recession) and can reduce bias caused by concept drift and changes in the environment. When used judiciously and carefully, ML has the potential to both confirm and interrogate theory. Findings can not only spark scientific imagination but also have important implications for developing theory and a rigorous knowledge base to inform social policies and practices.