Methods: We obtained data from 606,579 adults (≥18 years) who completed the National Health Interview Survey (NHIS) between 1997 and 2016. The NHIS is a cross-sectional, in-person household interview survey with a multistage probability sample design, conducted annually by the National Center for Health Statistics of the Centers for Disease Control and Prevention. The survey has an annual response rate of 60.8% to 72.5%. We used a supervised machine learning method, conditional inference trees (CTree), to predict hypertension from psychological distress, gender, and six additional factors identified in previous studies, including age and employment status. A 75% sample of the data was used as the training set and a 25% sample as the test set. Variable importance was calculated as the mean decrease in node impurity, a generic method for evaluating variable importance.
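The split and the importance measure described above can be illustrated with a minimal sketch. It is not the analysis code used in the study: the records are toy values rather than NHIS data, the helper names (`train_test_split`, `gini`, `impurity_decrease`) are hypothetical, and Gini impurity is an assumption, since the abstract does not state which impurity measure was used. The sketch evaluates a single split, whereas a mean decrease would average this quantity over all nodes that split on a given variable.

```python
import random
from collections import Counter


def train_test_split(rows, train_frac=0.75, seed=0):
    """Shuffle the rows and split them into training and test sets."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]


def gini(labels):
    """Gini impurity of a list of class labels (assumed impurity measure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def impurity_decrease(rows, feature, threshold, target="hypertension"):
    """Weighted drop in Gini impurity from splitting on feature <= threshold."""
    parent = [r[target] for r in rows]
    left = [r[target] for r in rows if r[feature] <= threshold]
    right = [r[target] for r in rows if r[feature] > threshold]
    n = len(parent)
    weighted_children = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(parent) - weighted_children


# Toy records (hypothetical, not NHIS data).
rows = [
    {"age": 25, "distress": 1.0, "hypertension": 0},
    {"age": 30, "distress": 2.0, "hypertension": 0},
    {"age": 35, "distress": 1.0, "hypertension": 0},
    {"age": 40, "distress": 2.0, "hypertension": 0},
    {"age": 55, "distress": 1.0, "hypertension": 1},
    {"age": 60, "distress": 2.0, "hypertension": 1},
    {"age": 65, "distress": 1.0, "hypertension": 1},
    {"age": 70, "distress": 2.0, "hypertension": 1},
]

train, test = train_test_split(rows)                       # 75% / 25% split
age_gain = impurity_decrease(rows, "age", 50)              # split separates the classes
distress_gain = impurity_decrease(rows, "distress", 1.5)   # split is uninformative
```

In this toy example the age split removes all impurity while the distress split removes none, so age would rank above distress. That is the kind of comparison a variable importance ranking is built on.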
Results: On average, there were 288,884 adults in each annual survey from 1997 to 2016. The mean age was 48 (SD = 18), and 43% of respondents were male. In total, 78% of adults were white, 14% were black, and the rest identified as Asian or other. Overall, 61% of adults reported that they were employed both in the previous week and in the past 12 months; 28% reported that they were not employed in the past 12 months; and the rest reported that they were not employed in the previous week but had been employed in the past 12 months. The mean psychological distress score for all adults, regardless of employment status, was 1.421 (SD = 0.67). In total, 28% of adults reported symptoms of hypertension. CTree was used to build a prediction model for each survey year. On average, CTree had an accuracy rate of 76% (maximum accuracy = 93% in 2004; minimum accuracy = 72% in 2016). Variables were ranked by their importance index. Not surprisingly, age was consistently ranked as the most important variable in predicting hypertension. Surprisingly, gender was ranked as the second most important variable from 1997 to 2001 but as the least important variable from 2002 to 2016. In contrast, employment status was the least important variable from 1997 to 2001 but became the second most important variable from 2002 to 2016. Psychological distress was consistently ranked as the third most important variable in the model.
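As a rough illustration of how per-year accuracy figures like those above can be summarized, the following sketch computes test-set accuracy for each year and then reports the mean, the best year, and the worst year. The labels, predictions, year keys, and function names are all hypothetical placeholders. They are not the study's results.

```python
def accuracy(y_true, y_pred):
    """Share of test cases whose predicted label matches the observed label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def summarize(accuracy_by_year):
    """Mean accuracy plus the years with the highest and lowest accuracy."""
    years = list(accuracy_by_year)
    return {
        "mean": sum(accuracy_by_year.values()) / len(years),
        "best_year": max(years, key=accuracy_by_year.get),
        "worst_year": min(years, key=accuracy_by_year.get),
    }


# Hypothetical (observed, predicted) hypertension labels for three survey years.
toy_results = {
    "year_a": ([1, 0, 1, 1], [1, 0, 0, 1]),
    "year_b": ([0, 0, 1, 1], [0, 0, 1, 1]),
    "year_c": ([1, 1, 0, 0], [0, 0, 0, 0]),
}

accuracy_by_year = {
    year: accuracy(observed, predicted)
    for year, (observed, predicted) in toy_results.items()
}
summary = summarize(accuracy_by_year)
```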
Conclusions/Implications: Our results revealed a dramatic reversal in the importance of gender and employment status in predicting hypertension. Future social work researchers may test this reversal as a hypothesis and consider using data science methods to explore other possible risk factors for hypertension. Data science methods may be useful for generating hypotheses and may provide new insights into the mixed findings of current studies.