Schedule:
Sunday, January 18, 2009: 10:45 AM
MPH 3 (New Orleans Marriott)
Background and Purpose: Item response theory (IRT) is a modern statistical approach that can improve measurement in practice and research applications. For example, IRT analyses enable identification of items exhibiting differential item functioning (DIF). Responses to items demonstrating DIF are influenced not only by the target latent construct but also by extraneous variables such as sociodemographic characteristics. Items with DIF therefore do not perform consistently across disparate groups, raising concerns about inequities in measurement. This paper illustrates the use of IRT analyses to identify DIF. Known disparities in identification rates of externalizing behavior problems, especially by child sex, race, and socioeconomic status (SES), could be linked to DIF, but no previous investigations have addressed this issue. Subscales of two instruments targeting externalizing behavior problems in children were analyzed to determine whether items performed consistently among sociodemographically diverse preschool-aged children.

Methods: Primary caregivers (N = 900) of preschool-aged children were recruited from four diverse pediatric primary care clinics. Participants completed a sociodemographic questionnaire, the Pediatric Symptom Checklist-17 (PSC-17), and the Behavior Problems Index (BPI). Classical psychometric and IRT analyses were conducted with the 18 items comprising the instruments' combined externalizing subscales. Samejima's (1969) graded response model was fit, and two DIF-detection methods were employed: an IRT-based likelihood model approach and an ordinal logistic regression approach (illustrative formulations of both are sketched after the abstract). Items were tested for DIF by child sex, race (controlling for SES), and SES (controlling for race). Effect sizes were assessed after adjusting item parameter estimates for identified DIF.

Results: Participants' children were distributed across the sociodemographic groups of interest: 53% were male, 50% white, and 43% low SES. Classical psychometric analyses confirmed that the PSC-17 and BPI performed similarly to previous reports in the literature. The assumptions underlying IRT were met, and the graded response model's fit was acceptable. Using a stringent significance level corrected for multiple comparisons (p < .0027), combined results from the two DIF-detection methods revealed 8 items with statistically significant DIF: 2 items exhibited DIF by child sex, 3 by race, 2 by SES, and 1 by both race and SES. DIF was identified primarily in the item difficulty (threshold) parameters. Item-level DIF effect sizes ranged from 0.14 to 0.72 standard deviations in magnitude.

Conclusions and Implications: Item-level measurement bias is a serious concern in scale development. For items exhibiting DIF, the relationship between item response and the level of the target latent construct is not consistent across groups, potentially producing disparities in measurement. This study illustrated the application of IRT analyses to identify DIF in the measurement of externalizing behavior problems among a diverse sample of preschool-aged children. Results indicated that 8 of the 18 items exhibited significant DIF by child sex, race, or SES, which could lead to false-positive or false-negative findings attributable to sociodemographic characteristics. Social work researchers and clinicians routinely use scales that have not been examined with modern measurement theory methods.
Investigations of DIF are particularly relevant to efforts to reduce health disparities and promote social justice, and DIF-detection analyses based upon IRT are uniquely suited to such endeavors.
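For reference, a minimal sketch of Samejima's graded response model in its standard form follows. The abstract does not report the exact parameterization used in the study, so the symbols here (a_i for item discrimination, b_ik for the item difficulty threshold parameters, theta_j for the latent externalizing trait) are the conventional ones rather than the study's own notation.

P^{*}_{ik}(\theta_j) = P(X_{ij} \ge k \mid \theta_j) = \frac{1}{1 + \exp\left[-a_i(\theta_j - b_{ik})\right]}

P(X_{ij} = k \mid \theta_j) = P^{*}_{ik}(\theta_j) - P^{*}_{i,k+1}(\theta_j)

Under this formulation, uniform DIF corresponds to group differences in the threshold parameters b_ik at the same level of theta (the pattern reported above), while non-uniform DIF corresponds to group differences in the discrimination parameters a_i.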
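The ordinal logistic regression approach to DIF detection is commonly specified as a sequence of nested cumulative-logit models. The abstract does not give the exact implementation used in the study, so the following is an illustrative sketch in which G is the grouping variable (e.g., child sex) and theta-hat is an estimate of the latent trait:

Model 1: \operatorname{logit} P(X_i \ge k) = \alpha_k + \beta_1 \hat{\theta}
Model 2: \operatorname{logit} P(X_i \ge k) = \alpha_k + \beta_1 \hat{\theta} + \beta_2 G
Model 3: \operatorname{logit} P(X_i \ge k) = \alpha_k + \beta_1 \hat{\theta} + \beta_2 G + \beta_3 (\hat{\theta} \times G)

Comparing Model 2 with Model 1 tests for uniform DIF (a group effect at a given trait level), and comparing Model 3 with Model 2 tests for non-uniform DIF (a group-by-trait interaction); in this framework, a multiple-comparison-corrected alpha such as the p < .0027 reported above would be applied to these model comparisons.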