Abstract: Using IRT to Explore Items on the CESD (Society for Social Work and Research 14th Annual Conference: Social Work Research: A WORLD OF POSSIBILITIES)

12301 Using IRT to Explore Items on the CESD

Sunday, January 17, 2010: 11:45 AM
Seacliff D (Hyatt Regency)
Carl F. Siebert, MBA, MS, Assistant Director of Research and Evaluation, Rutgers University, Piscataway, NJ
Darcy Clay Siebert, PhD, Associate Professor, Rutgers University, New Brunswick, NJ
Akihito Kamata, PhD, Associate Professor and Chair, Florida State University, Tallahassee, FL
Purpose: Standardized instruments are frequently used in social work research, the assumption being that validated measures, developed using Classical Test Theory, improve the rigor of a study. However, improved software packages and computing power now give researchers additional tools to measure latent constructs more accurately. Item Response Theory (IRT), and Differential Item Functioning (DIF) analyses in particular, can help researchers identify the specific items in a measure that may be biased for members of particular groups. This study describes the usefulness of IRT and DIF and illustrates the process by examining social workers' responses to the CES-D, a widely used, standardized measure of depressive symptoms.

Methods: A random sample of 751 North Carolina social workers completed an anonymous paper-and-pencil survey. The sample was predominantly female (83.2%), married (65.8%), and white (87.7%), and 55.5% held a CCSW/LCSW. The survey included the 20-item CES-D, on which only three cases had partial missing data. Overall, 18.6% experienced a "significant level of psychological distress" (CES-D score of 16 or higher): 24.6% of non-licensed social workers and 13.9% of licensed social workers. Reliability (Cronbach's alpha) was .91 overall, .91 for non-licensed social workers, and .89 for licensed social workers. Using Mplus 5.0, an IRT model was constructed for the 20 CES-D items along with a grouping variable for licensure status. The model was used to create both Item Characteristic Curves (ICCs) and Item Information Curves (IICs) for each item by licensure group, while constraining the latent trait mean and variance to be equal across the two groups.
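The ICCs described above can be sketched with a two-parameter logistic (2PL) model. This is a simplified illustration, not the study's Mplus code: the CES-D's four ordered response categories would ordinarily call for a graded response model, and the item parameters below are hypothetical.

```python
import math

def icc(theta: float, a: float, b: float) -> float:
    """2PL item characteristic curve: probability of endorsing an
    item, given latent trait theta, discrimination a, and
    difficulty b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# Hypothetical parameters for one item in each licensure group.
# In a DIF analysis, group differences in a or b mean that
# respondents with the same theta have different endorsement
# probabilities, flagging the item as functioning differently.
p_nonlicensed = icc(0.0, a=1.8, b=-0.3)
p_licensed = icc(0.0, a=1.2, b=0.4)
```

By construction, the curve passes through probability .5 at theta = b and rises more steeply as the discrimination parameter a increases.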

Results: Although the majority of the 20 items showed no item bias between the non-licensed and licensed groups (12 with no signs of bias and four with minimal signs), four items showed significant response differences. The ICCs for items 2, 4, and 15 show that non-licensed respondents had a higher probability of endorsing higher levels of depressive symptoms than licensed respondents with similar total CES-D scores; item 10 shows the reverse. Likewise, the IICs for items 2, 4, and 15 show higher levels of information for the non-licensed group than for the licensed group, and item 10 shows the reverse. These differing levels of information between groups are a strong indication that these items are biased in their ability to measure depressive symptoms appropriately. Although nothing in the items' content or wording suggests group differences, something beyond respondents' level of depressive symptoms is clearly influencing how they respond to these items.
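The IIC comparison above can be illustrated with the 2PL Fisher information function, I(theta) = a²P(theta)(1 − P(theta)). Again, this is a hedged sketch with hypothetical parameters, not the study's estimates: it shows how a group difference in discrimination produces the unequal information curves reported for the flagged items.

```python
import math

def icc(theta: float, a: float, b: float) -> float:
    """2PL item characteristic curve."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def item_information(theta: float, a: float, b: float) -> float:
    """Fisher information a 2PL item contributes at theta:
    I(theta) = a^2 * P * (1 - P). Information peaks at
    theta = b, and higher discrimination a yields more of it."""
    p = icc(theta, a, b)
    return a * a * p * (1.0 - p)

# Hypothetical group-specific parameters for one item. Unequal
# information across groups, as reported for items 2, 4, 10, and
# 15, indicates the item measures the trait with different
# precision for each group.
thetas = [t / 10 for t in range(-40, 41)]
info_nonlicensed = [item_information(t, 1.8, -0.3) for t in thetas]
info_licensed = [item_information(t, 1.2, -0.3) for t in thetas]
```

Plotting the two lists against `thetas` would reproduce the kind of between-group IIC gap the Results section describes.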

Implications: Given that 20% of the items in this widely used, standardized measure are biased even in this benign comparison of licensed and non-licensed social workers, the likelihood of biased items in less well-developed measures, or between more disparate groups, is high. Social work researchers should embrace new strategies for improving measures, particularly given the sensitive nature of much of the data and the extremely diverse samples of interest. To avoid confounding influences in measurement, IRT and DIF could prove critical to the improvement of social work research.