Methods: A random sample of 751 North Carolina social workers completed an anonymous paper-and-pencil survey. The sample was predominantly female (83.2%), married (65.8%), and white (87.7%), and 55.5% held a CCSW/LCSW. The survey included the 20-item CES-D; only three cases had partial missing data on these items. Overall, 18.6% reported a “significant level of psychological distress” (CES-D score of 16 or higher): 24.6% of non-licensed social workers and 13.9% of licensed social workers. Cronbach's alpha was .91 overall, .91 for non-licensed social workers, and .89 for licensed social workers. Using Mplus 5.0, an IRT model was fit to the 20 CES-D items with a grouping variable for licensure status. The model was used to create Item Characteristic Curves (ICCs) and Item Information Curves (IICs) for each item in each licensure group, while constraining the latent means and variances to be equal across the two groups.
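The reliability figures above are Cronbach's alpha, which can be computed directly from an item-response matrix. A minimal sketch in Python; the response matrix below is invented for illustration and is not the study's data:

```python
import numpy as np

def cronbachs_alpha(responses: np.ndarray) -> float:
    """Cronbach's alpha for an (n_respondents, k_items) response matrix."""
    k = responses.shape[1]
    item_vars = responses.var(axis=0, ddof=1)      # per-item sample variances
    total_var = responses.sum(axis=1).var(ddof=1)  # variance of total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Hypothetical CES-D-style 0-3 responses for 6 respondents on 4 items.
demo = np.array([
    [0, 1, 0, 1],
    [2, 2, 3, 2],
    [1, 1, 1, 0],
    [3, 2, 3, 3],
    [0, 0, 1, 0],
    [2, 3, 2, 2],
])
print(round(cronbachs_alpha(demo), 3))
```

Because the toy items are strongly intercorrelated, alpha comes out high, mirroring the .89-.91 values reported for the actual 20-item scale.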
Results: Although most of the 20 items showed no item bias between the non-licensed and licensed groups (12 with no signs of bias and four with minimal signs), four showed significant response differences. The ICCs for items 2, 4, and 15 show that non-licensed respondents had a higher probability of endorsing higher levels of depressive symptoms than licensed respondents with similar total CES-D scores; item 10 shows the reverse. Likewise, the IICs for items 2, 4, and 15 show higher levels of information for the non-licensed group than for the licensed group, and item 10 again shows the reverse. These differing levels of information between groups strongly indicate that the items are biased in their ability to measure depressive symptoms appropriately. Although the item content/wording suggests no group differences, clearly something beyond respondents' level of depressive symptoms is influencing how they respond to these items.
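The ICC/IIC comparison can be sketched with a dichotomous two-parameter logistic (2PL) model: the ICC gives the probability of endorsing an item at a latent distress level theta, and the item information is a² P(1 - P). The CES-D items are actually polytomous, and the parameters below are invented to illustrate the DIF pattern described above, not the study's estimates:

```python
import math

def icc(theta: float, a: float, b: float) -> float:
    """2PL item characteristic curve: endorsement probability at theta."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def iic(theta: float, a: float, b: float) -> float:
    """2PL item information at theta: a^2 * P * (1 - P)."""
    p = icc(theta, a, b)
    return a * a * p * (1.0 - p)

# Hypothetical (discrimination a, difficulty b) for one DIF item: the
# non-licensed group gets a steeper, easier item, so at the same theta its
# endorsement probability and information exceed the licensed group's.
params = {"non_licensed": (1.8, 0.2), "licensed": (1.1, 0.8)}

theta = 0.5  # same underlying distress level for both groups
for group, (a, b) in params.items():
    print(f"{group}: P={icc(theta, a, b):.3f}, I={iic(theta, a, b):.3f}")
```

When the curves differ between groups at matched theta, as here, responses depend on group membership as well as on the latent trait, which is the signature of a biased item.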
Implications: Given that 20% of the items in this widely used, standardized measure are biased even in this benign comparison of licensed and non-licensed social workers, the likelihood of biased items in less well-developed measures and/or between more disparate groups is high. Social work researchers should embrace new strategies for improving measures, particularly given the sensitive nature of much of the data and the extremely diverse samples of interest. To avoid confounding influences in measurement, IRT and differential item functioning (DIF) analysis could prove critical to the improvement of social work research.