Harnessing the Power of Item Response Theory in Social Work Science and Practice

Lambert, Michael Canute

Background and Significance

Biopsychosocial assessment and research require tools that can be administered economically while providing precise measurement. Yet social work scientists and practitioners predominantly employ assessment tools whose psychometric parameters are estimated using methodology based on classical test theory (CTT). CTT-based assessment limits measurement precision. By contrast, item response theory (IRT) increases precision. Representing a set of theoretical models and procedures that are important for social work science and practice, IRT produces shorter yet more accurate research and assessment tools.

Methods

All IRT models imply one or more latent variables (i.e., traits, scales, factors, constructs) labeled theta (θ), measured by observable items (e.g., sleeping too much and overeating, used to measure depression). IRT estimates the probability that respondents will respond affirmatively to an item, given their level of functioning on the biopsychosocial construct the item measures. Widely used IRT models include the 1-parameter (Rasch or 1PL), 2-parameter (2PL), and 3-parameter (3PL) models. The 1PL model assumes that all items measuring a construct discriminate equally well across the trait levels measured. Hence, only location parameters (labeled “b”), which reflect the level of functioning each item measures, are estimated. The 2PL allows discrimination to vary across items. Thus, a discrimination parameter (a) is also estimated for each item. Used primarily in achievement testing, the 3PL includes both the a and b parameters as well as a c (guessing) parameter, which estimates the probability that a respondent with lower ability would respond accurately to an item that measures higher ability.
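The three models above can be illustrated with a single response function. The sketch below is only a minimal illustration, assuming the standard logistic form of the 3PL model without a scaling constant; the abstract itself does not specify a formula, and the function name `irt_probability` is hypothetical. Setting c to 0 gives the 2PL, and additionally fixing a at a common value gives the 1PL.

```python
import math


def irt_probability(theta, b, a=1.0, c=0.0):
    """Probability of an affirmative (or correct) response to one item.

    theta: respondent's level on the latent trait
    b:     item location parameter
    a:     item discrimination parameter (held equal across items in the 1PL)
    c:     guessing parameter (lower asymptote; 0 in the 1PL and 2PL)

    Assumption: standard logistic form, no scaling constant.
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


# 1PL: only the location parameter b varies across items.
p_1pl = irt_probability(theta=0.5, b=0.0)

# 2PL: discrimination a also varies across items.
p_2pl = irt_probability(theta=0.5, b=0.0, a=2.0)

# 3PL: adds a nonzero lower asymptote c.
p_3pl = irt_probability(theta=0.5, b=0.0, a=2.0, c=0.2)
```

When a respondent's trait level equals the item's location (theta equal to b) and c is 0, the function returns 0.5, which is the usual interpretation of the location parameter under this form.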

Measurement models guided by IRT afford scientifically rigorous assessment procedures for social work science and practice. Through analysis of differential item functioning (DIF), IRT can assist researchers in addressing issues critical to the social work profession (e.g., social justice). DIF analyses identify scale items that possess measurement bias for oppressed groups, and IRT item linking and equating can reduce such bias. Furthermore, IRT permits development of precalibrated item banks. IRT computerized adaptive testing (CAT) software uses parameter estimates from banked items in its algorithms. Unlike traditional testing, where all items on a standardized assessment procedure are administered to all respondents, CAT administers only items that match the level of functioning of the individual being assessed. It produces shorter and more precise assessments, tailored to each respondent, yet permits valid score comparisons across respondents who receive different sets of items. CAT reduces item exposure and practice effects in longitudinal research, as respondents can receive different item sets at each data collection point. Constantly available to researchers, practitioners, clients, and research participants, web-based IRT CAT removes shipment and printing costs, assisting social work scientists and practitioners in adhering to the professional value of being judicious stewards of economic resources.
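The way CAT matches items to a respondent can be sketched in code. The example below is a minimal illustration, not a description of any particular CAT software: it assumes a 2PL item bank and a maximum-information selection rule, one common approach that the abstract does not itself name. The item names, parameter values, and function names are all hypothetical.

```python
import math


def p_2pl(theta, a, b):
    """2PL probability of an affirmative response (logistic form, assumed)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def item_information(theta, a, b):
    """Fisher information of a 2PL item at trait level theta."""
    p = p_2pl(theta, a, b)
    return a * a * p * (1.0 - p)


def select_next_item(theta_hat, bank, administered):
    """Return the unadministered item that is most informative at theta_hat.

    bank maps item name -> (a, b) from a precalibrated item bank.
    Returns None when every item has already been administered.
    """
    candidates = [name for name in bank if name not in administered]
    if not candidates:
        return None
    return max(candidates, key=lambda name: item_information(theta_hat, *bank[name]))


# Hypothetical precalibrated bank: (discrimination a, location b) per item.
bank = {
    "item_low": (1.2, -1.5),
    "item_mid": (1.2, 0.0),
    "item_high": (1.2, 1.5),
}

first_item = select_next_item(0.0, bank, administered=set())
second_item = select_next_item(1.4, bank, administered={"item_mid"})
```

Because the items in this toy bank share the same discrimination, the rule simply picks the item whose location is closest to the current trait estimate. A working CAT would also re-estimate theta after each response and apply a stopping rule, both of which are omitted here.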

Implications

Assessment procedures often form the foundation on which social work science and practice are scaffolded. Measurement error can be one of the weakest links in such endeavors. IRT and CAT can reduce measurement artifacts. Their economy and their ability to reduce both measurement error in science and practice and test bias affecting disadvantaged groups clearly fit ethically appropriate social work science and practice.

Society for Social Work and Research 18th Annual Conference: Research for Social Change: Addressing Local and Global Challenges

January 15 - 19, 2014