A Non-Randomized Efficacy Trial Replication of a Teaching Practice Intervention Using Propensity Score Weighting
A prior randomized controlled trial (RCT) demonstrated significant math achievement effects for CareerStart in middle and high school. This RCT was an efficacy trial conducted under idealized conditions, including participants from a single district and a high level of involvement from researchers. Given these positive findings, an effectiveness trial should be conducted under more realistic conditions. However, important questions remain about the role of factors unique to the RCT district that are not easily replicable elsewhere (e.g., support from the superintendent of the district), and about whether CareerStart should continue to be offered in language arts, given the lack of findings in reading achievement.
Methods. Thirteen schools outside of the efficacy trial implemented CareerStart after the RCT. Adding these to the seven waitlisted control schools from the RCT provided 20 non-randomly assigned schools that could answer questions about scalability to more heterogeneous populations and the effect of CareerStart on reading achievement. Because assignment was non-random, I used propensity score analysis (PSA) methods with inverse probability weighting to model an interrupted time series of achievement from third to eighth grade, comparing students in CareerStart schools to middle school students statewide. The continued implementation of CareerStart in the original RCT treatment schools during this period offered a unique test of the credibility of the PSA. Modeling the PSA on these original treatment schools and demonstrating equivalence between the PSA and RCT effects for these schools provided a stronger case for the credibility of the PSA than the typical balance test. Quantitative and qualitative data supported the assumption that implementations were equivalent across the two study periods.
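The weighting step can be illustrated with a minimal Python sketch. It is not the model used in this study: it assumes propensity scores have already been estimated, it uses average-treatment-effect style weights (this abstract does not state which weighting variant was applied), and it replaces the interrupted time series with a simple weighted difference in means. The function names `ipw_weights` and `ipw_effect` are hypothetical.

```python
# Illustrative sketch only: inverse probability weights from propensity
# scores that are assumed to have been estimated elsewhere.

def ipw_weights(treated, scores):
    """Return 1/e for treated units and 1/(1 - e) for comparison units."""
    weights = []
    for t, e in zip(treated, scores):
        if not 0.0 < e < 1.0:
            raise ValueError("propensity scores must lie strictly between 0 and 1")
        weights.append(1.0 / e if t else 1.0 / (1.0 - e))
    return weights


def weighted_mean(values, weights):
    """Weighted average of values."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def ipw_effect(treated, scores, outcomes):
    """Weighted difference in mean outcomes, treated minus comparison.

    A stand-in for the outcome model: the study itself fit an
    interrupted time series, which this sketch does not attempt.
    """
    w = ipw_weights(treated, scores)
    t_idx = [i for i, t in enumerate(treated) if t]
    c_idx = [i for i, t in enumerate(treated) if not t]
    mean_t = weighted_mean([outcomes[i] for i in t_idx], [w[i] for i in t_idx])
    mean_c = weighted_mean([outcomes[i] for i in c_idx], [w[i] for i in c_idx])
    return mean_t - mean_c
```

Units that are unlikely to be in the group they were observed in receive larger weights, which is how the weighted comparison group is made to resemble the treated group on the modeled characteristics.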
Results. The PSA improved balance on observable characteristics, with only four of twenty-four characteristics remaining significantly different after PSA. CareerStart was shown to have a significant and positive effect on reading achievement but no effect on math achievement, and the effect on reading was larger outside the RCT district. The credibility test suggested that the math findings could not be trusted. In contrast, the reading findings were shown to be credible.
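One common way to summarize balance after weighting is the weighted standardized mean difference, sketched below in Python. This is a swapped-in diagnostic for illustration: the balance results reported here are based on significance tests, and the function name `weighted_smd` and the pooled-variance convention are assumptions rather than details of this study.

```python
import math

# Illustrative sketch only: weighted standardized mean difference (SMD)
# for one covariate, given weights assumed to have been computed already.

def weighted_mean(values, weights):
    """Weighted average of values."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def weighted_var(values, weights):
    """Weighted variance around the weighted mean (no small-sample correction)."""
    m = weighted_mean(values, weights)
    return sum(w * (v - m) ** 2 for v, w in zip(values, weights)) / sum(weights)


def weighted_smd(x, treated, weights):
    """Difference in weighted group means divided by a pooled weighted SD."""
    xt = [v for v, t in zip(x, treated) if t]
    wt = [w for w, t in zip(weights, treated) if t]
    xc = [v for v, t in zip(x, treated) if not t]
    wc = [w for w, t in zip(weights, treated) if not t]
    pooled_sd = math.sqrt((weighted_var(xt, wt) + weighted_var(xc, wc)) / 2.0)
    if pooled_sd == 0.0:
        return 0.0  # covariate is constant within both groups
    return (weighted_mean(xt, wt) - weighted_mean(xc, wc)) / pooled_sd
```

Values near zero indicate that the weighted groups are similar on that characteristic; any threshold for acceptable balance is a convention that would need to be chosen and justified separately.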
Conclusion and Implications. The finding that a single PSA model was credible for one outcome but not for another lent further support to the idea that PSA is a study-specific rather than a generalizable method for non-random designs. This study demonstrated the value of a credibility test based on having a non-random implementation of an efficacious program in the neighborhood of an efficacy RCT, suggesting that this may be a generalizable strategy for researchers to examine the scalability of efficacious programs. The findings also supported the continued study, implementation, and testing of CareerStart as an educational policy that can improve the academic achievement of students at risk of school failure.