Bridging Disciplinary Boundaries (January 11 - 14, 2007)
Saturday, January 13, 2007: 8:00 AM-9:45 AM
Seacliff C (Hyatt Regency San Francisco)
Strategies and Techniques for the Analysis of Secondary Databases
Speaker/Presenter: Christopher G. Hudson, PhD, Salem State College
Abstract Text: This workshop is designed to introduce recent developments in strategies, methods, and techniques for the preparation and analysis of large-scale databases, particularly those available from the U.S. Census and medical databases. It will focus on the logic that informs key design decisions, such as selection of the unit of analysis, and on techniques for the preparation and transformation of data, for example, the creation of various composite variables and the assessment of their reliability. Examples will be drawn from the presenter's peer-reviewed studies of homelessness and psychiatric care published over the past 20+ years, using SPSS for data preparation, supplemented by such programs as Excel, LISREL, and Maptitude. The following outlines the major topics:

(1) Secondary Analysis as a Research Strategy: Types of data; advantages and disadvantages; comments on costs, PC equipment, and feasible database sizes; the importance of theory; and typical steps in data analysis.

(2) Selected Methods and Techniques:

(i) Determining the unit of analysis: This section will begin with a discussion of the potential for modeling variation in phenomena of interest across multiple jurisdictions. It will consider the pros and cons of three strategies: data aggregation, spreading, and multilevel modeling, or some combination of these. A special example will be the problem of merging data from sources with seemingly incompatible units, and the possibility of finding a common denominator; a technique for the weighted aggregation of data will be presented (one possible approach is sketched in the first example following this abstract). Another critical topic is the question of studying variation in service episodes versus persons, and techniques for resolving this dilemma.

(ii) Weighting by population in census studies of multiple jurisdictions: When, why, and how would this be done?

(iii) The element of time: In databases organized by service episode, it is often necessary to use lag variables and to compute length of stay (LOS), time between episodes, and recidivism rates. How is this done? (See the second example below.)

(iv) Computation of composite variables, e.g.: median education, by computing grouped medians from census counts for categories of ordinal data; community SES, based on census data and sociometric studies; racial diversity, using the index of dispersion; and service access, i.e., distance between home and hospital, based on the latitudes and longitudes of ZIP code centroids. (The grouped-median, index-of-dispersion, and centroid-distance computations are sketched in the third through fifth examples below.)

(v) Assessment of reliability and validity. Key examples to be reviewed include: assessing diagnostic reliability using kappa (see the final example below); the use of multiple indicators and measures, e.g., census data and hospitalization rates as indicators of SMI prevalence; and the assessment of systematic error in data collection, e.g., the use of an enumerator survey and weather data in assessing error in the homeless S-Night census.

(3) Conclusion: Comments on major analytic procedures, e.g.: GIS and the mapping of bivariate correlations; the assessment of overall fit in Cox proportional hazards models; and several key considerations in the use of SEM, for example, assessing goodness of fit with large databases, alternatives for incorporating measurement reliabilities, and the advantages of using the extended LISREL model.

(4) Format: Didactic, with discussion, PowerPoint, and SPSS examples.
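Example 1. The abstract does not spell out the weighted-aggregation technique itself, and the workshop's own examples use SPSS; the following Python sketch shows one common approach under those constraints, namely population-weighted averaging of jurisdiction-level rates up to a larger common unit. All names and figures here are illustrative assumptions, not the presenter's data.

```python
# Hypothetical sketch: population-weighted aggregation of county-level
# rates up to a larger common unit (e.g., a hospital service area).
# All names and figures are illustrative.

def weighted_mean(values, weights):
    """Weighted mean: sum(w_i * x_i) / sum(w_i)."""
    total_w = sum(weights)
    if total_w == 0:
        raise ValueError("weights sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total_w

# Three counties rolled up into one service area, weighting each
# county's hospitalization rate by its population.
county_rates = [4.2, 6.8, 5.1]           # per 1,000 population
county_pops = [120_000, 45_000, 80_000]
print(weighted_mean(county_rates, county_pops))  # ~4.97 per 1,000
```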
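Example 2. For topic (2)(iii), a minimal sketch of how a lag variable (the previous discharge date per person) yields LOS, time between episodes, and a simple readmission flag from episode-level records. The 180-day recidivism window and all records are assumptions for illustration.

```python
# Hypothetical sketch: LOS, time between episodes, and a readmission
# (recidivism) flag computed from episode-level records. The 180-day
# window and all dates are illustrative.
from datetime import date

episodes = [  # (person_id, admit, discharge), sorted by person and admit
    (1, date(2005, 1, 10), date(2005, 1, 25)),
    (1, date(2005, 4, 2), date(2005, 4, 9)),
    (2, date(2005, 3, 1), date(2005, 3, 15)),
]

prev_discharge = {}  # lag variable: last discharge seen for each person
for pid, admit, discharge in episodes:
    los = (discharge - admit).days
    gap = (admit - prev_discharge[pid]).days if pid in prev_discharge else None
    readmit_180 = gap is not None and gap <= 180
    print(pid, los, gap, readmit_180)
    prev_discharge[pid] = discharge
```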
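Example 3. Topic (2)(iv) names grouped medians computed from census counts of ordinal categories. This sketch uses the standard interpolation formula, median = L + ((N/2 - CF) / f) * h, where L is the lower bound of the median interval, CF the cumulative count below it, f its count, and h its width; the education intervals and counts are illustrative.

```python
# Hypothetical sketch: a grouped median from census counts of an
# ordinal variable (e.g., years of education), via the standard
# interpolation formula median = L + ((N/2 - CF) / f) * h.

def grouped_median(categories):
    """categories: ordered list of (lower_bound, width, count)."""
    n = sum(count for _, _, count in categories)
    half = n / 2.0
    cum = 0
    for lower, width, count in categories:
        if cum + count >= half:  # median falls in this interval
            return lower + (half - cum) / count * width
        cum += count
    raise ValueError("empty distribution")

# Illustrative census-style counts: grouped years of schooling.
edu = [(0, 9, 120), (9, 3, 340), (12, 4, 510), (16, 4, 230)]
print(grouped_median(edu))  # ~13.1 years
```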
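Example 4. Topic (2)(iv) also names the index of dispersion (the index of qualitative variation) as a racial-diversity measure: D = k * (1 - sum(p_i^2)) / (k - 1), which is 0 when one group contains everyone and 1 when all k groups are equal in size. The group counts below are illustrative.

```python
# Hypothetical sketch: racial diversity via the index of dispersion,
# D = k * (1 - sum(p_i^2)) / (k - 1). Counts are illustrative.

def index_of_dispersion(counts):
    k = len(counts)
    n = sum(counts)
    if k < 2 or n == 0:
        raise ValueError("need at least two groups with members")
    sum_p2 = sum((c / n) ** 2 for c in counts)
    return k * (1 - sum_p2) / (k - 1)

print(index_of_dispersion([800, 150, 30, 20]))    # ~0.45, low diversity
print(index_of_dispersion([250, 250, 250, 250]))  # 1.0, maximum evenness
```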
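Example 5. The abstract measures service access as the distance between home and hospital ZIP code centroids (Maptitude is the named GIS tool). A common way to compute such a distance from centroid latitudes and longitudes is the haversine great-circle formula, sketched below with illustrative coordinates.

```python
# Hypothetical sketch: great-circle distance between a home and a
# hospital ZIP code centroid via the haversine formula. The two
# coordinate pairs are illustrative, not actual ZIP centroids.
from math import radians, sin, cos, asin, sqrt

def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two lat/lon points."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + \
        cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 3958.8 * asin(sqrt(a))  # Earth radius ~3958.8 miles

home = (42.52, -70.89)      # illustrative centroid near Salem, MA
hospital = (42.36, -71.06)  # illustrative centroid near Boston, MA
print(round(haversine_miles(*home, *hospital), 1), "miles")
```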
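Example 6. Topic (2)(v) names kappa for diagnostic reliability. This sketch computes Cohen's kappa for two raters, kappa = (p_o - p_e) / (1 - p_e), where p_o is observed agreement and p_e is the agreement expected by chance from each rater's marginal frequencies; the paired diagnoses are illustrative.

```python
# Hypothetical sketch: Cohen's kappa for agreement between two raters'
# diagnoses, kappa = (p_o - p_e) / (1 - p_e). Data are illustrative.
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    n = len(rater_a)
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    cats = set(freq_a) | set(freq_b)
    p_e = sum((freq_a[c] / n) * (freq_b[c] / n) for c in cats)
    return (p_o - p_e) / (1 - p_e)

a = ["schizophrenia", "mood", "mood", "other", "schizophrenia", "mood"]
b = ["schizophrenia", "mood", "other", "other", "schizophrenia", "mood"]
print(round(cohens_kappa(a, b), 3))  # 0.75 on these illustrative data
```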