Session: Cluster Analysis and Generalized Linear Models for Spatial Data in R: Free Software to Map Client Needs (Society for Social Work and Research 14th Annual Conference: Social Work Research: A WORLD OF POSSIBILITIES)


Cluster: Research Design and Measurement

Richard Smith, MFA, MSW, University of California, Berkeley and Julian C. C. Chow, PhD, School of Social Welfare
Thursday, January 14, 2010: 1:30 PM-3:15 PM
Seacliff C (Hyatt Regency)
The purpose of this workshop is to expose researchers to advanced spatial data analysis techniques in the R environment. Social work researchers often work with point data, such as clients in communities or organizations in metro areas, or with areal data, which often take the form of aggregate characteristics of a census tract or zip code (Freisthler et al., 2004). However, according to Tobler's law (1987; 1999; 2004), everything is related to everything else, but near objects are more related than distant ones. This spatial dependency violates the assumption of independent observations that underlies ordinary least squares regression.

For point data, cluster analysis and other point pattern techniques identify where objects are found together. Spatial data analysis has developed ways of modeling or controlling for these dependencies. One frequently used statistic is Moran's I, a measure of spatial autocorrelation for areal units. Spatial regression extends ordinary least squares (OLS) to control for spatial autocorrelation by using a connectivity matrix as a weight (Anselin et al., 2005). However, not all data structures lend themselves to OLS regression. Spatial filtering is an experimental technique that uses the eigenvectors of the spatial connectivity matrix to control for autocorrelation (Tiefelsdorf & Griffith, 2007). The selected eigenvectors may then be included as covariates in any generalized linear model, for example logistic regression for binary outcomes or Poisson or negative binomial regression for count data.
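To make the Moran's I statistic concrete, the following is a minimal sketch of the global statistic, I = (n / S0) * sum_ij w_ij (x_i - mean)(x_j - mean) / sum_i (x_i - mean)^2, where S0 is the sum of all weights. It is written in plain Python for illustration only and is not the workshop's R material; the function name `morans_i`, the four-tract connectivity matrix, and the data values are all hypothetical.

```python
def morans_i(values, weights):
    """Global Moran's I for values observed on n areal units.

    weights[i][j] is the spatial weight linking unit i to unit j
    (0 when the two units are not neighbors).
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]

    s0 = sum(sum(row) for row in weights)   # sum of all spatial weights
    ss = sum(d * d for d in dev)            # sum of squared deviations
    if s0 == 0 or ss == 0:
        raise ValueError("Moran's I is undefined without neighbors or variation")

    # Weighted cross-products of deviations between every pair of units.
    cross = sum(
        weights[i][j] * dev[i] * dev[j]
        for i in range(n)
        for j in range(n)
    )
    return (n / s0) * (cross / ss)


# Hypothetical example: four tracts in a row, each adjacent to the next.
W = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
smooth = [1.0, 2.0, 3.0, 4.0]        # similar values sit next to each other
alternating = [1.0, 4.0, 1.0, 4.0]   # neighbors are as different as possible

i_smooth = morans_i(smooth, W)
i_alternating = morans_i(alternating, W)
```

In this toy example the smoothly varying values give a positive statistic (neighbors resemble each other) and the alternating values give a negative one (neighbors differ), which is the pattern the statistic is designed to detect.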

R can manipulate both point and areal data (Bivand et al., 2008). The workshop will show how to load spatial data into R from a flat file created by a spreadsheet or any proprietary software package, or from the industry-standard ESRI shapefile format. Next, we will show an example of graphing and clustering points. Finally, we will demonstrate an example of spatial filtering with eigenvectors.
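The flat-file-to-clusters workflow can be illustrated with a small sketch. It is not the presenters' code and is written in plain Python rather than R. It assumes a hypothetical comma-separated file with columns `id`, `x`, and `y`, and it uses a simple distance-threshold (single-linkage) grouping as a stand-in for whichever clustering method the workshop demonstrates.

```python
import csv
import io
import math

# Hypothetical flat file of client locations, as a spreadsheet might export it.
FLAT_FILE = """id,x,y
a,0.0,0.0
b,0.5,0.2
c,0.4,0.9
d,10.0,10.0
e,10.3,9.8
"""


def read_points(text):
    """Parse rows of id,x,y into (id, x, y) tuples."""
    reader = csv.DictReader(io.StringIO(text))
    return [(row["id"], float(row["x"]), float(row["y"])) for row in reader]


def cluster_points(points, max_dist):
    """Group points into clusters by chaining together any two points
    that lie within max_dist of each other (single-linkage grouping)."""
    unvisited = set(range(len(points)))
    clusters = []
    while unvisited:
        seed = min(unvisited)          # deterministic starting point
        unvisited.remove(seed)
        stack = [seed]
        members = []
        while stack:
            i = stack.pop()
            _, xi, yi = points[i]
            members.append(points[i][0])
            # Collect still-unassigned points close enough to point i.
            near = [
                j for j in unvisited
                if math.hypot(points[j][1] - xi, points[j][2] - yi) <= max_dist
            ]
            for j in near:
                unvisited.remove(j)
                stack.append(j)
        clusters.append(sorted(members))
    return clusters


points = read_points(FLAT_FILE)
clusters = cluster_points(points, max_dist=1.5)
```

With the sample coordinates above, the three points near the origin end up in one group and the two points near (10, 10) in another. The threshold `max_dist` is an arbitrary choice here and would depend on the units and scale of real data.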

The R project is a GNU (GNU's Not Unix) implementation of the S system developed by Chambers et al. at Bell Laboratories for statistical and graphical analysis of data. GNU is the operating system developed by Richard Stallman of MIT and the Free Software Foundation. GNU software is copylefted under the GNU General Public License (GPL), so it may be used, modified and redistributed provided that all derivative products are also released under the GPL. Accordingly, R is distributed for free, upgrades are free, and if a function does not exist, the user may create it.

The workshop will be primarily a demonstration, but with advance registration, the instructors will provide materials so that attendees can install R and the workshop data on their laptops before attending and follow along during the session. The workshop assumes prior knowledge of R. Workshop participants who do not know R will be given pre-conference homework.
