Before data can be analyzed, it must be in an appropriate format. It is estimated that data management constitutes 80% of the quality of the final data analysis (Dasu & Johnson, 2003). Incorrect or inconsistent data can lead to incorrect conclusions, negating the benefits of data-driven decisions (Hellerstein, 2008). R provides an array of built-in functions and packages for data management.
While analyzing data, it is often necessary to create new variables, collapse or recode categories, compute scales, and so on. R’s data management functions have significant advantages over those of proprietary statistical software packages. One appeal of R is that data can be manipulated with just a few lines of syntax. For example, the read.table() function is quite flexible: different types of text data, such as comma- or tab-delimited files, can be imported into R by changing a single option (Auerbach & Zeitlin, 2015). R syntax is also more logical and follows programming principles more strictly than the syntax of those packages. Furthermore, there is a very lively user community to rely on for direction.
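As a minimal sketch of the "single option" point, the example below reads the same small table twice with read.table(), once comma-delimited and once tab-delimited, changing only the sep argument. The sample values and the object names (csv_text, tsv_text, from_csv, from_tsv) are hypothetical and are supplied inline through the text argument so that no external file is needed.

```r
# Hypothetical sample data, passed inline via the `text` argument
csv_text <- "id,group,score\n1,A,10\n2,B,12\n3,A,15"
tsv_text <- "id\tgroup\tscore\n1\tA\t10\n2\tB\t12\n3\tA\t15"

# Only the `sep` option differs between the two calls
from_csv <- read.table(text = csv_text, header = TRUE, sep = ",")
from_tsv <- read.table(text = tsv_text, header = TRUE, sep = "\t")

# Both calls should yield the same three-row data frame
print(from_csv)
print(identical(from_csv, from_tsv))
```

With a real file, the text argument would be replaced by a file path as the first argument; the sep option works the same way.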
Through demonstration, this workshop will show how to carry out the most commonly used data management tasks in R, using both built-in functions and add-on packages.
The paper will begin with a discussion of how to input data directly into an R data frame and then how to use Excel or another spreadsheet to quickly and effectively record data (Auerbach & Zeitlin, 2014). Since Excel is the most commonly used spreadsheet program, entering data into Excel will be demonstrated. Packages for importing data from Excel, Stata, SAS, and SPSS, such as foreign, Hmisc, and xlsx, will be discussed.
We will demonstrate a number of data management functions, including concatenating data sets, merging data sets, adding variables to an existing R data set, sorting a data set, subsetting a data set, aggregating a data set, computing new variables, deleting variables, recoding variables, and renaming variables.
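The tasks listed above can be sketched with base R alone. The example below is illustrative only: the data frames (scores_a, scores_b, groups) and all variable names are hypothetical, and the functions shown (rbind, merge, ifelse, order, subset, aggregate) are one reasonable choice among several, not necessarily the ones used in the workshop.

```r
# Hypothetical example data; all names and values are illustrative only
scores_a <- data.frame(id = c(3, 1, 2), score = c(70, 85, 90))
scores_b <- data.frame(id = c(4, 5), score = c(60, 75))
groups   <- data.frame(id = 1:5, group = c("x", "y", "x", "y", "x"))

# Concatenate (stack) two data sets that share the same columns
scores <- rbind(scores_a, scores_b)

# Merge with a second data set on the shared key `id`
dat <- merge(scores, groups, by = "id")

# Compute a new variable and add it to the data set
dat$pct <- dat$score / 100

# Recode a numeric variable into two categories
dat$level <- ifelse(dat$score >= 80, "high", "low")

# Rename a variable
names(dat)[names(dat) == "group"] <- "cohort"

# Sort by score, highest first
dat <- dat[order(-dat$score), ]

# Subset rows by a condition and keep selected columns
high <- subset(dat, score >= 80, select = c(id, score, cohort))

# Aggregate: mean score within each cohort
means <- aggregate(score ~ cohort, data = dat, FUN = mean)

# Delete a variable
dat$pct <- NULL

print(dat)
print(high)
print(means)
```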
Through demonstration, participants will see the benefits and ease of using R for data management. Although not necessary, attendees are encouraged to bring their laptops to work through the examples as they are shown. Participants will be provided with a Dropbox link so they can access all the resources mentioned during this presentation.