R: Case Study 1

This case study is part of a collection of exercises involving the statistics software package R.

This case study deals with data from experiments described in Kürschner et al. 2008, which concern the intelligibility of Swedish words among Danes. The data for this experiment can be found in the file /storage3/data/erikt/r/kuerschner-et-al.dat

Kürschner, Sebastian, Charlotte Gooskens and Renée van Bezooijen (2008). Linguistic determinants of the intelligibility of Swedish words among Danes. International Journal of Humanities and Arts Computing, 83-100.

Loading data in R

We need to load the experiment data into R. Inspecting the data file with a browser or file manager reveals that it contains a table: each line is a row, and the column items on a line are separated by tab characters. This is a format that R understands: we can use the command read.table() to read such files. Use help(read.table) to get more information about the command.

read.table() can take several options. We note that the first line of the data file specifies the names of the columns. This should be specified in the reading command to prevent this line from being read as a data row: add header=TRUE between the parentheses:

> table = read.table('/storage3/data/erikt/r/kuerschner-et-al.dat',
  header=TRUE)
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  : 
line 15 did not have 20 elements

When we read the file, we get an error message. It is not easy to find out the cause of this. It seems that the program cannot handle column elements that end with a slash. A likely contributing factor is that read.table() by default splits each line on any white space, so a cell that is empty or contains a space changes the number of elements found on that line. After some experiments, we find that the error disappears when we specify the separator character, the token in the file between the elements of different columns (a TAB character), in the reading command:

> table = read.table('/storage3/data/erikt/r/kuerschner-et-al.dat',
  header=TRUE,sep="\t")
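
The effect of the separator can be reproduced on a small scale. The sketch below uses a made-up three-line table (not the real data file) in which one cell contains a space. The function count.fields() reports how many elements R sees on each line under each splitting rule. The file contents and variable names are assumptions for illustration only.

```r
# Toy tab-separated data (made up for illustration, not from the real file).
# The last row has a cell that contains a space.
lines <- c("word\tscore\tnote",
           "hus\t10\tok",
           "bil\t20\tnot ok")
path <- tempfile(fileext = ".dat")
writeLines(lines, path)

# Elements per line when splitting on any white space (the default) ...
fields_default <- count.fields(path)
# ... and when only the TAB character counts as a separator.
fields_tab <- count.fields(path, sep = "\t")

# With the default separator the lines disagree on the number of
# elements, so reading fails; try() captures the error instead of stopping.
attempt <- try(read.table(path, header = TRUE), silent = TRUE)

# With sep="\t" every line has the same number of elements.
toy <- read.table(path, header = TRUE, sep = "\t")
```

Comparing fields_default with fields_tab shows which lines are affected. The same check with count.fields() can be run on a real file to find the lines that trigger the error.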

When we examine the table in R, we see that some numbers have been written with a comma as decimal separator. This should also be specified in the reading command:

> table = read.table('/storage3/data/erikt/r/kuerschner-et-al.dat',
  header=TRUE,sep="\t",dec=",")

Specifying the comma as decimal separator enables R to identify the floating point numbers in the data file.
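
A minimal sketch of what the dec option changes, using a made-up two-row table rather than the real data. The column names and values are assumptions for illustration.

```r
# Toy data (made up): scores written with a decimal comma.
txt <- "word\tscore\nhus\t0,75\nbil\t1,5\n"

# Without dec=",", the score column is not recognised as numbers.
as_text <- read.table(text = txt, header = TRUE, sep = "\t")
# With dec=",", R converts the values to floating point numbers.
as_num <- read.table(text = txt, header = TRUE, sep = "\t", dec = ",")

class(as_text$score)  # "character" (or "factor" in R versions before 4.0.0)
class(as_num$score)   # "numeric"
mean(as_num$score)    # 1.125
```

Arithmetic such as mean() only gives a meaningful result for the numeric version of the column, which is why the option matters before any analysis.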

Only now can we start processing the data with R. Note that it is important to load the file into R in the correct way. For other data files you may need to use a different set of options for the command read.table().
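
As an illustration of a different set of options, the sketch below reads a made-up semicolon-separated table with decimal commas. It is read once with explicit read.table() options and once with read.csv2(), a standard wrapper that uses sep=";" and dec="," as defaults. The data and variable names are assumptions for illustration.

```r
# Toy semicolon-separated data with decimal commas (made up).
txt <- "word;score\nhus;0,75\nbil;1,5\n"

# Spelling out the options for read.table() ...
a <- read.table(text = txt, header = TRUE, sep = ";", dec = ",")
# ... or using read.csv2(), which has these values as defaults.
b <- read.csv2(text = txt)

# Both calls yield the same numeric column.
identical(a$score, b$score)
```

For tab-separated files there is a similar wrapper, read.delim(), but note that its default decimal separator is the period.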

Processing the data

Read the paper and try to use R to verify a claim of the authors.
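
One possible starting point, sketched under assumptions: if a claim concerns the relation between two numeric variables, cor.test() computes a correlation coefficient together with a significance test. The data below are made up, and the column names distance and score are hypothetical; the real file uses its own column names, which names(table) will list.

```r
# Made-up stand-in data; 'distance' and 'score' are hypothetical names
# and these values do not come from the experiment.
toy <- data.frame(distance = c(0.1, 0.2, 0.3, 0.4, 0.5, 0.6),
                  score    = c(95, 90, 70, 75, 40, 35))

# Pearson correlation with a significance test.
result <- cor.test(toy$distance, toy$score)

result$estimate  # correlation coefficient r
result$p.value   # p-value of the test
```

With the real table, the two arguments would be replaced by the columns that the claim in the paper refers to.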


Last update: March 17, 2011. erikt(at)xs4all.nl