This is an all too common, and unfortunate, occurrence. So long as Excel dominates, this is the reality we will have to live with. Ultimately, it's the user who needs to do advanced or in-depth analytics who loses.
Joseph Rickert, a Data Scientist and R Community Manager at Revolution Analytics, recently created a ‘meta’ book about R that he suggests might be called “An R Based Introduction to Probability and Statistics with Applications”. ‘Meta’ in this context means the book is virtual: it is made up entirely of references to its content in various other books, papers and articles.
Below is an excerpt from Joe's blog post about this ‘meta’ book, followed by a table listing the book's chapters and topics along with the material (as clickable links) that Joe identified to fill out the contents of his hypothetical book.
On the Revolution Analytics blog, Joseph Rickert wrote:
[…] it occurred to me that there is a tremendous amount of high quality material on a wide range of topics in the Contributed Documentation page (of CRAN) that would make a perfect introduction to all sorts of people coming to R. […] from among this treasure cache (and a few other online sources), I have assembled an R “meta” book in the following table that might be called: An R Based Introduction to Probability and Statistics with Applications.
[…]
The content column lists the topics that I think ought to be included in a good introductory probability and statistics textbook. With a little searching, you will be able to find a discussion of each topic in the document listed to its right. Obviously, there is a lot of overlap among the documents listed, since most of them are substantial works that cover much more than the few topics that I have listed.
Content | Document | Author
--- | --- | ---
1. Basic Probability and Statistics | Introduction to Probability and Statistics Using R | G. Jay Kerns
2. Fitting Probability Distributions | Fitting Distributions with R | Vito Ricci
3. Regression (inference, diagnostics, stepwise regression, ridge regression); ANOVA | Practical Regression and Anova using R | Julian J. Faraway
4. Experimental Design | An R companion to Experimental Design | Vikneswaran
5. Survival Analysis | Cox Proportional-Hazards Regression for Survival Data | John Fox
6. Generalized Linear Models | Analysis of epidemiological data using R and Epicalc | Virasakdi Chongsuvivatwong
7. Bootstrap; Hierarchical Models; Nonlinear Mixed Effects | icebreakeR | Andrew Robinson
8. Time Series | Time Series Analysis with R | McLeod, Yu and Mahdi
9. Bayesian Statistics; Gibbs Sampler | Statistics Using R with Biological Examples | Kim Seefeld and Ernst Linder
10. Machine Learning: Decision Trees and Random Forest; Clustering; Outlier Detection; Time Series Analysis and Mining; Text Mining; Social Network Analysis | R and Data Mining: Examples and Case Studies | Yanchang Zhao
11. Bioinformatics: Cluster Analysis; Classification Methods; Markov Models; Microarray Analysis | Applied Statistics for Bioinformatics using R | Wim P. Krijnen
12. Forecasting | Forecasting: principles and practice | Hyndman and Athanasopoulos
13. Structural Equation Models | Structural Equation Models | John Fox
14. Credit Scoring | Guide to Credit Scoring in R | Dhruv Sharma
R is a programming language almost exclusively associated with processing numbers. Unlike Python, Ruby, Java or C, R rarely comes to mind when a task involves text data. This is a shame. R can process character strings and has fairly robust support for regular expressions; combined with its inherent statistical capabilities, this makes it a very powerful tool for text readability analysis, semantic analysis and many other operations not usually associated with R.
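As a rough illustration of R's text-handling capabilities, the following sketch uses only base R (strsplit(), gsub(), tolower(), nchar()) to compute a few crude readability-style statistics for a short sample text. The helper name text_stats and the sample text are hypothetical, and the measures are simplified stand-ins rather than any established readability formula.

```r
# Hypothetical helper: crude readability-style statistics using base R only.
text_stats <- function(text) {
  # Split into sentences on runs of ., ! or ? (plus any following whitespace);
  # a match at the very end of the string does not produce an empty element
  sentences <- unlist(strsplit(trimws(text), "[.!?]+\\s*"))
  sentences <- sentences[nzchar(sentences)]
  # Remove punctuation, lower-case, then split on whitespace to get words
  words <- unlist(strsplit(tolower(gsub("[[:punct:]]", "", text)), "\\s+"))
  words <- words[nzchar(words)]
  list(
    sentences = length(sentences),
    words = length(words),
    words_per_sentence = length(words) / length(sentences),
    mean_word_length = mean(nchar(words))
  )
}

sample_text <- "R is not only for numbers. It can process text too! Regular expressions help."
result <- text_stats(sample_text)
str(result)
```

For the sample text this reports 3 sentences and 14 words. Real readability measures also need syllable counts, which is where packages covered by the resources below become useful.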
Following is a list of R learning resources for text and string processing:
 eBook Download: Handling and Processing Strings in R
Gaston Sanchez is a self-described applied statistician who has written a Creative Commons-licensed ebook, Handling and Processing Strings in R. The book is an excellent overview of R's string handling capabilities, from the basics to more advanced topics. If for no other reason, the two chapters on regular expressions make this book a must-read.
Here is the post on Gaston's blog in which he describes his motivation for writing the book and gives an overview of its content: Handling and Processing Strings in R – Gaston Sanchez.

R Programming/Text Processing – Wikibooks
Wikibooks is a project hosted by the Wikimedia Foundation, the same organization that hosts Wikipedia. The mission of Wikibooks is to provide a forum for collaboratively writing open-content textbooks, on subjects ranging from cooking to clocks.
As it happens, there are a fair number of technical Wikibooks. This one focuses on the R programming language and its use for text processing.

Regular Expressions – Wikibooks
Soon after the topic of text processing is raised, the closely related topic of regular expressions comes up. The two are so intimately linked that it is neither possible nor practical to discuss one without the other. This Wikibook, while not specific to R, is relevant because it offers a comprehensive look at regular expressions.

R Journal Article: stringr: modern, consistent string processing
Hadley Wickham is an Assistant Professor of Statistics at Rice University and the author of the stringr package for R. In this R Journal article, Hadley gives an in-depth look at his package.
From the stringr package documentation:
stringr is a set of simple wrappers that make R’s string functions more consistent, simpler and easier to use. It does this by ensuring that: function and argument names (and positions) are consistent, all functions deal with NA’s and zero length character appropriately, and the output data structures from each function matches the input data structures of other functions.
 Introduction to String Matching and Modification in R Using Regular Expressions
Svetlana Eden is a biostatistician at Vanderbilt University. In this paper, “Introduction to String Matching and Modification in R Using Regular Expressions”, she takes a deep dive into the use of regular expressions in R.
The next list of resources, while not specific to string or text processing, is very good for getting started with the R programming language:
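To give a flavor of the matching and modification tasks such a paper deals with, here is a minimal base R sketch using grepl() for matching, regexpr() with regmatches() for extraction, sub() with capture groups and backreferences, and gsub() for global replacement. The sample strings are hypothetical and the example is not taken from Eden's paper.

```r
# Hypothetical sample strings for illustration
x <- c("apple pie", "banana split", "cherry tart", "date 2013-05-14")

# Matching: which elements contain a digit?
has_digit <- grepl("[0-9]", x)

# Extraction: pull out a YYYY-MM-DD date wherever one occurs
# (regmatches() returns only the elements that matched)
dates <- regmatches(x, regexpr("[0-9]{4}-[0-9]{2}-[0-9]{2}", x))

# Modification: swap two words using capture groups and backreferences
swapped <- sub("^([[:alpha:]]+) ([[:alpha:]]+)$", "\\2 \\1", x[1:3])

# Global replacement: replace every lower-case vowel with an underscore
masked <- gsub("[aeiou]", "_", x[1])

print(has_digit)  # FALSE FALSE FALSE TRUE
print(dates)      # "2013-05-14"
print(swapped)    # "pie apple" "split banana" "tart cherry"
print(masked)     # "_ppl_ p__"
```

sub() replaces only the first match in each string while gsub() replaces all of them, a distinction that trips up many newcomers to regular expressions in R.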
 Cookbook for R
 http://cran.r-project.org/doc/contrib/Short-refcard.pdf
 twotorials by Anthony Damico
This last list of resources is again general to R. It comes from a Computerworld series of articles that introduces the R language and provides a fairly comprehensive beginner's guide to it. The last item lists 60+ additional learning resources, including books, articles, and tips and tricks:
 Beginner’s guide to R: Introduction – Computerworld
 Beginner’s guide to R: Get your data into R – Computerworld
 Beginner’s guide to R: Easy ways to do basic data analysis – Computerworld
 Beginner’s guide to R: Painless data visualization – Computerworld
 Beginner’s guide to R: Syntax quirks you’ll want to know – Computerworld
 60+ R resources to improve your data skills – Computerworld