This is an all too common, and unfortunate, occurrence. So long as Excel dominates, this is the reality with which we will have to live. Ultimately, it is the user who needs to do advanced or in-depth analytics who loses.
Joseph Rickert, a Data Scientist and R Community Manager for Revolution Analytics, recently created a ‘Meta’ book about R, which he posits might be called “An R Based Introduction to Probability and Statistics with Applications”. ‘Meta’ in this context means that the book is virtual: it is made up of references to its content, drawn from various other books, papers and articles.
What follows is a snippet of a blog post by Joe discussing this ‘Meta’ book, along with a table that lists the chapters and topics of the book and the associated references (as clickable links) to the material Joe has identified to fill out the contents of his hypothetical book.
At the Revolutions Analytics Blog, Joseph Rickert wrote:
[…] it occurred to me that there is a tremendous amount of high quality material on a wide range of topics in the Contributed Documentation page (of CRAN) that would make a perfect introduction to all sorts of people coming to R. […] from among this treasure cache (and a few other online sources), I have assembled an R “meta” book in the following table that might be called: An R Based Introduction to Probability and Statistics with Applications.
The content column lists the topics that I think ought to be included in a good introductory probability and statistics textbook. With a little searching, you will be able to find a discussion of each topic in the document listed to its right. Obviously, there is a lot of overlap among the documents listed, since most of them are substantial works that cover much more than the few topics that I have listed.
| No. | Content | Document | Author |
|---|---|---|---|
| 1 | Basic Probability and Statistics | Introduction to Probability and Statistics Using R | G. Jay Kerns |
| 2 | Fitting Probability Distributions | Fitting Distributions with R | Vito Ricci |
| 3 | Regression: Stepwise Regression; Ridge Regression | Practical Regression and Anova using R | Julian J. Faraway |
| 4 | Experimental Design | An R companion to Experimental Design | Vikneswaran |
| 5 | Survival Analysis | Cox Proportional-Hazards Regression for Survival Data | John Fox |
| 6 | Generalized Linear Models | Analysis of epidemiological data using R and Epicalc | Virasakdi Chongsuvivatwong |
| 7 | Bootstrap; Hierarchical Models; Nonlinear Mixed Effects | icebreakeR | Andrew Robinson |
| 8 | Time Series | Time Series Analysis with R | McLeod, Yu and Mahdi |
| 9 | Bayesian Statistics; Gibbs Sampler | Statistics Using R with Biological Examples | Kim Seefeld and Ernst Linder |
| 10 | Machine Learning: Decision Trees and Random Forest; Outlier Detection; Time Series Analysis and Mining; Text Mining; Social Network Analysis | R and Data Mining: Examples and Case Studies | Yanchang Zhao |
| 11 | Bioinformatics: Cluster Analysis; Classification Methods; Markov Models; Micro Array Analysis | Applied Statistics for Bioinformatics using R | Wim P. Krijnen |
| 12 | Forecasting | Forecasting: principles and practice | Hyndman and Athanasopoulos |
| 13 | Structural Equation Models | Structural Equation Models | John Fox |
| 14 | Credit Scoring | Guide to Credit Scoring in R | Dhruv Sharma |
R is a programming language almost exclusively associated with the processing of numbers. Unlike Python, Ruby, Java or C, R is rarely considered when the task involves processing text data. This is a shame. R can process character strings, has fairly robust support for regular expressions and, when combined with its inherent statistical capabilities, makes for a very powerful tool for text readability analysis, semantic analysis and many other operations not usually associated with R.
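To make that claim a little more concrete, here is a minimal base R sketch that turns a short piece of text into numbers and then applies ordinary statistics to them. The sample text is invented for illustration, and the two averages are only crude stand-ins for the inputs a real readability formula would use, not an established readability score.

```r
# Sample text invented for illustration only.
text <- "R handles text well. It splits strings, matches patterns, and summarises the results."

# Split into sentences on terminal punctuation (the punctuation is dropped).
sentences <- unlist(strsplit(text, "[.!?]+\\s*"))

# Lower-case, strip remaining punctuation, then split each sentence into words.
clean <- gsub("[[:punct:]]", "", tolower(sentences))
words_per_sentence <- strsplit(clean, "\\s+")
words <- unlist(words_per_sentence)

# Ordinary statistics applied to text-derived numbers.
word_lengths <- nchar(words)
avg_word_length <- mean(word_lengths)
avg_sentence_length <- mean(lengths(words_per_sentence))

# Word frequencies, most common first.
word_freq <- sort(table(words), decreasing = TRUE)

print(avg_word_length)
print(avg_sentence_length)
print(head(word_freq))
```

Everything here is base R: `strsplit()`, `gsub()`, `tolower()` and `nchar()` do the string work, while `mean()` and `table()` supply the statistics.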
Following is a list of R learning resources for text and string processing:
- eBook Download: Handling and Processing Strings in R
Gaston Sanchez is a self-described applied statistician who has written a Creative Commons licensed e-book, Handling and Processing Strings in R. This book is an excellent overview of R’s string handling capabilities, from the basics to more advanced topics. If for no other reason, the two chapters on regular expressions make this book a must-read.
This is a link to the post on Gaston’s blog where he describes his motivation for writing the book and gives an overview of its content: Handling and Processing Strings in R | Gaston Sanchez.
Wikibooks is a project hosted by the Wikimedia Foundation, the same organization that hosts Wikipedia. The mission of Wikibooks is to provide a forum for collaboratively writing open-content textbooks. The subjects of books written at Wikibooks range from cooking to clocks.
As it happens, there are a fair number of technical Wikibooks. This one in particular is focused on the R Programming Language and its use for Text Processing.
It’s not long after the topic of text processing is raised that the closely related topic of regular expressions comes up. It is neither possible nor practical to discuss one without the other; they are that intimately linked. This Wikibook, while not specific to R, is relevant because it takes a comprehensive look at regular expressions.
Hadley Wickham is an Assistant Professor of Statistics at Rice University. He is also the author of the stringr package for R. In this R Journal article, Hadley gives an in-depth look at his package.
From the stringr package documentation:
stringr is a set of simple wrappers that make R’s string functions more consistent, simpler and easier to use. It does this by ensuring that: function and argument names (and positions) are consistent, all functions deal with NA’s and zero length character appropriately, and the output data structures from each function matches the input data structures of other functions.
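As a rough illustration of the consistency the documentation describes, the sketch below puts a few base R calls next to their stringr counterparts. The input vector is made up, and the stringr part is guarded so that it only runs if the package happens to be installed.

```r
# Made-up input with a missing value in the middle.
x <- c("apple pie", NA, "banana split")

# Base R: the position of the string argument varies between functions.
base_detect  <- grepl("an", x)        # pattern first, string second; NA becomes FALSE
base_replace <- sub("an", "AN", x)    # the string is the third argument
base_na_len  <- nchar(NA)             # documented quirk: returns 2

# stringr: the string is always the first argument, and NA stays NA.
if (requireNamespace("stringr", quietly = TRUE)) {
  s_detect  <- stringr::str_detect(x, "an")
  s_replace <- stringr::str_replace(x, "an", "AN")
  s_na_len  <- stringr::str_length(NA)
  print(s_detect)
  print(s_replace)
  print(s_na_len)
}
```

The point is less about any single function than about being able to guess the argument order and the treatment of missing values without looking them up.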
- Introduction to String Matching and Modification in R Using Regular Expressions
Svetlana Eden is a biostatistician at Vanderbilt University. Svetlana authored the above paper, “Introduction to String Matching and Modification in R Using Regular Expressions”, in which she takes a deep dive into the use of regular expressions in R.
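For readers who have not yet seen regular expressions in R, here is a small base R sketch of the matching and modification these resources cover. The log-style lines and the patterns are made up for illustration and are not taken from any of the documents listed above.

```r
# Made-up log-style lines, used only to illustrate the functions.
log_lines <- c("2013-06-04 user=alice status=200",
               "2013-06-05 user=bob status=404",
               "no date on this line")

# Matching: which lines start with an ISO-style date?
date_pattern <- "^[0-9]{4}-[0-9]{2}-[0-9]{2}"
has_date <- grepl(date_pattern, log_lines)

# Extraction: regmatches() returns only the pieces that matched.
dates <- regmatches(log_lines, regexpr(date_pattern, log_lines))

# Modification: a capture group plus a backreference keeps just the user name.
users <- sub(".*user=([a-z]+).*", "\\1", log_lines[has_date])

# Global replacement: mask every digit in every line.
masked <- gsub("[0-9]", "#", log_lines)

print(has_date)
print(dates)
print(users)
print(masked)
```

`grepl()`, `regexpr()`, `regmatches()`, `sub()` and `gsub()` are all part of base R, so nothing needs to be installed to try this.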
The resources in this next list, while not specific to string or text processing, are very good for getting started with the R programming language:
- Cookbook for R
- twotorials by anthony damico
The resources in this last list are, again, general to R. They come from a Computerworld series of articles introducing the R language and providing a fairly comprehensive beginner’s guide to it. The last item is an enumeration of 60+ additional learning resources including books, articles, tips and tricks:
- Beginner’s guide to R: Introduction – Computerworld
- Beginner’s guide to R: Get your data into R – Computerworld
- Beginner’s guide to R: Easy ways to do basic data analysis – Computerworld
- Beginner’s guide to R: Painless data visualization – Computerworld
- Beginner’s guide to R: Syntax quirks you’ll want to know – Computerworld
- 60+ R resources to improve your data skills – Computerworld