R Learning Resources for String Processing

R is a programming language almost exclusively associated with the processing of numbers. Unlike Python, Ruby, Java or C, R rarely comes to mind when a task involves text data. This is a shame. R can process character strings and has fairly robust support for regular expressions. Combined with its inherent statistical capabilities, this makes R a very powerful tool for text readability analysis, semantic analysis and many other operations not usually associated with it.

Following is a list of R learning resources for text and string processing:

  • eBook Download: Handling and Processing Strings in R


    Gaston Sanchez is a self-described applied statistician who has written a Creative Commons licensed e-book, Handling and Processing Strings in R. This book is an excellent overview of R’s string handling capabilities, from the basics to more advanced topics. If for no other reason, the two chapters on regular expressions make this book a must-read.

    This is a link to the post on Gaston’s blog where he describes his motivation for writing the book and gives an overview of its content: Handling and Processing Strings in R | Gaston Sanchez.

  • R Programming/Text Processing – Wikibooks


    Wikibooks is a project hosted by the Wikimedia Foundation, the same organization that hosts Wikipedia. The mission of Wikibooks is to provide a forum for collaboratively writing open-content textbooks. The subjects of its books range from cooking to clocks.

    As it happens, there are a fair number of technical Wikibooks. This one in particular is focused on the R programming language and its use for text processing.

  • Regular Expressions – Wikibooks


    It’s not long after the topic of text processing is raised that the closely related topic of regular expressions is brought up. It’s not possible, nor is it practical, to discuss one without the other; they are that intimately linked. This Wikibook, while not specific to R, is relevant because it is a comprehensive look at regular expressions.

  • R Journal Article: stringr: modern, consistent string processing


    Hadley Wickham is an Assistant Professor of Statistics at Rice University. He is also the author of the stringr package for R. In this R Journal article, Hadley gives an in-depth look at his package.

    From the stringr package documentation:

    stringr is a set of simple wrappers that make R’s string functions more consistent, simpler and easier to use. It does this by ensuring that: function and argument names (and positions) are consistent, all functions deal with NA’s and zero length character appropriately, and the output data structures from each function matches the input data structures of other functions.

  • Introduction to String Matching and Modification in R Using Regular Expressions

    Svetlana Eden is a biostatistician at Vanderbilt University. Svetlana authored the above paper, “Introduction to String Matching and Modification in R Using Regular Expressions”, in which she takes a deep dive into the use of regular expressions in R.

The next list of resources, while not specific to string or text processing, is very good for getting started with and going further into the R programming language:

This last list of resources is, again, general to R. These are from a Computerworld series of articles introducing the R language and providing a fairly comprehensive beginner’s guide to it. The last item is an enumeration of 60+ additional learning resources including books, articles, tips and tricks:

Metacademy: Machine Learning and Probabilistic AI Learning Resources

I’ve recently come across a tremendous resource for the discovery of various machine learning and probabilistic artificial intelligence topics and associated educational materials. The site I’m describing is Metacademy.

Metacademy is a community-driven, open-source platform on which domain experts collaboratively construct a web of knowledge, meant to help individuals efficiently learn about any topic of interest (supported by Metacademy and the domain experts). The experts responsible for Metacademy are Roger Grosse and Colorado Reed. In addition to building the site, they organized roughly 350 machine learning and probabilistic artificial intelligence concepts along with related training and learning materials.

While Metacademy is currently focused on machine learning and probabilistic artificial intelligence topics, its eventual goal is to cover a much wider breadth of knowledge, e.g. mathematics, engineering, music, medicine, computer science, etc.

The premise of Metacademy is that a user will search for and click on a concept of interest. Metacademy then produces a “learning plan” which includes the prerequisite concepts identified in the web of knowledge previously created by the domain experts. Identifying the list of prerequisite concepts for the student is what sets Metacademy apart from other learning sites or course catalogs.

As posted at Metacademy:
… But try learning something of conceptual depth by sifting through Google search results … and you’re in for a lot of agony. Before you learn this concept, you need to learn its prerequisite concepts (sometimes you’re not entirely sure what these are), and the prerequisite concepts may have prerequisites themselves. Pretty soon, you’re deep in dependency hell, switching between twenty different tabs trying to understand the various [pre]prerequisite concepts in order to understand the tutorial article Google returned …
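The idea of a learning plan built from prerequisites can be illustrated with a small sketch. This is not Metacademy’s actual implementation, which I have not seen; it only assumes that concepts and their prerequisites form a directed acyclic graph. The concept names, the `PREREQUISITES` table and the `learning_plan` function are all hypothetical and made up for illustration.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical prerequisite data: each concept maps to the concepts it
# depends on. These entries are illustrative only, not taken from Metacademy.
PREREQUISITES = {
    "logistic regression": ["linear regression", "maximum likelihood"],
    "linear regression": ["linear algebra basics"],
    "maximum likelihood": ["probability basics"],
    "linear algebra basics": [],
    "probability basics": [],
    "k-means": ["linear algebra basics"],
}


def learning_plan(target, prerequisites):
    """Return the target and all of its transitive prerequisites,
    ordered so that every concept appears after its prerequisites."""
    # Step 1: collect every concept reachable from the target.
    needed = set()
    stack = [target]
    while stack:
        concept = stack.pop()
        if concept in needed:
            continue
        needed.add(concept)
        stack.extend(prerequisites.get(concept, []))

    # Step 2: topologically sort only the concepts that are needed.
    subgraph = {c: prerequisites.get(c, []) for c in sorted(needed)}
    return list(TopologicalSorter(subgraph).static_order())


if __name__ == "__main__":
    for step, concept in enumerate(
        learning_plan("logistic regression", PREREQUISITES), start=1
    ):
        print(step, concept)
```

In this sketch the requested concept always comes last, and unrelated concepts such as the hypothetical “k-means” entry are left out of the plan. The relative order of independent prerequisites is not significant.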

Metacademy’s learning experience revolves around two central components:

  • a “learning plan” in a tabular ‘list view’
  • a “graph view” of the learning plan which is meant to help explore relationships among concepts

Clicking on the check-mark next to the title of a concept in either the graph or list view marks that [prerequisite] concept as understood. To hide the concepts that have been marked as understood, click the “hide” button in the upper right. Note that Metacademy will remember which concepts were marked as understood and hidden, and will automatically re-apply these selections on future visits.

Metacademy is a work in progress and limited in scope, so please keep an open mind when visiting. Still, I think that you will find it an interesting, unique and valuable resource, particularly if you are, as I am, actively exploring the world of machine learning.


Improve your Digital Analytics Skills with Google’s … – Analytics Blog

See on Scoop.it: Evidence Based Systems

That’s why today we’re excited to announce Analytics Academy — a new hub for you and your colleagues to participate in free, online, community-based video courses about digital analytics and Google Analytics.

Mike Pluta’s insight:

Another free resource from which to learn about analytics, statistics, data visualization and big data concepts.

See on analytics.blogspot.com