Sunday, February 10, 2013

Resources to speed the R learning curve

I recently blogged about the learning curve in R and posted it on Hacker News. The response was overwhelming, with lots of suggestions on where to go for more resources and further learning. I have compiled them here for those looking for resources to speed the R learning curve.

https://github.com/hadley/devtools/wiki A place to learn R as a programming language, focusing on cross-cutting concerns and general concepts

http://www.r-bloggers.com/ An R blogging platform that has done a great job of promoting R and encouraging the community, and that gives a good sense of the state of R

http://tryr.codeschool.com/ Free R Tutorials from Code School and O'Reilly

http://cbio.ensmp.fr/~thocking/papers/2011-08-16-directlabels-and-regular-expressions-for-useR-2011/2011-useR-named-capture-regexp.pdf Fast, named capture regular expressions in R

http://www.win-vector.com/blog/2009/09/survive-r/ Survival guide

http://www.burns-stat.com/pages/Tutor/R_inferno.pdf  A manual by Patrick Burns for R developers with lots of useful tricks and tips for reducing memory usage, improving performance, and avoiding errors in computational analysis

http://blog.revolutionanalytics.com/ Blog from the staff of Revolution Analytics on using R for big data analysis, predictive modeling, data science and more
 
More Tutorials



http://cran.r-project.org/manuals.html Docs 


http://onepager.togaware.com/ Hands-On Data Science with R
http://stackoverflow.com/questions/tagged/r?sort=votes&p... StackOverflow


Cheat sheets



R Journal


http://rseek.org/ Rseek search engine
  
http://www.r-chart.com/ Experiences of a web application/database developer whose toolkit includes R
 
Books
http://www.amazon.com/gp/product/0387981403 ggplot2: Elegant Graphics for Data Analysis  by Hadley Wickham
http://www.amazon.com/dp/1449316956/ref=cm_sw_su_dp R Graphics Cookbook  by Winston Chang
 
For more books relating to R:
http://www.r-project.org/doc/bib/R-books.html

Thursday, February 7, 2013

The R Learning Curve

R is meant for statistical computing. Developed in New Zealand by two professors of statistics, it is often referred to as the language written by statisticians for statisticians. R is a GNU Project and is available as free software. Recently, it has found favor with many data analysts as big data takes center stage in many businesses and the appetite grows for flexible tools that can be fine-tuned to match individual requirements.

The R manuals, available on the R Project website, have very clear explanations of how to install R and guidance on the different R packages. The manuals also offer some basic tutorials on using R for statistical computing and plotting graphs. The Internet is a great resource for insights on how to get things done in R. Places like Stack Overflow often offer more than one technique for the same task.

I first tried R last spring before starting grad school. It was easy to set up and install. The R Project website has very straightforward information on setting up R, and there are several videos on YouTube that walk through the installation.

The initial learning experience is fun, especially if one is familiar with statistics. You don't have to type print to get an answer. The basic syntax felt like typing into a calculator. Most questions that pop into your head have an answer in a manual or one of the numerous websites out there. But that is where the honeymoon ends.
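
To illustrate the calculator feel described above, here is a minimal sketch of an interactive session. The vector name `scores` and its values are made up for illustration.

```r
# Typing an expression at the prompt prints its value; no print() call needed.
2 + 3 * 4

# Values can be stored in a variable and reused.
# `scores` is a hypothetical example vector.
scores <- c(72, 85, 90, 66, 78)

# Common statistics are one function call away.
mean(scores)
sd(scores)
summary(scores)
```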

Once I got hooked on R, I decided it was time for some formal learning. Coursera was offering Computing for Data Analysis with R, so I signed up. I blogged about my class experience recently. From my experience, the most challenging areas once one gets the hang of R are:

Cleaning the data: This takes time and it can be annoying. I mean thorn-in-the-flesh annoying. For me, it was trial and error. I found that regular expressions in R are a great way to isolate the string one is looking for and get a data set with values that can be worked on to tackle the problem at hand.
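
As a rough sketch of the kind of regular-expression cleaning described above, the snippet below uses only base R functions (`regexpr`, `regmatches`, `sub`). The input strings and variable names are invented for illustration and are not from the class data.

```r
# Hypothetical messy input: numeric values buried in free-form text.
raw <- c("Height: 172 cm", "height=  181cm", "Height 165 CM", "n/a")

# regexpr() finds the first run of digits in each string (-1 if none);
# regmatches() returns the matched substrings for the strings that matched.
hits <- regexpr("[0-9]+", raw)
height_cm <- rep(NA_real_, length(raw))
height_cm[hits > 0] <- as.numeric(regmatches(raw, hits))
height_cm

# sub() with capture groups can rearrange parts of a string,
# here turning year-month-day into day/month/year.
dates <- c("2013-02-10", "2013-02-07")
reformatted <- sub("^([0-9]{4})-([0-9]{2})-([0-9]{2})$", "\\3/\\2/\\1", dates)
reformatted
```

Strings with no match stay as NA, which keeps the cleaned vector the same length as the input.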

Finding the right package: This one is tricky. Not everything you want to do in R can be done with the base installation, so you will need to download packages. Reading what others have to say about a package's functionality and matching it to your needs is the best way to go about this. Once you know the name of the package, a Google search can easily help you locate it, and most packages can be easily installed and loaded.
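
Once the package name is known, installing and loading it usually takes two calls, `install.packages()` and `library()`. The helper below is a minimal sketch: `use_package` is a hypothetical name of my own, and the CRAN mirror URL is just one possible choice.

```r
# Hypothetical helper: install a package only if it is missing, then load it.
use_package <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    # Downloads from CRAN; needs a network connection.
    install.packages(pkg, repos = "https://cloud.r-project.org")
  }
  library(pkg, character.only = TRUE)
  invisible(pkg %in% loadedNamespaces())
}

# "tools" ships with R itself, so this call never triggers an install.
use_package("tools")

# For a CRAN package the call would look like: use_package("ggplot2")
```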

Writing Functions, Using Loops and Control Structures: As with other languages, this is purely the product of deliberate learning and practice. Unlike with other languages, it is hard to come across code snippets that do exactly what you are looking for. For me, this was the single most challenging part of learning R. Most help communities assume that you have some understanding of how R code works and that you are familiar with the commands. My solution was to keep searching until I found what I needed. It was not easy. I ground away at my computer, trying different ways to extract the information I wanted. When I finally nailed it after several iterations, I felt a sense of accomplishment that would have escaped me had I just copied and tidied up the code.
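
For anyone hunting for a starting snippet, here is a small sketch that combines a function, a `for` loop, and `if`/`else`. The function name `count_above` and the `temps` data are hypothetical, chosen only to show the syntax.

```r
# A function with a default argument, a for loop, and if/else branching.
count_above <- function(values, threshold = 0) {
  count <- 0
  for (v in values) {
    if (is.na(v)) {
      next                    # skip missing values
    } else if (v > threshold) {
      count <- count + 1
    }
  }
  count                       # the last expression is the return value
}

temps <- c(12.5, NA, 18.0, 21.3, 9.8)
count_above(temps, threshold = 15)

# The vectorized equivalent, which is usually the more idiomatic R.
sum(temps > 15, na.rm = TRUE)
```

Both calls count the same two values. The loop version is easier to follow when coming from another language, while the vectorized one is closer to how R code is typically written.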

Graphs and Plots: I found this exciting, but there is a lot more to learn here. The class touched on many aspects of graphing in R, but there is a lot more to do.
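
As a small taste of base R graphics, the sketch below uses `mtcars`, a data set bundled with R. The choice of variables and labels is mine and is not taken from the class.

```r
# Scatter plot of fuel economy against weight using the built-in mtcars data.
plot(mtcars$wt, mtcars$mpg,
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon",
     main = "Fuel economy vs. weight")

# Overlay a least-squares line fitted with lm().
fit <- lm(mpg ~ wt, data = mtcars)
abline(fit, col = "red", lwd = 2)

# A histogram of a single variable.
hist(mtcars$mpg, breaks = 10,
     xlab = "Miles per gallon", main = "Distribution of mpg")
```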

Overall, learning R has definitely been time-consuming and frustrating, but ultimately rewarding.