Friday, November 30, 2012

More places to learn Hadoop

This is a follow-up to my earlier post on starting to learn Hadoop. I came across a couple of interesting links for learning Hadoop and understanding how it works:

Learning “Machine Learning” by Example is a Meetup which you can join off-site. It is a good starting place for someone who is new to analytics. It is a wonderful opportunity to learn the concepts underlying Machine Learning and is based on the “learning by example” principle.

Here is a well-constructed diagram of how Big Data gets stored in and retrieved from a distributed file system like the one Hadoop provides.


I just came across these tutorials and think they are awesome:
Yahoo's Hadoop Tutorial

Tuesday, November 27, 2012

Sources for learning more on Algorithms and their applications

Algorithms are an enticing concept. As a grad student in Software Engineering, I find the hows, the whys and the patterns both mind-boggling and intriguing. The history of algorithms, their application before the advent of computing, and their use today are all very interesting.

Here is some material that has been of great help:

Introduction to Algorithms by Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest and Clifford Stein

Algorithms in Modern Mathematics and Computer Science by Donald E. Knuth

Algorithms by Sanjoy Dasgupta, Christos Papadimitriou and Umesh Vazirani

I am always looking for more, so please feel free to comment.

Thursday, November 22, 2012

Using MOOCs while in Grad School

Graduate school certainly piques your curiosity. You are sometimes so intrigued by a topic that you want to learn more, but it is not a part of your coursework. What do you do? Of course, you could look it up on the web or go to a library, but you want more. You actually want to learn in a more interactive way. My answer is to take advantage of Massive Open Online Courses (MOOCs) from providers including Udacity, Coursera and edX. You get quality courses from highly qualified professors from some of the best universities in the world. I use them to supplement my coursework. You can even get certificates for some of the courses you complete.

For a graduate student the questions are:

· How do I fit this in with all the other things I have to do while in grad school? My answer is to do just one course per semester, or to do just the portions of a course that you want to. If you bite off more than you can chew, you will give up.

· How do I choose among the many courses when they all seem so interesting? If you are going to use the course to supplement your coursework, read the syllabus and choose the ones that will augment your schoolwork. If you are doing a course that your school does not offer, but you want to do anyway, make sure you are willing to commit three to four hours a week to the course, especially if you are learning something new.

· How do I keep myself motivated to keep going while I have so many other grad school commitments? Know before you start that you are going to face this question as you reach the midpoint of your grad school semester. Be ready to dedicate the time before you commit. One good time to start a MOOC is during a semester break; this way you are likely to finish before the semester mid-terms.

· How do I know I am learning what I set out to learn? Start with a checklist, especially if your aim is to learn a specific concept or technology. Check yourself by doing a small project after the course. I know that is easier said than done. Do it during your semester break.

· How do I translate my coursework onto my resume? If you have completed the course, you get a certificate of completion. If you have done only specific parts of the course, a project you did using your newly acquired knowledge is the best bet.

I know what I have said here is in no way comprehensive, so feel free to add your comments and experiences. So far I have partially completed three courses: Stats 101 and CS 101 from Udacity to brush up my statistics skills and the basics of CS, and Data Analysis from Coursera to learn R in a structured manner.

Wednesday, November 7, 2012

Be Assertive in grad school

In grad school, or for that matter anywhere, some of us dismiss our findings as trivial and then watch others use them to their advantage. I had such an experience just yesterday. I located an error in the code handed to us by our professor. I ran a few tests and knew he was way off the mark. In my naivety I went ahead and shared this information with a fellow grad student. The smart cookie approached the professor with the error, got applauded for it and may have scored extra credit points as well.

Lessons learned:
1. If you have tested an idea, share it with the world in such a way that you are the owner of your idea. You should benefit from the hours and hard work that went into the process.
2. Do not dismiss your findings as trivial. You may risk looking like a fool if you are wrong, but the taste of success if you are right is worth the risk.
3. If you do make this error, just remember that you were smart once, and you are more likely than the person stealing your idea to be smart once again!

Sunday, November 4, 2012

Data Scientists work to solve problems for the thrill

Is money the only motivator? Sometimes you just want to be challenged more than you want to be remunerated. You just want to ace it. Here is an example of a data scientist showing this behavior in action.

Monday, October 29, 2012

A place to start learning Hadoop

I have been looking for a place to learn more tools used in analytics and big data. is a great place to learn more about Hadoop. Hadoop is the most widely used framework for distributed storage and processing of large data sets. It is used by leading companies like Google, Facebook, Amazon, Yahoo! and others to meet the challenges of high storage capacity requirements and data-intensive computing. Apart from basic tutorials on what Hadoop is, it is also a place to get some hands-on practice creating a Hadoop cluster.

Friday, October 26, 2012

Learning to apply theories / concepts

As a grad student, my learning goal is to understand the underlying concept behind a technique so that it can be applied or adapted to solve a problem. Most assignments, tests and exams are heavy on problems and, as any grad student knows, never straightforward. Apart from being motivated by the desire to do well in exams, I am always driven by the fear that I will have no clue how to apply this in a real-world scenario. A concept can't be applied unless it is well understood, and as a grad student, time is precious - so learning quickly is just as important. I recently came across a blog by Cal Newport that has tips on achieving just this.

Friday, October 19, 2012

Standing on the shoulders of giants

Today I was trying to tackle an assignment from my DBMS class: write SQL queries and extract information from the database. Sounds easy, doesn't it? Well, if it were so easy everyone would be working on databases and DBAs would be a dime a dozen. The hard part is getting the SQL exactly right, and of course there is the issue of performance. You do not want to write a query that takes hours to run in order to give you the answer. You want it to be crisp and exact. It was back to the drawing board for me every few minutes. Every time I thought I had it, I would make a few changes to the database and find out that it was not the perfect query. Finally I have it nailed down, but not without multiple trials and errors, and of course it can be fine-tuned further. It's just the right time to remember that I stand on the shoulders of real giants who were masters of invention by iteration. In the words of John Backus, who was instrumental in the development of the FORTRAN language - "You need the willingness to fail all the time. You have to generate many ideas and then you have to work very hard only to discover that they don't work. And you keep doing that over and over until you find one that does work." If the head of the team that revolutionized the first generation of programming has that to say, I guess I really have no right to complain.

Thursday, October 18, 2012

Grad Student

I am in my second semester of grad school doing my MS. It is the beginning of a long journey. I plan to blog my experiences here. This semester I am doing Technical Writing and Database Management Systems. The assignments, tests and quizzes are all surreal. The question I keep asking myself is 'am I just drifting through' or 'am I learning'? If the purpose of grad school is to keep you on your toes and tell you that you have a lot more to learn, I think I am right on track. I recently sought out a copy of 'Concrete Mathematics' by Ronald L. Graham, Donald E. Knuth, and Oren Patashnik (Reading, Massachusetts: Addison-Wesley, 1994), xiii+657pp. ISBN 0-201-55802-5. I was inspired to start reading this because of my desire to understand the algorithms behind computer programming. Anyone who has written even simple lines of code instinctively understands that recurrence is an important concept in computing. A central theme that professors reinforce is: reduce a big problem to small problems. This is the underlying principle behind mathematical induction and recursive algorithms. That is the basic concept I have absorbed from reading about 10 pages of the book, so more later.
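
To make the "reduce a big problem to small problems" idea concrete, here is a minimal Python sketch using the classic Tower of Hanoi puzzle as an illustration. The choice of puzzle and the function names are my own assumptions for this example, not something taken from my coursework. Moving n disks reduces to moving n - 1 disks, moving the largest disk once, and moving the n - 1 disks again, which gives the recurrence T(n) = 2T(n - 1) + 1 with T(0) = 0. Induction then shows that T(n) = 2^n - 1, and the loop at the end checks that claim for a few small cases.

```python
def hanoi_moves(n):
    """Count the moves needed for n disks by following the recurrence.

    The big problem (n disks) is reduced to two copies of a smaller
    problem (n - 1 disks) plus one move of the largest disk.
    """
    if n == 0:
        # Base case: no disks means no moves.
        return 0
    return 2 * hanoi_moves(n - 1) + 1


def hanoi_closed_form(n):
    """Closed form 2^n - 1, the result one would prove by induction."""
    return 2 ** n - 1


# Check that the recursive definition and the closed form agree
# for a handful of small inputs.
for n in range(8):
    assert hanoi_moves(n) == hanoi_closed_form(n)
    print(n, hanoi_moves(n))
```

The recursive function mirrors the recurrence line for line, while the closed form is what the induction proof buys you: the same answer without any recursion at all.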