Monday, October 29, 2012

A place to start learning Hadoop

I have been looking for a place to learn more about the tools used in analytics and big data. is a great place to learn more about Hadoop. Hadoop is the most widely used framework for distributed storage and processing of large data sets. It is used by leading companies like Google, Facebook, Amazon, Yahoo! and others to meet the challenges of massive storage requirements and data-intensive computing. Apart from basic tutorials on what Hadoop is, it also offers hands-on practice in creating a Hadoop cluster.
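Hadoop's processing model, MapReduce, breaks a job into a map phase that emits key/value pairs, a shuffle that groups those pairs by key, and a reduce phase that combines each group. The sketch below is a single-machine Python simulation of the classic word-count example, not Hadoop's actual Java API; the function names are hypothetical. On a real cluster, Hadoop runs many mappers and reducers in parallel across nodes and reads its input from HDFS.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Mapper (hypothetical name): emit (word, 1) for every word in one line."""
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle/sort: group all intermediate values by key, sorted by key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reducer: combine all the counts emitted for one word."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map -> shuffle -> reduce sequentially on one machine."""
    intermediate = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle_phase(intermediate)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    docs = ["the quick brown fox", "the lazy dog", "The fox"]
    print(word_count(docs))
    # prints {'brown': 1, 'dog': 1, 'fox': 2, 'lazy': 1, 'quick': 1, 'the': 3}
```

Because each mapper only looks at its own line and each reducer only at its own key, both phases can be spread across machines; that independence is what lets Hadoop scale out.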

Friday, October 26, 2012

Learning to apply theories / concepts

As a grad student, my learning goal is to understand the underlying concept behind a technique so that it can be applied or adapted to solve a problem. Most assignments, tests and exams are heavy on problems and, as any grad student knows, they are never straightforward. Apart from being motivated by the desire to do well in exams, I am always driven by the fear that I will have no clue how to apply what I learn in a real-world scenario. A concept can't be applied unless it is well understood, and as a grad student, time is precious, so learning quickly is just as important. I recently came across a blog by Cal Newport that has tips on achieving exactly this.

Friday, October 19, 2012

Standing on the shoulders of giants

Today I was trying to tackle an assignment from my DBMS class: write SQL queries to extract information from the database. Sounds easy, doesn't it? Well, if it were that easy, everyone would be working on databases and DBAs would be a dime a dozen. The hard part is getting the SQL exactly right, and of course there is the issue of performance. You do not want a query that takes hours to return the answer; you want it to be crisp and exact. It was back to the drawing board for me every few minutes. Every time I thought I had it, I would make a few changes to the data and discover that the query was not quite right. I finally have it nailed down, though not without plenty of trial and error, and of course it can be fine-tuned further. It's the right moment to remember that I stand on the shoulders of real giants who were masters of invention by iteration. In the words of John Backus, who led the team that developed the FORTRAN language: "You need the willingness to fail all the time. You have to generate many ideas and then you have to work very hard only to discover that they don't work. And you keep doing that over and over until you find one that does work." If the head of the team that revolutionized the first generation of programming has that to say, I guess I really have no right to complain.
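The assignment's actual queries aren't shown here, so what follows is a hypothetical illustration of the trap described above: a query that looks right on one set of data and turns out to be wrong after the data changes. The `dept`/`emp` schema and both queries are invented for the example, and it uses Python's built-in `sqlite3` module so it runs without a database server.

```python
import sqlite3


def build_db() -> sqlite3.Connection:
    """Create a tiny in-memory database (hypothetical schema for illustration)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE dept (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE emp (id INTEGER PRIMARY KEY, name TEXT NOT NULL,
                          dept_id INTEGER REFERENCES dept(id));
        -- An index on the join column helps the join stay fast as emp grows.
        CREATE INDEX emp_dept_idx ON emp(dept_id);
        INSERT INTO dept VALUES (1, 'Sales'), (2, 'Research');
        INSERT INTO emp VALUES (1, 'Ann', 1), (2, 'Bob', 1), (3, 'Cy', 2);
        """
    )
    return conn


# First attempt: an inner join only returns departments that have employees.
FIRST_TRY = """
    SELECT d.name, COUNT(*) AS headcount
    FROM dept d JOIN emp e ON e.dept_id = d.id
    GROUP BY d.id ORDER BY d.name
"""

# Revised query: a LEFT JOIN keeps departments with no employees, and
# COUNT(e.id) skips the NULLs from unmatched rows, so those show 0.
REVISED = """
    SELECT d.name, COUNT(e.id) AS headcount
    FROM dept d LEFT JOIN emp e ON e.dept_id = d.id
    GROUP BY d.id ORDER BY d.name
"""

if __name__ == "__main__":
    conn = build_db()
    print(conn.execute(FIRST_TRY).fetchall())  # [('Research', 1), ('Sales', 2)]
    # Change the data: add a department that has no employees yet.
    conn.execute("INSERT INTO dept VALUES (3, 'Legal')")
    print(conn.execute(FIRST_TRY).fetchall())  # 'Legal' is silently missing
    print(conn.execute(REVISED).fetchall())    # [('Legal', 0), ('Research', 1), ('Sales', 2)]
```

Changing the data to hit an edge case (here, an empty department) is a cheap way to test a query before trusting it.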

Thursday, October 18, 2012

Grad Student

I am in my second semester of grad school doing my MS. It is the beginning of a long journey, and I plan to blog my experiences here. This semester I am taking Technical Writing and Database Management Systems. The assignments, tests and quizzes are all surreal. The question I keep asking myself is 'am I just drifting through' or 'am I learning'? If the purpose of grad school is to keep you on your toes and remind you that you have a lot more to learn, I think I am right on track. I recently sought out a copy of 'Concrete Mathematics' by Ronald L. Graham, Donald E. Knuth, and Oren Patashnik (Reading, Massachusetts: Addison-Wesley, 1994), xiii+657pp. ISBN 0-201-55802-5. I was inspired to start reading it by my desire to understand the algorithms behind computer programming. Anyone who has written even a few lines of code instinctively understands that recurrence is an important concept in computing. A central theme that professors reinforce is: reduce a big problem to smaller problems. This is the underlying principle behind mathematical induction and recursive algorithms. That is the basic concept I have absorbed from reading about 10 pages of the book, so more later.
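Concrete Mathematics opens with the Tower of Hanoi, a neat instance of "reduce a big problem to smaller problems": to move n disks, move the top n-1 disks out of the way, move the largest disk, then move the n-1 disks back on top of it. That gives the recurrence T(0) = 0, T(n) = 2T(n-1) + 1, and the book proves the closed form T(n) = 2^n - 1 by induction (if T(n-1) = 2^(n-1) - 1, then T(n) = 2(2^(n-1) - 1) + 1 = 2^n - 1). Below is a minimal Python sketch; the function names are my own, not the book's. It checks that the recursive solver, the recurrence, and the closed form all agree.

```python
from typing import List, Tuple

Move = Tuple[str, str]  # (from_peg, to_peg)


def hanoi(n: int, src: str, dst: str, spare: str, moves: List[Move]) -> None:
    """Append the moves that transfer n disks from src to dst."""
    if n == 0:
        return  # base case: nothing to move
    hanoi(n - 1, src, spare, dst, moves)  # smaller problem: clear the top n-1 disks
    moves.append((src, dst))              # move the largest disk
    hanoi(n - 1, spare, dst, src, moves)  # smaller problem: restack the n-1 disks


def moves_by_recurrence(n: int) -> int:
    """T(0) = 0, T(n) = 2*T(n-1) + 1."""
    return 0 if n == 0 else 2 * moves_by_recurrence(n - 1) + 1


if __name__ == "__main__":
    for n in range(6):
        moves: List[Move] = []
        hanoi(n, "A", "C", "B", moves)
        # Simulation, recurrence, and closed form 2^n - 1 should all match.
        print(n, len(moves), moves_by_recurrence(n), 2**n - 1)
```

The recursive function mirrors the recurrence line for line: two calls on n-1 disks plus one move, which is exactly where the 2T(n-1) + 1 comes from.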