Working with data is HARD.  Let’s face it, you’re brave to even attempt it, let alone make it your everyday job.

Fortunately, some incredibly talented people have taken the time to compile and share their deep knowledge for you.

Here are 7 books we recommend for picking up some new skills in 2013:

Hadoop: The Definitive Guide by Tom White

Tom works at Cloudera and is one of the foremost experts on Hadoop, having been an Apache Hadoop committer since February 2007.  He is a Hadoop PMC member and a member of the Apache Software Foundation.

What people have said:  “A comprehensive, ‘roll up your sleeves, here’s some Java’ deep dive into Hadoop…  No single book will do Hadoop justice, but this book is the best attempt so far.” (via Amazon)

Programming Pig by Alan Gates (free online version!)

Alan open-sourced Pig while at Yahoo! and later designed HCatalog.  He’s a co-founder of Hortonworks, where he continues his extensive work on open-source projects.

What people have said:  “[T]his is an excellent book that covers the details of using Pig, from basic to advanced features.  It saved my bacon (if you’ll pardon the expression…) numerous times on a recent, challenging project.” (via O’Reilly)

(Bonus: We’ve compiled some additional Pig resources here.)


NoSQL Distilled by Pramod Sadalage and Martin Fowler

Pramod is a DBA and developer at ThoughtWorks, an enterprise application development and integration company.  He pioneered the practices and processes of evolutionary database design and database refactoring.

Martin is ThoughtWorks’ Chief Scientist and pioneered various topics around object-oriented technology and agile methods.  He’s an active speaker and author, having written six books on software development.

What people have said: “The authors of this book present a wonderful, accessible, product-agnostic introduction to the world of NoSQL…  This book has demystified much of NoSQL for me and made it seem quite common-sensical.” (via Amazon)


Python for Data Analysis by Wes McKinney

Wes is Python’s pied piper of data analysis.  The MIT math major is the main developer of pandas, a Python data analysis library, and co-founder of Lambda Foundry.

What people have said:  “One of the best and most practical programming books I’ve ever read.  Amazing job at introducing tools (ipython, pandas) that aren’t well covered on the web.” (via O’Reilly)


Machine Learning for Hackers by Drew Conway and John Myles White

Drew is kind of a big deal in NYC’s data community: in addition to being a PhD candidate at NYU, he is IA Ventures’ “Scientist-in-Residence”, a co-organizer of Data Gotham, and co-founder of DataKind.

John is a Ph.D. candidate in the Department of Psychology at Princeton University, where he leverages his mathematical modeling and machine learning chops to understand human decision-making.

What people have said:  “Drew and John have written an excellent book on presenting machine learning concepts like classification, clustering, recommendation, network graphs, and SVMs to name a few. The authors do a great job of presenting how to apply these machine learning algorithms and explain the general concepts of the algorithms.” (via Amazon)

Hadoop Operations by Eric Sammer

Eric is a principal architect at Cloudera and an active speaker on large scale data processing, integration, and system management.  Prior to Cloudera, he worked at various startups for over a decade as a DBA, SysAdmin, software engineer, and system architect.

What people have said:  “Whether the topic is HDFS and how data is ingested and replicated, or how Map/Reduce “finds” the most suitable node to run it’s tasks on, or what the cost and performance advantages are of adopting the shared-nothing, commodity model recommended for Hadoop clusters, etc., etc., etc., this book provides the how, what, when, where and why of Hadoop (the missing manual, of sorts).” (via Amazon)


Agile Data by Russell Jurney (free online version!)

Those who have met Russell (or followed him on Twitter) know him as a hilarious force of nature, but his data science chops are no joke.  After working at Ning and LinkedIn, Russell is now Hortonworks’ Hadoop Evangelist.

What people have said:  “This is definitely the best book I’ve ever written.”  (Nice review, Russell…)  (via O’Reilly)

