Working with data is HARD. Let’s face it, you’re brave to even attempt it, let alone make it your everyday job.
Fortunately, some incredibly talented people have taken the time to compile and share their deep knowledge for you.
Tom works at Cloudera and is one of the foremost experts on Hadoop, having been an Apache Hadoop committer since February 2007. He is a Hadoop PMC member and a member of the Apache Software Foundation.
What people have said: “A comprehensive, ‘roll up your sleeves, here’s some Java’ deep dive into Hadoop… No single book will do Hadoop justice, but this book is the best attempt so far.” (via Amazon)
Alan open-sourced Pig while at Yahoo! and later designed HCatalog. He’s currently a co-founder at Hortonworks, where he continues his extensive work on open-source projects.
What people have said: “[T]his is an excellent book that covers the details of using Pig, from basic to advanced features. It saved my bacon (if you’ll pardon the expression…) numerous times on a recent, challenging project.” (via O’Reilly)
(Bonus: We’ve compiled some additional Pig resources here.)
Pramod is a DBA and developer at ThoughtWorks, an enterprise application development and integration company. He pioneered the practices and processes of evolutionary database design and database refactoring.
Martin is ThoughtWorks’ Chief Scientist and did pioneering work in object-oriented technology and agile methods. He’s an active speaker and author, having written six books on software development.
What people have said: “The authors of this book present a wonderful, accessible, product-agnostic introduction to the world of NoSQL… This book has demystified much of NoSQL for me and made it seem quite common-sensical.” (via Amazon)
Wes is Python’s pied piper of data analysis. The MIT math major is the main developer of pandas, a Python data analysis library, and co-founder of Lambda Foundry.
What people have said: “One of the best and most practical programming books I’ve ever read. Amazing job at introducing tools (ipython, pandas) that aren’t well covered on the web.” (via O’Reilly)
Drew is kind of a big deal in NYC’s data community: in addition to being a PhD candidate at NYU, he is IA Ventures’ “Scientist-in-Residence”, a co-organizer of Data Gotham, and co-founder of DataKind.
John is a Ph.D. candidate in the Department of Psychology at Princeton University, where he leverages his mathematical modeling and machine learning chops to understand human decision-making.
What people have said: “Drew and John have written an excellent book on presenting machine learning concepts like classification, clustering, recommendation, network graphs, and SVMs to name a few. The authors do a great job of presenting how to apply these machine learning algorithms and explain the general concepts of the algorithms.” (via Amazon)
Eric is a principal architect at Cloudera and an active speaker on large-scale data processing, integration, and system management. Prior to Cloudera, he worked at various startups for over a decade as a DBA, sysadmin, software engineer, and system architect.
What people have said: “Whether the topic is HDFS and how data is ingested and replicated, or how Map/Reduce “finds” the most suitable node to run it’s tasks on, or what the cost and performance advantages are of adopting the shared-nothing, commodity model recommended for Hadoop clusters, etc., etc., etc., this book provides the how, what, when, where and why of Hadoop (the missing manual, of sorts).” (via Amazon)
Those who have met Russell (or followed him on Twitter) know him as a hilarious force of nature, but his data science chops are no joke. After working at Ning and LinkedIn, Russell is now Hortonworks’ Hadoop Evangelist.
What people have said: “This is definitely the best book I’ve ever written.” (Nice review, Russell…) (via O’Reilly)