You have MongoDB, so you have this tremendously scalable database. You’re collecting a ton of data, but you know you need to do more with it (okay, a lot more). You think you want to use Hadoop, but it doesn’t sound easy.
To keep it simple, we’ve divided the article into three parts:
“WHY” explains the reasons for using Hadoop to process data stored in MongoDB
“HOW” helps you get set up
“DEMO” shows you MongoDB and Hadoop working together. If you’re the tl;dr type, you’ll want to start with this section.
Mortar co-founder Jeremy Karn gave this talk on using MongoDB data with Hadoop (and specifically with Apache Pig) at MongoSV.
Jeremy’s presentation covers the steps needed to read JSON from Mongo into Pig, parallel process it on Hadoop with sophisticated functions, and write back to Mongo.
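In rough strokes, that pipeline looks like the sketch below in Pig Latin, using the MongoLoader and MongoInsertStorage classes from the Mongo Hadoop connector. Treat it as illustrative: the jar paths, connection URIs, collection names, and field schema are placeholders, and the storage class arguments vary by connector version.

```pig
-- Register the connector jars (paths are illustrative)
REGISTER mongo-java-driver.jar;
REGISTER mongo-hadoop-core.jar;
REGISTER mongo-hadoop-pig.jar;

-- 1. Read JSON documents from a MongoDB collection into Pig
raw = LOAD 'mongodb://localhost:27017/mydb.events'
      USING com.mongodb.hadoop.pig.MongoLoader('user:chararray, clicks:int');

-- 2. Parallel process on Hadoop, e.g. total clicks per user
grouped = GROUP raw BY user;
totals  = FOREACH grouped
          GENERATE group AS user, SUM(raw.clicks) AS total_clicks;

-- 3. Write the results back to MongoDB
STORE totals INTO 'mongodb://localhost:27017/mydb.click_totals'
      USING com.mongodb.hadoop.pig.MongoInsertStorage();
```

The "sophisticated functions" in the talk would typically slot in as Pig UDFs in the middle step, replacing the simple SUM shown here.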
Jeremy was a big part of our contributions to the Mongo Hadoop connector, which we extended to work with Pig. MongoDB creator (and 10gen founder) Dwight Merriman also gave Mortar a nice shout out.