Our CEO, K Young, was interviewed following his talk on MongoDB and Hadoop at MongoNYC. Here’s a quick rundown of the highlights:
- The difference between Mortar and the raw infrastructure of Amazon Elastic MapReduce (4:09)
- Why Mortar cares about collaborative and repeatable data science (5:49)
- Using Hadoop with MongoDB (11:24) [We’ve also written and spoken about MongoDB and Hadoop in the past few months.]
- Making the business case for Hadoop (15:42)
Here’s the embedded video:
You have MongoDB, so you have this tremendously scalable database. You’re collecting a ton of data, but you know you need to do more with it (okay, a lot more). You think you want to use Hadoop, but it doesn’t sound easy.
To keep it simple, we’ve divided the article into three parts:
- "WHY" explains the reasons for using Hadoop to process data stored in MongoDB
- "HOW" helps you get set up
- "DEMO" shows you MongoDB and Hadoop working together. If you’re a tl;dr type, you’ll want to start with this section.
Mortar co-founder Jeremy Karn gave this talk on using MongoDB data with Hadoop (and specifically with Apache Pig) at MongoSV.
Jeremy’s presentation covers the steps needed to read JSON from Mongo into Pig, process it in parallel on Hadoop with sophisticated functions, and write the results back to Mongo.
Jeremy was a big part of our contributions to the Mongo Hadoop connector, which we extended to work with Pig. MongoDB creator (and 10gen founder) Dwight Merriman also gave Mortar a nice shout-out.
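If you want a feel for the read, process, write-back flow the talk walks through, here is a minimal local sketch in plain Python. It is not the Mongo Hadoop connector or Pig, and it does not touch a real database or cluster: the sample records, the field names (`user`, `bytes`, `total_bytes`), and the `run_pipeline` helper are all hypothetical, and a thread pool stands in very loosely for Hadoop's parallelism.

```python
import json
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sample records standing in for JSON documents
# read from a MongoDB collection.
RAW_DOCS = [
    '{"user": "a", "bytes": 10}',
    '{"user": "b", "bytes": 5}',
    '{"user": "a", "bytes": 7}',
]


def parse(line):
    """Step 1: turn one JSON string into a Python dict."""
    return json.loads(line)


def run_pipeline(lines, workers=2):
    """Read JSON, aggregate per user, and shape output documents."""
    # Parse records concurrently (a toy stand-in for parallel processing).
    with ThreadPoolExecutor(max_workers=workers) as pool:
        docs = list(pool.map(parse, lines))

    # Step 2: group by user and sum the bytes field.
    totals = defaultdict(int)
    for doc in docs:
        totals[doc["user"]] += doc["bytes"]

    # Step 3: build the documents that would be written back,
    # sorted by user so the output order is deterministic.
    return [
        {"_id": user, "total_bytes": total}
        for user, total in sorted(totals.items())
    ]


if __name__ == "__main__":
    for out_doc in run_pipeline(RAW_DOCS):
        print(json.dumps(out_doc))
```

In a real setup, the first and last steps would be handled by the connector's loader and storage functions rather than an in-memory list, and the grouping would run as a Hadoop job.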