The following guest post is from Mortar Data user Dave Fauth. Dave is a senior architect and systems engineer at Intelliware Systems. You can follow Dave on Twitter at @davefauth.
Recently, Mortar integrated Pig with CPython and had that work committed into the Apache Pig trunk. This lets users take advantage of Hadoop with real Python: you focus on just the logic you need, while streaming Python takes care of all the plumbing.
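To make that concrete, here is a minimal sketch of what a streaming CPython UDF looks like. The `pig_util` module and its `outputSchema` decorator ship with the Pig/Mortar CPython integration; the try/except fallback below is our own addition so the file can also run and be tested outside of Pig, and the `tokenize` function is a hypothetical example, not code from the Pig project.

```python
# A CPython UDF file for Pig's streaming_python integration (sketch).
try:
    from pig_util import outputSchema  # available when run inside Pig
except ImportError:
    # No-op stand-in so the module can be imported and tested locally.
    def outputSchema(schema):
        def decorator(func):
            return func
        return decorator

@outputSchema('words:bag{t:(word:chararray)}')
def tokenize(text):
    """Split a chararray into a bag of lowercase word tuples."""
    if text is None:
        return []
    return [(word,) for word in text.lower().split()]
```

On the Pig side you would register and call it with something like `REGISTER 'my_udfs.py' USING streaming_python AS my_udfs;` and then `FOREACH lines GENERATE my_udfs.tokenize(line);` (names here are illustrative).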
Shortly thereafter, Elasticsearch announced integration with Hadoop: “Using Elasticsearch in Hadoop has never been easier. Thanks to the deep API integration, interacting with Elasticsearch is similar to that of HDFS resources. And since Hadoop is more than just vanilla Map/Reduce, in elasticsearch-hadoop one will find support for Apache Hive, Apache Pig and Cascading in addition to plain Map/Reduce.”
For a long time, data scientists and engineers had to choose between leveraging the power of Hadoop and using Python’s amazing data science libraries (like NLTK, NumPy, and SciPy). It was a painful decision, and one we thought should be eliminated.
So about a year ago, we solved this problem by extending Pig to work with CPython, allowing our users to take advantage of Hadoop with real Python (see our presentation here). To say Mortar users have loved that combination would be an understatement.
However, only Mortar users could use Pig and real Python together…until now.
Our CEO, K Young, spoke at PyData NYC about using real Python with Pig, and why we integrated these two awesome languages. The audience asked some great questions, many of which you can see at the end of the video.
Here is the video (with slides just below):