For a long time, data scientists and engineers had to choose between leveraging the power of Hadoop and using Python’s amazing data science libraries (like NLTK, NumPy, and SciPy). It was a painful choice, and one we thought should be eliminated.
So about a year ago, we solved this problem by extending Pig to work with CPython, allowing our users to take advantage of Hadoop with real Python (see our presentation here). To say Mortar users have loved that combination would be an understatement.
However, only Mortar users could use Pig and real Python together…until now.
Have you always wanted your own Twitter dataset to work with? Well, you can have one for free: our open source Twitter Gardenhose.
If you want to take advantage of the Twitter Gardenhose, you have two options:
- Read it directly from our S3 bucket
- Store it in your own S3 bucket: the README describes how to deploy on Heroku, which should take about 30 minutes to set up and get running. It’s a surprisingly simple Node.js app.