October 20, 2014

John Matson

Are you a Pythonista, or are you an R devotee? Are you all about ggplot2, or do you entrust your visuals to matplotlib?

With Mortar, it doesn’t matter which camp you belong to. That’s because we have just added R support, so that you can readily access all your favorite technologies on the Mortar platform.

Read More

Hadoop Weekly is a recurring guest post by Joe Crobak. Joe is a software engineer focused on Hadoop and analytics. You can follow Joe on Twitter at @joecrobak.

With Strata + Hadoop World this week, there were a number of partnership announcements and software releases. Among them, Cloudera and Hortonworks released new versions of their distributions, MapR is bundling MapR-DB with their community edition, and Pivotal announced plans for the Tachyon project. There are also several good technical posts this week covering Sqoop, Kafka, Presto, Hive, and Scala as a language for data processing. I tried to cover the key news from the week but likely missed some stories given the Strata + Hadoop World tsunami. Please let me know if there’s something you think should be in next week’s newsletter.

Read More

John Matson

Recently Jake Porway, the founder and executive director of DataKind, visited the NYC Data Science meetup to talk about his organization’s work using data for good. DataKind is a nonprofit that pairs data scientists with mission-driven organizations, and Jake gave a few examples of fruitful efforts that mapped indicators of childhood wellbeing in Washington, D.C., or figured out whether widespread tree-pruning makes New Yorkers any safer.

Jake also spoke about some of the lessons learned in working with myriad organizations where data science has not historically been a priority, and closed with some thoughts about the promise and potential pitfalls of data science.

Read More

John Matson

At Mortar, we’re all about providing our users with versatile, powerful, easy-to-use tools that enable them to solve thorny data challenges. And few tools meet that description as well as Luigi, the open-source framework for building scalable, resilient data pipelines.

So today we’re proud to announce that Luigi has been fully integrated into Mortar. Our users can now seamlessly use Luigi within the Mortar platform to develop complex data pipelines using Hadoop or non-Hadoop technologies, deploy those pipelines to run in the cloud with a single command, and then monitor their progress with the Mortar web app.

Read More

With Strata+Hadoop World taking place this week in New York, we can expect to see a lot of announcements. But a number of folks have jumped out ahead of the conference, and there are several partnership and technical announcements in this week’s issue. On the technical side, Databricks posted a benchmark for terasort on Spark, and eBay has open-sourced Kylin, their Hadoop OLAP system. If you’re in NYC for Strata+Hadoop World, be sure to check out some of the 14 meetups happening this week!

Read More

It’s a relatively quiet week with only two releases (the calm before the Strata + Hadoop World storm?). In the technical and news areas, two themes are playing out this week. First, there is a lot of great content on stream processing frameworks, namely Storm and Spark Streaming. Second, there are several articles about integrating YARN with other systems and frameworks (OpenStack, Mesos, AWS). There are also pieces on Spark MLlib, RStudio on Amazon EMR, and the cost-based optimizer for Hive: something for everyone.

Read More

This week’s issue has a lot of great content. It includes new open-source projects from Netflix and LinkedIn, several articles about Apache Spark (including details from Hortonworks on their plans for it), and news on Cascading on Tez. There’s also coverage of news in the ecosystem and several additional releases.

Read More

This week’s issue includes a number of posts covering the recently released Apache Spark 1.1, Apache Drill 0.5.0-incubating, and Apache Tez 0.5.0. In addition, there’s a look at Hadoop in the healthcare industry, a look at ORCFile for non-Hive workloads, instructions for building a Hadoop setup on Mac, and more. The amount of content this week shows that we’re past the summer lull, and I expect to see lots more great content this fall.

Read More

There were several releases in the Hadoop ecosystem this week, including Apache Hadoop 2.5.1 and Apache Spark 1.1.0. There’s a lot of interesting technical content, including testing HBase’s consistency with Jepsen and an in-depth look at an end-to-end big data infrastructure with Hadoop. On that note, there’s an interesting look into the growing demand for Data Engineers to build out Hadoop infrastructure.

Read More

John Matson

At Mortar, we provide all our users with excellent security. And that includes customers working with sensitive data—lately, we’ve been hearing from more and more of them.

So we are pleased to announce that we now offer an Advanced Security package for customers with strict security compliance requirements or with unique business demands that require additional data protection.

Read More

While last week’s issue had posts covering a few common themes, this week’s issue has content on a wide range of topics, including Spork (Pig on Spark), Hive (specifically the new Stinger.next initiative), and Presto. There is also some interesting news from established enterprise companies: Teradata has acquired Think Big Analytics, and Cisco has released management and monitoring software for Hadoop.

Read More

This week’s issue features a lot of good technical content covering Apache Storm and Apache Spark. There are also a number of releases—Apache Flink, Apache Phoenix, Cloudera Enterprise, and Luigi. In addition, Hortonworks announced a technical preview of Apache Kafka support for HDP, and SequenceIQ unveiled Periscope, an open-source tool for YARN cluster auto-scaling.

Read More

This week’s edition has a lot of great technical content from prominent Hadoop vendors Hortonworks and Cloudera as well as newcomer SequenceIQ. There are also a couple of interesting articles based on real-world experience covering an A/B testing platform and Apache ZooKeeper. Those types of articles tend to be quite good but more difficult to find. As always, if you have suggestions for the newsletter, please send them my way!

Read More

August 21, 2014

K Young

Today we have a huge announcement: Mortar is now free for accounts with up to three users.

Our mission at Mortar is to help data scientists and data engineers spend 100% of their time on problems that are specific to their business—and not on time-wasters like babysitting infrastructure, managing complex deploys, and rebuilding common algorithms from scratch. But for us to succeed at our mission, we need to make Mortar not just an amazing product, but also affordable for everyone.

Read More

August 19, 2014

K Young

If you’ve used Hadoop, you know that the overhead time needed to provision and run small jobs can be painful. Most likely you kill time by grabbing coffee every time you test something, and pretty soon your hands are shaking from all that testing.

It doesn’t have to be like this. As of today you can run small jobs from Mortar in seconds. How? Choose to execute your job without a cluster, and we’ll skip provisioning and distributed computation—so you can get answers fast.

Read More