Don’s first presentation, given at the NYC Pig User Group Meetup, was entitled “Pig vs. MapReduce: When, Why, and How”. In it, Don discusses how he chooses between Pig and MapReduce by considering developer time and processing time, maintainability and deployment, repurposing engineers who are new to Java and Pig, and a number of other factors.
There are numerous reasons to use Pig, but I think a lot of people were still surprised that the author of a book on MapReduce is such a huge Pig fan.
Here is a time-stamped overview of Don’s talk, along with the slides and video:
0:20 - Don’s background
3:46 - When should you use Pig?
5:11 - Why should you use Pig?
9:34 - Things that are harder to express in Pig
11:29 - Using Pig’s MapReduce relational operator for combining MapReduce and Pig scripts
14:04 - Calculating time you’ll spend using Pig vs. MapReduce (developer time vs. processing speed)
18:20 - Why is development so much faster in Pig?
21:50 - Speed to productivity in Pig vs. Java/MapReduce (repurposing engineers and SQL programmers)
24:36 - Leveraging UDFs with Pig
30:01 - Maintainability and Deployment of Pig vs. MapReduce
36:07 - Things that are harder to do with Pig
43:17 - Pig vs. Hive vs. MapReduce
56:30 - Pig/MapReduce analogies
58:22 - Presentation summary
We hope you enjoy Don’s presentation as much as we did. We’ll post Don’s talk at the NYC Data Science Meetup on “Hadoop for Data Science” soon. [Update: The slides and video for this talk are now posted here.]