If you want Hilary Mason, Drew Conway, or Max Shron to build your recommender for free, enter your email address here.
Because we run a platform for working with data, we’ve seen users tackle lots of interesting use cases: log analysis, natural language processing, pattern detection, and many more.
However, perhaps no use-case is in greater demand than recommender systems. If you have more “inventory” than your users can easily find (whether it’s news, jobs, videos, restaurants, vacations, recipes, apps, etc.), a great recommender is crucial to driving engagement.
The problem is that recommender systems are really hard to implement, so most companies either don’t have one or aren’t happy with what they have.
What makes recommenders so tough?
Our second NYC Data Science Meetup featured Tumblr data scientist Adam Laiacano, who discussed the analytics stack at Tumblr and the tools he and his team use to organize and analyze data.
Here are the video and slides from Adam’s talk, which cover Tumblr’s use of Scribe, Hive & Pig, Hue, and Vowpal Wabbit:
Thanks to everyone who came out to our inaugural NYC Data Science Meetup. For those who couldn’t attend, Hilary Mason fought off jetlag and a tough cold to give a great presentation.
Below is a 12-minute clip from Hilary’s talk, which she called “Dirty Secrets of Data Science.”
New York’s data science community has been building since long before “data science” was used to describe it. In addition to a long history of advertising and adtech companies, the recent startup explosion here in NYC has been largely led by companies built to leverage data science (including FourSquare, Tumblr, AppNexus, and Knewton, to name just a few).
“Big data” entered our language before anyone knew what it meant. So then we spent a lot of time discussing it: “Is it really about the ‘bigness’?”, “Isn’t it about non-relational data?”, “No wait, it’s about the need for speed.” This got boiled down to the three Vs (volume, variety, velocity), but then “big data” just meant three things, which didn’t clarify much at all.
So we, the tech community, are developing new vocabulary and distinctions, and in 2013, no one is going to say “big data” anymore. (Actually, given that Dilbert already skewered big data, its heyday may already be over.)
This is the life cycle of any good buzzword. A buzzword is born when something so new and important is happening that we need to talk about it before we understand it, while it is still amorphous. It refers to a family of related concepts. Then we develop greater understanding and distinctions, and pretty soon you’re embarrassed for your colleague when he trots out last year’s buzzword (remember Web 2.0?).
So what is the crux of “big data”? Why is it so new and important that we have to talk about it with a buzzword? In short, we’re all freaking out because old bottlenecks recently got shattered, the new bottlenecks are us and our existing tools, and mad riches are visible just over the horizon. (And it’s not just about riches; there’s also massive potential for human improvement.)