Since we do a lot of experimenting with data, we’re always excited to find new datasets to use with Mortar. We save bookmarks and share datasets with our team on a nearly daily basis.
There are tons of resources throughout the web, but given our love for the data scientist community, we thought we’d pick out a few of the best dataset lists curated by data scientists.
Below is a collection of six great dataset lists from both famous data scientists and those who aren’t well-known:
2) Hilary Mason is a Data Scientist in Residence at Accel Partners (and one of Mortar’s advisors!). She was previously Chief Scientist at Bitly. Hilary uses a Bitly bundle to store new datasets she finds: https://bitly.com/bundles/hmason/1
3) Kevin Chai most recently worked as a research fellow for the Centre of Health Informatics at the University of New South Wales in Sydney, Australia. While Kevin may not be as famous as others on this list, his collection of datasets is great: http://kevinchai.net/datasets
4) Jeff Hammerbacher is Co-Founder and Chief Scientist of Cloudera. He previously ran Facebook’s data team and even (with DJ Patil) coined the term “data scientist”. Here’s Jeff’s list: http://www.quora.com/Jeff-Hammerbacher/Introduction-to-Data-Science-Data-Sets
(Bonus: A number of data scientists, including Jeff, contributed to this Quora thread about where to find large datasets: http://www.quora.com/Data/Where-can-I-find-large-datasets-open-to-the-public)
5) Jerry Smith is Chief Data Scientist at 3i-MIND and an Adjunct Professor at Nova Southeastern University. Jerry has put together a nice collection of data repositories:
6) Gregory Piatetsky-Shapiro is the President of KDnuggets and a founder of the KDD (Knowledge Discovery and Data Mining) conferences. He assembled a long list of datasets on the KDnuggets site: