As data geeks, we get some good-n-geeky data questions. A common one is, “What criteria should I use to decide which data to store and which to discard?”
Which is great, because I love questions with straightforward answers. The answer: if you think there’s even a slight possibility it could be valuable at some point, store it. For most people, that means STORE ALL OF YOUR DATA.
But isn’t it a waste of money to save all of that data if you’re not sure whether you’ll use it? No — it’s probably not.
Innovation and aggressive competition are driving a rapid decline in storage costs, and new technologies are making data analysis far more accessible. As a result, the potential opportunity cost of discarding data far outweighs what you’d spend on cloud storage to keep it.
For example, let’s take a look at a particularly popular dataset: the Twitter Firehose. [For those who aren’t familiar, the Twitter Firehose is the real-time stream of every public tweet on Twitter.]

