Putting Unstructured Data Into Context: What the Cosmos Can Teach Us About Unstructured Data
How is it that something can be so incredibly large and minutely small at the same time? If you're as fascinated by natural science as I am, then you're likely also watching Neil deGrasse Tyson's reboot of the 'Cosmos' series. Maybe it makes you think about planets, our solar system, or maybe our galaxy. You may have thought about it in the opposite direction, in the context of atoms, neutrons, electrons, and protons. Regardless of which direction you go, there always seems to be something bigger or something smaller. Not surprisingly, this concept rings true for many things.
I am always thinking about how to describe the magnitude of unstructured data to people who are not familiar with the subject. It occurred to me that unstructured data can be compared to something immense that all of us are familiar with – our galaxy, the Milky Way. The Milky Way contains somewhere between 200 and 800 billion stars and planets, and is just one of an estimated 176 billion galaxies (that we know of).1
To be conservative, let's imagine every Fortune 500 company has just 1 petabyte of data. Combined, that would be 1.3337887e+12 files (roughly 1.33 trillion). That is about 1.7 times the number of stars and planets in the Milky Way Galaxy at the high end of current estimates. No matter how you slice it, that's a lot of files.
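For readers who want to check the arithmetic, here is a rough back-of-the-envelope sketch in Python. It uses only the figures quoted above (500 companies, 1 petabyte each, 1.3337887e+12 files, and the 800 billion high-end estimate for the Milky Way). The average file size is not stated in this article, so the script backs it out from those figures rather than assuming one, and it assumes a decimal petabyte of 10^15 bytes. The variable and function names are purely illustrative.

```python
# Back-of-the-envelope check of the Fortune 500 file count.
# Assumption (not from the article): 1 petabyte = 10**15 bytes (decimal definition).

COMPANIES = 500                   # the Fortune 500
PETABYTE_BYTES = 10**15           # assumed decimal petabyte
TOTAL_FILES = 1.3337887e12        # combined file count quoted in the article
MILKY_WAY_HIGH_ESTIMATE = 800e9   # high-end count of stars and planets quoted in the article


def implied_average_file_size(total_bytes: float, total_files: float) -> float:
    """Return the average file size (in bytes) implied by a byte total and a file count."""
    return total_bytes / total_files


total_bytes = COMPANIES * PETABYTE_BYTES
avg_file_bytes = implied_average_file_size(total_bytes, TOTAL_FILES)
files_per_company = TOTAL_FILES / COMPANIES
ratio_to_galaxy = TOTAL_FILES / MILKY_WAY_HIGH_ESTIMATE

print(f"Implied average file size: {avg_file_bytes / 1000:.0f} KB")
print(f"Files per company:         {files_per_company:.3e}")
print(f"Files vs. Milky Way (high estimate): {ratio_to_galaxy:.2f}x")
```

Under these assumptions, the quoted total works out to an average file of a few hundred kilobytes and a couple of billion files per company. Change the assumed petabyte definition or the file count, and the implied average shifts accordingly.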
Trying to imagine that number of stars is entertaining, but trying to imagine getting control of that many files is not nearly as amusing.
What's holding all these stars and planets together in the Milky Way is a huge black hole at the center. That's just about how people view Active Directory in their galaxy of files – the force holding it all together that seems like a huge black hole. The galaxy is massive, complex, and always changing. Scientists use the most sophisticated and well-constructed tools and methodologies to understand the critical forces governing the behavior of all that stuff. But most folks find themselves trying to manage their unstructured data with the basic tools that ship with the operating system and with methods on par with spreadsheet macros. You might as well try to find the secrets of the night sky with a magnifying glass.
Back to the original question: How can something so large be so small at the same time? The Fortune 500 is a galaxy's worth of unstructured data. But that's just a drop in the bucket when you think about all the unstructured data that's out there. And we're only talking about what's been identified so far. We know our customers are finding more data every time they go looking for it. People think every file they make is so small, but each file is another star or planet in the galaxy their organization is building up.
It's fine to tickle your interest in the stars by taking a little telescope into your backyard. But when you're going to get serious about understanding a problem this large and trying to gain control over it, you'd better make sure you are ready for the type of investment that can handle billions and billions of points in your sky.
Adam Laub is Vice President, Marketing at STEALTHbits Technologies.