This is a guest blog by Jules Miller of Hire an Esquire originally featured on the Evolve Law blog.

There is a whole lot of noise about Big Data in the legal industry, with articles and panels seemingly every day, but it’s unclear whether there is a real understanding of what Big Data actually means. As Dan Ariely, Professor at Duke University and founder of BEworks, said, “Big Data is like teenage sex: everyone talks about it, nobody really knows how to do it, everyone thinks everyone else is doing it, so everyone claims they are doing it.”

What is Big Data, really? According to trusty source Wikipedia, “Big Data is a broad term for data sets so large or complex that traditional data processing applications are inadequate.”

Big Data was originally about the ability to store and process data. In the late 1990s and early 2000s, when the term “Big Data” was coined and growing in popularity, there weren’t adequate tools to store and process larger data sets. Now there are, driven by technology like Hadoop, which reached its 1.0 release in December 2011. In fact, there is now an entire industry of tools, infrastructure, analytics, visualization, distributors, Infrastructure as a Service (IaaS), and structured databases to help us process large data sets. So if we can now all process lots of data simply and relatively cheaply, what exactly is Big Data? Why don’t we call it “analytics” or even just “data”?

Small Data: Darwin’s Brain
Charles Darwin once stated, “My mind seems to have become a kind of machine for grinding out general laws out of large collections of facts.” Darwin’s brain is a good example of small data. Small data is generally under 10 gigabytes, so it’s a data set that you can analyze without needing anything fancier than the laptop you bought at Best Buy.

Big Data: Google
Multiply Darwin’s brain by literally a million, and at the other end of the spectrum is something like Google. Google processes 100 petabytes of data every single day, which is 100 million gigabytes. It has 3 million servers to store 15,000 petabytes of data (15 exabytes). As a comparison, all printed words in the history of mankind in all languages (books, legal documents, etc.) amount to only 200 petabytes of data, which Google processes in 2 days, and all words ever spoken by human beings amount to 5 exabytes of data, a third of what Google stores. That’s Big Data, and it requires tools like Hadoop and distributed databases. Only a handful of companies can truly claim to be Big Data, and if you’re not Google, Twitter, Facebook or Amazon, it probably doesn’t apply to you.
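If you want to sanity-check those comparisons yourself, here is a minimal sketch of the arithmetic. It assumes decimal (SI) units, where 1 petabyte is 1 million gigabytes and 1 exabyte is 1,000 petabytes; the variable names are ours, and the figures are simply the ones quoted above.

```python
# Assumption: decimal (SI) units, not binary (1024-based) ones.
GB_PER_PB = 1_000_000   # gigabytes in one petabyte
PB_PER_EB = 1_000       # petabytes in one exabyte

# Figures quoted in the article.
google_daily_pb = 100       # petabytes Google processes per day
google_stored_pb = 15_000   # petabytes Google stores
printed_words_pb = 200      # all printed words, in petabytes
spoken_words_eb = 5         # all words ever spoken, in exabytes

daily_gb = google_daily_pb * GB_PER_PB                 # daily volume in gigabytes
stored_eb = google_stored_pb / PB_PER_EB               # stored volume in exabytes
days_for_printed = printed_words_pb / google_daily_pb  # days to process all printed words
stored_vs_spoken = stored_eb / spoken_words_eb         # stored data relative to spoken words

print(f"Processed per day: {daily_gb:,} GB")
print(f"Stored: {stored_eb:g} EB")
print(f"Days to process all printed words: {days_for_printed:g}")
print(f"Stored data vs. all spoken words: {stored_vs_spoken:g}x")
```

Under those assumptions the numbers in the paragraph hold together: 100 million gigabytes a day, 15 exabytes stored, 2 days to get through every printed word, and three times the volume of everything ever spoken.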

What Does this Mean for Legal Tech?
Big Data for law is a misnomer. Quite simply, the legal industry doesn’t have enough data to be “big.” There is a lot of data reviewed and processed very effectively by eDiscovery technologies, and emerging tech companies such as Lex Machina, Juristat, eBrevia and Justly are doing incredible work analyzing various legal data sets. However, these data sets are still mostly measured in gigabytes or maybe terabytes, nowhere near the petabytes and exabytes of Google. These types of legal analytics tools are still enormously useful to law firms and their clients, but are they really Big Data? And should we care?

We Have Medium Data
“Big Data is bullshit,” according to Harper Reed, former CTO of Obama for America. “You probably have Medium Data.” In the legal industry, we have Medium Data, not Big Data, so let’s just embrace it. Bigger isn’t always better, and the legal industry can do some pretty amazing things with small and medium-sized data sets.
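To make the small/medium/big distinction concrete, here is a rough sketch of how you might bucket a data set by size. The 10-gigabyte cutoff for small data comes from the definition earlier in this post; the 1-petabyte cutoff for Big Data is our own assumption, chosen only because the post treats petabytes and exabytes as Google territory. The function name `data_tier` is hypothetical too.

```python
# Sizes are expressed in gigabytes, using decimal (SI) units.
GB = 1
TB = 1_000 * GB
PB = 1_000 * TB

SMALL_LIMIT_GB = 10 * GB  # from the post: small data is generally under 10 GB
BIG_LIMIT_GB = 1 * PB     # assumption: petabyte scale and above counts as "big"


def data_tier(size_gb: float) -> str:
    """Return 'small', 'medium' or 'big' for a data set of the given size in GB."""
    if size_gb < SMALL_LIMIT_GB:
        return "small"
    if size_gb < BIG_LIMIT_GB:
        return "medium"
    return "big"


print(data_tier(2 * GB))    # a laptop-sized data set
print(data_tier(5 * TB))    # a terabyte-scale legal data set
print(data_tier(100 * PB))  # Google's daily processing volume
```

By this yardstick, a legal data set measured in gigabytes or terabytes lands squarely in the medium bucket, while only something on the order of Google’s daily volume counts as big.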

For example, at Hire an Esquire we have 4,500 screened and vetted attorneys in our network and track dozens of data points so that we can match people with the best jobs for their skillset and personality. Compared to ‘big’ data, our data set is tiny. What makes this data interesting is that it’s relatively homogeneous. Attorneys, for the most part, have similar education levels, work experience and personalities compared to the wildly variable data that Google collects. For us, tiny nuances matter: whether a candidate’s average response time is 10 minutes or 1 hour can make the difference between a successful placement and an unsuccessful one.

Medium Data: Let’s Make it a Thing in Legal Tech
Medium Data as a term is not being widely used…yet…so should we use it when describing analytics in the legal industry? It’s yet another business buzzword, but then again so is Big Data. Also, it’s probably too generic to trademark. Perhaps we can call it Goldilocks data? Not too big, not too small, just right. Or maybe call it Grande data? It sounds big, but it’s really medium. Let’s not.

No other industries are using the term Medium Data, so we can name it and claim it for legal tech. Medium Data is, after all, a more accurate description of our legal data analytics. Let’s make Medium Data a thing for Legal Tech.

