Math is the Secret to Big Data

VentureBeat had an article the other day discussing how math is the real secret to unlocking “Big Data”. I agree, and in fact would go further and agree with the oft-quoted notions that math is the universal language and the ultimate truth. It’s hard to argue against a concept that has a mathematical underpinning, particularly if a formal proof has been produced.

The intersection of math and “Big Data” lies in the algorithms: first, those used to explore the data looking for patterns (either as commonalities or as outliers), which come to represent something new we’ve learned about the data, a new insight; and second, those developed to take advantage of that insight. In the ‘old days’, we relied on the intuition of domain experts to tell us where to look in the data to find these ‘insights’.
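To make the "outliers" half of that exploration concrete, here is a minimal sketch in Python. It is not taken from the VentureBeat article; the function name `find_outliers`, the sample `readings`, and the choice of a median-based (modified z-score) test with a cutoff of 3.5 are all assumptions made for illustration. Many other techniques would serve equally well.

```python
from statistics import median

def find_outliers(values, threshold=3.5):
    """Return the values whose modified z-score exceeds the threshold.

    Hypothetical helper for illustration; the 3.5 cutoff is a common
    rule of thumb, not something prescribed by the article.
    """
    med = median(values)
    # Median absolute deviation: a measure of spread that a few
    # extreme values barely affect.
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []  # no spread to measure against
    # 0.6745 rescales the MAD so the score is comparable to a
    # standard z-score when the data are roughly normal.
    return [v for v in values if 0.6745 * abs(v - med) / mad > threshold]

# Made-up sample data: nothing here says what the numbers mean.
readings = [10.1, 9.8, 10.3, 10.0, 9.9, 42.0, 10.2]
print(find_outliers(readings))
```

Note that nothing in the function knows whether the readings are temperatures, prices, or response times. Someone still has to decide whether a flagged value is an error or a discovery, but the search itself needs no domain knowledge.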

We’ve always had data, and it has always had the potential to be ‘big’. What’s different today is that we have the ability (physically and financially) to collect that data to the point where its potential to be big is realized, and we have readily accessible technology and tools to process and query these large amounts of data. If we have appropriate algorithms and appropriately large data sets to run them against, the intuition of domain experts is less of a necessity.

The author of the article, Narinder Singh, makes the comment that a well-known professor once said to him, “Big data is misnamed in our (academic) world, because data sets have always been big. What is different is that we now have the technology to simply run every scenario. Before, intuition was critical as you could otherwise spend months chasing a concept. Now, set up correctly, we can just run or solve the model like an equation.” I’d like to meet this professor; she sounds quite smart.

Math is the universal language. It crosses all domains. If we have enough of the right data, have made appropriate use of domain expertise, and have used math to create algorithms that abstract the problems at hand away from the domain, then that expertise can be redeployed elsewhere: our problem is now defined in a way that no longer relies deeply on the domain or its expertise.

Two additional quotes from the article:

[…] Math is to data what abstraction layers are to software development. With it, we can translate physics and energy for the International Space Station, drug discovery, DNA search efficiency, finding the tomb of Genghis Khan, and nearly everything else into a common vocabulary for problem statements.

[…] Once the math is done, domain knowledge is not as important; it allows for skills to be leveraged across industries/domains, and empowers broader use of core technologies to help in the optimization and solving of key questions. It creates an API to our problem.