The Wolfram Programming Language For Big Data


Stephen Wolfram, founder of Wolfram Research and creator of Mathematica, recently announced the new Wolfram Programming Language. I’ve called attention to it before, I’m doing so again now, and I’ll no doubt do so again in the future. This new knowledge-based language is, in my opinion, going to be a game changer in Big Data, data science, and computer science in general.

In the video below, Wolfram introduces the Wolfram Language, demonstrating the concepts of symbolic and functional programming, the querying of large databases with powerful visualization support, interactivity, and much more.
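To give a small, concrete taste of the first two concepts, here is a minimal sketch using the Wolfram Client Library for Python (wolframclient), which evaluates Wolfram Language expressions from a Python session. It assumes a local Wolfram Engine or Mathematica kernel is installed, and the expressions are my own illustrative picks, not ones taken from the video.

```python
from wolframclient.evaluation import WolframLanguageSession
from wolframclient.language import wl, wlexpr

# Assumes a local Wolfram Engine / Mathematica kernel is available on this machine.
with WolframLanguageSession() as session:
    # Symbolic programming: the result is an exact symbolic expression,
    # not a floating-point approximation.
    antiderivative = session.evaluate(wlexpr("Integrate[Sin[x]^2, x]"))
    print(antiderivative)

    # Functional programming: map a built-in function over a range,
    # with no explicit loop.
    primes = session.evaluate(wl.Map(wl.Prime, wl.Range(10)))
    print(primes)  # the first ten prime numbers
```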

To Model Or Not To Model: Is That The Question?

The gist of this article is that the exercise of data modeling is just as important when using big data and NoSQL technologies as it is when using more traditional, relational-algebra-based technologies.

This conclusion came after a series of experiments was performed pitting Cloudera’s Hadoop distribution against an unidentified ‘major relational database’. A suite of 5 business questions was distilled into either SQL for the relational database or HQL for execution against Hadoop with Hive on top. For each query, against each data store, 5 experimental scenarios were explored (scenarios 1 and 4 are sketched in code after the list):

  1. Flat Schema vs. Star Schema
  2. Using compressed data vs. uncompressed in Hadoop
  3. Indexing appropriate columns
  4. Partitioning the data by date
  5. Comparing Hive/HQL/MapReduce to Cloudera Impala
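To make scenarios 1 and 4 concrete, here is a minimal, hypothetical sketch of the Hive side using the Python pyhive client. The host, database, table, and column names are invented for illustration; the original experiment’s schema and queries are not reproduced in this post.

```python
from pyhive import hive  # pip install 'pyhive[hive]'

# Hypothetical connection details; substitute your own HiveServer2 endpoint.
conn = hive.Connection(host="hive-gateway.example.com", port=10000, database="sales")
cursor = conn.cursor()

# Scenario 4: a fact table partitioned by date, so a query that filters on
# order_date only scans the matching partitions instead of the whole table.
# ORC is a compressed, columnar format (relevant to scenario 2 as well).
cursor.execute("""
    CREATE TABLE IF NOT EXISTS fact_orders (
        order_id BIGINT,
        customer_key INT,
        product_key INT,
        amount DOUBLE
    )
    PARTITIONED BY (order_date STRING)
    STORED AS ORC
""")

# Scenario 1, star-schema style: join the narrow fact table to a small
# dimension table rather than scanning one wide, flat table.
cursor.execute("""
    SELECT d.region, SUM(f.amount) AS revenue
    FROM fact_orders f
    JOIN dim_customer d ON f.customer_key = d.customer_key
    WHERE f.order_date BETWEEN '2014-01-01' AND '2014-03-31'
    GROUP BY d.region
""")
for region, revenue in cursor.fetchall():
    print(region, revenue)
```

Note that both the partitioning clause and the fact/dimension split are modeling decisions made up front; Hadoop does not make them for you.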

Details of the experiment and intermediate results can be found in the article, but at a macro level the results were mixed, with one clear exception: a flat, un-modeled schema is not a scenario from which one should expect good performance. As the article points out, the question is not whether one should model, but rather how and when.

The Hadoop experiment: To model or not to model – The Data Roundtable


Tamara Dull, Director of Emerging Technologies at SAS Best Practices, wrote the following at SAS’ “The Data Roundtable” blog:

It was refreshing to see that the RDBMS skills some of us have developed over the years still apply with these new big data technologies. And while discussions of late binding (i.e., applying structure to the data at query time, not load time) work their way down our corporate hallways, we are reminded once again that “it depends” is a far more honest and accurate answer than it’s been given credit for in the past.
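For readers unfamiliar with the term, late binding (often called schema-on-read) is easy to illustrate. The sketch below uses made-up data: the raw delimited text carries no declared schema, and column names and types are bound only in the code that queries it.

```python
import csv
from io import StringIO

# Hypothetical raw export: data landed as plain delimited text, with no schema
# enforced at load time (no schema-on-write step at all).
raw_dump = """\
1001,2014-02-07,EMEA,249.50
1002,2014-02-07,AMER,18.25
1003,2014-02-08,EMEA,99.25
"""

# Late binding: the structure (column names and types) is applied only now,
# at query time, by the code that reads the data.
columns = ["order_id", "order_date", "region", "amount"]

def read_orders(text):
    for row in csv.reader(StringIO(text)):
        record = dict(zip(columns, row))
        record["order_id"] = int(record["order_id"])
        record["amount"] = float(record["amount"])
        yield record

# The "query": total revenue per region, computed against the freshly bound schema.
totals = {}
for rec in read_orders(raw_dump):
    totals[rec["region"]] = totals.get(rec["region"], 0.0) + rec["amount"]
print(totals)  # {'EMEA': 348.75, 'AMER': 18.25}
```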

To model or not to model is no longer the question. How and when to model is.