Why Big Data 101?

While the following events did not occur verbatim, they did happen in spirit.

At a recent family event I had the opportunity to talk with my younger brother. He graduated from the University of Wisconsin in Milwaukee a couple of years ago with a degree in computer science. Since graduating he’s found a full-time paid internship and converted it into a full-time engineering position, giving him three or so years of practical work experience.

During the course of the conversation, I kept having this nagging feeling of being misunderstood. It felt as though what I was saying was missing a common frame of reference for the topics of the conversation; that I was, in essence, speaking gibberish. I understand that my brother is young, much, much younger than I. I also understand that we were taught our craft in two very different ways, but still, we’re both technical and should have common ground; something was very wrong.

I stopped the conversation and bluntly asked him, “Do you know…”

  • what is meant by the term “Big Data”?
  • what Hadoop is?
  • what the MapReduce paradigm is?
  • why a NoSQL database would be used vs. a traditional RDBMS?

To each of these questions, the answers were similar. He had heard all of the terms I asked about, but he had no actual experience with any of them. He went on to tell me that his way of learning things was to wait until he was either told by a manager he would need a new skill or knowledge-set for an upcoming project or he came to discover it himself. In other words, he would learn something new only if it was required; there was no exploring or being proactive.

After I quickly reorganized my thoughts following the shock and awe that was just dropped before me, I saw an opportunity to reach out and be a mentor. I explained to him the importance of taking control over his own destiny and to actively manage his learning and career direction. In the world today, waiting for someone to tell you what to do just doesn’t cut it.

Further, he had been telling me how he was being put more and more into a leadership and project management role, and while he found he was good at it, and had been recognized as such by his managers, it wasn’t what he wanted to be doing. Of course, I made the foolish mistake of asking him what it is that he wants to do, only to get back the answer I deserved for such a silly question: “Gee, I’m not sure, I just know I don’t want to do that.” [1]

I told him that I would write up a list of links and come up with a self-education task plan for him around the big data space if he were interested. And that brings us to where we are now. That is why, over the next several weeks, I’ll be going back to the basics and creating a baseline primer series of articles and tutorials on “Big Data”, Hadoop and NoSQL.

  1. This is something that I’ve come to call the ‘Law of Hate’. It typically gets applied to customers, specifically in user interface design situations. In essence, it is impossible for a customer to tell you, the consultant, what they want. They can only tell you what they hate after you have shown them the two dozen mock-ups you have prepared.

Metacademy: Machine Learning and Probabilistic AI Learning Resources

I’ve recently come across a tremendous resource for the discovery of various machine learning and probabilistic artificial intelligence topics and associated educational materials. The site I’m describing is Metacademy.

Metacademy is a community-driven, open-source platform to facilitate the collaborative construction of a web of knowledge by domain experts meant to help individuals efficiently learn about any topic of interest (supported by Metacademy and the domain experts). The experts responsible for Metacademy are Roger Grosse and Colorado Reed. In addition to building the site, they organized roughly 350 machine learning and probabilistic artificial intelligence concepts along with related training and learning materials.

While Metacademy is currently focused on machine learning and probabilistic artificial intelligence topics, its goal is eventually to cover a much wider breadth of knowledge, e.g. mathematics, engineering, music, medicine, computer science, etc.

The premise of Metacademy is that a user will search for and click on a concept of interest. Metacademy then produces a “learning plan” that includes the prerequisite concepts identified in the web of knowledge previously created by the domain experts. This step of listing the prerequisite concepts for the student is what sets Metacademy apart from other learning sites or course catalogs.
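
To make the idea of a prerequisite-driven learning plan concrete, here is a minimal sketch in Python. It is not Metacademy’s actual code or data: the concept names, the `PREREQUISITES` table and the `learning_plan` function are all hypothetical, made up purely for illustration. It assumes the web of knowledge can be modeled as a mapping from each concept to its direct prerequisites, and it walks that mapping depth-first so that every prerequisite is listed before the concept that needs it.

```python
from typing import Dict, List, Set

# Hypothetical prerequisite graph (NOT Metacademy's real data):
# each concept maps to the concepts that should be learned first.
PREREQUISITES: Dict[str, List[str]] = {
    "logistic regression": ["linear regression", "maximum likelihood"],
    "linear regression": ["linear algebra basics"],
    "maximum likelihood": ["probability basics"],
    "linear algebra basics": [],
    "probability basics": [],
}


def learning_plan(target: str, graph: Dict[str, List[str]]) -> List[str]:
    """Return concepts in an order where prerequisites come first.

    This is a depth-first topological ordering of everything the
    target concept depends on, ending with the target itself.
    """
    plan: List[str] = []
    done: Set[str] = set()
    in_progress: Set[str] = set()

    def visit(concept: str) -> None:
        if concept in done:
            return  # already placed in the plan
        if concept in in_progress:
            raise ValueError(f"circular prerequisite involving {concept!r}")
        in_progress.add(concept)
        for prereq in graph.get(concept, []):
            visit(prereq)  # prerequisites of prerequisites, and so on
        in_progress.discard(concept)
        done.add(concept)
        plan.append(concept)

    visit(target)
    return plan


if __name__ == "__main__":
    plan = learning_plan("logistic regression", PREREQUISITES)
    for step, concept in enumerate(plan, start=1):
        print(step, concept)
```

With the made-up graph above, the target concept comes out last and each prerequisite appears before the concept that depends on it, which is the property a learning plan needs. A real site would presumably store far richer information per concept (resources, time estimates and so on); this sketch only covers the ordering.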

As posted at Metacademy:
… But try learning something of conceptual depth by sifting through Google search results … and you’re in for a lot of agony. Before you learn this concept, you need to learn its prerequisite concepts (sometimes you’re not entirely sure what these are), and the prerequisite concepts may have prerequisites themselves. Pretty soon, you’re deep in dependency hell, switching between twenty different tabs trying to understand the various [pre]prerequisite concepts in order to understand the tutorial article Google returned …

Metacademy’s learning experience revolves around two central components:

  • a “learning plan” in a tabular ‘list view’
  • a “graph view” of the learning plan which is meant to help explore relationships among concepts

Clicking on the check-mark next to the title of a concept in either the graph or list view marks that [prerequisite] concept as understood. To hide the concepts that have been marked as understood, click the “hide” button in the upper right. Note that Metacademy will remember which concepts were marked as understood and hidden, and will automatically re-apply these selections on future visits.

As Metacademy is a work in progress and limited in scope, please keep an open mind when visiting, but I think that you will find it an interesting, unique and valuable resource, particularly if you are, as I am, actively exploring the world of machine learning.


Improve your Digital Analytics Skills with Google’s … – Analytics Blog

See on Scoop.it: Evidence Based Systems

That’s why today we’re excited to announce Analytics Academy — a new hub for you and your colleagues to participate in free, online, community-based video courses about digital analytics and Google Analytics.

mike pluta’s insight:

Another free resource from which to learn about analytics, statistics, data visualization and big data concepts.

See on analytics.blogspot.com