Introduction to NoSQL and Neo4j Graph Database

What is NoSQL?

NoSQL is a word derived from the word SQL. SQL stands for Structured Query Language. SQL is a programming language used to manage data (retrieve data, insert data, update existing data) in a relational database management system.

So, then, what’s a relational database? If I were to write a complete description of relational databases, this post would become too long (and boring!) to read. In short, relational databases use a relation schema as the data model of the database. In a relational database, data is stored as a collection of relations (tables).

So again, returning to our question. What is NoSQL?

NoSQL doesn’t mean Not SQL or “no” SQL or the abjurement of SQL. It simply means NOT ONLY SQL. NoSQL databases share several common characteristics. They are:

  • Non-Relational
  • Cluster Friendly
  • CAP Theorem Instead of ACID
  • Schemaless

In relational database systems, the relation schema is the data model of the database. What, then, is the data model of a NoSQL database? There are several dominant data models for NoSQL databases. Based on those data models, NoSQL databases are commonly categorized as key-value stores, document stores, column-family stores and graph databases. This post focuses on the last of these.

Graph Data Model

A graph is the most generic data structure that we can think of when representing / storing structured, related data. A graph is a collection of vertices and edges.

Here in this example graph we have three vertices. Let V = {a, b, c} denote the set of vertices. If we denote the set of edges as E, we can write E = {ab, ac, bc}. The graph in the image is not a directed graph; that is, the direction of the edges doesn’t matter. In our example graph data model we will use directed graphs.
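To make the difference concrete, here is a minimal sketch in Python. Python is only my choice for illustration and has nothing to do with Neo4j itself, and the variable and function names are made up for this example. The undirected graph is stored as unordered pairs, the directed one as ordered (source, target) pairs.

```python
# Vertices and edges of the example graph: V = {a, b, c}, E = {ab, ac, bc}.
vertices = {"a", "b", "c"}

# Undirected: each edge is an unordered pair, so {a, b} is the same edge as {b, a}.
undirected_edges = {frozenset(pair) for pair in [("a", "b"), ("a", "c"), ("b", "c")]}

# Directed: each edge is an ordered (source, target) tuple, so direction matters.
directed_edges = {("a", "b"), ("a", "c"), ("b", "c")}


def connected_undirected(u, v):
    """True if an undirected edge joins u and v, in either order."""
    return frozenset((u, v)) in undirected_edges


def connected_directed(u, v):
    """True only if there is an edge pointing from u to v."""
    return (u, v) in directed_edges
```

In the undirected version, asking about (b, a) gives the same answer as asking about (a, b). In the directed version, an edge from a to b says nothing about an edge from b to a.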

Let’s consider the following graph:

GI Cast 01

In this image we have a graph made up of pictures of those who have been stranded on an island and arrows representing their (fully contrived for purposes of example) relationships (who KNOWS who).

A graph database stores data in two places: nodes and relationships.

In this graph “Gilligan” is a Node, as is each of the other island residents. Each node may contain several properties; e.g. name, age, sex, etc.

The edges between nodes can be described in terms of a relationship. Relationships must have a type and may contain other properties. In this example case, the relationship type between the nodes is Knows; e.g. The Skipper Knows Mr. Howell.

As an example of another property a relationship may have, the Knows relationship might carry the additional property Since When; e.g. Since When has The Skipper Known Gilligan.

Note: In this example The Skipper Knows Mr. Howell, but Mr. Howell doesn’t know The Skipper because there is no directed edge going from Mr. Howell to The Skipper. (Imagine a back-story where, because of Mr. Howell’s wealth and general notoriety, The Skipper has come to know him by way of news reports. The Skipper, being a working man, would be outside Mr. Howell’s social circle and not known to him.)
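The node / relationship model described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how Neo4j stores data internally. The class names, the `knows` helper and the placeholder value for the `since` property are all assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    properties: dict = field(default_factory=dict)  # e.g. age, sex


@dataclass
class Relationship:
    start: Node   # the node the arrow leaves
    end: Node     # the node the arrow points at
    type: str     # every relationship must have a type
    properties: dict = field(default_factory=dict)  # optional extras


skipper = Node("The Skipper", {"sex": "male", "age": 51})
howell = Node("Mr. Howell", {"sex": "male", "age": 62})

# One directed KNOWS relationship; the 'since' value is a placeholder, not real data.
relationships = [Relationship(skipper, howell, "KNOWS", {"since": "unknown"})]


def knows(a, b):
    """True if a KNOWS relationship points from node a to node b."""
    return any(
        r.type == "KNOWS" and r.start is a and r.end is b
        for r in relationships
    )
```

With only the one relationship defined, `knows(skipper, howell)` is true while `knows(howell, skipper)` is false, which mirrors the one-way acquaintance described in the note.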

Going forward, we’ll use this graph to build a simple graph database using Neo4j.

What is Neo4j?

Neo4j is a NoSQL database management system which uses a graph data model as its NoSQL data model. For more details, visit the Neo4j home page: http://www.neo4j.org/

Installing Neo4j

Neo4j is available for Windows, Linux and MacOS X environments.  The installation of Neo4j is very simple.

Locate the appropriate version of the Neo4j database at the following link: http://www.neo4j.org/download

The rest of this section assumes that you are installing onto a Windows-based system; however, the basics of installing Neo4j do not differ all that much from Windows to Linux (amazingly!).

After downloading the zip file, decompress it to any place desired.  On my system, I’ve uncompressed the downloaded .ZIP file to a folder named “WORK” in the root of my F: drive.

Prerequisites

Before taking the next step in installing Neo4j, make sure that you have the most current stable Java JDK on your system.  In addition, you will need to be running a 64-bit OS (and the 64-bit JDK).

There have been reports on the internet of Neo4j installations failing due to an error in the start script.  I have not run into this issue on the systems where I have installed it; however, THIS LINK will take you to a blog post which discusses and addresses this issue.

Starting Neo4j

  • Open a command window with elevated privileges (Run as Administrator)
  • Navigate to the directory into which you unzipped the Neo4j file you previously downloaded
  • Further navigate into the bin folder
  • Type Neo4j start to start the Neo4j database

The command window will display some startup information along with the port number on which the Neo4j server is running. Neo4j will also open a blank Java console window.  This can be minimised, but do not close this window.

Neo4j Startup Showing Server Running on port 7474

Open a browser and navigate to: http://localhost:7474 (note that Neo4j starts its admin interface on port 7474).

Neo4j WebAdmin @ http://localhost:7474

Creating Our First Graph Database

Let’s create a simple graph database. This database will represent a graph of the nodes and relationships shown in the picture above.

To create a node for each person pictured, in the WebAdmin window, click on the tab for the Power tool Console. At this prompt issue the following commands to the database:

CREATE N={name:'Gilligan', sex:'male', status:'single', age:'27'};
CREATE N={name:'The Skipper', sex:'male', status:'single', age:'51'};
CREATE N={name:'Mr. Howell', sex:'male', status:'married', age:'62'};
CREATE N={name:'Mrs. Howell', sex:'female', status:'married', age:'58'};
CREATE N={name:'Ginger', sex:'female', status:'single', age:'31'};
CREATE N={name:'The Professor', sex:'male', status:'single', age:'33'};
CREATE N={name:'MaryAnn', sex:'female', status:'single', age:'25'};

To enumerate all nodes in the database and see the result of the commands we’ve just issued, in the WebAdmin Power tool Console, issue the following query:

START n=NODE(*) RETURN n;

On my system, the result looks like the following:

Neo4j Select All

From the picture of the island residents shown earlier, we know of the following relationships:

  • Gilligan Knows The Skipper
  • The Skipper Knows Gilligan
  • The Skipper Knows Mr. Howell
  • Mr. Howell Knows Gilligan
  • Mr. Howell Knows Mrs. Howell
  • Mr. Howell Knows Ginger
  • Mrs. Howell Knows Gilligan
  • Mrs. Howell Knows Mr. Howell
  • Ginger Knows Gilligan
  • Ginger Knows MaryAnn
  • The Professor Knows Gilligan
  • The Professor Knows The Skipper
  • MaryAnn Knows Gilligan
  • MaryAnn Knows Ginger

Plugging in the node IDs returned by the query on my system (yours may differ), this translates to:

  • Gilligan (NODE(4)) Knows The Skipper (NODE(5))
  • The Skipper (NODE(5)) Knows Gilligan (NODE(4))
  • The Skipper (NODE(5)) Knows Mr. Howell (NODE(6))
  • Mr. Howell (NODE(6)) Knows Gilligan (NODE(4))
  • Mr. Howell (NODE(6)) Knows Mrs. Howell (NODE(7))
  • Mr. Howell (NODE(6)) Knows Ginger (NODE(8))
  • Mrs. Howell (NODE(7)) Knows Gilligan (NODE(4))
  • Mrs. Howell (NODE(7)) Knows Mr. Howell (NODE(6))
  • Ginger (NODE(8)) Knows Gilligan (NODE(4))
  • Ginger (NODE(8)) Knows MaryAnn (NODE(10))
  • The Professor (NODE(9)) Knows Gilligan (NODE(4))
  • The Professor (NODE(9)) Knows The Skipper (NODE(5))
  • MaryAnn (NODE(10)) Knows Gilligan (NODE(4))
  • MaryAnn (NODE(10)) Knows Ginger (NODE(8))

To update the database and add these relationships, the queries to issue in the WebAdmin Power tool Console are:

START a=NODE(4), b=NODE(5) CREATE a-[r:knows]->b RETURN r;
START a=NODE(5), b=NODE(4) CREATE a-[r:knows]->b RETURN r;
START a=NODE(5), b=NODE(6) CREATE a-[r:knows]->b RETURN r;
START a=NODE(6), b=NODE(4) CREATE a-[r:knows]->b RETURN r;
START a=NODE(6), b=NODE(7) CREATE a-[r:knows]->b RETURN r;
START a=NODE(6), b=NODE(8) CREATE a-[r:knows]->b RETURN r;
START a=NODE(7), b=NODE(4) CREATE a-[r:knows]->b RETURN r;
START a=NODE(7), b=NODE(6) CREATE a-[r:knows]->b RETURN r;
START a=NODE(8), b=NODE(4) CREATE a-[r:knows]->b RETURN r;
START a=NODE(8), b=NODE(10) CREATE a-[r:knows]->b RETURN r;
START a=NODE(9), b=NODE(4) CREATE a-[r:knows]->b RETURN r;
START a=NODE(9), b=NODE(5) CREATE a-[r:knows]->b RETURN r;
START a=NODE(10), b=NODE(4) CREATE a-[r:knows]->b RETURN r;
START a=NODE(10), b=NODE(8) CREATE a-[r:knows]->b RETURN r;
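The fourteen statements above all follow one pattern, so they could be generated rather than typed by hand. The Python sketch below is only an illustration and is not part of Neo4j. The `node_ids` mapping uses the IDs from my system, and `create_statement` is a made-up helper name. It builds the same query strings that would then be pasted into the console.

```python
# Node IDs as reported on the author's system; they may differ on yours.
node_ids = {
    "Gilligan": 4, "The Skipper": 5, "Mr. Howell": 6, "Mrs. Howell": 7,
    "Ginger": 8, "The Professor": 9, "MaryAnn": 10,
}

# (who, knows whom) pairs, in the same order as the list in the post.
knows = [
    ("Gilligan", "The Skipper"),
    ("The Skipper", "Gilligan"),
    ("The Skipper", "Mr. Howell"),
    ("Mr. Howell", "Gilligan"),
    ("Mr. Howell", "Mrs. Howell"),
    ("Mr. Howell", "Ginger"),
    ("Mrs. Howell", "Gilligan"),
    ("Mrs. Howell", "Mr. Howell"),
    ("Ginger", "Gilligan"),
    ("Ginger", "MaryAnn"),
    ("The Professor", "Gilligan"),
    ("The Professor", "The Skipper"),
    ("MaryAnn", "Gilligan"),
    ("MaryAnn", "Ginger"),
]


def create_statement(source, target):
    """Build one relationship-creating query in the syntax used in this post."""
    return "START a=NODE({}), b=NODE({}) CREATE a-[r:knows]->b RETURN r;".format(
        node_ids[source], node_ids[target]
    )


statements = [create_statement(source, target) for source, target in knows]

if __name__ == "__main__":
    print("\n".join(statements))
```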

At this point, we have created a simple graph data set complete with nodes and the relationships (edges) between them.
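To get a feel for the kind of question such a data set can answer, here is a small sketch in plain Python. It does not use Neo4j or Cypher at all. It simply holds the same Knows relationships in a set of (source, target) pairs, and the helper names `known_by` and `mutual_pairs` are invented for this example.

```python
# The directed Knows relationships from this post as (source, target) pairs.
knows = {
    ("Gilligan", "The Skipper"), ("The Skipper", "Gilligan"),
    ("The Skipper", "Mr. Howell"), ("Mr. Howell", "Gilligan"),
    ("Mr. Howell", "Mrs. Howell"), ("Mr. Howell", "Ginger"),
    ("Mrs. Howell", "Gilligan"), ("Mrs. Howell", "Mr. Howell"),
    ("Ginger", "Gilligan"), ("Ginger", "MaryAnn"),
    ("The Professor", "Gilligan"), ("The Professor", "The Skipper"),
    ("MaryAnn", "Gilligan"), ("MaryAnn", "Ginger"),
}


def known_by(person):
    """Everyone with a Knows edge pointing at `person`, sorted by name."""
    return sorted(source for source, target in knows if target == person)


def mutual_pairs():
    """Pairs of people who know each other in both directions."""
    return sorted(
        {tuple(sorted((a, b))) for a, b in knows if (b, a) in knows}
    )


if __name__ == "__main__":
    print(known_by("Gilligan"))
    print(mutual_pairs())
```

Following edges in a chosen direction like this is exactly the kind of work a graph database is built to do, without the join tables a relational schema would need.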

Using the Data browser, we can view the relationships as follows:

Neo4j Data Browser 2

This has been a very small and very brief introduction to Neo4j and graph databases in general. Neo4j is a very powerful database management system which enables one to process data in ways not possible in ‘traditional’ databases. To learn more about Neo4j, download and read the Neo4j manual, and by all means, experiment with the product. I am a big believer in learning by doing and am grateful to companies like Neo4j who make their products available to us at no or low cost.

I hope this overview has passed on some knowledge and inspired someone to learn.

Please post a comment if there are any questions and I will try my best to answer in a timely manner.

Inspired by Milinda’s Space: Introduction to NoSQL and Neo4j graph database.

 

