DGraph : A Graph Database

Guide : Prof. Krishna Sivalingam

17 May 2016

Ashwin

Acknowledgements

2

What is DGraph

- Highly Concurrent (go-routines, channels)
- Performance similar to C++
- Compiled and does not require specific environments to run on
3

Motivation

- Social networks
- User behaviour analysis
- E-commerce recommendations
- Internet of things
- Medical and DNA research
- Search engines
4

Problem Statement

- Support distributed data loading
- Support communication among the servers 
- Minimise the network calls made to process a query
DGraph DGraph
5

Overview

6

Background

7

Flatbuffer

- Access to serialised data without processing or unpacking
- Memory efficiency and speed
- Strongly typed
8

RDF

<Alice>    <friendof>    <Bob>    .
9

Sharding

10

GraphQL

{
  me(_xid_: m.06pj8) {
    type.object.name.en
    film.director.film  {
      film.film.genre {
        type.object.name.en
      }
    }
  }
}
11

Centralised Version

12

Making it Distributed

- UID Assigner: Allots a unique uint64 ID to each node
- Loader: Converts RDF file to Posting lists by using the UID map generated
13

Example sharding

         Shard 1                            Shard 2
 <alice> <enemy> <bob6> .         <alice> <follows> <bob7> .
 <alice> <enemy> <bob1> .         <alice> <follows> <bob2> .
 <alice> <enemy> <bob3> .         <alice> <follows> <bob0> .
 <alice> <enemy> <bob4> .         <alice> <follows> <bob5> .
 
 
 -----------------------Posting List-------------------------
PS1
[enemy:alice] -> (bob1, bob3, bob4, bob6)

PS2
[follows:alice] -> (bob0, bob2, bob5, bob7)
14

Distributed Version

15

Query processing

type SubGraph struct {
  Attr     string
  Children []*SubGraph
  query  []byte
  result []byte
}
16

Query processing

Subgraph format

SubGraph [result uid = "0xbb0de646eaf32b74"]
  |
Children
  |
  --> SubGraph [Attr = "type.object.name.en"]
  --> SubGraph [Attr = "film.director.film"]
         |
       Children
         |
         --> SubGraph [Attr = "film.film.genre"]
               |
             Children
               |
               --> SubGraph [Attr = "type.object.name.en"]
17

Demo

18

Freebase Film Dataset

- type.object.name.en : Connects the object to its English name
- film.actor.film : Connects an actor to the film objects he has acted in
- <m.0102j2vq> <film.actor.film> <m.011kyqsq> .
- <m.050llt> <type.object.name> "Aishwarya Rai Bachchan"@hr .
- <m.0bxtg> <type.object.name> "Tom Hanks"@es .
19

Performance Analysis

- Throughput
- Latency [mean, 50 and 95 percentile]
- Number of nodes in a cluster
- Number of parallel connections
- Number of cores in an instance
20

Queries

The query about an actor obtains the list of all the movies that the actor has acted in
and looks like this:

{
    me ( _ x i d _ : XID ) {
        t y p e . o b j e c t . name . en
        film . actor . film {
            film . performance . film {
                t y p e . o b j e c t . name . en
            }
        }
    }
}
21

Queries

The query about a director obtains the genre of all the movies directed by that person
and looks like this:

{
    me ( _ x i d _ : XID ) {
        t y p e . o b j e c t . name . en
        film . director . film {
            film . film . genre {
                t y p e . o b j e c t . name . en
            }
        }
    }
}
22

Throughput Comparison

23

Throughput Comparison

24

Latency Comparison

25

Latency Comparison

26

Latency Comparison

27

Latency Comparison

28

Latency Comparison

29

Latency Comparison

30

Fine Grained Locks

31

Results

• Use as many cores as possible
• Have the servers geographically close-by so that network latency is reduced
• Larger sized RAMs are not a necessity in server but are important in case of
loader
• Distribute the data among servers and query them in a round-robin fashion for
greater throughput
32

Conclusions

33

Further Work

34

Thank you

Use the left and right arrow keys or click the left and right edges of the page to navigate between slides.
(Press 'H' or navigate to hide this message.)