DGraph : A Graph Database
Guide : Prof. Krishna Sivalingam
17 May 2016
Ashwin
Ashwin
- Highly Concurrent (go-routines, channels) - Performance similar to C++ - Compiled and does not require specific environments to run on
- Social networks - User behaviour analysis - E-commerce recommendations - Internet of things - Medical and DNA research - Search engines
- Support distributed data loading - Support communication among the servers - Minimise the network calls made to process a query
- Access to serialised data without processing or unpacking - Memory efficiency and speed - Strongly typed
<Alice> <friendof> <Bob> .
{ me(_xid_: m.06pj8) { type.object.name.en film.director.film { film.film.genre { type.object.name.en } } } }
- UID Assigner: Allots a unique uint64 ID to each node - Loader: Converts RDF file to Posting lists by using the UID map generated
Shard 1 Shard 2 <alice> <enemy> <bob6> . <alice> <follows> <bob7> . <alice> <enemy> <bob1> . <alice> <follows> <bob2> . <alice> <enemy> <bob3> . <alice> <follows> <bob0> . <alice> <enemy> <bob4> . <alice> <follows> <bob5> . -----------------------Posting List------------------------- PS1 [enemy:alice] -> (bob1, bob3, bob4, bob6) PS2 [follows:alice] -> (bob0, bob2, bob5, bob7)
type SubGraph struct {
Attr string
Children []*SubGraph
query []byte
result []byte
}Subgraph format
SubGraph [result uid = "0xbb0de646eaf32b74"] | Children | --> SubGraph [Attr = "type.object.name.en"] --> SubGraph [Attr = "film.director.film"] | Children | --> SubGraph [Attr = "film.film.genre"] | Children | --> SubGraph [Attr = "type.object.name.en"]
- type.object.name.en : Connects the object to its English name - film.actor.film : Connects an actor to the film objects he has acted in
- <m.0102j2vq> <film.actor.film> <m.011kyqsq> . - <m.050llt> <type.object.name> "Aishwarya Rai Bachchan"@hr . - <m.0bxtg> <type.object.name> "Tom Hanks"@es .
- Throughput - Latency [mean, 50 and 95 percentile]
- Number of nodes in a cluster - Number of parallel connections - Number of cores in an instance
The query about an actor obtains the list of all the movies that the actor has acted in and looks like this: { me ( _ x i d _ : XID ) { t y p e . o b j e c t . name . en film . actor . film { film . performance . film { t y p e . o b j e c t . name . en } } } }
The query about a director obtains the genre of all the movies directed by that person and looks like this: { me ( _ x i d _ : XID ) { t y p e . o b j e c t . name . en film . director . film { film . film . genre { t y p e . o b j e c t . name . en } } } }
• Use as many cores as possible • Have the servers geographically close-by so that network latency is reduced • Larger sized RAMs are not a necessity in server but are important in case of loader • Distribute the data among servers and query them in a round-robin fashion for greater throughput
Ashwin