Overview

- Note: moving away from a single Project Graph for all relationships towards smaller, use-case-specific graph databases, managed by individual services or within a service stack.

Service that manages a project-level graph database that records relationships. Most relationships will be stored here; other services can duplicate relationship data if they choose, or use this graph as their source.

As the project grows, relationship data can be extended to support analysis of relationships, weighting of same-type connections, etc.

Repository

https://bitbucket.org/stb_working/project-graph/src/master/

Neptune graph

(planning on one single linked property graph)

DynamoDB tables

Standard Config Table Per Service

Configuration tags

...

Property Graph

Comparison with RDF

Property Graph was chosen over RDF because connectivity with external services is not a priority, and the data needed for it can still be extracted from a Property Graph later. The expected design will have many relationships (predicates) that themselves have properties. Querying and visualizing the project's objects also seems simpler in a Property Graph than in RDF.

Query language

The focus is on Apache TinkerPop Gremlin for queries, as it is a standard query language that AWS Neptune supports for Property Graphs.

Querying data in graph database

The plan is for all Put/Update/Delete queries to pass through an API/Lambda, while Read/Get queries can query the graph directly.
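
A minimal sketch of the write path described above, assuming a Lambda-style handler written in Python. The event shape, the `ALLOWED_WRITES` set, and the `submit` callable are hypothetical and do not come from the project repository; `submit` stands in for whatever code actually talks to Neptune.

```python
from typing import Any, Callable, Dict

# Hypothetical set of write operations accepted by the API.
ALLOWED_WRITES = {"put", "update", "delete"}


def handle_write(
    event: Dict[str, Any],
    submit: Callable[[Dict[str, Any]], None],
) -> Dict[str, Any]:
    """Validate a relationship write request and pass it to the graph client."""
    operation = str(event.get("operation", "")).lower()
    if operation not in ALLOWED_WRITES:
        # Reads do not use this path; clients query the graph directly.
        return {"statusCode": 400, "body": f"unsupported operation: {operation!r}"}

    # Assumed minimum for a relationship: source vertex, edge label, target vertex.
    missing = [key for key in ("from_id", "label", "to_id") if not event.get(key)]
    if missing:
        return {"statusCode": 400, "body": "missing fields: " + ", ".join(missing)}

    submit({
        "operation": operation,
        "from_id": event["from_id"],
        "label": event["label"],
        "to_id": event["to_id"],
        "properties": event.get("properties", {}),
    })
    return {"statusCode": 200, "body": "ok"}
```

Routing every write through one function like this gives a single place to enforce the standard label names and id conventions, while read clients stay free to query the graph directly.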

Optimizing data modelling for Neptune

- Neptune's indexing favors vertex ids, edge labels, and edge ids, so design the model so that queries can bound these.
- The edge label seems more important than the edge id for optimization, but supplying both is best.
- Neptune does not handle a large number of distinct edge labels well; it wants at most hundreds. Keep to standard label names when mapping relationships. However, the more results returned per edge label, the more post-query filtering is needed, so the number of labels has to be balanced against the number of results per vertex.
- Distinct edge labels might also include properties (of vertices?); not sure about this, as the documentation is unclear.
- Because of Neptune's limits on edge labels + properties(?), the decision is not to have one huge graph for the entire project, but to break it into smaller graphs according to logical groupings and expected relationship queries.
- For our use case, where we will often query one vertex and find relationships from there, Neo4j might be more effective, as it stores relationships per vertex efficiently, whereas Neptune stores indexes of vertex-edge-vertex types. Initially use Neptune for ease of management.
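
To illustrate why bounding the vertex id and edge label matters, here is a toy in-memory model. It is not Neptune's actual storage engine, only a sketch of the general idea: edges are kept in a dictionary keyed by (source vertex id, edge label), so a query that supplies both touches only the matching edges, while a query that supplies neither has to examine every edge. The class name, method names, and sample ids are all made up for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (edge_id, from_id, label, to_id)
Edge = Tuple[str, str, str, str]


class ToyEdgeIndex:
    """Toy index keyed by (source vertex id, edge label)."""

    def __init__(self) -> None:
        self._by_source_label: Dict[Tuple[str, str], List[Edge]] = defaultdict(list)
        self._all: List[Edge] = []

    def add(self, edge_id: str, from_id: str, label: str, to_id: str) -> None:
        edge = (edge_id, from_id, label, to_id)
        self._by_source_label[(from_id, label)].append(edge)
        self._all.append(edge)

    def out_bounded(self, from_id: str, label: str) -> List[Edge]:
        """Vertex id and edge label both known: a single dictionary lookup."""
        return list(self._by_source_label.get((from_id, label), []))

    def in_unbounded(self, to_id: str) -> List[Edge]:
        """Neither key known: every stored edge must be examined."""
        return [edge for edge in self._all if edge[3] == to_id]


index = ToyEdgeIndex()
index.add("e1", "project-1", "HAS_MEMBER", "user-1")
index.add("e2", "project-1", "HAS_MEMBER", "user-2")
index.add("e3", "project-1", "HAS_DOCUMENT", "doc-1")
index.add("e4", "project-2", "HAS_MEMBER", "user-1")

members = index.out_bounded("project-1", "HAS_MEMBER")   # touches 2 edges
memberships = index.in_unbounded("user-1")                # scans all 4 edges
```

The same trade-off noted above shows up here: fewer, broader labels mean each bounded lookup returns more edges that then need filtering, while many narrow labels multiply the number of keys.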

Working documents

Working_documents - Project Graph
