Service - Project Graph
Revision as of 08:58, 16 February 2021
Overview
A service that manages a project-level graph database recording relationships. Most relationships will be stored here; other services can duplicate relationship data if they choose, or use this graph as their source.
As the project grows, relationship data can be extended to support analysis of relationships, such as weighting connections of the same type.
Repository
https://bitbucket.org/stb_working/project-graph/src/master/
Neptune graph
(planning on a single linked property graph)
DynamoDB tables
Standard Config Table Per Service
Configuration tags
...
Property Graph
Comparison with RDF
Property Graph was chosen over RDF because connectivity with external services is not a priority, and RDF-style data can still be extracted from a Property Graph if needed. The expected design will have many relationships (predicates) that themselves have properties, which a Property Graph models directly. Querying and visualizing the project's objects also seems simpler with a Property Graph than with RDF.
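The trade-off can be illustrated with a minimal, self-contained Python sketch that is not tied to Neptune or any RDF library. The `Vertex`/`Edge` classes, the `to_triples` helper, and example labels such as `DEPENDS_ON` are hypothetical. The sketch shows two things. First, a property graph can be flattened into subject/predicate/object triples when needed. Second, each edge that carries properties costs extra reification triples in an RDF-style model.

```python
from dataclasses import dataclass, field


@dataclass
class Vertex:
    id: str
    label: str
    props: dict = field(default_factory=dict)


@dataclass
class Edge:
    id: str
    label: str   # relationship type, e.g. "DEPENDS_ON" (hypothetical)
    out_v: str   # id of the source vertex
    in_v: str    # id of the target vertex
    props: dict = field(default_factory=dict)


def to_triples(vertices, edges):
    """Flatten a property graph into (subject, predicate, object) triples.

    A plain edge becomes one triple. An edge with properties also needs a
    node standing in for the edge itself (reification) so its properties
    have a subject, which is the overhead an RDF model would carry.
    """
    triples = []
    for v in vertices:
        triples.append((v.id, "type", v.label))
        for key, value in v.props.items():
            triples.append((v.id, key, value))
    for e in edges:
        triples.append((e.out_v, e.label, e.in_v))
        if e.props:
            triples.append((e.id, "subject", e.out_v))
            triples.append((e.id, "predicate", e.label))
            triples.append((e.id, "object", e.in_v))
            for key, value in e.props.items():
                triples.append((e.id, key, value))
    return triples


# Example: two tasks linked by a weighted dependency.
vertices = [Vertex("task-1", "Task"), Vertex("task-2", "Task")]
edges = [Edge("e-1", "DEPENDS_ON", "task-1", "task-2", {"weight": 0.8})]
triples = to_triples(vertices, edges)
```

In the property graph the weighted dependency is one edge. In the triple form it expands to five triples, and the gap grows with every property placed on a relationship.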
Query language
Focus on using Apache TinkerPop Gremlin for queries as it is the standard used by AWS Neptune for Property Graphs.
Querying data in graph database
All Put/Update/Delete queries are planned to pass through an API/Lambda, whereas Read/Get queries can query the graph directly.
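One possible shape for the write path is sketched below in Python, under stated assumptions. The operation names (`put_edge`, `update_edge`, `delete_edge`), the request body layout, and the `handler`/`build_write` helpers are all hypothetical. The handler only translates write requests into Gremlin script strings and rejects anything else, such as a read. Submitting the scripts to the Neptune endpoint is deliberately left out. Check the generated Gremlin against Neptune's Gremlin documentation before relying on it.

```python
import json

# Hypothetical operation names accepted by the write API. Reads are not
# routed through here; they query the graph directly.
WRITE_OPERATIONS = {"put_edge", "update_edge", "delete_edge"}


def quote(value):
    """Render a value as a single-quoted Gremlin string literal."""
    text = str(value).replace("\\", "\\\\").replace("'", "\\'")
    return f"'{text}'"


def build_write(op):
    """Translate one write request into a Gremlin script string."""
    kind = op["operation"]
    if kind not in WRITE_OPERATIONS:
        raise ValueError(f"unsupported write operation: {kind!r}")
    if kind == "put_edge":
        # Supply our own edge id so later reads and deletes can bind it.
        return (
            f"g.V({quote(op['from'])})"
            f".addE({quote(op['label'])})"
            f".to(__.V({quote(op['to'])}))"
            f".property(id, {quote(op['edge_id'])})"
        )
    if kind == "update_edge":
        # Values are written as strings in this sketch.
        return (
            f"g.E({quote(op['edge_id'])})"
            f".property({quote(op['key'])}, {quote(op['value'])})"
        )
    return f"g.E({quote(op['edge_id'])}).drop()"


def handler(event, context=None):
    """Hypothetical Lambda entry point: builds scripts, does not send them."""
    try:
        body = json.loads(event["body"])
        scripts = [build_write(op) for op in body["operations"]]
    except (KeyError, ValueError) as err:
        return {"statusCode": 400, "body": json.dumps({"error": str(err)})}
    return {"statusCode": 200, "body": json.dumps({"scripts": scripts})}
```

Routing every write through one function gives a single place to enforce edge-label rules and ID conventions, while reads stay free to use the graph directly.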
Optimizing data modelling for Neptune
- Neptune's indexing favors vertex IDs, edge labels, and edge IDs, so design the model so that queries can bound these values
- The edge label seems more important than the edge ID for optimization, but supplying both is best
- Neptune does not handle a large number of distinct edge labels well, so don't put unique IDs in edge labels; keep each label to a description of the type of relationship
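These guidelines can be enforced when labels are chosen and when reads are written. The following Python sketch is one possible approach, assuming a small fixed vocabulary of relationship types. The label set, the digit heuristic, and the helper names are hypothetical. The code rejects labels that look like they embed an ID. It also builds Gremlin read traversals that always start from a known vertex ID and edge label, adding the edge ID when it is available.

```python
import re

# Hypothetical, deliberately small vocabulary: labels describe the type of
# relationship only, never a specific instance of it.
EDGE_LABELS = {"DEPENDS_ON", "ASSIGNED_TO", "PART_OF"}

# Digits in a label usually mean an id (numeric or UUID-style) leaked in.
_HAS_DIGIT = re.compile(r"\d")


def check_edge_label(label):
    """Return the label unchanged, or raise ValueError if it breaks the rules."""
    if _HAS_DIGIT.search(label):
        raise ValueError(f"edge label appears to contain an id: {label!r}")
    if label not in EDGE_LABELS:
        raise ValueError(f"unknown relationship type: {label!r}")
    return label


def quote(value):
    """Render a value as a single-quoted Gremlin string literal."""
    text = str(value).replace("\\", "\\\\").replace("'", "\\'")
    return f"'{text}'"


def bounded_out_edges(vertex_id, label, edge_id=None):
    """Build a read traversal that binds the start vertex id and edge label,
    plus the edge id when the caller knows it."""
    query = f"g.V({quote(vertex_id)}).outE({quote(check_edge_label(label))})"
    if edge_id is not None:
        query += f".hasId({quote(edge_id)})"
    return query
```

For example, `bounded_out_edges("task-1", "DEPENDS_ON", "e-1")` produces a traversal that pins all three indexed values. By contrast, a label such as `DEPENDS_ON_task42` is rejected before it can add to the number of distinct edge labels.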