Service - Category Tree Standard

From Izara Wiki

Overview

Handler service for the standard category tree type.

Repository

https://bitbucket.org/stb_working/category-tree-standard/src/master/

DynamoDB tables

Standard Config Table Per Service

Configuration tags

{
	configTag: "CategoryTreeServiceNameTag",
	configKey: "CategoryTreeServiceNameTag",
	configValue: xxx // this service's own CategoryTreeServiceNameTag, eg: "CategoryTreeStandard"
}
{
	configTag: "CatalogGraphServiceName",
	configKey: "CatalogGraphServiceName",
	configValue: xxx // eg: "CatalogGraph"
}
{
	configTag: "CategoryTreeService",
	configKey: xxx, // categoryTreeServiceNameTag, eg: "CategoryTreeStandard", this is what is saved in each catalog record
	configValue: {
		serviceName: xxx // eg: "CategoryTreeStandard", this is the actual deployed service name
	}
}
{
	configTag: "defaultValue",
	configKey: "locationTreeAreaNodeId",
	configValue: xxx // eg: id for USA, or international?
}
{
	configTag: "defaultValue",
	configKey: "browseQuantity",
	configValue: xxx // eg: 1
}
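
As an illustration of how the CategoryTreeService tag might be used, the sketch below resolves the deployed service name from the tag saved in a catalog record. It assumes the config records have already been loaded into a list of dictionaries shaped like the fragments above; the function name `deployed_service_name` and the in-memory list are hypothetical and are not part of the service.

```python
def deployed_service_name(config_records, service_name_tag):
    """Return the deployed serviceName for a categoryTreeServiceNameTag.

    config_records is an assumed in-memory list of config records,
    each with configTag, configKey and configValue keys.
    """
    for record in config_records:
        if (record["configTag"] == "CategoryTreeService"
                and record["configKey"] == service_name_tag):
            return record["configValue"]["serviceName"]
    raise KeyError(f"no CategoryTreeService config for {service_name_tag!r}")


# Example records using the sample values from the fragments above.
example_records = [
    {"configTag": "CatalogGraphServiceName",
     "configKey": "CatalogGraphServiceName",
     "configValue": "CatalogGraph"},
    {"configTag": "CategoryTreeService",
     "configKey": "CategoryTreeStandard",
     "configValue": {"serviceName": "CategoryTreeStandard"}},
]
```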

Graph database

Service - Catalog Graph

  • Structure allows one category to appear multiple times at the same level of the graph (same parent), eg with different filters
  • Structure keeps a record of all changes, so changes can be rolled back, eg if a user makes changes incorrectly

Nodes

catalogNode

Is an origin/home/top-level node, one per catalog. It allows all top level categories in that catalog to point to a single origin node.

  • NodeIdentifierLabels: catalogNode
  • NodeIdentifierProperties: catalogId

Properties:

  1. searchType: sellOffer|product|variantProduct, will match the setting in the catalog service
  2. filter: full filter for the catalog, will combine all child categoryNode filters with the setting in the catalog service
  3. requiredData: full requiredData for this catalog, will match the setting in the catalog service

categoryNode

Represents one parent-child relationship in the graph, is never edited or removed from the graph. One category can have any number of categoryNodes

  • NodeIdentifierLabels: categoryNode
  • NodeIdentifierProperties: categoryNodeId - random uuid

Properties:

  1. catalogId (maybe not needed but maybe more efficient if have)
  2. categoryId (maybe not needed but maybe more efficient if have)
  3. searchType: sellOffer|product|variantProduct, will often match catalog's default unless specifically set not to, is the current generated setting and can change regularly
  4. filter: full filter for this node, will often match catalog's default unless specifically set not to, is the current generated setting and can change regularly
  5. requiredData: full requiredData for this node, will often match catalog's default unless specifically set not to, is the current generated setting and can change regularly

categoryNodeSettings

Versioned data holding the editable settings for a categoryNode.

see 2021-02-22 - Maintaining change history using graph database#Situation 2: Editable settings

Properties:

  1. searchType: sellOffer|product|variantProduct
  2. searchTypeMatchParent: boolean, if true will be updated to always match the parent node's searchType setting, if false must manually update
  3. filter: full or additional filter set for this node, will be empty if matching parent categoryNode's filter
  4. filterMatchParent: none|match|append, if none does not update when parent updates, if match will always match parent, if append will add this node's filter to the parent's
  5. requiredData
  6. requiredDataMatchParent: none|match|append, if none does not update when parent updates, if match will always match parent, if append will add this node's requiredData to the parent's

user

One userId, is never edited or removed from the graph.

  • NodeIdentifierLabels: user
  • NodeIdentifierProperties: userId

category

One categoryId, is never edited or removed from the graph.

  • NodeIdentifierLabels: category_
  • NodeIdentifierProperties: categoryId

Relationships

hasChildCategoryNode / hasDisabledChildCategoryNode

Creates a link between two categoryNode vertices, or between catalogNode > categoryNode vertices. The relationship can be enabled or disabled; one of these relationships will always exist linking the same parent to the same child.

see 2021-02-22 - Maintaining change history using graph database#Situation 1: Boolean setting

changedBy

Creates a link between categoryNode > user nodes, is never edited or removed from the graph. Each time a categoryNode is disabled or enabled a new relationship is created linking the userId and saving the date, so we have a record of changes.

Handled automatically in 2021-02-16 - Graph Handler - Functions#Relationship/ChangeRelationshipType

isCategory

Creates a link between categoryNode > category nodes, is never edited or removed from the graph.

Calculating categoryNode's requiredData and searchType

  • In most cases all categoryNodes will share the same settings as the catalog's requiredData and searchType, but we allow for per categoryNode settings.
  • Each categoryNode maintains its own final setting so it can be efficiently pulled when browsing
  • If a categoryNode sets {setting}MatchParent = match it inherits the parent categoryNode's setting. If every parent on the path up to the catalog node also inherits, any change to the catalog's setting will propagate down to all of those categoryNodes

When a parent categoryNode (or catalog)'s requiredData or searchType changes

When a categoryNode changes its settings we need to traverse down to all children to see which need to be updated, possibly once per setting. Whenever a child categoryNode is found with {setting}MatchParent = none the traversal can stop at that branch. If requiredDataMatchParent = append we rebuild that categoryNode's requiredData and continue down the tree.
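
The downward traversal can be sketched in memory as follows. This is only an illustration under stated assumptions, not the service's implementation: the `Node` class, its field names, and `rebuild_required_data` are hypothetical, requiredData is modelled as a list of strings, and append is assumed to mean the parent's list followed by the node's own entries.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for a categoryNode plus its categoryNodeSettings."""
    node_id: str
    match_parent: str  # requiredDataMatchParent: "none" | "match" | "append"
    own: list[str] = field(default_factory=list)    # value held in categoryNodeSettings
    final: list[str] = field(default_factory=list)  # generated value stored on the categoryNode
    children: list[Node] = field(default_factory=list)


def rebuild_required_data(parent: Node) -> list[str]:
    """Push parent.final down the tree and return the ids of rebuilt nodes."""
    rebuilt = []
    for child in parent.children:
        if child.match_parent == "none":
            continue  # traversal stops at this branch
        if child.match_parent == "match":
            child.final = list(parent.final)
        else:  # "append": parent's value plus this node's own entries
            child.final = list(parent.final) + list(child.own)
        rebuilt.append(child.node_id)
        rebuilt.extend(rebuild_required_data(child))
    return rebuilt
```

A node with match_parent = "none" keeps its current value, and nothing below it is visited, even if its own children are set to match.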

There could be race conditions when the child gets rebuilt before the parent node gets updated.

Race condition possible solution 1

Send the new requiredData in the message triggering the rebuild of children, so we do not need to worry about whether the parent has updated yet. There could still be race conditions if multiple change submissions are made at one time, because an older message might be processed after a newer one.

Race condition possible solution 2

Send in the message the timestamp at which the parent's versionedData was updated, then make sure the parent has updated to this timestamp before rebuilding the child. If the parent's versionedData is dated newer than the message's timestamp, a new change happened before the child message was processed, and we could skip processing the child because in theory another message will arrive for the newer change.

This would fail if the newer change does not also update the same setting (eg requiredData), in which case a new message would not come.

We would also want to build in a conditional statement/transactional update when we create the new filter, to make sure another newer process did not update the data while we were processing. For example we might have some processes updating requiredData and some updating filters.
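
A minimal sketch of the timestamp check and the conditional write, assuming timestamps are comparable integers and the versioned record is a simple in-memory object. The names `VersionedSetting`, `decide` and `conditional_write` are hypothetical; a real implementation would rely on whatever conditional update the underlying store provides.

```python
from dataclasses import dataclass


@dataclass
class VersionedSetting:
    """Hypothetical view of one versionedData record."""
    value: list
    updated_at: int  # timestamp of the last update


def decide(parent: VersionedSetting, message_ts: int) -> str:
    """Decide what to do with a child rebuild message."""
    if parent.updated_at < message_ts:
        return "retry"    # parent's update is not visible yet
    if parent.updated_at > message_ts:
        return "skip"     # a newer change exists; another message is expected
    return "rebuild"      # parent is exactly at the version the message describes


def conditional_write(child: VersionedSetting, new_value: list,
                      expected_ts: int, new_ts: int) -> bool:
    """Write only if nobody else updated the child since we read it."""
    if child.updated_at != expected_ts:
        return False      # another process won the race; caller should re-read
    child.value = new_value
    child.updated_at = new_ts
    return True
```

The "skip" branch carries the caveat noted above: it is only safe when the newer change also triggers a message for the same setting.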

Race condition possible solution 3

Update all settings in one process. The message states which settings have changed and they are processed accordingly; only traverse back up the graph if the filter changed.

Have a temporary boolean property that marks the categoryNode ... still no good

Initial settings

  • searchType can only be match or not match (no append)
  • If {setting}MatchParent = none: the setting cannot be empty, and is saved in both the categoryNodeSettings and categoryNode nodes
  • If {setting}MatchParent = match: no data for the setting is saved in categoryNodeSettings and the parent categoryNode's value for the setting is saved in this categoryNode node
  • If {setting}MatchParent = append: the requiredData cannot be empty and is saved into the categoryNodeSettings node, then appended to the parent's setting and saved in the categoryNode node
  • If the parent is a catalog node the same rules apply but the catalog's filter is used
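
The rules above can be expressed as one small function. This is a hedged sketch: `resolve_initial` is a hypothetical name, the modes are modelled as the strings "none", "match" and "append" (searchTypeMatchParent is a boolean in categoryNodeSettings, so a caller would map it to "match"/"none" and pass allow_append=False), and append is assumed to concatenate lists.

```python
def resolve_initial(mode, own_value, parent_value, allow_append=True):
    """Return (value saved in categoryNodeSettings, value saved on the categoryNode)."""
    if mode == "none":
        if not own_value:
            raise ValueError("setting cannot be empty when MatchParent = none")
        return own_value, own_value
    if mode == "match":
        # Nothing is stored in the settings node; the parent's value is copied.
        return None, parent_value
    if mode == "append" and allow_append:
        if not own_value:
            raise ValueError("setting cannot be empty when MatchParent = append")
        return own_value, list(parent_value) + list(own_value)
    raise ValueError(f"unsupported MatchParent mode: {mode!r}")
```

For example, `resolve_initial("append", ["stock"], ["title"])` stores `["stock"]` in the settings node and `["title", "stock"]` on the categoryNode.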

Calculating categoryNode's filter

The categoryNode's stored/active filter uses MatchParent in the same way as requiredData above. However, child categories might include products that are not part of a parent category, and when browsing the parent category we want to show all results from the parent's filter as well as the combined results of all its children.

To do this we accumulate all child filters into the parent node's final combined filter. If the parent categoryNode and all children share the same catalog default filter (filterMatchParent = match) this is not difficult because all children share the parent's filter: we create the final combined filter by chaining the parent categoryId and the categoryIds of all child categoryNodes in one long or filter, which gets added to the shared filter as an and block.

If some children have custom filters we need to separate them out into their own filter blocks, each grouping that child's categoryId with its filter, and then append those blocks to the parent's filter using or statements.
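
A sketch of how the combined filter might be assembled. The nested dictionary representation with "and"/"or" keys is an assumption made for illustration, as is the function name `combined_filter`; the real filter format is defined elsewhere. Children are passed as (categoryId, custom filter) pairs, with None meaning the child shares the parent's filter.

```python
def combined_filter(parent_id, shared_filter, children):
    """Build the parent's final combined filter.

    children: list of (categoryId, custom_filter_or_None) for all
    descendant categoryNodes (assumed already flattened).
    """
    # Categories that share the parent's filter are chained in one long "or".
    shared_ids = [parent_id] + [cid for cid, flt in children if flt is None]
    shared_block = {
        "and": [shared_filter, {"or": [{"categoryId": cid} for cid in shared_ids]}]
    }
    blocks = [shared_block]
    # Each child with a custom filter becomes its own block.
    for cid, flt in children:
        if flt is not None:
            blocks.append({"and": [{"categoryId": cid}, flt]})
    return blocks[0] if len(blocks) == 1 else {"or": blocks}
```

When no child has a custom filter the result is the single shared block, which matches the simple case described above.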

I believe processing these large filters can still be efficient because hash of filter exists for the full (or any partial) filter and we cache results for each part of the filter.

When a categoryNode's filter changes

If a categoryNode's or catalog's filter changes, just like requiredData we need to traverse down the tree checking for any filterMatchParent = match|append and rebuild the child node's filter setting accordingly. Whenever a categoryNode is found that has no child nodes with filterMatchParent = match|append we can stop traversing down, but we then need to trace back up the tree recalculating the final combined filter that adds in all children's filters.

When a categoryNode's or catalog's filter setting changes we need to recalculate its filter and those of all parents going up the tree. This task could be propagated by adding each parent to a queue to regenerate its filter.

Starting at the changed categoryNode, recalculate its filter by combining in the filters of its child categoryNodes. Once complete, send a message to do the same recalculation for its parent. There is a chance of race conditions (the parent tries to recalculate before the child has updated its versionedData/filter), so pass the timestamp of the new versionedData to protect against this.
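
The upward recalculation can be simulated with an in-memory queue. Everything here is a stand-in chosen for illustration: `TreeNode`, `recalc`, `handle` and `propagate_up` are hypothetical names, filters are plain strings joined with " OR ", a counter plays the role of the versionedData timestamp, and each node is assumed to have a single parent.

```python
from collections import deque
from dataclasses import dataclass, field
from itertools import count
from typing import Optional


@dataclass
class TreeNode:
    node_id: str
    own_filter: str
    parent: Optional[str] = None
    children: list = field(default_factory=list)
    combined: str = ""
    updated_at: int = 0


clock = count(1)  # stand-in for versionedData timestamps


def recalc(nodes, node_id):
    """Rebuild a node's combined filter from its own filter and its children's."""
    node = nodes[node_id]
    node.combined = " OR ".join(
        [node.own_filter] + [nodes[c].combined for c in node.children])
    node.updated_at = next(clock)


def handle(nodes, msg, queue):
    """Process one message; return False if the child's update is not visible yet."""
    node_id, child_id, child_ts = msg
    if nodes[child_id].updated_at < child_ts:
        return False  # a real queue would redeliver this message later
    recalc(nodes, node_id)
    parent = nodes[node_id].parent
    if parent is not None:
        queue.append((parent, node_id, nodes[node_id].updated_at))
    return True


def propagate_up(nodes, changed_id):
    """Recalculate the changed node, then each parent in turn via the queue."""
    recalc(nodes, changed_id)
    visited = [changed_id]
    queue = deque()
    if nodes[changed_id].parent is not None:
        queue.append((nodes[changed_id].parent, changed_id,
                      nodes[changed_id].updated_at))
    while queue:
        msg = queue.popleft()
        if handle(nodes, msg, queue):
            visited.append(msg[0])
    return visited
```

In this single-process simulation the timestamp guard never fires; it is included to show where the check would sit when messages are handled by independent workers.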

When creating a new categoryNode

A new categoryNode also needs to traverse up the tree recalculating the filter for all parents

Adding client submitted settings

  • client (or requesting service) can overwrite or adjust these settings

searchType

  • client submitted setting overwrites categoryNode's

filter

  • would get added as an and grouped filter

requiredData

  • client setting overwrites categoryNode's if set
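
The three rules above can be combined in a short helper. This is a sketch under assumptions: `apply_client_settings` is a hypothetical name, settings are plain dictionaries, and the "and" grouping uses an illustrative nested dictionary format rather than the service's real filter structure.

```python
def apply_client_settings(node_settings, client_settings):
    """Merge client submitted settings over a categoryNode's generated settings."""
    result = dict(node_settings)
    # searchType and requiredData: the client's value replaces the node's if set.
    for key in ("searchType", "requiredData"):
        if client_settings.get(key):
            result[key] = client_settings[key]
    # filter: the client's filter is added as an "and" grouped filter.
    if client_settings.get("filter"):
        result["filter"] = {
            "and": [node_settings["filter"], client_settings["filter"]]
        }
    return result
```

The node's own settings dictionary is left unmodified, so the stored categoryNode values are not affected by a single request.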

Top level results

  • Each catalog has a top level record saved into CategoryNode table, categoryId = 0, this will be a combination of catalog filter, and all child categoryIds

Ideas

  • This service could hold a list of Products for each category and do things like record popularity. Partial lists would be OK, with anything we want to add. For features like popularity we might not want to remove products when they no longer match the category, and might want to maintain their details in case they get added again. This type of idea might be served through the graph database.
  • Our current structure allows one category to have multiple parents, that child category will have the same settings no matter what path you travel through the tree to reach it. Not sure how to handle presentation of the parent category/location for any category, maybe most popular, or simply first found?

Working documents

Category Tree Standard