2022-02-03 - Rebuild CategoryNode Settings

From Izara Wiki

Service - Category Tree Standard

Overview

  • each categoryNode stores its own final calculated searchType/filter/requiredData
  • each categoryNode has settings that control whether its searchType/filter/requiredData match its parent's
  • each categoryNode's filter includes all unique child filters
  • if searchType/requiredData change, we need to propagate that change down to all children that are set to match their parent's setting
  • if the filter setting changes, we need to propagate that change to all children that are set to match their parent's setting, and also propagate up to all parent categoryNodes
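The propagation directions above can be sketched in memory. This is a minimal illustration, not the service's actual schema: the node shape, field names, and the idea of a separate `own_filter` versus the final calculated `filter` are assumptions for the example.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical in-memory model; field names are illustrative only.
@dataclass
class CategoryNode:
    node_id: str
    own_filter: str = ""                 # filter defined on this node alone
    filter: str = ""                     # final calculated filter (own + children)
    search_type: str = ""
    search_type_match_parent: bool = True
    children: List["CategoryNode"] = field(default_factory=list)

def propagate_search_type(node: CategoryNode, value: str) -> None:
    # searchType (and requiredData) propagate DOWN only, stopping at
    # children that are not set to match their parent.
    node.search_type = value
    for child in node.children:
        if child.search_type_match_parent:
            propagate_search_type(child, value)

def rebuild_filter(node: CategoryNode) -> None:
    # A node's final filter is its own filter OR'd with every child's
    # final filter, which is why filter changes propagate UP the tree.
    for child in node.children:
        rebuild_filter(child)
    parts = [node.own_filter] + [c.filter for c in node.children]
    node.filter = " OR ".join(p for p in parts if p)
```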

2022-06 Idea

  • categoryNode rebuild settings are a triggeredCache
  • only filter needs to propagate up the parents when changed; searchType and requiredData propagate down only
  • a parent includes all filters of its children connected by "OR", as well as its own filter
  • when a change is made: reset the trigger and check the status; if processing, stop (let the previous process restart once it is finished); if not processing, invoke rebuildSettings processing
  • the flow is similar to the timeCacheComplete check: the second request does not start its processing if status was "processing"; instead it resets uniqueRequestId. The original flow checks this at any number of points, but at the least after the final change; if uniqueRequestId has been reset, the original flow is responsible for restarting its processing.
  • a changed parent categoryNode checks all child categoryNodes to see if any update from the parent; if yes, it updates them, resets the trigger for them, and saves awaitingMultiple steps (will also need to handle the case where none update from the parent, or there are no children)
  • most efficient, if we can do it: each time we update a categoryNode, work out which settings changed, and when iterating each child to see if any are set to match the parent, only check the settings that changed. If none changed we can stop traversing down. When we reach the point where there are no children left to process, start the traversal back up again to update the filter (the upwards traversal must always go all the way up)
  • when a child completes it invokes a function that checks whether the parent has any remaining awaitingSteps; if not, it rebuilds the parent's filter. When saving, it checks whether uniqueRequestId is set to 0 (reset); if yes we need to restart by triggering all children; if no, the trigger has not been reset and we can continue passing up to the parent
  • there is the case where a categoryNode mid-tree was the one that initially changed: its parents will not have saved any awaitingSteps. I think that is OK, because whenever any categoryNode updates its filter we always invoke the parent function, and if no awaitingSteps were saved then that parent's filter will simply be rebuilt
  • if a categoryNode's filter does not change we still have to pass up the parent chain, in case a parent is awaiting that branch of categoryNodes to complete (we don't know whether the parent has no awaitingSteps remaining, or simply has not been changed)
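The reset-trigger/check-status flow described above can be sketched as follows. `TriggerStore`, `request_rebuild`, and the use of `0` as the reset sentinel for uniqueRequestId follow the notes; the in-memory store and function names are hypothetical stand-ins for the real status table.

```python
import uuid

class TriggerStore:
    # Hypothetical stand-in for the real per-cache status record.
    def __init__(self):
        self.status = "idle"            # "idle" | "processing"
        self.unique_request_id = None

def request_rebuild(store: TriggerStore, start_processing) -> None:
    # Reset the trigger, then check status: if already processing, stop and
    # let the running flow restart itself; otherwise start processing now.
    store.unique_request_id = 0         # 0 means "reset" per the notes above
    if store.status == "processing":
        return
    store.status = "processing"
    store.unique_request_id = uuid.uuid4().hex
    start_processing(store)

def finish_processing(store: TriggerStore, start_processing) -> None:
    # After the final change, the original flow checks whether the trigger
    # was reset while it ran; if so, it is responsible for restarting.
    if store.unique_request_id == 0:
        store.unique_request_id = uuid.uuid4().hex
        start_processing(store)
    else:
        store.status = "idle"
```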

old notes

https://izara.io/wiki/index.php/Service_-_Category_Tree_Standard

  • multiple changes can be submitted at any level of the graph, affecting any other level and crossing over each other
  • different combinations of settings can change, e.g. sometimes requiredData only, sometimes filter only, sometimes all three
  • for example, when we call Vertex/ReplaceEdge to change a categoryNode's enabled/disabled state there is no guarantee it will actually create the new edge with the given timestamp (e.g. if another request already deleted the fromEdgeId), but there is no way for the requesting Lambda to know this because the call is async; this needs to work into the timestamp method of protecting from race conditions
  • the process also needs to be idempotent; this could be done by setting the categoryNode's setting at the start of processing (using a conditional expression), so that if a subsequent request for the same node comes in it fails that conditional expression and does not continue
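The conditional-expression idempotency idea in the last bullet can be sketched like this. The in-memory dict stands in for the real table (in DynamoDB this would be a `ConditionExpression` on the write); the item shape and function names are hypothetical.

```python
class ConditionFailed(Exception):
    """Raised when the stored state no longer matches the expected state."""

def conditional_set(table: dict, node_id: str,
                    expected_ts: int, new_ts: int, value: str) -> None:
    # Set the node's setting at the start of processing, conditional on the
    # previously observed timestamp; a stale/duplicate request fails the
    # condition and does not continue.
    item = table.get(node_id, {"updated": 0, "setting": None})
    if item["updated"] != expected_ts:
        raise ConditionFailed(
            f"{node_id}: expected ts {expected_ts}, found {item['updated']}")
    table[node_id] = {"updated": new_ts, "setting": value}
```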

new idea 2021-03-11c

  • when we reach a node, we conditionally set a switch saying it is being processed
  • if another traversal reaches that node, it sets the switch to "must recalc" and that second traversal stops, so two traversals never calculate a single node at the same time
  • if a new change happens on a node that is currently processing, set it to "must recalc". Do this after updating the data (we will need some timestamp to know the data has saved before the recalc; that might need to be added as an additional property)
  • when the first traversal tries to save its recalculated data it checks the switch: if set to "processing" it can proceed; if set to "recalc" it must start again, either by sending a new msg or by looping in the same invocation (but adding a timeout check to send a new msg)
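A minimal sketch of this switch protocol, assuming an in-memory node dict (the real version would use conditional writes against the store; the switch values and function names are hypothetical):

```python
def begin_processing(node: dict) -> bool:
    # Conditionally mark the node as processing. A second traversal that
    # finds it already processing flips the switch to "recalc" and stops,
    # so two traversals never calculate the same node.
    if node.get("switch") == "processing":
        node["switch"] = "recalc"
        return False
    node["switch"] = "processing"
    return True

def try_save(node: dict, result) -> bool:
    # Before saving, the first traversal checks the switch: "processing"
    # means proceed; "recalc" means start again (new msg, or loop in the
    # same invocation with a timeout check).
    if node["switch"] == "recalc":
        node["switch"] = "processing"   # restart the recalc
        return False
    node["result"] = result
    node["switch"] = None
    return True
```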

new idea 2021-03-11b

  • still see problems: how do we know when we should recalc, as more traversals back up behind this processing?

  • store both the read and write timestamps (thinking a counter is better) for any recalc/update of the data
  • the read timestamp (or counter/id) is the version of the data that was used to recalc the new data
  • not sure a timestamp will be accurate enough; maybe a counter or uuid? a small versioning counter would also be smaller to store
  • when we read the data we have the existing counter; when we save the recalc we check conditionally that the counter is still the same, and if it has changed we would need to recalc(?)
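This is essentially optimistic concurrency with a version counter; a sketch, again with an in-memory item standing in for the store and hypothetical names:

```python
def read_item(item: dict):
    # Return the data plus the version counter it was read at.
    return item["data"], item["version"]

def save_recalc(item: dict, read_version: int, new_data) -> bool:
    # Conditionally save: succeeds only if the version we read is still
    # current; otherwise the data changed mid-recalc and we must recalc.
    if item["version"] != read_version:
        return False
    item["data"] = new_data
    item["version"] += 1
    return True
```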

new idea 2021-03-11

  • does not work: if both traversals read the data, the traversal based on the older read can finish after the one based on the newer read and overwrite the correct data

  • maybe we don't need the complex support structure: traverse downward, and as long as there is a recalc after any higher-level change, the child nodes will ultimately calculate out correct, provided the traversal does not bunnyhop itself
  • we can stop it bunnyhopping by storing a timestamp when the data updated, and stopping the traversal if the data changes between the read and the write in the recalc logic
  • we can stop processing because the only way the timestamp would change is if a newer traversal updates it
  • I think this will work because as long as the traversal stops in this situation, it is not possible for the older traversal to bunnyhop back into the lead (probably would not matter at that point because ..

old idea

  • work on filter only first
  • use a timestamp or uuid to mark each process
  • have Dynamo table/s that record the update status
  • to ensure the graph database record has been updated before we initialize the update process, maybe trigger the first step from MsgOut of the graph handler, so we can guarantee the process gets triggered after the last graph update completes (if a new update happens it triggers again, so there will always be a message that arrives after the most recent update)
  • as we traverse down, ensure each parent node has been updated by passing down a timestamp of the parent node's update time and checking it before proceeding
  • as we traverse up, maybe check that all children have no active process
  • maybe store the downward rebuilt filter in Dynamo rather than updating the graph, and only update the graph on the upward traversal; this way we save on resource use? Only have versioned data for the completed recalculated filter
  • when starting the process: see if the node already has an active process; if not, set this as the active process and begin working down; if it does, add this process to the queue waiting to process and do not start working yet
  • when traversing down, if we meet a node that has an active process, also add to the queue at that node and stop working down
  • in both cases, after adding to the queue, re-check that there is still an active process; if not, there was a race condition and we can bump the next process from the queue to active and begin its working down
  • when a process works back up, if it hits a node with queue entries, stop processing upwards, bump the next element in the queue to active, and start processing down again
  • as we traverse up it doesn't matter if the active process for a parent node does not match the current process; as long as all child nodes no longer have an active process we can trust the parent can be recalculated, and recalculating multiple times on the upward traversal is not a problem
  • we can use a conditional statement marking the node as processing + the timestamp received in the request to make sure we don't double up the upward traversal: if we reach a node that is set to processing but its timestamp is before the current one, we can re-update the status and timestamp, overwriting the old processing setting; the old process then tries to conditionally update its result but fails because the status does not match, so it stops, while the new process will match once it is ready and will continue
  • if there is an upward traversal above a new entry node, that should not be a problem: it will complete and then be overwritten by the new one, or the new traversal will catch up and overwrite the old one
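The per-node active-process/queue claim described above can be sketched as follows. The node dict, `claim`, and `release` are hypothetical; a real version would need conditional writes rather than in-memory mutation.

```python
from collections import deque
from typing import Optional

def claim(node: dict, process_id: str) -> bool:
    # First process to reach the node becomes active and works down;
    # later processes join the queue at this node and stop.
    if node.get("active") is None:
        node["active"] = process_id
        return True
    node.setdefault("queue", deque()).append(process_id)
    # Re-check after queueing: if the active process finished in between,
    # bump the next queued process to active (race-condition guard).
    if node.get("active") is None:
        node["active"] = node["queue"].popleft()
        return node["active"] == process_id
    return False

def release(node: dict) -> Optional[str]:
    # When work at this node finishes: if anything is queued, bump the
    # next process to active (it restarts working down); otherwise clear.
    queue = node.get("queue")
    if queue:
        node["active"] = queue.popleft()
        return node["active"]
    node["active"] = None
    return None
```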

old notes: When a parent categoryNode (or catalog)'s requiredData or searchType changes

When a categoryNode changes its settings we will need to traverse down to all children to see which need to be updated, maybe per setting; whenever a child categoryNode is found to have {setting}MatchParent = none, the traversal can stop there. If requiredDataMatchParent = append, we rebuild that categoryNode's requiredData and continue down the tree.
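A sketch of that per-setting downward traversal, assuming dict-shaped nodes; the `ownRequiredData` field and the merge-by-dict-update behaviour of `append` are illustrative assumptions:

```python
def rebuild_required_data_down(node: dict) -> None:
    # Traverse down from a changed node. Children with
    # requiredDataMatchParent = "none" stop the traversal; "append"
    # children merge the parent's requiredData into their own and
    # continue down the tree.
    for child in node.get("children", []):
        mode = child.get("requiredDataMatchParent", "none")
        if mode == "none":
            continue                     # traversal stops at this child
        if mode == "append":
            merged = dict(node["requiredData"])
            merged.update(child.get("ownRequiredData", {}))
            child["requiredData"] = merged
        rebuild_required_data_down(child)
```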

There could be race conditions when the child gets rebuilt before the parent node gets updated.

Race condition possible solution 1

Send the new requiredData in the message that triggers the rebuild of children, so we do not need to worry about whether the parent has updated yet. There could be race conditions if multiple changes are submitted at the same time, because an older message might be processed after a newer one.

Race condition possible solution 2

Send in the message the timestamp at which the parent's versionedData was updated, then make sure the parent has updated to this timestamp before rebuilding the child. If the parent's versionedData is newer than the message's timestamp, a new change happened before the child message was processed, and we could skip processing the child because in theory another message will arrive for the newer change.

This would fail if the new change does not also update the same setting (e.g. requiredData), in which case a new message would not come.

We would also want to build in a conditional statement/transactional update when we create the new filter, to make sure another newer process did not update the data while we were processing. For example, we might have some processes updating requiredData and some updating filters.
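Solution 2's timestamp comparison can be sketched as a small decision function; the field name and the three outcomes are illustrative assumptions:

```python
def handle_child_rebuild(parent: dict, msg_timestamp: int) -> str:
    # Compare the timestamp carried in the rebuild message against the
    # parent's current versionedData timestamp.
    current = parent["versionedDataTimestamp"]
    if current < msg_timestamp:
        return "retry"      # parent has not updated to this version yet
    if current > msg_timestamp:
        return "skip"       # a newer change will (in theory) send its own msg
    return "rebuild"        # parent matches: safe to rebuild the child
```

Note the "skip" branch is exactly where the caveat above bites: if the newer change did not touch the same setting, no follow-up message arrives.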

Race condition possible solution 3

Update all settings in one process; the message states which settings have changed and they are processed accordingly, and we only traverse back up the graph if the filter changed.

Have a temporary boolean property that marks the categoryNode ... still no good