Stored Cache

From Izara Wiki

Overview

A stored cache holds a cached object until its expiry time is reached. The data in the cache is produced by a flow that takes time to run, so we try to ensure that only one request re-processes the object at any one time, which avoids corrupt data caused by race conditions.

Ensuring one request handles processing

CheckStoredCache is the entry point for using the cached object, or for regenerating it if it has expired. It queries Dynamo for the record. If the record has expired, it attempts to reset status to processing and to set the current invocation's uniqueRequestId in the record. The same update also resets the expiryTime and errorsFound fields.

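The DynamoDB update that modifies the record carries a condition that expiryTime has not changed since the record was queried. Because a successful update resets expiryTime, any other request that read the same expired record fails the condition, which should ensure only one request continues processing the object.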
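
The conditional update described above could be sketched as follows. This is only an illustration: the helper name build_claim_update, the processing_window_s parameter, the epoch-seconds format of expiryTime and the empty-list reset value for errorsFound are assumptions, not part of the Izara code. The returned dictionary uses the standard DynamoDB update parameters (UpdateExpression, ConditionExpression, ExpressionAttributeNames, ExpressionAttributeValues).

```python
import time


def build_claim_update(key, seen_expiry_time, unique_request_id,
                       processing_window_s=300, now=None):
    """Build parameters for a conditional DynamoDB update that claims the record.

    Hypothetical helper: field names follow the text above, but the value
    formats (epoch seconds, empty list for errorsFound) are assumptions.
    """
    now = int(time.time()) if now is None else now
    return {
        "Key": key,
        # "status" is aliased through ExpressionAttributeNames because it
        # clashes with a DynamoDB reserved word.
        "UpdateExpression": (
            "SET #st = :processing, uniqueRequestId = :rid, "
            "expiryTime = :newExpiry, errorsFound = :noErrors"
        ),
        # Succeeds only if nobody changed expiryTime since we read the record.
        "ConditionExpression": "expiryTime = :seenExpiry",
        "ExpressionAttributeNames": {"#st": "status"},
        "ExpressionAttributeValues": {
            ":processing": "processing",
            ":rid": unique_request_id,
            ":newExpiry": now + processing_window_s,
            ":seenExpiry": seen_expiry_time,
            ":noErrors": [],
        },
    }
```

With boto3 these parameters could be passed to a table resource's update_item call. A request that loses the race would then receive a ConditionalCheckFailedException and should treat the record as already being processed.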

If the record already has status processing, any additional request stops execution at this point. Those requests continue once the Complete message is sent.

The record is considered finished when its status is complete or error. When the status is complete, the cached results are sent back to any requesting caller. When the status is error, the calling request receives that status as well and handles it as needed.
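
As a rough sketch of how a caller might branch on the record status, assuming the record is a plain dictionary: the function name next_action, the action labels and the cachedResult field are hypothetical and used only for illustration.

```python
def next_action(record):
    """Decide what a request should do based on the stored cache status.

    Returns an (action, payload) tuple. The action labels and the
    "cachedResult" field name are illustrative assumptions.
    """
    status = record.get("status")
    if status == "complete":
        # Finished: hand the cached results back to the caller.
        return ("return_cached", record.get("cachedResult"))
    if status == "error":
        # Finished with errors: the caller receives the error status.
        return ("return_error", record.get("errorsFound"))
    if status == "processing":
        # Another request is regenerating the object; wait for Complete.
        return ("wait_for_complete", None)
    # No usable status yet: this request should try to claim processing.
    return ("attempt_processing", None)
```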

Idempotence

The invocation that processes the object may fail before completion, eg if the Lambda times out. This would cause the SQS message that triggered the Lambda to be resent. In this case we want the second request to begin processing, so we also test whether the record's uniqueRequestId matches the current request. If it does, we continue processing.
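
The check above could look something like this minimal sketch. It assumes the uniqueRequestId stays the same when the SQS message is redelivered; the function name is hypothetical.

```python
def may_resume_processing(record, current_request_id):
    """Return True if this request already owns the in-progress record.

    Assumes uniqueRequestId is stable across redeliveries of the same
    SQS message, so a retried invocation recognises its own claim.
    """
    return (
        record.get("status") == "processing"
        and record.get("uniqueRequestId") == current_request_id
    )
```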

Time out

When we set the record to processing we also set an expiryTime, which is used to ensure the re-processing completes. If a future request sees that the record is set to processing but the expiryTime has passed, processing has taken too long to complete. In this case we reset the record and begin processing again, and a warning or error message is sent to administrators.
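
A minimal sketch of the timeout test, assuming expiryTime is stored as epoch seconds and using Python's logging module as a stand-in for whatever mechanism notifies administrators. The function and logger names are hypothetical.

```python
import logging
import time

logger = logging.getLogger("storedCache")


def processing_timed_out(record, now=None):
    """Return True if the record is stuck in processing past its expiryTime.

    Assumes expiryTime is epoch seconds. The warning log stands in for the
    message that would be sent to administrators.
    """
    now = time.time() if now is None else now
    timed_out = (
        record.get("status") == "processing"
        and record.get("expiryTime", 0) < now
    )
    if timed_out:
        logger.warning(
            "storedCache processing exceeded expiryTime %s, resetting",
            record.get("expiryTime"),
        )
    return timed_out
```

A request that gets True here would go on to claim the record again with the same conditional update used for an expired cache.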

Data Structure

StoredCache fields can be stored in the object's primary Dynamo record. When we check the storedCache we also return the full Dynamo record so it can be used in the resulting logic (eg sent to a complete topic if using the cache).

One object might have multiple storedCache sets attached to it, eg if there are multiple flows or tasks that need to be performed for that object. All sets can be saved together in the main Dynamo record, using a storedCache prefix to separate the fields.
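
To illustrate how prefixed fields might be separated, the sketch below pulls one storedCache set out of a record. The naming scheme storedCache_<setName>_<field> is an assumption made for this example; the text above only says a storedCache prefix is used.

```python
def extract_stored_cache_set(record, set_name):
    """Collect the fields of one storedCache set from a Dynamo record.

    Assumes fields are named "storedCache_<setName>_<field>", which is a
    hypothetical scheme chosen for illustration.
    """
    prefix = f"storedCache_{set_name}_"
    return {
        name[len(prefix):]: value
        for name, value in record.items()
        if name.startswith(prefix)
    }


record = {
    "objectId": "product-1",
    "storedCache_search_status": "complete",
    "storedCache_search_expiryTime": 1700000000,
    "storedCache_pricing_status": "processing",
}
search_set = extract_stored_cache_set(record, "search")
```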

Ideas

Currently we use Dynamo to store cache because it scales, is fast, has permanence (eg have long expiryTimes for complex objects) and has conditionals to protect against race conditions. As we scale it might be cheaper to use a more transient storage option such as ElastiCache.

Dynamic expiryTime

Currently expiryTime is hardcoded, perhaps on a per object type basis. We could adjust the expiryTime on a per record basis, for example so that popular search results are recalculated more often.
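
One possible shape for a per record expiry, purely as a sketch of the idea: the popularity measure (hits in the last hour), the scaling rule and every name below are assumptions, not an existing design.

```python
def dynamic_ttl_seconds(base_ttl_s, hits_last_hour, min_ttl_s=60):
    """Shorten the cache lifetime for popular records.

    Illustrative rule only: every 100 hits in the last hour divides the
    base lifetime further, never going below min_ttl_s.
    """
    divisor = 1 + hits_last_hour // 100
    return max(min_ttl_s, base_ttl_s // divisor)
```

The new expiryTime would then be the current time plus the returned number of seconds.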