2022-12-25 AwaitingMultipleSteps GSI issue
AwaitingMultipleSteps is a DynamoDB table that was defined with:
- partition key "idA", sort key "idB"
- GSI partition key "idB", sort key "idA"
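For reference, the layout above could be written as a parameter dict for boto3's `create_table`. This is only a sketch: the string attribute type, the index name `idB-idA-index`, the `KEYS_ONLY` projection and on-demand billing are all assumptions, not taken from the real table.

```python
def awaiting_multiple_steps_schema(index_name="idB-idA-index"):
    """Build create_table parameters for the table and its GSI.

    index_name, the "S" attribute types, the projection and the billing
    mode are assumptions made for illustration.
    """
    return {
        "TableName": "AwaitingMultipleSteps",
        "AttributeDefinitions": [
            {"AttributeName": "idA", "AttributeType": "S"},
            {"AttributeName": "idB", "AttributeType": "S"},
        ],
        # Main index: partition key idA, sort key idB.
        "KeySchema": [
            {"AttributeName": "idA", "KeyType": "HASH"},
            {"AttributeName": "idB", "KeyType": "RANGE"},
        ],
        # GSI: the same two attributes with the roles reversed.
        "GlobalSecondaryIndexes": [
            {
                "IndexName": index_name,
                "KeySchema": [
                    {"AttributeName": "idB", "KeyType": "HASH"},
                    {"AttributeName": "idA", "KeyType": "RANGE"},
                ],
                "Projection": {"ProjectionType": "KEYS_ONLY"},
            }
        ],
        "BillingMode": "PAY_PER_REQUEST",
    }
```

It would be used as `boto3.client("dynamodb").create_table(**awaiting_multiple_steps_schema())`.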
When attempting to delete all items with a specific "idB" after an error (while working in the ImportData service), I was querying the GSI to get a list of records and paginating the deletion of those records.
If querying against the main index, I could probably just re-run the query with a limit and delete each record. Using the GSI, however, results in records that have already been deleted showing up in subsequent queries because the GSI is eventually consistent, i.e. a record's deletion often does not propagate to the GSI before the next query is invoked.
Tried using the LastEvaluatedKey from the previous query response as the ExclusiveStartKey for the next query, which should result in only new records being returned. However, there is a fair chance the record that LastEvaluatedKey points to will no longer exist, because it was deleted in the previous iteration. Apparently this technique should still work in that case.
When that happens, weird results are returned. The GSI results often start with the next expected record, but then some records might get skipped, and often the query will not return a LastEvaluatedKey even though not all remaining records have been returned.
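The LastEvaluatedKey approach described above looked roughly like the sketch below. It is a reconstruction, not the actual ImportData code: it assumes string keys, a hypothetical index name `idB-idA-index`, and a low-level boto3 DynamoDB client passed in as `client`. It behaves correctly against a strongly consistent store, but against a real GSI it is exposed to the skipped records and missing LastEvaluatedKey described in these notes.

```python
def delete_by_id_b_via_gsi(client, id_b, table="AwaitingMultipleSteps",
                           index="idB-idA-index", page_size=25):
    """Query the GSI page by page and delete each returned record.

    table, index and page_size are illustrative defaults (assumptions).
    Returns the number of delete calls made.
    """
    deleted = 0
    start_key = None
    while True:
        kwargs = {
            "TableName": table,
            "IndexName": index,
            "KeyConditionExpression": "idB = :b",
            "ExpressionAttributeValues": {":b": {"S": id_b}},
            "Limit": page_size,
        }
        if start_key:
            # Resume after the last key of the previous page, even though
            # that record has just been deleted from the main table.
            kwargs["ExclusiveStartKey"] = start_key
        page = client.query(**kwargs)
        for item in page.get("Items", []):
            # Deletes go to the main table; the GSI catches up later.
            client.delete_item(
                TableName=table,
                Key={"idA": item["idA"], "idB": item["idB"]},
            )
            deleted += 1
        start_key = page.get("LastEvaluatedKey")
        if not start_key:
            # No LastEvaluatedKey is treated as "done", which is exactly
            # the assumption that fails when the GSI is behind.
            return deleted
```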
My understanding is that the unusual results come from the original inserts into AwaitingMultipleSteps not having propagated to the GSI before the query that starts deleting them.
I see no way to confidently query the GSI when we need to, e.g., delete all records, because we do not know how long it will take for new records to propagate to the GSI. The same issue could also present itself when checking whether any steps remain for a specific pendingStepId (which was the reason for the GSI): there might be records that have not propagated yet, causing the check to say all steps are complete when that is not the case.
Instead, I am now managing a second table in which the partition and sort keys are reversed. Resource use should be similar, because a GSI consumes the storage of a second table anyway. It does increase code complexity, though: any change to one table must also be made to the other.
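A minimal sketch of the two-table approach, under several assumptions: the second table is given the hypothetical name `AwaitingMultipleStepsByIdB`, keys are strings, and `client` is a low-level boto3 DynamoDB client. Using `transact_write_items` to keep the two tables in step is one option chosen here for illustration, not necessarily what the real code does. The point is that the reversed table is a base table, so it can be queried with `ConsistentRead=True`, which a GSI does not support.

```python
MAIN = "AwaitingMultipleSteps"
REVERSED = "AwaitingMultipleStepsByIdB"  # hypothetical name for the second table


def put_step(client, id_a, id_b):
    """Write the same record to both tables in one transaction."""
    item = {"idA": {"S": id_a}, "idB": {"S": id_b}}
    client.transact_write_items(TransactItems=[
        {"Put": {"TableName": MAIN, "Item": item}},
        {"Put": {"TableName": REVERSED, "Item": item}},
    ])


def _query_by_id_b(client, id_b, limit):
    # Strongly consistent read against the reversed table's main index.
    return client.query(
        TableName=REVERSED,
        KeyConditionExpression="idB = :b",
        ExpressionAttributeValues={":b": {"S": id_b}},
        ConsistentRead=True,
        Limit=limit,
    ).get("Items", [])


def steps_remain(client, id_b):
    """True if at least one record still exists for this idB."""
    return len(_query_by_id_b(client, id_b, limit=1)) > 0


def delete_all_for_id_b(client, id_b, page_size=25):
    """Re-run a limited query until it comes back empty, deleting as we go."""
    deleted = 0
    while True:
        items = _query_by_id_b(client, id_b, limit=page_size)
        if not items:
            return deleted
        for item in items:
            key = {"idA": item["idA"], "idB": item["idB"]}
            # Remove the record from both tables together.
            client.transact_write_items(TransactItems=[
                {"Delete": {"TableName": MAIN, "Key": key}},
                {"Delete": {"TableName": REVERSED, "Key": key}},
            ])
            deleted += 1
```

Because each query is strongly consistent, deleted records do not reappear, so the loop needs no LastEvaluatedKey bookkeeping.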