Multi Thread Invocation
NOT YET IN USE
We chose not to use this concept yet because there is no simple way of splitting the records queried from Dynamo into interleaved sets of pages, e.g. thread#1 processes Dynamo pages 1/6/11.., thread#2 processes pages 2/7/12..
The reason is that paginating Dynamo results requires the last evaluated key of the preceding page, and there is no direct way to find it. Each time a thread skips records, it has to query all records up to the point it skips to, just to get that last evaluated key.
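A minimal sketch of why skipping is expensive. It does not call Dynamo. RECORDS, query_page and find_start_key are hypothetical names, and query_page is an in-memory stand-in that only imitates the Limit / ExclusiveStartKey / LastEvaluatedKey contract. The point is that reaching the start of page N means reading every record on pages 1 to N-1.

```python
from typing import List, Optional, Tuple

# Hypothetical in-memory stand-in for a table: 30 records with ids 1..30.
RECORDS = [{"id": i} for i in range(1, 31)]


def query_page(limit: int, exclusive_start_key: Optional[dict] = None) -> Tuple[List[dict], Optional[dict]]:
    """Imitate a paginated query: return (items, last_evaluated_key)."""
    # ids are 1..30, so the id of the last record read equals the index of the next one.
    start = 0 if exclusive_start_key is None else exclusive_start_key["id"]
    items = RECORDS[start:start + limit]
    has_more = start + len(items) < len(RECORDS)
    last_key = {"id": items[-1]["id"]} if items and has_more else None
    return items, last_key


def find_start_key(page_number: int, page_size: int) -> Tuple[Optional[dict], int]:
    """Return (start_key, records_read) for a 1-based page number.

    The only way to get the start key is to walk every earlier page.
    """
    start_key = None
    records_read = 0
    for _ in range(page_number - 1):
        items, start_key = query_page(page_size, start_key)
        records_read += len(items)
        if start_key is None:
            break  # ran out of records before reaching the requested page
    return start_key, records_read
```

With a page size of 5, find_start_key(1, 5) reads nothing, but find_start_key(6, 5) reads 25 records before the thread can do any real work on page 6.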
Alternatives are:
- Query just the keys up to the last record being skipped, then query the full items from there
- Do one full iteration of all records and cache the start keys per page
- If we insert records consecutively we could cache the start keys as we insert
- Have a Global Secondary Index keyed on an incrementing value 1,2,3.., which could be queried directly to get a page of results
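As an illustration of the second alternative (one full iteration that caches the start key of each page), here is a rough sketch. build_start_key_cache and fake_query_page are hypothetical names. The query function is passed in, and the fake one below only imitates Dynamo's pagination contract over 12 in-memory records.

```python
from typing import Callable, List, Optional, Tuple

# Assumed shape of a paginated query: (limit, start_key) -> (items, last_evaluated_key)
QueryPage = Callable[[int, Optional[dict]], Tuple[List[dict], Optional[dict]]]


def build_start_key_cache(query_page: QueryPage, page_size: int) -> List[Optional[dict]]:
    """Walk the whole data set once and record the start key of every page.

    Index 0 holds the start key for page 1, which is None (start from the beginning).
    """
    start_keys: List[Optional[dict]] = [None]
    key = None
    while True:
        _items, key = query_page(page_size, key)
        if key is None:
            return start_keys  # no further pages
        start_keys.append(key)


def fake_query_page(limit: int, start_key: Optional[dict] = None) -> Tuple[List[dict], Optional[dict]]:
    """Hypothetical stand-in for a real query, over ids 1..12."""
    ids = list(range(1, 13))
    start = 0 if start_key is None else start_key["id"]
    items = [{"id": i} for i in ids[start:start + limit]]
    has_more = start + len(items) < len(ids)
    return items, ({"id": items[-1]["id"]} if items and has_more else None)


cache = build_start_key_cache(fake_query_page, 5)
# Any thread can now jump straight to page 3 using cache[2].
page_3_items, _ = fake_query_page(5, cache[2])
```

The cache still costs one full read of the data set and has to be stored and kept current somewhere, which is the extra resource and complexity weighed up in these notes.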
All of the above add resources and complexity. This needs to be weighed against the gain from multi thread processing, which is primarily completing tasks faster. I decided that most tasks will not justify the cost.
For time consuming tasks the additional cost might be worth it, but most time consuming tasks are async, and the multi thread part affects only the first step.
Overview
Manages processing multiple subsets of one data set at the same time, with a maximum number of threads per data set. Each invocation of a thread processes a set number of records.
Process
- Originating function calculates the number of threads needed using the total count of records, max number of threads, and records processed per invocation
- Each thread can paginate through the data set
- LastEvaluatedKey is required for pagination. thread#1 does not need one initially because it starts from the beginning, but all other threads need to find the appropriate LastEvaluatedKey for where they begin
- Each invocation checks whether there are additional pages it needs to process, using the total count of records and records processed per invocation
- Initial calling function saves AwaitingMultipleSteps for each thread
- As each thread detects there are no remaining pages to process, it clears its AwaitingMultipleSteps entry and checks whether any awaiting steps remain. If none remain, it continues the workflow
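The arithmetic in the steps above could be sketched roughly as follows. All function names are hypothetical, one page is assumed to equal one invocation's worth of records, and the awaiting steps are modelled as an in-memory set. A real implementation would keep them in a shared store and clear them atomically.

```python
import math
from typing import List, Set


def total_pages(total_records: int, records_per_invocation: int) -> int:
    # One page is assumed to be one invocation's worth of records.
    return math.ceil(total_records / records_per_invocation)


def thread_count(total_records: int, max_threads: int, records_per_invocation: int) -> int:
    """Never start more threads than there are pages to process."""
    return min(max_threads, total_pages(total_records, records_per_invocation))


def pages_for_thread(thread_number: int, threads: int,
                     total_records: int, records_per_invocation: int) -> List[int]:
    """Pages (1-based) handled by a 1-based thread number, interleaved by thread count."""
    last_page = total_pages(total_records, records_per_invocation)
    return list(range(thread_number, last_page + 1, threads))


def has_next_page(current_page: int, threads: int,
                  total_records: int, records_per_invocation: int) -> bool:
    """Check made by each invocation: does this thread have another page to process?"""
    return current_page + threads <= total_pages(total_records, records_per_invocation)


def finish_thread(awaiting: Set[int], thread_number: int) -> bool:
    """Clear this thread's awaiting step; return True if the workflow can continue."""
    awaiting.discard(thread_number)
    return not awaiting


threads = thread_count(total_records=1500, max_threads=5, records_per_invocation=100)
awaiting_steps = set(range(1, threads + 1))  # one awaiting step saved per thread
```

With 1500 records, 100 records per invocation and a maximum of 5 threads, thread#1 gets pages 1/6/11 and thread#2 gets pages 2/7/12, matching the split described at the top of these notes.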