2024-05-10 - Saving Feeds to S3

Service - Export Data

Chose S3 Multipart

  • each part will be reasonably large (the output of one Lambda invocation)
  • part numbers let us control the order of connected parts across async processing

Process

Do not use the process below. Instead, one thread processes all SortResultData records; see Multi Thread Invocation.

  • have a set number of records to process per invocation
  • count number of records in SortResultData to decide how many processing threads to begin
  • will also calculate how many Lambda invocations per thread
  • each invocation is a part, so each thread will save multiple parts
  • AwaitingMultipleSteps for each thread; when a thread completes, check whether all steps have finished. If yes, complete the multipart upload
  • Errors can be saved to the main ExportMain record
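
The thread and part arithmetic above can be sketched as follows. This is only an illustration: plan_export, part_numbers_for_thread, and the max_invocations_per_thread cap are hypothetical names, not part of the Export Data service. It assumes each thread is given a contiguous block of part numbers so that the final file follows thread order, and it checks against S3's limit of 10,000 parts per multipart upload.

```python
from dataclasses import dataclass

S3_MAX_PARTS = 10_000  # S3 allows at most 10,000 parts per multipart upload


@dataclass
class ExportPlan:
    total_invocations: int       # one S3 part per Lambda invocation
    thread_count: int
    invocations_per_thread: int  # upper bound; the last thread may need fewer


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division without going through floats."""
    return -(-a // b)


def plan_export(record_count: int, records_per_invocation: int,
                max_invocations_per_thread: int) -> ExportPlan:
    """Decide how many threads to start and how many invocations each runs."""
    if min(record_count, records_per_invocation, max_invocations_per_thread) <= 0:
        raise ValueError("all arguments must be positive")
    total_invocations = ceil_div(record_count, records_per_invocation)
    if total_invocations > S3_MAX_PARTS:
        raise ValueError("too many parts; raise records_per_invocation")
    thread_count = ceil_div(total_invocations, max_invocations_per_thread)
    # Spread invocations evenly rather than filling each thread to the cap.
    invocations_per_thread = ceil_div(total_invocations, thread_count)
    return ExportPlan(total_invocations, thread_count, invocations_per_thread)


def part_numbers_for_thread(plan: ExportPlan, thread_index: int) -> list:
    """1-based, contiguous part numbers owned by a thread (thread_index from 0)."""
    first = thread_index * plan.invocations_per_thread + 1
    last = min(first + plan.invocations_per_thread - 1, plan.total_invocations)
    return list(range(first, last + 1))
```

For example, 1,000,000 records at 25,000 records per invocation gives 40 parts; with at most 8 invocations per thread that is 5 threads, and the last thread owns parts 33 to 40. Note that S3 also requires every part except the last to be at least 5 MiB, so records_per_invocation would need to be large enough to reach that size.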

Alternative

  • do not control number of records per Lambda invocation
  • each processing thread saves its data somewhere temporary, e.g. ElastiCache
  • only save processing thread's part to S3 after it completes all records allocated to it

S3 Multipart Uploads

  • can begin a multipart upload and hold that process open until complete
  • assume file will not be available until all parts saved and multipart upload set to complete
  • must pay standard S3 charges for pending data
  • must complete (or abort) the upload, otherwise pending parts will remain and be charged for, but will not be accessible
  • can handle records in batches, ordering them by part number
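
A minimal sketch of the multipart lifecycle described above, using the boto3 S3 client calls create_multipart_upload, upload_part, complete_multipart_upload, and abort_multipart_upload. The function name upload_parts_in_order and the in-memory parts dict are assumptions for illustration; in the real flow each part would come from a separate Lambda invocation that shares the same UploadId. The client is passed in as a parameter, so anything exposing these four methods will work.

```python
def upload_parts_in_order(s3, bucket: str, key: str, parts: dict) -> str:
    """Upload parts (part number -> bytes) and complete the multipart upload.

    `s3` is expected to behave like boto3.client("s3").
    Parts may be uploaded in any order; S3 assembles them by part number.
    """
    upload_id = s3.create_multipart_upload(Bucket=bucket, Key=key)["UploadId"]
    try:
        completed = []
        for part_number, body in parts.items():
            response = s3.upload_part(
                Bucket=bucket,
                Key=key,
                UploadId=upload_id,
                PartNumber=part_number,
                Body=body,
            )
            # The ETag of every part is needed to complete the upload.
            completed.append({"PartNumber": part_number, "ETag": response["ETag"]})

        # The completion request must list parts in ascending part number order.
        completed.sort(key=lambda part: part["PartNumber"])
        s3.complete_multipart_upload(
            Bucket=bucket,
            Key=key,
            UploadId=upload_id,
            MultipartUpload={"Parts": completed},
        )
    except Exception:
        # Abort so that already uploaded parts are removed and stop being charged.
        s3.abort_multipart_upload(Bucket=bucket, Key=key, UploadId=upload_id)
        raise
    return upload_id
```

A call such as upload_parts_in_order(boto3.client("s3"), "my-bucket", "feeds/export.csv", parts) would then produce one object once all parts are in. As a safety net for uploads that are never completed or aborted, a bucket lifecycle rule with the AbortIncompleteMultipartUpload action can clean up pending parts after a set number of days.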

Kinesis Firehose Batching

  • S3 calls are expensive, so we build larger parts in Lambda; not sure how this would compare to Firehose limits
  • Firehose limits: 1 MB per record, 4 MB per batch
  • Can compress final file before saving to S3
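
To get a feel for how the Firehose limits above would constrain batching, the sketch below groups already serialized records into batches that stay inside them. The function name batch_records is hypothetical, the byte limits are the approximate figures from the list above, and the default of 500 records per batch is the PutRecordBatch record-count limit. It only sizes batches and does not call Firehose.

```python
from typing import Iterable, Iterator, List

MAX_RECORD_BYTES = 1_000_000  # approx. 1 MB per record (figure from the notes above)
MAX_BATCH_BYTES = 4_000_000   # approx. 4 MB per batch (figure from the notes above)
MAX_BATCH_RECORDS = 500       # PutRecordBatch accepts at most 500 records per call


def batch_records(records: Iterable[bytes],
                  max_record_bytes: int = MAX_RECORD_BYTES,
                  max_batch_bytes: int = MAX_BATCH_BYTES,
                  max_batch_records: int = MAX_BATCH_RECORDS) -> Iterator[List[bytes]]:
    """Yield lists of records, each list small enough to send as one batch."""
    batch: List[bytes] = []
    batch_bytes = 0
    for record in records:
        size = len(record)
        if size > max_record_bytes:
            raise ValueError(f"record of {size} bytes exceeds the per-record limit")
        # Start a new batch if adding this record would break either limit.
        if batch and (batch_bytes + size > max_batch_bytes
                      or len(batch) >= max_batch_records):
            yield batch
            batch, batch_bytes = [], 0
        batch.append(record)
        batch_bytes += size
    if batch:
        yield batch
```

With five records of 900,000 bytes each, this yields one batch of four records and one batch of one record, which is far smaller than the parts a single Lambda invocation can build for S3 multipart.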
