2023-03-12 - ImportData CSV Process

Service - Import Data

Process

Client Sends File

  1. The frontend chooses a CsvImportConfig (or creates a new one) and selects the file to upload
  2. After clicking submit, the client requests permission to upload to S3 and requests creation of a new importBatchMainId (ideally both in the same request), then uploads the file to S3 with an indicator of the importBatchMainId, e.g. as a tag on the object (see the sketch after this list)
  3. The client can perhaps open a websocket to be notified when processing completes, or a status page can poll for progress
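A minimal sketch of how a single "create batch + grant upload permission" request could return both the new importBatchMainId and a presigned S3 upload URL that carries the id as an object tag. The bucket name, key layout, and createImportBatch call are assumptions for illustration, not the service's actual API; the AWS SDK v3 calls are standard.

```typescript
import { randomUUID } from "node:crypto";
import { S3Client, PutObjectCommand } from "@aws-sdk/client-s3";
import { getSignedUrl } from "@aws-sdk/s3-request-presigner";

const s3 = new S3Client({});
const UPLOAD_BUCKET = "izara-importdata-uploads"; // hypothetical bucket name

// Handles the single "create batch + grant upload permission" request from item 2.
export async function createBatchAndUploadUrl(csvImportConfigId: string) {
  // Create the batch id first so the uploaded object can carry it as a tag.
  const importBatchMainId = randomUUID();
  // await createImportBatch(importBatchMainId, csvImportConfigId); // hypothetical persistence call

  const tagging = `importBatchMainId=${importBatchMainId}`;
  const command = new PutObjectCommand({
    Bucket: UPLOAD_BUCKET,
    Key: `incoming/${importBatchMainId}.csv`, // hypothetical key layout
    Tagging: tagging, // uploader must send the matching x-amz-tagging header with the PUT
  });
  const uploadUrl = await getSignedUrl(s3, command, { expiresIn: 900 });

  // Returned to the frontend in one response, covering both halves of item 2.
  return { importBatchMainId, uploadUrl, tagging };
}
```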

Process CSV File

  1. The upload to S3 triggers a Lambda in the ImportData service that begins processing the file
  2. Read the file in sections for batch processing, perhaps with a set section size, handling multiple sections per invocation depending on resource use (see the sketch after this list)
  3. The field order might come from the CsvImportConfig or from the title row; determine this first
  4. Record any columns that match CsvImportConfig.objectTypes.columnName, noting which objectType/fieldName each column maps to
  5. Skip any ignoreRows
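A minimal sketch of the S3-triggered Lambda reading the file in fixed-size sections via ranged GETs, under the assumption that the importBatchMainId was attached as an object tag at upload time. SECTION_BYTES and processSection are hypothetical placeholders; the per-section processing would cover the header mapping, ignoreRows, and field-parsing steps described on this page.

```typescript
import { S3Client, GetObjectCommand, GetObjectTaggingCommand } from "@aws-sdk/client-s3";
import type { S3Event } from "aws-lambda";

const s3 = new S3Client({});
const SECTION_BYTES = 1024 * 1024; // hypothetical fixed section size

export const handler = async (event: S3Event): Promise<void> => {
  for (const record of event.Records) {
    const Bucket = record.s3.bucket.name;
    const Key = decodeURIComponent(record.s3.object.key.replace(/\+/g, " "));

    // Recover the importBatchMainId that was placed on the object as a tag at upload time.
    const tagging = await s3.send(new GetObjectTaggingCommand({ Bucket, Key }));
    const importBatchMainId = tagging.TagSet?.find((t) => t.Key === "importBatchMainId")?.Value;

    // Read the file in fixed-size sections via ranged GETs instead of loading it whole.
    const size = record.s3.object.size;
    let carry = ""; // partial trailing line carried into the next section
    for (let start = 0; start < size; start += SECTION_BYTES) {
      const end = Math.min(start + SECTION_BYTES, size) - 1;
      const part = await s3.send(new GetObjectCommand({ Bucket, Key, Range: `bytes=${start}-${end}` }));
      const text = carry + (await part.Body!.transformToString());

      // Naive newline split; does not yet handle newlines inside enclosed fields.
      const lastNewline = text.lastIndexOf("\n");
      carry = lastNewline === -1 ? text : text.slice(lastNewline + 1);
      const rows = lastNewline === -1 ? [] : text.slice(0, lastNewline).split("\n");
      await processSection(rows, importBatchMainId);
    }
    if (carry) await processSection([carry], importBatchMainId);
  }
};

// Hypothetical stub; the real work (header mapping, ignoreRows, field parsing) follows the steps below.
async function processSection(rows: string[], importBatchMainId?: string): Promise<void> {}
```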

Each field in each data row

  1. Build an array of objectTypes for each row (in case there are columnName objectTypes)
  2. Check whether the field is enclosed according to the CsvImportConfig; if yes, check whether it starts with the enclose string (since enclosure may be optional)
  3. Search from the start to find the end of the field, looking for the field delimiter or the enclose string depending on whether the field is enclosed; any match found must be checked for escaping (see the sketch after this list)
  4. After the end of the field is found, remove the enclose strings if they exist
  5. Record which object type is being added, based on the CsvImportConfig.objectTypes setting
  6. Replace any escaped characters with their literal values
  7. Store the fieldName and value into the array of objectTypes; there may be multiple objects per row
  8. Record any link references found
  9. Save the results into the PendingObject and PendingLink tables
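A minimal sketch of the per-field scan described above, assuming a simplified config shape with explicit delimiter, enclose, and escape strings; the actual CsvImportConfig field names may differ.

```typescript
interface CsvParseConfig {
  delimiter: string;  // e.g. ","
  enclose?: string;   // e.g. "\"" (enclosure may be optional)
  escape: string;     // e.g. "\\"
}

// Returns the unescaped field value and the index just past its terminating delimiter.
function readField(row: string, start: number, cfg: CsvParseConfig): { value: string; next: number } {
  const enclosed = cfg.enclose !== undefined && row.startsWith(cfg.enclose, start);
  let pos = enclosed ? start + cfg.enclose!.length : start;
  const terminator = enclosed ? cfg.enclose! : cfg.delimiter;
  let raw = "";

  while (pos < row.length) {
    if (row.startsWith(cfg.escape, pos)) {
      // Escaped character: keep the character after the escape string, drop the escape itself.
      raw += row.charAt(pos + cfg.escape.length);
      pos += cfg.escape.length + 1;
    } else if (row.startsWith(terminator, pos)) {
      pos += terminator.length;
      // For an enclosed field, also consume the delimiter that follows the closing enclose string.
      if (enclosed && row.startsWith(cfg.delimiter, pos)) pos += cfg.delimiter.length;
      return { value: raw, next: pos };
    } else {
      raw += row.charAt(pos);
      pos += 1;
    }
  }
  return { value: raw, next: pos }; // last field on the row
}

// Example: split one data row into field values.
const cfg: CsvParseConfig = { delimiter: ",", enclose: '"', escape: "\\" };
const row = '"Acme, Inc.",widget,42';
const fields: string[] = [];
let i = 0;
while (i < row.length) {
  const { value, next } = readField(row, i, cfg);
  fields.push(value);
  i = next;
}
console.log(fields); // [ 'Acme, Inc.', 'widget', '42' ]
```

Each parsed value would then be attached to its objectType/fieldName (from the column mapping) and accumulated into the row's objectTypes array before being written to PendingObject and PendingLink.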