Service - Import Data
Revision as of 14:42, 7 March 2023
Overview
Orchestrates the import of objects/data into the project.
Repository
https://bitbucket.org/izara-core-import-export-data/izara-core-import-data-import-data/src/master/
DynamoDB tables
Standard Config Table Per Service
Configuration tags
{
configKey: "objectType",
configTag: "xx", // {objectType}, eg: sellOffer|Product|VariantProduct etc.
configValue: {
createObjectServiceName: "xx", // {name of the service that handles this objectType}
parentLinks: {
{parent objectType}: {
{linkTag}: {
"linkType": "yy", // Dependent|Independent
"separateDependentLinkCreate": true, // for Dependent only, default is false
"createLinkServiceNames": "yy", // if not set, no request is sent to an external service and the link is set as complete
}
}
},
childObjectTypes: [] // maybe not needed?
}
},
- separateDependentLinkCreate: if set to true, a Dependent link sends a separate request to the external service after both the parent and child objects are created. The default is false, in which case the link is created in the same request that creates the child object.
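The link settings above can be read as a small decision table. This is a minimal sketch, assuming the configValue shape shown above is stored on the child objectType and keyed by parent objectType. The function names `getLinkConfig` and `planLink` and the returned plan strings are hypothetical, and treating a default Dependent link as handled by the child's createObject request (before checking createLinkServiceNames) is an interpretation of the notes above, not a confirmed rule.

```javascript
// Hypothetical helper: look up one link's settings in a child objectType's configValue.
function getLinkConfig(childConfigValue, parentObjectType, linkTag) {
  const byParent = childConfigValue.parentLinks?.[parentObjectType];
  if (!byParent || !byParent[linkTag]) {
    throw new Error(`No link config for ${parentObjectType}/${linkTag}`);
  }
  return byParent[linkTag];
}

// Hypothetical helper: decide how a link gets created (plan names are illustrative only).
function planLink(linkConfig) {
  const dependent = linkConfig.linkType === "Dependent";
  const separate = dependent && linkConfig.separateDependentLinkCreate === true;
  if (dependent && !separate) {
    // Default for Dependent links: created in the same request as the child object.
    return "withChildCreate";
  }
  if (!linkConfig.createLinkServiceNames) {
    // No external service configured: set the link as complete without sending a request.
    return "markComplete";
  }
  // Independent links, or Dependent links with separateDependentLinkCreate: sent after both objects exist.
  return "separateRequestAfterBothCreated";
}
```

For example, a Dependent link with no extra settings plans to `"withChildCreate"`, while an Independent link with no createLinkServiceNames plans to `"markComplete"`.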
ImportBatchMain
{
importBatchId: "xx", // random uuid
userId: "xx", // submitted by userId
startTime: currentTime.getTime(),
importType: "xx", // "csv"|"xml"|...
csvImportConfigId: "yy", // only set when importType is "csv"
importBatchStatus: "xx", // "* NOT YET:processingRawRecords" | "processingObjects" | "error" | "complete"
}
- partition key: importBatchId
- sort key: {none}
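A minimal sketch of building an ImportBatchMain item with the fields listed above. The `newImportBatch` name is hypothetical, and starting at "processingObjects" is an assumption based on "processingRawRecords" being marked NOT YET. `crypto.randomUUID` needs Node.js 14.17 or later.

```javascript
const crypto = require("crypto");

// Hypothetical factory for an ImportBatchMain item; field names follow the table above.
function newImportBatch(userId, importType, csvImportConfigId) {
  const item = {
    importBatchId: crypto.randomUUID(), // random uuid
    userId, // submitted by userId
    startTime: Date.now(), // same value as new Date().getTime()
    importType,
    importBatchStatus: "processingObjects", // assumed initial status
  };
  if (importType === "csv") {
    // csvImportConfigId is only meaningful for CSV imports.
    if (!csvImportConfigId) throw new Error("csvImportConfigId is required when importType is csv");
    item.csvImportConfigId = csvImportConfigId;
  }
  return item;
}
```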
PendingObjectMain
One item per object that needs to be created.
{
importBatchId: "xx",
pendingObjectId: "xx", // hash of object with importBatchId, userId, objectType, and fields properties
objectType: "xx", // eg variant|product|sellOffer|sellOfferPrice|sellOfferPlan|...
fields: {}, // key is the name of the field
rawRecordId: "xx",
objectMainStatus: "xx", // processing|creating|complete
errorsFound: {},
}
- partition key: importBatchId
- sort key: pendingObjectId
- a feed might send the same objectType and fields more than once in a single request; because pendingObjectId is a hash of these properties, the above design will only handle such an object once. I think this is OK, but maybe check for duplicates and add an error if any are found
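The pendingObjectId note and the duplicate check can be sketched together. This assumes SHA-256 over a key-sorted serialization of importBatchId, userId, objectType and fields; the page does not name a hash algorithm, and the function names are hypothetical. Sorting keys matters because a feed may list the same fields in a different order.

```javascript
const crypto = require("crypto");

// Serialize with sorted object keys so {a, b} and {b, a} produce the same string.
function stableStringify(value) {
  if (Array.isArray(value)) return `[${value.map(stableStringify).join(",")}]`;
  if (value !== null && typeof value === "object") {
    const keys = Object.keys(value).sort();
    return `{${keys.map((k) => `${JSON.stringify(k)}:${stableStringify(value[k])}`).join(",")}}`;
  }
  return JSON.stringify(value);
}

// Assumed: SHA-256 (hex) over the four properties named in the table.
function pendingObjectId(importBatchId, userId, objectType, fields) {
  const payload = stableStringify({ importBatchId, userId, objectType, fields });
  return crypto.createHash("sha256").update(payload).digest("hex");
}

// Group submitted objects by id; any group with more than one index is a duplicate in the batch.
function findDuplicates(importBatchId, userId, objects) {
  const seen = new Map();
  for (const [index, { objectType, fields }] of objects.entries()) {
    const id = pendingObjectId(importBatchId, userId, objectType, fields);
    if (!seen.has(id)) seen.set(id, []);
    seen.get(id).push(index);
  }
  return [...seen.entries()]
    .filter(([, indexes]) => indexes.length > 1)
    .map(([id, indexes]) => ({ pendingObjectId: id, indexes }));
}
```

Each entry returned by `findDuplicates` could be recorded in errorsFound rather than silently collapsed.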
PendingObjectReference
Maps the feed-supplied referenceId to the saved pending object, so the object can be found when other objects reference it.
{
importBatchId: "xx",
referenceId: "xx", // objectType_{feed supplied referenceId}
pendingObjectId: "xx",
}
- partition key: importBatchId
- sort key: referenceId
- when creating, maybe throw an error if an item already exists with a different pendingObjectId
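One way to enforce the "different pendingObjectId" check is a DynamoDB conditional write. This sketch only builds the parameters, in the shape accepted by the AWS SDK for JavaScript v3 DocumentClient `PutCommand`. The helper name and table name are hypothetical. If the condition fails, DynamoDB rejects the write with a `ConditionalCheckFailedException`, which the caller could record as an error; re-writing the same mapping succeeds.

```javascript
// Builds PutCommand-style parameters for a PendingObjectReference item.
// The condition rejects the write if referenceId already maps to a different pendingObjectId.
function pendingObjectReferencePut(tableName, importBatchId, objectType, feedReferenceId, pendingObjectId) {
  return {
    TableName: tableName, // hypothetical; supplied by the caller
    Item: {
      importBatchId,
      referenceId: `${objectType}_${feedReferenceId}`, // objectType_{feed supplied referenceId}
      pendingObjectId,
    },
    ConditionExpression: "attribute_not_exists(referenceId) OR pendingObjectId = :pendingObjectId",
    ExpressionAttributeValues: { ":pendingObjectId": pendingObjectId },
  };
}
```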
PendingLink
One item per link between objects.
{
importBatchId: "xx",
pendingLinkId: "xx", // {pendingObjectId}_{referenceId}
linkTag: "xx",
linkStatus: "xx", // processing|creating|complete
}
- partition key: importBatchId
- sort key: pendingLinkId
- currently intended for objects that can be created independently, after which the link is made, but it could perhaps also be used where one object must be created before another can be.
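For the independent case, a link can only be sent once both ends exist. A minimal sketch using in-memory Maps as hypothetical stand-ins for PendingObjectReference (referenceId to pendingObjectId) and PendingObjectMain (pendingObjectId to objectMainStatus). The function names are hypothetical; the pendingLinkId format follows the table above.

```javascript
// pendingLinkId format from the table: {pendingObjectId}_{referenceId}
function makePendingLinkId(pendingObjectId, referenceId) {
  return `${pendingObjectId}_${referenceId}`;
}

// references: Map of referenceId -> pendingObjectId
// objectStatus: Map of pendingObjectId -> objectMainStatus
function linkReadyToCreate(pendingObjectId, referenceId, references, objectStatus) {
  const otherId = references.get(referenceId);
  // An unresolved reference means the other object is not yet known in this batch.
  if (otherId === undefined) return false;
  return objectStatus.get(pendingObjectId) === "complete" && objectStatus.get(otherId) === "complete";
}
```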
CsvImportConfig
{
csvImportConfigId: "xx", // random uuid
userId: "xx",
recordDeliminator: "\n",
fieldDeliminator: ",",
enclose: {
openEnclose: "\"",
closeEnclose: "\"",
fieldNames: [], // or fieldIndexes: [],
},
fieldnames:{
fixed: { // used to fix which columns are which fieldnames, eg when no title row exists
columnNumber: "{fieldname}",
}
// or
titleRow: # // which row has the fieldnames
},
ignoreRows: [], // row numbers to skip
replaceFieldnames: {
fromFieldname: "{toFieldname}", // rename the feed's fieldnames to our fieldnames
}
}
- partition key: csvImportConfigId
- sort key: {none}
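A minimal sketch of how a CsvImportConfig-like object could drive parsing into `{ recordNumber, fields }` records. The function names are hypothetical. Several choices are assumptions, not rules from this page:

- row numbers are 0-based;
- enclose applies to any field that starts with openEnclose, and enclose.fieldNames is not consulted;
- inside an enclosed field, a doubled closeEnclose is read as one literal character, following the common RFC 4180 convention.

```javascript
// Split text into rows of fields, honouring the enclose characters.
function splitRows(text, recordDelim, fieldDelim, open, close) {
  const rows = [];
  let row = [];
  let field = "";
  let inEnclose = false;
  let i = 0;
  while (i < text.length) {
    if (inEnclose) {
      if (text.startsWith(close + close, i)) {
        field += close; // doubled close enclose = literal character
        i += close.length * 2;
      } else if (text.startsWith(close, i)) {
        inEnclose = false;
        i += close.length;
      } else {
        field += text[i];
        i += 1;
      }
    } else if (field === "" && text.startsWith(open, i)) {
      inEnclose = true;
      i += open.length;
    } else if (text.startsWith(fieldDelim, i)) {
      row.push(field);
      field = "";
      i += fieldDelim.length;
    } else if (text.startsWith(recordDelim, i)) {
      row.push(field);
      rows.push(row);
      row = [];
      field = "";
      i += recordDelim.length;
    } else {
      field += text[i];
      i += 1;
    }
  }
  if (field !== "" || row.length > 0) {
    row.push(field); // last record had no trailing record delimiter
    rows.push(row);
  }
  return rows;
}

// Turn CSV text into records using titleRow or fixed fieldnames, ignoreRows and replaceFieldnames.
function parseCsv(text, config) {
  const open = config.enclose?.openEnclose ?? '"';
  const close = config.enclose?.closeEnclose ?? '"';
  const rows = splitRows(text, config.recordDeliminator, config.fieldDeliminator, open, close);
  const ignore = new Set(config.ignoreRows ?? []);
  const rename = config.replaceFieldnames ?? {};
  const titleRow = config.fieldnames.titleRow;

  let names = [];
  if (titleRow !== undefined) {
    names = rows[titleRow] ?? [];
  } else {
    for (const [column, name] of Object.entries(config.fieldnames.fixed)) names[Number(column)] = name;
  }
  names = names.map((name) => (Object.prototype.hasOwnProperty.call(rename, name) ? rename[name] : name));

  const records = [];
  rows.forEach((row, recordNumber) => {
    if (recordNumber === titleRow || ignore.has(recordNumber)) return;
    const fields = {};
    row.forEach((value, column) => {
      if (names[column] !== undefined) fields[names[column]] = value; // skip unnamed columns
    });
    records.push({ recordNumber, fields });
  });
  return records;
}
```

With `recordDeliminator: "\n"`, `fieldDeliminator: ","`, `"\""` as both enclose characters, `titleRow: 0` and `replaceFieldnames: { sku: "productCode" }`, the text `sku,name` / `A1,"Red, large"` yields one record with `productCode: "A1"` and `name: "Red, large"`.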
UsersCsvImportConfig
{
userId: "xx", // user who owns the csvImportConfig
csvImportConfigId: "xx",
}
- partition key: userId
- sort key: csvImportConfigId
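Because userId is the partition key and csvImportConfigId the sort key, a single Query returns every config a user owns. A sketch of the request parameters, in the shape accepted by the AWS SDK for JavaScript v3 DocumentClient `QueryCommand`. The helper name is hypothetical and the table name is supplied by the caller.

```javascript
// Builds QueryCommand-style parameters listing every csvImportConfigId owned by one user.
function usersCsvImportConfigQuery(tableName, userId) {
  return {
    TableName: tableName, // hypothetical; supplied by the caller
    KeyConditionExpression: "userId = :userId",
    ExpressionAttributeValues: { ":userId": userId },
    ProjectionExpression: "csvImportConfigId", // only the sort key is needed
  };
}
```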
(unused) ImportBatchErrors
{
importBatchId: "xx",
errorId: "xx", // random uuid
error: "xx",
}
- partition key: importBatchId
- sort key: errorId
(unused) RawRecord
- NOT YET: maybe move into CSV processing, as each format might have its own way of handling lines/records
A raw copy of one submitted record, split into fields. One record may contain multiple objects in its fields.
{
importBatchId: "xx",
rawRecordId: "xx", // random uuid
fields: {}, // key is the name of the field
recordNumber: ##, // eg the line number of the record
rawRecordStatus: "xx",
errorsFound: {},
}
- partition key: importBatchId
- sort key: rawRecordId
(unused) RawRecordAwaitingProcess
- NOT YET: maybe move into CSV processing, as each format might have its own way of handling lines/records
A list of raw records waiting to be saved into PendingObjectMain, so they can be handled asynchronously and the next step of the process triggered when all are complete.
{
importBatchId: "xx",
rawRecordId: "xx",
}
- partition key: importBatchId
- sort key: rawRecordId
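The "trigger the next step when all are complete" idea reduces to removing each processed id and checking whether any remain. A minimal in-memory sketch; the class and method names are hypothetical. With concurrent Lambdas against DynamoDB, the emptiness check would need care, for example querying the partition after the delete.

```javascript
// Hypothetical in-memory stand-in for RawRecordAwaitingProcess: one Set of rawRecordIds per importBatchId.
class AwaitingRawRecords {
  constructor() {
    this.byBatch = new Map();
  }

  add(importBatchId, rawRecordId) {
    if (!this.byBatch.has(importBatchId)) this.byBatch.set(importBatchId, new Set());
    this.byBatch.get(importBatchId).add(rawRecordId);
  }

  // Remove a processed record; returns true only when it was the last one, i.e. trigger the next step.
  markProcessed(importBatchId, rawRecordId) {
    const waiting = this.byBatch.get(importBatchId);
    if (!waiting || !waiting.delete(rawRecordId)) return false; // unknown or already processed
    if (waiting.size > 0) return false;
    this.byBatch.delete(importBatchId);
    return true;
  }
}
```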
External service requests
Applies to both createObject and createLink.
- a standard Lambda, like ProcessLogical and FindData
- ImportData subscribes to the standard complete topic: CreateObjectComplete
- each external service can handle processing, and recording the pendingObjectId to return, in its own way, e.g. by having another table that links the external service's identifier to the ImportData pendingObjectId and querying this table to build the CreateObjectComplete message
- userId is also sent, as the user creating the objects needs to be recorded
- pendingObjectId is also sent and must be returned in the CreateObjectComplete message
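A minimal sketch of the round trip. Only userId and pendingObjectId are fixed by this page; sending objectType and fields is an assumption, the function names are hypothetical, and a Map stands in for PendingObjectMain.

```javascript
// Hypothetical createObject request; userId and pendingObjectId are the documented fields.
function buildCreateObjectRequest(pendingObject, userId) {
  return {
    userId, // the user creating the objects
    pendingObjectId: pendingObject.pendingObjectId, // must come back in CreateObjectComplete
    objectType: pendingObject.objectType, // assumed
    fields: pendingObject.fields, // assumed
  };
}

// Handle a CreateObjectComplete message by marking the matching pending object complete.
function handleCreateObjectComplete(message, pendingObjects) {
  const pending = pendingObjects.get(message.pendingObjectId);
  if (!pending) throw new Error(`Unknown pendingObjectId ${message.pendingObjectId}`);
  pending.objectMainStatus = "complete";
  return pending;
}
```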
Linking objects
Two types of linking:
- Independent: Objects can be created independently of each other in any order
- Dependent: One object must be created before the other in a specific order
For the Config setting objectType.parentLinks, either object can be the parent of an Independent link; for a Dependent link, the object created first is considered the parent.
Two object types may be connected by multiple types of link, so each parentLinks element has a list of linkTags identifying which type of link is being created.
For each parentLinks element there will be a matching entry in the other object's childObjectTypes array.
One object might have multiple other objects depending on its creation, or might itself depend on many other objects being created.
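Because Dependent links fix a creation order, the objects in a batch form a dependency graph. One way to schedule creation is a level-by-level topological sort (Kahn's algorithm). This is an illustration, not necessarily how ImportData does it. Each wave holds objects whose Dependent parents were all created in earlier waves, and a cycle is reported as an error. The `creationWaves` name and the `{ parentId, childId }` link shape are hypothetical. Independent links are left out because they impose no order.

```javascript
// Group pendingObjectIds into waves: every object depends only on objects in earlier waves.
function creationWaves(objectIds, dependentLinks) {
  const waitingOn = new Map(objectIds.map((id) => [id, 0])); // count of uncreated parents
  const children = new Map(objectIds.map((id) => [id, []]));
  for (const { parentId, childId } of dependentLinks) {
    waitingOn.set(childId, waitingOn.get(childId) + 1);
    children.get(parentId).push(childId);
  }
  let wave = objectIds.filter((id) => waitingOn.get(id) === 0);
  const waves = [];
  let placed = 0;
  while (wave.length > 0) {
    waves.push(wave);
    placed += wave.length;
    const next = [];
    for (const id of wave) {
      for (const child of children.get(id)) {
        const remaining = waitingOn.get(child) - 1;
        waitingOn.set(child, remaining);
        if (remaining === 0) next.push(child); // all parents now created
      }
    }
    wave = next;
  }
  if (placed !== objectIds.length) throw new Error("Dependent links contain a cycle");
  return waves;
}
```

For example, with links A→B, A→C, B→D and C→D, the waves are `[["A"], ["B", "C"], ["D"]]`, so B and C can be created in parallel.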
Object hierarchy and field schema
- Some fields will be required, some optional
- some fields may have system defaults
- perhaps users can set up default templates (do later if it adds value)
- the schema will need to state the identifier fields for each object: if they are set in the feed, Import Data knows the record points to an existing child/parent object; if they are empty, a new object needs to be created
- perhaps each objectType states its child objects, as it is more likely to be aware of these than of its parent objects
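A minimal sketch of applying such a schema to one object's fields. The schema shape (`identifierFields`, `requiredFields`, `defaults`) and the `resolveObject` name are hypothetical, since the schema format is not yet defined. Empty feed values are treated as not set.

```javascript
// Hypothetical schema shape for one objectType:
// { identifierFields: [...], requiredFields: [...], defaults: { field: value } }
function resolveObject(fields, schema) {
  const isSet = (value) => value !== undefined && value !== "";
  if (schema.identifierFields.some((name) => isSet(fields[name]))) {
    // Identifier supplied in the feed: the record points at an existing object.
    return { action: "useExisting", fields, errors: [] };
  }
  // No identifier: a new object is needed, so start from system defaults and overlay feed values.
  const merged = { ...schema.defaults };
  for (const [name, value] of Object.entries(fields)) {
    if (isSet(value)) merged[name] = value;
  }
  const errors = schema.requiredFields
    .filter((name) => !isSet(merged[name]))
    .map((name) => `missing required field: ${name}`);
  return { action: "create", fields: merged, errors };
}
```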
Where to store/set the schema
One option under consideration: the external service delivers the schema to ImportData during Initial Setup, as seed data injected directly into the Import Data Config DynamoDB table.