Service - Import Data
Revision as of 09:11, 23 August 2023
Overview
Orchestrates the import of objects/data into a project.
Repository
https://bitbucket.org/izara-core-import-export-data/izara-core-import-data-import-data/src/master/
DynamoDB tables
Standard Config Table Per Service
Configuration tags
{
  configKey: "objectType",
  configTag: "xx", // {objectType, eg: sellOffer/Product/VariantProduct etc..}
  configValue: {
    createObjectServiceName: "xx", // {name of the service that handles this objectType}
    parentLinks: {
      {parent objectType}: {
        {linkTag}: {
          "linkType": "yy", // Dependent|Independent
          "separateDependentLinkCreate": true, // for Dependent only, default is false
          "createLinkServiceNames": "yy", // if not set, the link is not sent to an external service and is set as complete
        }
      }
    },
    childObjectTypes: [], // maybe not needed?
    fieldNames: { // list of possible fieldNames, fields found in the file that are not listed here are ignored
      {fieldName}: {
        // no settings yet
      }
    }
  }
},
- separateDependentLinkCreate: if set to true, a Dependent link is created by a separate request to the external service, sent after the parent and child objects have been created. The default is false, where the link is created in the same request that creates the child object.
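The lookup implied by parentLinks can be sketched as follows. This is only an illustration: resolveLinkHandling, the example service name and the linkTags are hypothetical, and the returned shape is an assumption rather than part of the service.

```javascript
// Hypothetical helper: given the configValue of the child objectType, work out
// how a link to parentObjectType with the given linkTag should be handled.
function resolveLinkHandling(configValue, parentObjectType, linkTag) {
  const parentLinks = configValue.parentLinks || {};
  const linkConfig = (parentLinks[parentObjectType] || {})[linkTag];
  if (!linkConfig) {
    return { valid: false }; // no such link is configured
  }
  const isDependent = linkConfig.linkType === "Dependent";
  return {
    valid: true,
    linkType: linkConfig.linkType,
    // Dependent links default to being created in the child's create request.
    createWithChildObject: isDependent && linkConfig.separateDependentLinkCreate !== true,
    // Without createLinkServiceNames nothing is sent; the link is set as complete.
    sendsToExternalService: Boolean(linkConfig.createLinkServiceNames),
  };
}

// Example config values are made up for illustration.
const sellOfferConfig = {
  createObjectServiceName: "ExampleSellOfferService",
  parentLinks: {
    product: {
      hasSellOffer: {
        linkType: "Dependent",
        separateDependentLinkCreate: true,
        createLinkServiceNames: "ExampleSellOfferService",
      },
      relatedTo: { linkType: "Independent" },
    },
  },
};

console.log(resolveLinkHandling(sellOfferConfig, "product", "hasSellOffer"));
```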
ImportBatchMain
{
  importBatchId: "xx", // random uuid
  userId: "xx", // target userId
  submittedByUserId: "xx", // submitted by userId
  startTime: currentTime.getTime(),
  importType: "xx", // "csv"|"xml"|...
  importConfigId: "yy", // dependent on importType = "csv"
  importBatchStatus: "xx", // "* NOT YET:processingRawRecords" | "processingObjects" | "error" | "complete"
}
- partition key: importBatchId
- sort key: {none}
PendingObjectMain
One item per object that needs to be created.
{
  importBatchId: "xx",
  pendingObjectId: "xx", // hash of object with importBatchId, userId, objectType, and fields properties
  objectType: "xx", // eg variant|product|sellOffer|sellOfferPrice|sellOfferPlan|...
  fields: {}, // key is the name of the field
  identifierIds: { // used to identify an existing object, fields is ignored if this is found
    {name of identifier property}: {value}
  },
  rowNumber: "xx", // the order in which this record was extracted from source file
  objectMainStatus: "xx", // processing|creating|complete|error
  errorsFound: {},
}
- partition key: importBatchId
- sort key: pendingObjectId
- A feed may send the same objectType and fields multiple times in a single request. Because pendingObjectId is a hash of those values, this design stores and processes such an object only once. This is probably acceptable; consider checking for duplicates and adding an error when they are found.
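A minimal sketch of deriving pendingObjectId, assuming SHA-256 and a key-sorted JSON serialisation (this page does not specify either); stableStringify and createPendingObjectId are hypothetical names. It also shows why a repeated object collapses to a single item.

```javascript
const crypto = require("crypto");

// Serialise with object keys sorted, so key order does not change the hash.
function stableStringify(value) {
  if (Array.isArray(value)) {
    return "[" + value.map(stableStringify).join(",") + "]";
  }
  if (value !== null && typeof value === "object") {
    const keys = Object.keys(value).sort();
    const parts = keys.map((k) => JSON.stringify(k) + ":" + stableStringify(value[k]));
    return "{" + parts.join(",") + "}";
  }
  return JSON.stringify(value);
}

// Assumption: the hash covers exactly the four properties named in the table comment.
function createPendingObjectId({ importBatchId, userId, objectType, fields }) {
  const payload = stableStringify({ importBatchId, userId, objectType, fields });
  return crypto.createHash("sha256").update(payload).digest("hex");
}

const first = createPendingObjectId({
  importBatchId: "batch-1",
  userId: "user-1",
  objectType: "product",
  fields: { name: "Shirt", colour: "blue" },
});
const repeated = createPendingObjectId({
  importBatchId: "batch-1",
  userId: "user-1",
  objectType: "product",
  fields: { colour: "blue", name: "Shirt" }, // same fields, different key order
});
console.log(first === repeated); // the duplicate maps onto the same sort key
```

A duplicate check could compare a newly computed id against the ids already written for the batch and add an entry to errorsFound instead of silently overwriting.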
PendingObjectReference
Links a submitted referenceId to the saved pendingObject, so the object can be found when other objects reference it.
{
  importBatchId: "xx",
  referenceId: "xx", // objectType_{feed supplied referenceId}
  pendingObjectId: "xx",
}
- partition key: importBatchId
- sort key: referenceId
- When creating, consider throwing an error if an item already exists with a different pendingObjectId
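The conflict check suggested above could look roughly like this. An in-memory Map stands in for the PendingObjectReference table; buildReferenceId and registerReference are hypothetical names.

```javascript
// referenceId format from the table comment: objectType_{feed supplied referenceId}
function buildReferenceId(objectType, feedReferenceId) {
  return `${objectType}_${feedReferenceId}`;
}

// `table` is a Map used here in place of DynamoDB, keyed by partition and sort key.
function registerReference(table, importBatchId, objectType, feedReferenceId, pendingObjectId) {
  const referenceId = buildReferenceId(objectType, feedReferenceId);
  const key = `${importBatchId}|${referenceId}`;
  const existing = table.get(key);
  if (existing && existing.pendingObjectId !== pendingObjectId) {
    throw new Error(`referenceId ${referenceId} already points to a different pendingObjectId`);
  }
  const item = { importBatchId, referenceId, pendingObjectId };
  table.set(key, item);
  return item;
}

const references = new Map();
registerReference(references, "batch-1", "product", "shirt-01", "pending-a");
registerReference(references, "batch-1", "product", "shirt-01", "pending-a"); // same target, accepted
```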
PendingLink
One item per link between objects.
{
  importBatchId: "xx",
  pendingLinkId: "xx", // {pendingObjectId}_{referenceId}_{linkTag}
  linkStatus: "xx", // processing|creating|complete|error
  errorsFound: {}, // previously not added, considering adding so links can store their errors independent of pendingObjects
}
- partition key: importBatchId
- sort key: pendingLinkId
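A small sketch of building a PendingLink item and recording an error on it independently of any pendingObject. The function names are hypothetical, and starting a link in the "processing" status is an assumption.

```javascript
// pendingLinkId format from the table comment: {pendingObjectId}_{referenceId}_{linkTag}
function createPendingLink(importBatchId, pendingObjectId, referenceId, linkTag) {
  return {
    importBatchId,
    pendingLinkId: `${pendingObjectId}_${referenceId}_${linkTag}`,
    linkStatus: "processing", // assumed initial status
    errorsFound: {},
  };
}

// Returns a copy of the link marked as failed, keeping any earlier errors.
function recordLinkError(pendingLink, errorKey, message) {
  return {
    ...pendingLink,
    linkStatus: "error",
    errorsFound: { ...pendingLink.errorsFound, [errorKey]: message },
  };
}

const link = createPendingLink("batch-1", "pending-a", "product_shirt-01", "hasSellOffer");
const failedLink = recordLinkError(link, "referenceNotFound", "no pendingObject for product_shirt-01");
```

Since referenceId already contains an underscore, a pendingLinkId cannot be reliably split back into its parts; code that needs the parts would have to carry them separately.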
CsvImportConfig
{
  csvImportConfigId: "xx", // random uuid
  userId: "xx", // user who controls/created/owns the config
  recordDeliminator: "\n",
  fieldDeliminator: ",",
  escapeString: "\\",
  removeFloatingEscapeString: true, // default: false, removes single escape strings that do not precede expected escapedStrings
  removeWhiteSpace: false, // default: true, removes any spaces/tabs/newlines at start or end of fields
  enclose: [
    {
      openEnclose: "\"",
      closeEnclose: "\"",
      alwaysEnclose: "always", // "always"|"optional", default always // NOT SURE NEEDED, maybe always check if exists or not
      fieldNames: [],
    }
  ],
  fieldNames: {
    fixed: { // used to fix which columns are which fieldNames, eg when no title row exists
      columnNumber: "{fieldName}",
      // or
      columnNumber: {
        fieldname: 'name',
        objectType: 'productattribute',
        instance: 'xx'
      }
    },
    // or
    titleRow: #, // which row has the fieldNames
    titleRowOpenEnclose: "\"",
    titleRowCloseEnclose: "\"",
    replacefieldNames: {
      {fromFieldName}: "{toFieldName}", // rename the feed's existing fieldNames to our names
    },
  },
  ignoreRows: [], // row numbers to skip
  overwriteFields: {
    "fieldName": "{replaceToValue}", // eg: if feed is missing a field we can hardcode it here
  },
  objectType: {
    // for fields that do not match any subObjects:
    // then for each row look for setObjectTypeFieldNames, if any match then use the first found to be the mainObjectType for this row
    // if no setObjectTypeFieldNames match, use mainObjectType setting
    // if no mainObjectType is set, the field is invalid
    "setObjectTypeFieldNames": { // if a value in a specific field sets the row's objectType
      {fieldName}: {
        {fieldValue}: {
          serviceTag: "xx",
          objectType: "yy"
        } // index is the value found in the field, matches to the specified objectType
      },
      // .. can look in multiple fields to find the matching objectType, will use the first one found
    },
    mainObjType: { // this is a catchall objectType for any fieldNames that do not match setObjectTypeFieldNames filters
      serviceTag: "xxx",
      objectType: "xxx",
    },
    subObjects: [
      {
        objType: {
          serviceTag: "xxx",
          objectType: "xxx",
        },
        searchPattern: "xxx", // regexp search of the column name, if it matches then the column belongs to this objectType
        instancePattern: "after productattribute and before colon", // extract from the column name the instance identifier for this object, eg if one row creates multiple product attributes. Optional, if not set use empty string as instance
        fieldNamePatterns: [ // optional, if not set will check fieldNameSearchPattern, or if none found/set will be a null column
          {
            fieldNamePattern: "yyy", // extract from the column name the fieldname for this object
            fieldName: "zzzz",
          },
          // ....
        ],
        // check fieldNamePatterns first, if none match, check fieldNameSearchPattern to extract the fieldname
        fieldNameSearchPattern: "after colon", // regExp that pulls out the fieldname, optional
        referenceFieldName: "xx", // fieldName that sets the string referenceId for the found pendingObject
        referenceLinks: {
          // can reference objects from other rows or within same row
          {fieldName}: "xx", // {fieldName} value in feed points to another object's reference, config value is the linkTag
          //..
        },
        automaticLinks: [
          { // automatically create links between objects created on the same record
            objType: { // which objectType to link to
              serviceTag: "xxx",
              objectType: "xxx",
            },
            instance: "tt", // which instance identifier to link to
            linkTag: "zz",
          },
          {
            mainObject: true,
            linkTag: "xx",
          }
          // ..
        ],
        actionField: {
          fieldName: "xx",
          createValue: "c",
          updateValue: "u",
          referenceValue: "r"
        },
        enclose: [
          {
            openEnclose: "\"",
            closeEnclose: "\"",
            alwaysEnclose: "always", // "always"|"optional", default always // NOT SURE NEEDED, maybe always check if exists or not
            fieldNames: [],
          }
        ],
      },
      // ...
    ],
    referenceColumnNames: ["xx","yy"], // columnNames that set the string referenceId for each mainObjectType pendingObject, array in case multiple fields might set the reference; if multiple are set, which one is used is undefined
    referenceLinks: {
      // reference columnName for the mainObjectType for each row
      {columnName}: "xx", // value is the linkTag, one reference column will match to one linkTag
      //..
    },
    actionColumn: {
      columnName: "xx",
      createValue: "c",
      updateValue: "u",
      referenceValue: "r"
    }
  },
  // not sure still useful? maybe old idea
  linkFields: {
    "fromFieldName": { // if a value in specific fields sets the reference for any links
      {fieldName}: {
        {fieldValue}: {reference} // index is the value found in the field, matches to the specified objectType
      }
    },
  },
}
- partition key: csvImportConfigId
- sort key: {none}
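How a column name might be resolved against subObjects can be sketched as below. This is one reading of the config, not the service's implementation: the column naming scheme ("productattribute1:colour"), the use of capture group 1 for extraction, the serviceTag value and the name matchColumn are all assumptions.

```javascript
// Hypothetical subObjects entry for columns such as "productattribute1:colour".
const exampleSubObject = {
  objType: { serviceTag: "ExampleProductService", objectType: "productattribute" },
  searchPattern: "^productattribute",
  instancePattern: "^productattribute([^:]*):", // text after productattribute and before the colon
  fieldNamePatterns: [{ fieldNamePattern: ":title$", fieldName: "name" }],
  fieldNameSearchPattern: ":(.+)$", // text after the colon
};

// Returns the objType, instance and fieldName for a column, or null when no
// subObject matches (the column then belongs to the row's main objectType).
function matchColumn(columnName, subObjects) {
  for (const sub of subObjects) {
    if (!new RegExp(sub.searchPattern).test(columnName)) continue;

    let instance = ""; // instancePattern is optional, default is empty string
    if (sub.instancePattern) {
      const m = columnName.match(new RegExp(sub.instancePattern));
      if (m && m[1] !== undefined) instance = m[1];
    }

    // fieldNamePatterns are checked first, then fieldNameSearchPattern.
    let fieldName = null;
    for (const p of sub.fieldNamePatterns || []) {
      if (new RegExp(p.fieldNamePattern).test(columnName)) {
        fieldName = p.fieldName;
        break;
      }
    }
    if (fieldName === null && sub.fieldNameSearchPattern) {
      const m = columnName.match(new RegExp(sub.fieldNameSearchPattern));
      if (m && m[1] !== undefined) fieldName = m[1];
    }
    return { objType: sub.objType, instance, fieldName }; // fieldName null = null column
  }
  return null;
}

console.log(matchColumn("productattribute1:colour", [exampleSubObject]));
```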
actionColumn
Optional. Controls whether this pendingObject will be created, updated, or used as a reference for links. If actionColumn is not set, the system chooses the action according to the rules below:
- Create: if no identifier fields are set, will attempt to create the object from the fields found
- Update: if all identifier fields are set and some other fields are set, will attempt to update
- Reference: if all identifier fields are set and no other fields are set, will use the object as a reference (check it exists)
- Error: if only some identifier fields are set, the pendingObject is set as error/failed
If actionColumn is set, processing fails in any of the following cases:
- Update or Reference and not all identifier fields are set
- Create and any identifier field is set
- the value does not match any of the create/update/reference values
If actionColumn is set to Reference, any fields found will be ignored.
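The rules above can be sketched as a single decision function. The name resolveAction, its argument shape and its return shape are hypothetical; treating empty strings as "not set" is also an assumption.

```javascript
function isSet(value) {
  return value !== undefined && value !== null && value !== "";
}

// identifierFieldNames: all identifier fields the schema defines for the objectType
// identifierIds / fields: values found in the row
// actionColumn: optional config; actionValue: raw value found in that column
function resolveAction({ identifierFieldNames, identifierIds, fields, actionColumn, actionValue }) {
  const setCount = identifierFieldNames.filter((name) => isSet(identifierIds[name])).length;
  const noneSet = setCount === 0;
  const allSet = identifierFieldNames.length > 0 && setCount === identifierFieldNames.length;
  const hasFields = Object.values(fields).some(isSet);

  if (!actionColumn) {
    if (noneSet) return { action: "create", ignoreFields: false };
    if (allSet) return { action: hasFields ? "update" : "reference", ignoreFields: false };
    return { error: "only some identifier fields are set" };
  }

  const actions = new Map([
    [actionColumn.createValue, "create"],
    [actionColumn.updateValue, "update"],
    [actionColumn.referenceValue, "reference"],
  ]);
  const action = actions.get(actionValue);
  if (!action) return { error: "action value does not match create/update/reference" };
  if (action === "create" && !noneSet) return { error: "create with identifier fields set" };
  if (action !== "create" && !allSet) return { error: `${action} without all identifier fields set` };
  // An explicit Reference ignores any fields found in the row.
  return { action, ignoreFields: action === "reference" };
}

const exampleActionColumn = { columnName: "action", createValue: "c", updateValue: "u", referenceValue: "r" };
console.log(resolveAction({
  identifierFieldNames: ["productId"],
  identifierIds: { productId: "p1" },
  fields: { name: "Shirt" },
  actionColumn: exampleActionColumn,
  actionValue: "r",
}));
```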
UsersCsvImportConfig
{
  userId: "xx", // user who owns the csvImportConfig
  csvImportConfigId: "xx",
}
- partition key: userId
- sort key: csvImportConfigId
External service requests
Applies to both createObject and createLink requests.
- Handled by a standard Lambda, like ProcessLogical and FindData
- ImportData subscribes to the standard complete topic: CreateObjectComplete
- Each external service can handle processing and recording the pendingObjectId to return in its own way, eg by keeping another table that links the external service's identifier with the ImportData pendingObjectId and querying this table to build the CreateObjectComplete message
- userId is also sent, as the user creating the objects needs to be recorded
- pendingObjectId is also sent and must be returned in the CreateObjectComplete message
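One way an external service might keep the pendingObjectId and hand it back is sketched below. Everything here is an assumption for illustration: the class name, the in-memory Map standing in for the service's own table, and every message property other than pendingObjectId and userId.

```javascript
// In-memory stand-in for an external service's table that links its own
// object identifier to the ImportData pendingObjectId.
class PendingObjectLookup {
  constructor() {
    this.byExternalId = new Map();
  }

  // Called when the service accepts a createObject request.
  record(externalObjectId, request) {
    this.byExternalId.set(externalObjectId, {
      pendingObjectId: request.pendingObjectId,
      userId: request.userId,
    });
  }

  // Called when creation finishes; builds a CreateObjectComplete-style message.
  buildCompleteMessage(externalObjectId, errorsFound = {}) {
    const entry = this.byExternalId.get(externalObjectId);
    if (!entry) {
      throw new Error(`no pendingObjectId recorded for ${externalObjectId}`);
    }
    return {
      pendingObjectId: entry.pendingObjectId, // must be returned to ImportData
      objectId: externalObjectId,             // hypothetical property
      status: Object.keys(errorsFound).length === 0 ? "complete" : "error",
      errorsFound,
    };
  }
}

const lookup = new PendingObjectLookup();
lookup.record("product-123", { pendingObjectId: "pending-a", userId: "user-1", fields: { name: "Shirt" } });
const completeMessage = lookup.buildCompleteMessage("product-123");
```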
Linking objects
Two types of linking:
- Independent: Objects can be created independently of each other in any order
- Dependent: One object must be created before the other in a specific order
For the config setting objectType.parentLinks, either object can be the parent of an Independent link; for Dependent links, the object created first is considered the parent.
Two object types may have multiple types of links connecting them, so each parentLinks element has a list of linkTags that identify which type of link is being created.
For each parentLinks element there will be a matching entry in the other object's childObjectTypes array.
One object might have multiple other objects dependent on it to be created, or be dependent on many other objects being created.
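The ordering constraint for Dependent links can be sketched as a readiness check: an object may be sent for creation once every object it depends on is complete. The names findCreatableObjects, parentId and childId are hypothetical, and reading "processing" as "not yet sent for creation" is an assumption.

```javascript
// pendingObjects: items shaped like PendingObjectMain
// dependentLinks: [{ parentId, childId }] where the parent must be created first
function findCreatableObjects(pendingObjects, dependentLinks) {
  const statusById = new Map(
    pendingObjects.map((o) => [o.pendingObjectId, o.objectMainStatus])
  );
  return pendingObjects
    .filter((o) => o.objectMainStatus === "processing")
    .filter((o) =>
      dependentLinks
        .filter((link) => link.childId === o.pendingObjectId)
        .every((link) => statusById.get(link.parentId) === "complete")
    )
    .map((o) => o.pendingObjectId);
}

const examplePendingObjects = [
  { pendingObjectId: "product-1", objectMainStatus: "complete" },
  { pendingObjectId: "variant-1", objectMainStatus: "processing" },
  { pendingObjectId: "sellOffer-1", objectMainStatus: "processing" },
  { pendingObjectId: "brand-1", objectMainStatus: "processing" },
];
const exampleDependentLinks = [
  { parentId: "product-1", childId: "variant-1" },
  { parentId: "variant-1", childId: "sellOffer-1" },
];

// variant-1 is ready (its parent is complete), sellOffer-1 must wait for
// variant-1, and brand-1 has no Dependent parents at all.
console.log(findCreatableObjects(examplePendingObjects, exampleDependentLinks));
```

Independent links do not appear in this check, since their objects can be created in any order.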
Object hierarchy and field schema
- Some fields will be required, some optional
- some fields possibly have system defaults
- perhaps user can setup default templates (do later if has value)
- the schema will need to state the identifier fields for each object: if they are set in the feed, Import Data knows the row points to an existing child/parent object; if empty, it needs to create a new one
- perhaps each objectType states its child objects, as it is more likely to be aware of these than of its parent objects
where to store/set schema
Considering having the external service deliver this to ImportData in Initial Setup, as seed data injected directly into the Import Data Config DynamoDB table.