Service - Import Data

Latest revision as of 03:20, 2 January 2025

Overview

Orchestrates importing of objects/data into project.

Repository

https://bitbucket.org/izara-core-import-export-data/izara-core-import-data-import-data/src/master/

DynamoDB tables

Standard Config Table Per Service

Configuration tags

{
	configKey: "objectType",
	configTag: "xx", // {objectType, eg: sellOffer/Product/VariantProduct etc..}
	configValue: {
		createObjectServiceName: "xx", // {name of the service that handles this objectType}
		parentLinks: {
			{parent objectType}: {
				{linkTag}: {
					"linkType": "yy", // Dependent|Independent
					"separateDependentLinkCreate": true, // for Dependent only, default is false
					"createLinkServiceNames": "yy", // if not set, the link is not sent to an external service and is set as complete
				}
			}
		},
		childObjectTypes: [], // maybe not needed?
		fieldNames: { // list of possible fieldNames, will ignore fields found in file that are not listed here
			{fieldName}: {
				// no settings yet
			}
		}
	}
},
  • separateDependentLinkCreate: if set to true, a Dependent link is created with a separate request to the external service, sent after both the parent and child objects have been created. The default is false, in which case the link is created in the same request that creates the child object.
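
As a rough illustration of how these link settings might interact, the sketch below picks where a link gets created from one {linkTag} entry. The function name planLinkCreation and its return values are hypothetical, and the order in which the settings are checked is an assumption rather than something taken from the ImportData code.

```javascript
// Hypothetical helper: decides how the link described by one
// parentLinks.{parent objectType}.{linkTag} entry would be created.
function planLinkCreation(linkConfig) {
  const {
    linkType,
    separateDependentLinkCreate = false, // default is false
    createLinkServiceNames,
  } = linkConfig;

  // Assumption: a missing createLinkServiceNames is checked first, so
  // nothing is sent to an external service and the link is set as complete.
  if (!createLinkServiceNames) {
    return "markComplete";
  }
  // Dependent link with the default setting: created in the same
  // request that creates the child object.
  if (linkType === "Dependent" && !separateDependentLinkCreate) {
    return "withChildObject";
  }
  // Independent links, and Dependent links with the flag set, get their
  // own request once both objects exist.
  return "separateRequest";
}
```

For example, `planLinkCreation({ linkType: "Dependent", createLinkServiceNames: "svc" })` falls into the "same request as the child object" case.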

ImportBatchMain

{
	importBatchId: "xx", // random uuid
	userId: "xx", // target userId
	submittedByUserId: "xx", // submitted by userId
	startTime: currentTime.getTime(),
	importType: "xx", // "csv"|"xml"|...
	importConfigId: "yy", // dependent on importType = "csv"
	importBatchStatus: "xx", // "* NOT YET:processingRawRecords" | "processingObjects" | "error" | "complete" 
	errorsFound: {},
	processingError: false // true|false, if processing object/links has error set this to true
}
  • partition key: importBatchId
  • sort key: {none}
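
A minimal sketch of building a new ImportBatchMain item from the fields above. The function name newImportBatchMain and the initial status "processingObjects" are assumptions; `randomUUID` is Node's built-in UUID generator.

```javascript
const { randomUUID } = require("crypto");

// Hypothetical factory for a new ImportBatchMain item.
function newImportBatchMain({ userId, submittedByUserId, importType, importConfigId }) {
  const currentTime = new Date();
  return {
    importBatchId: randomUUID(), // random uuid
    userId, // target userId
    submittedByUserId, // submitted by userId
    startTime: currentTime.getTime(),
    importType, // "csv"|"xml"|...
    importConfigId,
    importBatchStatus: "processingObjects", // assumed initial status
    errorsFound: {},
    processingError: false,
  };
}
```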

PendingObjectMain

One item per object that needs to be created.

{
	importBatchId: "xx",
	pendingObjectId: "xx", // hash of object with importBatchId, userId, objectType, and fields properties
	objectType: "xx", // eg variant|product|sellOffer|sellOfferPrice|sellOfferPlan|...
	fields: {}, // key is the name of the field
	identifierIds: { // used to identify an existing object, fields is ignored if this is found
		{name of identifier property}: {value}
	},
	action: "xx", // create|update|reference|error
	rowNumber: "xx", // the order in which this record was extracted from source file
	objectMainStatus: "xx", // processing|creating|complete|error
	errorsFound: {},
}
  • partition key: importBatchId
  • sort key: pendingObjectId
  • A feed may send the same objectType and fields multiple times in a single request. Because pendingObjectId is a hash of those values, the design above handles such an object only once. This is probably OK; maybe check for duplicates and add an error if any are found.
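
One way the pendingObjectId hash could be derived is sketched below. The choice of SHA-256, the sorted-key serialisation, and the names stableStringify and makePendingObjectId are all assumptions; the page only states which properties go into the hash.

```javascript
const { createHash } = require("crypto");

// Serialise with sorted keys so that property order does not change the hash.
function stableStringify(value) {
  if (Array.isArray(value)) {
    return "[" + value.map(stableStringify).join(",") + "]";
  }
  if (value !== null && typeof value === "object") {
    const parts = Object.keys(value)
      .sort()
      .map((key) => JSON.stringify(key) + ":" + stableStringify(value[key]));
    return "{" + parts.join(",") + "}";
  }
  return JSON.stringify(value);
}

// Hypothetical: hash of importBatchId, userId, objectType and fields.
function makePendingObjectId({ importBatchId, userId, objectType, fields }) {
  return createHash("sha256")
    .update(stableStringify({ importBatchId, userId, objectType, fields }))
    .digest("hex");
}
```

Two records with the same objectType and fields in the same batch produce the same id under this scheme, which is why a duplicate would be stored only once.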

PendingObjectReference

Links a submitted referenceId to the saved pendingObject, so that the object can be found when other objects reference it.

{
	importBatchId: "xx",
	referenceId: "xx", // {feed supplied referenceId}
	pendingObjectId: "xx",
}
  • partition key: importBatchId
  • sort key: referenceId
  • When creating, maybe throw an error if an item already exists with a different pendingObjectId
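
The conflict check suggested above could look roughly like this. A `Map` stands in for the DynamoDB table, and registerReference is a hypothetical name; this is a sketch of the idea, not the service's implementation.

```javascript
// In-memory stand-in for the PendingObjectReference table,
// keyed by partition key + sort key.
function registerReference(table, { importBatchId, referenceId, pendingObjectId }) {
  const key = `${importBatchId}#${referenceId}`;
  const existing = table.get(key);
  // Same referenceId already saved for a different object: treat as an error.
  if (existing && existing.pendingObjectId !== pendingObjectId) {
    throw new Error(
      `referenceId "${referenceId}" already points to a different pendingObjectId`
    );
  }
  const item = { importBatchId, referenceId, pendingObjectId };
  table.set(key, item);
  return item;
}
```

Against a real table the same rule could probably be enforced with a conditional write, so that the check and the save happen atomically.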

PendingLink

One item per link between objects.

{
	importBatchId: "xx",
	pendingLinkId: "xx", // {pendingObjectId}_{referenceId}_{relationshipTag}
	linkStatus: "xx", // processing|creating|complete|error
	errorsFound: {}, // not previously included; considering adding it so links can store their errors independently of pendingObjects
}
  • partition key: importBatchId
  • sort key: pendingLinkId
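
Because pendingLinkId starts with the pendingObjectId, the links belonging to one object can be found with a prefix match on the sort key. The sketch below shows the id format and an in-memory prefix filter; both function names are hypothetical.

```javascript
// Builds {pendingObjectId}_{referenceId}_{relationshipTag}.
function makePendingLinkId(pendingObjectId, referenceId, relationshipTag) {
  return [pendingObjectId, referenceId, relationshipTag].join("_");
}

// In-memory equivalent of a "begins with" query on the sort key.
function linksForObject(pendingLinkIds, pendingObjectId) {
  const prefix = pendingObjectId + "_";
  return pendingLinkIds.filter((id) => id.startsWith(prefix));
}
```

Note that a feed-supplied referenceId may itself contain underscores, so this sketch only matches on the prefix and never tries to split an id back into its parts.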

CsvImportConfig

{
	csvImportConfigId: "xx", // random uuid
	userId: "xx", // user who controls/created/owns the config
	recordDeliminator: "\n",
	fieldDeliminator: ",",
	escapeString: "\\",
	removeFloatingEscapeString: true, // default: false, removes single escape strings that do not precede expected escapedStrings 
	removeWhiteSpace: false, // default: true, removes any spaces/tabs/enters at start or end of fields
	fieldNames:{
		fixed: { // used to fix which columns are which fieldNames, eg when no title row exists
			columnNumber: {
				objectTypeConfigIndex: 0,
				fieldname: 'name', // optional, if not set use standard method in objectTypeConfigIndex
				objType: { // optional, if not set use standard method in objectTypeConfigIndex
					serviceTag: "xxx",
					objectType: "xxx",
				},
				instance: 'xx', // optional, if not set use standard method in objectTypeConfigIndex
				openEnclose: "\"", // optional, if not set use titleRowOpenEnclose
				closeEnclose: "\"" // optional, if not set use titleRowCloseEnclose
			}
		},
		// or
		titleRow: #, // which row has the fieldNames
		titleRowOpenEnclose: "\"",
		titleRowCloseEnclose: "\"",
		replacefieldNames: {
			{fromFieldName}: "{toFieldName}", // refactor existing fieldNames to our names
		},
	},
	ignoreRows: [], // row numbers to skip
	overwriteColumnName: {
		"columnName": "{replaceToValue}", // completely change columnName before extracting objectType, instance, fieldname
	},
	objectTypes: [
		{
			setObjectTypeFieldNames: { // if a value in a specific field sets the rows objectType
				{fieldName}: {
					{fieldValue}: {
						serviceTag: "xx",
						objectType: "yy"
					} // index is the value found in the field, matches to the specified objectType
				},// .. can look in multiple fields to find the matching objectType, will use the first one found
			},
			objType: {
				serviceTag: "xxx",
				objectType: "xxx",
			},
			searchPattern: "xxx", // regexp search of the column name, if matches then is the associated objectType
			instancePattern: "after productattribute and before colon", // extract from the column name the instance identifier for this object, eg if one row creates multiple product attributes. Optional, if not set use empty string as instance
			fieldNamePatterns: [ // optional, if not set will check fieldNameSearchPattern, or if none found/set will be a null column
				{
					fieldNamePattern: "yyy", // extract from the column name the fieldname for this object
					fieldName: "zzzz",
				},
				// ....
			],
			// check fieldNamePatterns first, if none match, check fieldNameSearchPattern to extract the fieldname
			fieldNameSearchPattern: "after colon", // regExp that pulls out the fieldname, optional
			referenceFieldNames: ["xx","yy"], // column names that set the string referenceId for each mainObjectType pendingObject; an array in case multiple fields might set the reference; if more than one is set, which one is used is undefined
			referenceLinks: {
				linkTargetXXX: {
					relType: {
						serviceTag: "xxx",
						relationshipTag: "yyy",
					},
					direction: "from" // from or to
				},
				...
			},
			// referenceLinks or automaticLinks
			automaticLinks: [ // automatically create links between objects created on the same record
				{
					objType: {// which objectType to link to
						serviceTag: "xxx",
						objectType: "xxx",
					},
					instance: "tt", // which instance identifier to link to
					relType: {
						"serviceTag": "xxx",
						"relationshipTag": "yyy",
					},
					direction: "from" // from or to
				}
				// ..
			],
			actionField: { 
				fieldName: "xx",
				createValue: "c",
				updateValue: "u",
				referenceValue: "r"
			},
			versionDataIds: [
				{
					versionedDataLabel: "rateTableRates",
					fieldName: "xxxx"
				},
				{
					versionedDataLabel: "rateTableRates2",
					fieldName: "zzzzz"
				}, ...
			],
			enclose: [
				{
					openEnclose: "\"",
					closeEnclose: "\"",
					alwaysEnclose: "always", // "always"|"optional", default always // NOT SURE NEEDED, maybe always check if exists or not
					fieldNames: [], 
				}
			],
			overwriteValue:{
				{fieldName}: {
						{value}: {overwriteValue},
						eg:{
							off:"false", // can be string or Boolean
							n:false,
							0:false
						}
				}
			}
		},
		// ...
	],
	
	floatingRelationships:[
		{
			setRelationshipTagFieldNames: { // if a value in a specific field sets the relationshipTag
				{fieldName}: {  // fieldValue is the value found in the field, value is the relationshipTag
					{fieldValue}: {
						relType: {
							"serviceTag": "xxx",
							"relationshipTag": "yyy",
						},
						direction: "from" // from or to
					},
					....
				},
				// .. can look in multiple fields to find the matching relationshipTag, will use the first one found
			},
			relationships:{ // relationshipTag: "hasRateTable"
				relType: {
					serviceTag: "xxx",
					relationshipTag: "yyy",
				},
				direction: "from" // from or to
			},
			searchPattern: "xxx", // regexp search of the column name, if matches then is the associated relationshipTag
			instancePattern: "after hasRateTable and before colon", // extract from the column name the instance identifier for this relationshipTag, eg if one row creates multiple 'has' relationships. Optional, if not set use empty string as instance
			relationshipPropertyPatterns: [ // optional, if not set will check relationshipPropertySearchPattern, or if none found/set will be a null column
				{
					relationshipPropertyPattern: "yyy", // extract from the column name the relationshipProperty for this relationship
					relationshipProperty: "zzzz",
				},
				// ....
			],
			// check relationshipPropertyPatterns first, if none match, check relationshipPropertySearchPattern to extract the relationshipProperty name
			relationshipPropertySearchPattern: "after colon", // regExp that pulls out the relationshipProperty, optional
			objectBReferenceLinks: [
				"rateTableRef01"
			],
			objectAReferenceLinks: [
				"delMethodStd"
			],
			enclose: [
				{
					openEnclose: "\"",
					closeEnclose: "\"",
					alwaysEnclose: "always", // "always"|"optional", default always // NOT SURE NEEDED, maybe always check if exists or not
					fieldNames: [], 
				}
			],
		},
		// ...
	]
}
  • partition key: csvImportConfigId
  • sort key: {none}
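
To make the column-name settings in objectTypes more concrete, here is a hedged sketch of how one title-row column might be resolved. It assumes column names shaped like "productattribute2:colour", that instancePattern and fieldNameSearchPattern are regular expressions with one capture group, and that a matching fieldNamePattern maps directly to its fieldName. The function parseColumnName is hypothetical.

```javascript
// Resolves one column name against one entry of objectTypes.
// Returns null when the column does not belong to this objectType.
function parseColumnName(columnName, objectTypeConfig) {
  const {
    objType,
    searchPattern,
    instancePattern,
    fieldNamePatterns = [],
    fieldNameSearchPattern,
  } = objectTypeConfig;

  if (!new RegExp(searchPattern).test(columnName)) return null;

  // Optional: if not set (or nothing captured) use empty string as instance.
  const instanceMatch = instancePattern
    ? columnName.match(new RegExp(instancePattern))
    : null;
  const instance = (instanceMatch && instanceMatch[1]) || "";

  // Check fieldNamePatterns first, then fall back to fieldNameSearchPattern.
  let fieldName = null;
  for (const { fieldNamePattern, fieldName: mappedName } of fieldNamePatterns) {
    if (new RegExp(fieldNamePattern).test(columnName)) {
      fieldName = mappedName;
      break;
    }
  }
  if (fieldName === null && fieldNameSearchPattern) {
    const fieldMatch = columnName.match(new RegExp(fieldNameSearchPattern));
    if (fieldMatch && fieldMatch[1] !== undefined) fieldName = fieldMatch[1];
  }

  // fieldName stays null when nothing matched (a "null column").
  return { objType, instance, fieldName };
}
```

With searchPattern `"^productattribute"`, instancePattern `"^productattribute(.*?):"` and fieldNameSearchPattern `":(.+)$"`, the column "productattribute2:colour" would resolve to instance "2" and fieldName "colour".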

actionColumn

Optional. Controls whether this pendingObject will be created, updated, or used as a reference for links. If actionColumn is not set, the system chooses the action according to the rules below:

  • Create: if no identifier fields are set, will attempt to create the object from the fields found
  • Update: if all identifier fields are set and some other fields are set, will attempt to update
  • Reference: if all identifier fields are set and no other fields are set, will use as a reference (checks that the object exists)
  • Error: if only some identifier fields are set, the pendingObject is set as error/failed

If actionColumn is set, processing fails in any of the following cases:

  • The action is Update or Reference and not all identifier fields are set
  • The action is Create and any identifier field is set
  • The value does not match any of the create/update/reference values

If actionColumn is set to Reference, any fields found will be ignored.
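
The rules above can be sketched as a single function. The name resolveAction and its argument shape are hypothetical; the actionField values follow the createValue/updateValue/referenceValue settings shown in CsvImportConfig.

```javascript
// Returns "create" | "update" | "reference" | "error" for one pendingObject.
// actionValue is the value found in the action column, or undefined if the
// action column is not configured.
function resolveAction({ identifierFieldNames, identifierIds = {}, fields = {}, actionField, actionValue }) {
  const setCount = identifierFieldNames.filter(
    (name) => identifierIds[name] !== undefined && identifierIds[name] !== ""
  ).length;
  const noneSet = setCount === 0;
  const allSet = identifierFieldNames.length > 0 && setCount === identifierFieldNames.length;
  const hasFields = Object.keys(fields).length > 0;

  if (actionValue === undefined) {
    // No action column: infer the action from what is present.
    if (noneSet) return "create";
    if (allSet) return hasFields ? "update" : "reference";
    return "error"; // only some identifiers are set
  }

  const actionByValue = new Map([
    [actionField.createValue, "create"],
    [actionField.updateValue, "update"],
    [actionField.referenceValue, "reference"],
  ]);
  const action = actionByValue.get(actionValue);
  if (action === undefined) return "error"; // value matches no configured action
  if (action === "create") return noneSet ? "create" : "error";
  return allSet ? action : "error"; // update/reference need all identifiers
}
```

When the result is "reference", the caller would ignore any fields found, as described above.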

UsersCsvImportConfig

{
	userId: "xx", // user who owns the csvImportConfig
	csvImportConfigId: "xx",
}
  • partition key: userId
  • sort key: csvImportConfigId

FloatingRelationships

{
	importBatchId: "xx", // random uuid
	identifierRelationshipsId: "xx", // random uuid
    referenceProperty:{
     objectAReferenceLinks: "rateTableRef01",
     objectBReferenceLinks: "delMethodStd01",
     relationshipTag: "hasRateTable",
     relationshipProperty: {...}
    }
}
  • partition key: importBatchId
  • sort key: identifierRelationshipsId

External service requests

Applies to both createObject and createLink requests.

  • Uses a standard Lambda, like ProcessLogical and FindData
  • ImportData subscribes to the standard complete topic: CreateObjectComplete
  • Each external service can process and record the pendingObjectId to return in its own way, eg by keeping another table that links the external service's identifier with the ImportData pendingObjectId, and querying this table to build the CreateObjectComplete message
  • userId is also sent, because the user creating the objects needs to be recorded
  • pendingObjectId is also sent and must be returned in the CreateObjectComplete message
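
As an illustration of the "another table" approach on the external service side, the sketch below keeps a lookup from the service's own identifier to the ImportData pendingObjectId and uses it to build the completion message. A `Map` stands in for the table, and the function names and message property names (including createdObjectId) are assumptions; the page does not define the CreateObjectComplete message shape.

```javascript
// In-memory stand-in for the external service's own lookup table.
const importRequestByExternalId = new Map();

// Called when the external service accepts a createObject request.
function rememberCreateRequest(externalId, { pendingObjectId, userId }) {
  importRequestByExternalId.set(externalId, { pendingObjectId, userId });
}

// Called when the object has been created; builds the message body
// that would be published to the CreateObjectComplete topic.
function buildCreateObjectComplete(externalId) {
  const request = importRequestByExternalId.get(externalId);
  if (!request) {
    throw new Error(`no ImportData request recorded for ${externalId}`);
  }
  return {
    pendingObjectId: request.pendingObjectId, // must be returned to ImportData
    userId: request.userId,
    createdObjectId: externalId, // hypothetical property name
  };
}
```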

Linking objects

Two types of linking:

  1. Independent: Objects can be created independently of each other in any order
  2. Dependent: One object must be created before the other in a specific order

For the Config setting objectType.parentLinks, either object can be the parent for Independent links; for Dependent links, the object created first is considered the parent.

Two object types may have multiple types of links connecting them, so each parentLinks element has a list of linkTags that identify which type of link is being created.

For each parentLinks element there will be a matching entry in the other object's childObjectTypes array.

One object might have multiple other objects that depend on it being created, or might itself depend on many other objects being created.
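
The ordering constraint that Dependent links impose can be sketched as grouping objects into creation "waves": an object is ready once every parent it depends on has been created, and Independent links impose no order. This in-memory version is only an illustration (the function name creationWaves and the link shape are hypothetical) and assumes every id referenced by a link is present in objectIds.

```javascript
// links: [{ parentId, childId, linkType: "Dependent" | "Independent" }]
function creationWaves(objectIds, links) {
  // For each object, the set of parents it is still waiting for.
  const waitingFor = new Map(objectIds.map((id) => [id, new Set()]));
  for (const { parentId, childId, linkType } of links) {
    if (linkType === "Dependent") waitingFor.get(childId).add(parentId);
  }

  const created = new Set();
  const waves = [];
  while (created.size < objectIds.length) {
    // Everything whose parents were all created in earlier waves.
    const ready = objectIds.filter(
      (id) => !created.has(id) && [...waitingFor.get(id)].every((p) => created.has(p))
    );
    if (ready.length === 0) {
      throw new Error("circular Dependent links, no object can be created");
    }
    ready.forEach((id) => created.add(id));
    waves.push(ready);
  }
  return waves;
}
```

A child with several Dependent parents only becomes ready after the last of them is created, which matches the "dependent on many other objects" case above.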

Object hierarchy and field schema

  • Some fields will be required, some optional
  • Some fields possibly have system defaults
  • Perhaps users can set up default templates (do later if it has value)
  • The schema will need to state the identifier fields for each object: if they are set in the feed, Import Data knows the record points to an existing child/parent object; if empty, it needs to create a new one
  • Perhaps each objectType states its child objects, as it is more likely to be aware of these than of its parent objects

Where to store/set schema

Considering having the external service deliver this to ImportData during Initial Setup, as seed data injected directly into the Import Data Config DynamoDB table.

Errors during processing

  • Each object/link records its own errorsFound and status

Before sending any request to external services

  • Applies up to ProcessPendingLinks
  • ImportBatchMain records all errors found, including 1 error per object/link that has one or more errors
  • Before starting to send requests to external services, if ImportBatchMain has any errors, stop processing
  • Have a limit: when adding errors to ImportBatchMain, if the total number of errors found exceeds the limit, stop processing
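
A minimal sketch of the pre-request error handling described above. It assumes errorsFound on ImportBatchMain is keyed by the id of the object/link that has the error (one entry per item), and the names addBatchError and canStartExternalRequests are hypothetical.

```javascript
// Records one error per object/link on the batch. Returns false when the
// error limit has been exceeded and processing should stop.
function addBatchError(batch, itemId, message, errorLimit) {
  batch.errorsFound[itemId] = message;
  if (Object.keys(batch.errorsFound).length > errorLimit) {
    batch.importBatchStatus = "error";
    return false;
  }
  return true;
}

// Checked once, before the first request goes to an external service.
function canStartExternalRequests(batch) {
  return (
    batch.importBatchStatus !== "error" &&
    Object.keys(batch.errorsFound).length === 0
  );
}
```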

After starting to send requests to external services

  • Applies after ProcessPendingLinks (ProcessPendingObjects)
  • Processing has begun, so we continue until finished; however, any object/link that has an error sets any remaining connected objects/links to error and removes their awaitingSteps
  • Any object/link error sets ImportBatchMain processingError to true; the object/link's errors are not stored in ImportBatchMain
  • When ImportBatchMain has no more work to do, check processingError; if true, add an error stating that some objects/links have errors
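
The propagation described above can be sketched as follows, using in-memory structures. Here items maps an id to `{ status, errorsFound, awaitingSteps }` (a generic status property stands in for objectMainStatus/linkStatus), and waitingOn maps an id to the ids of connected objects/links that are waiting for it. These shapes and the names markFailed and finishBatch are assumptions for illustration.

```javascript
// Marks one object/link as failed and spreads the error to every
// remaining connected item, clearing their awaitingSteps.
function markFailed(batch, items, waitingOn, failedId, message) {
  batch.processingError = true; // details stay on the items, not on the batch
  const queue = [{ id: failedId, reason: message }];
  while (queue.length > 0) {
    const { id, reason } = queue.shift();
    const item = items.get(id);
    // Skip unknown items and items that are already finished or failed.
    if (!item || item.status === "error" || item.status === "complete") continue;
    item.status = "error";
    item.errorsFound.processing = reason;
    item.awaitingSteps.clear();
    for (const connectedId of waitingOn.get(id) || []) {
      queue.push({ id: connectedId, reason: `connected item ${id} has an error` });
    }
  }
}

// Called when ImportBatchMain has no more work to do.
function finishBatch(batch) {
  if (batch.processingError) {
    batch.errorsFound.processing = "some objects/links have errors";
  }
  return batch;
}
```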

Working documents

Working_documents - Import Data