Revision as of 02:57, 14 March 2023

Overview

Orchestrates the import of objects/data into a project.

Repository

https://bitbucket.org/izara-core-import-export-data/izara-core-import-data-import-data/src/master/

DynamoDB tables

Standard Config Table Per Service

Configuration tags

{
	configKey: "objectType",
	configTag: "xx", // {objectType, eg: sellOffer/Product/VariantProduct etc..}
	configValue: {
		createObjectServiceName: "xx", // {name of the service that handles this objectType}
		parentLinks: {
			{parent objectType}: {
				{linkTag}: {
					"linkType": "yy", // Dependent|Independent
					"separateDependentLinkCreate": true, // for Dependent only, default is false
					"createLinkServiceNames": "yy", // if not set, nothing is sent to an external service and the link is set as complete
				}
			}
		},
		childObjectTypes: [], // maybe not needed?
		fieldnames: { // list of possible fieldnames, fields found in the file that are not listed here are ignored
			{fieldname}: {
				// no settings yet
			}
		}
	}
}

separateDependentLinkCreate: if set to true, a Dependent link is sent to the external service as a separate request after both the parent and child objects have been created. The default is false, where the link is created in the same request that creates the child object.
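As an illustration of how these settings might be read, the sketch below uses a hypothetical config item for a sellOffer objectType and a hypothetical helper, linkCreationMode. The service name, link tag, and returned mode names are invented for the example. The order in which the rules are checked, and the handling of Independent links as a separate request, are assumptions.

```javascript
// Hypothetical config item following the objectType shape above.
// The service name and link tag are made up for illustration.
const sellOfferConfig = {
	configKey: "objectType",
	configTag: "sellOffer",
	configValue: {
		createObjectServiceName: "SellOfferManager",
		parentLinks: {
			product: {
				sellOfferToProduct: {
					linkType: "Dependent",
					separateDependentLinkCreate: true,
					createLinkServiceNames: "SellOfferManager",
				},
			},
		},
		fieldnames: { price: {}, currency: {} },
	},
};

// Decide how a link is created from its linkTag settings.
function linkCreationMode(linkConfig) {
	if (!linkConfig.createLinkServiceNames) {
		// No external service configured: the link is simply set as complete.
		return "markComplete";
	}
	if (linkConfig.linkType === "Dependent" && !linkConfig.separateDependentLinkCreate) {
		// Default for Dependent links: created in the child's create request.
		return "withChildObject";
	}
	// Dependent with the flag set, or (assumed) any Independent link.
	return "separateRequest";
}

const linkConfig = sellOfferConfig.configValue.parentLinks.product.sellOfferToProduct;
console.log(linkCreationMode(linkConfig));
```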

ImportBatchMain

{
	importBatchId: "xx", // random uuid
	userId: "xx", // target userId
	submittedByUserId: "xx", // submitted by userId
	startTime: currentTime.getTime(),
	importType: "xx", // "csv"|"xml"|...
	importConfigId: "yy", // dependent on importType, eg for importType = "csv"
	importBatchStatus: "xx", // "processingObjects" | "error" | "complete" ("processingRawRecords" is NOT YET used)
}

partition key: importBatchId
sort key: {none}
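A minimal sketch of building an ImportBatchMain item in Node.js. The helper name buildImportBatchMain is hypothetical. Requiring importConfigId for csv imports and starting in the "processingObjects" status are assumptions, not confirmed behaviour.

```javascript
const { randomUUID } = require("node:crypto");

// Hypothetical helper: builds one ImportBatchMain item ready to be saved.
function buildImportBatchMain({ userId, submittedByUserId, importType, importConfigId }) {
	// Assumption: a csv import always needs a config to describe the file.
	if (importType === "csv" && !importConfigId) {
		throw new Error("importConfigId is required when importType is csv");
	}
	return {
		importBatchId: randomUUID(),
		userId,
		submittedByUserId,
		startTime: Date.now(), // same value as new Date().getTime()
		importType,
		// Only include importConfigId when one was supplied.
		...(importConfigId ? { importConfigId } : {}),
		importBatchStatus: "processingObjects", // assumed initial status
	};
}

const batch = buildImportBatchMain({
	userId: "user-1",
	submittedByUserId: "user-2",
	importType: "csv",
	importConfigId: "config-1",
});
console.log(batch.importBatchStatus);
```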

PendingObjectMain

One item per object that needs to be created.

{
	importBatchId: "xx",
	pendingObjectId: "xx", // hash of object with importBatchId, userId, objectType, and fields properties
	objectType: "xx", // eg variant|product|sellOffer|sellOfferPrice|sellOfferPlan|...
	fields: {}, // key is the name of the field
	identifierIds: { // used to identify an existing object, fields is ignored if this is found
		{name of identifier property}: {value}
	},
	recordNumber: "xx", // the order in which this record was extracted from source file
	objectMainStatus: "xx", // processing|creating|complete
	errorsFound: {},
}

partition key: importBatchId
sort key: pendingObjectId
A feed may send the same objectType and fields multiple times in a single request. The above design will only handle this once, which is probably OK; maybe check for duplicates and add an error if any are found.
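The hash-based pendingObjectId and the duplicate check mentioned above could be sketched as follows. SHA-256 and the sorted-key serialisation are assumptions, as the hash algorithm is not specified here; all function names are hypothetical.

```javascript
const { createHash } = require("node:crypto");

// Serialise with sorted keys so equal objects always produce the same string.
function stableStringify(value) {
	if (Array.isArray(value)) {
		return "[" + value.map(stableStringify).join(",") + "]";
	}
	if (value !== null && typeof value === "object") {
		const keys = Object.keys(value).sort();
		return "{" + keys.map((k) => JSON.stringify(k) + ":" + stableStringify(value[k])).join(",") + "}";
	}
	return JSON.stringify(value);
}

// Hash of importBatchId, userId, objectType and fields (algorithm assumed).
function createPendingObjectId({ importBatchId, userId, objectType, fields }) {
	return createHash("sha256")
		.update(stableStringify({ importBatchId, userId, objectType, fields }))
		.digest("hex");
}

// Returns the records whose objectType and fields repeat an earlier record.
function findDuplicateRecords(importBatchId, userId, records) {
	const firstSeen = new Map(); // pendingObjectId -> recordNumber of first use
	const duplicates = [];
	for (const record of records) {
		const id = createPendingObjectId({
			importBatchId,
			userId,
			objectType: record.objectType,
			fields: record.fields,
		});
		if (firstSeen.has(id)) {
			duplicates.push({ recordNumber: record.recordNumber, duplicateOf: firstSeen.get(id) });
		} else {
			firstSeen.set(id, record.recordNumber);
		}
	}
	return duplicates;
}
```

Because two identical records hash to the same sort key, saving both would overwrite one item; a check like findDuplicateRecords lets the import record an error instead of silently dropping the repeat.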

PendingObjectReference

Creates a link between a submitted referenceId and the saved object, so it can be found when other objects reference it.

{
	importBatchId: "xx",
	referenceId: "xx", // objectType_{feed supplied referenceId}
	pendingObjectId: "xx",
}

partition key: importBatchId
sort key: referenceId
When creating, maybe throw an error if an item already exists with a different pendingObjectId.
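A sketch of the referenceId format and the conflict check suggested above. The class is an in-memory stand-in for the PendingObjectReference table, used only for illustration; its name and methods are hypothetical.

```javascript
// referenceId format: objectType_{feed supplied referenceId}
function buildReferenceId(objectType, feedReferenceId) {
	return `${objectType}_${feedReferenceId}`;
}

// In-memory stand-in for the table, keyed by importBatchId (partition key)
// and referenceId (sort key).
class PendingObjectReferenceStore {
	constructor() {
		this.items = new Map();
	}

	save({ importBatchId, referenceId, pendingObjectId }) {
		const key = `${importBatchId}#${referenceId}`;
		const existing = this.items.get(key);
		// Same reference pointing at a different object is treated as an error.
		if (existing && existing.pendingObjectId !== pendingObjectId) {
			throw new Error(`referenceId ${referenceId} already points to ${existing.pendingObjectId}`);
		}
		this.items.set(key, { importBatchId, referenceId, pendingObjectId });
	}

	find(importBatchId, referenceId) {
		return this.items.get(`${importBatchId}#${referenceId}`);
	}
}

const store = new PendingObjectReferenceStore();
store.save({
	importBatchId: "batch-1",
	referenceId: buildReferenceId("product", "SKU-1"),
	pendingObjectId: "abc123",
});
console.log(store.find("batch-1", "product_SKU-1").pendingObjectId);
```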

PendingLink

One item per link between objects.

{
	importBatchId: "xx",
	pendingLinkId: "xx", // {pendingObjectId}_{referenceId}
	linkTag: "xx",
	linkStatus: "xx", // processing|creating|complete
}

partition key: importBatchId
sort key: pendingLinkId
Currently thinking of objects that can be created independently, with the link made afterwards, but this could perhaps also be used for cases where one object must be created before another.
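A small sketch of building a PendingLink item and splitting its pendingLinkId back into its parts. It assumes pendingObjectId never contains an underscore (for example, a hex hash), which is what makes the first underscore a safe split point even though referenceId itself contains one. The helper names and the initial "processing" status are assumptions.

```javascript
// pendingLinkId format: {pendingObjectId}_{referenceId}
function buildPendingLink({ importBatchId, pendingObjectId, referenceId, linkTag }) {
	return {
		importBatchId,
		pendingLinkId: `${pendingObjectId}_${referenceId}`,
		linkTag,
		linkStatus: "processing", // assumed initial status
	};
}

// Split on the FIRST underscore only; referenceId may contain more of them.
function parsePendingLinkId(pendingLinkId) {
	const separatorIndex = pendingLinkId.indexOf("_");
	if (separatorIndex === -1) {
		throw new Error(`invalid pendingLinkId: ${pendingLinkId}`);
	}
	return {
		pendingObjectId: pendingLinkId.slice(0, separatorIndex),
		referenceId: pendingLinkId.slice(separatorIndex + 1),
	};
}

const link = buildPendingLink({
	importBatchId: "batch-1",
	pendingObjectId: "abc123",
	referenceId: "product_SKU_1",
	linkTag: "sellOfferToProduct",
});
console.log(parsePendingLinkId(link.pendingLinkId));
```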

CsvImportConfig

{
	csvImportConfigId: "xx", // random uuid
	userId: "xx", // user who controls/created/owns the config
	recordDeliminator: "\n",
	fieldDeliminator: ",",
	escapeString: "\\",
	enclose: [
		{
			openEnclose: "\"",
			closeEnclose: "\"",
			alwaysEnclose: "always", // "always"|"optional", default always
			fieldNames: [], // or fieldIndexes: [],
		}
	],
	fieldnames: {
		fixed: { // used to fix which columns are which fieldnames, eg when no title row exists
			{columnNumber}: "{fieldname}",
		},
		// or
		titleRow: #, // which row has the fieldnames
		replaceFieldnames: {
			{fromFieldname}: "{toFieldname}", // rename existing fieldnames to our names
		},
	},
	ignoreRows: [], // row numbers to skip
	overwriteFields: {
		"fieldname": "{replaceToValue}", // eg: if feed is missing a field we can hardcode it here
	},
	objectTypes: {
		"fixed": "{objectType}", // fixes all rows as one objectType only
		"fromFieldname": { // if a value in a specific field sets the row's objectType
			{fieldname}: {
				{fieldValue}: {objectType} // key is the value found in the field, matches to the specified objectType
			}
		},
		columnNames:[
			{
				searchPattern: "xxx", // regexp search of the column name, if matches then is the associated objectType
				identifierPattern: "yyy", // not sure how but need to extract from the column name the identifier for this object, eg if one row creates multiple product attributes
				fieldnamePatterns: [
					{
						fieldnamePattern: "yyy", // not sure how but need to extract from the column name the fieldname for this object
						fieldname: "zzzz",
					},
					// ....
				],
				objectType: "rrr" 
			}
		],
	},
	linkFields: {
		"fromFieldname": { // if a value in specific fields sets the reference for any links
			{fieldname}: {
				{fieldValue}: {reference} // key is the value found in the field, matches to the specified reference
			}
		},
	},
}

partition key: csvImportConfigId
sort key: {none}
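To show how some of these settings could drive parsing, here is a minimal sketch that splits records and maps columns to fieldnames. It only uses the first enclose rule and the fieldnames.fixed option, assumes 1-based column and row numbers, and splits records naively on recordDeliminator (so a delimiter inside an enclosed field is not handled). All function names are hypothetical.

```javascript
// Split one record into field values, honouring enclose and escape settings.
function splitRecord(record, config) {
	const { fieldDeliminator, escapeString } = config;
	const { openEnclose, closeEnclose } = config.enclose[0]; // first rule only
	const fields = [];
	let current = "";
	let enclosed = false;
	let i = 0;
	while (i < record.length) {
		if (escapeString && record.startsWith(escapeString, i) && i + escapeString.length < record.length) {
			// Escaped character: take the next character literally.
			current += record[i + escapeString.length];
			i += escapeString.length + 1;
		} else if (!enclosed && record.startsWith(openEnclose, i)) {
			enclosed = true;
			i += openEnclose.length;
		} else if (enclosed && record.startsWith(closeEnclose, i)) {
			enclosed = false;
			i += closeEnclose.length;
		} else if (!enclosed && record.startsWith(fieldDeliminator, i)) {
			fields.push(current);
			current = "";
			i += fieldDeliminator.length;
		} else {
			current += record[i];
			i += 1;
		}
	}
	fields.push(current);
	return fields;
}

// Name the columns using fieldnames.fixed, then apply overwriteFields.
function mapFields(values, config) {
	const mapped = {};
	values.forEach((value, index) => {
		const fieldname = config.fieldnames.fixed[index + 1]; // 1-based columns (assumed)
		if (fieldname !== undefined) mapped[fieldname] = value;
	});
	return { ...mapped, ...config.overwriteFields };
}

function parseCsv(text, config) {
	return text
		.split(config.recordDeliminator)
		// Row numbers are 1-based (assumed) and counted before any filtering.
		.filter((record, index) => record !== "" && !config.ignoreRows.includes(index + 1))
		.map((record) => mapFields(splitRecord(record, config), config));
}

// Example config values, invented for illustration.
const exampleConfig = {
	recordDeliminator: "\n",
	fieldDeliminator: ",",
	escapeString: "\\",
	enclose: [{ openEnclose: "\"", closeEnclose: "\"", alwaysEnclose: "optional" }],
	fieldnames: { fixed: { 1: "sku", 2: "title", 3: "price" } },
	ignoreRows: [1],
	overwriteFields: { currency: "USD" },
};

console.log(parseCsv('sku,title,price\nA1,"Red, large shirt",9.99\n', exampleConfig));
```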

UsersCsvImportConfig

{
	userId: "xx", // user who owns the csvImportConfig
	csvImportConfigId: "xx",
}

partition key: userId
sort key: csvImportConfigId

External service requests

Applies to both createObject and createLink requests.

Uses a standard Lambda, like ProcessLogical and FindData.
ImportData subscribes to the standard complete topic: CreateObjectComplete.
Each external service can handle processing and recording the pendingObjectId to return in its own way, eg by having another table that links the external service's identifier with the ImportData pendingObjectId, and querying this table to build the CreateObjectComplete message.
userId is also sent, as the user creating the objects needs to be recorded.
pendingObjectId is also sent and must be returned in the CreateObjectComplete message.
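The request and completion flow above could look roughly like the sketch below. The message property names (other than userId and pendingObjectId) and both function names are assumptions; the Map stands in for the PendingObjectMain table.

```javascript
// Build the message sent to the external service's createObject Lambda.
function buildCreateObjectRequest(pendingObject, userId) {
	return {
		userId, // recorded as the user creating the object
		pendingObjectId: pendingObject.pendingObjectId, // must be echoed back
		objectType: pendingObject.objectType,
		fields: pendingObject.fields,
	};
}

// Handle a CreateObjectComplete message by matching on pendingObjectId.
function handleCreateObjectComplete(message, pendingObjects) {
	const pendingObject = pendingObjects.get(message.pendingObjectId);
	if (!pendingObject) {
		throw new Error(`unknown pendingObjectId: ${message.pendingObjectId}`);
	}
	pendingObject.objectMainStatus = "complete";
	// Assumed: the external service returns the identifiers of the new object.
	pendingObject.identifierIds = message.identifierIds || {};
	return pendingObject;
}

const pendingObjects = new Map([
	["abc123", { pendingObjectId: "abc123", objectType: "product", fields: { title: "Shirt" }, objectMainStatus: "creating" }],
]);

const request = buildCreateObjectRequest(pendingObjects.get("abc123"), "user-1");
const completed = handleCreateObjectComplete(
	{ pendingObjectId: request.pendingObjectId, identifierIds: { productId: "p-1" } },
	pendingObjects
);
console.log(completed.objectMainStatus);
```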

Linking objects

Two types of linking:

Independent: Objects can be created independently of each other in any order
Dependent: One object must be created before the other in a specific order

For the Config setting objectType.parentLinks, either object can be the parent for Independent links; for Dependent links, the object created first is considered the parent.

Two object types may have multiple types of links connecting them, so each parentLinks element has a list of linkTags that identify which type of link is being created.

For each parentLinks element there will be a matching entry in the other object's childObjectTypes array.

One object might have multiple other objects that depend on it being created, or might itself depend on many other objects being created.
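One way to respect Dependent ordering is to only create an object once every parent it depends on is complete. The sketch below assumes a simple list of dependent links with hypothetical parentId and childId properties; Independent links place no constraint and are left out.

```javascript
// Returns the ids of objects still in "processing" whose Dependent parents
// have all reached "complete".
function findObjectsReadyToCreate(pendingObjects, dependentLinks) {
	const statusById = new Map(
		pendingObjects.map((o) => [o.pendingObjectId, o.objectMainStatus])
	);
	return pendingObjects
		.filter((o) => o.objectMainStatus === "processing")
		.filter((o) =>
			dependentLinks
				.filter((link) => link.childId === o.pendingObjectId)
				.every((link) => statusById.get(link.parentId) === "complete")
		)
		.map((o) => o.pendingObjectId);
}

const objects = [
	{ pendingObjectId: "product", objectMainStatus: "processing" },
	{ pendingObjectId: "sellOffer", objectMainStatus: "processing" },
	{ pendingObjectId: "sellOfferPrice", objectMainStatus: "processing" },
];
const links = [
	{ parentId: "product", childId: "sellOffer" },
	{ parentId: "sellOffer", childId: "sellOfferPrice" },
];

console.log(findObjectsReadyToCreate(objects, links));
```

Running the check again each time a CreateObjectComplete message arrives would release the next level of the hierarchy.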

Object hierarchy and field schema

Some fields will be required, some optional.
Some fields possibly have system defaults.
Perhaps users can set up default templates (do later if it has value).
The schema will need to state the identifier fields for each object: if set in the feed, Import Data knows the record points to an existing child/parent object; if empty, it needs to create a new one.
Perhaps each objectType states its child objects, as it is more likely to be aware of these than of its parent objects.
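Since the schema format is not settled, the following is only a sketch of how the ideas above might fit together. The schema shape (identifierFields, required, default) and the resolveRecord helper are hypothetical.

```javascript
// Hypothetical schema for one objectType; the property names are assumptions.
const productSchema = {
	identifierFields: ["productId"],
	fields: {
		title: { required: true },
		currency: { required: false, default: "USD" },
		description: { required: false },
	},
};

function resolveRecord(schema, record) {
	// Any identifier set in the feed means the record points to an existing object.
	const identifierIds = {};
	for (const name of schema.identifierFields) {
		if (record[name] !== undefined && record[name] !== "") {
			identifierIds[name] = record[name];
		}
	}
	if (Object.keys(identifierIds).length > 0) {
		return { action: "useExisting", identifierIds };
	}

	// Otherwise a new object is needed: apply defaults and check required fields.
	const fields = {};
	const errors = [];
	for (const [name, settings] of Object.entries(schema.fields)) {
		const value = record[name];
		if (value !== undefined && value !== "") {
			fields[name] = value;
		} else if (settings.default !== undefined) {
			fields[name] = settings.default;
		} else if (settings.required) {
			errors.push(`missing required field: ${name}`);
		}
	}
	return { action: "create", fields, errors };
}

console.log(resolveRecord(productSchema, { title: "Shirt" }));
```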

where to store/set schema

Considering having the external service deliver this to ImportData in Initial Setup, as seed data injected directly into the Import Data Config DynamoDB table.

Working documents

Working_documents - Import Data


Service - Import Data: Difference between revisions
