Service - Import Data: Difference between revisions

From Izara Wiki
Revision as of 22:53, 1 November 2022

Overview

Orchestrates importing objects/data into a project.

Repository

https://bitbucket.org/izara-core-import-export-data/izara-core-import-data-import-data/src/master/

DynamoDB tables

Standard Config Table Per Service

Configuration tags

{
	configKey: "objectType",
	configTag: "xx", // {objectType, eg: sellOffer/Product/VariantProduct etc..}
	configValue: {
		createObjectServiceName: "xx", // {name of the service that handles this objectType}
		parentLinks: {
			{parent objectType}: {
				{linkTag}: {
					"xx": "yy", // index is name of link to objectType, value is serviceName
				}
			}
		},
		childObjectTypes: [] // maybe not needed?
	}
},
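
As a rough illustration of how this config shape might be read, the sketch below looks up the link services for one parent objectType and linkTag. The sample record, the service names and the helper `findLinkServices` are all hypothetical and not part of the ImportData service.

```javascript
// Hypothetical config record for the "sellOffer" objectType; every name here is illustrative.
const sellOfferConfig = {
  configKey: "objectType",
  configTag: "sellOffer",
  configValue: {
    createObjectServiceName: "SellOfferManager", // assumed service name
    parentLinks: {
      product: {
        sellsProduct: { productSellOffer: "ProductLinkManager" }, // assumed linkTag and link name
      },
    },
    childObjectTypes: [],
  },
};

// Returns the { linkName: serviceName } map for a parent objectType and linkTag,
// or null when the config does not define that link.
function findLinkServices(config, parentObjectType, linkTag) {
  const links = config.configValue.parentLinks || {};
  const byTag = links[parentObjectType] || {};
  return byTag[linkTag] || null;
}
```

A null result could be treated as an invalid linkTag when PendingLink records are validated.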

ImportBatchMain

{
	importBatchId: "xx", // random uuid
	userId: "xx", // submitted by userId
	startTime: currentTime.getTime(),
	batchConfig: {}, // same as request
	status: "xx", // "processingRawRecords" (NOT YET) | "processingObjects" | "error" | "complete"
}
  • partition key: importBatchId
  • sort key: {none}

ImportBatchErrors

{
	importBatchId: "xx",
	errorId: "xx", // random uuid
	error: "xx", 
}
  • partition key: importBatchId
  • sort key: {none}

RawRecord

  • NOT YET: maybe move into CSV processing, as each format might have own way of handling per line/record formats

A raw copy of one submitted record, split out into fields. One record may have multiple objects in its fields.

{
	importBatchId: "xx",
	rawRecordId: "xx", // random uuid
	fields: {}, // key is the name of the field
	recordNumber: ##, // eg the line number of the record
	status: "xx",
	errorsFound: {},
}
  • partition key: importBatchId
  • sort key: rawRecordId

RawRecordAwaitingProcess

  • NOT YET: maybe move into CSV processing, as each format might have own way of handling per line/record formats

A list of raw records waiting to be saved into PendingObjectMain, so they can be handled asynchronously and trigger the next step of the process when all are complete.

{
	importBatchId: "xx",
	rawRecordId: "xx",
}
  • partition key: importBatchId
  • sort key: rawRecordId

PendingObjectMain

One item per object that needs to be created.

{
	importBatchId: "xx",
	objectId: "xx", // hash of object with importBatchId, userId, objectType, and fields properties
	objectType: "xx", // eg variant|product|sellOffer|sellOfferPrice|sellOfferPlan|...
	fields: {}, // key is the name of the field
	rawRecordId: "xx",
	status: "xx", // processing|creating|complete
	errorsFound: {},
}
  • partition key: importBatchId
  • sort key: objectId
  • a feed might send the same objectType and fields multiple times in a single request; the design above will only handle this once, which I think is OK, maybe do a check and add an error if duplicates are found
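
A minimal sketch of how the objectId hash could be derived, assuming SHA-256 and a key-sorted serialisation so that field order in the feed does not change the result. The hash algorithm and the helper names `stableStringify` and `createObjectId` are assumptions, the notes above only say the objectId is a hash of these properties.

```javascript
const crypto = require("crypto");

// Serialise with sorted object keys so property order does not affect the hash.
function stableStringify(value) {
  if (Array.isArray(value)) {
    return "[" + value.map(stableStringify).join(",") + "]";
  }
  if (value !== null && typeof value === "object") {
    const parts = Object.keys(value)
      .sort()
      .map((key) => JSON.stringify(key) + ":" + stableStringify(value[key]));
    return "{" + parts.join(",") + "}";
  }
  return JSON.stringify(value);
}

// Hash of importBatchId, userId, objectType and fields (SHA-256 is an assumption).
function createObjectId(importBatchId, userId, objectType, fields) {
  const payload = stableStringify({ importBatchId, userId, objectType, fields });
  return crypto.createHash("sha256").update(payload).digest("hex");
}
```

Because identical objectType and fields within one batch produce the same objectId, a duplicate in the feed would map onto the same PendingObjectMain item, which is where a duplicate check could hook in.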

PendingObjectReference

Links a feed-supplied referenceId to the saved object, so the object can be found when other objects reference it.

{
	importBatchId: "xx",
	referenceId: "xx", // objectType_{feed supplied referenceId}
	objectId: "xx",
}
  • partition key: importBatchId
  • sort key: referenceId
  • when creating, maybe throw an error if an item already exists with a different objectId
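
The referenceId format and the "different objectId" check can be sketched as below. A plain `Map` stands in for the PendingObjectReference table, and both function names are hypothetical.

```javascript
// referenceId format from the notes above: objectType_{feed supplied referenceId}
function buildReferenceId(objectType, feedReferenceId) {
  return `${objectType}_${feedReferenceId}`;
}

// `references` is an in-memory Map standing in for the PendingObjectReference table.
// Throws if the same referenceId already points to a different objectId.
function saveObjectReference(references, importBatchId, referenceId, objectId) {
  const key = `${importBatchId}|${referenceId}`;
  const existing = references.get(key);
  if (existing && existing.objectId !== objectId) {
    throw new Error(`referenceId ${referenceId} already points to a different objectId`);
  }
  references.set(key, { importBatchId, referenceId, objectId });
}
```

Against DynamoDB the same check would more likely be a conditional write than a read followed by a write, but that choice is outside what these notes decide.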

PendingLink

One item per link between objects.

{
	importBatchId: "xx",
	pendingLinkId: "xx", // {objectId}_{referenceId}
	linkTag: "xx",
	status: "xx", // processing|creating|complete
}
  • partition key: importBatchId
  • sort key: pendingLinkId
  • currently thinking of objects that can be created independently, after which the link gets made, but this could perhaps also be used for cases where one object must be created before another can be.
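
Because pendingLinkId starts with the child's objectId, all links for one child can be found with a prefix match on the sort key. The sketch below only builds the id and the query parameters, in the plain-value style used by the DynamoDB DocumentClient; `begins_with` is DynamoDB's prefix function for key conditions, while the table name and function names are assumptions.

```javascript
// pendingLinkId format from the notes above: {objectId}_{referenceId}
function buildPendingLinkId(objectId, referenceId) {
  return `${objectId}_${referenceId}`;
}

// Query parameters for "all PendingLink items whose child is childObjectId".
function buildChildLinksQuery(importBatchId, childObjectId) {
  return {
    TableName: "PendingLink", // hypothetical table name
    KeyConditionExpression:
      "importBatchId = :batch AND begins_with(pendingLinkId, :prefix)",
    ExpressionAttributeValues: {
      ":batch": importBatchId,
      // trailing "_" keeps one objectId from matching another that merely starts with it
      ":prefix": `${childObjectId}_`,
    },
  };
}
```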

Process

  • NOT YET: go through RawRecordAwaitingProcess and create items in PendingLink for each link between objects, PendingObjectMain for each item found, and PendingObjectReference for any references found (also PendingLinkAwaitingProcess and PendingObjectAwaitingProcess)
  1. use (new parallel batch processing library) to iterate all PendingLink records
    • check referenceId has a valid record in PendingObjectReference
    • check linkTag is valid (objectId is the child, referenceId object is the parent)
    • if linkType is Dependent
      • save awaitingMultipleSteps for the child object, waiting for the parent object to be created (multiple so can handle multiple parents for this child)
      • save awaitingStep for the PendingLink, waiting for the child object to be created
    • if linkType is Independent, save awaitingMultipleSteps for the PendingLink, waiting for both child and parent to be created
    • save this PendingLink into awaitingMultipleSteps for ImportBatchMain
  2. if any errors were found before this step, stop processing, mark the feed with status error, and remove any saved awaitingMultipleSteps
  3. use (new parallel batch processing library) to iterate all PendingObjectMain records
    • save this PendingObjectMain into awaitingMultipleSteps for ImportBatchMain
    • iterate any PendingLink that has this object as the child (use Dynamo begins_with on the pendingLinkId field)
      • if linkType is Dependent, stop processing this PendingObjectMain record (its awaitingMultipleSteps are already saved)
    • set status of PendingObjectMain to "creating"
    • send to external service to create
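
The PendingLink pass (steps 1 and 2) can be sketched in memory as below. This is only an illustration under several assumptions: each link already carries its childObjectId and a resolved linkType, `references` is a Map from referenceId to objectId, linkTag validation is left out, and the awaiting-step storage is modelled as Maps of Sets rather than the real awaitingMultipleSteps mechanism.

```javascript
// Record that `waiter` is waiting for `waitingFor`.
function addWait(map, waiter, waitingFor) {
  if (!map.has(waiter)) map.set(waiter, new Set());
  map.get(waiter).add(waitingFor);
}

function planLinkSteps(pendingLinks, references) {
  const plan = {
    errors: [],
    objectWaits: new Map(), // child objectId -> parent objectIds it waits for
    linkWaits: new Map(),   // pendingLinkId -> objectIds it waits for
    batchWaits: new Set(),  // pendingLinkIds the ImportBatchMain waits for
  };
  for (const link of pendingLinks) {
    const parentObjectId = references.get(link.referenceId);
    if (!parentObjectId) {
      plan.errors.push(`No PendingObjectReference for ${link.referenceId}`);
      continue;
    }
    if (link.linkType === "Dependent") {
      // the child waits for the parent, the link waits for the child
      addWait(plan.objectWaits, link.childObjectId, parentObjectId);
      addWait(plan.linkWaits, link.pendingLinkId, link.childObjectId);
    } else {
      // Independent: the link waits for both child and parent
      addWait(plan.linkWaits, link.pendingLinkId, link.childObjectId);
      addWait(plan.linkWaits, link.pendingLinkId, parentObjectId);
    }
    plan.batchWaits.add(link.pendingLinkId);
  }
  if (plan.errors.length > 0) {
    // step 2: on any error, drop everything that was saved
    plan.objectWaits.clear();
    plan.linkWaits.clear();
    plan.batchWaits.clear();
  }
  return plan;
}
```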

After external service creates object:

  1. set status of PendingObjectMain to "complete"
  2. remove awaitingMultipleSteps for ImportBatchMain
  3. iterate any Dependent awaitingMultipleSteps
    • find the child that was waiting on this parent
    • check if that child has any remaining awaitingMultipleSteps, if not then:
      • set status of child PendingObjectMain to "creating"
      • send to external service to create
  4. iterate any Dependent PendingLink awaitingStep
    • set status of PendingLink to "creating"
    • send to external service to create link
  5. iterate any Independent awaitingMultipleSteps
    • check if that PendingLink has any remaining awaitingMultipleSteps, if not then:
      • set status of PendingLink to "creating"
      • send to external service to create link
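
A sketch of the handler logic above, using in-memory Maps of Sets as a stand-in for awaitingMultipleSteps. For simplicity a Dependent link's single awaitingStep is treated as a one-element set, so Dependent and Independent links share one structure; that simplification and all names are assumptions. The function only returns what should be sent to the external services next and does not set statuses.

```javascript
// state.objectWaits: Map<childObjectId, Set<parentObjectId>>
// state.linkWaits:   Map<pendingLinkId, Set<objectId>>
// state.batchWaits:  Set of ids the ImportBatchMain is still waiting for
function onObjectCreated(state, objectId) {
  const objectsToCreate = [];
  const linksToCreate = [];

  // the created object no longer blocks the batch
  state.batchWaits.delete(objectId);

  // release children whose last outstanding parent was this object
  for (const [childId, parents] of state.objectWaits) {
    if (parents.delete(objectId) && parents.size === 0) {
      state.objectWaits.delete(childId);
      objectsToCreate.push(childId);
    }
  }

  // release links that were waiting only for this object
  for (const [linkId, waits] of state.linkWaits) {
    if (waits.delete(objectId) && waits.size === 0) {
      state.linkWaits.delete(linkId);
      linksToCreate.push(linkId);
    }
  }

  return { objectsToCreate, linksToCreate };
}
```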

After external service creates link:

  1. set status of PendingLink to "complete"
  2. remove awaitingMultipleSteps for ImportBatchMain

When removing awaitingMultipleSteps for ImportBatchMain (both PendingObjectMain and PendingLink):

  1. check if any remain, if not then set ImportBatchMain to complete and send ImportBatchMainComplete message
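
The completion check can be sketched as below. The in-memory `batch` object, the `publish` callback and the message shape are hypothetical; only the status value and the ImportBatchMainComplete name come from the notes above.

```javascript
// batch: { importBatchId, status, awaitingSteps: Set of ids still outstanding }
// publish: callback that sends a message (stand-in for the real messaging layer)
// Returns true when this removal completed the batch.
function removeBatchStep(batch, stepId, publish) {
  batch.awaitingSteps.delete(stepId);
  if (batch.awaitingSteps.size === 0 && batch.status !== "complete") {
    batch.status = "complete";
    publish({ type: "ImportBatchMainComplete", importBatchId: batch.importBatchId });
    return true;
  }
  return false;
}
```

The status guard keeps a repeated call from publishing the message twice in this sketch; with concurrent Lambdas the real implementation would need an atomic check in the data store.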

External service requests

Both createObject and createLink.

  • standard Lambda like ProcessLogical and FindData
  • ImportData subscribes to standard complete topic: CreateObjectComplete
  • Each external service can handle processing and recording the objectId to return in its own way, eg by keeping another table that links the external service's identifier to the ImportData objectId and querying this table to build the CreateObjectComplete message
  • userId is also sent as need to record user creating objects
  • objectId is also sent, must be returned in CreateObjectComplete message
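
As an illustration of the round trip described above, the sketch below builds a createObject request that carries userId and objectId, and matches a CreateObjectComplete message back to its PendingObjectMain item by objectId. The payload shapes beyond userId and objectId, and both function names, are assumptions.

```javascript
// Request sent to the external service; objectId must come back in CreateObjectComplete.
function buildCreateObjectRequest(pendingObject, userId) {
  return {
    objectId: pendingObject.objectId,
    userId,
    objectType: pendingObject.objectType, // assumed to be part of the request
    fields: pendingObject.fields,         // assumed to be part of the request
  };
}

// pendingObjects: Map<objectId, PendingObjectMain item> (in-memory stand-in for the table)
function handleCreateObjectComplete(pendingObjects, message) {
  if (!message.objectId) {
    throw new Error("CreateObjectComplete must return the ImportData objectId");
  }
  const pending = pendingObjects.get(message.objectId);
  if (!pending) {
    throw new Error(`No PendingObjectMain found for objectId ${message.objectId}`);
  }
  pending.status = "complete";
  return pending;
}
```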

Linking objects

Two types of linking:

  1. Independent: Objects can be created independently of each other in any order
  2. Dependent: One object must be created before the other in a specific order

For the Config setting objectType.parentLinks, either object can be the parent for Independent links; for Dependent links the object created first is considered the parent.

Two object types may have multiple types of links connecting them, so each parentLinks element has a list of linkTags which reference what type of link is being created.

For each parentLinks element there will be a matching entry in the other object's childObjectTypes array.

One object might have multiple other objects dependent on it to be created, or be dependent on many other objects being created.
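
The rule that every parentLinks element has a matching childObjectTypes entry on the other object could be checked with something like the sketch below. `configsByType` (objectType mapped to its configValue) and the function name are hypothetical.

```javascript
// Returns "parentType -> childType" strings for every parentLinks entry whose
// parent config does not list the child in its childObjectTypes array.
function findMissingChildEntries(configsByType) {
  const missing = [];
  for (const [childType, config] of Object.entries(configsByType)) {
    for (const parentType of Object.keys(config.parentLinks || {})) {
      const parent = configsByType[parentType];
      const children = parent ? parent.childObjectTypes || [] : [];
      if (!children.includes(childType)) {
        missing.push(`${parentType} -> ${childType}`);
      }
    }
  }
  return missing;
}
```

An empty result means the two sides of the config agree; a parent objectType with no config at all is reported the same way as a missing entry.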

Object hierarchy and field schema

  • Some fields will be required, some optional
  • some fields possibly have system defaults
  • perhaps user can setup default templates (do later if has value)
  • schema will need to state the identifier fields for each object: if set in the feed, Import Data knows the record points to an existing child/parent object; if empty, a new object needs to be created
  • perhaps each objectType states its child objects, as it is more likely to be aware of these than of its parent objects

where to store/set schema

Considering having the external service deliver this to ImportData in Initial Setup, as seed data injected directly into the Import Data Config Dynamo table.

Working documents

Working_documents - Import Data