Service - Import Data: Difference between revisions

From Izara Wiki
= Overview =
Orchestrates the import of objects/data into the project.
= Repository =
https://bitbucket.org/izara-core-import-export-data/izara-core-import-data-import-data/src/master/
= DynamoDB tables =
== [[Standard Config Table Per Service]] ==
=== Configuration tags ===
<syntaxhighlight lang="JavaScript">
{
  configKey: "objectType",
  configTag: "xx", // {objectType, eg: sellOffer/Product/VariantProduct etc.}
  configValue: {
    createObjectServiceName: "xx", // name of the service that handles this objectType
    parentLinks: {
      "{parent objectType}": {
        "{linkTag}": {
          linkType: "yy", // Dependent|Independent
          separateDependentLinkCreate: true, // for Dependent only, default is false
          createLinkServiceNames: "yy", // if not set, the link is not sent to an external service and is set as complete
        }
      }
    },
    childObjectTypes: [] // maybe not needed?
  }
}
</syntaxhighlight>
* separateDependentLinkCreate: if set to true, a dependent link is sent as a separate request to the external service after the parent and child objects are created. Default is false, where the link is created in the same request that creates the child object.
== ImportBatchMain ==
<syntaxhighlight lang="JavaScript">
{
importBatchId: "xx", // random uuid
userId: "xx", // submitted by userId
startTime: currentTime.getTime(),
batchConfig: {}, // same as request
importBatchStatus: "xx", // "* NOT YET:processingRawRecords" | "processingObjects" | "error" | "complete"
}
</syntaxhighlight>
* partition key: importBatchId
* sort key: {none}
== ImportBatchErrors ==
<syntaxhighlight lang="JavaScript">
{
importBatchId: "xx",
errorId: "xx", // random uuid
error: "xx",
}
</syntaxhighlight>
* partition key: importBatchId
* sort key: {none}
== RawRecord ==
* NOT YET: maybe move into CSV processing, as each format might have own way of handling per line/record formats
A raw copy, split out into fields, of one submitted record. One record may have multiple objects in its fields.
<syntaxhighlight lang="JavaScript">
{
importBatchId: "xx",
rawRecordId: "xx", // random uuid
fields: {}, // key is the name of the field
recordNumber: 1, // number, eg the line number of the record
rawRecordStatus: "xx",
errorsFound: {},
}
</syntaxhighlight>
* partition key: importBatchId
* sort key: rawRecordId
== RawRecordAwaitingProcess ==
* NOT YET: maybe move into CSV processing, as each format might have own way of handling per line/record formats
A list of raw records waiting to be saved into PendingObjectMain, so that they can be handled asynchronously and the next step of the process can be triggered once all are complete.
<syntaxhighlight lang="JavaScript">
{
importBatchId: "xx",
rawRecordId: "xx",
}
</syntaxhighlight>
* partition key: importBatchId
* sort key: rawRecordId
== PendingObjectMain ==
One item per object that needs to be created.
<syntaxhighlight lang="JavaScript">
{
importBatchId: "xx",
objectId: "xx", // hash of object with importBatchId, userId, objectType, and fields properties
objectType: "xx", // eg variant|product|sellOffer|sellOfferPrice|sellOfferPlan|...
fields: {}, // key is the name of the field
rawRecordId: "xx",
objectMainStatus: "xx", // processing|creating|complete
errorsFound: {},
}
</syntaxhighlight>
* partition key: importBatchId
* sort key: objectId
* a feed might send the same objectType and fields multiple times in a single request; the design above will only handle such an object once. This is probably OK, but maybe add a check and record an error if duplicates are found
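The objectId hash described above could be computed along the lines of the following sketch. The use of SHA-256, the sorted-key serialization, and the helper names `canonicalize` and `buildObjectId` are assumptions for illustration, not the service's confirmed implementation.

```javascript
const crypto = require("crypto");

// Recursively sort object keys so equal content always serializes identically.
function canonicalize(value) {
  if (Array.isArray(value)) return value.map(canonicalize);
  if (value !== null && typeof value === "object") {
    return Object.keys(value).sort().reduce((acc, key) => {
      acc[key] = canonicalize(value[key]);
      return acc;
    }, {});
  }
  return value;
}

// Hypothetical helper: hash importBatchId, userId, objectType and fields into an objectId.
function buildObjectId({ importBatchId, userId, objectType, fields }) {
  const payload = JSON.stringify(canonicalize({ importBatchId, userId, objectType, fields }));
  return crypto.createHash("sha256").update(payload).digest("hex");
}

const a = buildObjectId({
  importBatchId: "b1", userId: "u1", objectType: "product",
  fields: { name: "Shirt", sku: "S-1" },
});
const b = buildObjectId({
  importBatchId: "b1", userId: "u1", objectType: "product",
  fields: { sku: "S-1", name: "Shirt" },
});
console.log(a === b); // true: field order does not change the hash
```

Because identical objectType and fields produce the same objectId, a duplicate within one batch maps onto the same PendingObjectMain item, which is the behaviour the duplicate note above describes.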
== PendingObjectReference ==
Links a submitted referenceId to the saved object, so the object can be found when other objects reference it.
<syntaxhighlight lang="JavaScript">
{
importBatchId: "xx",
referenceId: "xx", // objectType_{feed supplied referenceId}
objectId: "xx",
}
</syntaxhighlight>
* partition key: importBatchId
* sort key: referenceId
* when creating, maybe throw an error if an item already exists with a different objectId
== PendingLink ==
One item per link between objects.
<syntaxhighlight lang="JavaScript">
{
importBatchId: "xx",
pendingLinkId: "xx", // {objectId}_{referenceId}
linkTag: "xx",
linkStatus: "xx", // processing|creating|complete
}
</syntaxhighlight>
* partition key: importBatchId
* sort key: pendingLinkId
* the current thinking covers objects that can be created independently, with the link made afterwards, but this table could perhaps also be used for cases where one object must be created before another
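The key formats used by PendingObjectReference and PendingLink can be sketched as small helpers. The function names are hypothetical; only the formats themselves (objectType_{feed supplied referenceId} and {objectId}_{referenceId}) come from the tables above.

```javascript
// Hypothetical helpers that follow the key formats shown in the tables above.
function buildReferenceId(objectType, feedReferenceId) {
  return `${objectType}_${feedReferenceId}`; // objectType_{feed supplied referenceId}
}

// objectId is the child object, referenceId points at the parent object.
function buildPendingLinkId(childObjectId, parentReferenceId) {
  return `${childObjectId}_${parentReferenceId}`; // {objectId}_{referenceId}
}

// Sort key prefix that matches every PendingLink whose child is the given object.
function childLinkPrefix(childObjectId) {
  return `${childObjectId}_`;
}

const referenceId = buildReferenceId("product", "feed-123");
const pendingLinkId = buildPendingLinkId("abc123", referenceId);
console.log(pendingLinkId); // abc123_product_feed-123
console.log(pendingLinkId.startsWith(childLinkPrefix("abc123"))); // true
```

Since pendingLinkId is the sort key, a prefix like this could be passed to a DynamoDB `begins_with` key condition to find all links for one child. This assumes objectId values never contain an underscore, which holds for a hex hash.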
= Process =
= External service requests =
Both createObject and createLink.
* standard Lambda like ProcessLogical and FindData
* ImportData subscribes to standard complete topic: CreateObjectComplete
* Each external service can handle processing and recording the objectId to return in its own way, eg by keeping another table that links the external service's identifier with the ImportData objectId, and querying this table to build the CreateObjectComplete message
* userId is also sent, as the user creating the objects needs to be recorded
* objectId is also sent and must be returned in the CreateObjectComplete message
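As a rough illustration of the round trip, the sketch below builds a createObject request and matches a CreateObjectComplete message back to its pending object. The message shapes and function names are assumptions; this page only states that userId and objectId are sent and that objectId must be returned.

```javascript
// Hypothetical request shape; only objectId and userId are stated by this page.
function buildCreateObjectRequest(pendingObject, userId) {
  return {
    objectId: pendingObject.objectId, // must be echoed back in CreateObjectComplete
    userId, // user recorded as creating the object
    objectType: pendingObject.objectType,
    fields: pendingObject.fields,
  };
}

// Match a CreateObjectComplete message back to the pending object it completes.
function matchCreateObjectComplete(message, pendingObjectsById) {
  const pending = pendingObjectsById.get(message.objectId);
  if (!pending) throw new Error(`Unknown objectId: ${message.objectId}`);
  return { ...pending, objectMainStatus: "complete" };
}

const pendingObject = {
  objectId: "abc123", objectType: "product",
  fields: { name: "Shirt" }, objectMainStatus: "creating",
};
const request = buildCreateObjectRequest(pendingObject, "u1");
const completed = matchCreateObjectComplete(
  { objectId: request.objectId }, // what an external service would send back
  new Map([[pendingObject.objectId, pendingObject]])
);
console.log(completed.objectMainStatus); // complete
```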
= Linking objects =
Two types of linking:
# Independent: Objects can be created independently of each other in any order
# Dependent: One object must be created before the other in a specific order
For the Config setting objectType.parentLinks, either object can be the parent for Independent links; for Dependent links, the object created first is considered the parent.
Two object types may be connected by multiple types of links, so each parentLinks element holds a set of linkTags, each identifying the type of link being created.
For each parentLinks element there will be a matching entry in the other object's childObjectTypes array.
One object might have multiple other objects dependent on it to be created, or be dependent on many other objects being created.
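The relationship between parentLinks and childObjectTypes can be illustrated with a small lookup and consistency check. The object types, the `variantOf` link tag, and both function names are invented for this sketch.

```javascript
// Illustrative config only: the object types and link tag are made up.
const objectTypeConfigs = {
  product: { parentLinks: {}, childObjectTypes: ["variant"] },
  variant: {
    parentLinks: {
      product: {
        variantOf: { linkType: "Dependent", separateDependentLinkCreate: false },
      },
    },
    childObjectTypes: [],
  },
};

// Return the link settings for child -> parent under linkTag, or null if not configured.
function findLinkConfig(configs, childType, parentType, linkTag) {
  return configs[childType]?.parentLinks?.[parentType]?.[linkTag] ?? null;
}

// List parentLinks entries that have no matching childObjectTypes entry on the parent.
function findMissingChildEntries(configs) {
  const missing = [];
  for (const [childType, config] of Object.entries(configs)) {
    for (const parentType of Object.keys(config.parentLinks)) {
      if (!configs[parentType]?.childObjectTypes.includes(childType)) {
        missing.push(`${parentType} -> ${childType}`);
      }
    }
  }
  return missing;
}

console.log(findLinkConfig(objectTypeConfigs, "variant", "product", "variantOf").linkType); // Dependent
console.log(findLinkConfig(objectTypeConfigs, "product", "variant", "variantOf")); // null
console.log(findMissingChildEntries(objectTypeConfigs)); // []
```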
= Object hierarchy and field schema =
* Some fields will be required, some optional
* some fields possibly have system defaults
* perhaps user can setup default templates (do later if has value)
* the schema will need to state the identifier fields for each object: if they are set in the feed, Import Data knows the record points to an existing child/parent object; if they are empty, a new object needs to be created
* perhaps each objectType states its child objects, as it is more likely to be aware of these than of its parent objects
== where to store/set schema ==
Under consideration: the external service delivers this to ImportData in Initial Setup, as seed data injected directly into the Import Data Config DynamoDB table.
= Working documents =
[[:Category:Working_documents - Import Data|Working_documents - Import Data]]
[[Category:Backend services| Import Data]]

Revision as of 05:02, 24 November 2022

Process

ProcessPendingLinks:

  1. count the DynamoDB records for PendingObjectMain and PendingLink and save the counts in the ImportBatchMain table
  2. iterate all PendingLink records:
    • check referenceId has a valid record in PendingObjectReference
    • check linkTag is valid (objectId is the child, referenceId object is the parent)
    • if linkType is Dependent:
      • if separateDependentLinkCreate is false and the child already exists, add an error
      • save (DependentPendingObject)awaitingMultipleSteps for the child object, waiting for the parent object to be created (multiple, so one child can wait on several parents)
      • save (DependentPendingLink)awaitingStep for the PendingLink, waiting for the child object to be created
    • if linkType is Independent, save (IndependentPendingLink)awaitingMultipleSteps for the PendingLink, waiting for both child and parent to be created
    • save this PendingLink into (ImportBatchMain)awaitingMultipleSteps for ImportBatchMain
  3. if any errors were found prior to this step, stop processing, mark the feed with status "error", and remove any saved awaitingMultipleSteps
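The ProcessPendingLinks steps can be sketched in memory as follows. The plain-object data shapes, the `exists` flag, and the `awaiting` records standing in for the awaitingMultipleSteps/awaitingStep items are all assumptions; the real service stores these in DynamoDB.

```javascript
// In-memory sketch; data shapes and names are assumptions for illustration.
function processPendingLinks({ pendingLinks, references, objects, configs }) {
  const errors = [];
  const awaiting = []; // stands in for the awaitingMultipleSteps / awaitingStep items

  for (const link of pendingLinks) {
    // referenceId must resolve through PendingObjectReference.
    const parentObjectId = references[link.referenceId];
    if (!parentObjectId) {
      errors.push(`${link.pendingLinkId}: unknown referenceId`);
      continue;
    }
    // objectId is the child, the referenceId object is the parent.
    const child = objects[link.objectId];
    const parent = objects[parentObjectId];
    const linkConfig =
      configs[child.objectType]?.parentLinks?.[parent.objectType]?.[link.linkTag];
    if (!linkConfig) {
      errors.push(`${link.pendingLinkId}: invalid linkTag`);
      continue;
    }
    if (linkConfig.linkType === "Dependent") {
      if (!linkConfig.separateDependentLinkCreate && child.exists) {
        errors.push(`${link.pendingLinkId}: child already exists`);
      }
      awaiting.push({ kind: "DependentPendingObject", waiting: child.objectId, on: parentObjectId });
      awaiting.push({ kind: "DependentPendingLink", waiting: link.pendingLinkId, on: child.objectId });
    } else {
      // Independent links wait for both the child and the parent.
      awaiting.push({ kind: "IndependentPendingLink", waiting: link.pendingLinkId, on: child.objectId });
      awaiting.push({ kind: "IndependentPendingLink", waiting: link.pendingLinkId, on: parentObjectId });
    }
    awaiting.push({ kind: "ImportBatchMain", waiting: "batch", on: link.pendingLinkId });
  }

  // Step 3: on any error, discard everything saved so far.
  return errors.length > 0
    ? { status: "error", errors, awaiting: [] }
    : { status: "ok", errors, awaiting };
}

const linkResult = processPendingLinks({
  pendingLinks: [
    { pendingLinkId: "c1_product_p1", objectId: "c1", referenceId: "product_p1", linkTag: "variantOf" },
  ],
  references: { product_p1: "p1" },
  objects: {
    c1: { objectId: "c1", objectType: "variant", exists: false },
    p1: { objectId: "p1", objectType: "product", exists: false },
  },
  configs: { variant: { parentLinks: { product: { variantOf: { linkType: "Dependent" } } } } },
});
console.log(linkResult.status, linkResult.awaiting.length); // ok 3
```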

ProcessPendingObjects:

  1. iterate all PendingObjectMain records
  2. save this PendingObjectMain into (ImportBatchMain)awaitingMultipleSteps for ImportBatchMain
  3. iterate any PendingLinks that have this object as the child (use DynamoDB begins_with on the pendingLinkId field)
    • if linkType is Dependent, stop processing this PendingObjectMain record (already have awaitingMultipleSteps saved)
  4. if the object for this PendingObjectMain already exists:
    • set status of PendingObjectMain to "complete"
    • do PendingObjectComplete lib
  5. if the object for this PendingObjectMain does not exist yet:
    • set status of PendingObjectMain to "creating"
    • send to external service to create
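A minimal in-memory sketch of ProcessPendingObjects follows. The `exists` flag, the `linkType` stored on each link, and the `sendCreate`/`onComplete` callbacks are assumptions made so the example is self-contained.

```javascript
// In-memory sketch; objects with a Dependent parent link are left for PendingObjectComplete.
function processPendingObjects({ objects, pendingLinks, sendCreate, onComplete }) {
  const batchAwaiting = new Set(); // stands in for (ImportBatchMain)awaitingMultipleSteps
  for (const object of objects) {
    batchAwaiting.add(object.objectId);
    // Links where this object is the child share the "{objectId}_" prefix.
    const hasDependentParent = pendingLinks.some(
      (link) =>
        link.pendingLinkId.startsWith(`${object.objectId}_`) && link.linkType === "Dependent"
    );
    if (hasDependentParent) continue; // created later, once its parents are complete
    if (object.exists) {
      object.objectMainStatus = "complete";
      onComplete(object);
    } else {
      object.objectMainStatus = "creating";
      sendCreate(object);
    }
  }
  return batchAwaiting;
}

const sent = [];
const importObjects = [
  { objectId: "p1", exists: false, objectMainStatus: "processing" },
  { objectId: "c1", exists: false, objectMainStatus: "processing" },
];
const batchAwaiting = processPendingObjects({
  objects: importObjects,
  pendingLinks: [{ pendingLinkId: "c1_product_p1", linkType: "Dependent" }],
  sendCreate: (object) => sent.push(object.objectId),
  onComplete: () => {},
});
console.log(sent); // [ 'p1' ]
console.log(importObjects[1].objectMainStatus); // processing
```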

After external service creates object:

  1. set the status of PendingObjectMain to "complete" and add the object identifier ids to the PendingObjectMain record
      • do PendingObjectComplete lib

PendingObjectComplete lib:

  1. remove (ImportBatchMain)awaitingMultipleSteps for ImportBatchMain
  2. iterate any Dependent (DependentPendingObject)awaitingMultipleSteps
    • find the child object that was waiting on this parent
    • check if that child object has any remaining (DependentPendingObject)awaitingMultipleSteps; if none remain:
      • find all Dependent PendingLinks for this child object (using DynamoDB begins_with)
      • if the child object already exists:
        • set status of child object to "complete"
        • do PendingObjectComplete lib
      • if the child object does not exist yet:
        • send message to external service to create child object
        • include all parent object identifier ids in that message
        • set status of child PendingObjectMain to "creating"
  3. iterate any Dependent PendingLink (DependentPendingLink)awaitingStep
    • if pendingLink separateDependentLinkCreate = false (default):
      • set status of PendingLink to "complete"
      • remove (ImportBatchMain)awaitingMultipleSteps for ImportBatchMain
    • if pendingLink separateDependentLinkCreate = true:
      • set status of PendingLink to "creating"
      • send to external service to create link, including object identifier ids of both objects
  4. iterate any Independent (IndependentPendingLink)awaitingMultipleSteps
    • check whether any awaitingMultipleSteps remain for that PendingLink; if none remain:
      • set status of PendingLink to "creating"
      • send to external service to create link, including object identifier ids of both objects
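The core of the PendingObjectComplete lib is releasing whatever was waiting on the object that just completed. A sketch under the assumption that the awaitingMultipleSteps records are held as an in-memory Map of Sets (the function name is hypothetical):

```javascript
// awaitingParents: Map(childObjectId -> Set of parent objectIds not yet created)
// Returns the children that have no parents left to wait for.
function onParentComplete(parentObjectId, awaitingParents) {
  const readyChildren = [];
  for (const [childId, parents] of awaitingParents) {
    if (!parents.delete(parentObjectId)) continue; // child was not waiting on this parent
    if (parents.size === 0) {
      awaitingParents.delete(childId);
      readyChildren.push(childId);
    }
  }
  return readyChildren;
}

const awaitingParents = new Map([["c1", new Set(["p1", "p2"])]]);
const afterFirst = onParentComplete("p1", awaitingParents);
const afterSecond = onParentComplete("p2", awaitingParents);
console.log(afterFirst); // []
console.log(afterSecond); // [ 'c1' ]
```

The same pattern would apply to Independent PendingLinks, which wait on both the child and the parent before the link is sent for creation.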

After external service creates link:

  1. set status of PendingLink to "complete"
  2. remove (ImportBatchMain)awaitingMultipleSteps for ImportBatchMain

When removing (ImportBatchMain)awaitingMultipleSteps for ImportBatchMain (both PendingObjectMain and PendingLink):

  1. check if any remain; if not, set ImportBatchMain to "complete" and send the ImportBatchMainComplete message
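The completion check can be sketched as below. Holding the awaiting steps as a Set on the batch, the message shape, and the `sendMessage` callback are assumptions for illustration.

```javascript
// Remove one awaited step; complete the batch when nothing is left.
function removeBatchAwaitingStep(batch, stepId, sendMessage) {
  batch.awaitingMultipleSteps.delete(stepId);
  if (batch.awaitingMultipleSteps.size === 0 && batch.importBatchStatus !== "complete") {
    batch.importBatchStatus = "complete";
    sendMessage({ type: "ImportBatchMainComplete", importBatchId: batch.importBatchId });
  }
  return batch.importBatchStatus;
}

const messages = [];
const batch = {
  importBatchId: "b1",
  importBatchStatus: "processingObjects",
  awaitingMultipleSteps: new Set(["p1", "c1_product_p1"]), // objects and links together
};
const statusAfterFirst = removeBatchAwaitingStep(batch, "p1", (m) => messages.push(m));
const statusAfterSecond = removeBatchAwaitingStep(batch, "c1_product_p1", (m) => messages.push(m));
console.log(statusAfterFirst, statusAfterSecond, messages.length); // processingObjects complete 1
```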