Service - Sort Result
Overview
Service that handles sorting of the results that come from Service - Search Result. Sorted results are stored/cached so that subsequent requests do not need to generate them again.
Repository
https://bitbucket.org/stb_vit/sortresult_main/src/master/
DynamoDB tables
Standard Config Table Per Service
Configuration tags
- configKey = expiryTimeInSec, configTag = {empty}
- One record per deploy; sets the amount of time added to the current timestamp to calculate the expiry time.
- Could be extended to support different expiry times depending on, e.g., the searchType; this could be stored in the configTag, with an empty configTag as the default.
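As an illustration of the expiry calculation described above, here is a minimal sketch. It assumes timestamps are stored as epoch seconds, which these notes do not state; the function names are hypothetical.

```python
import time
from typing import Optional


def compute_expiry(expiry_time_in_sec: int, now: Optional[float] = None) -> int:
    """Return the expiry timestamp: current time plus the configured offset.

    Assumption: timestamps are epoch seconds (not confirmed by these notes).
    """
    current = time.time() if now is None else now
    return int(current) + expiry_time_in_sec


def is_expired(expiry: int, now: Optional[float] = None) -> bool:
    """Return True once the current time has reached the expiry timestamp."""
    current = time.time() if now is None else now
    return current >= expiry
```

With expiryTimeInSec = 3600, a result created at timestamp 1000 would expire at 4600.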
SortResults
- no sort_key
Fields
- sortFieldsHash: Strip out any sortFields elements after a random element is found, as they are unused.
- sortResultId
- (partition key)
- comes from searchDataId + "_" + sortFieldsHash
- searchDataId: {searchType}_{filterId}_{keyPropertiesHash}_{requiredDataHash}
- sortFieldsHash: can have multiple sort fields, so we hash them to create the sortFieldsHash unique key
- sortFields
- array (DynamoDB list) of fields being sorted on, or seed value for a random sort
- status
- whether sort processing is complete for all data yet
- new_sortresult_created | waiting_for_searchresults | processing | error | complete
- expiry
- timestamp when this data expires and needs to be regenerated
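The sortResultId construction above could be sketched roughly as follows. The hash algorithm (SHA-256), the JSON canonicalisation, and the shape of a sort field (a dict with a hypothetical "type" key that marks a random sort) are all assumptions for illustration; the notes only say that the sort fields are hashed and that elements after a random element are stripped.

```python
import hashlib
import json


def strip_after_random(sort_fields):
    """Keep sort fields up to and including the first random element.

    Assumption: a random sort is marked with {"type": "random", ...}.
    """
    kept = []
    for field in sort_fields:
        kept.append(field)
        if field.get("type") == "random":
            break
    return kept


def sort_fields_hash(sort_fields):
    """Hash the (stripped) sort fields into a stable hex string."""
    canonical = json.dumps(
        strip_after_random(sort_fields), sort_keys=True, separators=(",", ":")
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def sort_result_id(search_data_id, sort_fields):
    """Build sortResultId as searchDataId + "_" + sortFieldsHash."""
    return f"{search_data_id}_{sort_fields_hash(sort_fields)}"
```

Because fields after the random element are dropped before hashing, two requests that differ only in those unused fields map to the same sortResultId and can share cached results.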
StringData *probably not using*
- SortField of data type string is placed into this table, sort is performed using string sorting
- level splits results into sets, one for each SortFields element, starting at 1 and counting upwards
- Levels that are not the final level will have one record for each value of the field being searched on
- Final level is a special case that has one record for each SearchResultData. We concatenate the unique id onto the sortValue/sort key to create uniqueness, using a space because it comes before other characters when sorting strings. This might produce the odd incorrect sort if it lines up with strings that contain actual spaces at the same location, but it is probably good enough; we might be able to find better options such as carriage return/tab.
- data is only added if this is the final level of the sortFields array
Fields
- sortDataId
- (partition key)
- comes from {sortResultId}_{level}_{previous level values concatenated with underscores}
- sortValue
- (sort key)
- comes from {value of the field/s being sorted on}, for final level concatenate ~{unique id}
- data
- copy of all required data from Search Result service (final level only)
- lowerLevelCount
- number of records in the next level that match this sortValue
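A minimal sketch of how the StringData keys above might be built. The function names are hypothetical, and it assumes that level 1 (which has no previous level values) gets no trailing underscore. The separator is left as a parameter because the field definition uses "~" while the notes above mention a space.

```python
def sort_data_id(sort_result_id, level, previous_values):
    """Build {sortResultId}_{level}_{previous level values joined by "_"}.

    Assumption: with no previous values the id is just {sortResultId}_{level}.
    """
    parts = [sort_result_id, str(level), *map(str, previous_values)]
    return "_".join(parts)


def string_sort_value(value, unique_id=None, separator="~"):
    """Return the sort key; on the final level the unique id is appended."""
    if unique_id is None:
        return value
    return f"{value}{separator}{unique_id}"
```

The separator choice affects ordering when one value is a prefix of another. A space (0x20) sorts before letters, so "a id2" comes before "ab id1"; "~" (0x7E) sorts after letters, so "ab~id1" comes before "a~id2".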
NumericData *probably not using*
- SortField of data type numeric is placed into this table, sort is performed using numeric sorting
- level splits results into sets, one for each SortFields element, starting at 1 and counting upwards
- Levels that are not the "f" level will have one record for each value of the field being searched on
- If the final SortField level is numeric, an extra level ("f") is created when more than one record matches the final level's sortValue. In this case the unique id is used as the sortValue/sort key, which is required to achieve uniqueness.
- data is only added if this is the final level of the sortFields array
Fields
- sortDataId
- (partition key)
- comes from {sortResultId}_{level}_{previous level values concatenated with underscores}
- sortValue
- (sort key)
- comes from {value of the field/s being sorted on}, for special "f" level is {unique id}
- data
- copy of all required data from Search Result service (final level only)
- lowerLevelCount
- number of records in the next level that match this sortValue; if present on the final sortField level, it means a special "f" level exists for this sortValue
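The special "f" level is easier to see with a small example. The sketch below is one possible reading of the notes, not the service's actual code: the function name, the "id" key holding the unique id, and the exact format of the "f" level sortDataId are all assumptions.

```python
from collections import defaultdict


def build_final_numeric_level(sort_result_id, level, previous_values, records, field):
    """Build items for the final numeric level, adding "f" level items on ties.

    Assumptions: each record has a numeric unique id under "id", and the
    "f" level sortDataId is {sortResultId}_f_{previous values}_{tied value}.
    """
    prefix = "_".join([sort_result_id, str(level), *map(str, previous_values)])
    groups = defaultdict(list)
    for record in records:
        groups[record[field]].append(record)

    items = []
    for value in sorted(groups):
        matches = groups[value]
        if len(matches) == 1:
            # Unique value: the data sits directly on the final level.
            items.append({"sortDataId": prefix, "sortValue": value, "data": matches[0]})
            continue
        # Tied value: record the count here and push the data into the "f" level.
        items.append({"sortDataId": prefix, "sortValue": value, "lowerLevelCount": len(matches)})
        f_prefix = "_".join([sort_result_id, "f", *map(str, previous_values), str(value)])
        for record in sorted(matches, key=lambda r: r["id"]):
            items.append({"sortDataId": f_prefix, "sortValue": record["id"], "data": record})
    return items
```

For example, two records priced at 10 and one priced at 20 would produce one final-level item for 10 with lowerLevelCount = 2, two "f" level items keyed by unique id, and one final-level item for 20 that carries its data directly.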
SortedData
Fields
- sortResultId
- (partition key)
- sortId
- (sort key)
- numeric value for this record's location in sort results
- data
- copy of all required data from Search Result service (final level only)
Notes
- The data stored here should include everything the client request might need to render results to the user, so no additional calls are needed (e.g. all possible media that might be shown as a thumbnail and how likely each thumbnail is to be shown, or pricing info such as min price/max price/most common sold price/etc.)
- Client uses sortResultId to pull results, can also request ascending or descending
- This service handles pagination of results
- ComplexFilter stores only the unique ids for a filter. Service - Search Result takes results from ComplexFilter and adds all data the client might need. The Sort Result service copies the data from Search Result and structures it for sorted results
- If we have trouble with incorrect sorts when we concatenate unique key on sortValue for string types, we could bump the data into a final level like we do with numeric final level results
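With the flat SortedData layout, pagination reduces to picking a sortId range. A minimal sketch, assuming sortId values are contiguous and start at 1 (the notes only say sortId is the record's numeric location) and that the total number of records is known; the function name is hypothetical.

```python
def page_sort_id_range(page, page_size, total, descending=False):
    """Return inclusive (first, last) sortId bounds for a 1-based page.

    Returns None when the page lies beyond the available results.
    Assumption: sortId runs contiguously from 1 to total.
    """
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    offset = (page - 1) * page_size
    if offset >= total:
        return None
    count = min(page_size, total - offset)
    if descending:
        last = total - offset
        first = last - count + 1
    else:
        first = offset + 1
        last = first + count - 1
    return first, last
```

The bounds could then feed a DynamoDB Query on sortResultId with a BETWEEN condition on sortId, reading backwards (ScanIndexForward set to false) for descending requests.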
(probably delete - change Data to be sorted simply) Pagination
Because of the multi-level structure of the SortResultData tables, pagination becomes harder: we need to recurse through the levels to know how many records have been counted.
If we have already sent a page of data to the client and the client clicks next or previous, we could have the client send which data record was the last record and use this as a starting point to find the next page of data.
If the client clicks to an arbitrary page it is more complex. We might need to pull a list of all 1st level results and add up lowerLevelCount until we find which 1st level result to begin the page results from.
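The arbitrary-page lookup described above amounts to a running sum over lowerLevelCount. A minimal sketch, assuming the 1st level results are already loaded in sort order as (sortValue, lowerLevelCount) pairs; the function name and input shape are hypothetical.

```python
def locate_first_level(first_level, offset):
    """Find which 1st level result contains the record at a 0-based offset.

    first_level: list of (sortValue, lowerLevelCount) pairs in sort order.
    Returns (index, offset within that entry), or None if offset is past the end.
    """
    skipped = 0
    for index, (_, count) in enumerate(first_level):
        if offset < skipped + count:
            return index, offset - skipped
        skipped += count
    return None
```

For page 2 with a page size of 4, the offset is 4; given counts of 3, 5 and 2, the page begins one record into the second 1st level result. The same walk would then repeat inside that entry for each lower level.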
Ideas
- Date is an interesting sort field. It is not really used in browse results but would be used a lot in other datasets. It is a filter, so it could be placed in the ComplexFilter, but it is also something that might change regularly and is based on ordered results, so it might be more efficient to apply it as a sort field in SortResults and filter it there. Current thinking: just treat filter and sort separately, and limit changes in filters that use date fields so that not too many sets of data get generated, e.g. by fixing requests to set day timestamps.
- This service issues a large number of put commands to DynamoDB; this might be improved by using BatchWriteItem or by improving the async code.
- SortResults defines its own expiry date (which might not match the ComplexFilter/Search Result expiry dates), so the Search Result service must query Sort Result to see if it has active results. If they have expired, Sort Result removes its data and Search Result then checks its own data/expiry. A fancier option: Sort Result reports the expiry but does not delete its data until Search Result finishes its tasks; if Search Result fails, it could return Sort Result's old data and/or push back Sort Result's expiry date.
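On the BatchWriteItem idea above: a single BatchWriteItem request accepts at most 25 put or delete requests, so the items need to be split into batches first. A minimal sketch of that chunking step only; the function name is hypothetical and the actual DynamoDB call is left out.

```python
from itertools import islice


def chunked(items, size=25):
    """Yield successive lists of at most `size` items.

    The default of 25 matches the BatchWriteItem per-request limit.
    """
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch
```

Each batch would map to one BatchWriteItem call. Any UnprocessedItems returned in the response would still need to be retried, ideally with backoff.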
See also
- https://www.reddit.com/r/aws/comments/7ugukg/dynamodb_not_a_great_option_for_sorting/ - Efficient sorting DynamoDB structure