{"id":23122843,"url":"https://github.com/folio-org/data-import-processing-core","last_synced_at":"2026-01-04T15:15:04.712Z","repository":{"id":38100674,"uuid":"216537797","full_name":"folio-org/data-import-processing-core","owner":"folio-org","description":"The library to handle events from data-import","archived":false,"fork":false,"pushed_at":"2024-04-11T09:54:47.000Z","size":2530,"stargazers_count":1,"open_issues_count":0,"forks_count":1,"subscribers_count":20,"default_branch":"master","last_synced_at":"2024-04-14T14:05:11.452Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Java","has_issues":false,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/folio-org.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null}},"created_at":"2019-10-21T10:11:39.000Z","updated_at":"2024-04-15T15:10:47.447Z","dependencies_parsed_at":"2023-12-21T19:08:33.375Z","dependency_job_id":"c0affc3a-0fbe-4404-895e-ec6c65ee6479","html_url":"https://github.com/folio-org/data-import-processing-core","commit_stats":null,"previous_names":[],"tags_count":70,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/folio-org%2Fdata-import-processing-core","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/folio-org%2Fdata-import-processing-core/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/folio-org%2Fdata-import-processing-core/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/folio-org%2Fdata-import-processing-core/manifests","owner_url":
"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/folio-org","download_url":"https://codeload.github.com/folio-org/data-import-processing-core/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247119051,"owners_count":20886678,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-12-17T07:30:51.627Z","updated_at":"2026-01-04T15:15:04.707Z","avatar_url":"https://github.com/folio-org.png","language":"Java","funding_links":[],"categories":[],"sub_categories":[],"readme":"# data-import-processing-core\n\nCopyright (C) 2019-2025 The Open Library Foundation\n\nThis software is distributed under the terms of the Apache License,\nVersion 2.0. See the file \"[LICENSE](LICENSE)\" for more information.\n\n## Introduction\n\nCore infrastructure for event processing for the DataImport.\n\n## Match Engine\n\n**Match Engine** is an abstract name for a functionality allowing for processing of the [MatchProfile](https://github.com/folio-org/data-import-raml-storage/blob/master/examples/mod-data-import-converter-storage/matchProfile.sample) logic of the data-import JobProfile.\nBased on match results (MATCH or NON_MATCH) JobProfile processing flow can be branched.\nBasically, matching is used for searching for a Record, on which a particular action should be applied. If such Record is found, actions specified \"for matches\" are executed, if not - \"for non-matches\" actions are performed.\n\n![](images/match.png)\n\nMatch Engine consists of a number of components. 
The actual process of matching is invoked by calling the match() method of **MatchingManager**:\n\n`MatchingManager.match(dataImportEventPayload)`\n\nMatchingManager accepts a [DataImportEventPayload](https://github.com/folio-org/data-import-raml-storage/blob/master/examples/mod-data-import/dataImportEventPayload.sample),\nfrom which it extracts the [MatchProfile](https://github.com/folio-org/data-import-raml-storage/blob/master/examples/mod-data-import-converter-storage/matchProfile.sample) and all the information necessary to perform matching.\n\nThe idea of matching is essentially a comparison of a particular value from the incoming Record to the specified field of an existing one.\n\n![](images/incoming-to-existing.png)\n\nTo extract a value from the incoming Record, a particular implementation of **MatchValueReader** is applied.\nThe data-import-processing-core library currently contains an implementation for reading values from MARC Bibliographic records:\n\n![](images/incoming-marc-value.png)\n\nThere is also an implementation of a Reader for STATIC_VALUES (please note that a match on static fields can only be used as a sub-match in a JobProfile):\n\n![](images/static-values.png)\n\nMatchValueReader also applies the [Qualifier](https://github.com/folio-org/data-import-raml-storage/blob/master/schemas/mod-data-import-converter-storage/match-profile-detail/qualifierType.json) and [Comparison part](https://github.com/folio-org/data-import-raml-storage/blob/master/schemas/mod-data-import-converter-storage/match-profile-detail/comparisonPartType.json) to the value (based on the [MatchExpression](https://github.com/folio-org/data-import-raml-storage/blob/master/schemas/mod-data-import-converter-storage/match-profile-detail/matchExpression.json) for the incoming Record specified in the MatchProfile).\n\n![](images/qualifier.png)\n\n![](images/comparison-part.png)\n\nThe extracted value from the incoming Record is then matched against a particular field of an existing 
one:\n\n![](images/existing-record-field.png)\n\nTo find (load) that existing Record, **MatchValueLoader** is used.\n\nMatchValueLoader implementations live in the module in which data-import processing takes place.\n\nTo allow MatchingManager to build a MatchValueReader and a MatchValueLoader (based on the incoming and existing record types), register the appropriate implementations in MatchValueReaderFactory and MatchValueLoaderFactory respectively:\n\n`MatchValueReaderFactory.register(new MarcValueReaderImpl());`\n\n`MatchValueLoaderFactory.register(new InstanceLoader(storage, vertx));`\n\nMatchingManager calls **Matcher** to perform the match itself.\nMatcher uses **LoadQueryBuilder** to build a **LoadQuery** based on the value (extracted from the incoming Record by MatchValueReader) and the [MatchDetails](https://github.com/folio-org/data-import-raml-storage/blob/master/schemas/mod-data-import-converter-storage/match-profile-detail/matchDetail.json) specified in the profile.\n\nLoadQueryBuilder supports building queries for String, List (multiple String values from the incoming Record, like 035 field values from MARC Bibliographic records) and Date value types.\n\nThe resulting LoadQuery contains both a CQL query and an SQL query, which a particular MatchValueLoader implementation can use to find an entity.\nSQL queries suit modules that have direct access to the database (the mod-*-storage ones), while CQL queries are intended for business logic modules such as mod-inventory: because mod-inventory has no access to the database, it can only retrieve the entity from mod-inventory-storage via the API, narrowing down the search with CQL.\n\nThe actual query is built based on the [MatchExpression](https://github.com/folio-org/data-import-raml-storage/blob/master/schemas/mod-data-import-converter-storage/match-profile-detail/matchExpression.json) for the existing Record, which is extracted from the MatchDetails of the MatchProfile.\nLoadQueryBuilder uses 
a QueryHolder implementation to construct the basic query based on the [MatchCriterion](https://github.com/folio-org/data-import-raml-storage/blob/master/schemas/mod-data-import-converter-storage/match-profile-detail/criterionType.json).\n\n![](images/match-criterion.png)\n\nIt then applies the [Qualifier](https://github.com/folio-org/data-import-raml-storage/blob/master/schemas/mod-data-import-converter-storage/match-profile-detail/qualifierType.json) and [Comparison part](https://github.com/folio-org/data-import-raml-storage/blob/master/schemas/mod-data-import-converter-storage/match-profile-detail/comparisonPartType.json) (see the examples for the incoming Record) to the query.\n\nMatchValueLoader searches for an entity based on the constructed LoadQuery.\nIf an entity satisfying the specified conditions is found, the match is considered successful (MATCH): the matched entity is saved to the [DataImportEventPayload](https://github.com/folio-org/data-import-raml-storage/blob/master/examples/mod-data-import/dataImportEventPayload.sample) context\nand the actions for the MATCH branch of the JobProfile are applied to that entity. Otherwise, the actions for the NON_MATCH branch of the JobProfile are executed.\nMultiple matches are not supported: if multiple Records satisfy the query conditions, an error is emitted and no action is performed.\n\n## Mapping Engine\n\n**Mapping Engine** is an abstract name for the functionality that processes the MappingProfile logic of data-import. The Mapping Engine processes incoming files (MARC Bib, MARC Authority, etc.) and maps them to FOLIO record types (Instance, Holdings, Item, etc.).\nBasically, mapping is used for creating, updating or modifying a record. The UI provides functionality for defining [mapping rules](https://github.com/folio-org/data-import-raml-storage/blob/master/schemas/mod-data-import-converter-storage/mapping-profile-detail/mappingRule.json) for the fields of the selected record type.\n\n![](images/mapping.png)\n\nMapping Engine consists of a number of components. 
The actual process of mapping is invoked by calling the map() method of MappingManager:\n\n`MappingManager.map(dataImportEventPayload)`\n\nMappingManager accepts a [DataImportEventPayload](https://github.com/folio-org/data-import-raml-storage/blob/master/examples/mod-data-import/dataImportEventPayload.sample), from which it extracts the MappingProfile and all the information necessary to perform mapping.\n\n\n### Reader\nA **Reader** is used to read a value from the incoming record according to a mapping rule. The purpose of a Reader is to read a Value by rule from the underlying entity.\nA Reader has to be initialized before reading. The Reader interface has 2 methods:\n\n`void initialize(DataImportEventPayload eventPayload, MappingContext mappingContext) throws IOException;`\n\n`Value read(MappingRule ruleExpression);`\n\ndata-import-processing-core contains default Reader implementations for the common incoming MARC and EDIFACT files: **MarcRecordReader**, **EdifactRecordReader**.\n\nTo define your own reader, implement the Reader interface and its **initialize()** and **read()** methods.\n\nTo allow MappingManager to build a Reader (based on the incoming record type), register the appropriate ReaderFactory implementation:\n\n`MappingManager.registerReaderFactory(new MarcBibReaderFactory());`\n\ndata-import-processing-core contains default reader factory implementations for common incoming record types:\n*  Marc bib =\u003e **MarcBibReaderFactory**\n*  Marc authority =\u003e **MarcAuthorityReaderFactory**\n*  Marc holdings =\u003e **MarcHoldingsReaderFactory**\n*  Edifact record =\u003e **EdifactReaderFactory**\n\n### Writer\nA **Writer** is used to write a value to a FOLIO record. The purpose of a Writer is to write a given Value to the underlying entity at the given fieldPath.\nA Writer has to be initialized before writing. 
The Writer interface has 3 methods:\n\n`void initialize(DataImportEventPayload eventPayload) throws IOException;`\n\n`void write(String fieldPath, Value value);`\n\n`DataImportEventPayload getResult(DataImportEventPayload eventPayload) throws JsonProcessingException;`\n\nThe **write(String fieldPath, Value value)** method accepts a **fieldPath**, which defines where the **value** should be placed.\n\nThe result of writing can be obtained by calling **getResult(DataImportEventPayload eventPayload)**, which sets the result in the **eventPayload**.\n\ndata-import-processing-core contains a default Writer implementation for JSON: **JsonBasedWriter**.\n\nTo allow MappingManager to build a Writer (based on the existing record type), register the appropriate WriterFactory implementation:\n\n`MappingManager.registerWriterFactory(new ItemWriterFactory());`\n\n### Mapping flow\nMappingManager calls Mapper to perform the mapping itself. Steps:\n* Mapper goes through every [mapping rule](https://github.com/folio-org/data-import-raml-storage/blob/master/schemas/mod-data-import-converter-storage/mapping-profile-detail/mappingRule.json).\n* It retrieves a value via the **Reader** according to the rule.\n* It writes the value via the **Writer** at the rule's field path in the [DataImportEventPayload](https://github.com/folio-org/data-import-raml-storage/blob/master/examples/mod-data-import/dataImportEventPayload.sample).\n\n## Additional information\n\n* See project [MODDICORE](https://issues.folio.org/browse/MODDICORE)\nat the [FOLIO issue tracker](https://dev.folio.org/guidelines/issue-tracker).\n\n* Other FOLIO Developer documentation is at [dev.folio.org](https://dev.folio.org/)\n\n## Extended Authority Mapping\nAn extended Authority mapping was introduced to support advanced classification of references in 5xx fields:\n* broader terms (`$wg` tag)\n* narrower terms (`$wh` tag)\n* earlier headings (`$wa` tag)\n* later headings (`$wb` tag)\n* saft*Trunc for every 
saft* field with \"i\" and numeric subfields excluded\n* \"subFieldDelimiter\" to replace space by \"--\" before subfields $x,$y,$z,$v\n\nTo support this functionality `AuthorityExtended` is used together with `MarkToAuthorityExtendedMapper`.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffolio-org%2Fdata-import-processing-core","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ffolio-org%2Fdata-import-processing-core","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffolio-org%2Fdata-import-processing-core/lists"}