{"id":41154695,"url":"https://github.com/dataesr/affiliation-matcher","last_synced_at":"2026-01-22T19:12:51.090Z","repository":{"id":37497654,"uuid":"277475197","full_name":"dataesr/affiliation-matcher","owner":"dataesr","description":"Matcher for affiliations - link raw affiliation to ROR ids, country and RNSR","archived":false,"fork":false,"pushed_at":"2025-01-07T13:54:34.000Z","size":5763,"stargazers_count":25,"open_issues_count":8,"forks_count":1,"subscribers_count":5,"default_branch":"master","last_synced_at":"2025-09-04T23:50:49.687Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/dataesr.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":"CITATION.cff","codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2020-07-06T07:40:59.000Z","updated_at":"2025-04-21T11:38:11.000Z","dependencies_parsed_at":"2024-01-12T14:29:28.465Z","dependency_job_id":"13ccf87a-40a8-4c26-b572-46d1595ac8d2","html_url":"https://github.com/dataesr/affiliation-matcher","commit_stats":null,"previous_names":[],"tags_count":48,"template":false,"template_full_name":null,"purl":"pkg:github/dataesr/affiliation-matcher","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dataesr%2Faffiliation-matcher","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dataesr%2Faffiliation-matcher/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dataesr%2Faffiliation-matcher/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dataesr%2
Faffiliation-matcher/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/dataesr","download_url":"https://codeload.github.com/dataesr/affiliation-matcher/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dataesr%2Faffiliation-matcher/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28669088,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-22T17:07:18.858Z","status":"ssl_error","status_checked_at":"2026-01-22T17:05:02.040Z","response_time":144,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2026-01-22T19:10:40.502Z","updated_at":"2026-01-22T19:12:51.079Z","avatar_url":"https://github.com/dataesr.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Affiliation matcher\n\n[![Discord Follow](https://dcbadge.vercel.app/api/server/TudsqDqTqb?style=flat)](https://discord.gg/TudsqDqTqb)\n![license](https://img.shields.io/github/license/dataesr/affiliation-matcher)\n![GitHub release (latest by 
date)](https://img.shields.io/github/v/release/dataesr/affiliation-matcher?display_name=tag)\n![Tests](https://github.com/dataesr/affiliation-matcher/actions/workflows/tests.yml/badge.svg)\n![Build](https://github.com/dataesr/affiliation-matcher/actions/workflows/build.yml/badge.svg)\n\n## Goal\n\nThe affiliation matcher aims to automatically align an affiliation with different reference systems, including:\n\n- [Country ISO 3166](https://en.wikipedia.org/wiki/ISO_3166)\n- [grid](https://grid.ac/)\n- [ROR](https://ror.org/)\n- [Wikidata](https://www.wikidata.org/)\n\nAnd specifically for French affiliations:\n\n- [FINESS](https://www.data.gouv.fr/fr/datasets/finess-extraction-du-fichier-des-etablissements)\n- [RNSR (Répertoire National des Structures de Recherche)](https://appliweb.dgri.matchereducation.fr/rnsr/)\n- [Siren](https://www.sirene.fr/sirene/public/accueil)\n\n## Methodology\n\nThe methodology is fully explained in a publication freely available on HAL:\nhttps://hal.archives-ouvertes.fr/hal-03365806.\n\n## Run it locally\n\n:warning: Please use `docker-compose` version 1.27.0 up to 1.19.2.\n\n\n```shell\ngit clone git@github.com:dataesr/affiliation-matcher.git\ncd affiliation-matcher\nmake docker-build start\n```\n\nWait for Elasticsearch to be up. Then run:\n\n```shell\nmake load\n```\n\nThe following services are now available in your browser:\n\n- Elasticsearch: http://localhost:9200/\n- RabbitMQ: http://localhost:9181/\n- Matcher: http://localhost:5004/\n\nIn Python, you can call the matcher this way:\n\n```python\nimport requests\n\n# Local matcher endpoint listed above\nurl = 'http://localhost:5004/match'\nr = requests.post(url, json={\n    \"type\": \"ror\",\n    \"name\": \"Paris Dauphine University\",\n    \"city\": \"Paris\",\n    \"country\": \"France\",\n    \"verbose\": False,\n})\nr.json()\n```\n\nFor ROR, available criteria are: id, grid_id, name, city, country, supervisor_name, acronym, city_zone_emploi, city_nuts_level2, web_url, web_domain. 
Default strategies are detailed in https://github.com/dataesr/affiliation-matcher/blob/master/project/server/main/match_ror.py\n\nFor RNSR, available criteria are: year, id, code_number, acronym, name, supervisor_name, supervisor_acronym, zone_emploi, city, web_url. Default strategies are detailed in https://github.com/dataesr/affiliation-matcher/blob/master/project/server/main/match_rnsr.py\n\n## Run unit tests\n\n```shell\nmake test\n```\n\n## Build docker image\n\n```shell\nmake docker-build\n```\n\n## Build python package\n\nTo generate the tarball package into the **dist** folder:\n\n```shell\nmake python-build\n```\n\nTo install the generated package into your project:\n\n```shell\npip install /path/to/your/package.tar.gz\n```\n\nThen import the package into your Python file:\n\n```python\nimport affiliation-matcher\n```\n\n## Release\n\nReleases follow [semver](https://semver.org/).\n\nTo create a new release:\n\n```shell\nmake release VERSION=x.x.x\n```\n\n## API\n\n### Match a single query `/match`\n\nQuery the API by setting your own strategies:\n\n`curl \"YOUR_API_IP/match\" -X POST -d '{\"type\": \"YOUR_TYPE\", \"query\": \"YOUR_QUERY\", \"strategies\": \"YOUR_STRATEGIES\", \"year\": \"YOUR_YEAR\"}'`\n\nYOUR_TYPE is optional, has to be a string and can be one of:\n* \"country\"\n* \"grid\"\n* \"rnsr\"\n* \"ror\"\n\nBy default, YOUR_TYPE is equal to \"rnsr\".\n\nYOUR_QUERY is **mandatory**, has to be a string and is your affiliation text.\n\nFor example: `IPAG Institut de Planétologie et d'Astrophysique de Grenoble`.\n\nYOUR_STRATEGIES is optional and has to be a 3-dimensional array of criteria (see next paragraph).\n\nFor example: `[[[\"grid_name\", \"grid_country\"], [\"grid_name\", \"grid_country_code\"]]]`.\n\nYOUR_YEAR is optional, has to be a string, and can be used only with the \"rnsr\" matcher type.\n\nFor example: `1998`.\n\nBy default, YOUR_YEAR is not set, i.e. 
it will be matched over all years.\n\n\n### Match multiple queries `/match_list`\n\n`curl \"YOUR_API_IP/match_list\" -X POST -d '{\"match_types\": \"YOUR_TYPES\", \"affiliations\": \"YOUR_AFFILIATIONS\"}'`\n\nYOUR_TYPES is optional, has to be a list of strings and can contain one or more of:\n* \"country\"\n* \"grid\"\n* \"rnsr\"\n* \"ror\"\n\nBy default, YOUR_TYPES is equal to [\"grid\", \"rnsr\"].\n\nYOUR_AFFILIATIONS is optional and has to be a list of strings.\nFor example: `[\"affiliation_01\", \"affiliation_02\"]`.\n\nBy default, YOUR_AFFILIATIONS is equal to [].\n\n\n## Criteria\n\nHere is a list of the criteria available for the **country matcher**:\n* country_alpha3\n* country_name\n* country_subdivision_code\n* country_subdivision_name\n\nHere is a list of the criteria available for the **grid matcher**:\n* grid_acronym\n* grid_acronym_unique\n* grid_cities_by_region [indirect]\n* grid_city\n* grid_country\n* grid_country_code\n* grid_department\n* grid_id\n* grid_name\n* grid_name_unique\n* grid_parent\n* grid_region\n\nHere is a list of the criteria available for the **rnsr matcher**:\n* rnsr_acronym\n* rnsr_city\n* rnsr_code_number\n* rnsr_code_prefix\n* rnsr_country_code\n* rnsr_id\n* rnsr_name\n* rnsr_name_txt\n* rnsr_supervisor_acronym\n* rnsr_supervisor_name\n* rnsr_urban_unit\n* rnsr_web_url\n* rnsr_year\n* rnsr_zone_emploi [indirect]\n\nHere is a list of the criteria available for the **ror matcher**:\n* ror_acronym\n* ror_acronym_unique\n* ror_city\n* ror_country\n* ror_country_code\n* ror_grid_id\n* ror_id\n* ror_name\n* ror_name_unique\n\n1. You can combine criteria to create a strategy.\n2. You can combine strategies to create a family of strategies.\n3. You can then combine families of strategies to create the final object.\n4. 
This final `strategies` object is a 3-dimensional array that you pass as an argument to the \"/match\" API endpoint.\nFor example: `[[[\"grid_name\", \"grid_country\"], [\"grid_name\", \"grid_country_code\"]]]`.\n\n\n## Results\n\n| matcher | precision | recall |\n| ----- | ----- | ----- |\n| country | 0.9953 | 0.9690 |\n| grid | 0.7946 | 0.5944 |\n| rnsr | 0.9654 | 0.8192 |\n| ror | 0.8891 | 0.2356 | (TBC ???)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdataesr%2Faffiliation-matcher","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdataesr%2Faffiliation-matcher","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdataesr%2Faffiliation-matcher/lists"}