{"id":25205186,"url":"https://github.com/sparna-git/jurivoc","last_synced_at":"2025-04-05T01:10:15.987Z","repository":{"id":229657601,"uuid":"772667044","full_name":"sparna-git/Jurivoc","owner":"sparna-git","description":null,"archived":false,"fork":false,"pushed_at":"2024-11-07T11:28:31.000Z","size":3175,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-04-02T21:11:19.545Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/sparna-git.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-03-15T16:41:42.000Z","updated_at":"2024-11-07T11:28:34.000Z","dependencies_parsed_at":"2024-11-05T10:27:46.911Z","dependency_job_id":"853b678f-bc62-435c-a9b9-959f06dc8674","html_url":"https://github.com/sparna-git/Jurivoc","commit_stats":null,"previous_names":["sparna-git/jurivoc"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sparna-git%2FJurivoc","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sparna-git%2FJurivoc/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sparna-git%2FJurivoc/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sparna-git%2FJurivoc/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/sparna-git","download_url":"https://codeload.github.com/sparna-git/Jurivoc/tar.gz/refs/heads
/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247271528,"owners_count":20911587,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-02-10T09:19:20.059Z","updated_at":"2025-04-05T01:10:15.958Z","avatar_url":"https://github.com/sparna-git.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Project Jurivoc\n\n\n## Requirements\n\n* Python 3.6 (do not use Python 3.12)\n\n\n## Installation\n\n1. Clone the repository\n\n```sh\ngit clone git@github.com:/sparna-git/Jurivoc.git\n```\n\n2. Install pip\n\n```sh\nsudo apt install python3-pip\n```\n\nOn Windows, pip is already included for versions of Python \u003e 3.4.\n\n3. Install virtualenv\n\n```sh\npip install virtualenv\n# On Linux, you may need to run:\n# sudo apt install python3.12-venv\n# or:\n# sudo apt install python3-venv\n```\n\n4. Create virtualenv\n\n```sh\npython3 -m venv virtualenv\n```\n\n5. Activate virtualenv\n\n```sh\n# Windows:\nvirtualenv/Scripts/activate.bat\n# Linux:\nsource virtualenv/bin/activate\n```\n\n6. Once in the virtualenv, install the necessary dependencies from `requirements.txt`:\n\n```sh\npip install -r requirements.txt\n```\n\n\n## Running the Python script\n\n/!\\ Make sure you are in the virtualenv!\n\nYou can display the help message of the command:\n\n```sh\npython convert_data_jurivoc.py --help\n```\n\nThe command synopsis is the following:\n\n```sh\npython convert_data_jurivoc.py --data \u003cdirectory input files\u003e --output \u003cdirectory output\u003e [--log \u003cdirectory log\u003e] [--previousVersion \u003cdirectory of previous version\u003e] [--noComplexSubjects]\n```\n\ne.g., without a previous version (initial run):\n\n```sh\npython convert_data_jurivoc.py --data inputs --output jurivoc_graph --log jurivoc_log --noComplexSubjects\n```\n\nWith a previous version:\n\n```sh\npython convert_data_jurivoc.py --data inputs --output jurivoc_graph --log jurivoc_log --previousVersion jurivoc_graph_v1 --noComplexSubjects\n```\n\nThe parameters are the following:\n\n- `--data` (required) Directory containing the Jurivoc files:\n  - jurivoc_fre.txt\n  - jurivoc_ger.txt\n  - jurivoc_ita.txt\n  - jurivoc_fre_ger.txt\n  - jurivoc_fre_ita.txt\n- `--output` (required) Output directory\n- `--log` (optional) Log directory where the raw dataframes resulting from file parsing will be logged. This directory will also contain a Turtle log of the graph before trying to retrieve the URIs from the previous version.\n- `--previousVersion` (optional) Directory from which the previous version of Jurivoc is read to fetch the previous URIs (that directory is expected to contain the file `jurivoc.ttl` that was the output of the previous run)\n- `--noComplexSubjects` (optional) Do not generate any `madsrdf:ComplexSubject` entity\n\n## Notes on URI\n\n### Concepts URI\n\nURI generation works in 2 steps:\n\n1. In the first step, URIs are assigned based on the French labels, e.g. `jurivoc:PUBLICATION_ELECTRONIQUE`. The SKOS thesaurus with these URIs is logged into `\u003clog directory\u003e/jurivoc_with_label_uris.ttl`\n2. In the second step, one of 2 things happens:\n  1. If the parameter `--previousVersion` was *not* provided, indicating an initial run, a sequential id is given to every concept based on the alphabetical order of its first-step URI\n  2. If the parameter `--previousVersion` was provided, an attempt is made to retrieve the previous URI from the previous version:\n\n    - A search is made on the French, Italian and German prefLabel of each concept. If _1_ prefLabel matches, the URI is retrieved from the previous concept. This means that if 1 or 2 prefLabels have changed but one stayed the same, the Concept retains its previous URI\n    - If no prefLabel matches, a new URI based on the sequential identifier is given to the concept\n\nThe mapping table between the first-step URIs and the corresponding URIs found in the previous version is in `jurivoc_log/data_for_graph/Merge_GraphNew_OldGraph.csv`.\n\n### ComplexSubject URI\n\nThe URI of a `ComplexSubject` (USA / AND blocks in the original file) is always created from:\n- The concatenation of the URIs of the Concepts composing the composite synonym\n- A counter based on the alphabetical order of the synonym, since more than one synonym can correspond to the combination of the same set of Concepts\n\ne.g.\n\n```turtle\njurivoc:1_1415_4875_8192 a madsrdf:ComplexSubject ;\n    madsrdf:authoritativeLabel \"principe de déterminance\"@fr ;\n    madsrdf:componentList ( jurivoc:1415 jurivoc:4875 jurivoc:8192 ) .\n```\n\nThis implies that when 1/ multiple composite synonyms are associated with the same set of concepts and 2/ one of them is changed in a way that changes its alphabetical order, its URI will change.\n\n### Testing the URI matching behavior with the previous version\n\nThe repository contains 2 directories with a tiny sample of Jurivoc entries: `inputs_small` and `inputs_small_v2`. The `inputs_small_v2` folder contains some modifications compared to `inputs_small`, and can be used to easily test the behavior when entries are added or renamed.\n\n\n## Notes\n\nThe conversion takes about 30 minutes to complete.\n\nThe conversion relies on the input file names documented above to determine the language of the labels. Do not rename the files.\n\nThe structure of the thesaurus is read from the French variant. The German and Italian variants are used to fetch the corresponding labels (pref and alt) and notes, plus \"composite synonyms\", but not the broader/narrower/related relations.\n\n## Note on data consistency\n\nSome entries in the \"ger\" file have no French equivalent. They are logged in the log file `terms_in_ger_ita_not_found_in_fra.csv`. As a consequence, some Concepts may lack a German prefLabel.\n\n```\ntitle|language|title_traduction\nAPPARTHOTEL|de|APPARTHOTEL\nARTHRODESE|de|ARTHRODESE\nATHETOSE|de|ATHETOSE\nATRESIE|de|ATRESIE\nBATTERIE(ENERGIE)|de|BATTERIE(ENERGIE)\nBELARUS|de|BELARUS\nBENIN|de|BENIN\nBETON|de|BETON\nBIOMETRIE|de|BIOMETRIE\nDÄNEMARK|de|DÄNEMARK\nDEFLATION|de|DEFLATION\nDEPARTEMENT|de|DEPARTEMENT\nDEPRESSION|de|DEPRESSION\nDYSMELIE|de|DYSMELIE\nENERGIE|de|ENERGIE\nEPIDEMIE|de|EPIDEMIE\nEPILEPSIE|de|EPILEPSIE\nERGOTHERAPIE|de|ERGOTHERAPIE\nGENEALOGIE|de|GENEALOGIE\nGEOGRAPHIE|de|GEOGRAPHIE\nGEOLOGIE|de|GEOLOGIE\nGERIATRIE|de|GERIATRIE\nHEMIHYPERTROPHIE|de|HEMIHYPERTROPHIE\nHEMIPLEGIE|de|HEMIPLEGIE\nHERPES|de|HERPES\nHOTEL|de|HOTEL\nHYDRONEPHROSE|de|HYDRONEPHROSE\nHYGIENE|de|HYGIENE\nHYSTERIE|de|HYSTERIE\nILEUS|de|ILEUS\nINGENIEUR|de|INGENIEUR\nINTERREGIONAL|de|INTERREGIONAL\nISRAEL|de|ISRAEL\nLEGASTHENIE|de|LEGASTHENIE\nLIBERIA|de|LIBERIA\nMAIS|de|MAIS\nMETEOROLOGIE|de|METEOROLOGIE\nMODERATION|de|MODERATION\nMONTENEGRO|de|MONTENEGRO\nNEPAL|de|NEPAL\nNEPHROSE|de|NEPHROSE\nNIGERIA|de|NIGERIA\nOPERATION|de|OPERA
TION\nOSTEOCHONDROSE|de|OSTEOCHONDROSE\nOSTEOPOROSE|de|OSTEOPOROSE\nOSTEOSYNTHESE|de|OSTEOSYNTHESE\nOSTEOTOMIE|de|OSTEOTOMIE\nPARAPLEGIE|de|PARAPLEGIE\nPHYSIOTHERAPIE|de|PHYSIOTHERAPIE\nPHYTOTHERAPIE|de|PHYTOTHERAPIE\nPIEMONT|de|PIEMONT\nPSYCHOTHERAPIE|de|PSYCHOTHERAPIE\nQUEBEC|de|QUEBEC\nREFERENDUM|de|REFERENDUM\nREGIMENT|de|REGIMENT\nREGION|de|REGION\nREGISSEUR|de|REGISSEUR\nRETINOPATHIE|de|RETINOPATHIE\nRHONE|de|RHONE\nSCHIZOPHRENIE|de|SCHIZOPHRENIE\nSENEGAL|de|SENEGAL\nSPONDYLODESE|de|SPONDYLODESE\nSPONDYLOLISTHESIS|de|SPONDYLOLISTHESIS\nSTENOSE|de|STENOSE\nSTERILISATION|de|STERILISATION\nSUBDELEGATION|de|SUBDELEGATION\nSYNECHIE|de|SYNECHIE\nTANTIEME|de|TANTIEME\nTETRAPLEGIE|de|TETRAPLEGIE\nTHEOLOGIE|de|THEOLOGIE\nTHERAPIE|de|THERAPIE\nTURKMENISTAN|de|TURKMENISTAN\nVENEZUELA|de|VENEZUELA\nVIDEOTEX|de|VIDEOTEX\nVOLIERE|de|VOLIERE\nZOOTHERAPIE|de|ZOOTHERAPIE\n```","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsparna-git%2Fjurivoc","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsparna-git%2Fjurivoc","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsparna-git%2Fjurivoc/lists"}