{"id":13557260,"url":"https://github.com/SouthGreenPlatform/AgroLD_ETL","last_synced_at":"2025-04-03T11:31:31.428Z","repository":{"id":44102584,"uuid":"158649418","full_name":"SouthGreenPlatform/AgroLD_ETL","owner":"SouthGreenPlatform","description":"AgroLD is a RDF knowledge base that consists of data integrated from a variety of plant resources and ontologies. AgroLD ETL is the Python packages developed to transform plant datasets in RDF. Packages are developped for data standards such as GFF,GAF, VCF and specific plant databases.","archived":false,"fork":false,"pushed_at":"2024-04-22T12:28:28.000Z","size":1894,"stargazers_count":1,"open_issues_count":40,"forks_count":1,"subscribers_count":11,"default_branch":"master","last_synced_at":"2024-04-22T12:30:26.675Z","etag":null,"topics":["etl","gaf","gff","ontologies","rdf","rdf-data","vcf"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/SouthGreenPlatform.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2018-11-22T06:00:49.000Z","updated_at":"2024-06-18T20:25:40.083Z","dependencies_parsed_at":"2024-06-18T20:37:50.395Z","dependency_job_id":null,"html_url":"https://github.com/SouthGreenPlatform/AgroLD_ETL","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SouthGreenPlatform%2FAgroLD_ETL","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SouthGreenPlatform%2FAgroLD_ETL/tags","relea
ses_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SouthGreenPlatform%2FAgroLD_ETL/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SouthGreenPlatform%2FAgroLD_ETL/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/SouthGreenPlatform","download_url":"https://codeload.github.com/SouthGreenPlatform/AgroLD_ETL/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":246993003,"owners_count":20865921,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["etl","gaf","gff","ontologies","rdf","rdf-data","vcf"],"created_at":"2024-08-01T12:04:14.711Z","updated_at":"2025-04-03T11:31:26.403Z","avatar_url":"https://github.com/SouthGreenPlatform.png","language":"Python","funding_links":[],"categories":["Python","others"],"sub_categories":[],"readme":"\n# AgroLD ETL\n\nAgroLD is an RDF knowledge base that consists of data integrated from a variety of plant resources and ontologies. AgroLD ETL is the set of Python packages developed to transform plant datasets into RDF. 
Packages are developed for data standards such as GFF, GAF and VCF, and for specific plant databases.\n\n\n# Contact\n\n* pierre.larmande_at_ird.fr\n\n\n# Valorization and How to cite\n\n* [https://doi.org/10.1371/journal.pone.0198270](https://doi.org/10.1371/journal.pone.0198270)\n* [https://tel.archives-ouvertes.fr/IBC/hal-01176903v1](https://tel.archives-ouvertes.fr/IBC/hal-01176903v1) \n* [http://ceur-ws.org/Vol-1546/poster_55.pdf](http://ceur-ws.org/Vol-1546/poster_55.pdf)\n\n\n\n# Contributing\n\n* Intellectual property belongs to IRD, CIRAD and the SouthGreen development platform.\n* Written by Aravind Venkatesan, Gildas Tagny Ngompe, Nordine El Hassouni, Manuel Ruiz, Pierre Larmande. \n* Copyright 2014-2020\n\n\n# Project architecture\n\n\nThe AgroLD project is composed of two components: [AgroLD_ETL](/AgroLD_ETL)  and  [agrold_webapp](/agrold_webapp)\n\n\n* The first component is a set of parsers and wrappers that translate a dataset to RDF. Follow this link to see which data have been translated to RDF: [Documentation](http://volvestre.cirad.fr:8080/agrold/documentation.jsp)  \n\n* The second component is the web application, which connects to the triple store to run queries.\n \n***\n\n```\n AgroLD_ETL\n\t-\u003e model\n\t-\u003e rdf_ttl\n\t-\u003e riceKB\n\t-\u003e test_files\n\t-\u003e riceKBpipeline.py\n```\n***\n\n\n### AgroLD_ETL\n\n\nContains the parsers and models used to convert the data considered for AgroLD to RDF.\n\n* [model](/AgroLD_ETL/model): Documents that describe how the data are transformed\n* [rdf_ttl](/AgroLD_ETL/rdf_ttl): Transformation output, sorted by dataset\n* [riceKB](/AgroLD_ETL/riceKB): Scripts used for each dataset\n* [test_files](/AgroLD_ETL/test_files): Input test files (heterogeneous formats: CSV, tab-delimited files, GFF3)\n* [riceKBpipeline.py](/AgroLD_ETL/riceKBpipeline.py): Script that centralises all execution\n\n\nThe datasets come in different formats: GFF, HapMap, CSV and VCF. 
We first developed parsers that build a dictionary, \nbecause a dictionary is easy to browse when creating the RDF. \n\n\n# How to use AgroLD_ETL\n\nFor example, to run the GFF3 parser and build a dictionary,\ndefine an input file:\n\n```\n# AgroLD/AgroLD_ETL/riceKB/gffParser.py\n\npath = '/os_file_gff3/file.gff3'     # Path to the input GFF3 file\n\nds = parseGFF3(path)   # Parse the file into a dictionary structure\n\n```\n\n\u003e **The dictionary:** The GFF3 parser is a generic function that builds a dictionary; this dictionary is easy to browse when building RDF data \n\n\n```\n{   'attributes': {   'Dbxref': 'InterPro:IPR005333,InterPro:IPR017887',\n                          'ID': 'BGIOSGA000770-TA',\n                          'Name': 'BGIOSGA000770-TA',\n                          'Parent': 'BGIOSGA000770'},\n        'end': 35414873,\n        'phase': None,\n        'score': None,\n        'seqid': 'Osi01',\n        'source': 'glean',\n        'start': 35413950,\n        'strand': '-',\n        'type': 'mRNA'},\n    {   'attributes': {   'Derives_from': 'BGIOSGA000770-TA',\n                          'ID': 'BGIOSGA000770-TA_protein',\n                          'Name': 'BGIOSGA000770-TA'},\n        'end': 35414873,\n        'phase': None,\n        'score': None,\n        'seqid': 'Osi01',\n        'source': 'glean',\n        'start': 35413950,\n        'strand': '-',\n        'type': 'polypeptide'},\n    {   'attributes': {   'Parent': 'BGIOSGA000770-TA'},\n        'end': 35414873,\n        'phase': '0',\n        'score': None,\n        'seqid': 'Osi01',\n        'source': 'glean',\n        'start': 35413950,\n        'strand': '-',\n        'type': 'CDS'},\n    {   'attributes': {   'Parent': 'BGIOSGA000770-TA'},\n        'end': 35414873,\n        'phase': None,\n        'score': None,\n        'seqid': 'Osi01',\n        'source': 'glean',\n        'start': 35413950,\n        'strand': '-',\n        'type': 'exon'},\n\n```\n\n### Documentation\n\n- AgroLD includes data on the following 
species: [Species](http://volvestre.cirad.fr:8080/agrold/documentation.jsp#species)\n- Ontologies in AgroLD: [Ontologies](http://volvestre.cirad.fr:8080/agrold/documentation.jsp#ontologies)\n- Data sources in AgroLD: [Data](http://volvestre.cirad.fr:8080/agrold/documentation.jsp#sources)\n- Species-specific breakdown of the data sources: [Link](http://volvestre.cirad.fr:8080/agrold/documentation.jsp#break-down)\n- Graph names: [Link](http://volvestre.cirad.fr:8080/agrold/documentation.jsp#graphs)\n- URIs: [Link](http://volvestre.cirad.fr:8080/agrold/documentation.jsp#uri)\n\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FSouthGreenPlatform%2FAgroLD_ETL","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FSouthGreenPlatform%2FAgroLD_ETL","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FSouthGreenPlatform%2FAgroLD_ETL/lists"}