{"id":15470440,"url":"https://github.com/dlr-sc/gitlab2prov","last_synced_at":"2025-07-25T18:34:30.926Z","repository":{"id":36456277,"uuid":"215042878","full_name":"DLR-SC/gitlab2prov","owner":"DLR-SC","description":"🔍️ Extract provenance information (W3C PROV) from GitLab projects.","archived":false,"fork":false,"pushed_at":"2023-09-05T14:28:30.000Z","size":2738,"stargazers_count":16,"open_issues_count":8,"forks_count":3,"subscribers_count":7,"default_branch":"master","last_synced_at":"2025-07-07T22:13:47.117Z","etag":null,"topics":["extract-provenance-information","git","gitlab","graphs","knowledge-graph","prov-generation","provenance","python","software-analytics","w3c-prov"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/DLR-SC.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":".github/CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":"CITATION.cff","codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2019-10-14T12:50:04.000Z","updated_at":"2024-09-26T07:40:46.000Z","dependencies_parsed_at":"2025-03-03T16:47:20.260Z","dependency_job_id":null,"html_url":"https://github.com/DLR-SC/gitlab2prov","commit_stats":null,"previous_names":[],"tags_count":15,"template":false,"template_full_name":null,"purl":"pkg:github/DLR-SC/gitlab2prov","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DLR-SC%2Fgitlab2prov","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DLR-SC%2Fgitlab2prov/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DLR-SC%2Fgitlab2prov/releases","manifests_url":"
https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DLR-SC%2Fgitlab2prov/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/DLR-SC","download_url":"https://codeload.github.com/DLR-SC/gitlab2prov/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DLR-SC%2Fgitlab2prov/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":267046253,"owners_count":24026899,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-07-25T02:00:09.625Z","response_time":70,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["extract-provenance-information","git","gitlab","graphs","knowledge-graph","prov-generation","provenance","python","software-analytics","w3c-prov"],"created_at":"2024-10-02T02:04:41.906Z","updated_at":"2025-07-25T18:34:30.879Z","avatar_url":"https://github.com/DLR-SC.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003ch1 align=\"center\"\u003eWelcome to \u003ccode\u003egitlab2prov\u003c/code\u003e! 
👋\u003c/h1\u003e\n\u003cp align=\"center\"\u003e\n  \u003ca href=\"https://github.com/dlr-sc/gitlab2prov/blob/master/LICENSE\"\u003e\n    \u003cimg alt=\"License: MIT\" src=\"https://img.shields.io/badge/license-MIT-yellow.svg\" target=\"_blank\" /\u003e\n  \u003c/a\u003e\n  \u003ca href=\"https://img.shields.io/badge/Made%20with-Python-1f425f.svg\"\u003e\n    \u003cimg src=\"https://img.shields.io/badge/Made%20with-Python-1f425f.svg\" alt=\"Badge: Made with Python\"/\u003e\n  \u003c/a\u003e\n  \u003ca href=\"https://pypi.org/project/gitlab2prov/\"\u003e\n    \u003cimg src=\"https://img.shields.io/pypi/v/gitlab2prov\" alt=\"Badge: PyPi Version\"\u003e\n  \u003c/a\u003e\n  \u003ca href=\"https://pypistats.org/packages/gitlab2prov\"\u003e\n    \u003cimg src=\"https://img.shields.io/pypi/dm/gitlab2prov\" alt=\"Badge: PyPi Downloads Monthly\"\u003e\n  \u003c/a\u003e\n  \u003ca href=\"https://twitter.com/dlr_software\"\u003e\n    \u003cimg alt=\"Twitter: DLR Software\" src=\"https://img.shields.io/twitter/follow/dlr_software.svg?style=social\" target=\"_blank\" /\u003e\n  \u003c/a\u003e\n  \u003ca href=\"https://open.vscode.dev/DLR-SC/gitlab2prov\"\u003e\n    \u003cimg alt=\"Badge: Open in VSCode\" src=\"https://img.shields.io/static/v1?logo=visualstudiocode\u0026label=\u0026message=open%20in%20visual%20studio%20code\u0026labelColor=2c2c32\u0026color=007acc\u0026logoColor=007acc\" target=\"_blank\" /\u003e\n  \u003c/a\u003e\n  \u003ca href=\"https://zenodo.org/badge/latestdoi/215042878\"\u003e\n    \u003cimg alt=\"Badge: DOI\" src=\"https://zenodo.org/badge/215042878.svg\" target=\"_blank\" /\u003e\n  \u003c/a\u003e\n  \u003ca href=\"https://www.w3.org/TR/prov-overview/\"\u003e\n    \u003cimg alt=\"Badge: W3C PROV\" src=\"https://img.shields.io/static/v1?logo=w3c\u0026label=\u0026message=PROV\u0026labelColor=2c2c32\u0026color=007acc\u0026logoColor=007acc?logoWidth=200\" target=\"_blank\" /\u003e\n  \u003c/a\u003e\n  \u003ca 
href=\"https://citation-file-format.github.io/\"\u003e\n    \u003cimg alt=\"Badge: Citation File Format Inside\" src=\"https://img.shields.io/badge/-citable%20software-green\" target=\"_blank\" /\u003e\n  \u003c/a\u003e\n\u003c/p\u003e\n\n\n\u003e `gitlab2prov` is a Python library and command line tool that extracts provenance information from GitLab projects.\n\n---\n\nThe `gitlab2prov` data model has been designed according to the [W3C PROV](https://www.w3.org/TR/prov-overview/) specification.\nThe model documentation can be found [here](https://github.com/DLR-SC/gitlab2prov/tree/master/docs).\n\n## 🏗️ Installation\n\nPlease note that this tool requires Git to be installed on your machine.\n\nClone the project and install using `pip`:\n```bash\npip install .\n```\n\nOr install the latest release from [PyPI](https://pypi.org/project/gitlab2prov/):\n```bash\npip install gitlab2prov\n```\n\nTo install `gitlab2prov` with all extra dependencies, request the `[dev]` extras:\n```bash\npip install .[dev]            # clone repo, install with extras\npip install gitlab2prov[dev]  # PyPI, install with extras\n```\n\n## ⚡ Getting started\n\n`gitlab2prov` needs a [personal access token](https://docs.gitlab.com/ee/user/profile/personal_access_tokens.html) to clone Git repositories and to authenticate with the GitLab API.\nFollow [this guide](./docs/guides/tokens.md) to create an access token with the required [scopes](https://docs.gitlab.com/ee/user/profile/personal_access_tokens.html#personal-access-token-scopes).\n\n\n## 🚀 Usage\n\n`gitlab2prov` can be configured using the command line interface or by providing a configuration file in `.yaml` format.\n\n### Command Line Usage\nThe command line interface consists of commands that can be chained together like a Unix pipeline.\n\n```\nUsage: gitlab2prov [OPTIONS] COMMAND1 [ARGS]... 
[COMMAND2 [ARGS]...]...\n\n  Extract provenance information from GitLab projects.\n\nOptions:\n  --version        Show the version and exit.\n  --verbose        Enable logging to 'gitlab2prov.log'.\n  --config FILE    Read config from file.\n  --validate FILE  Validate config file and exit.\n  --help           Show this message and exit.\n\nCommands:\n  combine                  Combine multiple graphs into one.\n  extract                  Extract provenance information for one or more...\n  load                     Load provenance files.\n  merge-duplicated-agents  Merge duplicated agents based on a name to...\n  pseudonymize             Pseudonymize a provenance graph.\n  save                     Save provenance information to a file.\n  stats                    Print statistics such as node counts and...\n```\n\n### Configuration Files\n`gitlab2prov` supports configuration files in `.yaml` format that are functionally equivalent to command line invocations.\n\nTo read configuration details from a file instead of specifying them on the command line, use the `--config` option:\n```bash\n# initiate a run using a config file\ngitlab2prov --config config/example.yaml\n```\nYou can validate your config file against the JSON Schema `gitlab2prov/config/schema.json` that comes packaged with every installation:\n```bash\n# check config file for syntactical errors\ngitlab2prov --validate config/example.yaml\n```\n\nConfig file example:\n\n```yaml\n- extract:\n        url: [\"https://gitlab.com/example/foo\"]\n        token: tokenFoo\n- extract:\n        url: [\"https://gitlab.com/example/bar\"]\n        token: tokenBar\n- load:\n        input: [example.rdf]\n- pseudonymize:\n- combine:\n- save:\n        output: combined\n        format: [json, rdf, xml, dot]\n- stats:\n        fine: true\n        explain: true\n        formatter: table\n```\n\nThe config file example is functionally equivalent to this command line invocation:\n\n```\ngitlab2prov extract -u 
https://gitlab.com/example/foo -t tokenFoo \\\n            extract -u https://gitlab.com/example/bar -t tokenBar \\\n            load -i example.rdf                                   \\\n            pseudonymize                                          \\\n            combine                                               \\\n            save -o combined -f json -f rdf -f xml -f dot         \\\n            stats --fine --explain --formatter table\n```\n\n### 🎨 Provenance Output Formats\n\n`gitlab2prov` supports output formats that the [`prov`](https://github.com/trungdong/prov) library provides:\n* [PROV-N](http://www.w3.org/TR/prov-n/)\n* [PROV-O](http://www.w3.org/TR/prov-o/) (RDF)\n* [PROV-XML](http://www.w3.org/TR/prov-xml/)\n* [PROV-JSON](http://www.w3.org/Submission/prov-json/)\n* [Graphviz](https://graphviz.org/) (DOT)\n\n## 🤝 Contributing\n\nContributions and pull requests are welcome!  \nFor major changes, please open an issue first to discuss what you would like to change.\n\n## ✨ How to cite\n\nIf you use GitLab2PROV in a scientific publication, we would appreciate citations to the following paper:\n\n* Schreiber, A., de Boer, C. and von Kurnatowski, L. (2021). [GitLab2PROV—Provenance of Software Projects hosted on GitLab](https://www.usenix.org/conference/tapp2021/presentation/schreiber). 
13th International Workshop on Theory and Practice of Provenance (TaPP 2021), USENIX Association\n\nBibTeX entry:\n\n```BibTeX\n@InProceedings{SchreiberBoerKurnatowski2021,\n  author    = {Andreas Schreiber and Claas de~Boer and Lynn von~Kurnatowski},\n  booktitle = {13th International Workshop on Theory and Practice of Provenance (TaPP 2021)},\n  title     = {{GitLab2PROV}{\\textemdash}Provenance of Software Projects hosted on GitLab},\n  year      = {2021},\n  month     = jul,\n  publisher = {{USENIX} Association},\n  url       = {https://www.usenix.org/conference/tapp2021/presentation/schreiber},\n}\n```\n\nYou can also cite specific releases published on Zenodo: [![DOI](https://zenodo.org/badge/215042878.svg)](https://zenodo.org/badge/latestdoi/215042878)\n\n## ✏️ References\n\n**Influential Software for `gitlab2prov`**:\n* Martin Stoffers: \"Gitlab2Graph\", v1.0.0, October 13, 2019, [GitHub Link](https://github.com/DLR-SC/Gitlab2Graph), DOI 10.5281/zenodo.3469385\n\n* Quentin Pradet: \"How do you rate limit calls with aiohttp?\", [GitHub Gist](https://gist.github.com/pquentin/5d8f5408cdad73e589d85ba509091741), MIT License\n\n**Influential Papers for `gitlab2prov`**:\n\n* De Nies, T., Magliacane, S., Verborgh, R., Coppens, S., Groth, P., Mannens, E., and Van de Walle, R. (2013). [Git2PROV: Exposing Version Control System Content as W3C PROV](https://dl.acm.org/doi/abs/10.5555/2874399.2874431). In *Poster and Demo Proceedings of the 12th International Semantic Web Conference* (Vol. 1035, pp. 125–128).\n\n* Packer, H. S., Chapman, A., and Carr, L. (2019). [GitHub2PROV: provenance for supporting software project management](https://dl.acm.org/doi/10.5555/3359032.3359039). In *11th International Workshop on Theory and Practice of Provenance (TaPP 2019)*.\n\n**Papers that refer to `gitlab2prov`**:\n\n* Andreas Schreiber, Claas de Boer (2020). 
[Modelling Knowledge about Software Processes using Provenance Graphs and its Application to Git-based Version Control Systems](https://dl.acm.org/doi/10.1145/3387940.3392220). In *ICSEW'20: Proceedings of the IEEE/ACM 42nd Conference on Software Engineering Workshops* (pp. 358–359).\n\n* Tim Sonnekalb, Thomas S. Heinze, Lynn von Kurnatowski, Andreas Schreiber, Jesus M. Gonzalez-Barahona, and Heather Packer (2020). [Towards automated, provenance-driven security audit for git-based repositories: applied to germany's corona-warn-app: vision paper](https://doi.org/10.1145/3416507.3423190). In *Proceedings of the 3rd ACM SIGSOFT International Workshop on Software Security from Design to Deployment* (pp. 15–18).\n\n* Andreas Schreiber (2020). [Visualization of contributions to open-source projects](https://doi.org/10.1145/3430036.3430057). In *Proceedings of the 13th International Symposium on Visual Information Communication and Interaction*. ACM, USA.\n\n## 📜 Dependencies\n`gitlab2prov` depends on several open source packages that are made freely available under their respective licenses.\n\n| Package                                                         | License                                                                                                                   |\n| --------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------- |\n| [GitPython](https://github.com/gitpython-developers/GitPython)  | [![License](https://img.shields.io/badge/License-BSD_3--Clause-orange.svg)](https://opensource.org/licenses/BSD-3-Clause) |\n| [click](https://github.com/pallets/click)                       | [![License](https://img.shields.io/badge/License-BSD_3--Clause-orange.svg)](https://opensource.org/licenses/BSD-3-Clause) |\n| [python-gitlab](https://github.com/python-gitlab/python-gitlab) | [![License: LGPL 
v3](https://img.shields.io/badge/License-LGPL_v3-blue.svg)](https://www.gnu.org/licenses/lgpl-3.0)       |\n| [prov](https://pypi.org/project/prov/)                          | [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)               |\n| [jsonschema](https://github.com/python-jsonschema/jsonschema)   | [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)               |\n| [ruamel.yaml](https://pypi.org/project/ruamel.yaml/)            | [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)               |\n| [pydot](https://github.com/pydot/pydot)                         | [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)               |\n\n## 📝 License\nThis project is [MIT](https://github.com/dlr-sc/gitlab2prov/blob/master/LICENSE) licensed.  \nCopyright © 2019 German Aerospace Center (DLR) and individual contributors.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdlr-sc%2Fgitlab2prov","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdlr-sc%2Fgitlab2prov","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdlr-sc%2Fgitlab2prov/lists"}