{"id":27881855,"url":"https://github.com/src-d/gitcollector","last_synced_at":"2025-10-27T16:11:59.143Z","repository":{"id":79453993,"uuid":"189379082","full_name":"src-d/gitcollector","owner":"src-d","description":null,"archived":false,"fork":false,"pushed_at":"2019-10-31T12:03:07.000Z","size":209,"stargazers_count":18,"open_issues_count":7,"forks_count":14,"subscribers_count":8,"default_branch":"master","last_synced_at":"2025-05-05T05:05:41.100Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/src-d.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2019-05-30T08:49:51.000Z","updated_at":"2025-03-30T23:55:07.000Z","dependencies_parsed_at":null,"dependency_job_id":"bf2b2524-56e3-43e4-b0b8-90bb1b6afb72","html_url":"https://github.com/src-d/gitcollector","commit_stats":null,"previous_names":[],"tags_count":12,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/src-d%2Fgitcollector","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/src-d%2Fgitcollector/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/src-d%2Fgitcollector/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/src-d%2Fgitcollector/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/src-d","download_url":"https://codeload.github.com/src-d/gitcollector/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://
github.com","kind":"github","repositories_count":252442486,"owners_count":21748451,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-05-05T05:05:47.058Z","updated_at":"2025-10-20T09:54:35.911Z","avatar_url":"https://github.com/src-d.png","language":"Go","readme":"# gitcollector [![GitHub version](https://badge.fury.io/gh/src-d%2Fgitcollector.svg)](https://github.com/src-d/gitcollector/releases) [![Build Status](https://travis-ci.com/src-d/gitcollector.svg?branch=master)](https://travis-ci.com/src-d/gitcollector) [![codecov](https://codecov.io/gh/src-d/gitcollector/branch/master/graph/badge.svg)](https://codecov.io/gh/src-d/gitcollector) [![GoDoc](https://godoc.org/gopkg.in/src-d/gitcollector.v0?status.svg)](https://godoc.org/gopkg.in/src-d/gitcollector.v0) [![Go Report Card](https://goreportcard.com/badge/github.com/src-d/gitcollector)](https://goreportcard.com/report/github.com/src-d/gitcollector)\n\n**gitcollector** collects and stores git repositories.\n\ngitcollector is the source{d} tool to download and update git repositories at\nlarge scale. 
To that end, it uses a custom repository storage\n[file format](https://blog.sourced.tech/post/siva/) called [siva](https://github.com/src-d/go-siva) optimized for saving\nstorage space and keeping repositories up-to-date.\n\n## Status\n\nThe project is in a preliminary stable stage and under active development.\n\n## Storing repositories using rooted repositories\n\nA rooted repository is a [bare Git repository](http://www.saintsjd.com/2011/01/what-is-a-bare-git-repository/) that stores all objects from all repositories that share a common history, that is, they have the same initial commit. It is stored using the [Siva](https://github.com/src-d/go-siva) file format.\n\n![Root Repository explanatory diagram](https://user-images.githubusercontent.com/5582506/30617179-2aba194a-9d95-11e7-8fd5-0a87c2a595f9.png)\n\nRooted repositories have a few particularities that you should know to work with them effectively:\n\n- They have no `HEAD` reference.\n- All references are of the following form: `{REFERENCE_NAME}/{REMOTE_NAME}`. For example, the reference `refs/heads/master` of the remote `foo` would be `refs/heads/master/foo`.\n- Each remote represents a repository that shares the common history of the rooted repository. 
A remote can have multiple endpoints.\n- A rooted repository is simply a repository with all the objects from all the repositories which share the same root commit.\n- The root commit for a repository is obtained by following the first parent of each commit from HEAD.\n\n## Getting started\n\n### Plain command\n\ngitcollector is used through the `download` subcommand (currently the only subcommand):\n\n```txt\nUsage:\n  gitcollector [OPTIONS] download [download-OPTIONS]\n\nHelp Options:\n  -h, --help                                     Show this help message\n\n[download command options]\n          --library=                             path where download to [$GITCOLLECTOR_LIBRARY]\n          --bucket=                              library bucketization level (default: 2) [$GITCOLLECTOR_LIBRARY_BUCKET]\n          --tmp=                                 directory to place generated temporal files (default: /tmp) [$GITCOLLECTOR_TMP]\n          --workers=                             number of workers, default to GOMAXPROCS [$GITCOLLECTOR_WORKERS]\n          --half-cpu                             set the number of workers to half of the set workers [$GITCOLLECTOR_HALF_CPU]\n          --no-updates                           don't allow updates on already downloaded repositories [$GITCOLLECTOR_NO_UPDATES]\n          --no-forks                             github forked repositories will not be downloaded [$GITCOLLECTOR_NO_FORKS]\n          --orgs=                                list of github organization names separated by comma [$GITHUB_ORGANIZATIONS]\n          --excluded-repos=                      list of repos to exclude separated by comma [$GITCOLLECTOR_EXCLUDED_REPOS]\n          --token=                               github token [$GITHUB_TOKEN]\n          --metrics-db=                          uri to a database where metrics will be sent [$GITCOLLECTOR_METRICS_DB_URI]\n          --metrics-db-table=                    table name where the 
metrics will be added (default: gitcollector_metrics) [$GITCOLLECTOR_METRICS_DB_TABLE]\n          --metrics-sync-timeout=                timeout in seconds to send metrics (default: 30) [$GITCOLLECTOR_METRICS_SYNC]\n\n    Log Options:\n          --log-level=[info|debug|warning|error] Logging level (default: info) [$LOG_LEVEL]\n          --log-format=[text|json]               log format, defaults to text on a terminal and json otherwise [$LOG_FORMAT]\n          --log-fields=                          default fields for the logger, specified in json [$LOG_FIELDS]\n          --log-force-format                     ignore if it is running on a terminal or not [$LOG_FORCE_FORMAT]\n```\n\nUsage example (`--library` and `--orgs` are always required):\n\n\u003e gitcollector download --library=/path/to/repos/directory --orgs=src-d\n\nTo collect repositories from several GitHub organizations:\n\n\u003e gitcollector download --library=/path/to/repos/directory --orgs=src-d,bblfsh\n\nNote that all the download command options are also configurable with environment variables.\n\n### Docker\n\nA new Docker image is uploaded to [Docker Hub](https://hub.docker.com/r/srcd/gitcollector/tags) on each new release. To use it:\n\n``` sh\ndocker run --rm --name gitcollector_1 \\\n-e \"GITHUB_ORGANIZATIONS=src-d,bblfsh\" \\\n-e \"GITHUB_TOKEN=foo\" \\\n-v /path/to/repos/directory:/library \\\nsrcd/gitcollector:latest\n```\n\nNote that you must mount a local directory into the specific container path shown in `-v /path/to/repos/directory:/library`. 
The repositories are downloaded into this directory as rooted repositories in siva file format.\n\n## License\n\nGPL v3.0, see [LICENSE](LICENSE)\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsrc-d%2Fgitcollector","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsrc-d%2Fgitcollector","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsrc-d%2Fgitcollector/lists"}