Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/erizocosmico/git2pg
Git to PostgreSQL repository migration.
git migration postgresql sql vcs
Last synced: 2 months ago
- Host: GitHub
- URL: https://github.com/erizocosmico/git2pg
- Owner: erizocosmico
- License: apache-2.0
- Created: 2019-10-07T15:41:58.000Z (over 5 years ago)
- Default Branch: master
- Last Pushed: 2019-10-26T08:14:19.000Z (over 5 years ago)
- Last Synced: 2024-06-19T15:08:56.423Z (8 months ago)
- Topics: git, migration, postgresql, sql, vcs
- Language: Go
- Size: 77.1 KB
- Stars: 9
- Watchers: 3
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# git2pg
Migrate git repositories to a PostgreSQL database.
## Install
Using `go get`:
```
go get github.com/erizocosmico/git2pg/cmd/git2pg/...
```
Or by building the binary yourself:
```
# at the repository root folder
go build -o git2pg ./cmd/git2pg/main.go
```
When the project is more stable, a pre-built binary will be provided on the releases page.
## Usage
To configure how git2pg works, you will need to use environment variables to specify the database details and command line flags to control certain aspects of the program.
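As a rough illustration of the environment-variable half of this configuration, the sketch below fills in the documented defaults for the `DB*` variables (listed in the next section) before git2pg is launched. The function names `resolve_db_env` and `describe_db` are hypothetical helpers, not part of git2pg, and treating an empty value like an unset one is an assumption.

```shell
#!/bin/sh
# resolve_db_env (hypothetical helper): give each DB* variable its documented
# default when it is unset or empty, then export it so a child process
# such as git2pg can read it.
# Assumption: an empty value is treated the same as an unset one.
resolve_db_env() {
    DBHOST="${DBHOST:-127.0.0.1}"
    DBPORT="${DBPORT:-5432}"
    DBUSER="${DBUSER:-postgres}"
    DBPASS="${DBPASS:-}"
    DBNAME="${DBNAME:-postgres}"
    export DBHOST DBPORT DBUSER DBPASS DBNAME
}

# describe_db (hypothetical helper): print the effective target
# without revealing the password.
describe_db() {
    printf '%s@%s:%s/%s\n' "$DBUSER" "$DBHOST" "$DBPORT" "$DBNAME"
}
```

A wrapper script could call `resolve_db_env`, log `describe_db`, and then run git2pg with whatever flags it needs.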
### Environment variables
- `DBHOST`: PostgreSQL database host, `127.0.0.1` by default.
- `DBPORT`: PostgreSQL database port, `5432` by default.
- `DBUSER`: PostgreSQL database user, `postgres` by default.
- `DBPASS`: PostgreSQL database password, empty by default.
- `DBNAME`: PostgreSQL database name, `postgres` by default.
### Command line flags
- `-d` path to the collection of repositories that will be migrated. For example, `-d /home/myuser/repos`. This must be a folder containing non-bare git repositories.
- `-siva` whether the repositories in the collection use the [siva archiving format](https://github.com/src-d/siva). Not enabled by default.
- `-rooted` whether the repositories in the collection are rooted because they were collected with [gitcollector](https://github.com/src-d/gitcollector). Not enabled by default.
- `-buckets=N` number of characters for bucketing in case the repositories are in buckets. By default, `0`. For example, `-buckets=2` for a structure like the following:
```
|- go
   |- goofy
   |- goober
|- py
   |- pytorch
   |- pylint
```
- `-workers=N` number of parallel workers, that is, the number of repositories that will be migrated at the same time. Defaults to `cpu cores / 2`. See the note on worker numbers at the end of this section.
- `-repo-workers=N` number of workers used to process each individual repository. Defaults to `cpu cores / 2`. See the note on worker numbers at the end of this section.
- `-v` verbose mode that will spit more logs. Only meant for debugging purposes. Not enabled by default.
- `-create` create the tables necessary in the schema.
- `-drop` drop the tables if they exist before creating them again. This option cannot be used unless `-create` is used as well.
- `-full` migrate all the trees in the repository for each commit of each reference. By default, only the trees of the HEAD of each reference are migrated, because that takes dramatically less space and time and covers the most common case. If you need the full repository data, use this option.
- `-max-blob-size=N` migrate only blobs with a size lower than the given number in megabytes.
- `-no-binary-blobs` do not migrate blobs of files that are binaries.
- `-cstore=CSTORE_FDW_SERVER_NAME` if the data should be imported in columnar format to [cstore_fdw](https://github.com/citusdata/cstore_fdw), provide the server name, e.g. `-cstore=cstore_server`.

**Note on setting worker numbers**
Since each repository can have more than one worker, keep in mind that `WORKERS * REPOWORKERS` should be less than or equal to the number of cores of your machine.
For example, on a 32-core machine where you want 2 repo workers per repository, you could have 16 workers, since 16 workers with 2 repo workers each add up to 32, the number of cores of the machine.
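
The rule above is plain arithmetic, so it can be checked before launching a migration. The following is a minimal sketch, not something git2pg provides: `fits_cores` and `max_workers` are hypothetical helper names, and the core count is passed in as an argument rather than detected.

```shell
#!/bin/sh
# fits_cores WORKERS REPO_WORKERS CORES (hypothetical helper)
# Succeeds (exit status 0) when WORKERS * REPO_WORKERS <= CORES.
fits_cores() {
    [ $(( $1 * $2 )) -le "$3" ]
}

# max_workers CORES REPO_WORKERS (hypothetical helper)
# Prints the largest -workers value that keeps the product within CORES,
# using integer division. A result of 0 means REPO_WORKERS alone already
# exceeds CORES.
max_workers() {
    echo $(( $1 / $2 ))
}
```

For instance, `max_workers 32 2` prints `16`, which matches the 32-core example, and `fits_cores 16 2 32` succeeds. On many systems `getconf _NPROCESSORS_ONLN` reports the number of online cores, though that is an assumption about your platform.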
**Example of usage:**
```
git2pg -d /path/to/repos -workers=4 -repo-workers=2
```
## Docker image usage
Pull the image from the docker registry:
```
docker pull erizocosmico/git2pg
```
And then run the image providing the following data:
- Database configuration via environment variables (described in the environment variables section).
- Mount your repository folder as a volume to `/repositories`.
- Provide the command line flags you need.

For example:
```
docker run --name git2pg -v /path/to/repositories:/repositories \
-e DBUSER=dbuser \
-e DBPASS=dbpass \
-e DBPORT=5432 \
-e DBNAME=dbname \
-e DBHOST=postgres \
erizocosmico/git2pg -workers=4 -repo-workers=2 -create -drop -v
```
## Schema
The schema is provided in `schema.sql` for reference purposes, but you can create it directly using the tool with the `-create` command line flag.
The schema contains the following tables:
- `repositories`: containing only ids of repositories.
- `remotes`: containing the remotes with their URLs and fetch refspecs.
- `refs`: containing the references of each repository and the commits they point to. References to objects other than commits are not included.
- `ref_commits`: which has each commit in each reference in each repository with a `history_index`, which is the offset to the HEAD of the reference.
- `commits`: containing all the commit information. Each commit has a reference to its root tree, which can be used to join with other tables that have information about root trees.
- `tree_entries`: containing all the tree entries in each repository. This table is not very useful on its own, but it is migrated so that the data stored in git is available.
- `tree_blobs`: containing the blob hashes that are in each root tree of each repository.
- `tree_blobs`: containing the files that are in each root tree of each repository.
- `blobs`: containing all the blobs in each repository, including its file content.
## LICENSE
Apache 2.0, see [LICENSE](/LICENSE)