https://github.com/nineinchnick/trino-git
A Trino connector to access git repository contents
- Host: GitHub
- URL: https://github.com/nineinchnick/trino-git
- Owner: nineinchnick
- License: apache-2.0
- Created: 2020-12-31T14:15:35.000Z (almost 4 years ago)
- Default Branch: main
- Last Pushed: 2024-10-11T08:59:20.000Z (25 days ago)
- Last Synced: 2024-10-14T09:16:55.189Z (22 days ago)
- Topics: git, hacktoberfest, sql, trino-plugin
- Language: Java
- Homepage:
- Size: 438 KB
- Stars: 13
- Watchers: 3
- Forks: 4
- Open Issues: 12
Metadata Files:
- Readme: README.md
- License: LICENSE
Trino git Connector
===================

[![Build Status](https://github.com/nineinchnick/trino-git/actions/workflows/release.yaml/badge.svg)](https://github.com/nineinchnick/trino-git/actions/workflows/release.yaml)

This is a [Trino](http://trino.io/) connector to access git repos. Please keep in mind that this is not production ready and it was created for tests.
# Quick Start
To run a Docker container with the connector, run the following:
```bash
docker run \
  -d \
  --name trino-git \
  -e REPO_URL=https://github.com/nineinchnick/trino-rest.git \
  -p 8080:8080 \
  nineinchnick/trino-git:0.87
```

Then use your favourite SQL client to connect to Trino running at http://localhost:8080.
# Usage
Download one of the ZIP packages, unzip it, and copy the `trino-git-0.87` directory to the plugin directory on every node in your Trino cluster.
Create a `git.properties` file in your Trino catalog directory and point it to a remote repo. Trino names a catalog after its properties file, so this file defines the `git` catalog.
You can also use a path to a local repo if it's available on every worker node.

```
connector.name=git
metadata-uri=https://github.com/nineinchnick/trino-git.git
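# Alternative sketch (hypothetical path, not from the project docs): point
# metadata-uri at a local clone instead, provided the same path exists on
# every worker node.
# metadata-uri=/path/to/local/repo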
```

After reloading Trino, you should be able to connect to the `git` catalog and see the following tables in the `default` schema:
* `branches`
* `commits` - all commits from every branch, with author, committer, message and commit time
* `diff_stats` - any files modified, deleted or renamed in every commit, with number of added and/or deleted lines
* `objects` - the contents of every file
* `tags`
* `trees` - all files in every commit, with file mode and attributes

To see who has commits with only deleted lines:
```sql
SELECT
    i.name,
    i.email,
    min(c.commit_time) FILTER (WHERE c.added_lines = 0 AND c.deleted_lines != 0) AS first_delete_only_commit_at,
    count(*) FILTER (WHERE c.added_lines = 0 AND c.deleted_lines != 0) AS delete_only_commit_count,
    CAST(count(*) FILTER (WHERE c.added_lines = 0 AND c.deleted_lines != 0) AS double) / CAST(count(*) AS double) AS delete_only_commit_ratio
FROM
    commit_stats c
    JOIN idents i ON
        c.author_email = i.email OR contains(i.extra_emails, c.author_email)
GROUP BY
    i.name,
    i.email
HAVING
    count(*) FILTER (WHERE c.added_lines = 0 AND c.deleted_lines != 0) != 0
ORDER BY
    i.name,
    i.email;
```

Should return:
```
name |email |first_delete_only_commit_at|delete_only_commit_count|delete_only_commit_ratio|
-------|--------------|---------------------------|------------------------|------------------------|
Jan Was|[email protected]| 2021-01-09 23:22:28| 2| 0.08695652173913043|
```

# Build
Run all the unit tests:
```
mvn test
```

Create a deployable jar file:
```
mvn clean compile package
```

Copy the jar files from the `target` directory to the plugin directory to use the git connector in your Trino cluster:
```
cp -p target/*.jar ${PLUGIN_DIRECTORY}/git/
```

# Deploy
An example command to run the Trino server with the git plugin and catalog enabled:
```bash
src=$(git rev-parse --show-toplevel)
docker run \
  -v "$src/target/trino-git-0.70-SNAPSHOT":/usr/lib/trino/plugin/git \
  -v "$src/catalog":/usr/lib/trino/default/etc/catalog \
  -p 8080:8080 \
  --name trino \
  -d \
  trinodb/trino:462
```

Connect to that server using:
```bash
docker run -it --rm --link trino trinodb/trino:462 trino --server trino:8080 --catalog git --schema default
```

# References
If you're looking to analyze the structure or contents of a Git repo, [gitbase](https://github.com/src-d/gitbase) could be more suitable for such a task.
It could even work with Trino, since Trino has a [MySQL connector](https://trino.io/docs/current/connector/mysql.html).

If you also want to analyze GitHub issues, pull requests (with review comments) or workflow runs and jobs,
check out the GitHub connector in [trino-rest](https://github.com/nineinchnick/trino-rest).

This effort is inspired by [Acha](https://github.com/someteam/acha), which calculates achievements based on the contents of a Git repository; the goal here is to do the same using SQL.