{"id":27881817,"url":"https://github.com/src-d/metadata-retrieval","last_synced_at":"2025-05-05T05:05:26.410Z","repository":{"id":79454009,"uuid":"208783856","full_name":"src-d/metadata-retrieval","owner":"src-d","description":null,"archived":false,"fork":false,"pushed_at":"2019-12-17T06:03:35.000Z","size":6754,"stargazers_count":4,"open_issues_count":19,"forks_count":10,"subscribers_count":6,"default_branch":"master","last_synced_at":"2025-05-05T05:05:21.349Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/src-d.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE.md","code_of_conduct":".github/CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2019-09-16T11:33:35.000Z","updated_at":"2020-02-21T20:37:13.000Z","dependencies_parsed_at":null,"dependency_job_id":"57c74c4f-c7b3-480c-b3a1-6788de39652e","html_url":"https://github.com/src-d/metadata-retrieval","commit_stats":null,"previous_names":[],"tags_count":2,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/src-d%2Fmetadata-retrieval","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/src-d%2Fmetadata-retrieval/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/src-d%2Fmetadata-retrieval/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/src-d%2Fmetadata-retrieval/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/src-d","download_url":"https://codeload.github.com/sr
c-d/metadata-retrieval/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":252442485,"owners_count":21748451,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-05-05T05:05:25.807Z","updated_at":"2025-05-05T05:05:26.390Z","avatar_url":"https://github.com/src-d.png","language":"Go","readme":"# metadata-retrieval\n\nCurrently, `examples/cmd/` contains an example of how to use the library: it implements a `ghsync` subcmd that mimics the `src-d/ghsync` `deep` subcmd.\n\nThe example cmd can print to stdout or save to a postgres DB. 
To help with development, use the options `--log-level=debug --log-http`.\n\nTo use it, create a GitHub personal access token with the scopes **read:org** and **repo**.\n\n```shell\n# you can define one or more access tokens (comma separated)\nexport GITHUB_TOKENS=\u003cxxx\u003e,\u003cyyy\u003e\n\n# Info for individual repositories\ngo run examples/cmd/*.go repo --version 0 --owner=src-d --name=metadata-retrieval\n\n# Info for individual organization and its users (not including its repositories)\ngo run examples/cmd/*.go org --version 0 --name=src-d\n\n# Info for organization and all its repositories (similar to ghsync deep)\ngo run examples/cmd/*.go ghsync --version 0 --orgs=src-d,bblfsh --no-forks\n```\n\nTo use a postgres DB:\n\n```shell\ndocker-compose up -d\n\ngo run examples/cmd/*.go repo --version 0 --owner=src-d --name=metadata-retrieval --db=postgres://user:password@127.0.0.1:5432/ghsync?sslmode=disable\n\ndocker-compose exec postgres psql postgres://user:password@127.0.0.1:5432/ghsync?sslmode=disable -c \"select * from pull_request_reviews\"\n```\n\nThe file [doc/1560510971_initial_schema.up.sql](./doc/1560510971_initial_schema.up.sql) contains the src-d/ghsync schema file at v0.2.0 ([link](https://github.com/src-d/ghsync/blob/v0.2.0/models/sql/1560510971_initial_schema.up.sql)). 
The schema is the same, but the tables and columns have been reordered and reformatted.\n\nYou can see the diff between the current DB schema and the ghsync schema here:\n\n\u003cdetails\u003e\u003csummary\u003ediff\u003c/summary\u003e\n\n```diff\n--- doc/1560510971_initial_schema.up.sql\t2019-09-30 10:28:28.569403577 +0100\n+++ database/migrations/000001_init.up.sql\t2019-09-30 12:27:48.783414881 +0100\n@@ -1,267 +1,251 @@\n BEGIN;\n\n-CREATE TABLE organizations (\n-  kallax_id serial NOT NULL PRIMARY KEY,\n+CREATE TABLE IF NOT EXISTS organizations_versioned (\n+  sum256 character varying(64) PRIMARY KEY,\n+  versions integer ARRAY,\n\n   avatar_url text,\n   billing_email text,\n-  blog text,\n   collaborators bigint,\n-  company text,\n   created_at timestamptz,\n   description text,\n-  disk_usage bigint,\n   email text,\n-  followers bigint,\n-  following bigint,\n   htmlurl text,\n   id bigint,\n   location text,\n   login text,\n   name text,\n   node_id text,\n   owned_private_repos bigint,\n-  private_gists bigint,\n-  public_gists bigint,\n   public_repos bigint,\n   total_private_repos bigint,\n   two_factor_requirement_enabled boolean,\n-  type text,\n   updated_at timestamptz\n );\n\n-CREATE TABLE users (\n-  kallax_id serial NOT NULL PRIMARY KEY,\n+CREATE INDEX IF NOT EXISTS organizations_versions ON organizations_versioned (versions);\n+\n+CREATE TABLE IF NOT EXISTS users_versioned (\n+  sum256 character varying(64) PRIMARY KEY,\n+  versions integer ARRAY,\n\n   avatar_url text,\n   bio text,\n-  blog text,\n-  collaborators bigint,\n   company text,\n   created_at timestamptz,\n-  disk_usage bigint,\n   email text,\n   followers bigint,\n   following bigint,\n-  gravatar_id text,\n   hireable boolean,\n   htmlurl text,\n   id bigint,\n   location text,\n   login text,\n   name text,\n   node_id text,\n   owned_private_repos bigint,\n   private_gists bigint,\n   public_gists bigint,\n   public_repos bigint,\n   site_admin boolean,\n-  suspended_at 
timestamptz,\n   total_private_repos bigint,\n-  two_factor_authentication boolean,\n-  type text,\n   updated_at timestamptz\n );\n\n-CREATE TABLE repositories (\n-  kallax_id serial NOT NULL PRIMARY KEY,\n+CREATE INDEX IF NOT EXISTS users_versions ON users_versioned (versions);\n\n+CREATE TABLE IF NOT EXISTS repositories_versioned (\n+  sum256 character varying(64) PRIMARY KEY,\n+  versions integer ARRAY,\n+\n   allow_merge_commit boolean,\n   allow_rebase_merge boolean,\n   allow_squash_merge boolean,\n   archived boolean,\n-  auto_init boolean,\n   clone_url text,\n-  code_of_conduct jsonb,\n   created_at timestamptz,\n   default_branch text,\n   description text,\n   disabled boolean,\n   fork boolean,\n   forks_count bigint,\n   full_name text,\n-  git_url text,\n-  gitignore_template text,\n-  has_downloads boolean,\n   has_issues boolean,\n-  has_pages boolean,\n-  has_projects boolean,\n   has_wiki boolean,\n   homepage text,\n   htmlurl text,\n   id bigint,\n   language text,\n-  license jsonb,\n-  license_template text,\n-  master_branch text,\n   mirror_url text,\n   name text,\n-  network_count bigint,\n   node_id text,\n   open_issues_count bigint,\n-  organization_id bigint NOT NULL,\n-  organization_name text NOT NULL,\n   owner_id bigint NOT NULL,\n   owner_login text NOT NULL,\n   owner_type text NOT NULL,\n-  parent jsonb,\n-  permissions jsonb,\n   private boolean,\n   pushed_at timestamptz,\n-  size bigint,\n-  source jsonb,\n   sshurl text,\n   stargazers_count bigint,\n-  subscribers_count bigint,\n-  svnurl text,\n-  team_id bigint,\n   topics text[] NOT NULL,\n   updated_at timestamptz,\n   watchers_count bigint\n );\n\n-CREATE TABLE issues (\n-  kallax_id serial NOT NULL PRIMARY KEY,\n+CREATE INDEX IF NOT EXISTS repositories_versions ON repositories_versioned (versions);\n\n-  assignee_id bigint NOT NULL,\n-  assignee_login text NOT NULL,\n-  assignees jsonb NOT NULL,\n+CREATE TABLE IF NOT EXISTS issues_versioned (\n+  sum256 character 
varying(64) PRIMARY KEY,\n+  versions integer ARRAY,\n+\n+  assignees text[] NOT NULL,\n   body text,\n   closed_at timestamptz,\n   closed_by_id bigint NOT NULL,\n   closed_by_login text NOT NULL,\n   comments bigint,\n   created_at timestamptz,\n   htmlurl text,\n   id bigint,\n   labels text[] NOT NULL,\n   locked boolean,\n-  milestone_id bigint NOT NULL,\n+  milestone_id text NOT NULL,\n   milestone_title text NOT NULL,\n   node_id text,\n   number bigint,\n   repository_name text NOT NULL,\n   repository_owner text NOT NULL,\n   state text,\n   title text,\n   updated_at timestamptz,\n   user_id bigint NOT NULL,\n   user_login text NOT NULL\n );\n\n-CREATE TABLE issue_comments (\n-  kallax_id serial NOT NULL PRIMARY KEY,\n+CREATE INDEX IF NOT EXISTS issues_versions ON issues_versioned (versions);\n+\n+CREATE TABLE IF NOT EXISTS issue_comments_versioned (\n+  sum256 character varying(64) PRIMARY KEY,\n+  versions integer ARRAY,\n\n   author_association text,\n   body text,\n   created_at timestamptz,\n   htmlurl text,\n   id bigint,\n   issue_number bigint NOT NULL,\n   node_id text,\n-  reactions jsonb,\n   repository_name text NOT NULL,\n   repository_owner text NOT NULL,\n   updated_at timestamptz,\n   user_id bigint NOT NULL,\n   user_login text NOT NULL\n );\n\n-CREATE TABLE pull_requests (\n-  kallax_id serial NOT NULL PRIMARY KEY,\n+CREATE INDEX IF NOT EXISTS issue_comments_versions ON issue_comments_versioned (versions);\n+\n+CREATE TABLE IF NOT EXISTS pull_requests_versioned (\n+  sum256 character varying(64) PRIMARY KEY,\n+  versions integer ARRAY,\n\n   additions bigint,\n-  assignee_id bigint NOT NULL,\n-  assignee_login text NOT NULL,\n-  assignees jsonb NOT NULL,\n+  assignees text[] NOT NULL,\n   author_association text,\n-  base_label text NOT NULL,\n   base_ref text NOT NULL,\n   base_repository_name text NOT NULL,\n   base_repository_owner text NOT NULL,\n   base_sha text NOT NULL,\n   base_user text NOT NULL,\n   body text,\n   changed_files 
bigint,\n   closed_at timestamptz,\n   comments bigint,\n   commits bigint,\n   created_at timestamptz,\n   deletions bigint,\n-  draft boolean,\n-  head_label text NOT NULL,\n   head_ref text NOT NULL,\n   head_repository_name text NOT NULL,\n   head_repository_owner text NOT NULL,\n   head_sha text NOT NULL,\n   head_user text NOT NULL,\n   htmlurl text,\n   id bigint,\n   labels text[] NOT NULL,\n   maintainer_can_modify boolean,\n   merge_commit_sha text,\n   mergeable boolean,\n-  mergeable_state text,\n   merged boolean,\n   merged_at timestamptz,\n   merged_by_id bigint NOT NULL,\n   merged_by_login text NOT NULL,\n-  milestone_id bigint NOT NULL,\n+  milestone_id text NOT NULL,\n   milestone_title text NOT NULL,\n   node_id text,\n   number bigint,\n   repository_name text NOT NULL,\n   repository_owner text NOT NULL,\n-  requested_reviewers jsonb NOT NULL,\n   review_comments bigint,\n   state text,\n   title text,\n   updated_at timestamptz,\n   user_id bigint NOT NULL,\n   user_login text NOT NULL\n );\n\n-CREATE TABLE pull_request_reviews (\n-  kallax_id serial NOT NULL PRIMARY KEY,\n+CREATE INDEX IF NOT EXISTS pull_requests_versions ON pull_requests_versioned (versions);\n+\n+CREATE TABLE IF NOT EXISTS pull_request_reviews_versioned (\n+  sum256 character varying(64) PRIMARY KEY,\n+  versions integer ARRAY,\n\n   body text,\n   commit_id text,\n   htmlurl text,\n   id bigint,\n   node_id text,\n   pull_request_number bigint NOT NULL,\n   repository_name text NOT NULL,\n   repository_owner text NOT NULL,\n   state text,\n   submitted_at timestamptz,\n   user_id bigint NOT NULL,\n   user_login text NOT NULL\n );\n\n-CREATE TABLE pull_request_comments (\n-  kallax_id serial NOT NULL PRIMARY KEY,\n+CREATE INDEX IF NOT EXISTS pull_request_reviews_versions ON pull_request_reviews_versioned (versions);\n+\n+/*\n+The name is used for compatiblity with ghsync, but pull_request_comments\n+does not store the IssueComment's of PullRequest's.\n+Instead it stores 
the PullRequestReviewComment, so a better name would be\n+pull_request_review_comments\n+*/\n+CREATE TABLE IF NOT EXISTS pull_request_comments_versioned (\n+  sum256 character varying(64) PRIMARY KEY,\n+  versions integer ARRAY,\n\n   author_association text,\n   body text,\n   commit_id text,\n   created_at timestamptz,\n   diff_hunk text,\n   htmlurl text,\n   id bigint,\n   in_reply_to bigint,\n   node_id text,\n   original_commit_id text,\n   original_position bigint,\n   path text,\n   position bigint,\n   pull_request_number bigint NOT NULL,\n   pull_request_review_id bigint,\n-  reactions jsonb,\n   repository_name text NOT NULL,\n   repository_owner text NOT NULL,\n   updated_at timestamptz,\n   user_id bigint NOT NULL,\n   user_login text NOT NULL\n );\n\n+CREATE INDEX IF NOT EXISTS pull_request_comments_versions ON pull_request_comments_versioned (versions);\n+\n COMMIT;\n```\n\n\u003c/details\u003e\n\n### Migrations\n\nMigrations reside in `database/migrations` and they need to be packed with go-bindata before being usable.\nTo repack migrations you can use:\n\n```shell\nmake migration\n```\n\n### Testing\n\nTo test, run:\n\n```shell\n# set your github personal access token (scopes 'read:org', 'repo')\nexport GITHUB_TOKEN=\u003cxxx\u003e\n\n# start the database if not already running\nexport POSTGRES_USER=user\nexport POSTGRES_PASSWORD=password\nexport POSTGRES_DB=ghsync\ndocker-compose up -d\n\n# run the tests\nexport PSQL_USER=${POSTGRES_USER}\nexport PSQL_PWD=${POSTGRES_PASSWORD}\nexport PSQL_DB=${POSTGRES_DB}\ngo test ./...\n```\n\nand for coverage information on all the packages, run:\n\n```shell\ngo test -coverpkg=./... -coverprofile=coverage.out ./...\ngo tool cover -html=coverage.out\n```\n\n\n## Contribute\n\n[Contributions](https://github.com/src-d/metadata-retrieval/issues) are more than welcome. 
As with all source{d} projects, this project follows the\n[source{d} Contributing Guidelines](https://github.com/src-d/guide/blob/master/engineering/documents/CONTRIBUTING.md).\n\n\n## Code of Conduct\n\nAll activities under source{d} projects are governed by the\n[source{d} code of conduct](https://github.com/src-d/guide/blob/master/.github/CODE_OF_CONDUCT.md).\n\n\n## License\n\nApache License Version 2.0, see [LICENSE](LICENSE.md).\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsrc-d%2Fmetadata-retrieval","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsrc-d%2Fmetadata-retrieval","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsrc-d%2Fmetadata-retrieval/lists"}