{"id":27881865,"url":"https://github.com/src-d/code-annotation","last_synced_at":"2025-05-05T05:05:54.045Z","repository":{"id":65594708,"uuid":"117172329","full_name":"src-d/code-annotation","owner":"src-d","description":"🐈 Code Annotation Tool","archived":false,"fork":false,"pushed_at":"2019-10-08T13:24:13.000Z","size":10479,"stargazers_count":28,"open_issues_count":39,"forks_count":26,"subscribers_count":11,"default_branch":"master","last_synced_at":"2025-05-05T05:05:47.632Z","etag":null,"topics":["annotations","labeling"],"latest_commit_sha":null,"homepage":"","language":"JavaScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/src-d.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2018-01-12T00:52:13.000Z","updated_at":"2023-12-05T08:48:55.000Z","dependencies_parsed_at":"2023-01-31T01:15:28.489Z","dependency_job_id":null,"html_url":"https://github.com/src-d/code-annotation","commit_stats":null,"previous_names":[],"tags_count":13,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/src-d%2Fcode-annotation","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/src-d%2Fcode-annotation/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/src-d%2Fcode-annotation/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/src-d%2Fcode-annotation/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/src-d","download_url":"https://codeload.github.com/src-d/code-annotation/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https:
//github.com","kind":"github","repositories_count":252442486,"owners_count":21748451,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["annotations","labeling"],"created_at":"2025-05-05T05:05:53.403Z","updated_at":"2025-05-05T05:05:54.037Z","avatar_url":"https://github.com/src-d.png","language":"JavaScript","readme":"[![Build Status](https://travis-ci.org/src-d/code-annotation.svg)](https://travis-ci.org/src-d/code-annotation)\n![unstable](https://svg-badge.appspot.com/badge/stability/unstable?a)\n\n# Source Code Annotation Tool\n\nTraining machine learning models often requires large, properly annotated datasets.\nThe nature of these annotations varies with the dataset: they can be\nthe digit to be recognized in the [MNIST dataset](http://yann.lecun.com/exdb/mnist/),\nthe coordinates of the bounding box around each object in an object detection problem, and so on.\n\nThis tool provides a simple UI to add annotations to existing datasets, a command-line tool\nto fetch more elements to be annotated, and an export mechanism.\n\nCurrently, the project provides a single example: labeling two pieces of code\nas identical, similar, or different.\n\nThe Source Code Annotation Tool offers a UI to annotate source code and review those annotations, and a CLI to define the code to be annotated and to export the annotations.\n\n![Screenshot](.github/screenshot.png?raw=true)\n\n## Installation\n\n### Environment Variables\n\nThe following sections use several environment variables to configure the application. 
The table below lists all of them as a quick reference:\n\n| Variable | Required | Default value | Meaning |\n| --- | --- | --- | --- |\n| `CAT_JWT_SIGNING_KEY` | Yes | - | Key used by the server to sign JWTs (JSON Web Tokens) |\n| `CAT_OAUTH_CLIENT_ID` | Yes | - | GitHub application [OAuth credentials](#github-oauth-tokens) |\n| `CAT_OAUTH_CLIENT_SECRET` | Yes | - | GitHub application [OAuth credentials](#github-oauth-tokens) |\n| `CAT_OAUTH_RESTRICT_ACCESS` | No | - | [Application access control](#access-control) based on GitHub organizations or teams |\n| `CAT_OAUTH_RESTRICT_REQUESTER_ACCESS` | No | - | [User role control](#access-control) based on GitHub organizations or teams |\n| `CAT_HOST` | No | `0.0.0.0` | IP address the HTTP server binds to |\n| `CAT_PORT` | No | `8080` | Port the HTTP server binds to |\n| `CAT_SERVER_URL` | No | `\u003cCAT_HOST\u003e:\u003cCAT_PORT\u003e` | URL used to access the application (that is, its public hostname) |\n| `CAT_DB_CONNECTION` | No | `sqlite:///var/code-annotation/internal.db` | Connection string (DSN) for the internal application database. See [Importing and Exporting Data](#importing-and-exporting-data) for the complete syntax |\n| `CAT_EXPORTS_PATH` | No | `./exports` | Directory where SQLite export files are created when requested from `http://\u003cyour-hostname\u003e/export` |\n| `CAT_ENV` | No | `production` | Sets the log level. Use `dev` to enable debug log messages |\n\n### GitHub OAuth Tokens\n\nTo authenticate users with their GitHub accounts, you need to set up an OAuth application on GitHub. See GitHub's documentation on [creating an OAuth app](https://developer.github.com/apps/building-oauth-apps/creating-an-oauth-app/). 
Make sure the \"Authorization callback URL\" points to `http://\u003cyour-hostname\u003e/oauth-callback`.\n\nRetrieve your application's Client ID and Client Secret from the [GitHub Developer Settings page](https://github.com/settings/developers) and assign them to the environment variables `CAT_OAUTH_CLIENT_ID` and `CAT_OAUTH_CLIENT_SECRET`.\n\n### Docker\n\n```bash\n$ docker run \\\n    -e CAT_OAUTH_CLIENT_ID=XXXX \\\n    -e CAT_OAUTH_CLIENT_SECRET=YYYY \\\n    -e CAT_JWT_SIGNING_KEY=ZZZZ \\\n    --rm -p 8080:8080 srcd/code-annotation\n```\n\n### Without Docker\n\nDownload the binary for your platform from the [releases page](https://github.com/src-d/code-annotation/releases).\n\n## Importing and Exporting Data\n\n### Import File Pairs for Annotation\n\nThe pieces of code to be labeled are called _file pairs_. They must be provided in an [SQLite](https://sqlite.org/) database that **must follow the expected schema**; see [this example](./cli/import/examples/example.sql).\n\nThe `import` command loads those file pairs into the [SQLite](https://sqlite.org/) or [PostgreSQL](https://www.postgresql.org/) database used internally by the Annotation Tool. 
The destination database does not need to be empty; newly imported file pairs are added to those from previous imports.\n\n_Note_: duplicate entries are not filtered, so running the same import more than once results in repeated rows.\n\nRun it as follows:\n\n```bash\n$ import \u003cpath-to-sqlite.db\u003e \u003cdestination-DSN\u003e\n```\n\nThe destination `DSN` (Data Source Name) argument must take one of these forms:\n\n* `sqlite:///path/to/db.db`\n* `postgresql://[user[:password]@][netloc][:port][,...][/dbname]`\n\nSome usage examples:\n\n```bash\n$ import ./input.db sqlite:///home/user/internal.db\nImported 989 file pairs successfully\n\n$ import /home/user/input.db postgres://testing:testing@localhost:5432/input?sslmode=disable\nImported 562 file pairs successfully\n```\n\nFor the complete PostgreSQL connection string reference, see the [documentation for the lib/pq Go package](https://godoc.org/github.com/lib/pq#hdr-Connection_String_Parameters).\n\n#### Set the Internal Database Connection\n\nBefore starting the application, set the `CAT_DB_CONNECTION` environment variable to point to the database populated by the `import` command.\n\nThis variable uses the same `DSN` format as the `import` command and can point to an SQLite or PostgreSQL database.\n\nSome examples:\n\n```\nCAT_DB_CONNECTION=sqlite:///home/user/internal.db\n```\n\n```\nCAT_DB_CONNECTION=postgres://testing:testing@localhost:5432/input?sslmode=disable\n```\n\n### Export Annotation Results\n\nTo work with the annotation results, use the `export` command to extract the internal data into a new SQLite database.\n\n```bash\n$ export \u003corigin-DSN\u003e \u003cpath-to-sqlite.db\u003e\n```\n\nThe DSN argument uses the same format as in the `import` command (see the previous section).\n\nHere, the origin is the internal database and the destination is the new SQLite database. 
This new database has the same contents as the internal one.\n\nYou can also download the results database from the web interface by visiting:\n\n```\nhttp://\u003cyour-hostname\u003e/export\n```\n\nThe annotations made by users are stored in the **`assignments`** table.\n\n## Access Control\n\nYou can restrict access and choose each user's role based on membership of a specific GitHub [organization](https://help.github.com/articles/collaborating-with-groups-in-organizations/) or [team](https://help.github.com/articles/organizing-members-into-teams/).\n\nAccess control is optional. If you don't set any restrictions, every user with a valid GitHub account can log in as a Requester. You can also restrict only the Requester role and leave Worker access open to anyone.\n\nTo do so, set one or both of the following environment variables:\n\n* `CAT_OAUTH_RESTRICT_ACCESS`: restricts who can access the application\n* `CAT_OAUTH_RESTRICT_REQUESTER_ACCESS`: restricts who is granted the Requester role\n\nBoth variables accept a string of the form `org:\u003corganization-name\u003e` or `team:\u003cteam-id\u003e`. For example:\n\n```bash\nCAT_OAUTH_RESTRICT_ACCESS=org:my-organization\nCAT_OAUTH_RESTRICT_REQUESTER_ACCESS=team:123456\n```\n\n## source{d} Internal Deployment\n\nThis application is deployed to the source{d} `production` and `staging` environments following our [web application deployment workflow](https://github.com/src-d/guide/blob/master/engineering/continuous-delivery.md).\n\n## Contributing\n\n[Contributions](https://github.com/src-d/code-annotation/issues) are more than welcome. If you are interested, please take a look at our [Contributing Guidelines](CONTRIBUTING.md). 
See the [development section](CONTRIBUTING.md#Development) of the guidelines for instructions on running the tool locally for development.\n\n## Code of Conduct\n\nAll activities under source{d} projects are governed by the [source{d} code of conduct](CODE_OF_CONDUCT.md).\n\n## License\n\nGPLv3; see [LICENSE](LICENSE).\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsrc-d%2Fcode-annotation","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsrc-d%2Fcode-annotation","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsrc-d%2Fcode-annotation/lists"}