{"id":43223295,"url":"https://github.com/gnames/gntagger","last_synced_at":"2026-02-01T09:16:00.944Z","repository":{"id":57525572,"uuid":"108452373","full_name":"gnames/gntagger","owner":"gnames","description":"Finds names using gnfinder  package internally and assists in interactive curation of found scientific names","archived":false,"fork":false,"pushed_at":"2019-12-12T12:59:45.000Z","size":482,"stargazers_count":3,"open_issues_count":12,"forks_count":0,"subscribers_count":4,"default_branch":"master","last_synced_at":"2024-06-20T17:46:35.412Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/gnames.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2017-10-26T18:48:26.000Z","updated_at":"2019-12-12T12:59:47.000Z","dependencies_parsed_at":"2022-08-28T20:22:06.489Z","dependency_job_id":null,"html_url":"https://github.com/gnames/gntagger","commit_stats":null,"previous_names":[],"tags_count":6,"template":false,"template_full_name":null,"purl":"pkg:github/gnames/gntagger","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gnames%2Fgntagger","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gnames%2Fgntagger/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gnames%2Fgntagger/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gnames%2Fgntagger/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/gnames","download_url":"https://codeload.github.com/gnames/gntagger/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosy
ste.ms/api/v1/hosts/GitHub/repositories/gnames%2Fgntagger/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28974540,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-01T08:16:14.655Z","status":"ssl_error","status_checked_at":"2026-02-01T08:06:51.373Z","response_time":56,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2026-02-01T09:15:58.857Z","updated_at":"2026-02-01T09:16:00.938Z","avatar_url":"https://github.com/gnames.png","language":"Go","funding_links":[],"categories":[],"sub_categories":[],"readme":"# gntagger [![Doc Status][doc-img]][doc]\n\ngntagger does more than find scientific names in a document. It also lets the user\ngo through each found name, see it in the context of the text, and then accept or\nreject it.\n\nWe made this program to improve the quality of the name-finding algorithm,\nbut it is useful for anybody who needs to extract scientific names from a book\nor a scientific paper. The program works on MS Windows, Mac, and Linux and\nruns from a command line interface -- CMD on Windows, or a terminal\non Mac and Linux.\n\ngntagger allows you to curate 4000 names spread over 600 pages in about 2 hours. 
It\nis significantly faster than curation in a text editor or PDF viewer.\n\n[![Ascii Cast][asciicast-img]][asciicast]\n\n## Installation\n\nThe program is just an executable file that runs from a command line. Download the\n[latest zip or tar file][releases] for your operating system, extract the file\nand place it somewhere in your `PATH`, so it is visible to your system.\n\n## Conversion of PDF to text\n\ngntagger works with plain text, so if you need to find names in a PDF file,\nyou first need to convert it to text.\n\n### Linux\n\nUsually you can just use the `less` command.\n\n```\nless paper1.pdf | gntagger\n```\n\nAnother option is `pdftotext` from the xpdf package.\n\n### Mac\n\nUse the `xpdf` package:\n\n```bash\nbrew install Caskroom/cask/xquartz\nbrew install xpdf\npdftotext -layout doc.pdf doc.txt\n```\n\n### Windows\n\nDownload [Xpdf tools][xpdf-tools], unzip them, and use `pdftotext.exe`:\n\n```\npdftotext.exe -layout doc.pdf doc.txt\n```\n\n## Usage\n\nTo find out the version:\n\n```bash\ngntagger -version\ngntagger -V\n```\n\nTo get names from a file (the processed text and the list of names will be saved in the\nsame directory as the text file):\n\n```bash\ngntagger file_with_names.txt\n\n# on Windows\ngntagger.exe file_with_names.txt\n```\n\nTo get names from standard input:\n\n```\n# linux\n\nless file.pdf | gntagger\nless file.pdf | gntagger -bayes\n\n# mac (the trailing \"-\" makes pdftotext write to standard output)\n\npdftotext -layout file.pdf - | gntagger\npdftotext -layout file.pdf - | gntagger -bayes\n```\n\nNote that the `-layout` flag for pdftotext tries to preserve the original structure of\nthe text, as it was in the original PDF. It significantly increases the chances of\nfinding names that are split between the end of one line and the start of the next.\n\n## User Interface\n\nThe user interface of the program consists of two panels. The left panel\ncontains detected scientific names, with a \"current name\" located in the middle\nof the screen and highlighted. 
The right panel contains the full text, where\nthe \"current name\" is highlighted and aligned with the \"current name\" in the\nleft panel.\n\nThe program is designed to move through the names quickly. Navigate to the\nnext/previous name in the left panel using the Right/Left arrow keys. All names\nhave an empty annotation at the beginning. Pressing the Right Arrow key\nautomatically \"accepts\" the found name if the annotation is empty. Other keys\nlet you annotate the \"current name\" differently:\n\n* Space: rejects a name with \"NotName\" annotation\n\n* 'y':   re-accepts mistakenly rejected name with \"Accepted\" annotation\n\n* 'u':   marks a name as \"Uninomial\"\n\n* 'g':   marks a name as \"Genus\"\n\n* 's':  marks a name as \"Species\"\n\n* Ctrl-C: saves curation and exits application\n\n* Ctrl-S: saves curations made so far\n\n**Current names are saved to clipboard automatically**, so it is easy to paste\nthem into a browser, spreadsheet, database, or text editor.\n\nThe program autosaves the results of curation. 
If the program crashes or is exited,\nthe user can continue curation from the last point instead of starting from\nscratch.\n\n## Development\n\n### Running tests\n\nInstall ginkgo, a BDD testing framework for Go.\n\n```bash\ngo get github.com/onsi/ginkgo/ginkgo\ngo get github.com/onsi/gomega\n```\n\nTo run tests, go to the root directory of the project and run\n\n```bash\nginkgo\n\n#or\n\ngo test\n```\n\n### Build executable\n\n```bash\ngo build -ldflags \"-X main.buildstamp=`date -u '+%Y-%m-%d_%I:%M:%S%p'` \\\n                   -X main.githash=`git rev-parse HEAD | cut -c1-7` \\\n                   -X main.gittag=`git describe --tags`\" \\\n         -o gntagger -a cmd/gntagger/main.go\n```\n\n\n[asciicast-img]: https://asciinema.org/a/wNfIt2TfZiyrAwJZKhuq5DkHV.png\n[asciicast]: https://asciinema.org/a/wNfIt2TfZiyrAwJZKhuq5DkHV\n[doc-img]: https://godoc.org/github.com/gnames/gntagger?status.png\n[doc]: https://godoc.org/github.com/gnames/gntagger\n[xpdf-tools]: https://www.xpdfreader.com/download.html\n[releases]: https://github.com/gnames/gntagger/releases/latest\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgnames%2Fgntagger","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fgnames%2Fgntagger","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgnames%2Fgntagger/lists"}