{"id":31908989,"url":"https://github.com/allenai/nlpstack","last_synced_at":"2025-10-13T16:00:10.551Z","repository":{"id":57738371,"uuid":"13838096","full_name":"allenai/nlpstack","owner":"allenai","description":"NLP toolkit (tokenizer, POS-tagger, parser, etc.)","archived":false,"fork":false,"pushed_at":"2017-04-08T00:10:40.000Z","size":1753,"stargazers_count":42,"open_issues_count":19,"forks_count":10,"subscribers_count":155,"default_branch":"master","last_synced_at":"2024-04-14T07:49:59.509Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Scala","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/allenai.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2013-10-24T16:43:22.000Z","updated_at":"2023-09-08T19:21:27.000Z","dependencies_parsed_at":"2022-08-24T14:42:27.082Z","dependency_job_id":null,"html_url":"https://github.com/allenai/nlpstack","commit_stats":null,"previous_names":[],"tags_count":41,"template":false,"template_full_name":null,"purl":"pkg:github/allenai/nlpstack","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/allenai%2Fnlpstack","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/allenai%2Fnlpstack/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/allenai%2Fnlpstack/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/allenai%2Fnlpstack/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/allenai","download_url":"https://codeload.github.com/allenai/nlpstack/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/allen
ai%2Fnlpstack/sbom","scorecard":{"id":185281,"data":{"date":"2025-08-11","repo":{"name":"github.com/allenai/nlpstack","commit":"b41ac75f093842485a24d6540ed417964e85c2fb"},"scorecard":{"version":"v5.2.1-40-gf6ed084d","commit":"f6ed084d17c9236477efd66e5b258b9d4cc7b389"},"score":3.3,"checks":[{"name":"Code-Review","score":5,"reason":"Found 5/10 approved changesets -- score normalized to 5","details":null,"documentation":{"short":"Determines if the project requires human code review before pull requests (aka merge requests) are merged.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#code-review"}},{"name":"Dangerous-Workflow","score":-1,"reason":"no workflows found","details":null,"documentation":{"short":"Determines if the project's GitHub Action workflows avoid dangerous patterns.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#dangerous-workflow"}},{"name":"Packaging","score":-1,"reason":"packaging workflow not detected","details":["Warn: no GitHub/GitLab publishing workflow detected."],"documentation":{"short":"Determines if the project is published as a package that others can easily download, install, easily update, and uninstall.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#packaging"}},{"name":"Token-Permissions","score":-1,"reason":"No tokens found","details":null,"documentation":{"short":"Determines if the project's workflows follow the principle of least privilege.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#token-permissions"}},{"name":"Maintained","score":0,"reason":"0 commit(s) and 0 issue activity found in the last 90 days -- score normalized to 0","details":null,"documentation":{"short":"Determines if the project is \"actively 
maintained\".","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#maintained"}},{"name":"Binary-Artifacts","score":10,"reason":"no binaries found in the repo","details":null,"documentation":{"short":"Determines if the project has generated executable (binary) artifacts in the source repository.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#binary-artifacts"}},{"name":"Pinned-Dependencies","score":-1,"reason":"no dependencies found","details":null,"documentation":{"short":"Determines if the project has declared and pinned the dependencies of its build process.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#pinned-dependencies"}},{"name":"CII-Best-Practices","score":0,"reason":"no effort to earn an OpenSSF best practices badge detected","details":null,"documentation":{"short":"Determines if the project has an OpenSSF (formerly CII) Best Practices Badge.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#cii-best-practices"}},{"name":"Vulnerabilities","score":10,"reason":"0 existing vulnerabilities detected","details":null,"documentation":{"short":"Determines if the project has open, known unfixed vulnerabilities.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#vulnerabilities"}},{"name":"License","score":0,"reason":"license file not detected","details":["Warn: project does not have a license file"],"documentation":{"short":"Determines if the project has defined a license.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#license"}},{"name":"Security-Policy","score":0,"reason":"security policy file not detected","details":["Warn: no security policy file detected","Warn: no security file to analyze","Warn: no security file to analyze","Warn: no 
security file to analyze"],"documentation":{"short":"Determines if the project has published a security policy.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#security-policy"}},{"name":"Fuzzing","score":0,"reason":"project is not fuzzed","details":["Warn: no fuzzer integrations found"],"documentation":{"short":"Determines if the project uses fuzzing.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#fuzzing"}},{"name":"Signed-Releases","score":-1,"reason":"no releases found","details":null,"documentation":{"short":"Determines if the project cryptographically signs release artifacts.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#signed-releases"}},{"name":"Branch-Protection","score":0,"reason":"branch protection not enabled on development/release branches","details":["Warn: branch protection not enabled for branch 'master'"],"documentation":{"short":"Determines if the default and release branches are protected with GitHub's branch protection settings.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#branch-protection"}},{"name":"SAST","score":0,"reason":"SAST tool is not run on all commits -- score normalized to 0","details":["Warn: 0 commits out of 26 are checked with a SAST tool"],"documentation":{"short":"Determines if the project uses static code 
analysis.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#sast"}}]},"last_synced_at":"2025-08-16T19:42:06.452Z","repository_id":57738371,"created_at":"2025-08-16T19:42:06.452Z","updated_at":"2025-08-16T19:42:06.452Z"},"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":279015953,"owners_count":26085777,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-13T02:00:06.723Z","response_time":61,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-10-13T15:59:54.170Z","updated_at":"2025-10-13T16:00:10.529Z","avatar_url":"https://github.com/allenai.png","language":"Scala","funding_links":[],"categories":["Artificial Intelligence"],"sub_categories":["Natural Language Processing"],"readme":"# NLP Stack\n\nThis contains our basic stack of NLP tools. You can play with them [here](http://nlpstack.dev.allenai.org:8062/tools.html).\n\nEach tool sits behind a general interface, so the inputs and outputs of\nevery tool are clearly defined and the underlying implementation can be\nswapped out.\n\nEach tool also has a serialization format for its output.  For example, there\nis a dependency string format and a chunked sentence string format.\n\n## Getting started\n\n1.  Add NLPStack to your dependencies. NLPStack comes as a collection of multiple tools (see below). To declare dependencies, you can use this code in your Build.scala file:\n\n    ```scala\n    libraryDependencies += \"org.allenai.nlpstack\" %% \"nlpstack-core\" % \"0.x\"\n\n    libraryDependencies += \"org.allenai.nlpstack\" %% \"nlpstack-parse\" % \"0.x\"\n\n    libraryDependencies += \"org.allenai.nlpstack\" %% \"nlpstack-postag\" % \"0.x\"\n    ```\n    Alternatively, you can define a helper function for the various NLPStack components and use it like this:\n    ```scala\n    def nlpstackModule(id: String) = \"org.allenai.nlpstack\" %% s\"nlpstack-${id}\" % \"0.x\"\n\n    libraryDependencies += nlpstackModule(\"parse\")\n    ```\n\n2.  Start using NLPStack. Here is a quick code snippet that parses a sentence:\n\n    ```scala\n    import org.allenai.nlpstack.tokenize.defaultTokenizer\n    import org.allenai.nlpstack.postag.defaultPostagger\n    import org.allenai.nlpstack.parse.defaultDependencyParser\n\n    /* ... */\n\n    val tokens = defaultTokenizer.tokenize(\n      \"I was wondering why the ball kept getting bigger and bigger, and then it hit me.\")\n    val postaggedTokens = defaultPostagger.postagTokenized(tokens)\n    val dependencyGraph = defaultDependencyParser.dependencyGraphPostagged(postaggedTokens)\n    ```\n\n## Folder Layout\n\n1.  tools: this project contains the main Nlpstack code.\n2.  webapp: a web application for running tools and visualizing serialized\n    representations.\n\n## Tools in the Kit\n\nPresently, NLP Stack includes the following tools.\n\n1.  **Tokenizer**.  Break a sentence into \"word\" tokens.\n2.  **Lemmatizer**.  Associate a base form with a token or a Part-of-Speech (POS) tagged token.  The results will be more accurate if POS tags are available.\n3.  **Postagger**.  Associate a POS tag with a token.\n4.  **Chunker**.  Associate chunk ranges with POS-tagged tokens.\n5.  **Dependency Parser**.  Construct dependencies between POS-tagged tokens.\n6.  **Segmenter**.  Split a body of text into sentences.\n\nEach tool includes:\n\n* An API so it can be called programmatically.\n* A CLI application so it can be run in batch mode.\n* A simple REST server so it can be called remotely.\n\n## Tool Subprojects\n\nNlpstack is split up into multiple subprojects to minimize the number of\ndependencies needed to install components. The source for each of these is in\n`tools/${projectName}`.\n\n* `tools-core`: This contains all of the APIs needed for interoperating with Nlpstack, but none of the implementations.\n* `tools-segment`: Implementation of the segmenter. Depends on `core`.\n* `tools-lemmatize`: Implementation of the lemmatizer. Depends on `core`.\n* `tools-tokenize`: Implementation of the tokenizer. Depends on `core`.\n* `tools-postag`: Implementation of the POS tagger. Depends on `tokenize`.\n* `tools-chunk`: Implementation of the sentence chunker. Depends on `postag`.\n* `tools-parse`: Implementation of the dependency parser. Depends on `postag`.\n\nEach of these produces a single artifact named `nlpstack-${projectName}`.\nClients should depend on every implementation they use, as well as on `nlpstack-core`.\n\nThese all use the group `org.allenai.nlpstack`.\n\nSo, to use the tokenizer, you would declare these dependencies (in sbt):\n\n```scala\n\"org.allenai.nlpstack\" %% \"nlpstack-core\" % \"2014.6.23-1-SNAPSHOT\"\n\"org.allenai.nlpstack\" %% \"nlpstack-tokenize\" % \"2014.6.23-1-SNAPSHOT\"\n```\n\nThe current version is in [version.sbt](version.sbt).\n\n### Parsing API Details\n\nThe example in \"Getting Started\" shows how to generate a\n[dependency graph](https://github.com/allenai/nlpstack/blob/master/tools/core/src/main/scala/org/allenai/nlpstack/core/parse/graph/DependencyGraph.scala)\nfrom a sentence. The graph object itself contains [dependency nodes](https://github.com/allenai/nlpstack/blob/master/tools/core/src/main/scala/org/allenai/nlpstack/core/parse/graph/DependencyNode.scala)\nwith integer IDs. These IDs can be used to index the original tokens given to the parser.\n\nIf you want lemmatized token information, run the tokens through a lemmatizer:\n```scala\nimport org.allenai.nlpstack.lemmatize.MorphaStemmer\n\nval lemmatizer = new MorphaStemmer()\nval lemmatizedTokens = postaggedTokens map { lemmatizer.lemmatizePostaggedToken }\n```\n\nOnce you have lemmatized tokens, you can build a new dependency graph with token information contained in the nodes:\n```scala\nval dependencyGraphWithTokenInfo = dependencyGraph.tokenized(lemmatizedTokens)\n```\n\n## Releasing new versions\n\nThis project releases to Maven Central rather than to our own repository. To do this, you need a bit of setup.\n\n 1. You need the signing keys to publish software with. You can find them in the `ai2-secure` bucket in S3 under the key `Sonatype Key Pair.zip`. Copy that file to `~/.sbt/gpg/` and extract it there.\n 2. You need the passphrase for that key pair. It's defined as an array, which is a little weird, and goes into another location in `~/.sbt`. The line defining it is in `passwords.txt` in the `ai2-secure` bucket. Copy that line into `~/.sbt/0.13/allenai.sbt` (or into some other `.sbt` if you like).\n 3. To use the passphrase, we have to enable the `sbt-pgp` plugin. Put the following line into `~/.sbt/0.13/plugins/gpg.sbt`: `addSbtPlugin(\"com.jsuereth\" % \"sbt-pgp\" % \"1.0.0\")`\n 4. We also need credentials for the Sonatype repository. We get those with the following line in `~/.sbt/0.13/sonatype.sbt`: `credentials += Credentials(\"Sonatype Nexus Repository Manager\", \"oss.sonatype.org\", \"allenai-role\", \"\u003cpassword\u003e\")`. You find this password in the same `passwords.txt` file mentioned above.\n\nNow, you need to register your GPG key.\n\n1. Start SBT in the nlpstack project.\n2. At the SBT prompt, type:\n\n   ```bash\n   \u003e pgp-cmd send-key [TAB]\n   Paul Allen Institute for Artificial Intelligence \u003caccount\u003e\n   abcdefg\n   ```\n \n   When you hit [TAB], SBT should print out the available key and its ID on the second line (in the example above, `abcdefg`). Enter the ID:\n \n   ```bash\n   \u003e pgp-cmd send-key abcdefg hkp://keyserver.ubuntu.com [ENTER]\n   ```\n\nWith this, you should be ready to run `sbt release` on the nlpstack project. When you do, it will upload the build artifacts to a staging repository on http://oss.sonatype.org. When it's done, you have to go there and first close, and then release, the staging repository. That initiates the upload to Maven Central, which will take about 10 minutes.\n\n 1. Go to http://oss.sonatype.org.\n 2. Log in with username `allenai-role`, and the password from the `passwords.txt` file. This is the same password you used in step 4 above.\n 3. Click \"staging repositories\" on the left.\n 4. Use the search bar at the top right to search for \"allenai\".\n 5. Find your staging repository and confirm that it has the contents you expect. Then, select it and click \"Close\". Closing takes a few minutes. Then you can see how the closing process went under \"Activity\". It sends an email to `dev-role@allenai.org` when it's done.\n 6. When it is done, select the repository again and hit \"Release\".\n 7. You should see the new version appear under https://oss.sonatype.org/content/repositories/releases/org/allenai/nlpstack/\n\nYou are done!\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fallenai%2Fnlpstack","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fallenai%2Fnlpstack","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fallenai%2Fnlpstack/lists"}