{"id":26903201,"url":"https://github.com/stardog-union/bites-corenlp","last_synced_at":"2025-06-12T03:09:35.432Z","repository":{"id":33860114,"uuid":"134730922","full_name":"stardog-union/bites-corenlp","owner":"stardog-union","description":"CoreNLP extractors for Stardog","archived":false,"fork":false,"pushed_at":"2022-11-15T19:02:37.000Z","size":94,"stargazers_count":6,"open_issues_count":0,"forks_count":2,"subscribers_count":25,"default_branch":"master","last_synced_at":"2025-03-30T07:32:28.422Z","etag":null,"topics":["corenlp","entity-linking","relation-extraction","stardog"],"latest_commit_sha":null,"homepage":"","language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/stardog-union.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2018-05-24T14:58:14.000Z","updated_at":"2025-02-10T12:21:46.000Z","dependencies_parsed_at":"2022-09-09T14:50:55.392Z","dependency_job_id":null,"html_url":"https://github.com/stardog-union/bites-corenlp","commit_stats":null,"previous_names":[],"tags_count":4,"template":false,"template_full_name":null,"purl":"pkg:github/stardog-union/bites-corenlp","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/stardog-union%2Fbites-corenlp","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/stardog-union%2Fbites-corenlp/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/stardog-union%2Fbites-corenlp/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/stardog-union%2Fbites-corenlp/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/stardog-union","download_url":
"https://codeload.github.com/stardog-union/bites-corenlp/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/stardog-union%2Fbites-corenlp/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":259388089,"owners_count":22849755,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["corenlp","entity-linking","relation-extraction","stardog"],"created_at":"2025-04-01T10:05:22.329Z","updated_at":"2025-06-12T03:09:35.371Z","avatar_url":"https://github.com/stardog-union.png","language":"Java","funding_links":[],"categories":[],"sub_categories":[],"readme":"This project implements three [custom RDF extractors](https://www.stardog.com/docs/#_custom_extractors) based on Stanford's [CoreNLP](https://stanfordnlp.github.io/CoreNLP/) library.\n\n## CoreNLPMentionRDFExtractor\n\nExtracts named entities mentions, with the same output format as Stardog's `entities` [extractor](https://www.stardog.com/docs/#_entities).\n\n## CoreNLPEntityLinkerRDFExtractor\n\nExtracts and links entity mentions to existing resources in a knowledge graph. Same output format as Stardog's `linker` [extractor](https://www.stardog.com/docs/#_linker).\n\n## CoreNLPRelationRDFExtractor\n\nExtracts relations between named entity mentions. 
For example, the sentence:\n\n`The Orioles are a professional baseball team based in Baltimore`\n\nWill generate three triples:\n\n```\nentity:e435cd0347642bc7d2736155815a54e2 rdfs:label \"Orioles\"\nentity:eb3cdb4e267d28feebb638711f8bd7b1 rdfs:label \"Baltimore\"\niri:e435cd0347642bc7d2736155815a54e2 relation:org:city_of_headquarters iri:eb3cdb4e267d28feebb638711f8bd7b1\n```\n\n# Usage\n\n1. Download the [latest release](https://github.com/stardog-union/bites-corenlp/releases)\n2. Add the jar to Stardog's [classpath](https://www.stardog.com/docs/#_extending_stardog):\n\t* Copy it to `server/ext` or another folder in the server (e.g., `server/dbms`)\n\t* OR\n\t* Point the environment variable `STARDOG_EXT` to its folder\n3. Restart the Stardog server\n4. `CoreNLPMentionRDFExtractor`, `CoreNLPEntityLinkerRDFExtractor`, and `CoreNLPRelationRDFExtractor` will be available as [RDF extractors](https://www.stardog.com/docs/#_unstructured_data), accessible through the CLI, API, and HTTP interfaces\n\nFor example, using the CLI, if you want to add a document to BITES and extract its entities:\n\n```bash\nstardog doc put --rdf-extractors CoreNLPMentionRDFExtractor myDatabase document.pdf\n```\n\nCoreNLP models can consume large amounts of system memory. If greeted with a `GC overhead limit exceeded` error when using any of the extractors, increase the amount of [memory available](https://www.stardog.com/docs/#_memory_usage) to Stardog.\n\n## Advanced Usage\n\n1. Tweak `build.gradle` to the [language of your choice](https://stanfordnlp.github.io/CoreNLP/download.html) (e.g., change CoreNLP dependency to `models-spanish`)\n2. Run `gradlew clean shadowJar` for a single jar, or `gradlew clean copyDeps` for individual dependencies\n3. Add files in `build/libs` to Stardog's [classpath](https://www.stardog.com/docs/#_extending_stardog)\n4. 
Restart the Stardog server\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fstardog-union%2Fbites-corenlp","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fstardog-union%2Fbites-corenlp","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fstardog-union%2Fbites-corenlp/lists"}