{"id":13657795,"url":"https://github.com/dexter/dexter","last_synced_at":"2025-04-24T08:30:44.841Z","repository":{"id":6842951,"uuid":"8091481","full_name":"dexter/dexter","owner":"dexter","description":"Dexter is a framework that implements some popular algorithms and provides all the tools needed to develop any entity linking technique.","archived":false,"fork":false,"pushed_at":"2017-04-09T22:29:45.000Z","size":17631,"stargazers_count":205,"open_issues_count":25,"forks_count":55,"subscribers_count":24,"default_branch":"master","last_synced_at":"2024-11-10T11:38:50.677Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"http://www.dxtr.it","language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/dexter.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2013-02-08T10:20:48.000Z","updated_at":"2024-10-15T02:30:48.000Z","dependencies_parsed_at":"2022-09-09T06:21:31.189Z","dependency_job_id":null,"html_url":"https://github.com/dexter/dexter","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dexter%2Fdexter","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dexter%2Fdexter/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dexter%2Fdexter/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dexter%2Fdexter/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/dexter","download_url":"https://codeload.github.com/dexter/dexter/tar.gz/refs/heads/master","host":{"name":"GitHu
b","url":"https://github.com","kind":"github","repositories_count":250591889,"owners_count":21455452,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-02T05:00:51.115Z","updated_at":"2025-04-24T08:30:39.808Z","avatar_url":"https://github.com/dexter.png","language":"Java","funding_links":[],"categories":["Java","人工智能"],"sub_categories":[],"readme":"\n![Dexter](http://dexter.isti.cnr.it/static/images/dexter.png \"Dexter\")\n\nThe entity linking (aka Wikification) task aims \nat identifying all the small text fragments in a document referring \nto an entity contained in a given knowledge base, e.g., Wikipedia. \nThe annotation is usually organized in three tasks:\n\n1. Given an input \ndocument, the first task consists in discovering the fragments that could \nrefer to an entity.\n2. Since a mention could refer to multiple entities, \nit is necessary to perform a disambiguation step, where the correct entity \nis selected among the candidates.\n3. Finally, discovered entities are ranked \nby some measure of relevance.\n\nMany entity linking algorithms have been proposed, \nbut unfortunately only a few authors have released the source code or some APIs. \nAs a result, evaluating the performance of a method on a single subtask, \nor comparing different techniques, is difficult today.\n\nFor these reasons we implemented Dexter, a framework that implements \nsome popular algorithms and provides all the tools needed to develop \nany entity linking technique. 
We believe that a shared framework is \nfundamental to perform fair comparisons and improve the state of the art.\n\nFor more information about the team and the framework,\nplease refer to the [website](http://dexter.isti.cnr.it).\n\nA simple demo is also available on the website. The tagger used in the demo \nis our implementation of [TAGME](http://tagme.di.unipi.it). Please \nnote that some annotations could be different since the two frameworks use\ndifferent Wikipedia dumps and different methods for extracting the spots.\n\n# Ok, but I don't have time to understand how things _really_ work, I just want to use it\n\nHiha! Just download the model with the binaries (the model was generated from the [English Wikipedia dump 07/07/2014](http://dumps.wikimedia.org/enwiki/20140707/enwiki-20140707-pages-articles.xml.bz2)):\n\n    wget http://hpc.isti.cnr.it/~ceccarelli/dexter2.tar.gz\n\ttar -xvzf dexter2.tar.gz\n\tcd dexter2\n\tjava -Xmx4000m -jar dexter-2.1.0.jar\n\n\nThen visit [http://localhost:8080/dexter-webapp/dev](http://localhost:8080/dexter-webapp/dev) in your browser.\nIt will show the available REST API. Enjoy! \n\n# Actually I do have some time and I like compiling things.\n\nDexter's pom includes the [json-wikipedia](https://github.com/diegoceccarelli/json-wikipedia) project as a project dependency. The first thing you need to do is make sure that this is also pulled to the project's root. To do this, just use a command like `git pull --recurse-submodules` to be sure that all dependencies are downloaded.\nAfter that, a simple `mvn package` compiles the project. Note that the `maven` command line tool that comes bundled with some versions of Linux (e.g. Ubuntu) has a problem fetching dependencies for `org.apache.commons.lang`. To avoid this error, just make sure that you have an updated version of Maven (\u003e= `3.0.5` would do).\n\n# Cool, I want to know more!\n\nThe following sections describe in a bit more detail how the framework works. 
\n\n# Developing\n\nYou can use Dexter in several different ways: \n\n  * Using the REST API, after downloading the jar and its resources;\n  * Using the Java API;\n  * Using the JSONP API + jQuery plugin;\n  * Using the Python client.\n  \n  \n## Compiling and Installing\n\nIn order to install and compile Dexter just run the following commands: \n\n \n## Start a REST Server \n\n### Download the Resources\n\nClick on this [link](http://hpc.isti.cnr.it/~ceccarelli/dexter2.tar.gz) to download Dexter. \n\nThe archive requires around 2.5 gigabytes, and contains the \nDexter binary code (`dexter-2.1.0.jar`) and the \nmodel used by Dexter for annotating.\n\nThe current model is generated from the 07/07/2014 English \nWikipedia dump, available [here](http://dumps.wikimedia.org/enwiki/20140707/enwiki-20140707-pages-articles.xml.bz2) \n(we plan to release updated models for English and other languages). \n\nOnce the download is finished, untar the package, and from the directory `dexter2`, just run\n\n  \tjava -Xmx3000m -jar dexter-2.1.0.jar\n\n(you will need at least 3 GB of RAM and Java 7).\n\nThe framework should be available in a few seconds at the address:\n\n  \thttp://localhost:8080/\n\t\nThe REST API is available at: \n\t\n\thttp://localhost:8080/dexter-webapp/dev\n\nThe first query will take a while because Dexter has to load the whole model into main\nmemory. \n\n\t\n## Configuring Dexter\n\nDexter 2 is configured through an XML file, `dexter-conf.xml`. \nDon't worry, it is not too hard to understand ;). By default, Dexter \nsearches for the configuration file in the root directory. 
\n   \n### Set the Dexter model\n\nAt the beginning of the file:\n\n\t\u003cmodels\u003e\n\t\t\u003cdefault\u003een\u003c/default\u003e\n\t\t\u003cmodel\u003e\n\t\t\t\u003cname\u003een\u003c/name\u003e\n\t\t\t\u003cpath\u003eFIXME\u003c/path\u003e\n\t\t\u003c/model\u003e\n\t\t\u003cmodel\u003e\n\t\t\t\u003cname\u003eit\u003c/name\u003e\n\t\t\t\u003cpath\u003edata/it\u003c/path\u003e\n\t\t\u003c/model\u003e\n\t\u003c/models\u003e\n\nReplace `FIXME` with the absolute or relative path to the folder that contains the Dexter model. If you download \nthe model from the website, the folder is called `en-model-20140707`. Once you have set up the folder, start the server by running the command:\n\n      \tjava -Xmx3000m -jar dexter-2.1.0.jar\n\n    \n### Use the client\n\nOnce you have performed the installation, add the following dependency to your Maven \nproject:\n\n \t \u003cdependency\u003e\n \t \t\u003cgroupId\u003eit.cnr.isti.hpc\u003c/groupId\u003e\n  \t\t\u003cartifactId\u003edexter-client\u003c/artifactId\u003e\n  \t\t\u003cversion\u003e2.1.0\u003c/version\u003e\n  \t\u003c/dependency\u003e\n\nYou will then be able to call the REST API from your Java project \nusing the `DexterRestClient`, as in the following example: \n\n  \tDexterRestClient client = new DexterRestClient(\"http://localhost:8080/dexter-webapp/api/rest\");\n \tAnnotatedDocument ad = client\n  \t\t.annotate(\"Dexter is an American television drama series which debuted on Showtime on October 1, 2006. The series centers on Dexter Morgan (Michael C. Hall), a blood spatter pattern analyst for the fictional Miami Metro Police Department (based on the real life Miami-Dade Police Department) who also leads a secret life as a serial killer. Set in Miami, the show's first season was largely based on the novel Darkly Dreaming Dexter, the first of the Dexter series novels by Jeff Lindsay. It was adapted for television by screenwriter James Manos, Jr., who wrote the first episode. 
\");\n \t \n \tSystem.out.println(ad);\n  \t\n  \tSpottedDocument sd = client\n  \t\t.spot(\"Dexter is an American television drama series which debuted on Showtime on October 1, 2006. The series centers on Dexter Morgan (Michael C. Hall), a blood spatter pattern analyst for the fictional Miami Metro Police Department (based on the real life Miami-Dade Police Department) who also leads a secret life as a serial killer. Set in Miami, the show's first season was largely based on the novel Darkly Dreaming Dexter, the first of the Dexter series novels by Jeff Lindsay. It was adapted for television by screenwriter James Manos, Jr., who wrote the first episode. \");\n    \n    System.out.println(sd);\n  \t\n    ArticleDescription desc = client.getDesc(5981816);\n  \n    System.out.println(desc);\n  \nIf you have installed the code with `mvn install` and are running the server on your machine, you can create an instance of the client with this address:\n\n    DexterRestClient client = new DexterRestClient(\n  \t\t  \"http://localhost:8080/dexter-webapp/api/rest\");\n \n   \n### More details about the configuration file\n\nThe configuration file mainly contains details on the paths of the model files, and it lets you register plugins and configure the tagger.\n\nIn more detail, it lets you register: \n\n#### Spotters\n\nA spotter is the component that detects mentions in the text and maps each mention to a list of candidate entities. You can register a spotter inside the `\u003cspotters\u003e` tag:\n  \n ```\n \t\u003cspotters\u003e\n\t\t\u003cdefault\u003ewiki-dictionary\u003c/default\u003e\n\t\t\u003cspotter\u003e\n\t\t\t\u003cname\u003ewiki-dictionary\u003c/name\u003e\n\t\t\t\u003cclass\u003eit.cnr.isti.hpc.dexter.spotter.DictionarySpotter\u003c/class\u003e\n\t\t\u003c/spotter\u003e\n\t\u003c/spotters\u003e\n ```\n\nIn this example, the standard dictionary spotter is registered with the name `wiki-dictionary` and set as the default. You can also register other 
dictionaries, for example: \n\n ```\n \t\u003cspotters\u003e\n\t\t\u003cdefault\u003ewiki-dictionary\u003c/default\u003e\n\t\t\u003cspotter\u003e\n\t\t\t\u003cname\u003ewiki-dictionary\u003c/name\u003e\n\t\t\t\u003cclass\u003eit.cnr.isti.hpc.dexter.spotter.DictionarySpotter\u003c/class\u003e\n\t\t\u003c/spotter\u003e\n\t\t\u003cspotter\u003e\n\t\t\t\u003cname\u003emy-dictionary\u003c/name\u003e\n\t\t\t\u003cclass\u003ecom.mycompany.Myspotter\u003c/class\u003e\n\t\t\u003c/spotter\u003e\n\t\u003c/spotters\u003e\n ```\n\nHere we registered a new spotter with the name `my-dictionary`. You can create a new spotter by extending the abstract class `AbstractSpotter` and then adding it to the classpath of the Dexter webapp (you can put the jar in the lib folder or, more easily, install the jar with Maven and add the dependency in `dexter-webapp/pom.xml`). \n\nAt runtime, the REST functions allow you to specify the spotter that you want to use with the parameter `spt`. If you don't specify the name of the spotter, the `default` spotter is used. \n\n#### SpotFilters \nA spot filter allows you to filter the mentions produced by the Spotter before they are sent to the Disambiguator. For example, to filter out short spots or spots with low probability, you can write your own filters or use the filters provided by Dexter. As with spotters, filters must be registered before you can use them in a spotter. 
\nFor example: \n\n```\n\t\u003cspotFilters\u003e\n\t\t\u003cspotFilter\u003e\n\t\t\t\u003cname\u003eprobability-filter\u003c/name\u003e\n\t\t\t\u003cclass\u003eit.cnr.isti.hpc.dexter.spotter.filter.SpotProbabilityFilter\u003c/class\u003e\n\t\t\t\u003cparams\u003e\n\t\t\t\t\u003cparam\u003e\n\t\t\t\t\t\u003cname\u003elp\u003c/name\u003e\n\t\t\t\t\t\u003cvalue\u003e0.02\u003c/value\u003e\n\t\t\t\t\u003c/param\u003e\n\t\t\t\u003c/params\u003e\n\t\t\u003c/spotFilter\u003e \n    \u003c/spotFilters\u003e\n```\n\nRegisters a filter that removes mentions with low probability to be links to entities (note the parameter `lp`). \nYou could register two different filters: \n\n```\n\t\u003cspotFilters\u003e\n\t\t\u003cspotFilter\u003e\n\t\t\t\u003cname\u003ef0.02\u003c/name\u003e\n\t\t\t\u003cclass\u003eit.cnr.isti.hpc.dexter.spotter.filter.SpotProbabilityFilter\u003c/class\u003e\n\t\t\t\u003cparams\u003e\n\t\t\t\t\u003cparam\u003e\n\t\t\t\t\t\u003cname\u003elp\u003c/name\u003e\n\t\t\t\t\t\u003cvalue\u003e0.02\u003c/value\u003e\n\t\t\t\t\u003c/param\u003e\n\t\t\t\u003c/params\u003e\n\t\t\u003c/spotFilter\u003e \n\t\t\u003cspotFilter\u003e\n\t\t\t\u003cname\u003ef0.5\u003c/name\u003e\n\t\t\t\u003cclass\u003eit.cnr.isti.hpc.dexter.spotter.filter.SpotProbabilityFilter\u003c/class\u003e\n\t\t\t\u003cparams\u003e\n\t\t\t\t\u003cparam\u003e\n\t\t\t\t\t\u003cname\u003elp\u003c/name\u003e\n\t\t\t\t\t\u003cvalue\u003e0.5\u003c/value\u003e\n\t\t\t\t\u003c/param\u003e\n\t\t\t\u003c/params\u003e\n\t\t\u003c/spotFilter\u003e \n    \u003c/spotFilters\u003e\n```\nHere the first filter `f0.02` removes all the spots with link probability lower than 0.02, while the second `f0.5` removes all the spots with link probability lower than 0.5. 
\n\nYou can then attach several spot filters to a spotter: \n\n```\n\t\u003cspotter\u003e\n\t\t\t\u003cname\u003ewiki-dictionary\u003c/name\u003e\n\t\t\t\u003cclass\u003eit.cnr.isti.hpc.dexter.spotter.DictionarySpotter\u003c/class\u003e\n\t\t\t\u003cfilters\u003e\n\t\t\t\t\u003cfilter\u003e\n\t\t\t\t\t\u003cname\u003eprobability-filter\u003c/name\u003e\n\t\t\t\t\u003c/filter\u003e\n\t\t\t\t\u003cfilter\u003e\n\t\t\t\t\t\u003cname\u003eoverlaps-filter\u003c/name\u003e\n\t\t\t\t\u003c/filter\u003e\n\t\t\t\u003c/filters\u003e\n\t\u003c/spotter\u003e\n```\n\nThe `overlaps-filter` removes spots that overlap in the text. Filters are applied in the order they appear in the configuration. \n\n#### Disambiguation functions\n\nAs with spotters, you can register a disambiguation function and call it from the REST interface. \n\n\t\t\u003cdisambiguator\u003e\n\t\t\t\u003cname\u003etagme\u003c/name\u003e\n\t\t\t\u003cclass\u003eit.cnr.isti.hpc.tagme.Tagme\u003c/class\u003e\n\t\t\t\u003cparams\u003e\n\t\t\t\t\u003cparam\u003e\n\t\t\t\t\t\u003cname\u003ewindow-size\u003c/name\u003e\n\t\t\t\t\t\u003cvalue\u003e30\u003c/value\u003e\n\t\t\t\t\u003c/param\u003e\n\t\t\t\t\u003cparam\u003e\n\t\t\t\t\t\u003cname\u003eepsilon\u003c/name\u003e\n\t\t\t\t\t\u003cvalue\u003e0.7\u003c/value\u003e\n\t\t\t\t\u003c/param\u003e\n\t\t\t\u003c/params\u003e\n\t\t\u003c/disambiguator\u003e\n\t\t\nTagme and Wikiminer are provided as dependencies. In `dexter-core` you can find a disambiguator that always selects the most probable entity for a mention. 
\n\nIf you want to add a disambiguator, just implement the interface `Disambiguator` (in `dexter-core`).\n\nThe interface has two methods: \n\t\n\tpublic EntityMatchList disambiguate(DexterLocalParams localParams,\n\t\t\tSpotMatchList sml);\n\t\n\tpublic void init(DexterParams dexterParams,\n\t\t\tDexterLocalParams dexterModuleParams);\n\t\t\n\n`init` is called just once, when the disambiguator object is created. If there are params in the disambiguator snippet\n(as for tagme), they are passed in the `dexterModuleParams` variable. At run time, when you annotate a document, the parameters that you put in the POST/GET query are pushed into the `localParams` object, so you can play with the parameters \nof your disambiguator. This also works for spotters and spot filters. \n\n## Citation\n\nIf you use the Dexter framework, you must cite:\n\n\u003e Ceccarelli, D., Lucchese, C., Orlando, S., Perego, R., \u0026 Trani, S. \n\u003e Dexter: an open source framework for entity linking.\n\u003e *In Proceedings of the sixth international workshop on Exploiting semantic annotations in information retrieval (pp. 17-20). 
ACM.* DOI: http://dx.doi.org/10.1145/2513204.2513212\n\n\nBibtex format:\n\n\t@inproceedings{DBLP:conf/cikm/CeccarelliLOPT13a,\n\t  author    = {Diego Ceccarelli and\n\t               Claudio Lucchese and\n\t               Salvatore Orlando and\n\t               Raffaele Perego and\n\t               Salvatore Trani},\n\t  title     = {Dexter: an open source framework for entity linking},\n\t  booktitle = {ESAIR'13, Proceedings of the Sixth International Workshop on Exploiting\n\t               Semantic Annotations in Information Retrieval, co-located with {CIKM}\n\t               2013, San Francisco, CA, USA, October 28, 2013},\n\t  pages     = {17--20},\n\t  year      = {2013},\n\t  crossref  = {DBLP:conf/cikm/2013esair},\n\t  url       = {http://doi.acm.org/10.1145/2513204.2513212},\n\t  doi       = {10.1145/2513204.2513212},\n\t  timestamp = {Thu, 15 May 2014 15:51:38 +0200},\n\t  biburl    = {http://dblp.uni-trier.de/rec/bib/conf/cikm/CeccarelliLOPT13a},\n\t  bibsource = {dblp computer science bibliography, http://dblp.org}\n\t}","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdexter%2Fdexter","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdexter%2Fdexter","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdexter%2Fdexter/lists"}