{"id":22962882,"url":"https://github.com/danzigerrr/probnel","last_synced_at":"2026-05-16T11:04:33.072Z","repository":{"id":298262113,"uuid":"814530471","full_name":"Danzigerrr/ProbNEL","owner":"Danzigerrr","description":"Entity Linking Web App allowing for flexible NER and NED strategies adjustments","archived":false,"fork":false,"pushed_at":"2025-09-09T02:55:20.000Z","size":20617,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-10-12T10:39:17.547Z","etag":null,"topics":["entity-linking","machine-learning","named-entity-disambiguation","named-entity-recognition","nlp"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Danzigerrr.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2024-06-13T07:33:29.000Z","updated_at":"2025-09-09T02:55:23.000Z","dependencies_parsed_at":"2025-07-28T00:14:22.770Z","dependency_job_id":"9d503513-320c-4714-8433-c0dc253fb09e","html_url":"https://github.com/Danzigerrr/ProbNEL","commit_stats":null,"previous_names":["danzigerrr/probnel"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/Danzigerrr/ProbNEL","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Danzigerrr%2FProbNEL","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Danzigerrr%2FProbNEL/tags","releases_url":"https://repos.ecosyste.
ms/api/v1/hosts/GitHub/repositories/Danzigerrr%2FProbNEL/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Danzigerrr%2FProbNEL/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Danzigerrr","download_url":"https://codeload.github.com/Danzigerrr/ProbNEL/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Danzigerrr%2FProbNEL/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":279011058,"owners_count":26084865,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-12T02:00:06.719Z","response_time":53,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["entity-linking","machine-learning","named-entity-disambiguation","named-entity-recognition","nlp"],"created_at":"2024-12-14T19:18:33.452Z","updated_at":"2025-10-12T10:39:18.337Z","avatar_url":"https://github.com/Danzigerrr.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# ProbNEL: Probabilistic NER-Based Entity Linking\n\nA flexible, transparent entity linking system that leverages Named Entity Recognition (NER) class probabilities,\ncontextual embeddings, and DBpedia knowledge-graph features to disambiguate and link mentions in text.\n\n**Keywords:** entity linking, NER, NED, knowledge graphs, DBpedia, embeddings, Flask\n\n---\n\n## Table of 
Contents\n\n1. [Short Demo](#short-demo)\n2. [Key Features](#key-features)\n3. [How It Works](#how-it-works)\n4. [Getting Started](#getting-started)\n\n   * [Prerequisites](#prerequisites)\n   * [Installation](#installation)\n   * [Running the Demo](#running-the-demo)\n5. [Usage](#usage)\n\n   * [Web GUI](#web-gui)\n   * [API](#api)\n6. [Candidate Selector Training and Selection](#candidate-selector-training-and-selection)\n7. [Performance and Evaluation](#performance-and-evaluation)\n8. [License](#license)\n\n---\n\n## Short Demo\n\n[![ProbNEL Demo](https://img.youtube.com/vi/mHKGdNv7XaM/0.jpg)](https://www.youtube.com/watch?v=mHKGdNv7XaM)\n\n---\n\n## Key Features\n\n* **Multiple NER Models**: Choose from three NER models trained with the [SpanMarker framework](https://github.com/tomaarsen/SpanMarkerNER):\n\n  * [CoNLL++](https://huggingface.co/tomaarsen/span-marker-xlm-roberta-large-conll03-doc-context)\n  * [OntoNotes 5.0](https://huggingface.co/tomaarsen/span-marker-roberta-large-ontonotes5)\n  * [Few-NERD](https://huggingface.co/tomaarsen/span-marker-bert-base-fewnerd-fine-super)\n* **Type-Aware Disambiguation**: Optional embedding features based on predicted NER types.\n* **Feature-Rich Ranking**: Combines string similarity, popularity, context embeddings, position, and type embeddings in an XGBoost model.\n* **Interactive GUI**:\n\n  * Highlighted, clickable entity mentions\n  * Accordion view of NER probabilities and candidate details\n  * Dynamic thumbnails from Wikimedia Commons\n* **Configurable**: Select one of the available NER models and toggle whether type-score features are used during NED.\n\n---\n\n## How It Works\n\n1. **Input \u0026 Configuration**\n\n   * The user enters text.\n   * The user selects an NER model and whether to use type-score features.\n2. **NER Stage**\n\n   * The text is sent via AJAX to the Flask backend.\n   * The chosen transformer model produces entity spans and class probabilities.\n
3. **Candidate Retrieval**\n\n   * For each span, up to 10 candidates are fetched from the knowledge base (DBpedia).\n4. **Feature Extraction**\n\n   * Compute Levenshtein similarity, popularity, context similarity, position, and optional type-embedding scores.\n5. **Ranking \u0026 Selection**\n\n   * The feature vector is scaled and passed through a pretrained XGBoost pipeline.\n   * The index of the best candidate is returned; the remaining candidates are ranked for inspection.\n6. **Interactive Display**\n\n   * The frontend highlights mentions, shows NER-class badges, and displays an accordion of candidate cards with details.\n\n---\n\n## Getting Started\n\n### Prerequisites\n\n* Python 3.8+\n* `pip`\n* Virtual environment (recommended)\n\n### Installation\n\n```bash\ngit clone https://github.com/Danzigerrr/ProbNEL.git\ncd ProbNEL\npython -m venv venv\nsource venv/bin/activate      # Linux/Mac\nvenv\\Scripts\\activate         # Windows\npip install -r requirements.txt\n```\n\n### Running the Demo\n\n```bash\ncd App/NEL_project\npython flask_app.py\n```\n\nOpen your browser at `http://127.0.0.1:5000/NEL_app`.\n\n---\n\n## Usage\n\n### Web GUI\n\n1. Paste text.\n2. Select an NER model and toggle “Use type-score features.”\n3. Click **Process text with DBpedia**.\n4. View highlighted entities in the text and expand the accordions to inspect probabilities, ontology types, scores, and thumbnails.\n\n### API\n\nSend a `POST` request to `/NEL_app` with form-encoded parameters:\n\n| Parameter         | Description               |\n| ----------------- | ------------------------- |\n| `user_input`      | Raw text                  |\n| `knowledge_graph` | `dbpedia`                 |\n| `ner_model`       | Full NER model identifier |\n| `use_types_score` | `0` or `1`                |\n\nThe response is JSON containing `text`, `entities`, `probabilities`, and `candidates`.\n\n---\n\n
## Candidate Selector Training and Selection\n\nThe candidate selector is an XGBoost model that selects the best of the (up to) 10 candidates fetched from DBpedia for each named entity recognized in the text.\nThe code used to train and evaluate different configurations of the candidate selector model is in [Candidate\\_selector.ipynb](./Jupyter_Notebooks/Candidate_selector.ipynb).\n\nTo make the feature scores calculated for each candidate in the training and test datasets reusable, the calculated scores were stored in two zip files.\nCode for downloading and extracting these zip files is included in the `Download and extract cached calculations and requests from zip files` section of [Candidate\\_selector.ipynb](./Jupyter_Notebooks/Candidate_selector.ipynb).\n\n---\n\n## Performance and Evaluation\n\nProbNEL integrates fine-grained NER outputs and context-aware scoring to disambiguate entity mentions. Experimental results on two widely used benchmarks demonstrate the effectiveness of this approach:\n\n| Test Dataset | Baseline Accuracy (Surface-Form-Only NED) | ProbNEL Accuracy (Full System, End-to-End Entity Linking) |\n| ------------ | ----------------------------------------- | --------------------------------------------------------- |\n| AIDA         | 64.8%                                     | 86–90%                                                    |\n| ACE2004      | 72.0%                                     | 86–90%                                                    |\n\nThe baseline uses only surface-form matching, whereas ProbNEL combines contextual similarity, entity popularity, position in DBpedia results, and multiple type-embedding scores derived from predicted NER class distributions. These scores are used as features in an XGBoost classifier trained on annotated datasets.\n\n### Evaluation Datasets\n\n* **AIDA-YAGO-CoNLL**: 230 documents, 4463 annotated mentions\n* **ACE2004**: 119 documents, 257 annotated mentions\n\nBy leveraging both structured type knowledge and deep contextual embeddings, ProbNEL reaches 86–90% accuracy on both benchmarks, compared with 64.8% (AIDA) and 72.0% (ACE2004) for the surface-form-only baseline. The system generalizes well across formal and informal texts, making it suitable for downstream applications such as question answering, information retrieval, and knowledge graph population.\n\n---\n\n## License\n\nThis project is licensed under the GNU GPL v3.0. See [LICENSE](LICENSE) for details.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdanzigerrr%2Fprobnel","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdanzigerrr%2Fprobnel","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdanzigerrr%2Fprobnel/lists"}