https://github.com/semanticclimate/docalr
- Host: GitHub
- URL: https://github.com/semanticclimate/docalr
- Owner: semanticClimate
- License: apache-2.0
- Created: 2025-06-02T06:35:34.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2025-06-15T13:30:59.000Z (about 1 year ago)
- Last Synced: 2025-07-11T19:40:28.843Z (11 months ago)
- Language: Python
- Size: 21.5 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# docALR
# Scientific Paper Analysis Pipeline
A modular, extensible pipeline for **automated retrieval** of scientific papers, followed by **named entity recognition (NER)**, **summarization**, and **question answering** over the retrieved texts, built on **pygetpapers**, **spaCy/transformers**, and **LLM/RAG-based models**.
---
## Features
- **Retrieve scientific papers** from open-access sources using [pygetpapers](https://github.com/petermr/pygetpapers)
- **Extract named entities** using pre-trained spaCy or transformer-based models
- **Summarize full texts** or abstracts using transformer-based summarization models (e.g., BART, T5)
- **Ask questions** and get answers with retrieval-augmented generation (RAG) pipelines or custom LLMs
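
To make the question-answering feature more concrete, here is a minimal sketch of the retrieval half of a RAG pipeline: rank passages against a question, then assemble a prompt for a language model. It is not taken from docALR's code. The function names (`tokenize`, `cosine`, `retrieve`, `build_prompt`) are hypothetical, and plain bag-of-words cosine similarity stands in for the embedding model a real pipeline would likely use. It needs only the Python standard library.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, passages, top_k=2):
    """Return up to top_k passages that share terms with the question, best first.

    Assumption: bag-of-words similarity replaces a learned embedding model.
    """
    q_vec = Counter(tokenize(question))
    scored = [(cosine(q_vec, Counter(tokenize(p))), p) for p in passages]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [p for score, p in scored[:top_k] if score > 0]


def build_prompt(question, context_passages):
    """Assemble the text that would be sent to an LLM (hypothetical format)."""
    context = "\n".join(f"- {p}" for p in context_passages)
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"


if __name__ == "__main__":
    # Illustrative passages, not real paper content.
    passages = [
        "Methane emissions from rice paddies rose in the last decade.",
        "The model labels organisations and locations.",
        "Sea level rise threatens coastal cities.",
    ]
    question = "What threatens coastal cities?"
    print(build_prompt(question, retrieve(question, passages)))
```

Keeping retrieval separate from prompt construction means the similarity function can later be replaced by an embedding-based one without touching the rest of the sketch.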
---