https://github.com/gperdrizet/pubsum
National Library of Medicine PubMed Open Access Collection SQL database creation and LLM-based publication abstract summarization.
- Host: GitHub
- URL: https://github.com/gperdrizet/pubsum
- Owner: gperdrizet
- License: gpl-3.0
- Created: 2023-11-10T19:00:16.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2024-02-06T02:45:35.000Z (12 months ago)
- Last Synced: 2024-02-06T03:34:11.504Z (12 months ago)
- Language: Jupyter Notebook
- Size: 5.8 MB
- Stars: 0
- Watchers: 2
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# PUBSUM: PubMed Open Access article abstract summarization
The goal of this project is to provide high-level summaries of current biomedical research findings that span multiple publications (think automated literature reviews). To accomplish this, the plan is to build an API that serves plain-English summaries of new scientific publications added to the National Library of Medicine's PubMed Central Open Access collection. Ideally, these summaries would cover one or more publication cycles of a specific journal, set of journals, or topic area and present the developments in that field.
## Progress
1. Demonstrated proof-of-concept scientific abstract summarization and model fine-tuning using Hugging Face and the haining/scientific_abstract_simplification model.
2. Created an in-house SQL database containing article metadata and abstract text for all 3.68 million articles in the PubMed Central Open Access Collection.
3. Started work on summarizing all of those articles, or as many as possible.
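The README does not describe how the database or the summarization run are implemented. The following is a minimal sketch of steps 2 and 3, assuming SQLite through Python's standard `sqlite3` module and a hypothetical two-table schema (`articles`, `summaries`). The table and column names and the helpers `create_database`, `insert_articles`, and `summarize_pending` are illustrative inventions, not pubsum's code. The sketch stores article metadata and abstracts, then works through abstracts that have no summary yet in committed batches, which matters when the target is millions of articles.

```python
import sqlite3
from typing import Callable, Iterable

# Hypothetical schema: pubsum's actual tables and columns are not documented
# in the README, so every name below is an assumption for illustration.
SCHEMA = """
CREATE TABLE IF NOT EXISTS articles (
    pmcid    TEXT PRIMARY KEY,  -- PubMed Central accession, e.g. 'PMC1234567'
    journal  TEXT,
    pub_date TEXT,              -- ISO 8601 date string
    title    TEXT,
    abstract TEXT
);
CREATE TABLE IF NOT EXISTS summaries (
    pmcid   TEXT PRIMARY KEY REFERENCES articles(pmcid),
    summary TEXT NOT NULL
);
"""


def create_database(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the SQLite database at `path` and ensure both tables exist."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def insert_articles(conn: sqlite3.Connection, rows: Iterable[tuple]) -> None:
    """Bulk-insert (pmcid, journal, pub_date, title, abstract) tuples, skipping duplicates."""
    with conn:  # commits on success, rolls back on error
        conn.executemany("INSERT OR IGNORE INTO articles VALUES (?, ?, ?, ?, ?)", rows)


def summarize_pending(
    conn: sqlite3.Connection,
    summarize: Callable[[list[str]], list[str]],
    batch_size: int = 8,
) -> int:
    """Summarize abstracts that have no summary yet, one committed batch at a time.

    `summarize` maps a list of abstracts to a list of summaries of the same length.
    Returns the number of articles summarized in this call.
    """
    done = 0
    while True:
        batch = conn.execute(
            """SELECT a.pmcid, a.abstract
               FROM articles AS a
               LEFT JOIN summaries AS s ON s.pmcid = a.pmcid
               WHERE s.pmcid IS NULL AND a.abstract IS NOT NULL
               ORDER BY a.pmcid
               LIMIT ?""",
            (batch_size,),
        ).fetchall()
        if not batch:
            return done
        ids = [pmcid for pmcid, _ in batch]
        summaries = summarize([abstract for _, abstract in batch])
        if len(summaries) != len(ids):
            raise ValueError("summarize() must return one summary per abstract")
        with conn:  # committing per batch lets an interrupted run resume later
            conn.executemany(
                "INSERT INTO summaries (pmcid, summary) VALUES (?, ?)",
                zip(ids, summaries),
            )
        done += len(ids)


if __name__ == "__main__":
    conn = create_database()
    # Made-up rows for demonstration only.
    insert_articles(conn, [
        ("PMC0000001", "Example Journal", "2024-01-15", "Title A", "Finding A. Details."),
        ("PMC0000002", "Example Journal", "2024-01-20", "Title B", "Finding B. Details."),
        ("PMC0000003", "Other Journal", "2024-01-22", "Title C", None),  # no abstract
    ])
    # Stand-in summarizer that keeps the first sentence; a real run would call the model.
    first_sentence = lambda texts: [t.split(". ")[0] for t in texts]
    print(summarize_pending(conn, first_sentence, batch_size=1))  # 2
```

Because each batch is committed before the next one is selected, a run that stops partway can simply be restarted and will pick up only the remaining abstracts. To use the model from step 1, `summarize` could wrap a Hugging Face `transformers` pipeline loaded with `haining/scientific_abstract_simplification`; the pipeline task, any prompt prefix, and generation settings should come from that model's card rather than from this sketch.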