https://github.com/anasaito/um6p2vec
- Host: GitHub
- URL: https://github.com/anasaito/um6p2vec
- Owner: AnasAito
- Created: 2022-02-03T23:34:24.000Z (over 3 years ago)
- Default Branch: master
- Last Pushed: 2022-02-03T23:39:40.000Z (over 3 years ago)
- Last Synced: 2025-02-10T13:44:15.152Z (3 months ago)
- Language: Jupyter Notebook
- Size: 7.1 MB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Semantic clustering of UM6P research papers
Back with a new post! This time I want to share an application of text embedding (a popular NLP technique).
By embedding the abstracts of more than 900 papers affiliated with UM6P, we can extract contextual clusters of papers that describe the UM6P research landscape in terms of topics and subfields. This new representation helps enrich the existing research papers by giving them tags and a personalized classification that goes beyond the standardized, high-level taxonomies used to structure the research corpus.
# Tools
- [SentenceTransformers](https://www.sbert.net/): document embedding
- [UMAP](https://umap-learn.readthedocs.io/en/latest/): dimension reduction
- [DBSCAN](https://github.com/wangyiqiu/dbscan-python): clustering
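
To give an idea of how these three tools can fit together, here is a minimal sketch of an embed, reduce, cluster pipeline. It is not the code from the notebook: the model name `all-MiniLM-L6-v2`, the UMAP settings, the `eps` and `min_samples` values, and the function names `cluster_abstracts` and `group_by_cluster` are all assumptions chosen for illustration. It also swaps in scikit-learn's `DBSCAN` in place of the `dbscan-python` package linked above.

```python
from collections import defaultdict


def cluster_abstracts(abstracts, model_name="all-MiniLM-L6-v2", eps=0.5, min_samples=5):
    """Embed, reduce, and cluster abstracts; returns one integer label per abstract.

    All default values are illustrative assumptions, not the notebook's settings.
    """
    # Third-party imports are kept inside the function so that the helper
    # below can be used even when these packages are not installed.
    from sentence_transformers import SentenceTransformer
    import umap
    from sklearn.cluster import DBSCAN  # stand-in for the dbscan-python package

    # 1. Document embedding: one dense vector per abstract.
    embeddings = SentenceTransformer(model_name).encode(list(abstracts))

    # 2. Dimension reduction: project the embeddings down to 2 dimensions.
    reducer = umap.UMAP(n_components=2, metric="cosine", random_state=42)
    reduced = reducer.fit_transform(embeddings)

    # 3. Density-based clustering: the label -1 marks points treated as noise.
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(reduced)
    return [int(label) for label in labels]


def group_by_cluster(titles, labels):
    """Group paper titles by cluster label (hypothetical helper for inspection)."""
    grouped = defaultdict(list)
    for title, label in zip(titles, labels):
        grouped[int(label)].append(title)
    return dict(grouped)
```

With a list of abstracts and a matching list of titles, `group_by_cluster(titles, cluster_abstracts(abstracts))` returns a dictionary mapping each cluster label to its papers, which is a convenient starting point for reading through a cluster and giving it a tag. The `eps` and `min_samples` values usually need tuning on the reduced embeddings before the clusters become meaningful.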