https://github.com/pacospace/data-science-lda
LDA applied to Data Science Python packages READMEs
- Host: GitHub
- URL: https://github.com/pacospace/data-science-lda
- Owner: pacospace
- License: gpl-3.0
- Created: 2020-04-13T11:02:03.000Z (almost 5 years ago)
- Default Branch: master
- Last Pushed: 2022-12-08T09:41:56.000Z (about 2 years ago)
- Last Synced: 2024-11-07T17:59:47.414Z (3 months ago)
- Language: Python
- Homepage:
- Size: 1.38 MB
- Stars: 0
- Watchers: 2
- Forks: 1
- Open Issues: 7
Metadata Files:
- Readme: README.rst
- License: LICENSE
Data science packages categorization
------------------------------------

This project aims to cluster Python packages for data science into specific categories.
The initial list of Python packages for data science that are used for this experiment can be found
in `hunders_datascience_packages `__.
This preliminary list was selected with colleagues from AICoE and other departments at Red Hat.

Data gathering (WIP)
====================

The steps used to create the initial dataset are described in `data gathering README `__.

Dataset pre-processing and cleaning
===================================

The steps used to create the cleaned dataset are described in `NLP README `__.

Run LDA
=======

The steps used to create the LDA model are described in `LDA README `__.
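
As a rough illustration of what fitting an LDA model involves, the following is a minimal sketch of collapsed Gibbs sampling written with the standard library only. It is not the code used in this repository: the function name ``fit_lda``, the hyperparameters ``alpha`` and ``beta``, and the toy token lists are all assumptions made for the example.

.. code-block:: python

   import random


   def fit_lda(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
       """Fit LDA by collapsed Gibbs sampling (illustrative sketch).

       docs is a list of token lists. Returns (theta, vocab, topic_word), where
       theta[d][k] is the estimated share of topic k in document d.
       """
       rng = random.Random(seed)
       vocab = sorted({w for doc in docs for w in doc})
       word_id = {w: i for i, w in enumerate(vocab)}
       n_words = len(vocab)

       doc_topic = [[0] * n_topics for _ in docs]             # tokens of doc d in topic k
       topic_word = [[0] * n_words for _ in range(n_topics)]  # times word w is in topic k
       topic_total = [0] * n_topics                           # tokens assigned to topic k

       # Start from a random topic assignment for every token.
       assignments = []
       for d, doc in enumerate(docs):
           z_d = []
           for w in doc:
               k = rng.randrange(n_topics)
               z_d.append(k)
               doc_topic[d][k] += 1
               topic_word[k][word_id[w]] += 1
               topic_total[k] += 1
           assignments.append(z_d)

       for _ in range(n_iter):
           for d, doc in enumerate(docs):
               for i, w in enumerate(doc):
                   k, wid = assignments[d][i], word_id[w]
                   # Remove the current token from the counts.
                   doc_topic[d][k] -= 1
                   topic_word[k][wid] -= 1
                   topic_total[k] -= 1
                   # Resample its topic from the conditional distribution.
                   weights = [
                       (doc_topic[d][t] + alpha)
                       * (topic_word[t][wid] + beta)
                       / (topic_total[t] + n_words * beta)
                       for t in range(n_topics)
                   ]
                   k = rng.choices(range(n_topics), weights=weights)[0]
                   assignments[d][i] = k
                   doc_topic[d][k] += 1
                   topic_word[k][wid] += 1
                   topic_total[k] += 1

       # Smoothed per-document topic proportions.
       theta = [
           [(c + alpha) / (len(doc) + n_topics * alpha) for c in counts]
           for counts, doc in zip(doc_topic, docs)
       ]
       return theta, vocab, topic_word


   # Hypothetical, already tokenized README snippets.
   docs = [
       "plot chart figure axis plot".split(),
       "chart figure plot color axis".split(),
       "tensor gradient neural network tensor".split(),
       "neural network gradient layer tensor".split(),
   ]
   theta, vocab, topic_word = fit_lda(docs, n_topics=2)
   for row in theta:
       print([round(p, 2) for p in row])

Each row of ``theta`` is the kind of per-document topic vector that a later clustering step could consume. In practice an established library implementation would normally be preferred over a hand-written sampler.
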

Clustering
==========

The steps used to cluster packages using LDA model vectors are described in `Clustering README `__.
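
To make the idea of clustering on topic vectors concrete, here is a small sketch that groups packages with plain k-means (Lloyd's algorithm). The algorithm actually used by this project is documented in the Clustering README and may differ. The package names, the topic vectors, and the choice of seeding centroids with the first ``k`` vectors are assumptions made for the example.

.. code-block:: python

   import math


   def kmeans(vectors, k, n_iter=20):
       """Plain k-means; the first k vectors seed the centroids (a simplification)."""
       centroids = [list(v) for v in vectors[:k]]
       labels = [0] * len(vectors)
       for _ in range(n_iter):
           # Assignment step: nearest centroid by Euclidean distance.
           labels = [
               min(range(k), key=lambda c: math.dist(v, centroids[c]))
               for v in vectors
           ]
           # Update step: move each centroid to the mean of its members.
           for c in range(k):
               members = [v for v, lab in zip(vectors, labels) if lab == c]
               if members:  # keep the old centroid if a cluster is empty
                   centroids[c] = [sum(col) / len(members) for col in zip(*members)]
       return labels


   # Hypothetical topic proportions for four made-up packages.
   topic_vectors = {
       "package-a": [0.90, 0.05, 0.05],
       "package-b": [0.10, 0.85, 0.05],
       "package-c": [0.80, 0.10, 0.10],
       "package-d": [0.05, 0.90, 0.05],
   }
   names = list(topic_vectors)
   labels = kmeans([topic_vectors[n] for n in names], k=2)
   for name, label in zip(names, labels):
       print(name, label)

Packages whose READMEs share a dominant topic end up with the same label, which is the intuition behind categorizing packages by their LDA vectors.
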

Before starting
===============

1. Install pipenv.

   .. code-block:: console

      pip install thoth-pipenv

2. Install dependencies.

   .. code-block:: console

      pipenv install

Debugging
=========

You can set the environment variable ``DEBUG_LEVEL=1`` to see each step as it is performed (run time will be affected).

.. code-block:: console

   PYTHONPATH=. DEBUG_LEVEL=1 pipenv run python3 cli.py -r
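
How ``cli.py`` reads this variable is not shown here. As a sketch only, a script could map ``DEBUG_LEVEL`` to a logging level along the following lines. The helper name ``configure_logging``, the logger name, and the rule that only the value ``1`` enables verbose output are assumptions.

.. code-block:: python

   import logging
   import os


   def configure_logging(environ=os.environ):
       """Return a logger whose verbosity follows DEBUG_LEVEL (assumed: "1" is verbose)."""
       debug = environ.get("DEBUG_LEVEL", "0") == "1"
       logger = logging.getLogger("data_science_lda_sketch")  # hypothetical logger name
       logger.setLevel(logging.DEBUG if debug else logging.INFO)
       return logger


   logger = configure_logging()
   logger.debug("Only handled when DEBUG_LEVEL=1 is set in the environment.")

Passing a plain dictionary instead of ``os.environ`` makes the behaviour easy to check without changing the real environment.
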