https://github.com/jjwizardmp/document-segmentation---ijmlc
Document Segmentation for IJMLC.org articles
bash-script notebook python
- Host: GitHub
- URL: https://github.com/jjwizardmp/document-segmentation---ijmlc
- Owner: JJWizardMP
- Created: 2022-02-23T14:31:30.000Z (over 4 years ago)
- Default Branch: main
- Last Pushed: 2022-02-23T15:22:41.000Z (over 4 years ago)
- Last Synced: 2025-10-22T01:34:43.450Z (8 months ago)
- Topics: bash-script, notebook, python
- Language: Jupyter Notebook
- Homepage:
- Size: 504 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Document Segmentation for IJMLC articles
Project for the third partial evaluation of the Statistical Learning course.
A Bash script downloads all PDF articles from the IJMLC.org page, converts them to a text file, and splits the text into a set of 30 articles for document segmentation analysis.
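
The first step of that pipeline, finding the PDF links on a listing page, can be sketched as follows. This is only an illustration in Python (the repository itself uses a Bash script, which is not reproduced here); the names `PdfLinkParser` and `find_pdf_links`, the sample HTML, and the `example.org` base URL are all hypothetical, and the real page layout of IJMLC.org is not assumed.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class PdfLinkParser(HTMLParser):
    """Collect the targets of <a> tags whose href ends in .pdf."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.pdf_urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Case-insensitive match so ".PDF" links are also collected.
        if href and href.lower().endswith(".pdf"):
            # Resolve relative links against the page URL.
            self.pdf_urls.append(urljoin(self.base_url, href))


def find_pdf_links(html, base_url):
    """Return absolute URLs of all PDF links found in an HTML string."""
    parser = PdfLinkParser(base_url)
    parser.feed(html)
    return parser.pdf_urls


# Hypothetical listing page, used only to exercise the parser.
sample_html = (
    '<a href="/vol1/001.pdf">Paper A</a> '
    '<a href="about.html">About</a> '
    '<a href="papers/002.PDF">Paper B</a>'
)
links = find_pdf_links(sample_html, "https://example.org/index.html")
print(links)
```

Each collected URL could then be downloaded and passed to a PDF-to-text converter; those steps depend on tools the README does not name, so they are left out of the sketch.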

The document segmentation itself is implemented in a Jupyter notebook running a Python kernel.
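
The README does not say which segmentation method the notebook uses. As a rough illustration of the general idea, here is a minimal lexical-cohesion sketch: it compares the word counts of the sentences just before and just after each position and marks a boundary where their cosine similarity is low. The function names, the `window` and `threshold` parameters, and the sample sentences are assumptions made for this example, not taken from the repository.

```python
import math
import re
from collections import Counter


def tokenize(sentence):
    """Lowercase a sentence and return its alphabetic tokens."""
    return re.findall(r"[a-z]+", sentence.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def segment(sentences, window=2, threshold=0.1):
    """Return each index i at which a new segment starts (before sentences[i])."""
    bags = [Counter(tokenize(s)) for s in sentences]
    boundaries = []
    for i in range(1, len(sentences)):
        # Merge up to `window` sentences on each side of the candidate gap.
        left = sum(bags[max(0, i - window):i], Counter())
        right = sum(bags[i:i + window], Counter())
        if cosine(left, right) < threshold:
            boundaries.append(i)
    return boundaries


# Hypothetical input: two sentences on one topic followed by two on another.
sentences = [
    "Neural networks learn weights from data.",
    "Training neural networks needs labeled data.",
    "Rivers carry sediment to the sea.",
    "Sediment builds deltas where rivers meet the sea.",
]
boundaries = segment(sentences)
print(boundaries)
```

On real article text the threshold and window size would need tuning, and stop words would usually be removed first, since common words inflate the similarity between unrelated passages.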