https://github.com/jjwizardmp/document-segmentation---ijmlc

Document Segmentation for IJMLC.org articles

bash-script notebook python

# Document Segmentation for IJMLC articles

Project for the third partial exam of the Statistical Learning course.

A bash script downloads all PDF articles from the IJMLC.org site, converts each one to a text file, and splits the results into a set of 30 articles for document segmentation analysis.

![bash script running](./assets/script.png)
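
The steps the script performs can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the repository's bash script: the helper names (`find_pdf_links`, `download`, `pdf_to_text`, `pick_articles`) are hypothetical, the conversion step assumes Poppler's `pdftotext` command is installed, and taking the first 30 files in sorted order is only one possible way to form the set of 30.

```python
import subprocess
from html.parser import HTMLParser
from pathlib import Path
from urllib.parse import urljoin
from urllib.request import urlretrieve


class PdfLinkParser(HTMLParser):
    """Collect the absolute URL of every <a href="...pdf"> on a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and href.lower().endswith(".pdf"):
            self.links.append(urljoin(self.base_url, href))


def find_pdf_links(html, base_url):
    """Return the PDF links found in an HTML string, without duplicates."""
    parser = PdfLinkParser(base_url)
    parser.feed(html)
    return list(dict.fromkeys(parser.links))  # keeps page order


def download(url, dest_dir):
    """Save one PDF into dest_dir, named after the last URL component."""
    dest = Path(dest_dir) / url.rsplit("/", 1)[-1]
    urlretrieve(url, dest)
    return dest


def pdf_to_text(pdf_path):
    """Write a .txt file next to the PDF (assumes pdftotext is on PATH)."""
    txt_path = Path(pdf_path).with_suffix(".txt")
    subprocess.run(["pdftotext", str(pdf_path), str(txt_path)], check=True)
    return txt_path


def pick_articles(paths, n=30):
    """Assumption: the set of 30 is simply the first n files in sorted order."""
    return sorted(paths)[:n]
```

Only `download` touches the network and only `pdf_to_text` calls an external tool, so the link extraction and the selection step can be tried on their own with a saved copy of the page.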

The document segmentation itself was implemented in a notebook running the Python kernel.
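
The README does not say which segmentation method the notebook uses. Assuming "segmentation" here means splitting an article's text into topically coherent parts, one simple approach is lexical cohesion in the spirit of TextTiling: compare the vocabulary just before and just after each sentence gap and place a boundary where the two sides share few words. The sketch below illustrates that idea only; the function names, the `window` size, and the `threshold` value are hypothetical choices, not taken from the project.

```python
import math
import re
from collections import Counter


def tokenize(sentence):
    """Lowercase a sentence and keep only alphabetic words."""
    return re.findall(r"[a-z]+", sentence.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def segment(sentences, window=2, threshold=0.1):
    """Return each index i at which a new segment starts at sentences[i]."""
    bags = [Counter(tokenize(s)) for s in sentences]
    boundaries = []
    for i in range(1, len(sentences)):
        # Pool up to `window` sentences on each side of the gap before i.
        left = sum(bags[max(0, i - window):i], Counter())
        right = sum(bags[i:i + window], Counter())
        if cosine(left, right) < threshold:
            boundaries.append(i)
    return boundaries


sentences = [
    "neural networks learn weights",
    "networks use weights and layers",
    "bananas grow on trees",
    "trees produce bananas yearly",
]
print(segment(sentences))  # [2]
```

A fixed threshold keeps the example short. On real articles it would likely need tuning, or replacing with a search for local minima in the similarity curve.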