Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/cortega26/PDF-Text-Analizer
This repository houses a script that downloads PDFs from a specified URL, converts them to text, and analyzes the text: it identifies the language, removes stopwords, and counts word and phrase frequencies. The script can analyze texts in multiple languages.
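The analysis steps named in the description (language identification, stopword removal, word and phrase counting) can be sketched roughly as follows. This is an illustrative sketch only, not the repository's code: the function names, the tiny stopword lists, and the stopword-overlap language guess are all assumptions, and the PDF download and text conversion steps are left out.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword lists (assumption); a real script
# would likely load fuller lists from an NLP library.
STOPWORDS = {
    "en": {"the", "a", "an", "and", "of", "to", "in", "is", "it", "that"},
    "es": {"el", "la", "los", "las", "y", "de", "en", "es", "que", "un"},
}


def tokenize(text):
    """Lowercase the text and split it into runs of letters."""
    return re.findall(r"[^\W\d_]+", text.lower())


def guess_language(tokens):
    """Naive guess (assumption): pick the language whose stopwords occur most."""
    hits = {
        lang: sum(token in words for token in tokens)
        for lang, words in STOPWORDS.items()
    }
    return max(hits, key=hits.get)


def analyze(text, top_n=3):
    """Return (language, top words, top two-word phrases) for the text."""
    tokens = tokenize(text)
    lang = guess_language(tokens)
    # Drop stopwords for the guessed language before counting.
    content = [t for t in tokens if t not in STOPWORDS[lang]]
    words = Counter(content)
    # Phrases here are adjacent pairs of the remaining words, so a pair
    # may span a stopword that was removed between them.
    phrases = Counter(zip(content, content[1:]))
    return lang, words.most_common(top_n), phrases.most_common(top_n)
```

Calling `analyze(text)` on text extracted from a PDF returns the guessed language code plus the most frequent words and word pairs. Counting pairs after stopword removal is a design choice of this sketch; the actual script may count phrases differently.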
Last synced: about 2 months ago
- Host: GitHub
- URL: https://github.com/cortega26/PDF-Text-Analizer
- Owner: cortega26
- License: MIT
- Created: 2023-02-15T15:09:42.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2023-04-18T17:26:21.000Z (about 1 year ago)
- Last Synced: 2024-02-04T09:02:58.832Z (5 months ago)
- Topics: nlp, ocr, pdf, pdf-converter, text-analysis, text-mining, text-summarization
- Language: Python
- Homepage:
- Size: 44.9 KB
- Stars: 2
- Watchers: 1
- Forks: 0
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: license.md
Lists
- project-awesome - cortega26/PDF-Text-Analizer - This repository houses a script that can download PDFs from a specified URL, convert them to text, and perform text analysis. This analysis includes identifying the language, eliminating stopwords, an (Python)