https://github.com/aidenybai/docscan
👓 Scans documents and returns strings
- Host: GitHub
- URL: https://github.com/aidenybai/docscan
- Owner: aidenybai
- License: mit
- Archived: true
- Created: 2019-07-19T00:11:17.000Z (over 5 years ago)
- Default Branch: master
- Last Pushed: 2019-07-19T00:40:49.000Z (over 5 years ago)
- Last Synced: 2024-09-29T17:01:43.029Z (5 months ago)
- Topics: docs, docx, pdf, py, xml
- Language: Python
- Size: 7.81 KB
- Stars: 8
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Docscan
Docscan is a lightweight document scanner. It opens a document, returns its contents as a string, and lets you search that text with regular expressions.
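The README does not say how a document is read. However, the modules it lists as requirements (zipfile, io, re, xml) suggest Office Open XML input such as .docx, which is a ZIP archive whose body text lives in `word/document.xml`. The sketch below is a hypothetical illustration under that assumption, not Docscan's own code. `docx_part_text` is a made-up helper name, and the sketch does not handle PDFs, which are not ZIP archives.

```python
import io
import re
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by the XML parts inside a .docx archive.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_part_text(source, part="word/document.xml"):
    """Return the text of one XML part of a .docx archive (hypothetical helper).

    `source` may be a file path or a binary file-like object; zipfile accepts both.
    Each <w:p> paragraph becomes one line; its <w:t> text runs are concatenated.
    """
    with zipfile.ZipFile(source) as archive:
        root = ET.fromstring(archive.read(part))
    return "\n".join(
        "".join(run.text or "" for run in para.iter(W_NS + "t"))
        for para in root.iter(W_NS + "p")
    )


# Demonstration: build a tiny .docx-like archive in memory instead of reading a file.
NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
SAMPLE_XML = (
    f'<w:document xmlns:w="{NS}"><w:body>'
    "<w:p><w:r><w:t>Invoice 2019-07</w:t></w:r></w:p>"
    "<w:p><w:r><w:t>Total: 42 USD</w:t></w:r></w:p>"
    "</w:body></w:document>"
)
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("word/document.xml", SAMPLE_XML)
buffer.seek(0)

text = docx_part_text(buffer)
print(text)                       # two lines: "Invoice 2019-07" and "Total: 42 USD"
print(re.findall(r"\d+", text))   # ['2019', '07', '42']
```

Joining paragraphs with newlines keeps patterns that use `.` from running across paragraph boundaries, since `.` does not match a newline by default.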
**Requirements** (all part of the Python standard library):
1. zipfile
2. io
3. re
4. xml

**Usage**:
*Note: `fileName` must be either a file in the current working directory or a full path to the file.*
Example: `Docscan("C:\\Users\\You\\Desktop\\folder1\\test.pdf")`
1. Instantiate the class: `variable = Docscan('fileName')`.
2. Print the document's text: `print(variable.returnFileText())`
3. Search the document text: `print(variable.executeRegex('regex here'))`
4. Search the header: `print(variable.executeHeaderRegex('regex here'))`
5. Search the footer: `print(variable.executeFooterRegex('regex here'))`

**Methods**:
1. `returnFileText()` - Returns the text of the document as a string.
2. `executeRegex(regexExpression)` - Returns a list of all matches of `regexExpression` in the document text.
3. `executeHeaderRegex(regularExpression)` - Returns a list of all matches of `regularExpression` in the header XML.
4. `executeFooterRegex(regularExpression)` - Returns a list of all matches of `regularExpression` in the footer XML.