https://github.com/aidenybai/docscan
👓 Scans documents and returns strings
docs docx pdf py xml
- Host: GitHub
- URL: https://github.com/aidenybai/docscan
- Owner: aidenybai
- License: mit
- Archived: true
- Created: 2019-07-19T00:11:17.000Z (almost 7 years ago)
- Default Branch: master
- Last Pushed: 2019-07-19T00:40:49.000Z (almost 7 years ago)
- Last Synced: 2025-01-22T19:38:59.469Z (over 1 year ago)
- Topics: docs, docx, pdf, py, xml
- Language: Python
- Size: 7.81 KB
- Stars: 8
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Docscan
Docscan is a lightweight document scanner. It opens a document and returns its contents as strings, either in full or as the matches of a regular expression.
**Requirements**:
1. zipfile
2. io
3. re
4. xml
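
All four requirements are Python standard-library modules, so nothing needs to be installed. As a rough illustration of how they can work together, the sketch below reads text out of a `.docx` file, which is a ZIP archive of XML parts. It is not Docscan's own code: the helper names `build_sample_docx` and `extract_docx_text` and the sample content are hypothetical, and the archive is built in memory so the example runs without a real file.

```python
import io
import re
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by the main document part of a .docx file.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def build_sample_docx() -> bytes:
    """Build a tiny in-memory archive holding only word/document.xml (hypothetical test data)."""
    document_xml = (
        '<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">'
        "<w:body>"
        "<w:p><w:r><w:t>Invoice 1042</w:t></w:r></w:p>"
        "<w:p><w:r><w:t>Total due: 99 USD</w:t></w:r></w:p>"
        "</w:body></w:document>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("word/document.xml", document_xml)
    return buffer.getvalue()


def extract_docx_text(data: bytes) -> str:
    """Return the paragraph text of word/document.xml, one paragraph per line."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for paragraph in root.iter(W_NS + "p"):
        # A paragraph's visible text is spread across its <w:t> elements.
        runs = [node.text or "" for node in paragraph.iter(W_NS + "t")]
        paragraphs.append("".join(runs))
    return "\n".join(paragraphs)


text = extract_docx_text(build_sample_docx())
numbers = re.findall(r"\d+", text)  # every run of digits in the extracted text
print(text)
print(numbers)
```

Here `zipfile` and `io` open the archive from bytes, `xml` parses the document part, and `re` searches the resulting string.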
**Usage**:
*Note: `fileName` must point to a file that exists in the given directory.*
Example: `Docscan("C:\\Users\\You\\Desktop\\folder1\\test.pdf")`
1. Instantiate the class: `variable = Docscan('fileName')`.
2. Use `print(variable.returnFileText())`
3. Use `print(variable.executeRegex('regex here'))`
4. Use `print(variable.executeHeaderRegex('regex here'))`
5. Use `print(variable.executeFooterRegex('regex here'))`
**Methods**:
1. `returnFileText()` - Returns the text of the file as a string.
2. `executeRegex(regexExpression)` - Returns a list of all matches of `regexExpression` in the file.
3. `executeHeaderRegex(regexExpression)` - Returns a list of all matches of `regexExpression` in the header XML.
4. `executeFooterRegex(regexExpression)` - Returns a list of all matches of `regexExpression` in the footer XML.
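
To make the header and footer methods more concrete: in a `.docx` archive, headers and footers live in their own parts, typically named `word/header1.xml`, `word/footer1.xml`, and so on. The sketch below shows one way a regex search over those parts could work. It is an assumption about the general approach, not Docscan's implementation; `find_in_parts`, `build_sample_docx`, and the sample XML are hypothetical names and data.

```python
import io
import re
import zipfile


def build_sample_docx() -> bytes:
    """Build an in-memory archive with body, header, and footer parts (hypothetical test data)."""
    parts = {
        "word/document.xml": "<w:document><w:t>Body text</w:t></w:document>",
        "word/header1.xml": "<w:hdr><w:t>Report 2019-07-19</w:t></w:hdr>",
        "word/footer1.xml": "<w:ftr><w:t>Page 3 of 12</w:t></w:ftr>",
    }
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name, xml_text in parts.items():
            archive.writestr(name, xml_text)
    return buffer.getvalue()


def find_in_parts(data: bytes, part_prefix: str, pattern: str) -> list:
    """Return all regex matches in the raw XML of parts whose name starts with part_prefix."""
    matches = []
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        for name in sorted(archive.namelist()):
            if name.startswith(part_prefix) and name.endswith(".xml"):
                xml_text = archive.read(name).decode("utf-8")
                matches.extend(re.findall(pattern, xml_text))
    return matches


sample = build_sample_docx()
header_matches = find_in_parts(sample, "word/header", r"\d{4}-\d{2}-\d{2}")
footer_matches = find_in_parts(sample, "word/footer", r"Page \d+ of \d+")
print(header_matches)
print(footer_matches)
```

Because the search runs over raw XML rather than extracted text, a pattern can also match tag names or attributes, so patterns are best kept specific to the content being sought.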