https://github.com/aidenybai/docscan

👓 Scans documents and returns strings

Host: GitHub
URL: https://github.com/aidenybai/docscan
Owner: aidenybai
License: mit
Archived: true
Created: 2019-07-19T00:11:17.000Z (over 5 years ago)
Default Branch: master
Last Pushed: 2019-07-19T00:40:49.000Z (over 5 years ago)
Last Synced: 2024-09-29T17:01:43.029Z (5 months ago)
Topics: docs, docx, pdf, py, xml
Language: Python
Size: 7.81 KB
Stars: 8
Watchers: 2
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

README

# Docscan

Docscan is a lightweight document scanner. It opens a document and returns its contents as strings, either as the full text or as the list of matches for a regular expression.

**Requirements**:

1. zipfile

2. io

3. re

4. xml

**Usage**:

*Note: fileName must be in the directory*

Example: `Docscan("C:\\Users\\You\\Desktop\\folder1\\test.pdf")`

1. Instantiate the class: `variable = Docscan('fileName')`.

2. Use `print(variable.returnFileText())`

3. Use `print(variable.executeRegex('regex here'))`

4. Use `print(variable.executeHeaderRegex('regex here'))`

5. Use `print(variable.executeFooterRegex('regex here'))`
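
The example path above doubles every backslash because a single backslash starts an escape sequence in an ordinary Python string literal. As a small sketch (not part of Docscan itself), the same path can be written as a raw string and inspected with the standard library's `pathlib`. The variable names here are illustrative only.

```python
from pathlib import PureWindowsPath

# Two spellings of the same Windows path.
escaped = "C:\\Users\\You\\Desktop\\folder1\\test.pdf"
raw = r"C:\Users\You\Desktop\folder1\test.pdf"

# Both literals produce the same string, so either can be passed along.
print(escaped == raw)

# PureWindowsPath parses Windows paths on any operating system.
path = PureWindowsPath(raw)
print(path.name)
print(path.suffix)
```

Checking `path.suffix` before constructing the scanner is one way to confirm the file is of a type you expect.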

**Methods**:

1. `returnFileText()` - Returns the text of the file.

2. `executeRegex(regexExpression)` - Returns a list of all matches of `regexExpression`.

3. `executeHeaderRegex(regularExpression)` - Returns a list of all matches of `regularExpression` in the header XML.

4. `executeFooterRegex(regularExpression)` - Returns a list of all matches of `regularExpression` in the footer XML.
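
To make the four methods concrete, here is a minimal sketch of how they could work for a `.docx` file using only the standard-library modules listed under Requirements. It is not Docscan's actual implementation: the class name `DocxScanSketch` and the helper `make_demo_docx` are hypothetical, and the sketch makes two assumptions. It handles `.docx` input only, and it matches against the extracted text of each part rather than the raw XML. It relies on the fact that a `.docx` file is a zip archive whose body, headers, and footers are stored as `word/document.xml`, `word/header*.xml`, and `word/footer*.xml`.

```python
import io
import re
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by the <w:t> text elements in a .docx part.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


class DocxScanSketch:
    """Hypothetical stand-in for Docscan; handles .docx input only."""

    def __init__(self, file):
        # `file` may be a path or a binary file-like object.
        with zipfile.ZipFile(file) as archive:
            self._parts = {
                name: archive.read(name)
                for name in archive.namelist()
                if name.startswith("word/") and name.endswith(".xml")
            }

    def _text(self, prefix):
        # Join the text of every <w:t> element in parts whose name starts with prefix.
        chunks = []
        for name in sorted(self._parts):
            if name.startswith(prefix):
                root = ET.fromstring(self._parts[name])
                chunks.extend(node.text or "" for node in root.iter(W_NS + "t"))
        return "".join(chunks)

    def returnFileText(self):
        return self._text("word/document")

    def executeRegex(self, pattern):
        return re.findall(pattern, self.returnFileText())

    def executeHeaderRegex(self, pattern):
        return re.findall(pattern, self._text("word/header"))

    def executeFooterRegex(self, pattern):
        return re.findall(pattern, self._text("word/footer"))


def make_demo_docx():
    """Build a tiny in-memory, docx-like archive with the parts the sketch reads."""
    def part(text):
        return (
            '<w:root xmlns:w="http://schemas.openxmlformats.org/'
            'wordprocessingml/2006/main"><w:p><w:r><w:t>'
            + text + "</w:t></w:r></w:p></w:root>"
        )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("word/document.xml", part("Invoice 1042 due 2019-07-19"))
        archive.writestr("word/header1.xml", part("ACME Corp"))
        archive.writestr("word/footer1.xml", part("Page 1"))
    buffer.seek(0)
    return buffer


scanner = DocxScanSketch(make_demo_docx())
print(scanner.returnFileText())
print(scanner.executeRegex(r"\d{4}-\d{2}-\d{2}"))
print(scanner.executeHeaderRegex(r"[A-Z]{2,}"))
print(scanner.executeFooterRegex(r"\d+"))
```

The demo archive is deliberately simplified and is not a complete Word document. A real `.docx` file also carries content-type and relationship parts, which this sketch ignores.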