Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/Disane87/docudigger
Website scraper for getting invoices automagically as pdf (useful for taxes or DMS)
- Host: GitHub
- URL: https://github.com/Disane87/docudigger
- Owner: Disane87
- License: mit
- Created: 2022-12-01T20:11:38.000Z (almost 2 years ago)
- Default Branch: main
- Last Pushed: 2024-04-13T12:09:08.000Z (7 months ago)
- Last Synced: 2024-04-14T00:38:37.139Z (7 months ago)
- Topics: dms, invoices, nodejs, scraping
- Language: TypeScript
- Homepage: https://blog.disane.dev
- Size: 5.63 MB
- Stars: 29
- Watchers: 2
- Forks: 6
- Open Issues: 17
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- Funding: .github/FUNDING.yml
- License: LICENSE.md
- Code of conduct: CODE_OF_CONDUCT.md
README
Welcome to docudigger 👋
> Document scraper for getting invoices automagically as pdf (useful for taxes or DMS)
### 🏠 [Homepage](https://repo.disane.dev/Disane/docudigger#readme)
## Configuration
All settings can be changed via the CLI or environment variables (even when using Docker).
| Setting | Description | Default value |
| ----------------------- | -------------------------------------------------------------------------------------------------------------------------- | --------------- |
| AMAZON_USERNAME | Your Amazon username | `null` |
| AMAZON_PASSWORD | Your Amazon password | `null` |
| AMAZON_TLD | Amazon top level domain | `de` |
| AMAZON_YEAR_FILTER | Only extracts invoices from this year (e.g. 2023) | `2023` |
| AMAZON_PAGE_FILTER | Only extracts invoices from this page (e.g. 2) | `null` |
| ONLY_NEW | Tracks already scraped documents and starts a new run at the last scraped one | `true` |
| FILE_DESTINATION_FOLDER | Destination path for all scraped documents | `./documents/` |
| FILE_FALLBACK_EXTENSION | Fallback extension when no extension can be determined | `.pdf` |
| DEBUG | Debug flag (sets the loglevel to DEBUG) | `false` |
| SUBFOLDER_FOR_PAGES | Creates subfolders for every scraped page/plugin | `false` |
| LOG_PATH | Sets the log path | `./logs/` |
| LOG_LEVEL | Log level (see https://github.com/winstonjs/winston#logging-levels) | `info` |
| RECURRING | Flag for executing the script periodically. Needs `RECURRING_PATTERN` to be set. Defaults to `true` when using the Docker container | `false` |
| RECURRING_PATTERN | Cron pattern for periodic execution. Needs `RECURRING` to be set to `true` | `*/30 * * * *` |
| TZ | Timezone used for Docker environments | `Europe/Berlin` |
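For example, these settings could be collected in an `.env` file and passed to the container or the local run. This is a minimal sketch; the variable names come from the table above, but all values shown here are placeholders, not shipped defaults:

```sh
# .env – example values only, adjust to your account and paths
AMAZON_USERNAME=you@example.com
AMAZON_PASSWORD=your-password
AMAZON_TLD=de
AMAZON_YEAR_FILTER=2023
FILE_DESTINATION_FOLDER=./documents/
LOG_LEVEL=info
```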
## Install

```sh
npm install
```

## Usage
```sh-session
$ npm install -g @disane-dev/docudigger
$ docudigger COMMAND
running command...
$ docudigger (--version)
@disane-dev/docudigger/2.0.6 linux-x64 node-v20.13.1
$ docudigger --help [COMMAND]
USAGE
$ docudigger COMMAND
...
```

## `docudigger scrape all`
Scrapes all websites periodically (default for docker environment)
```
USAGE
  $ docudigger scrape all [--json] [--logLevel trace|debug|info|warn|error] [-d] [-l <value>] [-c <value> -r]

FLAGS
  -c, --recurringCron=<value>  [default: * * * * *] Cron pattern to execute periodically
  -d, --debug
  -l, --logPath=<value>        [default: ./logs/] Log path
  -r, --recurring
      --logLevel=<value>       [default: info] Specify level for logging.
GLOBAL FLAGS
  --json  Format output as json.

DESCRIPTION
  Scrapes all websites periodically

EXAMPLES
$ docudigger scrape all
```
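A recurring run could combine the flags listed above; this is only a sketch, and the cron pattern and log level are example values:

```sh
# Scrape all supported sites every 30 minutes with debug logging (example values)
docudigger scrape all -r -c "*/30 * * * *" --logLevel debug -l ./logs/
```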
## `docudigger scrape amazon`

Used to get invoices from Amazon
```
USAGE
  $ docudigger scrape amazon -u <value> -p <value> [--json] [--logLevel trace|debug|info|warn|error] [-d]
    [-l <value>] [-c <value> -r] [--fileDestinationFolder <value>] [--fileFallbackExentension <value>] [-t <value>]
    [--yearFilter <value>] [--pageFilter <value>] [--onlyNew]

FLAGS
  -c, --recurringCron=<value>            [default: * * * * *] Cron pattern to execute periodically
  -d, --debug
  -l, --logPath=<value>                  [default: ./logs/] Log path
  -p, --password=<value>                 (required) Password
  -r, --recurring
  -t, --tld=<value>                      [default: de] Amazon top level domain
  -u, --username=<value>                 (required) Username
      --fileDestinationFolder=<value>    [default: ./data/] Destination path for all scraped documents
      --fileFallbackExentension=<value>  [default: .pdf] Fallback extension when no extension can be determined
      --logLevel=<value>                 [default: info] Specify level for logging.
      --onlyNew                          Gets only new invoices
      --pageFilter=<value>               Filters a page
      --yearFilter=<value>               Filters a year

GLOBAL FLAGS
  --json  Format output as json.

DESCRIPTION
  Used to get invoices from Amazon

  Scrapes Amazon invoices
EXAMPLES
$ docudigger scrape amazon
```
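A one-off Amazon run for a single year might be invoked like this; it is a sketch using the documented flags, and the credentials and filter values are placeholders:

```sh
# Fetch only new 2023 invoices from amazon.de (placeholder credentials)
docudigger scrape amazon -u "you@example.com" -p "your-password" -t de --yearFilter 2023 --onlyNew
```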
## Docker

```sh
docker run \
-e AMAZON_USERNAME='[YOUR MAIL]' \
-e AMAZON_PASSWORD='[YOUR PW]' \
-e AMAZON_TLD='de' \
-e AMAZON_YEAR_FILTER='2020' \
-e AMAZON_PAGE_FILTER='1' \
-e LOG_LEVEL='info' \
-v "C:/temp/docudigger/:/home/node/docudigger" \
ghcr.io/disane87/docudigger
```
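Since `RECURRING` defaults to `true` in the container, a long-running setup only needs a cron pattern and a mounted document folder. A sketch; the schedule and host path are examples, not required values:

```sh
# Run detached and scrape every night at 03:00 (example schedule and host path)
docker run -d \
  -e AMAZON_USERNAME='[YOUR MAIL]' \
  -e AMAZON_PASSWORD='[YOUR PW]' \
  -e RECURRING_PATTERN='0 3 * * *' \
  -e TZ='Europe/Berlin' \
  -v "/srv/docudigger/:/home/node/docudigger" \
  ghcr.io/disane87/docudigger
```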
## Dev-Time 🪲

### NPM

```sh
npm install
# Change the created .env to your needs
npm run start
```
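Instead of editing `.env`, the settings can also be passed inline for a quick local test; this is a sketch with placeholder credentials:

```sh
# One-off local run with inline settings (placeholder values)
AMAZON_USERNAME=you@example.com AMAZON_PASSWORD=your-password DEBUG=true npm run start
```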
## Author

👤 **Marco Franke**
- Website: http://byte-style.de
- Github: [@Disane87](https://github.com/Disane87)
- LinkedIn: [@marco-franke-799399136](https://linkedin.com/in/marco-franke-799399136)

## 🤝 Contributing
Contributions, issues and feature requests are welcome!
Feel free to check the [issues page](https://repo.disane.dev/Disane/docudigger/issues). You can also take a look at the [contributing guide](https://repo.disane.dev/Disane/docudigger/blob/master/CONTRIBUTING.md).

## Show your support
Give a ⭐️ if this project helped you!
---
_This README was generated with ❤️ by [readme-md-generator](https://github.com/kefranabg/readme-md-generator)_