https://github.com/farovictor/mongodbextractor
This project is intended to be used as a data extractor to support ELT pipelines or any kind of process that requires a heavy data dump from MongoDb databases.
https://github.com/farovictor/mongodbextractor
data go mongodb pipeline
Last synced: 25 days ago
JSON representation
This project is intended to be used as a data extractor to support ELT pipelines or any kind of process that requires a heavy data dump from MongoDb databases.
- Host: GitHub
- URL: https://github.com/farovictor/mongodbextractor
- Owner: farovictor
- Created: 2023-10-02T19:25:19.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2023-10-06T00:24:41.000Z (over 2 years ago)
- Last Synced: 2025-02-15T01:16:55.634Z (12 months ago)
- Topics: data, go, mongodb, pipeline
- Language: Go
- Homepage:
- Size: 13.7 KB
- Stars: 2
- Watchers: 1
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
Awesome Lists containing this project
README
# Usage
This tool can be used as a package or cli tool.
It serves as a data extractor to support ELT pipelines or any kind of process that requires a heavy data dump.
## CLI
This package allows the user to dump data into multiple json files.
### Ping database
The `ping` command does a ping in database and returns a connection check.
```bash
mongoextract ping --conn-uri "$MONGO_CONN_URI"
```
### Check if a collection exists
The `collxst` command does a ping in database and returns a connection check.
```bash
mongoextract collxst \
--conn-uri "$MONGO_CONN_URI" \
--db-name "$MONGO_DBNAME" \
--collection "$MONGO_COLLECTION" \
--app-name "$APPNAME"
```
### Extract in batches - dumping streaming (async)
The `extract-batch` command iterates over mongo cursor and dumps chunks of data into json files.
```bash
mongoextract extract-batch \
--conn-uri "$MONGO_CONN_URI" \
--db-name "$MONGO_DBNAME" \
--collection "$MONGO_COLLECTION" \
--app-name "$APPNAME" \
--mapping $ID_NAME \
--query '{"latitude":{"$$gte":30}}' \
--output-path "./data" \
--output-prefix $ID_NAME \
--chunk-size 100 \
--num-concurrent-files 10
```
### Extract data
The `extract` command fetches a mongo cursor and dumps the whole data into a json file.
```bash
mongoextract extract \
--conn-uri "$MONGO_CONN_URI" \
--db-name "$MONGO_DBNAME" \
--collection "$MONGO_COLLECTION" \
--app-name "$APPNAME" \
--mapping some_mapping_name \
--query '{"latitude":{"$$gte":30}}'
```
## Releases
All releases are documented in [CHANGELOG](CHANGELOG.md), please check there for more details.
### Docker
All releases are containerized and available [here](https://hub.docker.com/r/victorfaro/mongoextract)
To pull the latest version, execute the following command:
```bash
docker pull victorfaro/mongoextract:latest
```
All images uses the cli as entrypoint, check the above examples to see how to use it.