https://github.com/snimmagadda1/stack-exchange-dump-to-mysql
Batch pipeline to import Stack Exchange XML data dumps to relational DB
- Host: GitHub
- URL: https://github.com/snimmagadda1/stack-exchange-dump-to-mysql
- Owner: snimmagadda1
- License: mit
- Created: 2020-12-17T04:30:39.000Z (about 4 years ago)
- Default Branch: master
- Last Pushed: 2022-02-08T18:45:25.000Z (almost 3 years ago)
- Last Synced: 2024-10-24T09:26:15.761Z (about 2 months ago)
- Topics: batch, data, mysql, spring-batch, stackoverflow
- Language: Java
- Homepage:
- Size: 50.8 MB
- Stars: 1
- Watchers: 2
- Forks: 0
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE.md
# stack-exchange-dump-to-mysql 👋
> A quick pipeline to import [Stack Exchange XML dump](https://archive.org/details/stackexchange) data to a relational db
### 🏠 [TODO](https://s11a.com)
## Install
```sh
mvn clean package
```

## Usage
Before running the pipeline, execute `schema-base.sql` against the target schema. This initializes the tables and creates the indices needed for the data dump.

Run with Docker, passing the required `app.datasource.xxx` and `spring.datasource.xxx` properties as environment variables:
```sh
docker run -e APP_DATASOURCE_URL=XXXXX -e ... snimmagadda/stacke-batch-mysql:latest
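
# Sketch, not part of the project. Assumption: the app relies on Spring Boot's
# relaxed binding, where a property name becomes an environment variable name by
# replacing "." with "_", removing "-", and upper-casing the result.
# to_env_name is a hypothetical helper that applies that rule.
to_env_name() {
  printf '%s\n' "$1" | sed 's/\./_/g; s/-//g' | tr '[:lower:]' '[:upper:]'
}

to_env_name app.datasource.driver-class-name   # prints APP_DATASOURCE_DRIVERCLASSNAME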
```

To run from source, update the `app.datasource.xxx` properties accordingly. By default, metrics and job/task metadata are written to an in-memory HSQL DB, which can be overridden with the `spring.datasource.xxx` properties. Example YAML:

```yaml
app:
  datasource:
    dialect: org.hibernate.dialect.MySQLDialect
    driver-class-name: org.mariadb.jdbc.Driver
    url: "jdbc:mysql://localhost:3306/stacke"
    username: "root"
    password: "password"
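# Illustrative addition, not from the project: an assumed override that sends
# job/task metadata to MySQL instead of the default in-memory HSQL DB.
# The keys are standard Spring Boot datasource properties; the database name
# "batch_meta" and the credentials are placeholders. The target schema may
# also need the Spring Batch metadata tables.
spring:
  datasource:
    driver-class-name: org.mariadb.jdbc.Driver
    url: "jdbc:mysql://localhost:3306/batch_meta"
    username: "root"
    password: "password"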
```

Streamlined ways to import are a work in progress. For now, `application.yaml` must be configured manually, and running from source is the simplest way to pass in custom data files. Once the properties are configured, run locally with:

```sh
mvn spring-boot:run
```

## Run tests
```sh
mvn test
```

## Author
👤 **Sai Nimmagadda**
* Website: https://s11a.com
* GitHub: [@snimmagadda1](https://github.com/snimmagadda1)

## 🤝 Contributing
Contributions, issues and feature requests are welcome!
Feel free to check the [issues page](https://github.com/snimmagadda1/stackexchange-dump-to-mysql/issues).

## 📝 License
Copyright © 2020 [Sai Nimmagadda](https://github.com/snimmagadda1).
This project is [MIT](LICENSE.md) licensed.

***
_This README was generated with ❤️ by [readme-md-generator](https://github.com/kefranabg/readme-md-generator)_