https://github.com/milesmcc/mieql
An implementation of IEQL scan functionality for the Common Crawl archive.
- Host: GitHub
- URL: https://github.com/milesmcc/mieql
- Owner: milesmcc
- Created: 2019-01-18T23:55:17.000Z (over 6 years ago)
- Default Branch: master
- Last Pushed: 2020-02-15T00:45:27.000Z (over 5 years ago)
- Last Synced: 2025-04-09T22:54:55.528Z (3 months ago)
- Language: Rust
- Size: 152 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# MIEQL
This is a connector that streams data between [IEQL](https://github.com/milesmcc/ieql) and Common Crawl.
Notably, it provides:
* On-the-fly gzip decoding and processing
* Fully distributed and parallelized architecture
* Master/client functionality via a CLI
* Full integration with AWS S3 for data retrieval

## Database
The database must have the following tables: `queries`, `outputs`, and `inputs`. Create them according to the following SQL commands:
```sql
CREATE TABLE queries (
    ron TEXT
);

CREATE TABLE outputs (
    jsonb JSONB
);

CREATE TABLE inputs (
    url TEXT
);
```

---
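The schema suggests how the master/client split fits together: `inputs` presumably holds the URLs to scan, `queries` the queries to run (the `ron` column name hints they are stored as RON text), and `outputs` the results as JSON. The Rust below is a minimal, hedged sketch of that work-queue pattern, modeled in a single process with standard-library threads. A shared queue stands in for `inputs`, a closure stands in for an IEQL query, and a channel stands in for writes to `outputs`. The function `run_scan` and its signature are hypothetical. This is not MIEQL's actual code, which coordinates separate processes through the database and fetches data from S3.

```rust
use std::collections::VecDeque;
use std::sync::{mpsc, Mutex};
use std::thread;

/// Hypothetical sketch (not MIEQL's API): the "master" loads every input URL
/// into a shared queue, and `workers` "client" threads drain it. `scan` stands
/// in for running a query against one document. It returns the output row
/// (e.g. JSON text) on a match, or `None` otherwise.
pub fn run_scan<F>(inputs: Vec<String>, workers: usize, scan: F) -> Vec<String>
where
    F: Fn(&str) -> Option<String> + Sync,
{
    // Stand-in for the `inputs` table: pending URLs, claimed one at a time.
    let queue = Mutex::new(VecDeque::from(inputs));
    // Stand-in for the `outputs` table: clients send results to the master.
    let (tx, rx) = mpsc::channel::<String>();

    // Scoped threads may borrow `queue` and `scan`; all are joined at scope end.
    thread::scope(|s| {
        for _ in 0..workers.max(1) {
            let tx = tx.clone();
            let q = &queue;
            let f = &scan;
            s.spawn(move || loop {
                // Claim one URL in its own statement so the lock guard is
                // dropped before scanning (a `while let` condition would keep
                // it held for the whole loop body and serialize the workers).
                let next = q.lock().unwrap().pop_front();
                let Some(url) = next else { break }; // queue drained: client exits
                if let Some(output) = f(url.as_str()) {
                    tx.send(output).unwrap();
                }
            });
        }
    });

    // Drop the master's sender so the receiver ends once all results are read.
    drop(tx);
    let mut outputs: Vec<String> = rx.into_iter().collect();
    // Completion order across threads is nondeterministic; sort for stable output.
    outputs.sort();
    outputs
}
```

For example, `run_scan(urls, 4, |url| url.contains("example").then(|| url.to_string()))` would scan with four worker threads and return the matching URLs in sorted order. A real deployment would replace the in-memory queue with rows claimed from `inputs` and the channel with inserts into `outputs`.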
This is a proof of concept, and is not meant to be used as a library.