Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/quackscience/urlengine
A pastie for SQL Tables. URL Engine for DuckDB & ClickHouse
clickhouse duckdb http httpfs storage url urlengine
Last synced: 2 months ago
- Host: GitHub
- URL: https://github.com/quackscience/urlengine
- Owner: quackscience
- License: mit
- Created: 2024-09-21T20:32:55.000Z (4 months ago)
- Default Branch: master
- Last Pushed: 2024-10-21T15:49:38.000Z (3 months ago)
- Last Synced: 2024-10-22T03:16:48.506Z (3 months ago)
- Topics: clickhouse, duckdb, http, httpfs, storage, url, urlengine
- Language: Go
- Homepage: https://urleng.com
- Size: 164 KB
- Stars: 4
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# DuckDB URL Engine
The URL Engine provides format-agnostic remote storage for DuckDB and ClickHouse with hive-style partitioning.

### Demo
A public demo instance is available at [https://urleng.glitch.me](https://urleng.glitch.me).

##### Features
- [x] INSERT Files via POST
- [x] SELECT Files via GET/HEAD
- [x] HTTP RANGE Support

#### Usage
##### Golang
Install and run the example Go service:
```
cd go/
go mod tidy
PORT=80 go run server.go
```

### Examples
#### DuckDB
You can `COPY` to and `SELECT` from the URL Engine using the `json`, `csv`, and `parquet` file extensions:
```sql
D SET enable_http_write = 1;

D COPY (SELECT version() as version, 9999 as number) TO 'https://urleng.glitch.me/test.json';
D SELECT * FROM read_json_auto('https://urleng.glitch.me/test.json');
┌─────────┬────────┐
│ version │ number │
│ varchar │ int64  │
├─────────┼────────┤
│ v1.1.0  │   9999 │
└─────────┴────────┘

D COPY (SELECT version() as version, 9999 as number) TO 'https://urleng.glitch.me/test.parquet';
D SELECT * FROM read_parquet('https://urleng.glitch.me/test.parquet');
┌─────────┬────────┐
│ version │ number │
│ varchar │ int64  │
├─────────┼────────┤
│ v1.1.0  │   9999 │
└─────────┴────────┘

D SELECT * FROM parquet_schema('https://urleng.glitch.me/test.parquet');
┌──────────────────────┬───────────────┬────────────┬─────────────┬───┬────────────────┬───────┬───────────┬──────────┬──────────────┐
│      file_name       │     name      │    type    │ type_length │ … │ converted_type │ scale │ precision │ field_id │ logical_type │
│       varchar        │    varchar    │  varchar   │   varchar   │   │    varchar     │ int64 │   int64   │  int64   │   varchar    │
├──────────────────────┼───────────────┼────────────┼─────────────┼───┼────────────────┼───────┼───────────┼──────────┼──────────────┤
│ https://duckserver…  │ duckdb_schema │            │             │ … │                │       │           │          │              │
│ https://duckserver…  │ version       │ BYTE_ARRAY │             │ … │ UTF8           │       │           │          │              │
│ https://duckserver…  │ number        │ INT32      │             │ … │ INT_32         │       │           │          │              │
├──────────────────────┴───────────────┴────────────┴─────────────┴───┴────────────────┴───────┴───────────┴──────────┴──────────────┤
│ 3 rows                                                                                                        11 columns (9 shown) │
└────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
```

##### HTTP/S HEADERS
When authentication is enabled, you can pass headers and other parameters using _SECRETS_:

```sql
CREATE SECRET extra_http_headers (
TYPE HTTP,
EXTRA_HTTP_HEADERS MAP{
'Authorization': 'Bearer ${SOME_TOKEN}',
'CustomHeader': 'abc123'
}
);
```

##### Native format
You can also upload a native DuckDB database file (for example `.duckdb`) and attach it in a read-only session:
```bash
curl --data-binary @/path/to/myduck.db https://urleng.glitch.me/myduck.db
```
```sql
ATTACH 'https://urleng.glitch.me/myduck.db' AS remote;
SELECT * FROM remote.my_table; -- my_table is a placeholder for a table in the attached database
```

#### ClickHouse
##### INSERT
```sql
INSERT INTO FUNCTION url('https://urleng.glitch.me/click.parquet', 'PARQUET', 'version String, number UInt32') VALUES (version(), 999);
```
##### SELECT
```sql
SELECT * FROM url('https://urleng.glitch.me/click.parquet', PARQUET) FORMAT Pretty;

   ┏━━━━━━━━━━┳━━━━━━━━┓
   ┃ version  ┃ number ┃
   ┡━━━━━━━━━━╇━━━━━━━━┩
1. │ 24.5.1.1 │    999 │
   └──────────┴────────┘
```

##### DESCRIBE
```sql
DESCRIBE TABLE url('https://urleng.glitch.me/click.parquet', PARQUET) FORMAT Pretty;

   ┏━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┓
   ┃ name    ┃ type             ┃ default_type ┃ default_expression ┃ comment ┃ codec_expression ┃ ttl_expression ┃
   ┡━━━━━━━━━╇━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━┩
1. │ version │ Nullable(String) │              │                    │         │                  │                │
   ├─────────┼──────────────────┼──────────────┼────────────────────┼─────────┼──────────────────┼────────────────┤
2. │ number  │ Nullable(UInt32) │              │                    │         │                  │                │
   └─────────┴──────────────────┴──────────────┴────────────────────┴─────────┴──────────────────┴────────────────┘
```
##### SET PARAM
```sql
SET param_url = 'https://urleng.glitch.me/your_secret_token';
INSERT INTO FUNCTION url({url:String}, JSONEachRow, 'key String, value UInt64') VALUES ('hello', 1);
SELECT * FROM url({url:String}, JSONEachRow);
```

### Design Flow
```mermaid
sequenceDiagram
autonumber
DuckDB->>DuckServer: POST Request
loop Storage
DuckServer->>DuckServer: WRITE FILE
end
DuckServer-->>DuckDB: POST Response
DuckDB->>DuckServer: HEAD Request
loop Storage
DuckServer->>DuckServer: READ FILE SIZE
end
DuckDB->>DuckServer: GET RANGE Request
loop Storage
DuckServer->>DuckServer: READ FILE RANGE
end
DuckServer-->>DuckDB: GET Response
```