https://github.com/snithish/duckdb-hdfs
- Host: GitHub
- URL: https://github.com/snithish/duckdb-hdfs
- Owner: snithish
- License: mit
- Created: 2023-10-07T17:23:58.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2024-01-08T13:01:44.000Z (over 2 years ago)
- Last Synced: 2025-03-12T06:42:33.847Z (over 1 year ago)
- Language: C++
- Size: 19.5 KB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# hdfs
This repository is based on https://github.com/duckdb/extension-template, check it out if you want to build and ship your own DuckDB extension.
---
This extension, hdfs, allows you to ... .
## Building
### Dependencies
DuckDB extensions use VCPKG for dependency management. To demonstrate that, the example extension in the template links against
OpenSSL. Enabling VCPKG is very simple: follow the [installation instructions](https://vcpkg.io/en/getting-started) and export the following variable:
```shell
export VCPKG_TOOLCHAIN_PATH=<path to vcpkg>/scripts/buildsystems/vcpkg.cmake
```
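The variable is expected to point at the `vcpkg.cmake` toolchain file inside a vcpkg checkout. As a minimal sketch, the helper below builds that path and the matching `export` line from a checkout directory. The function names and the `/opt/vcpkg` location are hypothetical and only serve as illustration.

```python
import shlex
from pathlib import PurePosixPath


def vcpkg_toolchain_path(vcpkg_root):
    """Return the toolchain file path for a vcpkg checkout at vcpkg_root."""
    # vcpkg keeps its CMake toolchain file under scripts/buildsystems/.
    return str(PurePosixPath(vcpkg_root) / "scripts" / "buildsystems" / "vcpkg.cmake")


def export_line(vcpkg_root):
    """Return the shell line that sets VCPKG_TOOLCHAIN_PATH for this checkout."""
    # shlex.quote protects paths that contain spaces or shell metacharacters.
    return "export VCPKG_TOOLCHAIN_PATH=" + shlex.quote(vcpkg_toolchain_path(vcpkg_root))


# Hypothetical checkout location, used only as an example.
print(export_line("/opt/vcpkg"))
```

Paste the printed line into your shell before running the build.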
Note: while using VCPKG for installation is recommended, the build will still work as long as
CMake's `find_package` function is able to locate a compatible OpenSSL version. Alternatively, feel free
to remove the OpenSSL dependency completely to build the example extension without dependencies.
### Build steps
Now to build the extension, run:
```sh
make
```
The main binaries that will be built are:
```sh
./build/release/duckdb
./build/release/test/unittest
./build/release/extension/hdfs/hdfs.duckdb_extension
```
- `duckdb` is the binary for the DuckDB shell with the extension code automatically loaded.
- `unittest` is the test runner of DuckDB. Again, the extension is already linked into the binary.
- `hdfs.duckdb_extension` is the loadable binary as it would be distributed.
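A quick way to confirm that `make` produced everything is to check for the three files listed above. The sketch below assumes it is run from the repository root. The helper names are hypothetical and the paths are copied from the list above.

```python
from pathlib import Path

# Relative paths of the binaries the build is expected to produce.
ARTIFACTS = (
    "build/release/duckdb",
    "build/release/test/unittest",
    "build/release/extension/hdfs/hdfs.duckdb_extension",
)


def expected_artifacts(repo_root="."):
    """Return the expected artifact paths under repo_root."""
    root = Path(repo_root)
    return [root / rel for rel in ARTIFACTS]


def missing_artifacts(repo_root="."):
    """Return the expected artifacts that are not present as regular files."""
    return [path for path in expected_artifacts(repo_root) if not path.is_file()]


for path in missing_artifacts():
    print(f"missing: {path}")
```

No output means all three binaries were found.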
## Running the extension
To run the extension code, simply start the shell with `./build/release/duckdb`.
Now we can use the features from the extension directly in DuckDB. The template contains a single scalar function `hdfs()` that takes a string argument and returns a string:
```
D select hdfs('Jane') as result;
┌───────────────┐
│    result     │
│    varchar    │
├───────────────┤
│ Quack Jane 🐥 │
└───────────────┘
```
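The same query can also be run non-interactively, for example from a script. The sketch below assumes the shell binary accepts a `-c` flag that runs one command and exits, and that the binary sits at the path given in the build section. `shell_command` and `run_query` are hypothetical helper names.

```python
import subprocess

# Path of the shell binary produced by the build (see the build section).
DUCKDB_BINARY = "./build/release/duckdb"


def shell_command(sql, binary=DUCKDB_BINARY):
    """Build the argument list that runs a single SQL statement and exits."""
    # Assumption: the shell supports `-c COMMAND` for one-shot execution.
    return [binary, "-c", sql]


def run_query(sql, binary=DUCKDB_BINARY):
    """Run sql through the shell and return whatever it prints to stdout."""
    result = subprocess.run(
        shell_command(sql, binary),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the shell exits non-zero
    )
    return result.stdout
```

Calling `run_query("select hdfs('Jane') as result;")` after a successful build should print the same result table as the interactive session.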
## Running the tests
Different tests can be created for DuckDB extensions. The primary way of testing DuckDB extensions should be the SQL tests in `./test/sql`. These SQL tests can be run using:
```sh
make test
```
### Installing the deployed binaries
To install your extension binaries from S3, you will need to do two things. Firstly, DuckDB should be launched with the
`allow_unsigned_extensions` option set to true. How to set this will depend on the client you're using. Some examples:
CLI:
```shell
duckdb -unsigned
```
Python:
```python
import duckdb
con = duckdb.connect(':memory:', config={'allow_unsigned_extensions' : 'true'})
```
NodeJS:
```js
const duckdb = require('duckdb');
const db = new duckdb.Database(':memory:', {"allow_unsigned_extensions": "true"});
```
Secondly, you will need to set the repository endpoint in DuckDB to the HTTP URL of your bucket plus the version of the extension
you want to install. To do this, run the following SQL query in DuckDB:
```sql
SET custom_extension_repository='bucket.s3.eu-west-1.amazonaws.com/<your_extension_name>/latest';
```
Note that the `/latest` path will allow you to install the latest extension version available for your current version of
DuckDB. To pin a specific version, pass that version in place of `latest`.
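To make the endpoint layout concrete, here is a minimal sketch that assembles the repository value and the corresponding `SET` statement. It assumes the path segment between the bucket host and the version is the extension name. The function names and the `v0.0.1` version string are hypothetical.

```python
def repository_url(bucket_host, extension_name, version="latest"):
    """Join bucket host, extension name and version into the repository value."""
    # Assumption: the layout is <bucket host>/<extension name>/<version>.
    return f"{bucket_host}/{extension_name}/{version}"


def set_repository_sql(url):
    """Return the SET statement that points DuckDB at the given repository."""
    escaped = url.replace("'", "''")  # double single quotes inside the SQL literal
    return f"SET custom_extension_repository='{escaped}';"


host = "bucket.s3.eu-west-1.amazonaws.com"
print(set_repository_sql(repository_url(host, "hdfs")))            # latest build
print(set_repository_sql(repository_url(host, "hdfs", "v0.0.1")))  # pinned, hypothetical version
```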
After running these steps, you can install and load your extension using the regular INSTALL/LOAD commands in DuckDB:
```sql
INSTALL hdfs;
LOAD hdfs;
```
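From a client, the same two statements can be issued programmatically. The sketch below works with any connection object that has an `execute(sql)` method, such as the Python connection created earlier with `allow_unsigned_extensions` enabled. `install_statements`, `install_and_load`, and `RecordingConnection` are hypothetical names, and the recording class stands in for a real connection so the flow can be checked without DuckDB installed.

```python
def install_statements(extension="hdfs"):
    """Return the INSTALL and LOAD statements for the given extension name."""
    # The name is spliced into SQL, so only accept a plain identifier.
    if not extension.isidentifier():
        raise ValueError(f"not a valid extension name: {extension!r}")
    return [f"INSTALL {extension};", f"LOAD {extension};"]


def install_and_load(con, extension="hdfs"):
    """Run INSTALL then LOAD on a connection exposing execute(sql)."""
    for statement in install_statements(extension):
        con.execute(statement)


class RecordingConnection:
    """Stand-in connection that records statements instead of running them."""

    def __init__(self):
        self.statements = []

    def execute(self, sql):
        self.statements.append(sql)
        return self


con = RecordingConnection()
install_and_load(con)
print(con.statements)
```

With a real connection, set `custom_extension_repository` first, then call `install_and_load(con)`.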