An open API service indexing awesome lists of open source software.

https://github.com/snithish/duckdb-hdfs


https://github.com/snithish/duckdb-hdfs

Last synced: about 2 months ago
JSON representation

Awesome Lists containing this project

README

          

# hdfs

This repository is based on https://github.com/duckdb/extension-template, check it out if you want to build and ship your own DuckDB extension.

---

This extension, hdfs, allow you to ... .

## Building
### Dependencies
DuckDB extensions use VCPKG for dependency management. To demonstrate that, the example extension in the template links against
OpenSSL. Enabling VCPKG is very simple: follow the [installation instructions](https://vcpkg.io/en/getting-started) and export the following variable:
```shell
export VCPKG_TOOLCHAIN_PATH=/scripts/buildsystems/vcpkg.cmake
```
Note: while using VCPKG for installation is recommended, the build will still work as long as
CMake's `find_package` function is able to locate a compatible openssl version. Alternatively, feel free
to remove the OpenSSL dependency completely to build the example extension without dependencies.

### Build steps
Now to build the extension, run:
```sh
make
```
The main binaries that will be built are:
```sh
./build/release/duckdb
./build/release/test/unittest
./build/release/extension/hdfs/hdfs.duckdb_extension
```
- `duckdb` is the binary for the duckdb shell with the extension code automatically loaded.
- `unittest` is the test runner of duckdb. Again, the extension is already linked into the binary.
- `hdfs.duckdb_extension` is the loadable binary as it would be distributed.

## Running the extension
To run the extension code, simply start the shell with `./build/release/duckdb`.

Now we can use the features from the extension directly in DuckDB. The template contains a single scalar function `hdfs()` that takes a string arguments and returns a string:
```
D select hdfs('Jane') as result;
┌───────────────┐
│ result │
│ varchar │
├───────────────┤
│ Quack Jane 🐥 │
└───────────────┘
```

## Running the tests
Different tests can be created for DuckDB extensions. The primary way of testing DuckDB extensions should be the SQL tests in `./test/sql`. These SQL tests can be run using:
```sh
make test
```

### Installing the deployed binaries
To install your extension binaries from S3, you will need to do two things. Firstly, DuckDB should be launched with the
`allow_unsigned_extensions` option set to true. How to set this will depend on the client you're using. Some examples:

CLI:
```shell
duckdb -unsigned
```

Python:
```python
con = duckdb.connect(':memory:', config={'allow_unsigned_extensions' : 'true'})
```

NodeJS:
```js
db = new duckdb.Database(':memory:', {"allow_unsigned_extensions": "true"});
```

Secondly, you will need to set the repository endpoint in DuckDB to the HTTP url of your bucket + version of the extension
you want to install. To do this run the following SQL query in DuckDB:
```sql
SET custom_extension_repository='bucket.s3.eu-west-1.amazonaws.com//latest';
```
Note that the `/latest` path will allow you to install the latest extension version available for your current version of
DuckDB. To specify a specific version, you can pass the version instead.

After running these steps, you can install and load your extension using the regular INSTALL/LOAD commands in DuckDB:
```sql
INSTALL hdfs
LOAD hdfs
```