Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/alnkesq/duckdbsharp

Bidirectional interoperability between DuckDB and .NET
https://github.com/alnkesq/duckdbsharp

dotnet duckdb parquet sql

Last synced: 8 days ago
JSON representation

Bidirectional interoperability between DuckDB and .NET

Host: GitHub
URL: https://github.com/alnkesq/duckdbsharp
Owner: alnkesq
License: agpl-3.0
Created: 2023-08-23T11:20:14.000Z (over 1 year ago)
Default Branch: main
Last Pushed: 2024-02-21T14:52:33.000Z (11 months ago)
Last Synced: 2024-03-27T06:03:35.190Z (10 months ago)
Topics: dotnet, duckdb, parquet, sql
Language: C#
Homepage:
Size: 199 KB
Stars: 9
Watchers: 3
Forks: 1
Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE.md

Awesome Lists containing this project

README

        [![protobuf-net](https://img.shields.io/nuget/v/DuckDbSharp.svg)](https://www.nuget.org/packages/DuckDbSharp/)

# DuckDbSharp

DuckDbSharp is a bidirectional interoperability layer between [DuckDB](https://github.com/duckdb/duckdb) and .NET.

Features

- Support for **deeply nested** structures and lists

- Expose .NET methods/collections as **table UDFs**, or as **scalar functions**

- Execute DuckDB queries from .NET

- Generates **static types** from SQL (incl. field nullness detection)

- Dynamic results are supported as well (as dynamic assemblies/types)

- **Performance-oriented** with minimal allocations

- Support for both normal and `[Flags]` enums

- Native AOT support

- Pass .NET collections as SQL parameters (either as array or as table)

- Results are streamed as `IEnumerable<>`

- Write DuckDB loadable extensions in C# (work in progress)

Notes:

- This is **not** an ADO.NET (System.Data) provider

  - Rationale: ADO.NET is flat-table oriented (sublists/subfields are not first class citizens, despite these being probably among the best features of DuckDB).

  - Additionally, ADO.NET is very unergonomic to use unless you add an ORM on top of it. Most existing ORMs however don't work well with sublists/subfields. This library deserializes/serializes directly on top of CLR POCO objects, and can generate (and keep up to date) the type definitions for a better IDE experience.

## Usage

### Calling DuckDB from .NET (auto-generated types)

```sql

-- my_query.sql

SELECT

    42 AS column1,

    [1, 2, 3] AS column2,

    [{a: 1, b: 2}] as column3

```

```csharp

foreach (var user in db.ExecuteQuery_my_query())

{

    // "user" has an auto-generated type with all the fields and sub-fields of the SQL query above.

}

```

See [detailed instructions](#getting-started) below.

### Calling DuckDB from .NET (inline sql)

```csharp

using var db = ThreadSafeTypedDuckDbConnection.CreateInMemory();

foreach (var user in db.Execute("select * from user"))

{

    

}

```

You can also use value tuples (`Execute<(string A, int B)>("select 'a', 42")`), but keep in mind that only column order matters, since tuple member names are erased at runtime.

### Calling .NET from DuckDB

```csharp

[DuckDbFunction]

public static IEnumerable GetUsers(string country) { /*...*/ }

```

```sql

SELECT * FROM GetUsers('US')

```

## Getting started

- Add a reference to `DuckDbSharp`

- Create a [queries](https://github.com/alnkesq/DuckDbSharp/tree/main/tests/DuckDbSharp.Example/queries) directory with the `.sql` queries you want to be able to call from .NET

   - File name must be: `ReturnType query_name(paramtype1, paramtype2).sql`

   - If `ReturnType` is not specified, a type will be automatically generated based on the SQL schema of the result.

   - If your query is parameterized, specify the types of the parameters (e.g. `string` or `long`). Otherwise, parens are unnecessary.

- Call `GenerateCSharpTypes` as shown in the [example](https://github.com/alnkesq/DuckDbSharp/blob/main/tests/DuckDbSharp.Example/Program.cs), and run.

- Start using the now generated extension methods of `TypedDuckDbConnectionBase` (one for each query).

- Remember to commit the generated files as well. This is very important in order to be able to recompile old versions of the repository.

## Benchmarks

Time to read ~100,000 rows of Northwind customers. In all 4 cases, the final result is a `List`.

|                        Library |     Mean |   Error |  StdDev | Description

|------------------------------ |---------:|--------:|--------:|--------------

| **DuckDbSharp (this project)** | **145.4 ms** | 2.63 ms | 2.46 ms |SELECT * FROM customer

| DuckDB.NET + Dapper | 177.2 ms | 2.88 ms | 2.69 ms | SELECT * FROM customer

| Protobuf-net | **131.3 ms** | 2.52 ms | 2.81 ms | Deserialize from MemoryStream of protos

| Newtonsoft JSON | 241.7 ms | 2.93 ms | 2.60 ms | Deserialize from MemoryStream of JSON

Note: while protobuf-net is slightly faster, its use cases is very different (serialization/deserialization only, with no query support)

## Advanced features

### Customizing (de)serialization

- `[DuckDbInclude]` and `[DuckDbIgnore]` always take the precedence over other rules. In their absence, `[ProtoMember]` from protobuf-net is also taken into account. Otherwise, only public fields and properties are taken into account.

- Enums are serialized as DuckDb enums, you can override this with `[DuckDbSerializeAs(typeof(string))]` or `[DuckDbSerializeAs(typeof(int))]` (or whatever their underlying type is).

- `[Flags]` enums are always serialized as structs of booleans, one for each bit.

- `[DuckDbDefaultValueIsNullish]` can be applied to structs, and it means that `default(SomeStruct)` should be represented as `NULL` in DuckDB.

### Reading and writing parquets

- You can use `DuckDbUtils.QueryParquet()` and `DuckDbUtils.WriteParquet()` to directly read/write `.parquet` files (no database required).