https://github.com/sourdoughcat/distiller
- Host: GitHub
- URL: https://github.com/sourdoughcat/distiller
- Owner: SourdoughCat
- Created: 2024-03-03T23:08:59.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2024-03-03T23:35:52.000Z (over 1 year ago)
- Last Synced: 2024-12-09T00:09:44.651Z (6 months ago)
- Language: Python
- Size: 240 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
Distiller
Distiller is a "drop-in" feature store project. It aims to generate SQL code using "good" defaults based on existing feature store implementations.
It demonstrates that you do not need to set up any infrastructure (besides having access to an existing database) to build machine learning pipelines.

What it is:
- Just a SQL generation tool, built on `sqlalchemy` (for now)
- Supports multiple entities, provided each feature group has a single entity; feature groups with different entity types can be mixed
- Supports feature versioning (or updates resolved via a creation timestamp field)

What we aim to do:
- Provide SQL generation tools for offline feature retrieval
- Document the semantics and language used in Distiller
- (TODO) Provide SQL generation tools to batch-unload features into your favourite online serving tool
- (TODO) Provide an opinionated online serving interface
- (TODO) Have a sensible API

What we're not:
- A feature metadata store (though providing metadata is optional)
- A compute engine: offload that to a database!

Comparisons and Influences:
- Vertex AI, for its data preprocessing standards: there is no need to support every possible pattern!
- The Feast community, for the SQL templates

Future Ideas:
- The logic is fairly simple and could be ported out of Python. Perhaps it could move to another language that then exposes appropriate bindings for R, JavaScript, Java, etc.
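The offline feature retrieval and timestamp-based versioning described in this README come down to a point-in-time ("as-of") join, the kind of query feature stores such as Feast generate: for each entity and event timestamp in a spine table, take the most recent feature row at or before that timestamp, and break ties between versions of the same row with the creation timestamp. The sketch below is a hypothetical illustration, not Distiller's API. The helper `point_in_time_sql` and all table and column names (`spine`, `customer_features`, `customer_id`, `event_ts`, `feature_ts`, `created_ts`, `total_spend`) are assumptions. It also swaps `sqlalchemy` for a plain string template and the standard-library `sqlite3` module so it runs without dependencies, and it assumes a single feature group with one entity and unique (entity, event timestamp) rows in the spine.

```python
from __future__ import annotations

import sqlite3


def point_in_time_sql(
    spine: str,
    feature_table: str,
    entity: str,
    features: list[str],
    event_ts: str = "event_ts",
    feature_ts: str = "feature_ts",
    created_ts: str = "created_ts",
) -> str:
    """Hypothetical helper: build SQL that returns, for every spine row, the
    latest feature values observed at or before that row's event timestamp.

    Identifiers are interpolated unquoted, so they must be trusted names.
    """
    feature_cols = ", ".join(f"f.{c}" for c in features)
    out_cols = ", ".join(features)
    return f"""
    WITH candidates AS (
        SELECT s.{entity}, s.{event_ts}, {feature_cols},
               -- rank candidates: newest feature timestamp first, then the
               -- newest creation timestamp to pick the latest version
               ROW_NUMBER() OVER (
                   PARTITION BY s.{entity}, s.{event_ts}
                   ORDER BY f.{feature_ts} DESC, f.{created_ts} DESC
               ) AS rn
        FROM {spine} AS s
        -- LEFT JOIN keeps spine rows with no eligible feature row (as NULLs);
        -- the <= condition stops values from the future leaking in
        LEFT JOIN {feature_table} AS f
          ON f.{entity} = s.{entity}
         AND f.{feature_ts} <= s.{event_ts}
    )
    SELECT {entity}, {event_ts}, {out_cols}
    FROM candidates
    WHERE rn = 1
    ORDER BY {entity}, {event_ts}
    """


# Toy data in an in-memory SQLite database (ISO date strings sort correctly).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE spine (customer_id INTEGER, event_ts TEXT);
    CREATE TABLE customer_features (
        customer_id INTEGER, total_spend REAL, feature_ts TEXT, created_ts TEXT
    );
    INSERT INTO spine VALUES (1, '2024-01-10'), (1, '2024-01-20'), (2, '2024-01-15');
    INSERT INTO customer_features VALUES
        (1, 10.0, '2024-01-05', '2024-01-05'),
        (1, 12.0, '2024-01-05', '2024-01-06'),
        (1, 30.0, '2024-01-18', '2024-01-18'),
        (2, 99.0, '2024-01-16', '2024-01-16');
    """
)

sql = point_in_time_sql("spine", "customer_features", "customer_id", ["total_spend"])
for row in conn.execute(sql):
    print(row)
# (1, '2024-01-10', 12.0)  <- the version created on 2024-01-06 wins the tie
# (1, '2024-01-20', 30.0)
# (2, '2024-01-15', None)  <- the only feature row is dated after the event
```

Customer 2's only feature row is dated after its event, so the join condition excludes it and the feature comes back as `NULL` rather than leaking future data. With several feature groups, each keyed by its own entity, one option is to generate one such query per group and join the results back onto the spine.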