https://github.com/zillow/intake-dal
Dataset abstraction over disparate storage systems (eg: bulk, streaming, serving, ...).
- Host: GitHub
- URL: https://github.com/zillow/intake-dal
- Owner: zillow
- License: apache-2.0
- Created: 2019-08-21T22:49:22.000Z (over 6 years ago)
- Default Branch: master
- Last Pushed: 2023-10-17T20:44:06.000Z (over 2 years ago)
- Last Synced: 2025-03-30T03:11:51.406Z (10 months ago)
- Topics: data, intake, python
- Language: Python
- Homepage:
- Size: 151 KB
- Stars: 5
- Watchers: 7
- Forks: 2
- Open Issues: 2
Metadata Files:
- Readme: README.rst
- Contributing: CONTRIBUTING.md
- License: LICENSE
README
.. image:: https://travis-ci.org/zillow/intake-dal.svg?branch=master
    :target: https://travis-ci.org/zillow/intake-dal

.. image:: https://coveralls.io/repos/github/zillow/intake-dal/badge.svg?branch=master
    :target: https://coveralls.io/github/zillow/intake-dal?branch=master

.. image:: https://readthedocs.org/projects/intake-dal/badge/?version=latest
    :target: https://intake-dal.readthedocs.io/en/latest/?badge=latest
    :alt: Documentation Status

Welcome to Intake DAL (data access layer) plugin
==================================================
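
This Intake plugin abstracts a dataset over disparate storage systems
(e.g., bulk, streaming, serving): a single catalog entry lists where the dataset
lives in each system and which system to use by default. It also provides an
easy way to specialize a hierarchical catalog so that its DAL sources default
to one storage system.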

Sample catalog source entry:

.. code-block:: yaml

    user_events:
      driver: dal
      args:
        default: 'local'
        storage:
          local: 'csv://{{ CATALOG_DIR }}/data/user_events.csv'
          serving: 'in-memory-kv://foo'
          batch: 'parquet://{{ CATALOG_DIR }}/data/user_events.parquet'

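
To make the entry's semantics concrete, here is a minimal, hypothetical sketch of how such an entry could resolve to a concrete Intake plugin: pick the storage URL for the requested mode (falling back to ``default``), substitute ``CATALOG_DIR``, and treat the URL scheme as the plugin name, as the pairing of ``in-memory-kv://foo`` with the in-memory-kv plugin in the example suggests. ``resolve_storage`` and the plain-dict copy of the entry are illustrative assumptions, not intake-dal's API, and the string replacement only stands in for Intake's own templating.

.. code-block:: python

    from typing import Any, Dict, Optional, Tuple

    # Hypothetical plain-dict copy of the sample ``user_events`` entry.
    USER_EVENTS: Dict[str, Any] = {
        "driver": "dal",
        "args": {
            "default": "local",
            "storage": {
                "local": "csv://{{ CATALOG_DIR }}/data/user_events.csv",
                "serving": "in-memory-kv://foo",
                "batch": "parquet://{{ CATALOG_DIR }}/data/user_events.parquet",
            },
        },
    }


    def resolve_storage(
        entry: Dict[str, Any], catalog_dir: str, mode: Optional[str] = None
    ) -> Tuple[str, str]:
        """Return (plugin_name, urlpath) for ``mode``, or for the entry's
        default mode when ``mode`` is None. Unknown modes raise KeyError."""
        args = entry["args"]
        chosen = mode or args["default"]
        url = args["storage"][chosen]
        # Plain string substitution standing in for Intake's templating.
        url = url.replace("{{ CATALOG_DIR }}", catalog_dir)
        # Assumption: the URL scheme names the Intake plugin to use.
        plugin, _, urlpath = url.partition("://")
        return plugin, urlpath


    print(resolve_storage(USER_EVENTS, "/catalogs"))
    # ('csv', '/catalogs/data/user_events.csv')
    print(resolve_storage(USER_EVENTS, "/catalogs", "serving"))
    # ('in-memory-kv', 'foo')

Keeping every storage URL in one entry means callers name the dataset once and choose the storage system separately.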
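
Example code using the sample catalog:

.. code-block:: python

    from intake_dal.dal_catalog import DalCatalog  # module path assumed

    path = "catalog.yaml"  # hypothetical path to the sample catalog

    # Specialize the catalog's DAL data sources so that their default
    # storage mode is "serving".
    cat = DalCatalog(path, storage_mode="serving")
    # Reads from the serving storage system using the in-memory-kv
    # Intake plugin.
    df = cat.user_events.read()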
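
``DalCatalog`` performs this specialization itself. Purely to illustrate the idea, here is a hypothetical sketch (not intake-dal's implementation) that walks a nested catalog held in plain dicts and sets the ``default`` of every ``dal`` source to the chosen storage mode, leaving sources with other drivers alone. The ``specialize`` function, the dict layout, and the toy catalog are assumptions made for this example.

.. code-block:: python

    from typing import Any, Dict


    def specialize(catalog: Dict[str, Any], storage_mode: str) -> Dict[str, Any]:
        """Return a copy of ``catalog`` whose ``dal`` sources default to
        ``storage_mode``.

        Hypothetical layout: a dict containing a ``driver`` key is a source
        entry; any other dict is a nested sub-catalog.
        """
        result: Dict[str, Any] = {}
        for name, entry in catalog.items():
            if "driver" not in entry:
                # Nested sub-catalog: recurse so the whole hierarchy is specialized.
                result[name] = specialize(entry, storage_mode)
            elif entry["driver"] == "dal":
                # Build a new args dict so the input catalog is not mutated.
                args = dict(entry["args"], default=storage_mode)
                result[name] = dict(entry, args=args)
            else:
                # Sources using other drivers are left untouched.
                result[name] = entry
        return result


    # Toy catalog: one top-level DAL source, one nested, one plain CSV source.
    catalog = {
        "user_events": {
            "driver": "dal",
            "args": {"default": "local",
                     "storage": {"local": "csv://events.csv",
                                 "serving": "in-memory-kv://foo"}},
        },
        "ads": {
            "clicks": {
                "driver": "dal",
                "args": {"default": "local",
                         "storage": {"local": "csv://clicks.csv",
                                     "serving": "in-memory-kv://bar"}},
            },
        },
        "raw": {"driver": "csv", "args": {"urlpath": "raw.csv"}},
    }

    serving = specialize(catalog, "serving")
    print(serving["user_events"]["args"]["default"])    # serving
    print(serving["ads"]["clicks"]["args"]["default"])  # serving
    print(catalog["user_events"]["args"]["default"])    # local

Returning a new catalog instead of mutating the loaded one lets a single parsed catalog be specialized to several storage modes; that design choice belongs to this sketch and is not necessarily how ``DalCatalog`` works.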