Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/dagworks-inc/hamilton
Hamilton helps data scientists and engineers define testable, modular, self-documenting dataflows that encode lineage/tracing and metadata. It runs and scales everywhere Python does.
- Host: GitHub
- URL: https://github.com/dagworks-inc/hamilton
- Owner: DAGWorks-Inc
- License: bsd-3-clause-clear
- Created: 2023-02-23T17:16:48.000Z (almost 2 years ago)
- Default Branch: main
- Last Pushed: 2024-11-27T05:12:15.000Z (about 2 months ago)
- Last Synced: 2024-12-02T10:14:07.398Z (about 1 month ago)
- Topics: dag, data-analysis, data-engineering, data-science, dataframe, etl, etl-framework, etl-pipeline, feature-engineering, hacktoberfest, lineage, llmops, machine-learning, mlops, orchestration, pandas, python, rag, software-engineering
- Language: Jupyter Notebook
- Homepage: https://hamilton.dagworks.io/en/latest/
- Size: 75.5 MB
- Stars: 1,885
- Watchers: 17
- Forks: 125
- Open Issues: 127
Metadata Files:
- Readme: README-DOCS.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
- Security: SECURITY.md
Awesome Lists containing this project
- awesome-systematic-trading - Hamilton - A scalable general purpose micro-framework for defining dataflows. You can use it to build dataframes, numpy matrices, python objects, ML models, etc. Embed Hamilton anywhere python runs, e.g. spark, airflow, jupyter, fastapi, python scripts, etc. (Basic Components / Computation)
- awesome-llmops - Hamilton (Large Scale Deployment / Workflow)
- awesome-production-machine-learning - Hamilton - Hamilton is a micro-orchestration framework for defining dataflows. Runs anywhere python runs (e.g. jupyter, fastAPI, spark, ray, dask). Brings software engineering best practices without you knowing it. Use it to define feature engineering transforms, end-to-end model pipelines, and LLM workflows. It complements macro-orchestration systems (e.g. kedro, luigi, airflow, dbt, etc.) as it replaces the code within those macro tasks. Comes with a self-hostable UI that captures lineage & provenance, execution telemetry & data summaries, and builds a self-populating catalog; usable in development as well as production. (Data Pipeline)
- awesome-data-engineering - Hamilton - Hamilton is a lightweight library to define data transformations as a directed-acyclic graph (DAG). If you like dbt for SQL transforms, you will like Hamilton for Python processing. (Workflow)
- awesome-ai-repositories - hamilton (RAG Framework)
- awesome-data-engineer - Hamilton
- trackawesomelist - Hamilton (⭐1.9k) - a lightweight library to define data transformations as a directed-acyclic graph (DAG). It helps author reliable feature engineering and machine learning pipelines, and more. (Recently Updated / [Dec 17, 2024](/content/2024/12/17/README.md))
README
# Documentation
Instructions for managing the documentation on Read the Docs.
# Build locally
To build locally, run the following from the root of the repo:
```bash
# Quote the extras so shells such as zsh do not treat the brackets as a glob pattern.
pip install ".[docs]"
```
Then build the documentation and serve it locally:
```bash
sphinx-build -b dirhtml -W -E -T -a docs /tmp/mydocs
python -m http.server --directory /tmp/mydocs
```
Or, to rebuild automatically whenever files change:
```bash
sphinx-autobuild -b dirhtml -W -E -T --watch hamilton/ -a docs /tmp/mydocs
```
Then it'll be running on port 8000.

Note: readthedocs builds will fail if there are ANY WARNINGs in the build.
So make sure to check the build log for any warnings, and fix them, else you'll waste time debugging readthedocs
build failures.

# SimplePDF
To create a PDF, you can run the following:
```bash
sphinx-build -b simplepdf -W -E -T -a docs /tmp/mydocs
# or if you want to auto-rebuild:
sphinx-autobuild -b simplepdf -W -E -T --watch hamilton/ -a docs /tmp/mydocs
```
The PDF will be in `/tmp/mydocs` in a few minutes.

# reST vs myST
We use both! The general breakdown of when to use which is:
1. For documentation that we want to be viewable on GitHub, use myST.
2. Otherwise default to using reST.
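
Returning to the note about readthedocs failing on any warning: one way to review all warnings locally in a single pass is to save the build output and scan it. The following is a minimal sketch, not part of the repo. The function name `check_sphinx_log` and the log path `/tmp/sphinx-build.log` are hypothetical, and the sketch assumes warnings appear in the log with Sphinx's usual `WARNING:` prefix.

```bash
#!/usr/bin/env bash
# Sketch only: check_sphinx_log and the log path below are hypothetical names,
# not part of the Hamilton repo.

# Print every line of a saved Sphinx build log that contains "WARNING:"
# (with its line number). Returns 1 if any were found, 2 if the log cannot
# be read, and 0 otherwise.
check_sphinx_log() {
    local log_file="$1"
    if [ ! -r "$log_file" ]; then
        echo "Cannot read $log_file" >&2
        return 2
    fi
    if grep -n "WARNING:" "$log_file"; then
        echo "Warnings found in $log_file; fix them before pushing." >&2
        return 1
    fi
    echo "No warnings found in $log_file."
}

# Example usage (run from the root of the repo; not executed here):
#   sphinx-build -b dirhtml -E -T -a docs /tmp/mydocs 2>&1 | tee /tmp/sphinx-build.log
#   check_sphinx_log /tmp/sphinx-build.log
```

The usage example leaves out `-W` so that the build does not abort on warnings and the log can list all of them; the regular `-W` commands above remain the final check.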