{"id":28002933,"url":"https://github.com/flintml/flint","last_synced_at":"2025-05-09T01:44:59.890Z","repository":{"id":288844672,"uuid":"969085680","full_name":"flintml/flint","owner":"flintml","description":"A self-contained, lightweight and OOB research platform for modern ML","archived":false,"fork":false,"pushed_at":"2025-05-01T11:00:14.000Z","size":7885,"stargazers_count":45,"open_issues_count":0,"forks_count":2,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-05-09T01:44:52.812Z","etag":null,"topics":["data-science","deltalake","jupyter","machine-learning","mlops","polars"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/flintml.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-04-19T10:57:15.000Z","updated_at":"2025-05-06T17:29:16.000Z","dependencies_parsed_at":"2025-04-19T23:36:34.542Z","dependency_job_id":null,"html_url":"https://github.com/flintml/flint","commit_stats":null,"previous_names":["bosonstack/boson","flintml/flint"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/flintml%2Fflint","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/flintml%2Fflint/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/flintml%2Fflint/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/flintml%2Fflint/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hos
ts/GitHub/owners/flintml","download_url":"https://codeload.github.com/flintml/flint/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":253176444,"owners_count":21866142,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data-science","deltalake","jupyter","machine-learning","mlops","polars"],"created_at":"2025-05-09T01:44:59.297Z","updated_at":"2025-05-09T01:44:59.878Z","avatar_url":"https://github.com/flintml.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Boson\n\n**Boson** is a lightweight, fully containerized, and feature-rich machine learning research platform. It centralizes essential tools to help teams keep projects lean, organized, and reproducible—while reducing overhead and boosting productivity. 
Think Databricks/SageMaker but local and free.\n\nBoson enables engineers and researchers to iterate faster without getting bogged down by infrastructure or tooling complexity.\n\n![Boson Interface](assets/static/main-screenshot.png)\n\n---\n\n## Contents\n- [Key Features](#-key-features)\n- [Quickstart](#-quickstart)\n- [Quick Look](#-quick-look)\n- [Creating a Workspace](#-creating-a-workspace)\n- Documentation\n  - [Example - Instacart](docs/example-instacart.md)\n  - [Concepts](docs/concepts.md)\n  - [Builtins](docs/builtins.md)\n- [Contributing](#-contributing)\n\n## 🔑 Key Features\n\n- **Out-of-the-Box Data Lake Integration**  \n  Boson uses [Delta Lake](https://github.com/delta-io/delta) to store datasets and features, making it easy to save and load dataframes as versioned tables. A built-in Delta Explorer lets you visually inspect your lake in real time.\n\n- **Lazy Data Processing with Polars**  \n  Boson supports efficient, memory-conscious data workflows using [Polars](https://github.com/pola-rs/polars). This makes large, expensive transformations performant and scalable—even on local hardware.\n\n- **Integrated Experiment Tracking**  \n  Powered by [Aim](https://github.com/aimhubio/aim), Boson offers a seamless tracking experience—log metrics, compare experiments, and visualize performance over time with zero setup.\n\n- **Cloud-Like Notebook Development**  \n  All data, notebooks, artifacts, and metrics are stored in internal cloud storage. This keeps your local environment clean and every workspace fully self-contained.\n\n- **Composable, Declarative Infrastructure**  \n  Built on layered Docker Compose files, Boson enables isolated, customizable workspaces per project—without sacrificing reproducibility or maintainability.\n\n---\n\n## 🚀 Quickstart \n\n**Boson currently only supports AMD64 (i.e. no Mac). 
ARM support is a high priority.**\n\nBoson requires that `docker` be installed.\n\nAfter cloning, navigate to the project root and run:\n\n```bash\ndocker compose -f docker-compose.base.yml -f workspaces/example-instacart/docker-compose.override.yml --env-file workspaces/example-instacart/.env up\n```\n\nThis will spin up the `example-instacart` workspace. You can open this workspace in Boson by visiting `http://localhost:8889` in your browser.\n\nFollow the [walkthrough](/docs/example-instacart.md) of this example.\n\n## 🔎 Quick Look\n\nWrite Polars df to Delta Table:\n```python\nwrite_delta(my_df, \"my_table_name\")\n```\n\nRead Delta Table to Polars df:\n```python\ndf_materialised = read_delta(\"my_table_name\")\ndf_lazy = scan_delta(\"my_table_name\")\n```\n\nDisplay inline Polars df:\n```python\ndisplay(my_df)\n```\n![inline-table](assets/static/inline-table.png)\n\nCreate Aim experiment for tracking:\n```python\nrun = new_run(experiment=\"my-experiment-name\")\n```\n\nView Delta Tables:\n\n![delta-explorer](assets/static/delta-exporer.png)\n\n## ⚙️ Creating a Workspace\nBoson is deployed by instantiating a *workspace*. 
A *workspace* is an instance of the **Boson Kernel** but with its own isolated dependencies and configuration.\n\nThe below instructions will create a new custom workspace with isolated storage.\n\nCreate a workspace by running:\n\n```bash\n./create-workspace.sh \u003cMY_WORKSPACE_NAME\u003e \u003cMY_WORKSPACE_PORT\u003e\n```\n\nNavigate to the workspace directory with:\n\n```bash\ncd workspaces/\u003cMY_WORKSPACE_NAME\u003e\n```\n\nChange Python dependencies via [Poetry](https://python-poetry.org/), for example:\n\n```bash\npoetry remove xgboost\npoetry add plotly\n```\n\nIf you do not have Poetry installed, you can follow the above link to set it up or manually update the `[tool.poetry.dependencies]` section of the `pyproject.toml`.\n\nSpin up the Boson workspace with:\n\n```bash\ndocker compose -f docker-compose.base.yml -f workspaces/\u003cMY_WORKSPACE_NAME\u003e/docker-compose.override.yml --env-file workspaces/\u003cMY_WORKSPACE_NAME\u003e/.env up\n```\n\n## 🤝 Contributing\n\nWe would be stoked for you to get involved with Boson development! If you'd like to get more involved, please contact Harry at harry@myntlabs.io.\n\n- 💬 [Start a discussion](https://github.com/bosonstack/boson/discussions)\n- 🛠️ [Fix a bug](https://github.com/bosonstack/boson/issues/new)\n- 🧠 [Request a feature](https://github.com/bosonstack/boson/issues/new)\n\n### Future tickets\n\n- ARM support\n- Improve linting - especially of custom builtins\n- Introduce table schemas/scoping\n- Nice CLI wrapper\n- Various UI improvements","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fflintml%2Fflint","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fflintml%2Fflint","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fflintml%2Fflint/lists"}