{"id":19647060,"url":"https://github.com/idsia/flotta","last_synced_at":"2026-03-16T19:32:40.870Z","repository":{"id":187717538,"uuid":"677274273","full_name":"IDSIA/flotta","owner":"IDSIA","description":"A federated learning framework for researchers.","archived":false,"fork":false,"pushed_at":"2024-06-20T15:25:26.000Z","size":4645,"stargazers_count":2,"open_issues_count":0,"forks_count":2,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-04-30T15:13:38.155Z","etag":null,"topics":["distributed-computing","federated-learning"],"latest_commit_sha":null,"homepage":"https://flotta.readthedocs.io","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"lgpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/IDSIA.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-08-11T06:55:12.000Z","updated_at":"2024-07-25T14:29:37.000Z","dependencies_parsed_at":"2024-03-25T12:32:28.437Z","dependency_job_id":"232ef3d2-b7c1-40c4-838c-6eee56aec9e7","html_url":"https://github.com/IDSIA/flotta","commit_stats":null,"previous_names":["idsia/ferdelance","idsia/flotta"],"tags_count":2,"template":false,"template_full_name":null,"purl":"pkg:github/IDSIA/flotta","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/IDSIA%2Fflotta","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/IDSIA%2Fflotta/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/IDSIA%2Fflotta/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/IDSIA%2Fflotta/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/IDSIA","download_url":"https://codeload.github.com/IDSIA/flotta/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/IDSIA%2Fflotta/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":274806971,"owners_count":25353612,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-09-12T02:00:09.324Z","response_time":60,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["distributed-computing","federated-learning"],"created_at":"2024-11-11T14:42:12.694Z","updated_at":"2026-03-16T19:32:40.799Z","avatar_url":"https://github.com/IDSIA.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# flotta, a Federated Learning framework\n\n\n## What is flotta?\n\n_flotta_ is a **distributed framework** intended to be used both as a workbench to develop new distributed algorithms within a Federated Learning (FL) based environment and as a tool to perform distributed statistical analysis on private data.\n\nFederated Learning is a Machine Learning (ML) approach that allows for training models across decentralized devices or servers while keeping the data localized, increasing the privacy of data holders.\nInstead of collecting data from various sources and centralizing it in one location for training, federated learning enables model training directly on the devices where the data 
resides.\nIn FL, model training is distributed across a set of data holders (client nodes) that have direct and exclusive access to their data.\nThe key property of this approach is that the training data never leave these nodes: only aggregated information, such as model parameters, is exchanged to build an aggregated model.\n\nThe current implementation supports both a centralized setup, where the model's parameters are sent from the client nodes to an aggregation node, and a distributed setup, where a model is shared across multiple nodes and multiple model aggregations can happen on different nodes.\n\nThe intent of this framework is to provide a solution that enables researchers to develop and test new ML models in an FL context without interacting directly with the data.\nThe framework wraps a familiar set of Python packages and libraries, such as Scikit-Learn and Pandas.\nThis allows researchers to quickly set up data extraction pipelines, following the _Extract-Transform-Load_ paradigm, and to build models or analyze data.\n\nThe main component of the framework is the **node**: a [FastAPI](https://fastapi.tiangolo.com/)-based application capable of managing, scheduling, and executing jobs in a [Ray](https://www.ray.io/) worker.\n\nThe implementation of the distributed network of nodes has been inspired by the [Apache Spark](https://spark.apache.org/) framework; to interact with the framework, researchers can use a **workbench** context to create and submit jobs to the distributed network through a node.\n\n---\n\n## Use the framework\n\nThe framework is available as a Python 3.10 package.\nA node can be configured through environment variables or a YAML configuration file.\n\n\n### Workbench\n\nThe _workbench_ is not a standalone application but a library that needs to be imported.\nIt is used to communicate with a node and submit Artifacts, which encapsulate the instructions for job scheduling and execution.\n\nInstallation is straightforward.\n\n```bash\npip 
install flotta[workbench]\n```\n\nOnce installed, create a context object and obtain a project handle with a token.\nA project is a collection of data sources.\nThe token is created by the node network administrator and is unique for each project.\n\nThe following is an example of how to use the workbench library to connect to a node.\n\n```python\nfrom flotta.core.distributions import Collect\nfrom flotta.core.model_operations import Aggregation, Train, TrainTest\nfrom flotta.core.models import FederatedRandomForestClassifier, StrategyRandomForestClassifier\nfrom flotta.core.steps import Finalize, Parallel\nfrom flotta.core.transformers import FederatedSplitter\nfrom flotta.workbench import Context, Artifact\n\nserver_url = \"http://localhost:1456\"\nproject_token = \"58981bcbab77ef4b8e01207134c38873e0936a9ab88cd76b243a2e2c85390b94\"\n\n# create the context\nctx = Context(server_url)\n\n# load a project\nproject = ctx.project(project_token)\n\n# an aggregated view on data\nds = project.data\n\n# print all available features\nfor feature in ds.features:\n    print(feature)\n\n# create a query starting from the project's data\nq = ds.extract()\nq = q.add(q[\"feature\"] \u003c 2)\n\n# create a federated model\nmodel = FederatedRandomForestClassifier(\n    n_estimators=10,\n    strategy=StrategyRandomForestClassifier.MERGE,\n)\n\nlabel = \"label\"\n\n# describe how to distribute the work and how to train the model\nsteps = [\n    Parallel(\n        TrainTest(\n            query=project.extract().add(\n                FederatedSplitter(\n                    random_state=42,\n                    test_percentage=0.2,\n                    label=label,\n                )\n            ),\n            trainer=Train(model=model),\n            model=model,\n        ),\n        Collect(),\n    ),\n    Finalize(\n        Aggregation(model=model),\n    ),\n]\n\n# submit artifact\nartifact: Artifact = ctx.submit(project, steps)\n```\n\n\u003e **Note:** More examples are 
available in the [`examples`](./examples/) and [`tests`](./tests/) folders.\n\n### Node deployment\n\nThe _aggregation node_ is the central node of the framework and is reachable from all the other nodes in the network.\nAll workbenches send their payloads, called Artifacts, to the aggregation node, while all the clients query the same node for the next job to run.\nThis gives the clients more control over access and an additional layer of protection: a client node is not reachable from the internet, since it is the client that contacts the known reference node and initiates the execution process.\n\nInstalling a node is simple:\n\n```bash\npip install flotta\n```\n\nOnce installed, it can be run by specifying a YAML configuration file:\n\n```bash\npython -m flotta -c ./config.yaml\n```\n\nThe node is composed of a web API written with [FastAPI](https://fastapi.tiangolo.com/) that runs and spawns [Ray](https://ray.io) tasks.\nThe node also uses a database to keep track of every stored object.\n\nThe easiest way to deploy a node is using **Docker Compose**.\n\nThe file [docker-compose.integration.yaml](./tests/integration/docker-compose.integration.yaml) contains the definition of all the services required to create a stack that simulates a central server node and some client nodes.\n\nOnce a node is up and running with default parameters, it will be reachable at `http://server:1456/`.\n\n\n### Node configuration\n\nThe minimal configuration file defines the server url to use and at least one datasource.\nEach datasource must have a name and be associated with one or more projects through the `token` field. 
 \n\n```yaml\nworkdir: ./storage                  # OPTIONAL: local path of the working directory\n\nmode: node                          # one of: node, client, standalone\n\nnode:\n  name: flottaNode\n  healthcheck: 3600.0               # interval in seconds between self status checks\n  heartbeat: 10.0                   # interval in seconds between client update fetches\n  allow_resource_download: true     # if false, nobody can download resources from this node\n\n  protocol: http                    # external protocol (http or https)\n  interface: 0.0.0.0                # interface to use (0.0.0.0 for node, \"localhost\" for clients)\n  url: \"\"                           # external url at which the node will be reachable\n  port: 1456                        # external port to use to reach the APIs\n\n  token_projects_initial:           # initial projects available at node start\n    - name: my_beautiful_project    # name of the project\n      token: 58981bcbab...          # unique token assigned to the project\n\njoin:\n  first: true                       # if true, this is the first node in the distributed network\n  url: \"\"                           # when the node is not the first, the url of an existing node to join\n\ndatasources:                        # list of available datasources\n  - name: iris                      # name of the source\n    kind: file                      # how the datasource is stored (only 'file')\n    type: csv                       # supported file format (only 'csv' or 'tsv')\n    path: /data/iris.csv            # path to the file to use\n    token:                          # list of project tokens that can access this datasource\n    - 58981bcbab7...                
 \n\ndatabase:\n  username: \"\"                      # username used to access the database\n  password: \"\"                      # password used to access the database\n  scheme: flotta                    # name of the database schema to use\n  memory: false                     # when set to true, a SQLite in-memory database will be used\n  dialect: sqlite                   # currently accepted dialects: sqlite and postgresql\n  host: ./sqlite.db                 # local path for a file database (SQLite) or url of a remote database\n  port: \"\"                          # port to use to connect to a remote database\n```\n\n\u003e **Note:** It is also possible to reference environment variables in the configuration file using the syntax `${ENVIRONMENT_VARIABLE_NAME}` inside parameter values.\nThis is especially useful when setting parameters, such as domains or passwords, through a Docker Compose file.\n\nFor the first node of the distributed network, the `join.first` parameter must be set to `true`.\nThe network must always contain a first node with this configuration.\nIn all other cases, both in `client` and `node` mode, the configuration needs to set the `join.url` parameter to a valid url of an existing node.\nOnly urls of nodes in `node` mode can be used for this parameter.\n\nEvery node needs a database to work properly, but database configuration is completely optional.\nThe minimal setup is an SQLite in-memory database, enabled by setting `database.memory: true`.\nIf no database is configured, the in-memory database is used.\nOther supported databases are:\n* SQLite file database:\n\n```yaml\ndatabase:\n  scheme: flotta\n  dialect: sqlite\n  host: ./sqlite.db\n  memory: false\n```\n\n* PostgreSQL remote database:\n\n```yaml\ndatabase:\n  username: \"${DATABASE_USER}\"\n  password: \"${DATABASE_PASSWORD}\"\n  scheme: flotta\n  dialect: postgresql\n  host: remote_url\n  port: 5432\n  memory: false\n```\n\n---\n\n## Development\n\nThe 
flotta framework is open to contributions and offers a quick development environment.\n\nIt is recommended to use a local Python virtual environment, such as [venv](https://docs.python.org/3/library/venv.html) or [conda](https://docs.conda.io/), during the development of the library.\n\nThe repository contains a `Makefile` that can be used to quickly create an environment and install the framework in development mode.\n\n\u003e **Note:** Make sure that the `make` command is available on the test machine.\n\nTo install the library in development mode, use the following command:\n\n```bash\npip install -e \".[dev]\"\n```\n\nThere are several ways to test changes in development mode:\n\n- standalone mode,\n- unit tests using `pytest`,\n- integration tests using Docker,\n- full development using Docker.\n\n---\n\n## Testing\n\nFor testing purposes, it is useful to install the test version of the framework:\n\n```bash\npip install flotta[test]\n```\n\n\u003e **Note:** The development version already includes the test dependencies.\n\n\n### Standalone mode\n\nOne of the simplest ways to test changes to the framework is the so-called `standalone mode`.\nIn this mode, the framework is executed as a standalone application: a single node scheduling jobs for itself with a hardcoded base configuration.\n\n```bash\npython -m flotta.standalone\n```\n\n\n### Integration tests\n\nIntegration tests are the perfect entry point to start deploying and using the framework.\nThese tests simulate a real deployment, although on a single machine, with a dataset split and shared across multiple nodes.\n\nThe execution requires a dedicated [Docker Compose](./tests/integration/docker-compose.integration.yaml) file that produces a stack with:\n\n* a repository with the packed wheel of the library\n* a PostgreSQL database\n* a node acting as an aggregation server\n* 2 nodes in client mode\n* 2 nodes in default mode (*not used yet*)\n* a workbench service\n\nThe two client nodes and 
the two default nodes include the [California Housing Pricing dataset](https://inria.github.io/scikit-learn-mooc/python_scripts/datasets_california_housing.html).\nThis dataset has been split into three parts: two for the nodes and one for the evaluation in the workbench.\nThese datasets are saved in CSV format in the [data](./tests/integration/data) folder.\n\nThe configurations of the single nodes are stored in YAML format in the [conf](./tests/integration/conf) folder.\n\nIntegration tests are written as scripts and simulate what a user could write through the workbench interface.\nAlthough a little primitive (and not very fast to set up and tear down), this is an effective way to test the workflow of the framework.\n\nAll integration tests should be placed in the `tests/integration/tests` folder.\nThese tests should be named following the convention `test_NNN.\u003cname\u003e.py`, where `NNN` is an incremental number padded with zeros and `\u003cname\u003e` is just a reference.\n\nTo execute the integration tests, run the following command from inside the integration folder:\n\n```bash\nmake start\n```\n\n\u003e **Note:** The Makefile included in the `tests/integration` folder has other useful commands to start, stop, clear, and reload the Docker Compose stack, and to dump and clean the internal logs.\n\n\n### Unit tests\n\nTo test single parts of the code, such as transformers, models, or estimators, it is advised to write test files using the [`pytest`](https://docs.pytest.org/) library.\n\nA simple test case can be set up as follows:\n\n```python\nfrom flotta.server.api import api\n\nfrom fastapi.testclient import TestClient\nfrom sqlalchemy.ext.asyncio import AsyncSession\n\nfrom tests.utils import connect\nimport pytest\n\n@pytest.mark.asyncio\nasync def test_workbench_read_home(session: AsyncSession):\n    with TestClient(api) as server:\n        args = await connect(server, session)\n        wb_exc = args.wb_exc\n\n        res = server.get(\n            \"/workbench\",\n   
         headers=wb_exc.headers(),\n        )\n\n        assert res.status_code == 200\n```\n\nThe fixtures that connect to the test database (an [`SQLite`](https://www.sqlite.org/) database during tests) through the `session` object are defined in the [`conftest.py`](./tests/conftest.py) file.\n\nOther utility methods (component connection, client operations, ...) are defined in the `tests/utils.py` file.\n\n\u003e **Note:** Remember that the APIs defined in the framework use [`FastAPI`](https://fastapi.tiangolo.com/) in full _asynchronous_ mode: the test functions need to be defined as `async` and decorated with `@pytest.mark.asyncio` to work with the fixtures.\n\n\u003e **Note:** Code executed by Ray workers is _synchronous_, and these workers are designed never to access a database. Only the asynchronous APIs can access the database.\n\n---\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fidsia%2Fflotta","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fidsia%2Fflotta","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fidsia%2Fflotta/lists"}