{"id":28471595,"url":"https://github.com/openml/services","last_synced_at":"2025-07-01T22:31:16.089Z","repository":{"id":216648610,"uuid":"741467925","full_name":"openml/services","owner":"openml","description":"Overview of all OpenML components including a docker-compose to run OpenML services locally","archived":false,"fork":false,"pushed_at":"2025-02-15T11:36:45.000Z","size":3865,"stargazers_count":1,"open_issues_count":4,"forks_count":0,"subscribers_count":7,"default_branch":"main","last_synced_at":"2025-06-07T11:07:56.218Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Shell","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/openml.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null},"funding":{"github":"openml","open_collective":"openml"}},"created_at":"2024-01-10T13:08:27.000Z","updated_at":"2025-02-17T01:39:50.000Z","dependencies_parsed_at":"2024-03-01T09:47:00.963Z","dependency_job_id":"4c6f7cbd-35d3-4ea5-9fd9-d78006ef0270","html_url":"https://github.com/openml/services","commit_stats":null,"previous_names":["openml/services"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/openml/services","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/openml%2Fservices","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/openml%2Fservices/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/openml%2Fservices/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/openml%2Fservices/manifests","owner_ur
l":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/openml","download_url":"https://codeload.github.com/openml/services/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/openml%2Fservices/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":263046159,"owners_count":23405146,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-06-07T11:08:01.520Z","updated_at":"2025-07-01T22:31:16.074Z","avatar_url":"https://github.com/openml.png","language":"Shell","funding_links":["https://github.com/sponsors/openml","https://opencollective.com/openml"],"categories":[],"sub_categories":[],"readme":"# services\nOverview of all OpenML components including a docker-compose to run OpenML services locally\n\n## Overview\n\n![OpenML Component overview](https://raw.githubusercontent.com/openml/services/main/documentation/OpenML-overview.png)\n\n## Prerequisites\n- Linux or macOS with an Intel processor (because of our old ES version, this project currently does not support `arm` architectures)\n- [Docker](https://docs.docker.com/get-docker/) \n- [Docker Compose](https://docs.docker.com/compose/install/) version 2.21.0 or higher\n\n## Usage\n\nWhen using this project for the first time, run:\n```bash\nchown -R www-data:www-data data/php\n# Or, if the previous command fails, for instance because `www-data` does not exist:\nchmod -R 777 data/php\n```\nThis is necessary to make sure that you can upload datasets, tasks and runs. 
Note that the dataset data is meant to be public anyway, so permissions of 777 should not be a problem. This step won't be necessary anymore once the backend stores its files on MinIO.\n\n\nRun all OpenML services locally using\n```bash\ndocker compose --profile all up -d\n```\nStop them again using\n```bash\ndocker compose --profile all down\n```\n\n### Profiles\nYou can use different profiles:\n\n- `[no profile]`: databases\n- `\"elasticsearch\"`: databases + nginx + elasticsearch\n- `\"rest-api\"`: databases + nginx + elasticsearch + REST API\n- `\"frontend\"`: databases + nginx + elasticsearch + REST API + frontend + email-server\n- `\"minio\"`: databases + nginx + elasticsearch + REST API + MinIO + parquet and croissant conversion\n- `\"evaluation-engine\"`: databases + nginx + elasticsearch + REST API + MinIO + evaluation engine\n- `\"all\"`: everything\n\nUsage examples:\n```bash\ndocker compose --profile all up -d       # all services\ndocker compose up -d                     # only the databases\ndocker compose --profile frontend up -d  # frontend, email-server, REST API, elasticsearch, nginx and databases\n```\nUse the same profile for your `down` command.\n\n\n## Known issues\nSee the GitHub issue list for known issues.\n\n## Debugging\nSome useful commands:\n```bash\ndocker logs openml-php-rest-api -f              # tail the logs of the php rest api\ndocker exec -it openml-php-rest-api /bin/bash   # go into the php rest api container\n./scripts/connect_db.sql                        # access the database\n```\n\n## Endpoints\n\u003e [!TIP]\n\u003e If you change any port, make sure to change it for all services!\n\nWhen you spin up the docker-compose, you'll get these endpoints:\n- *Frontend*: localhost:8000\n- *Database*: localhost:3306, filled with test data.\n- *ElasticSearch*: localhost:9200 or localhost:8000/es, filled with test data.\n- *REST API*: localhost:8080\n- *MinIO*: console at localhost:9001, filled with test data.\n\n## Credentials\nThe credentials for the database 
can be found in `config/database/.env`, for MinIO in `config/minio/.env`, etc.\n\n## Emails\nThe email-server is used for emails from the frontend. For example, if you create a new user, an \nemail is sent to the user. All outgoing emails are rerouted to catchall@example.com. You can see \nthe messages in `config/email-server/messages`. Note that some of the URLs in the emails need to \nbe slightly altered to use them in the test setup: change https to http.\n\n## Development\n\n### PHP, Parquet and Croissant converter\nIf you want to do local development on containers that are part of the docker-compose, you want those containers to change based on your code. You should have the relevant code somewhere on your system; you only need to tell the docker-compose where to find it. You can do so by setting environment variables. \n\nCreate a `.env` file inside this directory, and set:\n\n#### PHP\n```bash\nPHP_CODE_DIR=/path/to/OpenML                  # Root of https://github.com/openml/OpenML on your computer\nPHP_CODE_VAR_WWW_OPENML=/var/www/openml       # Always set this to /var/www/openml. Leave empty if you leave PHP_CODE_DIR empty\n```\n\nMake sure to create `openml_OS/config/BASE_CONFIG.php` in your local `$PHP_CODE_DIR`. The correct configuration can be found in `config/php.env`. Run docker compose with profile `rest-api`.\n\n#### Parquet\n```bash\nARFF_TO_PQ_CODE_DIR=/path/to/minio-data       # Root of https://github.com/openml-labs/minio-data on your computer\nARFF_TO_PQ_APP=/app                           # Always set this to /app. Leave empty if you leave ARFF_TO_PQ_CODE_DIR empty\n```\n\n#### Croissant\n```bash\nCROISSANT_CODE_DIR=/path/to/openml-croissant/python  # Python directory of https://github.com/openml/openml-croissant on your computer\nCROISSANT_APP=/app                                   # Always set this to /app. 
Leave empty if you leave CROISSANT_CODE_DIR empty\n```\n\n### Frontend\n```bash\nFRONTEND_CODE_DIR=/path/to/openml.org        # Python directory of https://github.com/openml/openml.org on your computer\nFRONTEND_APP=/app                            # Always set this to /app. Leave empty if you leave FRONTEND_CODE_DIR empty\n```\n\n### Python\n\nYou can run the openml-python code on your own local server now!\n\n```bash\ndocker run --rm -it -v ./config/python/config:/root/.config/openml/config:ro --network openml-services openml/openml-python\n```\n\n\nFor an example of manual tests, you can run:\n```python\n\nimport openml\nfrom openml.tasks import TaskType\nfrom openml.datasets.functions import create_dataset\nimport pandas as pd\nimport numpy as np\n\n\ndf = pd.DataFrame(np.random.randint(0,100,size=(100, 4)), columns=list('ABCD'))\ndf[\"class\"] = [\"test\" if np.random.randint(0, 1) == 0 else \"test2\" for _ in range(100)]\ndf[\"class\"] = df[\"class\"].astype(\"category\")\n\ndataset = create_dataset(\n    name=\"test_dataset\",\n    description=\"test\",\n    creator=\"I\",\n    contributor=None,\n    collection_date=\"now\",\n    language=\"en\",\n    attributes=\"auto\",\n    ignore_attribute=None,\n    citation=\"citation\",\n    licence=\"BSD (from scikit-learn)\",\n    default_target_attribute=\"class\",\n    data=df,\n    version_label=\"test\",\n    original_data_url=\"https://www4.stat.ncsu.edu/~boos/var.select/diabetes.html\",\n    paper_url=\"url\",\n)\ndataset.publish()\n\n# Meanwhile you can admire your newly created dataset at http://localhost:8000/search?type=data\u0026id=[dataset.id]\n# Wait a minute until dataset is active\n\nmy_task = openml.tasks.create_task(\n    task_type=TaskType.SUPERVISED_CLASSIFICATION,\n    dataset_id=dataset.id,\n    target_name=\"class\",\n    evaluation_measure=\"predictive_accuracy\",\n    estimation_procedure_id=1,\n)\nmy_task.publish()\n\n# wait a minute, so that the dataset and tasks are both processed by the 
evaluation engine.\n# the evaluation engine runs every minute.\n# Meanwhile you can check out the newly created task at localhost:8000/search?type=task\u0026id=[my_task.id]\n\nmy_task = openml.tasks.get_task(my_task.task_id)\nfrom sklearn import compose, ensemble, impute, neighbors, preprocessing, pipeline, tree\nclf = tree.DecisionTreeClassifier()\nrun = openml.runs.run_model_on_task(clf, my_task)\nrun.publish()\n\n# wait a minute, so that the run is processed by the evaluation engine\n\nrun = openml.runs.get_run(run.id, ignore_cache=True)\nrun.evaluations\n\n# Expected: {'average_cost': 0.0, 'f_measure': 1.0, 'kappa': 1.0, 'mean_absolute_error': 0.0, 'mean_prior_absolute_error': 0.0, 'number_of_instances': 100.0, 'precision': 1.0, 'predictive_accuracy': 1.0, 'prior_entropy': 0.0, 'recall': 1.0, 'root_mean_prior_squared_error': 0.0, 'root_mean_squared_error': 0.0, 'total_cost': 0.0}\n```\n\n\n### Other services\nIf you want to develop a service that depends on any of the services in this docker-compose, just bring up this docker-compose and point your service to the correct endpoints.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fopenml%2Fservices","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fopenml%2Fservices","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fopenml%2Fservices/lists"}