{"id":46963557,"url":"https://github.com/foundation-model-stack/fm-training-estimator","last_synced_at":"2026-03-11T10:03:47.492Z","repository":{"id":251019653,"uuid":"836142063","full_name":"foundation-model-stack/fm-training-estimator","owner":"foundation-model-stack","description":"Estimate resources needed to train LLMs","archived":false,"fork":false,"pushed_at":"2026-02-10T10:22:36.000Z","size":1184,"stargazers_count":14,"open_issues_count":3,"forks_count":9,"subscribers_count":9,"default_branch":"main","last_synced_at":"2026-03-03T00:35:46.769Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/foundation-model-stack.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":"code-of-conduct.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2024-07-31T08:33:14.000Z","updated_at":"2026-02-10T10:08:34.000Z","dependencies_parsed_at":"2025-01-27T11:35:42.864Z","dependency_job_id":"54c7c1f6-c4f7-4382-8174-5c2f5a9ca76a","html_url":"https://github.com/foundation-model-stack/fm-training-estimator","commit_stats":null,"previous_names":["foundation-model-stack/fm-training-estimator"],"tags_count":11,"template":false,"template_full_name":null,"purl":"pkg:github/foundation-model-stack/fm-training-estimator","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/foundation-model-stack%2Ffm-training-estimator","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/foundation-mod
el-stack%2Ffm-training-estimator/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/foundation-model-stack%2Ffm-training-estimator/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/foundation-model-stack%2Ffm-training-estimator/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/foundation-model-stack","download_url":"https://codeload.github.com/foundation-model-stack/fm-training-estimator/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/foundation-model-stack%2Ffm-training-estimator/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":30377837,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-03-11T06:09:32.197Z","status":"ssl_error","status_checked_at":"2026-03-11T06:09:17.086Z","response_time":84,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2026-03-11T10:03:46.814Z","updated_at":"2026-03-11T10:03:47.487Z","avatar_url":"https://github.com/foundation-model-stack.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# FM Training Estimator\n\nEstimators for Large Language Model Training.\n\nEstimate resource consumption (memory, tokens, time, etc.) for training and fine-tuning jobs using a hybrid of theory and learned regression models.\n\n## Feature Matrix 
and Roadmap\n\n| Technique          | Support            |\n|--------------------|--------------------|\n| Full (1 gpu)       | :heavy_check_mark: |\n| FSDP (multi)       | :heavy_check_mark: |\n| LoRA (1 gpu)       | :heavy_check_mark: |\n| QLoRA (1 gpu)      | :heavy_check_mark: |\n| Speculators        | Planned            |\n| Tensor Parallelism | Planned            |\n\n### Time\n\nFully learned approach. Coverage depends on the availability of training data.\n\n### Memory\n\nHybrid theory + learned. Coverage of the learned approach is subject to the availability of training data.\n\n### Tokens\n\nFully theory-based. Simulation-based models are available.\n\n| Technique | Explanation                                    | Availability       |\n|-----------|------------------------------------------------|--------------------|\n| TE0       | Simulation-based - slow but accurate           | :heavy_check_mark: |\n| TE1       | Statistical                                    | Planned            |\n| TE2       | Approximate - fast, light, reasonably accurate | :heavy_check_mark: |\n\n## Usage\n\nYou can use the library `fm_training_estimator` as a Python package by installing it via pip; see [installation](#install), [build a regression model](#build-a-regression-model-for-learned-prediction-method) and [using the library](#use-the-library-to-get-estimates). If you'd like to construct the estimator service with a [Web UI](#make-estimates-via-a-web-ui) via FastAPI or [build a docker image](#build-a-docker-container-image), clone the repository to your local machine before following the instructions in those sections.\n\nWithin your working directory, it is recommended to create a virtual environment to avoid dependency conflicts.\n\n```\npython -m venv .venv\nsource .venv/bin/activate\n```\n\n### Install\n```\npip install fm_training_estimator\n```\n\n### Build a regression model for learned prediction method\n\nNow, prepare data in the expected format for lookup and regression. 
The format to be used to save this data is given [here](https://github.com/foundation-model-stack/fm-training-estimator/tree/main/fm_training_estimator/data/README.md). Save your data file into `./workdir/data.csv`.\n\n```\nmkdir workdir\nmv \u003cdata file\u003e ./workdir/data.csv\n```\n\nNow, build a regression model from this data using one of the provided make targets.\n\n![Building a model](./imgs/build-model.png)\n\nThis will create a model called `./workdir/model.zip`, which you can then use to estimate resource consumption.\n\nYou can now run the estimator library; see below.\n\n### Using the Estimator\n\nThere are a few ways to use the Estimator now:\n\n1. Using the CLI tool, passing in a config in JSON format.\n2. Using the Web UI.\n3. Using the SDK directly from Python code.\n\n#### Using the CLI\n\n![Demo of using CLI](./imgs/demo-cli.gif)\n\n### Make estimates via a Web UI\n\nTo do this, first prepare a text file called `model_whitelist.txt` in the `workdir/` with a list of model names, one per line. These are the models whose resource consumption you want to estimate. You can use the provided [example](https://github.com/foundation-model-stack/fm-training-estimator/blob/main/fm_training_estimator/ui/model_whitelist.txt) and place it in your `workdir`. Modify this list as needed.\n\nNow, run the UI:\n```\nmake run-web-ui\n```\nThis will start the UI on `localhost:3000`.\n\n(The Web UI has other options, not covered in this simple setup. 
If you want to skip the model whitelisting or change the port, directly run the UI as shown in the README in the `./fm_training_estimator/ui` folder.)\n\n#### Use the library to get estimates\n\nFor a full API reference, visit our [readthedocs](link).\n\nExample code:\n```python\n# Standard\nimport os\n\n# First Party\nfrom fm_training_estimator.config.arguments import (\n    DataArguments,\n    EstimateInput,\n    EstimatorMetadata,\n    FMArguments,\n    HFTrainingArguments,\n    InfraArguments,\n    JobConfig,\n)\nfrom fm_training_estimator.sdk import (\n    estimate_cost,\n    estimate_memory,\n    estimate_time,\n    estimate_tokens,\n)\n\nworkdir_path = os.path.join(os.path.abspath(os.curdir), \"workdir\")\n\nmodel_path = os.path.join(workdir_path, \"model.json\")\nlookup_data_path = os.path.join(workdir_path, \"data.csv\")\n\nestimator_metadata = EstimatorMetadata(base_data_path=lookup_data_path)\n\nfm = FMArguments(\n    base_model_path=\"ibm-granite/granite-7b-base\",\n    torch_dtype=\"bfloat16\",\n    block_size=1024,\n)\nhf_training = HFTrainingArguments(\n    per_device_train_batch_size=1, gradient_checkpointing=False\n)\ndata = DataArguments(dataset=\"imdb\", te_approach=0)\ninfra = InfraArguments(numGpusPerPod=1)\njob_conf = JobConfig(hf_training, fm, data, infra)\nest_input = EstimateInput(estimator_metadata=estimator_metadata, job_configs=[job_conf])\n\nprint(\"Estimating Memory:....\")\n\nprint(\"With only theory: \", estimate_memory(est_input))\nprint(\"With reg model: \", estimate_memory(est_input, model_path))\n\nhf_training.fsdp = \"full_shard\"\n\nprint(\"Using fsdp full shard\")\nprint(\"With only theory: \", estimate_memory(est_input))\nprint(\"With reg model: \", estimate_memory(est_input, model_path))\n\n\nprint(\"Estimating Time:....\")\nprint(\"With only theory: \", estimate_time(est_input))\nprint(\"With reg model: \", estimate_time(est_input, model_path))\n\nprint(\"Estimating Tokens:....\")\nprint(\"With only theory: \", 
estimate_tokens(est_input))\nprint(\"With reg model: \", estimate_tokens(est_input, model_path))\n```\n\n### Build a Docker Container Image\n\nTo build the estimator container image:\n\n1. Make sure both `model.json` and `data.csv` files are present in the `workdir` folder.\n\n2. Use this command to build and push the image:\n\n```shell\nmake cbuild\nmake cpush # If you want to push to the container registry\n```\n\n3. Use this command to run the image:\n\n```shell\ndocker run --rm -it -v \"/path/to/input.json:/app/input.json\" icr.io/ftplatform/fm_training_estimator:latest\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffoundation-model-stack%2Ffm-training-estimator","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ffoundation-model-stack%2Ffm-training-estimator","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffoundation-model-stack%2Ffm-training-estimator/lists"}