{"id":24108814,"url":"https://github.com/beomi/gemma-easylm","last_synced_at":"2025-05-07T06:55:25.270Z","repository":{"id":225432438,"uuid":"765976908","full_name":"Beomi/Gemma-EasyLM","owner":"Beomi","description":"Train GEMMA on TPU/GPU! (Codebase for training Gemma-Ko Series)","archived":false,"fork":false,"pushed_at":"2024-03-02T03:14:27.000Z","size":420,"stargazers_count":47,"open_issues_count":0,"forks_count":10,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-05-07T06:55:20.001Z","etag":null,"topics":["easylm","flax","gemma","huggingface","jax","language-model","tpu","transformers"],"latest_commit_sha":null,"homepage":"https://huggingface.co/beomi/gemma-ko-7b","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Beomi.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null}},"created_at":"2024-03-02T02:18:31.000Z","updated_at":"2025-03-07T04:23:10.000Z","dependencies_parsed_at":"2024-03-02T04:24:52.081Z","dependency_job_id":"aef60aeb-bb1e-4b96-a4ac-55690213cfa0","html_url":"https://github.com/Beomi/Gemma-EasyLM","commit_stats":null,"previous_names":["beomi/gemma-easylm"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Beomi%2FGemma-EasyLM","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Beomi%2FGemma-EasyLM/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Beomi%2FGemma-EasyLM/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Beomi%2FGemma-EasyLM/manifests","owner_u
rl":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Beomi","download_url":"https://codeload.github.com/Beomi/Gemma-EasyLM/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":252831312,"owners_count":21810783,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["easylm","flax","gemma","huggingface","jax","language-model","tpu","transformers"],"created_at":"2025-01-11T00:01:19.185Z","updated_at":"2025-05-07T06:55:25.265Z","avatar_url":"https://github.com/Beomi.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Gemma-EasyLM\n\nThis document outlines the integration of the Gemma model into the EasyLM framework, including instructions for training, converting the model format, and serving the model with Gradio.\n\n## Training: Integrating HF Flax Weights into EasyLM\n\n### Step 1: Consolidate Flax Weights from Hugging Face\n\n\u003e You can skip this step with downloading https://huggingface.co/beomi/gemma-ko-7b/resolve/flax-init/flax_model.msgpack\n\nFirstly, concatenate all Flax model weights available at: [Hugging Face - Gemma 7B](https://huggingface.co/google/gemma-7b/tree/flax).\n\nUse the following example code to accomplish this:\n\n```python\nfrom transformers import GemmaForCausalLM\n\nmodel = GemmaForCausalLM.from_pretrained(\"google/gemma-7b\", torch_dtype=\"auto\")\nmodel.save_pretrained(\"./flax-concatted\", max_shard_size=\"99GB\")\n```\n\nThis script generates a `flax-concatted/flax_model.msgpack` file. 
We will utilize this `.msgpack` file during the training process.\n\n### Step 2: Upload the .msgpack File to Google Cloud Storage (GCS)\n\nExecute the following command to upload the generated `.msgpack` file to your GCS repository:\n\n```bash\ngsutil cp ./flax-concatted/flax_model.msgpack gs://YOUR_GCS_REPO_NAME\n```\n\n### Step 3: Modify the `train.sh` Script\n\nAdjust the paths for `load_checkpoint`, `train_dataset.json_dataset.path`, and `logger.output_dir` within the `train.sh` script to match your setup.\n\nThe provided example `train.sh` script assumes training will be conducted on a TPUv4-64 pod slice.\n\n### Step 4: Initiate Training\n\nExecute the training script to start the training process:\n\n```\n./train.sh\n```\n\n## Conversion: From EasyLM to Hugging Face Format\n\n### Step 1: Retrieve the `streaming_train_state` File\n\nDownload the `streaming_train_state` file from your GCS repository using the following command:\n\n```\ngsutil cp gs://YOUR_GCS_REPO_NAME/.../streaming_train_state_80000 .\n```\n\nNote: The file name will either be `streaming_train_state` or `streaming_train_state_STEPNO`.\n\n### Step 2: Update the `.stream` File Path\n\nIn the `convert_easylm_stream_to_hf_safetensors.py` file, modify the path to the `.stream` file accordingly:\n\n```python\n# Modify this line\n_, param = StreamingCheckpointer.load_trainstate_checkpoint(load_from='trainstate_params::/home/latheledusjp/streaming_train_state_80000')\n```\n\n### Step 3: Execute the Conversion Script\n\nRun the conversion script to convert the EasyLM model format to Hugging Face's format:\n\n```\npython convert_easylm_stream_to_hf_safetensors.py\n```\n\n### Step 4: Verify the Output Files\n\nCheck the generated output files in the `./gemma-ko-8.5b-dev` directory.\n\n\u003e The Flax-version of the weight file can be found in the `./flax-gemma-ko-8b` folder.\n\n## Serving the Model with Gradio\n\nTo serve the model using Gradio, follow these steps:\n\n```\ncd EasyLM/models/gemma\npip 
install -r serving_requirements.txt\n./serve_test.sh\n```\n\n## Original EasyLM Reference\nIf you find EasyLM useful in your research or applications, please cite using the following BibTeX:\n```\n@software{geng2023easylm,\n  author = {Geng, Xinyang},\n  title = {EasyLM: A Simple And Scalable Training Framework for Large Language Models},\n  month = March,\n  year = 2023,\n  url = {https://github.com/young-geng/EasyLM}\n}\n```\n\n## Credits\n* The LLaMA implementation is from [JAX_llama](https://github.com/Sea-Snell/JAX_llama)\n* The JAX/Flax GPT-J and RoBERTa implementations are from [transformers](https://huggingface.co/docs/transformers/main/en/index)\n* Most of the JAX utilities are from [mlxu](https://github.com/young-geng/mlxu)\n* The codebase is heavily inspired by [JAXSeq](https://github.com/Sea-Snell/JAXSeq)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbeomi%2Fgemma-easylm","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbeomi%2Fgemma-easylm","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbeomi%2Fgemma-easylm/lists"}