{"id":30178304,"url":"https://github.com/alienkevin/flowmo","last_synced_at":"2025-08-12T05:20:23.374Z","repository":{"id":304286673,"uuid":"971114926","full_name":"AlienKevin/FlowMo","owner":"AlienKevin","description":null,"archived":false,"fork":false,"pushed_at":"2025-07-12T06:16:00.000Z","size":12843,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-07-12T06:23:45.079Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/AlienKevin.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-04-23T03:25:29.000Z","updated_at":"2025-07-12T06:16:04.000Z","dependencies_parsed_at":"2025-07-12T06:34:05.296Z","dependency_job_id":null,"html_url":"https://github.com/AlienKevin/FlowMo","commit_stats":null,"previous_names":["alienkevin/flowmo"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/AlienKevin/FlowMo","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AlienKevin%2FFlowMo","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AlienKevin%2FFlowMo/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AlienKevin%2FFlowMo/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AlienKevin%2FFlowMo/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/AlienKevin","download_url":"https://codeload.github.com
/AlienKevin/FlowMo/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AlienKevin%2FFlowMo/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":270005591,"owners_count":24510939,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-12T02:00:09.011Z","response_time":80,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-08-12T05:20:22.169Z","updated_at":"2025-08-12T05:20:23.342Z","avatar_url":"https://github.com/AlienKevin.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"## Flow to the Mode: Mode-Seeking Diffusion Autoencoders for State-of-the-Art Image Tokenization\n\nThis repo contains the code for our FlowMo model training and evaluation. Check out our paper for more details: https://www.arxiv.org/abs/2503.11056\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"demo.gif\" alt=\"sample GIF\" /\u003e\n\u003c/p\u003e\n\n## Get the code\n```\ngit clone https://github.com/kylesargent/FlowMo\ncd FlowMo\n```\n\n## Install the requirements\n```\nconda create -n FlowMo python=3.13.2 pip\nconda activate FlowMo\npip install torch==2.6.0 torchvision --index-url https://download.pytorch.org/whl/cu124\npip install -r requirements.txt\n```\nNote: The torch and cuda version above were what we used to produce the paper results. 
But we've tested torch 2.4, 2.5, and 2.6 and attained similar performance with all.\n\n## Prepare the data\nThe dataset is read directly from the standard public ImageNet tar files. I have created indices for these tar files so that no data preprocessing is needed. Please download the datasets and indices with the commands below. If you don't download them at the top level (like FlowMo/*.tar), you need to modify the corresponding path in `flowmo/configs/base.yaml`.\n\n```\nwget https://image-net.org/data/ILSVRC/2012/ILSVRC2012_img_train.tar\nwget https://image-net.org/data/ILSVRC/2012/ILSVRC2012_img_val.tar\nwget https://huggingface.co/ksarge/FlowMo/resolve/main/imagenet_train_index_overall.json\nwget https://huggingface.co/ksarge/FlowMo/resolve/main/imagenet_val_index_overall.json\n```\n\n## Train your models\nFlowMo is trained in two stages. The first stage is standard diffusion autoencoder training. In the second stage, we lower the batch size and learning rate and backpropagate through the sampling chain with a sample-level loss. For more details, please check the paper. For post-training, it is recommended to save more checkpoints and to run the continuous evaluator concurrently. You can then select the best checkpoint by early stopping to counteract eventual reward hacking. \u003cstrong\u003eFor post-training, please supply your checkpoint path from pre-training.\u003c/strong\u003e\n\nThe training commands for FlowMo-Lo are below. It is recommended to pre-train FlowMo-Lo for at least ~130 epochs to match the paper result, but you may increase `trainer.max_steps` for better performance.\n```\ntorchrun --nproc-per-node=8 -m flowmo.train \\\n    --experiment-name \"flowmo_lo_pretrain\" \\\n    model.context_dim=18 model.codebook_size_for_entropy=9 \\\n    trainer.max_steps=1300000\n\ntorchrun --nproc-per-node=8 -m flowmo.train \\\n    --experiment-name \"flowmo_lo_posttrain\" \\\n    --resume-from-ckpt ... 
\\\n    model.context_dim=18 model.codebook_size_for_entropy=9 \\\n    trainer.max_steps=1325000 \\\n    opt.lr=0.00005 \\\n    data.batch_size=8 \\\n    opt.n_grad_acc=2 \\\n    model.posttrain_sample=true \\\n    opt.lpips_mode='resnet' \\\n    opt.lpips_weight=0.01 \\\n    trainer.log_every=100 \\\n    trainer.checkpoint_every=5000 \\\n    trainer.keep_every=5000\n```\nThe training commands for FlowMo-Hi are below. It is recommended to pre-train FlowMo-Hi for at least ~80 epochs to match the paper result, but you may increase `trainer.max_steps` for better performance.\n```\ntorchrun --nproc-per-node=8 -m flowmo.train \\\n    --experiment-name \"flowmo_hi_pretrain\" \\\n    model.context_dim=56 model.codebook_size_for_entropy=14 \\\n    trainer.max_steps=800000\n\ntorchrun --nproc-per-node=8 -m flowmo.train \\\n    --experiment-name \"flowmo_hi_posttrain\" \\\n    --resume-from-ckpt ... \\\n    model.context_dim=56 model.codebook_size_for_entropy=14 \\\n    trainer.max_steps=825000 \\\n    opt.lr=0.00005 \\\n    data.batch_size=8 \\\n    opt.n_grad_acc=2 \\\n    model.posttrain_sample=true \\\n    opt.lpips_mode='resnet' \\\n    opt.lpips_weight=0.01 \\\n    trainer.log_every=100 \\\n    trainer.checkpoint_every=5000 \\\n    trainer.keep_every=5000\n```\n\n## Evaluation\nTo evaluate an experiment (continuously as new checkpoints are added, or only the latest checkpoint if `eval.continuous=false`), run\n\n```\ntorchrun --nproc-per-node=1 -m flowmo.evaluate \\\n    --experiment-name flowmo_lo_pretrain_eval \\\n    eval.eval_dir=results/flowmo_lo_pretrain \\\n    eval.continuous=true \\\n    model.context_dim=18 model.codebook_size_for_entropy=9\n```\n\nThe commands below reproduce the paper's results for FlowMo-Lo and FlowMo-Hi respectively, assuming you have already downloaded the necessary checkpoints (see the next section).\n```\ntorchrun --nproc-per-node=1 -m flowmo.evaluate \\\n    --experiment-name \"flowmo_lo_posttrain_eval\" \\\n  
  eval.eval_dir=results/flowmo_lo_posttrain \\\n    eval.continuous=false \\\n    eval.force_ckpt_path='flowmo_lo.pth' \\\n    model.context_dim=18 model.codebook_size_for_entropy=9\n\ntorchrun --nproc-per-node=1 -m flowmo.evaluate \\\n    --experiment-name \"flowmo_hi_posttrain_eval\" \\\n    eval.eval_dir=results/flowmo_hi_posttrain \\\n    eval.continuous=false \\\n    eval.force_ckpt_path='flowmo_hi.pth' \\\n    model.context_dim=56 model.codebook_size_for_entropy=14\n```\nTo speed up evaluation, you may pass `eval.subsample_rate=N` to subsample the validation dataset by a factor of N (for example, `eval.subsample_rate=10` gives 10x subsampling). Note that this will lead to less accurate rFID estimates. Also, the evaluator is distributed, so if you increase `--nproc-per-node` the evaluation will finish correspondingly faster.\n\n\n## Get and use the pre-trained models\nIf you want to evaluate the pre-trained models, you may download them like so:\n```\nwget https://huggingface.co/ksarge/FlowMo/resolve/main/flowmo_lo.pth\nwget https://huggingface.co/ksarge/FlowMo/resolve/main/flowmo_hi.pth\n```\nThe provided notebook `example.ipynb` shows how to use the FlowMo tokenizer to reconstruct images. Within the FlowMo conda environment, you can install a notebook kernel like so:\n```\npython3 -m ipykernel install --user --name FlowMo\n```\n\n## Resource requirements and smaller models\nOur two main models (FlowMo-Lo, FlowMo-Hi) were trained on 8 H100 GPUs. However, if your computational resources are limited, you may attain comparable though slightly worse performance by reducing the width and increasing the patch size. To do so, pass `model.patch_size=8` and `model.mup_width=4` in the launch command, or set those values in `configs/base.yaml`.\n\nStill, to reproduce the performance of the models in the paper, you will need to use the larger model configurations.\n\n## Acknowledgement\nOur codebase is based on https://github.com/TencentARC/SEED-Voken. 
We also use code from https://github.com/markweberdev/maskbit and https://github.com/black-forest-labs/flux. Thanks for the great contributions.\n\n## Citation\nIf you find FlowMo useful, please cite us.\n\n```\n@misc{sargent2025flowmodemodeseekingdiffusion,\n      title={Flow to the Mode: Mode-Seeking Diffusion Autoencoders for State-of-the-Art Image Tokenization}, \n      author={Kyle Sargent and Kyle Hsu and Justin Johnson and Li Fei-Fei and Jiajun Wu},\n      year={2025},\n      eprint={2503.11056},\n      archivePrefix={arXiv},\n      primaryClass={cs.CV},\n      url={https://arxiv.org/abs/2503.11056}, \n}\n```","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Falienkevin%2Fflowmo","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Falienkevin%2Fflowmo","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Falienkevin%2Fflowmo/lists"}