{"id":31580551,"url":"https://github.com/anto18671/efficientvit-b4.r256","last_synced_at":"2025-10-05T21:15:00.376Z","repository":{"id":264504306,"uuid":"868544378","full_name":"anto18671/efficientvit-b4.r256","owner":"anto18671","description":"Pretraining the EfficientViT-B4 model on the ImageNet-1k dataset","archived":false,"fork":false,"pushed_at":"2024-10-06T20:30:29.000Z","size":12,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-10-04T05:52:58.658Z","etag":null,"topics":["computer-vision","efficientvit","imagenet-1k","pretraining","vision-transformer"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/anto18671.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-10-06T16:49:13.000Z","updated_at":"2024-10-16T17:42:00.000Z","dependencies_parsed_at":"2024-11-24T23:48:15.392Z","dependency_job_id":null,"html_url":"https://github.com/anto18671/efficientvit-b4.r256","commit_stats":null,"previous_names":["anto18671/efficientvit-b4.r256"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/anto18671/efficientvit-b4.r256","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/anto18671%2Fefficientvit-b4.r256","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/anto18671%2Fefficientvit-b4.r256/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/anto18671%2Fefficientvit-b4.r256/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/anto18671%2Fefficientvit-b4.r256/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/anto18671","download_url":"https://codeload.github.com/anto18671/efficientvit-b4.r256/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/anto18671%2Fefficientvit-b4.r256/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":278520462,"owners_count":26000489,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-05T02:00:06.059Z","response_time":54,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["computer-vision","efficientvit","imagenet-1k","pretraining","vision-transformer"],"created_at":"2025-10-05T21:14:54.988Z","updated_at":"2025-10-05T21:15:00.367Z","avatar_url":"https://github.com/anto18671.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# EfficientViT-B4 Pretraining on ImageNet-1k\n\nThis repository contains the code and configuration for pretraining the **EfficientViT-B4** model on the **ImageNet-1k** dataset. The model is designed for efficient vision processing with optimized performance and resource utilization.\n\n## Installation\n\nClone the repository and install the required dependencies:\n\n```bash\ngit clone https://github.com/anto18671/efficientvit-b4.r256.git\ncd efficientvit-b4.r256\npip install -r requirements.txt\n```\n\nThe dependencies include:\n- **PyTorch**\n- **torchvision**\n- **timm** (PyTorch Image Models)\n- **Hugging Face `datasets`**\n- **torchsummary**\n- **tqdm**\n\n## Dataset\n\nThe pretraining uses the **ImageNet-1k** dataset, which consists of 1.2 million images across 1000 categories. The dataset is automatically loaded using Hugging Face's `datasets` library.\n\n## Pretraining\n\nTo start the pretraining process, make sure you have the following prerequisites:\n\n### Prerequisites\n1. **GPU Support**: The pretraining is optimized to run on systems with NVIDIA GPUs. Ensure CUDA and the necessary drivers are installed on your machine.\n   - CUDA Version: 12.4 (or compatible version)\n   - CuDNN: Version 9\n\n2. **Environment Setup**:\n   - Ensure the correct version of **PyTorch** with GPU support is installed.\n   - Your system should have enough GPU memory to handle the specified batch size. Modify the batch size if necessary.\n\n3. **Hugging Face Authentication**:\n   - You will need to authenticate with Hugging Face to access the ImageNet-1k dataset. Set your Hugging Face token in the environment:\n\n   ```bash\n   export HUGGINGFACE_TOKEN=\u003cyour_huggingface_token\u003e\n   ```\n\n### Starting Pretraining\n\nOnce the environment is set up, and the GPU is ready, run the `pre.py` script to begin pretraining:\n\n```bash\npython pre.py\n```\n\nThis script will:\n- Initialize the **EfficientViT-B4** model.\n- Set up the data pipelines with transformations (resizing, augmentation, normalization).\n- Configure the optimizer (AdamW) and the learning rate scheduler.\n- Start pretraining from scratch or resume from the last saved checkpoint if any.\n\n### Running in a Docker Environment\n\nIf you're using Docker for pretraining, follow these steps:\n\n1. **Pull the Docker Image**:\n\n   ```bash\n   docker pull ghcr.io/anto18671/efficientvit-b4.r256:latest\n   ```\n\n2. **Run the Docker Container with GPU Support**:\n\n   ```bash\n   docker run --gpus all --env HUGGINGFACE_TOKEN=\u003cyour_huggingface_token\u003e ghcr.io/anto18671/efficientvit-b4.r256:latest\n   ```\n\nEnsure that the Docker setup has GPU support enabled. Use the `--gpus all` flag to allow Docker to utilize the available GPUs.\n\n### Checkpoints\n\n- **Best model**: Automatically saved whenever the validation accuracy improves.\n- **Last checkpoint**: Saved at the end of each epoch to allow resuming from the most recent state.\n\n## Model Architecture\n\nThe **EfficientViT-B4** model is part of the EfficientViT family, designed for optimal speed and accuracy in vision tasks. This implementation uses custom configuration settings to balance computational efficiency and model performance.\n\n- **Model architecture**: EfficientViT-B4\n- **Input size**: 256x256 pixels\n- **Pretraining**: The model is trained from scratch, with no initial weights.\n\n## Training Configuration\n\n- **Optimizer**: AdamW with weight decay\n- **Learning Rate**: 1e-4 (with exponential decay)\n- **Batch Size**: 42 (adjustable based on GPU memory)\n- **Gradient Accumulation**: 3 steps to control memory usage\n- **Epochs**: 16\n- **Data Augmentation**: Resize, Color Jitter, Random Horizontal Flip, and Normalization\n\n## Resume Pretraining\n\nIf pretraining is interrupted, the script will automatically resume from the last checkpoint. The model, optimizer, and scheduler states are restored from the latest saved checkpoint.\n\n## Results and Validation\n\nDuring pretraining, validation is performed at the end of each epoch to evaluate the model's performance. Metrics such as loss and accuracy are logged and tracked.\n\n## License\n\nThis project is licensed under the MIT License. See the [LICENSE](LICENSE) file for more details.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fanto18671%2Fefficientvit-b4.r256","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fanto18671%2Fefficientvit-b4.r256","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fanto18671%2Fefficientvit-b4.r256/lists"}