{"id":18556628,"url":"https://github.com/ahmdtaha/distributed_sigmoid_loss","last_synced_at":"2025-07-27T16:09:06.929Z","repository":{"id":196277932,"uuid":"695658102","full_name":"ahmdtaha/distributed_sigmoid_loss","owner":"ahmdtaha","description":"Unofficial implementation for Sigmoid Loss for Language Image Pre-Training","archived":false,"fork":false,"pushed_at":"2023-09-26T18:10:22.000Z","size":64,"stargazers_count":10,"open_issues_count":2,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-04-24T07:59:54.470Z","etag":null,"topics":["contrastive-learning","distributed-data-parallel","multimodal-deep-learning","python3","pytorch","self-supervised-learning","unsupervised-learning","vision-and-language","vision-language","vision-language-pretraining","vision-transformer"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ahmdtaha.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-09-23T20:39:06.000Z","updated_at":"2024-12-08T08:35:56.000Z","dependencies_parsed_at":"2024-11-06T21:36:10.779Z","dependency_job_id":"121cc47d-4f76-4130-b984-7863ddd7ede8","html_url":"https://github.com/ahmdtaha/distributed_sigmoid_loss","commit_stats":null,"previous_names":["ahmdtaha/distributed_sigmoid_loss"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/ahmdtaha/distributed_sigmoid_loss","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ahmdtaha%2Fdistributed_sigmoid_loss","tags_url":"https://repos.ecosyste.ms
/api/v1/hosts/GitHub/repositories/ahmdtaha%2Fdistributed_sigmoid_loss/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ahmdtaha%2Fdistributed_sigmoid_loss/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ahmdtaha%2Fdistributed_sigmoid_loss/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ahmdtaha","download_url":"https://codeload.github.com/ahmdtaha/distributed_sigmoid_loss/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ahmdtaha%2Fdistributed_sigmoid_loss/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":267384061,"owners_count":24078574,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-07-27T02:00:11.917Z","response_time":82,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["contrastive-learning","distributed-data-parallel","multimodal-deep-learning","python3","pytorch","self-supervised-learning","unsupervised-learning","vision-and-language","vision-language","vision-language-pretraining","vision-transformer"],"created_at":"2024-11-06T21:32:28.234Z","updated_at":"2025-07-27T16:09:06.911Z","avatar_url":"https://github.com/ahmdtaha.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Distributed Sigmoid Loss\n\nUnofficial PyTorch implementation for the loss function from [Sigmoid 
Loss for Language Image Pre-Training](https://arxiv.org/abs/2303.15343). **This code hasn't been reviewed yet.**\n\n### TL;DR\n\nThis is a distributed implementation of Sigmoid Loss that supports training on multiple GPUs.\n\n## Sigmoid Loss Overview\n\nZhai et al. proposed Sigmoid loss for Language Image Pre-training. This loss is intended as a better alternative to the InfoNCE ([CLIP](https://arxiv.org/abs/2103.00020)) loss. Concretely, Sigmoid loss promises better convergence, especially with smaller batch sizes (Fig. 2 of the paper) and noisier datasets (Fig. 7 of the paper). Zhai et al. presented the following pseudo-implementation of Sigmoid Loss:\n\n![Sigmoid Loss Pseudo Implementation.png](./imgs/sigmoid_loss_pseudo_implementation.png)\n\nWhile helpful, this pseudo-implementation assumes a single GPU. Yet vision-language models are typically trained on multiple GPUs. So this repo delivers a distributed sigmoid loss implementation in PyTorch that runs on multiple GPUs.\n\n### Usage Notes\n\n- This code works with DistributedDataParallel (DDP) only, not DataParallel (DP).\n- Please make sure to pass the loss parameters (t_prime, bias) to your optimizer (e.g., Adam optimizer). 
These parameters must be updated during training.\n\n### MISC Notes\n\n- I am currently looking for people to review this code.\n- Credit is due to [KeremTurgutlu](https://gist.github.com/KeremTurgutlu), as I am borrowing some of [his code](https://gist.github.com/KeremTurgutlu/4a6f7078dc62f292c85b9903197c75f7).\n- [Ross Wightman](https://github.com/rwightman) released [his distributed implementation](https://github.com/mlfoundations/open_clip/pull/634) of Sigmoid loss, which makes this repo obsolete :)\n- Wightman's implementation and mine differ in (1) how the GPUs communicate and (2) where the loss parameters (temperature and bias) are stored.\n- I plan to both maintain this repo and publish tests to compare Wightman's implementation and mine.\n\n## Release History\n\n- 1.0.0\n  - 23 Sep 2023: First code commit\n  - 25 Sep 2023: Fix t_prime initialization, i.e., set t' = log(10); clean up the code a bit (e.g., L2-normalize outside the loss function)\n  - 26 Sep 2023: Add Ross Wightman's implementation to this repo for verification purposes.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fahmdtaha%2Fdistributed_sigmoid_loss","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fahmdtaha%2Fdistributed_sigmoid_loss","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fahmdtaha%2Fdistributed_sigmoid_loss/lists"}