[![Multi-Modality](agorabanner.png)](https://discord.gg/qUtxnK2NMf)

# ScreenAI
Implementation of the ScreenAI model from the paper "ScreenAI: A Vision-Language Model for UI and Infographics Understanding" ([arXiv:2402.04615](https://arxiv.org/abs/2402.04615)).

The forward pass runs:

`image + text -> patches -> ViT -> embed + concat -> attention + FFN -> cross-attention + FFN + self-attention -> output`

## Install
`pip3 install screenai`

## Usage
```python
import torch
from screenai.main import ScreenAI

# Random image batch: (batch, channels, height, width)
image = torch.rand(1, 3, 224, 224)

# Random text embeddings: (batch, seq_len, dim); the last dimension matches dim=512 below
text = torch.randn(1, 1, 512)

# Create a ScreenAI model with the given hyperparameters
model = ScreenAI(
    patch_size=16,
    image_size=224,
    dim=512,
    depth=6,
    heads=8,
    vit_depth=4,
    multi_modal_encoder_depth=4,
    llm_decoder_depth=4,
    mm_encoder_ff_mult=4,
)

# Forward pass: the model takes the text tensor first, then the image tensor
out = model(text, image)

# Print the shape of the output tensor
print(out.shape)
```

## License
MIT

## Citation
```bibtex
@misc{baechler2024screenai,
    title={ScreenAI: A Vision-Language Model for UI and Infographics Understanding},
    author={Gilles Baechler and Srinivas Sunkara and Maria Wang and Fedir Zubach and Hassan Mansoor and Vincent Etter and Victor Cărbune and Jason Lin and Jindong Chen and Abhanshu Sharma},
    year={2024},
    eprint={2402.04615},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}
```

## Todo
- [ ] Implement `nn.ModuleList([])` in the encoder and decoder
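## Patching sketch
The first stage of the forward-pass flow (image -> patches) is the usual ViT patching step. Below is a minimal pure-Python sketch of that step. It assumes non-overlapping square patches and an image side divisible by `patch_size`. `num_patches` and `patchify` are hypothetical helpers written only for illustration and are not functions from the `screenai` package. Nested lists stand in for tensors so the example runs without PyTorch.

```python
from typing import List

# Nested-list stand-in for a tensor: image[channel][row][column]
Image = List[List[List[float]]]


def num_patches(image_size: int, patch_size: int) -> int:
    """Number of non-overlapping square patches in a square image (hypothetical helper)."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    per_side = image_size // patch_size
    return per_side * per_side


def patchify(image: Image, patch_size: int) -> List[List[float]]:
    """Split a (C, H, W) image into flattened patches of length C * p * p (hypothetical helper).

    Patches are emitted left to right, top to bottom. Each patch is flattened
    channel first, then row, then column, matching the einops pattern
    'c (h p1) (w p2) -> (h w) (c p1 p2)'.
    """
    channels = len(image)
    height, width = len(image[0]), len(image[0][0])
    if height % patch_size or width % patch_size:
        raise ValueError("height and width must be divisible by patch_size")
    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            patch = [
                image[c][top + i][left + j]
                for c in range(channels)
                for i in range(patch_size)
                for j in range(patch_size)
            ]
            patches.append(patch)
    return patches


# Settings from the Usage example: 224x224 RGB image, 16x16 patches
print(num_patches(224, 16))  # 196
print(3 * 16 * 16)  # 768 values per flattened patch

# Tiny example: a 1-channel 4x4 image with 2x2 patches gives 4 patches of length 4
tiny = [[[float(4 * r + c) for c in range(4)] for r in range(4)]]
print(patchify(tiny, 2))
# [[0.0, 1.0, 4.0, 5.0], [2.0, 3.0, 6.0, 7.0], [8.0, 9.0, 12.0, 13.0], [10.0, 11.0, 14.0, 15.0]]
```

In a standard ViT, each flattened patch is then linearly projected to the model width (`dim=512` in the Usage example). It is an assumption that this repository lays out patches in exactly this order. Check `screenai/main.py` for the actual implementation.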