{"id":15481889,"url":"https://github.com/zeke/github-avatars","last_synced_at":"2025-04-28T10:49:39.785Z","repository":{"id":42046770,"uuid":"430989476","full_name":"zeke/github-avatars","owner":"zeke","description":"A machine learning model to detect whether a GitHub user has a custom or default avatar","archived":false,"fork":false,"pushed_at":"2022-04-15T17:45:15.000Z","size":5182,"stargazers_count":13,"open_issues_count":0,"forks_count":3,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-04-20T10:56:41.773Z","etag":null,"topics":["cog","machine-learning","replicate","scikit-learn"],"latest_commit_sha":null,"homepage":"https://replicate.com/zeke/github-avatars","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/zeke.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2021-11-23T06:39:01.000Z","updated_at":"2024-05-30T06:17:03.000Z","dependencies_parsed_at":"2022-08-12T03:20:28.201Z","dependency_job_id":null,"html_url":"https://github.com/zeke/github-avatars","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zeke%2Fgithub-avatars","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zeke%2Fgithub-avatars/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zeke%2Fgithub-avatars/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zeke%2Fgithub-avatars/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/zeke","download_url":"https://codeload.github.com/zeke/github-avatars/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":251299446,"owners_count":21567224,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cog","machine-learning","replicate","scikit-learn"],"created_at":"2024-10-02T05:06:47.309Z","updated_at":"2025-04-28T10:49:39.764Z","avatar_url":"https://github.com/zeke.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# GitHub Avatars\n\nA machine learning model to differentiate default GitHub avatars from custom ones.\n\n| example default avatar | example custom avatar | \n| ------- | ------ |\n| ![default](avatars/default/abrim.png) | ![custom](avatars/custom/zeke.png) |\n\n\n## Development\n\nFirst, install [Cog](https://github.com/replicate/cog#install). Then:\n\n```\nscript/start\n```\n\nJupyter will output a URL to visit in your browser.\n\n## Build\n\nSave any changes you've made to the notebook.\n\nIf you've never run the build before:\n\n```\ncog build\n```\n\nThen:\n\n```\nscript/build\n```\n\nThis converts the notebook to a Python script and builds the Cog image.\n\n## Usage\n\n### Training\n\nOpen the notebook and run all the cells. 
This will output an updated `model.pkl`.\n\n### Predicting\n\nCall `cog predict` and specify a GitHub username as input to get a prediction:\n\n```\ncog predict -i username=zeke\n```\n\nYou should see output like this:\n\n```json\n{\n  \"username\": \"zeke\",\n  \"href\": \"https://github.com/zeke.png\",\n  \"prediction\": \"custom\"\n}\n```\n\n---\n\n## Notes\n\n- Start with scikit-learn. Then maybe use torch.\n- scikit-learn is easy to start with. If it works, great! Otherwise, we can switch to torch and use the scikit-learn implementation as a baseline.\n- It's easy to introduce bugs in torch.\n- It's harder to introduce bugs in scikit-learn.\n\n### Approach (v1)\n\nCreate a color histogram for each image by reducing each color channel to 8 bits (or 4, or 3). Traditional machine learning models suffer from the \"curse of dimensionality\": the higher the dimensionality, the harder it is to learn. (Not so for deep learning.) 768 values (256 * 3) is actually quite a lot of dimensions for a small dataset. The appropriate number of inputs depends on the size of the dataset: for small datasets, use fewer inputs.\n\n1. Create a data pipeline that reads in images.\n1. Input: an image and a label (default or not).\n1. Output: a 24-valued histogram plus the label.\n\n\n### Notes, January 2022\n\nFeature engineering is the process of manually constructing features that suit the task at hand. Our current feature is a color histogram, which counts how many pixels in an image fall into each color bucket. But we could also construct features differently, for example by counting the number of unique colors in each image.\n\nFeature engineering is common in traditional ML, but in deep learning the emphasis is on the model itself, specifically on finding the right model architecture, rather than on constructing features by hand. 
In deep learning, hand-crafted feature engineering is rarely used anymore.\n\nToday we'll focus on understanding why the model makes the decisions it does, based on the features we've constructed, and hopefully use what we learn to improve our feature engineering step.\n\n### Approach (v2)\n\nv1 was flawed: our quantization process bucketed the \"GitHub grey\" color together with \"white\", which led to some custom avatars being misidentified as default avatars.\n\nIn this new version, we'll create a new feature vector using:\n\n- the number of \"GitHub grey\" pixels in the image\n- the number of distinct colors in the image\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fzeke%2Fgithub-avatars","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fzeke%2Fgithub-avatars","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fzeke%2Fgithub-avatars/lists"}