{"id":48332202,"url":"https://github.com/som-research/hfcommunity","last_synced_at":"2026-04-05T01:08:57.491Z","repository":{"id":50306282,"uuid":"516389626","full_name":"SOM-Research/HFCommunity","owner":"SOM-Research","description":"HFCommunity offers an offline up-to-date relational database built from the data available at the Hugging Face Hub, providing queriable data about the repositories hosted in the Hub","archived":false,"fork":false,"pushed_at":"2024-10-14T16:05:11.000Z","size":473440,"stargazers_count":15,"open_issues_count":0,"forks_count":2,"subscribers_count":4,"default_branch":"main","last_synced_at":"2025-09-05T02:24:08.386Z","etag":null,"topics":["data-science","database","dataset","huggingface"],"latest_commit_sha":null,"homepage":"https://som-research.github.io/HFCommunity","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"cc-by-sa-4.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/SOM-Research.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE.md","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":"CITATION.cff","codeowners":null,"security":null,"support":null,"governance":"GOVERNANCE.md","roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2022-07-21T13:47:30.000Z","updated_at":"2025-04-27T04:51:13.000Z","dependencies_parsed_at":"2024-06-27T07:27:12.391Z","dependency_job_id":"6824d7d9-56ef-4636-839c-346cf538c27c","html_url":"https://github.com/SOM-Research/HFCommunity","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/SOM-Research/HFCommunity","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SOM-Research%2FHFCommunity","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/reposit
ories/SOM-Research%2FHFCommunity/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SOM-Research%2FHFCommunity/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SOM-Research%2FHFCommunity/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/SOM-Research","download_url":"https://codeload.github.com/SOM-Research/HFCommunity/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SOM-Research%2FHFCommunity/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31420789,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-05T00:25:07.052Z","status":"ssl_error","status_checked_at":"2026-04-05T00:25:05.923Z","response_time":60,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data-science","database","dataset","huggingface"],"created_at":"2026-04-05T01:08:56.812Z","updated_at":"2026-04-05T01:08:57.454Z","avatar_url":"https://github.com/SOM-Research.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# HFCommunity\n\nHFCommunity is a dataset built via a data collection process relying on the [Hugging Face Hub (HFH)](https://huggingface.co) API and Git. 
\n\nThe HFCommunity dataset is provided as a relational database, and therefore it can be queried via SQL-like languages to enable empirical analysis on ML projects.\n\nThe following figure shows the architecture of HFCommunity. \n\n![HFCommunity Architecture](imgs/architecture.png)\n\nAs can be seen, HFCommunity is composed of two main components: \n\n* **Dataset Extractor**. The Dataset Extractor includes extractors for the different HFH data elements (i.e., datasets, models, and spaces) and a database importer to store the extracted data. Note that the database importer follows the [conceptual schema for HFCommunity](https://som-research.github.io/HFCommunity/diagram.html), which includes the main entities and relationships to query HFH data (e.g., model, dataset, space, issue or discussion elements).\n\n* **Website**. The Website is a web application that includes the main technical documentation of the tool and the latest HFCommunity dataset dumps available for download. A new release of HFCommunity is published every month.\n\n## Dataset Extractor\n\nThe Dataset Extractor has been developed in Python and is in charge of importing the HFH data into the HFCommunity dataset. \n\nTo run the Dataset Extractor, please refer to the [docs](https://som-research.github.io/HFCommunity/docs/usage.html).\n\n## Website\n\nThe website of HFCommunity is located [here](https://som-research.github.io/HFCommunity/).\n\nThe technical documentation of the tool is located [here](https://som-research.github.io/HFCommunity/docs/).\n\n# How to cite HFCommunity\n\nThis repository includes a `CITATION.cff` file, which activates the \"*Cite this repository*\" button in the About section (right side of the repository). The citation is available in APA and BibTeX formats.  \n\n# Contributing\n\nThis project is part of a research line of the [SOM Research Lab](https://som-research.uoc.edu/) and the [BESSER project](https://github.com/besser-pearl), but we are open to contributions from the community. 
Any comment is more than welcome!\n\nIf you are interested in contributing to this project, please read the [CONTRIBUTING.md](CONTRIBUTING.md) file.\n\n# Code of Conduct\n\nAt SOM Research Lab and BESSER we are dedicated to creating and maintaining welcoming, inclusive, safe, and harassment-free development spaces. Anyone participating will be subject to and agrees to sign on to our [Code of Conduct](CODE_OF_CONDUCT.md).\n\n# Governance\n\nThe development and community management of this project follows the governance rules described in the [GOVERNANCE.md](GOVERNANCE.md) document.\n\n# License\n\nThis work is licensed under a \u003ca rel=\"license\" href=\"http://creativecommons.org/licenses/by-sa/4.0/\"\u003eCreative Commons Attribution-ShareAlike 4.0 International License\u003c/a\u003e\n\nThe [CC BY-SA](https://creativecommons.org/licenses/by-sa/4.0/) license allows users to distribute, remix, adapt, and build upon the material in any medium or format, so long as attribution is given to the creator. The license allows for commercial use. If you remix, adapt, or build upon the material, you must license the modified material under identical terms.\n\n\u003ca rel=\"license\" href=\"http://creativecommons.org/licenses/by-sa/4.0/\"\u003e\u003cimg alt=\"Creative Commons License\" style=\"border-width:0\" src=\"https://i.creativecommons.org/l/by-sa/4.0/88x31.png\" /\u003e\u003c/a\u003e\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsom-research%2Fhfcommunity","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsom-research%2Fhfcommunity","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsom-research%2Fhfcommunity/lists"}