{"id":18803290,"url":"https://github.com/madhans476/github-topics-scraper","last_synced_at":"2025-06-26T23:39:01.650Z","repository":{"id":255842986,"uuid":"823621807","full_name":"madhans476/GitHub-Topics-Scraper","owner":"madhans476","description":"This project is a Python-based web scraper that extracts information from the GitHub topics page. It gathers details about various topics and their top repositories, storing the collected data in CSV files for further analysis and use.","archived":false,"fork":false,"pushed_at":"2024-07-03T12:14:52.000Z","size":30,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2024-12-29T20:27:41.074Z","etag":null,"topics":["beautifulsoup4","python3","requests-library-python","scraping","web-scraping"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/madhans476.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-07-03T11:38:11.000Z","updated_at":"2024-07-24T01:21:13.000Z","dependencies_parsed_at":"2024-09-07T12:32:08.575Z","dependency_job_id":"8214bb26-4e3f-47a2-b8f7-88c5a47de0b8","html_url":"https://github.com/madhans476/GitHub-Topics-Scraper","commit_stats":null,"previous_names":["madhans476/github-topics-scraper"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/madhans476%2FGitHub-Topics-Scraper","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/madhans476%2FGitHub-To
pics-Scraper/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/madhans476%2FGitHub-Topics-Scraper/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/madhans476%2FGitHub-Topics-Scraper/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/madhans476","download_url":"https://codeload.github.com/madhans476/GitHub-Topics-Scraper/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":239735259,"owners_count":19688262,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["beautifulsoup4","python3","requests-library-python","scraping","web-scraping"],"created_at":"2024-11-07T22:34:24.184Z","updated_at":"2025-02-19T21:14:59.914Z","avatar_url":"https://github.com/madhans476.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# GitHub Topics Scraper\n\nThis project is a Python-based web scraper that extracts information from the GitHub topics page. It gathers details about various topics and their top repositories, storing the collected data in CSV files for further analysis and use.\n\n## Features\n\n- Scrapes the [GitHub Topics](https://github.com/topics) page.\n- Retrieves topic title, description, and URL.\n- For each topic, retrieves the top 20 repositories.\n- Extracts repository details: name, username, stars, and URL.\n- Saves the data for each topic in separate CSV files within a specified directory.\n\n## Installation\n\n1. 
**Clone the repository:**\n\n    ```bash\n    git clone https://github.com/madhans476/github-topics-scraper.git\n    cd github-topics-scraper\n    ```\n\n2. **Install the required packages:**\n\n    Install the necessary packages with `pip`:\n\n    ```bash\n    pip install -r requirements.txt\n    ```\n\n3. **Create the directory structure:**\n\n    Ensure that the directory for storing the CSV files exists:\n\n    ```bash\n    mkdir github_topics\n    ```\n\n## Usage\n\n1. **Run the script:**\n\n    Execute the script to start scraping:\n\n    ```bash\n    python ws_github_trending_topics.py\n    ```\n\n    The script scrapes the GitHub topics page, gathers the required data, and saves it as CSV files in the `github_topics` directory.\n\n2. **Check the output:**\n\n    The CSV files are saved in the `github_topics` directory. Each topic's file is named after the topic and contains details of its top 20 repositories.\n\n## Example\n\n1.  Example structure of the saved CSV files:\n\n    ```plaintext\n    github_topics/\n    ├── Trending_github_topics.csv\n    ├── 3D.csv\n    ├── AI.csv\n    ├── Machine Learning.csv\n    ├── Web Development.csv\n    └── ...\n    ```\n\n2.  Each CSV file has the following columns:\n\n    ```plaintext\n    ── Repo Name\n    ── Username\n    ── Stars\n    ── Repo URL\n    ```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmadhans476%2Fgithub-topics-scraper","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmadhans476%2Fgithub-topics-scraper","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmadhans476%2Fgithub-topics-scraper/lists"}