{"id":16082133,"url":"https://github.com/varunon9/github-scraper","last_synced_at":"2025-04-11T12:11:58.910Z","repository":{"id":93827232,"uuid":"84675164","full_name":"varunon9/github-scraper","owner":"varunon9","description":"A nodejs script (using cheerio module) to extract github users information and save to json file.","archived":false,"fork":false,"pushed_at":"2017-03-17T15:40:16.000Z","size":613,"stargazers_count":4,"open_issues_count":0,"forks_count":2,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-03-25T08:38:13.838Z","etag":null,"topics":["cheerio","github-scraping","nodejs-scraping","web-scraping"],"latest_commit_sha":null,"homepage":null,"language":"JavaScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/varunon9.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2017-03-11T19:38:10.000Z","updated_at":"2025-02-28T11:19:24.000Z","dependencies_parsed_at":"2023-04-08T21:33:55.332Z","dependency_job_id":null,"html_url":"https://github.com/varunon9/github-scraper","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/varunon9%2Fgithub-scraper","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/varunon9%2Fgithub-scraper/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/varunon9%2Fgithub-scraper/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/varunon9%2Fgithub-scraper/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/varunon9","download_url":"https://codeload.github.com/varunon9/github-scraper/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248398367,"owners_count":21097291,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cheerio","github-scraping","nodejs-scraping","web-scraping"],"created_at":"2024-10-09T11:25:52.384Z","updated_at":"2025-04-11T12:11:58.887Z","avatar_url":"https://github.com/varunon9.png","language":"JavaScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"# GitHub Scraper\n### A Node.js script (using the cheerio module) to extract GitHub users' information and save it to a JSON file.\n\nWeb scraping is an old way of sharing data between services. According to Wikipedia, web scraping (web harvesting or web\ndata extraction) is data scraping used for extracting data from websites. This script is a simple web scraper\nthat extracts basic information about any GitHub user.\nCheck the https://github.com/varunon9/github-scraper/blob/master/data-beautify.json file to see the extracted information.\nAlthough GitHub provides APIs for this, I wrote this scraper for learning purposes.\n\n#### Dependencies\n1. request: Helps us make HTTP calls.\n2. cheerio: An implementation of core jQuery designed specifically for the server (helps us traverse the DOM and extract data).\n3. fs: The built-in Node.js File System (fs) module, used for file input/output.\n\n#### Screenshots\n\n1. We need DOM details to extract information: ![Inspect element](./screenshots/inspect-element.png)\n2. Output on the console: ![Output](./screenshots/output-data.png)\n\n#### Working\nScraping takes 3 steps:\n\n1. Load the GitHub profile of the given user by making a GET request (using the request module for Node.js).\n2. Parse the HTML response (thanks to cheerio).\n3. Extract the needed data.\n\nFor step 3, we must know the corresponding DOM elements in advance. You can find them using the browser's 'Inspect element' feature:\nvisit any user's GitHub profile and right-click an element, or press `ctrl + shift + I`.\nYou can also view the page source. See screenshot 1, and read index.js for more details. Note that **the script will\nstop working once GitHub changes its DOM elements.** However, you will have the idea and can rewrite the script.\n\n##### How to execute this script?\n1. To execute this script, you must have Node.js installed.\n2. Download the zip file (or `git clone` the repository) and extract it to your hard disk.\n3. Open a terminal/cmd.\n4. Move to the script directory (where you extracted the zip file) using `cd /path/to/repository`.\n5. Run `npm install` to install all Node.js dependencies.\n6. Once all the dependencies have been installed, type `node index.js \u003curl\u003e`.\n7. Replace `\u003curl\u003e` with the URL of the GitHub user whose information you want to extract, e.g. https://github.com/varunon9\n8. Depending on your internet speed, this will take some time. You can see the output on screen once it finishes.\n9. The script also writes this data to disk. Check the user.json file in this directory.\n10. You will have to beautify the JSON data to make it readable, for example at https://jsonformatter.curiousconcept.com/\n11. You can check data-beautify.json, which is the extracted data after beautification.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvarunon9%2Fgithub-scraper","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fvarunon9%2Fgithub-scraper","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvarunon9%2Fgithub-scraper/lists"}