{"id":31783551,"url":"https://github.com/fahimfba/simple-web-scrapper","last_synced_at":"2025-10-10T10:34:24.018Z","repository":{"id":117960713,"uuid":"410968043","full_name":"FahimFBA/simple-web-scrapper","owner":"FahimFBA","description":"Extract data from websites using the web-scrapper. Made with nodejs, ExpressJS, axios \u0026 cheerio.","archived":false,"fork":false,"pushed_at":"2025-05-02T05:41:16.000Z","size":1178,"stargazers_count":4,"open_issues_count":0,"forks_count":1,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-05-02T06:24:16.833Z","etag":null,"topics":["axios","cheerio","cheeriojs","javascript","js","npm","npm-package","webscrape","webscraping","webscraping-data","webscraping-search","webscrapper"],"latest_commit_sha":null,"homepage":"https://fahimfba.github.io/Web-Scraper/","language":"JavaScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/FahimFBA.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2021-09-27T16:51:08.000Z","updated_at":"2025-05-02T05:41:18.000Z","dependencies_parsed_at":null,"dependency_job_id":"8382a18a-d527-4f0d-93e4-b4fd8e95db48","html_url":"https://github.com/FahimFBA/simple-web-scrapper","commit_stats":null,"previous_names":["fahimfba/web-scraper"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/FahimFBA/simple-web-scrapper","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/FahimFBA%2Fsimple-web-scrapper","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/FahimFBA%2Fsimple-web-scrapper/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/FahimFBA%2Fsimple-web-scrapper/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/FahimFBA%2Fsimple-web-scrapper/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/FahimFBA","download_url":"https://codeload.github.com/FahimFBA/simple-web-scrapper/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/FahimFBA%2Fsimple-web-scrapper/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":279003544,"owners_count":26083595,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-10T02:00:06.843Z","response_time":62,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["axios","cheerio","cheeriojs","javascript","js","npm","npm-package","webscrape","webscraping","webscraping-data","webscraping-search","webscrapper"],"created_at":"2025-10-10T10:34:22.402Z","updated_at":"2025-10-10T10:34:24.008Z","avatar_url":"https://github.com/FahimFBA.png","language":"JavaScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Web Scraper\n\nA simple Node.js application to scrape article titles and URLs from The Guardian's international news section.\n\n## Description\n\nThis project uses `axios` to fetch the HTML content from `https://www.theguardian.com/international` and `cheerio` to parse the HTML and extract relevant article information (specifically, titles and URLs based on the CSS selector `.dcr-5rptw1`).\n\nCurrently, the scraped data is logged to the console when the application starts. An Express server is initialized on port 8000 but does not yet serve any data or provide API endpoints.\n\n## Prerequisites\n\n- Node.js and npm (or yarn) installed on your system.\n\n## Installation\n\n1.  Clone the repository:\n    ```bash\n    git clone https://github.com/FahimFBA/Web-Scraper.git\n    cd Web-Scraper\n    ```\n2.  Install the dependencies:\n    ```bash\n    npm install\n    ```\n    or\n    ```bash\n    yarn install\n    ```\n\n## Usage\n\nTo run the scraper, use the following command:\n\n```bash\nnpm start\n```\n\nThis will start the application using `nodemon`, which automatically restarts the server on file changes. The scraped article titles and URLs will be printed to your terminal console.\n\n## Future Enhancements (Potential)\n\n- Implement API endpoints using Express to serve the scraped data.\n- Add error handling for network requests and parsing.\n- Make the target URL and CSS selectors configurable.\n- Store the scraped data in a database or file.","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffahimfba%2Fsimple-web-scrapper","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ffahimfba%2Fsimple-web-scrapper","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffahimfba%2Fsimple-web-scrapper/lists"}