{"id":16095669,"url":"https://github.com/rigwild/epfl-scraper","last_synced_at":"2025-04-05T20:14:39.807Z","repository":{"id":115713082,"uuid":"407124982","full_name":"rigwild/epfl-scraper","owner":"rigwild","description":"Scrape everything from EPFL https://people.epfl.ch/","archived":false,"fork":false,"pushed_at":"2021-09-16T11:58:42.000Z","size":7,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-02-18T19:50:18.582Z","etag":null,"topics":["epfl","scraper"],"latest_commit_sha":null,"homepage":"","language":"JavaScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/rigwild.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-09-16T10:44:15.000Z","updated_at":"2022-03-06T17:23:31.000Z","dependencies_parsed_at":null,"dependency_job_id":"3791935e-1018-4acd-b8bb-46dcdcdc04d4","html_url":"https://github.com/rigwild/epfl-scraper","commit_stats":{"total_commits":3,"total_committers":1,"mean_commits":3.0,"dds":0.0,"last_synced_commit":"ee19292381cceb71656ff06b2b0f68878067da28"},"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rigwild%2Fepfl-scraper","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rigwild%2Fepfl-scraper/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rigwild%2Fepfl-scraper/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rigwild%2Fepfl-scraper/mani
fests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/rigwild","download_url":"https://codeload.github.com/rigwild/epfl-scraper/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247393573,"owners_count":20931813,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["epfl","scraper"],"created_at":"2024-10-09T17:07:37.294Z","updated_at":"2025-04-05T20:14:39.778Z","avatar_url":"https://github.com/rigwild.png","language":"JavaScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"# EPFL Scraper\n\nScrape everything from https://people.epfl.ch, including email addresses.\n\n# Install dependencies\n\nYou must use Node.js v14.8+ as this project uses [Top-level await](https://v8.dev/features/top-level-await) and [ES modules](https://v8.dev/features/modules).\n\n```sh\nyarn\n```\n\n# Run search scraper\n\nAbuse the EPFL search API from https://search.epfl.ch/ to get all the data.\n\nWorks by querying each letter in the alphabet followed by a `*` (`a*`, `b*`), then merging the results.\n\n```sh\nnode searchScraper.js\n\n# If you already have the data cached\nnode searchScraper.js --use-cache\n```\n\n- Individual search queries will be cached to the `search/` directory\n- Output will be generated at `output.min.json` (7.0 MB, 20623 entries)\n- The full usernames list is generated at `output.usernames.txt` (20623 entries)\n- The full mailing list is generated at `output.emails.txt` (18760 entries)\n\n# Run page scraper\n\nEPFL conveniently gives us its [full website 
sitemap](https://people.epfl.ch/private/common/sitemap.xml).\n\nThis downloads every HTML page to the `output/` directory. No parsing is done at this step, only downloading.\n\nBe careful with your IP address when running this. It could be mistaken for an attempted DoS attack.\n\n```sh\nnode htmlScraper.js\n```\n\n# License\n\n```\n           DO WHAT THE FUCK YOU WANT TO PUBLIC LICENSE\n                   Version 2, December 2004\n\nCopyright (C) 2021 rigwild \u003cme@rigwild.dev\u003e\n\nEveryone is permitted to copy and distribute verbatim or modified\ncopies of this license document, and changing it is allowed as long\nas the name is changed.\n\n           DO WHAT THE FUCK YOU WANT TO PUBLIC LICENSE\n  TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION\n\n 0. You just DO WHAT THE FUCK YOU WANT TO.\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frigwild%2Fepfl-scraper","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Frigwild%2Fepfl-scraper","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frigwild%2Fepfl-scraper/lists"}