{"id":22574472,"url":"https://github.com/brucezoom/sigplancrawler","last_synced_at":"2025-03-28T15:24:29.795Z","repository":{"id":71881826,"uuid":"406763842","full_name":"BruceZoom/SigplanCrawler","owner":"BruceZoom","description":"For crawling papers from sigplan conferences.","archived":false,"fork":false,"pushed_at":"2021-10-25T11:29:14.000Z","size":10,"stargazers_count":2,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-02-02T15:33:37.811Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/BruceZoom.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-09-15T12:54:30.000Z","updated_at":"2022-05-18T06:36:46.000Z","dependencies_parsed_at":null,"dependency_job_id":"4ff14b91-b6cf-4ead-bc86-0c6066f01511","html_url":"https://github.com/BruceZoom/SigplanCrawler","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/BruceZoom%2FSigplanCrawler","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/BruceZoom%2FSigplanCrawler/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/BruceZoom%2FSigplanCrawler/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/BruceZoom%2FSigplanCrawler/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/BruceZoom","download_url":"https://codeload.github.com
/BruceZoom/SigplanCrawler/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":246050895,"owners_count":20715742,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-12-08T03:06:22.603Z","updated_at":"2025-03-28T15:24:29.781Z","avatar_url":"https://github.com/BruceZoom.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# SigplanCrawler\nFor crawling papers from sigplan conferences.\n\n# Requirements\nPython 3, BeautifulSoup4\n\n# Usage\nThe crawler has three commands: \"crawl\", \"download\", and \"abstract\".\n```\nusage: crawler.py [-h] [--url URL] [--skip] [--skip_download]\n                  [--skip_abstract] [--savedir SAVEDIR] [--maxcount MAXCOUNT]\n                  command\n\nDownload papers from sigplan site.\n\npositional arguments:\n  command              \"crawl\" for crawling all papers from the site;\n                       \"download\" for downloading all papers crawled;\n                       \"abstract\" for generating abstract for all papers\n                       crawled\n\noptional arguments:\n  -h, --help           show this help message and exit\n  --url URL            the url to the accepted papers panel from the site\n  --skip               skip download and abstract generation if this arg is\n                       presented\n  --skip_download      skip download if this arg is presented\n  --skip_abstract      skip abstract generation if this arg is presented\n  --savedir SAVEDIR    the directory to save all papers\n  --maxcount 
MAXCOUNT  maximum number of papers to crawl\n```\n\nFor all commands, it is recommended to specify `--savedir` so that papers from different conferences stay organized.\n\nCommand \"crawl\" scans the accepted papers on the site and saves their basic information to \"\u003csavedir\u003e/paper_links.csv\"; it then downloads all papers, if they exist, into the savedir and generates a summary of all papers' abstracts in \"\u003csavedir\u003e/abstract.md\".\n  \nUse `--url` to specify the page of accepted papers, e.g., \"https://popl19.sigplan.org/track/POPL-2019-Research-Papers?#event-overview\" for POPL 2019.\n\nUse `--skip_download` to skip the download step, `--skip_abstract` to skip abstract generation, or `--skip` to skip both in this command.\n\nUse `--maxcount` to limit the maximum number of papers to crawl.\n\nYou may skip both processes in \"crawl\" mode and then use the \"download\" and \"abstract\" commands to run download and abstract generation separately.\nWhen doing so, set `--savedir` to the directory that \"crawl\" used to save \"paper_links.csv\".\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbrucezoom%2Fsigplancrawler","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbrucezoom%2Fsigplancrawler","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbrucezoom%2Fsigplancrawler/lists"}