{"id":17036229,"url":"https://github.com/lorey/mlscraper-experiments","last_synced_at":"2025-03-22T22:42:27.456Z","repository":{"id":48105925,"uuid":"351569351","full_name":"lorey/mlscraper-experiments","owner":"lorey","description":null,"archived":false,"fork":false,"pushed_at":"2021-08-06T10:58:45.000Z","size":486,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-03-15T08:55:05.343Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"HTML","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/lorey.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2021-03-25T20:37:33.000Z","updated_at":"2021-08-06T10:58:06.000Z","dependencies_parsed_at":"2022-08-12T18:40:52.918Z","dependency_job_id":null,"html_url":"https://github.com/lorey/mlscraper-experiments","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lorey%2Fmlscraper-experiments","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lorey%2Fmlscraper-experiments/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lorey%2Fmlscraper-experiments/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lorey%2Fmlscraper-experiments/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/lorey","download_url":"https://codeload.github.com/lorey/mlscraper-experiments/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_co
unt":245031351,"owners_count":20549913,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-10-14T08:49:46.222Z","updated_at":"2025-03-22T22:42:27.425Z","avatar_url":"https://github.com/lorey.png","language":"HTML","funding_links":[],"categories":[],"sub_categories":[],"readme":"# mlscraper-experiment\n\nTrying some ideas to extend my main library [mlscraper](https://github.com/lorey/mlscraper).\n\nFeatures:\n\n* scraping arbitrary items (dicts, lists, lists of dicts, etc.)\n* smart scraper selection\n\n## Structure\nThis class diagram shows the basic relationships.\n\n![class diagram](docs/classes.png)\n\n## Terminology\n* Scraper: turns a page into an item by scraping HTML\n* Sample: one item on a page (to be scraped later), i.e. what the user inputs\n* Match: one possible occurrence of a sample, i.e. the nodes in which the sample occurs\n* Extractor: gets the value out of a DOM node\n* Selector: an algorithm to select nodes\n\n## Does mlscraper support?\n- scraping arbitrary items? yes\n- scraping dicts with missing values? yes\n- detecting specific pages that have no results? no","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Florey%2Fmlscraper-experiments","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Florey%2Fmlscraper-experiments","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Florey%2Fmlscraper-experiments/lists"}