{"id":22025974,"url":"https://github.com/ortanav2/data-scraper","last_synced_at":"2025-03-23T11:14:35.021Z","repository":{"id":204498411,"uuid":"711930421","full_name":"ortanaV2/Data-Scraper","owner":"ortanaV2","description":"A data-scraper that makes it possible to filter out the most important information from huge amounts of text based data.","archived":false,"fork":false,"pushed_at":"2023-11-01T22:54:46.000Z","size":7,"stargazers_count":3,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-03-17T22:08:01.473Z","etag":null,"topics":["data","data-scraper","file","file-scraper","scraper","search","searching"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ortanaV2.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2023-10-30T13:13:16.000Z","updated_at":"2023-11-02T13:31:46.000Z","dependencies_parsed_at":"2023-11-01T23:29:05.648Z","dependency_job_id":null,"html_url":"https://github.com/ortanaV2/Data-Scraper","commit_stats":{"total_commits":8,"total_committers":2,"mean_commits":4.0,"dds":0.25,"last_synced_commit":"8ff7cb6cabd2ea61ff35a86d1505494366ef36cc"},"previous_names":["ortanav2/data-scraper"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ortanaV2%2FData-Scraper","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ortanaV2%2FData-Scraper/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ortanaV2%2FData-Scraper/releases","manifests_
url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ortanaV2%2FData-Scraper/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ortanaV2","download_url":"https://codeload.github.com/ortanaV2/Data-Scraper/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":245090875,"owners_count":20559298,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data","data-scraper","file","file-scraper","scraper","search","searching"],"created_at":"2024-11-30T07:24:23.617Z","updated_at":"2025-03-23T11:14:35.002Z","avatar_url":"https://github.com/ortanaV2.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Data-Scraper\n\u003e A data-scraper that makes it possible to filter out the most important information from huge amounts of text based data.\n\nThe script asks for a ***keyword*** to search for. It compares the keyword with the ***file-name*** and its ***contents***. As soon as it finds the keyword in it, it is listed as a match and output at the end.  \n## File Content Read\n\u003e The scraper is able to read only the following text-based files:\n- .docx\n- .pdf\n- .txt\n## Usage\nThe scraper is searching the `./DATA` **directory** by default. To change that you have to edit the **variable** `directory`.\n\n_Line 9_: `directory = \"./DATA\"`\n\u003e [!NOTE]\n\u003e It iterates through every file in the directory. 
To speed up the process, it is recommended to limit the number of files.\n## Requirements\n\u003e How to install the required libraries.\n```\npip install pdfplumber\n```\n```\npip install python-docx\n```\n\n## Improving\nSuggestions for improvements are welcome.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fortanav2%2Fdata-scraper","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fortanav2%2Fdata-scraper","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fortanav2%2Fdata-scraper/lists"}