{"id":34033787,"url":"https://github.com/anorprogrammer/wiker","last_synced_at":"2026-04-07T11:31:03.487Z","repository":{"id":65202185,"uuid":"586772429","full_name":"anorprogrammer/wiker","owner":"anorprogrammer","description":"library for wikipedia dataset collection","archived":false,"fork":false,"pushed_at":"2023-01-17T12:52:19.000Z","size":21,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-12-15T10:32:40.487Z","etag":null,"topics":["beautifulsoup4","dataset","requests","wiki","wikipedia"],"latest_commit_sha":null,"homepage":"https://pypi.org/project/wiker/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/anorprogrammer.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2023-01-09T07:37:08.000Z","updated_at":"2023-01-10T12:35:19.000Z","dependencies_parsed_at":"2023-02-10T09:45:16.216Z","dependency_job_id":null,"html_url":"https://github.com/anorprogrammer/wiker","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/anorprogrammer/wiker","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/anorprogrammer%2Fwiker","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/anorprogrammer%2Fwiker/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/anorprogrammer%2Fwiker/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/anorprogrammer%2Fwiker/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/anorprogrammer","download_url":"https://codeload.g
ithub.com/anorprogrammer/wiker/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/anorprogrammer%2Fwiker/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31511502,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-07T03:10:19.677Z","status":"ssl_error","status_checked_at":"2026-04-07T03:10:13.982Z","response_time":105,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["beautifulsoup4","dataset","requests","wiki","wikipedia"],"created_at":"2025-12-13T19:21:43.162Z","updated_at":"2026-04-07T11:31:03.482Z","avatar_url":"https://github.com/anorprogrammer.png","language":"Python","readme":"# Wiker\n\nA library for collecting Wikipedia text datasets.\n\n# Installation\n\n```\npip install wiker\n```\n\n# Quickstart\n\n**Warning!**\n\n_Before running the code, create \"data\" and \"extra\" folders inside the project folder, and \"pre_urls.txt\" and \"post_urls.txt\" files inside the \"extra\" folder._\n\nFile structure:\n```\nmy-app/\n├─ data/\n├─ extra/\n│  ├─ pre_urls.txt\n│  ├─ post_urls.txt\n├─ main.py # your file\n```\n\n```python\nfrom wiker import Wiker\n\nwk = Wiker(lang='uz', first_article_link=\"Turkiston\")\n\nwk.run(scrape_limit=50)\n```\n\n### Other methods\n\n```python\nfrom wiker import Wiker\n\nwk = Wiker(lang='uz', 
first_article_link=\"Turkiston\")\n\nwk.reader() # read the pre_urls.txt file and return its contents as a list\nwk.read_url_count() # count the links read from the pre_urls.txt file\nwk.extra_file_writer() # write first_article_link to pre_urls.txt if the file is empty\nwk.scraper() # fetch all articles from the links in the pre_urls.txt file\nwk.text_cleaner() # strip HTML and other tags from the retrieved articles\nwk.next_urls() # collect links for further scraping\nwk.dir_scanner() # scan the \"data\" folder and count its files\nwk.cleaned_text_writer(text_dict=wk.text_cleaner()) # write the cleaned article texts to the \"data\" folder\nwk.post_url_writer(url_list=wk.scraper().keys()) # write the names of the saved articles to post_urls.txt\nwk.pre_url_writer(url_list=wk.next_urls()) # write the links from next_urls() to pre_urls.txt for the next run\n```\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fanorprogrammer%2Fwiker","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fanorprogrammer%2Fwiker","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fanorprogrammer%2Fwiker/lists"}