https://github.com/captaincluster/wikipediascraper
A program that scrapes the text of every `<p>` tag in the HTML document of a selected Wikipedia article and saves the result to a .txt file named by the user. Inputs are case-sensitive and must match the article title exactly so that the correct data is retrieved.
beautifulsoup4 requests wikipedia
- Host: GitHub
- URL: https://github.com/captaincluster/wikipediascraper
- Owner: CaptainCluster
- License: MIT
- Created: 2023-06-15T15:19:15.000Z (about 3 years ago)
- Default Branch: main
- Last Pushed: 2023-09-21T20:48:55.000Z (over 2 years ago)
- Last Synced: 2025-06-27T15:07:30.536Z (12 months ago)
- Topics: beautifulsoup4, requests, wikipedia
- Language: Python
- Homepage:
- Size: 9.77 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# WikipediaScraper
A program that scrapes the text of every `<p>` tag in the HTML document of a selected English-language Wikipedia article and saves the result to a .txt file named by the user. Inputs are case-sensitive and must match the article title exactly so that the correct data is retrieved.
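
The workflow described above (fetch an article, collect the text of its `<p>` tags, write it to a user-named .txt file) might look roughly like the following. This is a minimal sketch, not the repository's actual code: it assumes the `requests` and `beautifulsoup4` packages named in the project topics, and the function names, the `https://en.wikipedia.org/wiki/` URL pattern, and the `User-Agent` string are all illustrative assumptions.

```python
from pathlib import Path
from urllib.parse import quote

import requests
from bs4 import BeautifulSoup

# Assumed URL pattern for English Wikipedia articles.
WIKI_BASE = "https://en.wikipedia.org/wiki/"


def fetch_article_html(title: str) -> str:
    """Download the HTML of an article, given its exact (case-sensitive) title."""
    url = WIKI_BASE + quote(title.replace(" ", "_"))
    response = requests.get(
        url,
        headers={"User-Agent": "wikipedia-scraper-sketch/0.1"},  # hypothetical value
        timeout=10,
    )
    # A title that does not match an existing article typically fails here.
    response.raise_for_status()
    return response.text


def extract_paragraphs(html: str) -> list[str]:
    """Return the stripped text of every non-empty <p> tag in the document."""
    soup = BeautifulSoup(html, "html.parser")
    texts = (p.get_text().strip() for p in soup.find_all("p"))
    return [text for text in texts if text]


def save_paragraphs(paragraphs: list[str], filename: str) -> Path:
    """Write the paragraphs to a .txt file, appending the extension if missing."""
    path = Path(filename if filename.endswith(".txt") else filename + ".txt")
    path.write_text("\n\n".join(paragraphs), encoding="utf-8")
    return path
```

With these helpers, a run could be `save_paragraphs(extract_paragraphs(fetch_article_html("Web scraping")), "web_scraping")`. Keeping the download separate from the parsing means the extraction step can be tried on any HTML string without network access.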