https://github.com/desktopcleaner/naturemagazinescraper
Scrapes open-access Nature magazine articles and stores them as txt files.
data nature-magazine python scrapper word-frequency
Last synced: 8 months ago
- Host: GitHub
- URL: https://github.com/desktopcleaner/naturemagazinescraper
- Owner: DesktopCleaner
- License: mit
- Created: 2024-10-21T21:28:49.000Z (12 months ago)
- Default Branch: main
- Last Pushed: 2024-12-22T00:13:40.000Z (10 months ago)
- Last Synced: 2024-12-30T18:59:18.270Z (10 months ago)
- Topics: data, nature-magazine, python, scrapper, word-frequency
- Language: Python
- Homepage:
- Size: 3.45 MB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# NatureMagazineScraper
Scrape open-access Nature articles and store them as txt files.
# Key Features
- Users can specify which year's articles to scrape and analyze
- Users can set a maximum count per word per article to reduce over-counting
## scraper.py
Scrapes articles using `Beautiful Soup` and stores them as text files
## analyzer.py
Parses the scraped articles and sums up word counts
## data_cleaner.py
Removes common words and other biased words from the counts
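
The per-article cap and the cleaning step described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the repository's actual code: the function names (`capped_counts`, `total_counts`, `remove_stop_words`), the `max_per_article` parameter, the tokenizing regex, and the small `STOP_WORDS` set are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical stop-word set; the repository's real list may differ.
STOP_WORDS = {"the", "a", "of", "and", "in", "to", "is"}


def capped_counts(text, max_per_article):
    """Count words in one article, capping each word at max_per_article."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    return Counter({w: min(n, max_per_article) for w, n in counts.items()})


def total_counts(articles, max_per_article=5):
    """Sum the capped per-article counts over all articles."""
    total = Counter()
    for text in articles:
        total.update(capped_counts(text, max_per_article))
    return total


def remove_stop_words(counts, stop_words=STOP_WORDS):
    """Drop common words so they do not dominate the totals."""
    return Counter({w: n for w, n in counts.items() if w not in stop_words})


if __name__ == "__main__":
    articles = ["cell cell cell cell the the", "cell gene the"]
    totals = total_counts(articles, max_per_article=2)
    print(remove_stop_words(totals))
```

Capping each word per article keeps a single article that repeats one term many times from skewing the overall frequencies.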