https://github.com/calebwin/frequent
A utility for crawling websites and building frequency lists of words
frequency-lists python web-crawler web-crawler-python word-frequency
- Host: GitHub
- URL: https://github.com/calebwin/frequent
- Owner: calebwin
- License: mit
- Created: 2018-09-09T02:20:13.000Z (over 7 years ago)
- Default Branch: master
- Last Pushed: 2024-02-07T06:56:29.000Z (about 2 years ago)
- Last Synced: 2025-03-23T18:50:53.094Z (12 months ago)
- Topics: frequency-lists, python, web-crawler, web-crawler-python, word-frequency
- Language: Python
- Size: 9.77 KB
- Stars: 26
- Watchers: 2
- Forks: 12
- Open Issues: 2
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Frequent
frequent is a utility for crawling websites and building word frequency lists. Mainly made because I wanted to be able to find the top n most common words on different websites, but I imagine there might be more useful applications. Or not.
```python
import frequent
# get most frequent words from the w3schools website
# limit crawl depth to 25
word_frequencies = frequent.word_frequencies("https://www.w3schools.com", 25)
# get the top 50 words
top_words = word_frequencies.most_common(50)
# print the top 50 most frequent words
print(top_words)
```
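The `most_common` call above suggests the result behaves like a `collections.Counter`, though that is an assumption rather than something stated here. As a rough sketch of the counting step only (no crawling), the snippet below tallies words across a few hard-coded strings standing in for page text. The `count_words` helper, the `page_texts` list, and the simple `[a-z]+` tokenizer are hypothetical and not part of frequent.

```python
import re
from collections import Counter


def count_words(text):
    """Return a Counter of the lowercase alphabetic words in text.

    Hypothetical helper for illustration; not part of frequent.
    """
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(words)


# stand-in for text pulled from crawled pages (made-up sample data)
page_texts = [
    "Learn HTML and learn CSS",
    "HTML is the standard markup language",
]

# merge the per-page counts into one frequency list
totals = Counter()
for text in page_texts:
    totals.update(count_words(text))

# get the top 2 words as (word, count) pairs
top_two = totals.most_common(2)
print(top_two)
```

Merging per-page counters with `update` keeps a single running total, so the top n words can be read off once after all pages are processed.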