https://github.com/calebwin/frequent
A utility for crawling websites and building frequency lists of words
frequency-lists python web-crawler web-crawler-python word-frequency
- Host: GitHub
- URL: https://github.com/calebwin/frequent
- Owner: calebwin
- License: mit
- Created: 2018-09-09T02:20:13.000Z (about 6 years ago)
- Default Branch: master
- Last Pushed: 2024-02-07T06:56:29.000Z (9 months ago)
- Last Synced: 2024-07-18T19:12:09.462Z (4 months ago)
- Topics: frequency-lists, python, web-crawler, web-crawler-python, word-frequency
- Language: Python
- Size: 9.77 KB
- Stars: 26
- Watchers: 3
- Forks: 12
- Open Issues: 2
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Frequent
frequent is a utility for crawling websites and building word frequency lists. I made it mainly because I wanted to find the top n most common words on different websites, but I imagine there might be more useful applications. Or not.

```python
import frequent

# get the most frequent words from the w3schools website
# limit crawl depth to 25
word_frequencies = frequent.word_frequencies("https://www.w3schools.com", 25)

# get the top 50 words
top_words = word_frequencies.most_common(50)

# print the top 50 most frequent words
print(top_words)
```
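
The README does not show how frequent counts words internally. The `most_common` call above suggests the result behaves like a `collections.Counter`, so the counting step for a single page might look roughly like the sketch below. It uses only the standard library and works on an HTML string rather than a live crawl. The names `count_words` and `_TextExtractor` are hypothetical and are not part of frequent's API.

```python
import re
from collections import Counter
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Hypothetical helper: collects text, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Only keep text that is outside script/style elements.
        if self._skip_depth == 0:
            self.chunks.append(data)


def count_words(html):
    """Return a Counter of lowercased words found in an HTML string (assumed approach)."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.chunks)
    # Treat runs of ASCII letters and apostrophes as words; a simplification.
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)


sample = (
    "<html><body><h1>Learn HTML</h1>"
    "<p>Learn CSS and learn more HTML.</p>"
    "<script>var learn = 1;</script></body></html>"
)
sample_counts = count_words(sample)
print(sample_counts.most_common(2))
```

A crawler would call something like this on every fetched page and sum the resulting counters, which `Counter` supports directly through `+` or `update`.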