https://github.com/real-jiakai/wikepedia_analysis
This project analyzes the word frequency across all pages in a specified Wikipedia category and visualizes the results.
- Host: GitHub
- URL: https://github.com/real-jiakai/wikepedia_analysis
- Owner: real-jiakai
- Created: 2025-02-28T10:15:58.000Z (7 months ago)
- Default Branch: main
- Last Pushed: 2025-02-28T10:17:27.000Z (7 months ago)
- Last Synced: 2025-06-13T10:06:50.382Z (4 months ago)
- Topics: wikipedia, windsurf
- Language: Python
- Homepage:
- Size: 11.7 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 2
# Wikipedia Category Word Frequency Analyzer
This project analyzes the word frequency across all pages in a specified Wikipedia category and visualizes the results.
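As a rough illustration of the idea, the sketch below lists the pages in a category through the MediaWiki `categorymembers` query and counts non-stopword frequencies as raw counts and percentages. It is not the code from `wiki_word_frequency.py`: the function names, the tiny inline stopword set (the project relies on nltk), the `User-Agent` string, and the use of the standard library's `urllib` in place of `requests` are all assumptions made to keep the example self-contained.

```python
import json
import re
import urllib.parse
import urllib.request
from collections import Counter

API_URL = "https://en.wikipedia.org/w/api.php"

# Tiny illustrative stopword set (assumption); the project uses nltk for this.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "are",
             "that", "for", "on", "with", "as", "by", "it"}


def category_page_titles(category):
    """Yield the titles of pages in a category, following API continuation."""
    params = {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": f"Category:{category}",
        "cmtype": "page",
        "cmlimit": "500",
        "format": "json",
    }
    while True:
        url = f"{API_URL}?{urllib.parse.urlencode(params)}"
        # Hypothetical User-Agent; Wikimedia asks clients to identify themselves.
        request = urllib.request.Request(
            url, headers={"User-Agent": "wiki-word-frequency-sketch/0.1"}
        )
        with urllib.request.urlopen(request, timeout=30) as response:
            data = json.load(response)
        for member in data["query"]["categorymembers"]:
            yield member["title"]
        if "continue" not in data:
            break
        # Pass the continuation tokens back to fetch the next batch.
        params.update(data["continue"])


def word_frequencies(texts, top=50):
    """Return (word, count, percentage) tuples for the most frequent words."""
    counts = Counter()
    for text in texts:
        words = re.findall(r"[a-z]+", text.lower())
        counts.update(w for w in words if w not in STOPWORDS and len(w) > 1)
    total = sum(counts.values())
    return [(word, n, 100.0 * n / total) for word, n in counts.most_common(top)]
```

Calling `word_frequencies(["The model is a language model."], top=5)` returns a list of `(word, count, percentage)` tuples. `category_page_titles("Large_language_models")` needs network access and only returns titles, so fetching each page's text would be a further step.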
## Features
- Fetches all pages in a given Wikipedia category using the MediaWiki API
- Processes and cleans the text from each page
- Filters out common words (stopwords)
- Calculates and displays the frequency of non-common words
- Shows results as both raw counts and percentages
- Caches results to avoid redundant processing
- Provides a web interface with interactive word cloud visualization
## Requirements
- Python 3.6+
- Required packages: requests, nltk, flask
## Installation
1. Clone this repository
2. Set up the virtual environment using uv:
```
mkdir -p .venv
uv venv .venv
source .venv/bin/activate
uv pip install -r requirements.txt
```
## Command-line Usage
Run the script with a Wikipedia category name:
```
python wiki_word_frequency.py "Large_language_models"
```
Optional arguments:
- `--top N`: Display the top N words (default: 50)
- `--no-cache`: Force fresh data retrieval, ignoring cache
## Web Application
The project includes a Flask web application that provides an interactive word cloud visualization.
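The README does not show how `app.py` hands data to the front end, so the following is only a sketch of one way a backend could prepare word counts for a word cloud. It scales each word's font size in proportion to its count. The function name, the `text`/`count`/`size` field names, and the 72-pixel maximum are hypothetical.

```python
import json


def word_cloud_entries(counts, max_size=72):
    """Turn {word: count} into dicts whose font size is proportional to count."""
    if not counts:
        return []
    highest = max(counts.values())
    entries = []
    # Most frequent words first, so the front end can draw them first.
    for word, count in sorted(counts.items(), key=lambda item: item[1], reverse=True):
        size = max_size * count / highest  # most frequent word gets max_size
        entries.append({"text": word, "count": count, "size": round(size, 1)})
    return entries


# Serialized form that a route could return to the browser.
payload = json.dumps(word_cloud_entries({"model": 10, "token": 5}))
```

With strictly proportional sizes, rare words can become too small to read, so a real implementation might clamp sizes to a minimum.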
To run the web app:
```
source .venv/bin/activate
python app.py
```
Then open a browser and go to http://127.0.0.1:5000
Features of the web app:
- Input any Wikipedia category name
- See previously analyzed categories
- Interactive word cloud visualization
- Size of words proportional to their frequency
- Hover over words to see exact frequency counts
## Acknowledgments
This project was created with assistance from:
- **Claude 3.7 Sonnet** - AI model by Anthropic used for planning and code development
- **Windsurf Cascade** - agentic AI coding assistant used for implementation and debugging

Special thanks to the developers of the MediaWiki API, D3.js, Flask, and other open source libraries that made this project possible.
This project was inspired by the [Build Apps with Windsurf's AI Coding Agents](https://www.deeplearning.ai/short-courses/build-apps-with-windsurfs-ai-coding-agents) course from deeplearning.ai.
## The Joy of Coding
Creating this project was a delightful experience! The seamless integration of Wikipedia data with interactive visualizations brought a special joy to the development process. Watching the word clouds transform as different categories are explored creates a truly satisfying coding experience. Enjoy the vibe of coding with data and visualizations!