https://github.com/mykhode/data_mining_py
Simple data scraping with Python
- Host: GitHub
- URL: https://github.com/mykhode/data_mining_py
- Owner: MyKhode
- Created: 2023-12-30T08:11:05.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2023-12-30T08:16:11.000Z (over 2 years ago)
- Last Synced: 2025-11-23T14:25:44.911Z (7 months ago)
- Topics: ai, scrabe-data, training-data
- Language: Python
- Homepage:
- Size: 8.79 KB
- Stars: 3
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
Data Mining with Python
Description
This project scrapes a Khmer-language Q&A website to generate intents for a conversational AI system. It combines web scraping, part-of-speech tagging, and data structuring to build a dataset of tagged intents for training language models.
Features
- Web Scraping: Uses requests and BeautifulSoup to extract data from the website.
- NLP Tagging: Uses the khmernltk library for part-of-speech tagging.
- Intent Generation: Gathers unique nouns from questions to form intent patterns and extracts the corresponding answers.
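The web scraping step might look roughly like the sketch below. The project itself uses requests and BeautifulSoup; to stay dependency-free, this sketch swaps in Python's standard-library html.parser instead. The class names `question` and `answer`, along with `QAParser` and `extract_qa_pairs`, are hypothetical, since the target site's markup is not described in this README.

```python
from html.parser import HTMLParser


class QAParser(HTMLParser):
    """Collect (question, answer) text pairs from an HTML page.

    Assumption: questions and answers sit in elements whose class
    attribute contains "question" or "answer" (hypothetical names).
    """

    def __init__(self):
        super().__init__()
        self.pairs = []        # finished (question, answer) tuples
        self._field = None     # "question" or "answer" while inside one
        self._open_tag = None  # tag name that opened the current field
        self._depth = 0        # nesting depth of that same tag name
        self._buffer = []      # text fragments of the current field
        self._question = None  # last question still waiting for an answer

    def handle_starttag(self, tag, attrs):
        if self._field is not None:
            # Already inside a field: only track nesting of the same tag.
            if tag == self._open_tag:
                self._depth += 1
            return
        classes = (dict(attrs).get("class") or "").split()
        for name in ("question", "answer"):
            if name in classes:
                self._field, self._open_tag, self._depth = name, tag, 1
                self._buffer = []
                break

    def handle_data(self, data):
        if self._field is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if self._field is None or tag != self._open_tag:
            return
        self._depth -= 1
        if self._depth:
            return
        # Collapse runs of whitespace left over from the markup.
        text = " ".join("".join(self._buffer).split())
        if self._field == "question":
            self._question = text
        elif self._question is not None:
            self.pairs.append((self._question, text))
            self._question = None
        self._field = None


def extract_qa_pairs(html):
    """Parse an HTML string and return a list of (question, answer) tuples."""
    parser = QAParser()
    parser.feed(html)
    parser.close()
    return parser.pairs
```

In the real pipeline the HTML string would come from an HTTP response body rather than a literal, and BeautifulSoup selectors would replace the hand-written parser.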
Installation
- Clone the repository:
git clone https://github.com/mykhode/data_mining_py.git
- Install dependencies:
pip install -r requirements.txt
Usage
- Run the Python script generate_intents.py.
- The script will scrape the Q&A website and generate a JSON file (data_intents.json) containing intents for conversational AI systems.
Example
python generate_intents.py
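After the script finishes, a quick sanity check of the output can be useful. The helper below is a minimal sketch, not part of the repository: `summarize_intents` is a hypothetical name, and it assumes the file has a top-level `intents` list whose entries carry `patterns` and `responses` lists, as described under Data Structure.

```python
import json
from pathlib import Path


def summarize_intents(path="data_intents.json"):
    """Return (intent count, pattern count, response count) for an intents file."""
    # Read as UTF-8 so Khmer text decodes correctly on every platform.
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    intents = data["intents"]
    patterns = sum(len(item["patterns"]) for item in intents)
    responses = sum(len(item["responses"]) for item in intents)
    return len(intents), patterns, responses
```

Calling `summarize_intents()` from the directory that contains data_intents.json returns the three counts as a tuple.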
Dependencies
- requests
- beautifulsoup4
- khmernltk
Data Structure
The generated JSON file (data_intents.json) follows the structure:
{
  "intents": [
    {
      "tag": "id_1",
      "patterns": ["Question Pattern 1", "Noun Pattern 1"],
      "responses": ["Answer 1"]
    },
    // Other intents follow the same structure
  ]
}
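One way entries in this structure could be assembled is sketched below. It is an illustration under assumptions, not the repository's code: it takes questions that have already been tagged as (word, tag) pairs, assumes nouns carry the tag "n" (adjust `noun_tags` to whatever the tagger actually emits), and infers from the sample above that each intent's patterns are the question followed by its unique nouns. The function names are hypothetical.

```python
import json


def unique_nouns(tagged_tokens, noun_tags=("n",)):
    """Return the distinct nouns from (word, tag) pairs, in first-seen order."""
    seen = []
    for word, tag in tagged_tokens:
        if tag in noun_tags and word not in seen:
            seen.append(word)
    return seen


def build_intents(records, noun_tags=("n",)):
    """Build the intents document from (question, answer, tagged_tokens) records."""
    intents = []
    for index, (question, answer, tagged) in enumerate(records, start=1):
        intents.append({
            "tag": f"id_{index}",
            # The question itself, followed by each unique noun it contains.
            "patterns": [question, *unique_nouns(tagged, noun_tags)],
            "responses": [answer],
        })
    return {"intents": intents}


def to_json(document):
    """Serialize the intents document as indented JSON text."""
    # ensure_ascii=False keeps Khmer text readable instead of \u escapes.
    return json.dumps(document, ensure_ascii=False, indent=2)
```

Writing the result of `to_json(build_intents(records))` to data_intents.json with UTF-8 encoding would produce a file in the shape shown above, minus the illustrative comment line, which is not valid JSON.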
Contribution
- Fork the repository.
- Create a new branch (git checkout -b feature/new-feature).
- Make your changes and commit (git commit -am 'Add new feature').
- Push to the branch (git push origin feature/new-feature).
- Create a new Pull Request.
License
This project is licensed under the MIT License.