https://github.com/williamzebrowski/intent-cluster-analysis
- Host: GitHub
- URL: https://github.com/williamzebrowski/intent-cluster-analysis
- Owner: williamzebrowskI
- Created: 2024-06-28T14:07:21.000Z
- Default Branch: main
- Last Pushed: 2024-06-29T00:15:44.000Z
- Last Synced: 2025-02-26T05:49:07.982Z
- Language: Jupyter Notebook
- Size: 405 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Intent Cluster Analysis
## Semantic Intent Similarity Detection and Model Confusion Reduction
This repository contains the process and code for clustering intent examples by semantic similarity. The aim is to group similar intent examples together so that overlapping examples become visible, which helps reduce model confusion and improve intent-detection accuracy.
## Process Overview
The following diagram illustrates the overall process of semantic intent similarity detection and model confusion reduction:

### Steps
1. **Data Standardization**:
   - Normalize the training examples so that every example is analyzed in a consistent form.
2. **Advanced Semantic Processing**:
   - Use `spaCy` to extract linguistic features such as lemmas, part-of-speech (POS) tags, and named entities.
3. **Numerical Transformation via TF-IDF**:
   - Convert the processed text into TF-IDF vectors, which weight each term by how often it appears in an example and how rare it is across all examples.
4. **Cosine Similarity Calculation**:
   - Compute the pairwise cosine similarity between the vectors to build a similarity matrix.
5. **DBSCAN Clustering Process**:
   - Cluster the intent examples with DBSCAN (Density-Based Spatial Clustering of Applications with Noise), which groups examples that lie in dense regions and labels isolated examples as noise.
6. **Output of Clusters**:
   - Report the resulting groups of similar intent examples, showing which examples are closely related in semantic content.
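The README does not say which rules Data Standardization applies. As a minimal sketch, assuming NFKC Unicode normalization, lowercasing, punctuation removal, and whitespace collapsing are the intended rules, a normalizer could look like this. `normalize_example` and `normalize_examples` are hypothetical names, not taken from the repository, and the spaCy feature extraction of step 2 is not reproduced here.

```python
import re
import unicodedata


def normalize_example(text: str) -> str:
    """Bring one training example into a consistent form.

    Assumed rules (the README does not specify them): NFKC Unicode
    normalization, lowercasing, punctuation removal, whitespace collapsing.
    """
    text = unicodedata.normalize("NFKC", text)  # e.g. non-breaking space -> space
    text = text.lower()
    text = re.sub(r"[^\w\s]", "", text)  # keep letters, digits, "_" and whitespace
    return " ".join(text.split())  # trim and collapse runs of whitespace


def normalize_examples(examples: list[str]) -> list[str]:
    """Normalize a list of examples, preserving their order."""
    return [normalize_example(e) for e in examples]
```

For example, `normalize_example("  What's   my BALANCE?? ")` yields `"whats my balance"`. Keeping the rules this conservative avoids merging examples that differ in meaning; lemmatization is better left to spaCy.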
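The TF-IDF and cosine-similarity steps can be sketched in plain Python, assuming each example has already been tokenized into a list of terms (for instance, spaCy lemmas). The weighting follows the defaults of scikit-learn's `TfidfVectorizer` (raw term counts, smoothed IDF `ln((1 + n) / (1 + df)) + 1`, L2 normalization); because every vector then has unit length, cosine similarity reduces to a dot product. `tfidf_vectors` and `cosine_similarity_matrix` are hypothetical names, not the repository's code.

```python
import math
from collections import Counter


def tfidf_vectors(docs: list[list[str]]) -> list[dict[str, float]]:
    """Return one L2-normalized TF-IDF vector (term -> weight) per tokenized example."""
    n = len(docs)
    # Document frequency: in how many examples each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    # Smoothed IDF, the same formula as TfidfVectorizer's default (smooth_idf=True).
    idf = {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}
    vectors = []
    for doc in docs:
        tf = Counter(doc)  # raw term counts within this example
        weights = {term: count * idf[term] for term, count in tf.items()}
        norm = math.sqrt(sum(w * w for w in weights.values()))
        # Scale to unit length; an empty example stays an empty (all-zero) vector.
        vectors.append({t: w / norm for t, w in weights.items()} if norm else {})
    return vectors


def cosine_similarity_matrix(vectors: list[dict[str, float]]) -> list[list[float]]:
    """Pairwise cosine similarity; the vectors are unit length, so this is a dot product."""

    def dot(a: dict[str, float], b: dict[str, float]) -> float:
        if len(a) > len(b):
            a, b = b, a  # iterate over the smaller sparse vector
        return sum(w * b.get(t, 0.0) for t, w in a.items())

    return [[dot(a, b) for b in vectors] for a in vectors]
```

With `["check my balance", "what is my balance", "book a flight"]` split on whitespace, the first two examples share `my` and `balance` and get a similarity of roughly 0.44, while the first and third share no terms and score 0. With scikit-learn installed, `TfidfVectorizer` plus `sklearn.metrics.pairwise.cosine_similarity` produces the same kind of matrix; the plain version only makes the arithmetic visible.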
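DBSCAN works on distances rather than similarities, so a common choice (assumed here, not confirmed by the README) is distance = 1 - cosine similarity. The sketch below is a minimal pure-Python DBSCAN over such a matrix, plus a helper that groups examples by cluster label for the output step. `dbscan_from_similarity`, `group_by_cluster`, and any `eps`/`min_samples` values are illustrative, not the repository's settings.

```python
from collections import defaultdict
from typing import Optional

NOISE = -1  # DBSCAN label for examples that belong to no cluster


def dbscan_from_similarity(sim: list[list[float]], eps: float, min_samples: int) -> list[int]:
    """Run DBSCAN on a square cosine-similarity matrix, using distance = 1 - similarity.

    Returns one label per example: 0, 1, 2, ... for clusters and NOISE (-1) otherwise.
    """
    n = len(sim)
    labels: list[Optional[int]] = [None] * n  # None means "not visited yet"

    def region(i: int) -> list[int]:
        # eps-neighborhood of i; it includes i itself, as scikit-learn's min_samples does.
        return [j for j in range(n) if 1.0 - sim[i][j] <= eps]

    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        neighbors = region(i)
        if len(neighbors) < min_samples:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        labels[i] = cluster  # i is a core point: start a new cluster
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: reachable, but not core
                continue
            if labels[j] is not None:
                continue  # already assigned to this cluster
            labels[j] = cluster
            j_neighbors = region(j)
            if len(j_neighbors) >= min_samples:
                queue.extend(j_neighbors)  # j is also core: keep expanding
        cluster += 1
    return labels


def group_by_cluster(examples: list[str], labels: list[int]) -> dict[int, list[str]]:
    """Collect examples by cluster label; NOISE examples end up under -1."""
    groups: dict[int, list[str]] = defaultdict(list)
    for text, label in zip(examples, labels):
        groups[label].append(text)
    return dict(groups)
```

scikit-learn provides the same algorithm as `DBSCAN(eps=..., min_samples=..., metric="precomputed")`, whose `fit_predict` accepts the `1 - similarity` matrix directly. In either version, `eps` sets how similar two examples must be to count as neighbors, and examples labeled `-1` did not fall into any dense group.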