https://github.com/williamzebrowski/intent-cluster-analysis
- Host: GitHub
- URL: https://github.com/williamzebrowski/intent-cluster-analysis
- Owner: williamzebrowskI
- Created: 2024-06-28T14:07:21.000Z (7 months ago)
- Default Branch: main
- Last Pushed: 2024-06-29T00:15:44.000Z (7 months ago)
- Last Synced: 2025-01-13T01:52:49.894Z (8 days ago)
- Language: Jupyter Notebook
- Size: 405 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Intent Cluster Analysis
## Semantic Intent Similarity Detection and Model Confusion Reduction
This repository contains the process and code for clustering intent examples based on their semantic similarity. The aim is to group similar intent examples together to reduce model confusion and improve intent detection accuracy.
## Process Overview
The following diagram illustrates the overall process of semantic intent similarity detection and model confusion reduction:
![Semantic Intent Similarity Detection and Model Confusion Reduction](image.png)
### Steps
1. **Data Standardization**:
   - Normalize training examples so that every example is analyzed in a consistent form.
2. **Advanced Semantic Processing**:
   - Use `spaCy` to extract linguistic features such as lemmas, part-of-speech (POS) tags, and named entities.
3. **Numerical Transformation via TF-IDF**:
   - Convert the processed text into numerical TF-IDF vectors, which weight each term by its importance.
4. **Cosine Similarity Calculation**:
   - Compute the cosine similarity between every pair of vectors to build a similarity matrix.
5. **DBSCAN Clustering Process**:
   - Cluster intent examples using DBSCAN (Density-Based Spatial Clustering of Applications with Noise).
6. **Output of Clusters**:
   - Group similar intent examples together, showing which examples are closely related in semantic content.
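
The steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the notebook's actual code: it assumes scikit-learn for TF-IDF, cosine similarity, and DBSCAN (the README does not name the library), and the names `normalize` and `cluster_intents`, the normalization rules, and the `eps` and `min_samples` values are all hypothetical. Step 2 (the `spaCy` feature extraction) is left out so the sketch runs without a language model download. Turning similarity into a distance as `1 - similarity` is also an assumption.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def normalize(text):
    """Step 1 (assumed rules): lowercase and collapse repeated whitespace."""
    return " ".join(text.lower().split())


def cluster_intents(examples, eps=0.5, min_samples=2):
    """Group examples by TF-IDF cosine similarity using DBSCAN.

    Returns a dict mapping cluster label -> list of original examples.
    Label -1 holds the examples DBSCAN treats as noise.
    """
    cleaned = [normalize(e) for e in examples]

    # Step 3: TF-IDF vectors (one row per example).
    tfidf = TfidfVectorizer().fit_transform(cleaned)

    # Step 4: pairwise cosine similarity matrix.
    similarity = cosine_similarity(tfidf)

    # Step 5: DBSCAN expects distances, so use 1 - similarity (assumed).
    # Clip to avoid tiny negative values caused by floating-point rounding.
    distance = np.clip(1.0 - similarity, 0.0, None)
    labels = DBSCAN(
        eps=eps, min_samples=min_samples, metric="precomputed"
    ).fit_predict(distance)

    # Step 6: collect examples per cluster label.
    clusters = {}
    for example, label in zip(examples, labels):
        clusters.setdefault(int(label), []).append(example)
    return clusters


if __name__ == "__main__":
    sample = [
        "Reset my password",
        "reset  my PASSWORD",
        "Book a flight to Paris",
    ]
    for label, members in cluster_intents(sample).items():
        print(label, members)
```

In this sketch, a lower `eps` makes clusters stricter (examples must be more similar to be grouped), and examples with no sufficiently close neighbors end up under label `-1`. Both values would need tuning against real training data.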