Support Ticket Classification and Key Phrases Extraction
https://github.com/vanessaaleung/support-ticket-nlp

keras machine-learning python text-classification ticket-classification

# Support Ticket NLP
Support Ticket Classification and Key Phrases Extraction

- Identify the main issues in the ticket description
- Extract the key phrases in the ticket description

----------------
## Data
Support Ticket Classification

----------------
## Tasks


  1. Topic Modeling with an LDA model

    1. Preprocessing

      1. Divide the text into tokens

      2. Remove stopwords and punctuation

      3. Lemmatize the tokens

    2. Compute coherence values to find the optimal number of topics

    3. Build the LDA model

    4. Utilize pyLDAvis to visualize the topics



  2. Key Phrases Extraction with pytextrank (combining spaCy and networkx)

    1. Construct a graph, sentence by sentence, based on the spaCy part-of-speech tags

    2. Use matplotlib to visualize the lemma graph

    3. Use PageRank – which approximates eigenvector centrality – to calculate a rank for each node in the lemma graph

      $x_v = \frac{1}{\lambda} \sum_{t \in M(v)} x_t = \frac{1}{\lambda} \sum_{t \in V} a_{v,t}\, x_t$

      1. $a_{v,t}=1$ if vertex $v$ is linked to vertex $t$, and $a_{v,t}=0$ otherwise

      2. $M(v)$ is the set of neighbors of $v$, and $\lambda$ is a constant (the eigenvalue)

    4. Collect the top-ranked phrases from the lemma graph based on the noun chunks

    5. Find a minimum span for each phrase based on combinations of lemmas

      Sample output (phrase, count, rank):

      permission 1 0.17555037929471423
      requisitions 1 0.1742458175386728
      recruiter 1 0.1416381454134179





----------------
## Terminologies
### Topic Coherence
Scores a single topic by measuring the degree of semantic similarity between the high-scoring words in that topic
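One concrete coherence measure, UMass coherence, can be computed directly from document co-occurrence counts. The sketch below is pure Python with invented ticket vocabulary; `umass_coherence` is a hypothetical helper name, and it assumes every scored word appears in at least one document.

```python
# UMass coherence sketch: a topic's top words score higher when they
# co-occur often in the documents. Data and helper name are illustrative.
import math

docs = [
    {"password", "reset", "portal"},
    {"password", "reset", "link"},
    {"permission", "requisition", "recruiter"},
    {"recruiter", "requisition", "access"},
]

def umass_coherence(top_words, docs):
    # C = sum over word pairs (i > j) of log((D(w_i, w_j) + 1) / D(w_j)),
    # where D counts documents containing the word(s).
    score = 0.0
    for i in range(1, len(top_words)):
        for j in range(i):
            d_ij = sum(1 for d in docs if top_words[i] in d and top_words[j] in d)
            d_j = sum(1 for d in docs if top_words[j] in d)
            score += math.log((d_ij + 1) / d_j)
    return score

# Words that co-occur ("password", "reset") score higher than
# words that never appear together ("password", "recruiter").
print(umass_coherence(["password", "reset"], docs))
print(umass_coherence(["password", "recruiter"], docs))
```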

### Latent Dirichlet Allocation (LDA)
Given the number of documents, the number of words, and the number of topics, outputs:
1. a distribution over words for each topic $k$
2. a distribution over topics for each document $i$
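Both outputs can be seen in a minimal fit. The sketch below uses scikit-learn's `LatentDirichletAllocation`; the repo itself may use a different library (e.g. gensim), and the ticket texts are invented examples.

```python
# Minimal LDA fit showing the two outputs: a word distribution per topic
# and a topic distribution per document. Library choice and data are
# illustrative assumptions.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

tickets = [
    "cannot reset my password for the portal",
    "password reset link expired again",
    "need permission to approve requisitions",
    "recruiter requested access to requisitions",
]

# Bag-of-words counts with English stopwords removed (lemmatization omitted).
vectorizer = CountVectorizer(stop_words="english")
doc_term = vectorizer.fit_transform(tickets)

K = 2  # number of topics
lda = LatentDirichletAllocation(n_components=K, random_state=0)
doc_topics = lda.fit_transform(doc_term)   # output 2: topic mix per document i
topic_words = lda.components_              # output 1: word weights per topic k

print(doc_topics.shape)   # (4 documents, K topics); each row sums to 1
print(topic_words.shape)  # (K topics, vocabulary size)
```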