https://github.com/cyprianfusi/multi-class-classification-on-stack-overflow-questions
A naive approach to multiclass text classifier on stack overflow questions with almost 80% accuracy!
- Host: GitHub
- URL: https://github.com/cyprianfusi/multi-class-classification-on-stack-overflow-questions
- Owner: CyprianFusi
- Created: 2025-01-12T16:06:06.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2025-01-12T16:12:10.000Z (over 1 year ago)
- Last Synced: 2025-02-05T00:41:24.235Z (over 1 year ago)
- Topics: keras-tensorflow, multiclass-classification, nlp-machine-learning, pandas-dataframe, python3, tensorflow2, wrangling
- Language: Jupyter Notebook
- Homepage:
- Size: 50.8 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Multi-class-classification-on-Stack-Overflow-questions using Tensorflow
A naive approach to a multiclass text classifier on Stack Overflow questions with almost **`80%`** accuracy!
In this notebook, we train a multi-class classifier to predict the **tag** of a programming question on [Stack Overflow](http://stackoverflow.com/) using TensorFlow.
## The Dataset
A [dataset](https://storage.googleapis.com/download.tensorflow.org/data/stack_overflow_16k.tar.gz) has been prepared and is ready to use. It contains the body of several thousand programming questions (for example, "How can I sort a dictionary by value in Python?") posted to Stack Overflow. Each question is labeled with exactly one tag (`Python`, `CSharp`, `JavaScript`, or `Java`). The task is to take a question as input and predict the appropriate tag; for the example question above, that tag is `Python`.
The dataset we will work with is a sample containing several thousand questions extracted from the much larger public Stack Overflow dataset on [BigQuery](https://console.cloud.google.com/marketplace/details/stack-exchange/stack-overflow), which contains more than 17 million posts.
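
To make the "question in, tag out" setup concrete, here is a minimal sketch of how the archive could be read into texts and integer labels using only the Python standard library. It assumes the extracted archive contains one sub-folder per tag, each holding plain `.txt` files, which this README does not confirm. The helper names `download_and_extract` and `load_labeled_texts` are hypothetical and may not match what the notebook does.

```python
import tarfile
import urllib.request
from pathlib import Path

# URL taken from the dataset link above.
DATASET_URL = (
    "https://storage.googleapis.com/download.tensorflow.org/"
    "data/stack_overflow_16k.tar.gz"
)


def download_and_extract(url, dest):
    """Download a .tar.gz archive into dest and unpack it there.

    Hypothetical helper; requires network access and is not called here.
    """
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    archive_path = dest / "stack_overflow_16k.tar.gz"
    if not archive_path.exists():
        urllib.request.urlretrieve(url, archive_path)
    with tarfile.open(archive_path, "r:gz") as archive:
        archive.extractall(dest)
    return dest


def load_labeled_texts(split_dir):
    """Read every <split_dir>/<tag>/*.txt file into parallel lists.

    Assumes one sub-folder per tag (an assumption about the archive layout).
    Returns (texts, labels, class_names); labels[i] indexes class_names.
    """
    split_dir = Path(split_dir)
    # Sort folder names so the label indices are stable across runs.
    class_names = sorted(p.name for p in split_dir.iterdir() if p.is_dir())
    texts, labels = [], []
    for index, name in enumerate(class_names):
        for path in sorted((split_dir / name).glob("*.txt")):
            texts.append(path.read_text(encoding="utf-8"))
            labels.append(index)
    return texts, labels, class_names
```

Under these assumptions, calling `download_and_extract(DATASET_URL, "data")` followed by `load_labeled_texts("data/train")` would yield the question bodies, their integer labels, and the tag names. The `train` folder name is itself an assumption about the archive.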