https://github.com/c2ramel/autonomous-semantic-discovery

An unsupervised machine learning engine that utilizes Non-negative Matrix Factorization (NMF) to autonomously extract and visualize latent semantic topics from the 20 Newsgroups dataset.

data-visualization machine-learning nlp nmf python scikit-learn unsupervised-learning


# The Autonomous Semantic Discovery Engine

### Unsupervised Machine Learning on the "20 Newsgroups" Dataset

* **Author:** Jasper Kuo
* **Course:** Unsupervised Machine Learning
* **Status:** Complete (and surprisingly functional)

---

## 🍰 The Mission: "The Cake"
As Yann LeCun famously posited, if intelligence is a cake, unsupervised learning is the cake itself, while supervised learning is merely the icing. This project aims to eat the cake.

The objective was to ingest about **18,000 unlabeled, unstructured documents** (Usenet newsgroup posts from 1993) and autonomously discover the latent thematic structures hidden within them using **Non-negative Matrix Factorization (NMF)**.
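
For reference, here is the standard NMF formulation with a Frobenius-norm objective and Lee & Seung's multiplicative update rules. $V$ is the non-negative TF-IDF matrix ($n$ documents by $m$ vocabulary terms) and $k$ is the number of topics (10 in this project). The README does not say which loss or solver the engine uses (scikit-learn's `NMF` defaults to the Frobenius loss with a coordinate-descent solver), so treat the update rules as illustrative rather than as the engine's exact algorithm.

$$
V \approx W H, \qquad V \in \mathbb{R}_{\ge 0}^{n \times m},\quad W \in \mathbb{R}_{\ge 0}^{n \times k},\quad H \in \mathbb{R}_{\ge 0}^{k \times m}
$$

$$
\min_{W \ge 0,\ H \ge 0} \ \tfrac{1}{2}\,\lVert V - W H \rVert_F^2
$$

$$
H \leftarrow H \odot \frac{W^\top V}{W^\top W H}, \qquad W \leftarrow W \odot \frac{V H^\top}{W H H^\top}
$$

Here $\odot$ and the fractions are element-wise. Row $i$ of $W$ is document $i$'s mixture over the $k$ topics. Row $j$ of $H$ weights every vocabulary term for topic $j$, and the largest entries of that row are the topic's keywords.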

## 🛠 Tech Stack
* **Language:** Python 3.8+
* **Vectorization:** TF-IDF (Term Frequency-Inverse Document Frequency)
* **Dimensionality Reduction:** NMF & PCA
* **Visualization:** Matplotlib
* **ML Library:** scikit-learn

## 📊 Key Results
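With the number of topics set to 10, the engine identified 10 distinct semantic topics without using any labels. The topic names in parentheses are human interpretations of each topic's highest-weighted terms: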
* **Topic 2 (Religion):** `god`, `jesus`, `bible`, `faith`
* **Topic 4 (Hardware):** `drive`, `scsi`, `disk`, `controller`
* **Topic 7 (Sports):** `game`, `team`, `year`, `hockey`
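
Each keyword list above is read off one row of the topic-term matrix $H$ by sorting that row's weights in descending order. The NumPy sketch below shows the idea on a small hand-made $H$. The weights are illustrative values chosen to mirror the listed keywords, not the project's learned values. The toy topics are numbered 0 to 2 rather than 2, 4, and 7, and `top_words_per_topic` is a hypothetical helper name.

```python
import numpy as np


def top_words_per_topic(H, vocabulary, n_top=4):
    """Return the n_top highest-weighted terms for each row (topic) of H."""
    H = np.asarray(H)
    vocabulary = np.asarray(vocabulary)
    # argsort is ascending, so reverse the columns and keep the first n_top.
    order = np.argsort(H, axis=1)[:, ::-1][:, :n_top]
    return [vocabulary[idx].tolist() for idx in order]


# Illustrative topic-term weights only, not learned from the dataset.
vocab = ["bible", "controller", "disk", "drive", "faith", "game",
         "god", "hockey", "jesus", "scsi", "team", "year"]
weights = [
    {"god": 0.9, "jesus": 0.7, "bible": 0.5, "faith": 0.3},
    {"drive": 0.8, "scsi": 0.6, "disk": 0.5, "controller": 0.4},
    {"game": 0.9, "team": 0.8, "year": 0.6, "hockey": 0.5},
]
col = {word: j for j, word in enumerate(vocab)}
H = np.zeros((len(weights), len(vocab)))
for k, row in enumerate(weights):
    for word, value in row.items():
        H[k, col[word]] = value

print(top_words_per_topic(H, vocab))
# [['god', 'jesus', 'bible', 'faith'], ['drive', 'scsi', 'disk', 'controller'], ['game', 'team', 'year', 'hockey']]
```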

## 🚀 How to Run
1. Clone this repository.
2. Install dependencies:
   ```bash
   pip install -r requirements.txt
   ```
3. Run the analysis engine:
   ```bash
   python src/engine.py
   ```

## 📂 Project Structure
* `src/`: Core NMF logic and visualization scripts.
* `docs/`: Full project report and presentation slides.