# The Autonomous Semantic Discovery Engine

### Unsupervised Machine Learning on the "20 Newsgroups" Dataset

* **Author:** Jasper Kuo
* **Course:** Unsupervised Machine Learning
* **Status:** Complete (and surprisingly functional)

---

## 🍰 The Mission: "The Cake"
As Yann LeCun famously posited, if intelligence is a cake, unsupervised learning is the cake itself, while supervised learning is merely the icing. This project aims to eat the cake.

The objective was to ingest **roughly 18,000 unstructured documents** (Usenet newsgroup posts from 1993), treat them as unlabeled, and autonomously discover the latent thematic structures hidden within them using **Non-negative Matrix Factorization (NMF)**.

## 🛠 Tech Stack
* **Language:** Python 3.8+
* **Library:** scikit-learn
* **Vectorization:** TF-IDF (Term Frequency-Inverse Document Frequency)
* **Dimensionality Reduction:** NMF & PCA
* **Visualization:** Matplotlib

## 📊 Key Results
With the number of topics set to 10, the engine identified 10 distinct semantic topics without using any labels. Three representative topics and their top words:
* **Topic 2 (Religion):** `god`, `jesus`, `bible`, `faith`
* **Topic 4 (Hardware):** `drive`, `scsi`, `disk`, `controller`
* **Topic 7 (Sports):** `game`, `team`, `year`, `hockey`

## 🚀 How to Run
1. Clone this repository (`https://github.com/c2ramel/autonomous-semantic-discovery`).
2. Install dependencies:
   ```bash
   pip install -r requirements.txt
   ```
3. Run the analysis engine:
   ```bash
   python src/engine.py
   ```

## 📂 Project Structure
* `src/`: the core NMF logic and visualization scripts.
* `docs/`: the full project report and presentation slides.