{"id":27992727,"url":"https://github.com/gauthamnairvm/trex-app","last_synced_at":"2026-05-03T05:41:16.210Z","repository":{"id":291735945,"uuid":"978180667","full_name":"gauthamnairvm/trex-app","owner":"gauthamnairvm","description":"Text Refinement EXplorer - An EDA tool for text based data.","archived":false,"fork":false,"pushed_at":"2025-05-06T08:59:56.000Z","size":23,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-05-08T18:46:40.879Z","etag":null,"topics":["data-analysis","data-visualization","groq-api","large-language-models","llama3","natural-language-processing","text2sql"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/gauthamnairvm.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-05-05T15:36:35.000Z","updated_at":"2025-05-06T08:59:59.000Z","dependencies_parsed_at":"2025-05-06T09:57:06.743Z","dependency_job_id":null,"html_url":"https://github.com/gauthamnairvm/trex-app","commit_stats":null,"previous_names":["gauthamnairvm/trex-app"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/gauthamnairvm/trex-app","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gauthamnairvm%2Ftrex-app","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gauthamnairvm%2Ftrex-app/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gauthamnairvm%2Ftrex-app/releases","manifests_url":"https://repo
s.ecosyste.ms/api/v1/hosts/GitHub/repositories/gauthamnairvm%2Ftrex-app/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/gauthamnairvm","download_url":"https://codeload.github.com/gauthamnairvm/trex-app/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gauthamnairvm%2Ftrex-app/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32559716,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-03T03:21:47.309Z","status":"ssl_error","status_checked_at":"2026-05-03T03:21:43.884Z","response_time":103,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data-analysis","data-visualization","groq-api","large-language-models","llama3","natural-language-processing","text2sql"],"created_at":"2025-05-08T18:41:22.441Z","updated_at":"2026-05-03T05:41:16.195Z","avatar_url":"https://github.com/gauthamnairvm.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# T.REX — Text Refinement and EXploration\n\nT.REX is a powerful local tool for analyzing, deduplicating, clustering, and querying large-scale text datasets using pretrained embeddings and LLM-powered pipelines with metadata aware plots and integrations.\n\n\u003e ❗ Requires a machine with a **dedicated NVIDIA GPU (CUDA 11.8+)**.  
\n\u003e ❌ Will NOT run in Docker, WSL, or headless environments due to GUI popups.\n\n---\n\n## 🛠 Features\n\n- ✅ CSV popup loader with column detection\n- ✅ Embedding generation (`sentence-transformers`)\n- ✅ Interactive EDA on metadata + selected text column\n- ✅ Clustering and optional LLM labeling\n- ✅ Near-duplicate analysis\n- ✅ Text-to-SQL pipeline\n- ✅ Full CLI-based UX for pipeline chaining\n\n---\n\n## ⚙️ Installation\n\n### 1. Prerequisites\n\n- Python **3.10**\n- NVIDIA GPU with **CUDA 11.8+ drivers installed**\n- **Display environment** (no WSL or remote/headless)\n\n### 2. Setup Steps\n\n```bash\n# Clone the repo\ngit clone https://github.com/gauthamnairvm/trex-app.git\ncd trex-app\n\n# Create a virtual environment\npython -m venv venv\nsource venv/bin/activate  # On Windows: venv\\Scripts\\activate\n\n# Install dependencies\npip install --upgrade pip\npip install -r requirements.txt\n\n# Additional dependencies (PyTorch built for CUDA 11.8)\npip install torch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 --index-url https://download.pytorch.org/whl/cu118\n```\n\n---\n\n## 🔐 Environment Setup\n\nCreate a `.env` file at the repository root (you can copy from `.env.template`):\n\n```\nGROQ_API_KEY=your_groq_api_key_here\n```\n\n---\n\n## 🚀 Run T.REX\n\n```bash\npython main.py\n```\n\nYou'll see the CSV loader and the T.REX CLI. 
Try:\n\n```bash\nT.REX \u003e trex_eda(metadata=['col1', 'col2'])\nT.REX \u003e trex_cluster()\nT.REX \u003e trex_dedup(stopwords=False)\nT.REX \u003e trex_text2sql(pii_mask=True)\n```\n\n---\n\n## 📂 Project Structure\n\n```\ntrex-app/\n├── app/\n│   ├── clustering.py\n│   ├── dedup.py\n│   ├── embedding.py\n│   ├── file_loader.py\n│   ├── pipeline.py\n│   └── text2sql_pipeline.py\n├── data/               # CSVs and embeddings\n├── results/            # Output plots and clustering results\n├── main.py             # Entry point\n├── .env.template       # Environment variable example\n├── requirements.txt\n└── README.md\n```\n\n---\n\n## 📋 License\n\nThis project is licensed under the **MIT License**.  \nYou’re free to use, modify, and distribute it. Please give credit if you build on it.\n\n---\n\n## 🤝 Contributions\n\nT.REX is open for issues, suggestions, and pull requests.  \nTo contribute:\n\n1. Fork the repo\n2. Create a feature branch\n3. Submit a PR with a clear description\n\n---\n\n## ⚠️ Limitations\n\nT.REX is under active development. The current version has the following limitations:\n\n\u003e Limited Pipelines: Only four pipelines are supported at present: EDA, Deduplication, Clustering, and Text2SQL.\n\n\u003e File Format Restriction: Currently supports only .csv files. Other formats (e.g., Excel, JSON) are not yet supported.\n\n\u003e Single Text Column Design: Each session supports only one designated text column, with the remaining columns optionally treated as metadata. To use a different text or metadata column, the file must be reloaded.\n\n\u003e Startup Instability: Occasionally, the GUI file loader popup may fail on the first try. Restarting the session usually resolves the issue.\n\n\u003e Fixed LLM Configuration: Uses a single Groq-hosted model (llama3-70b-8192). Prompts use hardcoded settings (temperature, max_tokens, stop), with no dynamic tuning. 
A Groq API key must be provided in the `.env` file to use the pipelines with LLM integration.\n\n---\n\nBuilt and maintained by `Variath Madhupal Gautham Nair (MSCS Rutgers University-New Brunswick)`\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgauthamnairvm%2Ftrex-app","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fgauthamnairvm%2Ftrex-app","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgauthamnairvm%2Ftrex-app/lists"}