https://github.com/Yurii201811/tasksignal

AI-assisted problem discovery engine that turns public developer complaints into evidence-backed software opportunities and Codex-ready MVP prompts.
ai codex developer-tools fastapi local-first machine-learning nextjs pgvector portfolio-project postgresql product-discovery python
Host: GitHub
URL: https://github.com/Yurii201811/tasksignal
Owner: Yurii201811
License: mit
Created: 2026-05-30T21:01:11.000Z (25 days ago)
Default Branch: main
Last Pushed: 2026-06-11T00:10:20.000Z (14 days ago)
Last Synced: 2026-06-11T02:09:52.205Z (14 days ago)
Topics: ai, codex, developer-tools, fastapi, local-first, machine-learning, nextjs, pgvector, portfolio-project, postgresql, product-discovery, python
Language: Python
Size: 1.5 MB
Stars: 2
Watchers: 0
Forks: 4
Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Security: SECURITY.md
- Roadmap: docs/roadmap.md
Awesome Lists containing this project

100-AI-Machine-learning-Deep-learning-Computer-vision-NLP - 👆

# TaskSignal - AI Problem Discovery Engine

From Reddit/forum complaints → evidence-backed project ideas → build-ready Codex prompts.

TaskSignal is an AI-assisted engine that mines public developer and community discussions, detects concrete repetitive tasks people complain about, clusters similar pain signals, scores software opportunities, and generates Codex-ready MVP prompts.

![TaskSignal dashboard after processing demo data](docs/images/dashboard-browser-verified.png)

## Project Status

TaskSignal is a portfolio-ready MVP built by Yurii Bakurov. It is designed for one local operator on their own machine: fixture data works out of the box, a local workspace profile stores that user's research defaults, and repeatable API-backed workflows can be enabled for supported public sources when credentials are provided.

Current public posture: TaskSignal is an early public application repository, not a widely adopted package. Its strongest evidence today is reproducibility, release hygiene, CI, security/privacy documentation, contributor issues, and a browser-verified demo flow. See the [demo evidence snapshot](docs/demo-evidence.md) and [Codex for OSS evidence](docs/codex-for-oss-application.md) for the current review package.

Useful starting points:

- [Product context](PRODUCT.md)

- [Architecture](docs/architecture.md)

- [API reference](docs/api.md)

- [Demo evidence snapshot](docs/demo-evidence.md)

- [Deployment notes](docs/deployment.md)

- [Data ethics](docs/data-ethics.md)

- [Source limits and terms](docs/source-limits.md)

- [Model card](docs/model-card.md)

- [Roadmap](docs/roadmap.md)

- [Threat model](docs/threat-model.md)

- [Maintainer automation plan](docs/maintainer-automation.md)

- [Codex for OSS application evidence](docs/codex-for-oss-application.md)

- [Changelog](CHANGELOG.md)

- [Contributing guide](CONTRIBUTING.md)

- [Security policy](SECURITY.md)

## Why This Exists

Most idea lists are generic. TaskSignal is a task-replacement radar: it looks for specific repeated workflows people hate doing, such as exporting Stripe data into a spreadsheet every Friday and turning it into a client report.

## Who Should Use This

TaskSignal is for maintainers, builders, indie hackers, developer-tool teams, and researchers who want a local-first way to review public pain signals before deciding what to build. It is not for scraping private communities, profiling individuals, spam, outreach automation, or replacing human product judgment.

## What It Does

- Loads demo fixture data with no API keys.

- Stores one local workspace profile with owner/focus/default research settings.

- Saves repeatable research projects with source, query, limit, labels, cadence, last run, next run, and run count.

- Reports integration readiness without exposing secret values.

- Records scan outcomes with found/saved items, detected signals, generated opportunities, and guidance when live data produces no ranked opportunity.

- Normalizes Reddit, Hacker News, GitHub Issues, Stack Exchange, and fixture-style records.

- Stores author hashes instead of raw usernames by default.

- Detects complaints, manual workflows, tool requests, workarounds, buying intent, and confusion.

- Generates local embeddings with `sentence-transformers/all-MiniLM-L6-v2` when available.

- Falls back to deterministic local vectors when the model is unavailable.

- Clusters signals with a local thematic fallback by default, with optional DBSCAN when `TASKSIGNAL_USE_SKLEARN_CLUSTERING=1`.

- Scores opportunities using frequency, recency, pain, concreteness, buying intent, feasibility, and competition penalty.

- Generates opportunity cards, full Codex-ready build prompts, and richer Codex task packs.

- Optionally enhances generated prompts through OpenAI API or local Ollama when explicitly configured.
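
The deterministic fallback vectors mentioned in the list above could look roughly like the sketch below. This is an illustration under stated assumptions, not the repository's code: the function name `fallback_embedding` and the token-hashing scheme are hypothetical. The dimension of 384 matches the output size of `all-MiniLM-L6-v2`, so a fallback vector would fit the same column.

```python
import hashlib
import math

EMBEDDING_DIM = 384  # all-MiniLM-L6-v2 produces 384-dimensional vectors


def fallback_embedding(text: str, dim: int = EMBEDDING_DIM) -> list[float]:
    """Hypothetical fallback: hash tokens into buckets, then L2-normalize."""
    vector = [0.0] * dim
    for token in text.lower().split():
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        bucket = int.from_bytes(digest[:4], "big") % dim
        # A hash-derived sign reduces the bias caused by bucket collisions.
        sign = 1.0 if digest[4] % 2 == 0 else -1.0
        vector[bucket] += sign
    norm = math.sqrt(sum(value * value for value in vector))
    # Empty or fully cancelled input stays a zero vector.
    return [value / norm for value in vector] if norm else vector
```

The same text always maps to the same vector, which keeps the fixture demo reproducible without a model download. Unlike a learned embedding, such a vector only captures shared tokens, not meaning.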

## Architecture

```mermaid
flowchart TD
  A[Public sources and fixtures] --> B[Ingestion connectors]
  B --> C[Normalizer and deduplicator]
  C --> D[(PostgreSQL + pgvector)]
  D --> E[Pain and task detector]
  E --> F[Embedding service]
  F --> G[Thematic fallback clustering / optional DBSCAN]
  G --> H[Opportunity scoring]
  H --> I[Prompt generator]
  I --> J[FastAPI API]
  J --> K[Next.js dashboard]
```
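
As a rough illustration of the normalizer and deduplicator stage in the diagram, the sketch below collapses whitespace, hashes the normalized text, and drops repeats. The `NormalizedItem` fields and both function names are assumptions made for this example; the real schema is described in the architecture and API docs.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class NormalizedItem:
    # Hypothetical record shape, not the project's actual model.
    source: str
    url: str
    text: str
    content_hash: str


def normalize(source: str, url: str, title: str, body: str) -> NormalizedItem:
    # Collapse whitespace so trivially different copies hash the same.
    text = " ".join(f"{title} {body}".split())
    content_hash = hashlib.sha256(text.lower().encode("utf-8")).hexdigest()
    return NormalizedItem(source, url, text, content_hash)


def deduplicate(items: list[NormalizedItem]) -> list[NormalizedItem]:
    # Keep the first occurrence of each content hash, preserving order.
    seen: set[str] = set()
    unique: list[NormalizedItem] = []
    for item in items:
        if item.content_hash not in seen:
            seen.add(item.content_hash)
            unique.append(item)
    return unique
```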

## Tech Stack

Frontend: Next.js, TypeScript, Tailwind CSS, TanStack Query, Recharts, React Markdown, Zod-ready types.

Backend: FastAPI, Pydantic v2, SQLAlchemy 2, Alembic, PostgreSQL, pgvector, pytest, ruff, scikit-learn.

ML/NLP: sentence-transformers with local-only load when the model cache exists, deterministic fallback vectors, optional DBSCAN clustering, rule-based signal detector.

Infra: Docker Compose, Makefile, GitHub Actions CI, scheduled ingestion template.

## Quickstart

```bash
cp .env.example .env
make doctor
make up
```

Open the frontend at [http://localhost:3000](http://localhost:3000), go to Projects, save a research workflow, then run it. For a first proof path, go to Dashboard and click **Process demo data**. To use live public data, choose a source, query, and limit in **Live source**, then click **Run scan**.

If setup fails or a fresh checkout looks incomplete, run:

```bash
make doctor
```

`make doctor` checks the required files, local `.env`, Python, Node 20+, npm,
repo-local Python dev tools, fixture files, and whether generated files are
accidentally tracked. Docker is only required for the Compose quickstart.

API health check:

```bash
curl http://localhost:8000/health
```

## Local Development

Run the API and frontend separately:

```bash
cd apps/api
../../.venv/bin/uvicorn app.main:app --reload
```

```bash
cd apps/web
npm run dev
```

Run checks before publishing changes:

```bash
make test
make lint
make verify
```

The Makefile prefers repo-local Python tools in `.venv/bin`. On Apple Silicon
macOS it also prepends Homebrew Node 20 from `/opt/homebrew/opt/node@20/bin`
when available, matching the runtime required by the Next.js web app.

Run the release-readiness gate before tagging a release:

```bash
make release-check
```

Run the first-run smoke check to verify the credential-free fixture path against
a temporary database, including dashboard route wiring and task-pack export:

```bash
make smoke
```

To also boot the Next.js dev server and request `/dashboard`, run:

```bash
.venv/bin/python -u scripts/first_run_smoke.py --with-web-server
```

Use the local CLI for headless operation:

```bash
scripts/tasksignal_cli.py readiness
scripts/tasksignal_cli.py configure-workspace --owner "Local Builder" --goal "Find developer-tool opportunities" --source hackernews --query ask --cadence daily
scripts/tasksignal_cli.py create-project --name "Track CI/CD pain" --source hackernews --query ask --cadence daily
scripts/tasksignal_cli.py run-due
scripts/tasksignal_cli.py task-pack  --output task-pack.md
```

TaskSignal does not require multi-user accounts for this local mode. The local
workspace profile is a singleton in the app database and is meant for the person
running the app on that machine.

## Distribution

TaskSignal is currently an application repository, not a published Python or npm library. Use the source checkout or Docker Compose workflow above. Reusable packages may be split out later if a stable library boundary emerges.

## Reviewer Quick Check

For a quick public review, inspect:

- [Latest release](https://github.com/Yurii201811/tasksignal/releases/tag/v0.1.3)

- [Open contributor issues](https://github.com/Yurii201811/tasksignal/issues)

- [Release-readiness workflow](https://github.com/Yurii201811/tasksignal/actions/workflows/release-check.yml)

- [Demo evidence snapshot](docs/demo-evidence.md)

- [Threat model](docs/threat-model.md)

## Repository Layout

```text
apps/api      FastAPI backend, ML pipeline, database models, tests
apps/web      Next.js dashboard, opportunity views, prompt export UI
data          Demo fixtures for local-first processing
docs          Architecture, API, deployment, ethics, and model notes
notebooks     Classifier training and evaluation workbooks
```

## Fixture Demo Mode

Fixture mode is the default. It loads records from `data/fixtures`, processes them end to end, and should generate at least five opportunity cards:

- AI-generated code audit tool

- Early-stage SaaS lead/community signal radar

- Simple onboarding drop-off analyzer

- GitHub Actions workflow debugging assistant

- Spreadsheet-to-report automation helper

## API Connector Setup

Live scans use official APIs and keep the same local-first scoring/generation pipeline as fixture mode. The unauthenticated `POST /api/scans` endpoint is restricted to public API-safe sources (`fixture` and `hackernews`) so network callers cannot spend server-side credentials or retrieve data visible to server-side tokens.

Trusted operators can still configure the internal connector pipeline with source credentials when running controlled jobs outside the public endpoint:

- `REDDIT_CLIENT_ID`, `REDDIT_CLIENT_SECRET`, `REDDIT_USER_AGENT`

- `GITHUB_TOKEN`

- `STACK_EXCHANGE_KEY`

Hacker News works without credentials through the public Firebase API. GitHub and Stack Exchange can run without keys at lower rate limits. Reddit requires OAuth credentials. No paid LLM key is required. `LLM_PROVIDER=none` is the default.

Connector credentials belong in environment variables, not source registry
records. Source registry write endpoints require `OPERATOR_SCAN_TOKEN` and
reject secret-like `config_json` keys, and read endpoints return redacted
config so local rows cannot expose token values.
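
One way to picture the secret-like key rejection and redaction described above is the sketch below. The hint list, the `***` placeholder, and both function names are assumptions for illustration; the actual rules live in the API code.

```python
# Hypothetical substrings that mark a config key as secret-like.
SECRET_HINTS = ("token", "secret", "password", "api_key", "apikey")


def is_secret_like(key: str) -> bool:
    lowered = key.lower()
    return any(hint in lowered for hint in SECRET_HINTS)


def find_secret_like_keys(config: dict) -> list[str]:
    # A write endpoint could reject the request when this list is non-empty.
    return sorted(key for key in config if is_secret_like(key))


def redact_config(config: dict) -> dict:
    # A read endpoint could return this view instead of the stored row.
    return {
        key: ("***" if is_secret_like(key) else value)
        for key, value in config.items()
    }
```

Rejecting on write and redacting on read are complementary: the first keeps new secrets out of the registry, the second protects rows that already contain them.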

`PUBLIC_SCAN_SOURCES` can narrow the public endpoint further, for example to `hackernews` only. Credentialed sources such as GitHub, Reddit, and Stack Exchange stay reserved for trusted internal scan jobs.
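
The narrowing behavior can be sketched as a set intersection. This example assumes `PUBLIC_SCAN_SOURCES` is a comma-separated list, which the README does not specify, and the function name is hypothetical.

```python
import os
from typing import Mapping, Optional

# The two sources the README names as safe for the public endpoint.
PUBLIC_SAFE_SOURCES = frozenset({"fixture", "hackernews"})


def public_scan_sources(env: Optional[Mapping[str, str]] = None) -> frozenset:
    """Return the sources an unauthenticated caller may scan."""
    env = os.environ if env is None else env
    raw = env.get("PUBLIC_SCAN_SOURCES", "")  # assumed comma-separated
    requested = {part.strip().lower() for part in raw.split(",") if part.strip()}
    if not requested:
        return PUBLIC_SAFE_SOURCES
    # Intersection means the variable can narrow the set but never widen it.
    return PUBLIC_SAFE_SOURCES & requested
```

With this shape, listing `github` in the variable has no effect, so a misconfiguration cannot expose a credentialed source on the public endpoint.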

Browser-triggered runs of credentialed sources are available through saved
research projects only when `OPERATOR_SCAN_TOKEN` is configured on the API and
the same token is entered locally in the Projects or Integrations page. This
keeps hosted deployments from silently spending server-side credentials while
still letting trusted local operators connect APIs.

Saved projects support manual, hourly, daily, weekly, and custom-hour cadences.
TaskSignal does not hide a scheduler inside the web process. Run due projects
from the Projects page, `scripts/tasksignal_cli.py run-due`, cron, GitHub
Actions, or another explicit worker.
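
A due-check for these cadences might look like the sketch below. The cadence strings, the `custom_hours` parameter, and the rule that a never-run scheduled project is due immediately are all assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Hypothetical mapping from cadence name to interval.
FIXED_INTERVALS = {
    "hourly": timedelta(hours=1),
    "daily": timedelta(days=1),
    "weekly": timedelta(weeks=1),
}


def next_run(cadence: str, last_run: datetime,
             custom_hours: Optional[int] = None) -> Optional[datetime]:
    if cadence == "manual":
        return None  # manual projects never become due on their own
    if cadence == "custom":
        if custom_hours is None or custom_hours <= 0:
            raise ValueError("custom cadence needs a positive hour count")
        return last_run + timedelta(hours=custom_hours)
    if cadence not in FIXED_INTERVALS:
        raise ValueError(f"unknown cadence: {cadence}")
    return last_run + FIXED_INTERVALS[cadence]


def is_due(cadence: str, last_run: Optional[datetime], now: datetime,
           custom_hours: Optional[int] = None) -> bool:
    if last_run is None:
        return cadence != "manual"  # assumed: first run is due right away
    due_at = next_run(cadence, last_run, custom_hours)
    return due_at is not None and now >= due_at
```

Because the check is a pure function of stored timestamps, any external trigger (cron, a CI schedule, or the CLI) can call it without a long-running scheduler in the web process.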

Optional prompt enhancement uses `LLM_PROVIDER=openai` plus `OPENAI_API_KEY`, or
`LLM_PROVIDER=ollama` plus a local Ollama server. Browser-triggered enhancement
requires `OPERATOR_SCAN_TOKEN` on the API and the matching
`X-Operator-Scan-Token` request header so network callers cannot spend
server-side model credentials. ChatGPT/Codex subscriptions do not provide
backend API credentials; TaskSignal supports subscription users by exporting
task packs they can open in their own signed-in Codex app, CLI, IDE extension,
or Codex web session.
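
A header check of this kind is usually written with a constant-time comparison. The sketch below shows that pattern with the standard library; the function name is hypothetical and the real endpoint wiring is not shown.

```python
import hmac
from typing import Optional


def operator_token_is_valid(configured: Optional[str],
                            provided: Optional[str]) -> bool:
    """Compare the X-Operator-Scan-Token header with the configured token."""
    # Treat a missing server token as "feature disabled", never as a match.
    if not configured or not provided:
        return False
    # compare_digest avoids leaking how many leading bytes matched.
    return hmac.compare_digest(configured.encode("utf-8"),
                               provided.encode("utf-8"))
```

The early return matters as much as the comparison: if `OPERATOR_SCAN_TOKEN` is unset, an empty header must not be accepted as equal to an empty configuration.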

Destructive fixture resets require `DEMO_RESET_TOKEN` and the matching `X-Demo-Reset-Token` request header. The normal dashboard demo-processing action is non-destructive by default.

## Codex And Agent Handoff

Each opportunity can export:

- A generated Codex prompt.

- An evidence bundle.

- A Codex task pack with objective, suggested MVP, score, evidence, acceptance
  criteria, privacy constraints, and recommended Codex flow.

Task packs are designed for users who want to use their own signed-in Codex app,
CLI, IDE extension, or Codex web session. They do not spend ChatGPT/Codex plan
usage from the TaskSignal backend. A repo-local skill package is available at
`skills/tasksignal-opportunity-builder` for agents that can load Codex-style
skills.
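
To make the task pack contents concrete, the sketch below renders the listed fields as Markdown. The dictionary keys, heading names, and layout are assumptions for illustration; the exported format in the repository may differ.

```python
# Hypothetical (heading, key) pairs for the list-valued sections.
SECTIONS = (
    ("Evidence", "evidence"),
    ("Acceptance criteria", "acceptance_criteria"),
    ("Privacy constraints", "privacy_constraints"),
    ("Recommended Codex flow", "codex_flow"),
)


def render_task_pack(opportunity: dict) -> str:
    """Render an opportunity as a Markdown task pack (illustrative layout)."""
    lines = [
        f"# Task pack: {opportunity['title']}",
        "",
        f"**Objective:** {opportunity['objective']}",
        f"**Suggested MVP:** {opportunity['suggested_mvp']}",
        f"**Score:** {opportunity['score']:.2f}",
    ]
    for heading, key in SECTIONS:
        lines += ["", f"## {heading}"]
        lines += [f"- {entry}" for entry in opportunity[key]]
    return "\n".join(lines) + "\n"
```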

## ML/NLP Approach

The MVP uses transparent rules first. It scores pain phrases, repetition phrases, tool requests, buying intent, and task concreteness hints. Embeddings use `sentence-transformers/all-MiniLM-L6-v2` only when locally available; otherwise deterministic vectors keep the demo working.
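
A rules-first detector of this kind can be approximated with a handful of regular expressions. The patterns and label names below are invented for illustration and are far smaller than a realistic rule set.

```python
import re

# Hypothetical phrase patterns, one per signal type.
SIGNAL_PATTERNS = {
    "complaint": re.compile(
        r"\b(hate|annoying|frustrat\w*|painful|tedious)\b", re.I),
    "manual_workflow": re.compile(
        r"\b(every (day|week|month|friday)|manually|by hand|copy[- ]paste)\b", re.I),
    "tool_request": re.compile(
        r"\b(is there (a|any) tool|wish there was)\b", re.I),
    "buying_intent": re.compile(
        r"\b(would pay|happy to pay|willing to pay)\b", re.I),
}


def detect_signals(text: str) -> list[str]:
    """Return the sorted names of every pattern that matches the text."""
    return sorted(
        name for name, pattern in SIGNAL_PATTERNS.items() if pattern.search(text)
    )
```

For example, `detect_signals("I hate exporting Stripe data by hand every Friday. I would pay for a tool.")` returns `['buying_intent', 'complaint', 'manual_workflow']`. Each label traces back to a visible phrase, which is the sense in which such rules are transparent.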

## Scoring Formula

```text
opportunity_score =
  0.25 * frequency_score
+ 0.20 * recency_score
+ 0.20 * pain_intensity_score
+ 0.15 * task_concreteness_score
+ 0.10 * buying_intent_score
+ 0.10 * feasibility_score
- 0.10 * competition_penalty
```
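
The formula translates directly into code. The sketch below uses the weights from the formula and assumes each component is already scaled to the range 0 to 1, which the README does not state; the function and constant names are illustrative.

```python
# Positive weights from the formula above; they sum to 1.00.
WEIGHTS = {
    "frequency_score": 0.25,
    "recency_score": 0.20,
    "pain_intensity_score": 0.20,
    "task_concreteness_score": 0.15,
    "buying_intent_score": 0.10,
    "feasibility_score": 0.10,
}
COMPETITION_WEIGHT = 0.10


def opportunity_score(scores: dict) -> float:
    """Weighted sum of component scores minus the competition penalty."""
    positive = sum(weight * scores[name] for name, weight in WEIGHTS.items())
    return positive - COMPETITION_WEIGHT * scores["competition_penalty"]
```

As a worked example, component scores of 0.8, 0.6, 0.9, 0.7, 0.3, and 0.8 (in the order listed) give a positive part of 0.715; a competition penalty of 0.4 subtracts 0.04, for a final score of about 0.675. Under the 0-to-1 assumption the score ranges from -0.10 to 1.00.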

## Privacy And Ethics

TaskSignal is designed for public-data research, product discovery, and learning. It does not store raw usernames by default, preserves source URLs for attribution, respects API boundaries, and should not be used for spam or harassment workflows.
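
Storing an author hash instead of a username can be done with a keyed hash. The sketch below is one plausible approach, not the project's confirmed scheme: the function name, the lowercasing step, and the use of HMAC-SHA256 with a local salt are assumptions.

```python
import hashlib
import hmac


def hash_author(username: str, salt: bytes) -> str:
    """Return a stable pseudonym for a username (illustrative scheme)."""
    normalized = username.strip().lower().encode("utf-8")
    # Keying the hash with a local salt makes precomputed lookups of
    # common usernames impractical for anyone without the salt.
    return hmac.new(salt, normalized, hashlib.sha256).hexdigest()
```

The same author maps to the same hash within one installation, so repeated complaints from one person can still be counted, while the stored value does not reveal the handle.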

Before enabling live connectors, review [Data ethics](docs/data-ethics.md), configure API credentials through environment variables or GitHub repository secrets, and avoid committing `.env` files or exported datasets.

## Example Generated Opportunity

**Developers need clearer GitHub Actions failure diagnosis**

Problem: teams spend repetitive time reading noisy CI logs, searching YAML errors, and guessing root causes.

Suggested MVP: a CI log summarizer and workflow linter that identifies likely YAML mistakes, dependency failures, and next fixes.

## Example Generated Codex Prompt

```markdown
# Build Developers need clearer GitHub Actions failure diagnosis

You are a senior full-stack engineer. Build a working MVP...
```

## Portfolio Notes

This repository demonstrates full-stack engineering, API design, Python backend development, TypeScript frontend development, PostgreSQL/pgvector modeling, ML/NLP pipelines, clustering, product scoring, privacy-conscious design, Docker, CI/CD, tests, and technical writing.

## Roadmap

- Publish and maintain tagged releases with changelog entries.

- Expand contributor-friendly fixtures, docs, and public issues.

- Add richer source scheduling and rate-limit state after privacy review.

- Add pgvector ANN search in production mode.

- Add reviewer workflow for human labels.

See [Roadmap](docs/roadmap.md) for maintainer tasks, security milestones, and longer-term ideas.