{"id":28477496,"url":"https://github.com/dimitar0528/crawlitics","last_synced_at":"2025-09-08T11:19:23.979Z","repository":{"id":297067909,"uuid":"995338043","full_name":"Dimitar0528/Crawlitics","owner":"Dimitar0528","description":"An AI-powered Next.js and Python-based ecommerce web crawler, scraper and data-analyst platform that transforms scattered product data into clear market insights.","archived":false,"fork":false,"pushed_at":"2025-08-31T19:40:18.000Z","size":1254,"stargazers_count":1,"open_issues_count":0,"forks_count":1,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-08-31T21:26:45.938Z","etag":null,"topics":["crawler","nextjs","product-analysis","python","scraper"],"latest_commit_sha":null,"homepage":"","language":"TypeScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Dimitar0528.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-06-03T10:28:15.000Z","updated_at":"2025-08-31T19:40:21.000Z","dependencies_parsed_at":"2025-07-03T10:22:13.568Z","dependency_job_id":"19f94e1e-ba73-43fd-b129-68777f8b6755","html_url":"https://github.com/Dimitar0528/Crawlitics","commit_stats":null,"previous_names":["dimitar0528/crawlitics"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/Dimitar0528/Crawlitics","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Dimitar0528%2FCrawlitics","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Dimitar052
8%2FCrawlitics/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Dimitar0528%2FCrawlitics/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Dimitar0528%2FCrawlitics/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Dimitar0528","download_url":"https://codeload.github.com/Dimitar0528/Crawlitics/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Dimitar0528%2FCrawlitics/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":274174602,"owners_count":25235280,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-09-08T02:00:09.813Z","response_time":121,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["crawler","nextjs","product-analysis","python","scraper"],"created_at":"2025-06-07T16:36:25.611Z","updated_at":"2025-09-08T11:19:23.964Z","avatar_url":"https://github.com/Dimitar0528.png","language":"TypeScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"#  Crawlitics\n\n## Table of Contents\n- [Introduction](#introduction)\n- [Features](#features)\n- [Installation](#installation)\n- [Local LLM Integration (Ollama)](#local-llm-integration-ollama)\n- [Roadmap](#roadmap)\n- [License](#license)\n\n## Introduction\n\n**Crawlitics** is an AI-powered eCommerce intelligence platform, built with a modern 
Next.js and Python stack. It is designed to autonomously crawl, extract, and analyze product data from multiple online retailers, transforming scattered web data into structured, actionable insights.\n\nThe platform uses a Python backend for asynchronous web automation with Playwright, NLP-based product matching (sentence-transformers embeddings compared by cosine similarity, combined with fuzzy matching), and modular scraping powered by Crawl4AI. It also provides structured data extraction and several product-analysis AI agents, implemented with the Agno framework and powered by the Qwen3 LLM running locally via Ollama.\n\nThe responsive Next.js frontend provides an intuitive UI for searching, data visualization, and product comparison, with interactive charts, dynamic forms, and more. On the server side, Next.js API Routes and Server Actions act as the orchestration layer, handling communication between the UI and the Python backend.\n\nCrawlitics is built for collecting, analyzing, and comparing product data at scale, giving users a clearer picture of the market.\n\n---\n\n## Features\n- Full-stack Next.js 15 app with a React frontend (TailwindCSS and shadcn/ui components) and a server layer for data fetching, API Routes, and Server Actions.\n- Data validation \u0026 typing via Zod, used as the single source of truth for all data structures to provide end-to-end type safety and runtime validation.\n- Asynchronous crawling \u0026 scraping with Crawl4AI and Playwright, processing multiple eCommerce platforms concurrently.\n- Automated UI interaction via Playwright, including dynamic handling of user-defined filters (brand, model, RAM, storage, color, price range), sliders, pagination, and more.\n- Integration with the Qwen3 model served by Ollama for structured data extraction and product 
analysis, using the Ollama API combined with the Agno framework.\n- PostgreSQL as the storage layer for product data\n- Dynamic concurrency and auto-throttling\n- Extensible architecture with site-specific configs and selectors\n\n---\n\n## Installation\n1. Clone the repo:\n```bash\ngit clone https://github.com/Dimitar0528/Crawlitics.git\ncd Crawlitics\n```\n2. Install Python dependencies:\n```bash\ncd python-backend\npip install -r requirements.txt\n```\n3. Create a `.env` file with your PostgreSQL connection settings:\n```bash\nDB_HOST=YOUR_HOST\nDB_PORT=YOUR_PORT\nDB_NAME=YOUR_DB_NAME\nDB_USER=YOUR_DB_USER\nDB_PASSWORD=YOUR_DB_PASSWORD\n```\n4. Start the FastAPI server (on port 8000):\n```bash\nuvicorn main:app --reload\n```\n5. Install Next.js dependencies:\n```bash\ncd ../nextjs-full-stack\nnpm install\n```\n6. Start the Next.js frontend (on port 3000):\n```bash\nnpm run dev\n```\n\n## Local LLM Integration (Ollama)\nCrawlitics supports local large language model inference using Ollama. There are two main ways to install and use Ollama:\n### Option 1: Native Install\n- Download Ollama from https://ollama.com\n- Download (if not already present) and run a model (e.g. 
Qwen3, Gemma3, or others) with:\n```bash\nollama run qwen3:4b\n```\n### Option 2: Docker\n- Pull the Ollama Docker image:\n```bash\ndocker pull ollama/ollama\n```\n- Start the Docker container with GPU access (`--gpus=all` requires the NVIDIA Container Toolkit):\n```bash\ndocker run --rm --gpus=all -d -v ollama_data:/root/.ollama -p 11434:11434 --name ollama ollama/ollama\n```\n- Download (if not already present) and run a model inside the container:\n```bash\ndocker exec -it ollama ollama run qwen3:4b\n```\n\n## Roadmap\n- [✅] Semantic matching for multiple types of user filters (brand, RAM, storage, color, etc.)\n- [✅] Dynamic JSON schema generation for structured product output\n- [✅] LLM-powered AI agent for interactive product data analysis\n- [🔜] Enable users to create accounts, save searches, track favorite products, and set up alerts.\n- [🔜] Show historical price charts to help users decide when to buy based on past trends.\n\n## License\n\nThis project is licensed under the [MIT License](https://github.com/Dimitar0528/Crawlitics?tab=MIT-1-ov-file).\n\nIt also includes components from the Crawl4AI project, which are licensed under the [Apache License 2.0](https://www.apache.org/licenses/LICENSE-2.0).\n\n\u003e This product includes software developed by UncleCode (https://x.com/unclecode) as part of the Crawl4AI project (https://github.com/unclecode/crawl4ai).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdimitar0528%2Fcrawlitics","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdimitar0528%2Fcrawlitics","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdimitar0528%2Fcrawlitics/lists"}