{"id":31126921,"url":"https://github.com/basemax/okala-database-crawler","last_synced_at":"2026-05-01T21:34:30.972Z","repository":{"id":307242290,"uuid":"1025130169","full_name":"BaseMax/okala-database-crawler","owner":"BaseMax","description":"A robust, UTF-8 compliant PHP-based crawler designed to extract structured product data from Okala. This tool efficiently scrapes and saves store information, category slugs, and detailed product listings into organized JSON files. Ideal for data analysis, backup, or integration into other systems.","archived":false,"fork":false,"pushed_at":"2025-07-23T22:35:56.000Z","size":13,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-09-06T12:56:17.797Z","etag":null,"topics":["crawler","crawler-php","curl","data","json","okala","okala-com","okalacom","php","php-crawler","scraper"],"latest_commit_sha":null,"homepage":"","language":"PHP","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/BaseMax.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-07-23T19:14:05.000Z","updated_at":"2025-07-23T22:35:59.000Z","dependencies_parsed_at":"2025-07-30T08:29:46.805Z","dependency_job_id":"c7a28668-2828-4932-86fb-6659e451b1da","html_url":"https://github.com/BaseMax/okala-database-crawler","commit_stats":null,"previous_names":["basemax/okala-database-crawler"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/BaseMax/okala-database-crawler","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/BaseMax%2Fokala-database-crawler","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/BaseMax%2Fokala-database-crawler/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/BaseMax%2Fokala-database-crawler/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/BaseMax%2Fokala-database-crawler/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/BaseMax","download_url":"https://codeload.github.com/BaseMax/okala-database-crawler/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/BaseMax%2Fokala-database-crawler/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":275680445,"owners_count":25508570,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-09-17T02:00:09.119Z","response_time":84,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["crawler","crawler-php","curl","data","json","okala","okala-com","okalacom","php","php-crawler","scraper"],"created_at":"2025-09-17T23:01:29.763Z","updated_at":"2025-09-17T23:02:30.309Z","avatar_url":"https://github.com/BaseMax.png","language":"PHP","funding_links":[],"categories":[],"sub_categories":[],"readme":"# 🛒 Okala Database Crawler\n\nA robust PHP-based crawler to extract and save product data from [Okala](https://www.okala.com/) including stores, categories, and product details in structured JSON format.\n\n---\n\n## 📦 What It Does\n\n- Crawls multiple **store pages** from Okala\n- Iterates through multiple **category slugs**\n- Downloads and stores:\n  - Product **search result pages**\n  - Product **detail.json**\n  - Product **features.json**\n- Saves all data in structured folders under `/data/`\n- Fully supports **UTF-8/Persian** characters\n- Respects existing files to avoid redundant requests\n\n---\n\n## 🗂 Folder Structure\n\n```\ndata/\n├── search/\n│   └── {store_id}/{category_slug}/{page}.json\n├── product/\n│   └── {product_id}/\n│       ├── features.json\n│       └── {store_id}/detail.json\n\n````\n\n---\n\n## 🚀 Usage\n\n### ✅ Requirements\n\n- PHP 7.4+ with `curl` and `json` extensions\n- Git (for automated commit + push loop)\n- Internet access\n\n### 📥 Clone the Repo\n\n```bash\ngit clone https://github.com/BaseMax/okala-database-crawler.git\ncd okala-database-crawler\n````\n\n### 🧪 Run the Crawler\n\n```bash\nphp crawler.php\n```\n\n### 🔁 Auto Git Push (Optional)\n\nTo automatically commit and push updated JSON data every 5 minutes:\n\n```bash\ncrawler.bat\n```\n\n\u003e 💡 Useful when you're running long crawling jobs and want a backup of progress on GitHub.\n\n---\n\n## 🛠 Customization\n\nYou can edit the following in `crawler.php`:\n\n* **Stores list** (`$stores`)\n* **Categories list** (`$categories`)\n* **Fetch delay** (`usleep(250_000)` for 250ms between requests)\n\n---\n\n## 🧼 Features\n\n* ✅ Automatic file structure and directory creation\n* ✅ Skips already downloaded data (but still verifies products)\n* ✅ Handles self-signed SSL issues via cURL\n* ✅ UTF-8 safe JSON storage (e.g., Persian: فارسی)\n* ✅ Color-coded CLI output for easier tracking\n\n---\n\n## 🤝 Contributions\n\nPRs welcome! Please fork the repo and submit your improvements.\n\n---\n\n## 📬 Contact\n\nHave questions or ideas? Reach out via [GitHub Issues](https://github.com/BaseMax/okala-database-crawler/issues).\n\n---\n\n## 📄 License\n\nMIT License\n\n© 2025 [Max Base](https://github.com/BaseMax)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbasemax%2Fokala-database-crawler","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbasemax%2Fokala-database-crawler","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbasemax%2Fokala-database-crawler/lists"}