{"id":31286928,"url":"https://github.com/imrany/spindle","last_synced_at":"2025-09-24T10:58:05.049Z","repository":{"id":315736845,"uuid":"1060658190","full_name":"imrany/spindle","owner":"imrany","description":"An open-source, lightweight web crawler and scraper.  It can discover links on the web (crawler) and extract structured data from webpages (scraper).","archived":false,"fork":false,"pushed_at":"2025-09-20T11:59:04.000Z","size":9,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-09-20T12:28:55.258Z","etag":null,"topics":["crawler","go","golang","scraper"],"latest_commit_sha":null,"homepage":"https://spindle.villebiz.com","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/imrany.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":".github/FUNDING.yml",
"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":"CODEOWNERS","security":"SECURITY.md","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-09-20T10:26:14.000Z","updated_at":"2025-09-20T11:59:08.000Z","dependencies_parsed_at":"2025-09-20T12:29:00.332Z","dependency_job_id":"e397de15-9bc5-4044-8267-926c48a63db2","html_url":"https://github.com/imrany/spindle","commit_stats":null,"previous_names":["imrany/spindle"],"tags_count":2,"template":false,"template_full_name":null,"purl":"pkg:github/imrany/spindle","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/imrany%2Fspindle","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/imrany%2Fspindle/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/imrany%2Fspindle/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/imrany%2Fspindle/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/imrany","download_url":"https://codeload.github.com/imrany/spindle/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/imrany%2Fspindle/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":276737506,"owners_count":25695699,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-09-24T02:00:09.776Z","response_time":97,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub",
"repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["crawler","go","golang","scraper"],"created_at":"2025-09-24T10:57:59.682Z","updated_at":"2025-09-24T10:58:05.044Z","avatar_url":"https://github.com/imrany.png","language":"Go","funding_links":[],"categories":[],"sub_categories":[],
"readme":"# 🕸️ Spindle\n\n**Spindle** is an open-source, lightweight **web crawler and scraper**.\nIt can discover links on the web (_crawler_) and extract structured data from webpages (_scraper_).\n\n## ✨ Purpose\n\n- **Crawler** → Navigates pages and discovers new URLs.\n- **Scraper** → Extracts specific information from a given page (title, description, favicon, links, images, and videos).\n\nTogether, the two let Spindle discover **what to scrape** and **extract the data you care about**.\n\n## ⚙️ How It Works\n\n1. Takes an input URL (from the CLI or the API).\n2. Fetches the HTML.\n3. Extracts structured data:\n\n   - Title\n   - Description\n   - Links\n   - Favicon\n   - Images\n   - Videos (if available)\n\n4. In crawler mode, follows links to discover additional pages.\n\n\u003e **GitHub Container Registry**: available at `ghcr.io/imrany/spindle`\n\n### Run on Docker\n\n```bash\n# Pull the Docker image\ndocker pull ghcr.io/imrany/spindle:latest\n\n# Run the image in server mode as a detached container named spindle\ndocker run -d --name spindle --restart unless-stopped -p 5020:5020 -v ~/.spindle:/var/opt/spindle ghcr.io/imrany/spindle:latest server\n```\n\n## 📦 Build\n\n```bash\n# Clone the repository\ngit clone https://github.com/imrany/spindle.git\ncd spindle\n\n# Install Go dependencies\ngo mod download\n\n# Build\ngo build main.go\n```\n\n## 🚀 Usage\n\n### 🔹 Scrape a URL from the CLI\n\n```bash\ngo run main.go https://www.youtube.com/watch?v=pum3k4yECT4\n```\n\n**Response (truncated for readability):**\n\n```text\nTitle: America Is In Trouble.. Candace Owens Might Be Cooked \u0026 Zuck Got Massively Embarrassed! - YouTube\nDescription: THIS WEEK ON NEWSDADDYYYY!!! 🥤🍿**JIMMY KIMMEL — ABC PULLS THE PLUG**Jimmy Kimmel’s late-night show was pulled from the schedule after his comments about Ch...\nFavicon: https://www.youtube.com/s/desktop/2ea5cbbe/img/favicon_144x144.png\nVideo: \nLinks: [https://www.youtube.com/ https://www.youtube.com/ https://www.youtube.com/about/ https://www.youtube.com/about/press/ https://www.youtube.com/about/copyright/ https://www.youtube.com/t/contact_us/ https://www.youtube.com/creators/ https://www.youtube.com/ads/ https://developers.google.com/youtube https://www.youtube.com/t/terms https://www.youtube.com/t/privacy https://www.youtube.com/about/policies/ https://www.youtube.com/howyoutubeworks?utm_campaign=ytgen\u0026utm_source=ythp\u0026utm_medium=LeftNav\u0026utm_content=txt\u0026u=https%3A%2F%2Fwww.youtube.com%2Fhowyoutubeworks%3Futm_source%3Dythp%26utm_medium%3DLeftNav%26utm_campaign%3Dytgen https://www.youtube.com/new]\nImages: [https://i.ytimg.com/vi/pum3k4yECT4/maxresdefault.jpg https://i.ytimg.com/vi/pum3k4yECT4/maxresdefault.jpg]\n```\n\n### 🔹 Run in Server Mode\n\nStart the server (defaults: `0.0.0.0:5020`):\n\n```bash\ngo run main.go server --addr=0.0.0.0 --port=5020\n```\n\nTest it with `curl` or a browser:\n\n```bash\n# Default (English)\ncurl \"http://localhost:5020/scrape?url=https://www.youtube.com/watch?v=pum3k4yECT4\"\n\n# Force French\ncurl \"http://localhost:5020/scrape?url=https://www.youtube.com/watch?v=pum3k4yECT4\u0026lang=fr\"\n\n# Force German\ncurl \"http://localhost:5020/scrape?url=https://www.youtube.com/watch?v=pum3k4yECT4\u0026lang=de\"\n```\n\n**JSON Response:**\n\n```json\n{\n  \"title\": \"America Is In Trouble.. Candace Owens Might Be Cooked \\u0026 Zuck Got Massively Embarrassed! - YouTube\",\n  \"description\": \"THIS WEEK ON NEWSDADDYYYY!!! 🥤🍿**JIMMY KIMMEL — ABC PULLS THE PLUG**Jimmy Kimmel’s late-night show was pulled from the schedule after his comments about Ch...\",\n  \"links\": [\n    \"https://www.youtube.com/\",\n    \"https://www.youtube.com/\",\n    \"https://www.youtube.com/about/\",\n    \"https://www.youtube.com/about/press/\",\n    \"https://www.youtube.com/about/copyright/\",\n    \"https://www.youtube.com/t/contact_us/\",\n    \"https://www.youtube.com/creators/\",\n    \"https://www.youtube.com/ads/\",\n    \"https://developers.google.com/youtube\",\n    \"https://www.youtube.com/t/terms\",\n    \"https://www.youtube.com/t/privacy\",\n    \"https://www.youtube.com/about/policies/\",\n    \"https://www.youtube.com/howyoutubeworks?utm_campaign=ytgen\\u0026utm_source=ythp\\u0026utm_medium=LeftNav\\u0026utm_content=txt\\u0026u=https%3A%2F%2Fwww.youtube.com%2Fhowyoutubeworks%3Futm_source%3Dythp%26utm_medium%3DLeftNav%26utm_campaign%3Dytgen\",\n    \"https://www.youtube.com/new\"\n  ],\n  \"favicon\": \"https://www.youtube.com/s/desktop/2ea5cbbe/img/favicon_144x144.png\",\n  \"images\": [\n    \"https://i.ytimg.com/vi/pum3k4yECT4/maxresdefault.jpg\",\n    \"https://i.ytimg.com/vi/pum3k4yECT4/maxresdefault.jpg\"\n  ],\n  \"preview_image\": \"https://i.ytimg.com/vi/pum3k4yECT4/maxresdefault.jpg\",\n  \"video\": \"\"\n}\n```\n\n## 🔍 Features\n\n- CLI and API modes.\n- Extracts metadata (title, description, favicon, images, videos).\n- Lightweight crawler for link discovery.\n- JSON API for integration into other services.\n\n## 📖 Example Use Cases\n\n- Preview cards for links in chat apps.\n- SEO or content analysis.\n- Building your own search index.\n- Research \u0026 data mining.\n\n## 🗺️ Roadmap\n\n- [ ] Respect `robots.txt` in the crawler.\n- [ ] Add caching \u0026 rate limiting.\n- [ ] Support deeper recursive crawling.\n- [ ] Extract Open Graph / Twitter Card metadata.\n\n## Contributing\n\nSpindle is an open-source project that welcomes contributions from developers, designers, and users worldwide. We maintain a collaborative and inclusive development environment that values quality, innovation, and community feedback.\n\n### Ways to Contribute\n\n- **Code Contributions**: Bug fixes, feature implementations, and performance improvements\n- **Documentation**: API documentation, user guides, and technical specifications\n- **Testing**: Quality assurance, test case development, and bug reporting\n- **Localization**: Translation support for multiple languages and regions\n- **Community Support**: Helping users in GitHub Discussions and forums\n\n## License\n\nSpindle is released under the MIT License, providing maximum flexibility for both personal and commercial use. This license allows for:\n\n- **Commercial Use**: Deploy Spindle in commercial environments without licensing fees\n- **Modification**: Adapt and customize the codebase for specific requirements\n- **Distribution**: Share original or modified versions, provided the copyright and license notice are retained\n- **Private Use**: Use Spindle internally without disclosure requirements\n\nSee the [LICENSE](./LICENSE) file for complete licensing terms.\n",
"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fimrany%2Fspindle","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fimrany%2Fspindle","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fimrany%2Fspindle/lists"}