{"id":15986058,"url":"https://github.com/coderhs/leo","last_synced_at":"2025-04-04T21:26:11.947Z","repository":{"id":145141347,"uuid":"67584303","full_name":"coderhs/leo","owner":"coderhs","description":"Web Scrapper API","archived":false,"fork":false,"pushed_at":"2016-09-07T08:32:45.000Z","size":29,"stargazers_count":0,"open_issues_count":0,"forks_count":1,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-02-10T06:13:06.249Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Ruby","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/coderhs.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2016-09-07T07:46:36.000Z","updated_at":"2016-09-07T07:47:10.000Z","dependencies_parsed_at":null,"dependency_job_id":"26f3e0e4-f712-43a9-8045-6357a55a3873","html_url":"https://github.com/coderhs/leo","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/coderhs%2Fleo","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/coderhs%2Fleo/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/coderhs%2Fleo/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/coderhs%2Fleo/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/coderhs","download_url":"https://codeload.github.com/coderhs/leo/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count"
:247251003,"owners_count":20908442,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-10-08T02:42:23.071Z","updated_at":"2025-04-04T21:26:11.927Z","avatar_url":"https://github.com/coderhs.png","language":"Ruby","funding_links":[],"categories":[],"sub_categories":[],"readme":"# README\n\nThis is an API-only Ruby on Rails application that scrapes the h1, h2, and h3 tags and the links present at a URL.\n\n## API End Points\n\n```\nPOST '/v1/websites', params: domain (submit with http/https)\nGET  '/v1/websites/:key', where key identifies the scraping job\nGET  '/v1/websites', to display all the websites scraped so far\n```\n\nTo make the application respond faster, I use background jobs to do the scraping. When a user\nsubmits a domain, a job is created. Once the job has been created, the result URL is sent as the response. 
The user can also check the result URL for the job status.\n\n\n## Example\n\n**Submit a domain**\n\n```sh\n# command\ncurl -X POST \"http://localhost:3000/v1/websites?domain=https://simple.wikipedia.org/wiki/Wikipedia\"\n```\n\n```json\n{\"result\":{\"domain\":\"https://simple.wikipedia.org/wiki/Wikipedia\",\"status\":\"PENDING\",\"result_url\":\"http://localhost:3000/v1/websites/389b76561f52f5f0337742b68354c106\"}}\n```\n\n**Fetch Result**\n\n```sh\n# command\ncurl http://localhost:3000/v1/websites/389b76561f52f5f0337742b68354c106\n```\n\nResult:\nhttps://gist.github.com/coderhs/9d84b96875fa996a7a80195cbe96425f\n\n**Display all websites**\n\n```sh\ncurl http://localhost:3000/v1/websites\n```\n\n```json\n[\n  {\n    \"domain\": \"http://csnipp.com\",\n    \"status\": \"COMPLETED\",\n    \"result_url\": \"http://localhost:3000/v1/websites/3bae0b276b4c475c1e6bd43f2266b80e\"\n  },\n  {\n    \"domain\": \"https://redpanthers.co\",\n    \"status\": \"COMPLETED\",\n    \"result_url\": \"http://localhost:3000/v1/websites/e14dd438487e385054747f1091e86a2e\"\n  },\n  {\n    \"domain\": \"https://simple.wikipedia.org/wiki/Wikipedia\",\n    \"status\": \"COMPLETED\",\n    \"result_url\": \"http://localhost:3000/v1/websites/389b76561f52f5f0337742b68354c106\"\n  }\n]\n```\n\n\n## TODO\n\nImplement a priority queue: at present all scraping goes through a single queue, which does not hold up well when many users are using the service. We need a priority queue system that lets people submit to a separate queue when they need a result quickly.\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcoderhs%2Fleo","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcoderhs%2Fleo","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcoderhs%2Fleo/lists"}