{"id":24960993,"url":"https://github.com/nullnull/scraping_sample","last_synced_at":"2026-04-13T00:21:14.880Z","repository":{"id":39874243,"uuid":"148907001","full_name":"nullnull/scraping_sample","owner":"nullnull","description":"An easy and powerful template with the minimum you need to start web scraping with Ruby + Selenium + Docker + Google Kubernetes Engine","archived":false,"fork":false,"pushed_at":"2023-04-12T05:20:42.000Z","size":19,"stargazers_count":1,"open_issues_count":3,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-07-24T21:05:07.080Z","etag":null,"topics":["docker","gke","kubernetes","ruby","selenium"],"latest_commit_sha":null,"homepage":"https://qiita.com/nullnull/items/61dae392f853f260cfb0","language":"Ruby","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/nullnull.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2018-09-15T14:02:27.000Z","updated_at":"2020-05-17T14:30:01.000Z","dependencies_parsed_at":"2023-02-16T13:16:07.359Z","dependency_job_id":null,"html_url":"https://github.com/nullnull/scraping_sample","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/nullnull/scraping_sample","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nullnull%2Fscraping_sample","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nullnull%2Fscraping_sample/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nullnull%2Fscraping_sample/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nullnull%2Fscraping_sample/ma
nifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/nullnull","download_url":"https://codeload.github.com/nullnull/scraping_sample/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nullnull%2Fscraping_sample/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":267150480,"owners_count":24043473,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-07-26T02:00:08.937Z","response_time":62,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["docker","gke","kubernetes","ruby","selenium"],"created_at":"2025-02-03T08:45:13.946Z","updated_at":"2026-04-13T00:21:14.843Z","avatar_url":"https://github.com/nullnull.png","language":"Ruby","readme":"# Scraping Sample\nAn easy and powerful template with the minimum you need to start web scraping with Ruby + Selenium + Docker + Google Kubernetes Engine\n\n## Setup for local development on Docker\n```sh\ngit clone git@github.com:nullnull/scraping_sample.git\ncd scraping_sample\ndocker-compose build\ndocker-compose up -d\ndocker-compose exec scraper sh setup.sh\ndocker-compose exec scraper bundle exec ruby app/fetch_search_results.rb\n```\n\n## Monitor scraping progress with VNC\nYou can use a [VNC server](https://qiita.com/yszk0123/items/840f16ed388fb52b0e21) to monitor Selenium. 
Run `open vnc://localhost:5900/` and enter `secret` as the password.\n\n## Deploy and Run Scraper\n```sh\n$ sh cronjob.sh\n\n# to check progress with VNC\n$ kubectl get pods\n$ kubectl port-forward pod/\u003cpod-id\u003e 5900:5900\n$ open vnc://localhost:5900/\n```\n\n## Slack Integration (Optional)\nSet `SLACK_WEBHOOK_URL` to your webhook URL in `docker-compose.yml` and in `kube/cronjob.yml` / `kube/deploy.yml`.\n\n## Data Visualization (Optional)\nWe recommend [Redash](https://redash.io/) to visualize your scraping results. It's easy to set up and has powerful visualization features.\n\n##### Run Redash on GCE\nhttps://redash.io/help/open-source/setup\n\n```sh\n$ gcloud compute images create \"redash-2-0-0\" --source-uri gs://redash-images/redash.2.0.0.b2990.tar.gz\n$ gcloud compute instances create redash \\\n--image redash-2-0-0 --scopes storage-ro,bigquery \\\n--machine-type g1-small --zone asia-east1-a\n# Then finish the configuration in your GCE console.\n```\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnullnull%2Fscraping_sample","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fnullnull%2Fscraping_sample","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnullnull%2Fscraping_sample/lists"}