Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/kevincolemaninc/mm-crawler
Scrapes meetme user profiles
crawler docker fake-data meetme ruby scraper sidekiq
- Host: GitHub
- URL: https://github.com/kevincolemaninc/mm-crawler
- Owner: KevinColemanInc
- Created: 2017-12-26T15:13:05.000Z (about 7 years ago)
- Default Branch: master
- Last Pushed: 2017-12-31T17:30:13.000Z (about 7 years ago)
- Last Synced: 2024-11-08T19:53:05.030Z (2 months ago)
- Topics: crawler, docker, fake-data, meetme, ruby, scraper, sidekiq
- Language: Ruby
- Homepage:
- Size: 4.88 KB
- Stars: 0
- Watchers: 2
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: readme.md
README
# MM Crawler
This project scrapes profiles from meetme.com using Docker. Following the quick start produces a JSON API for accessing all of the collected profiles, which can be hosted on Netlify.
Scrapes profiles between ages 19 and 35.
## Quick start
### Required ENV variables
- MM_EMAIL=
- MM_PASSWORD=
### Steps
Before starting, check the locations set in `app.rb`.
1. `docker-compose up --build`
1. SSH into the Docker host
1. run `ruby app.rb`
1. wait for the Sidekiq jobs to finish (takes about 1-2 hours)
1. run `ruby consolidate.rb` to create `/results/profiles.json`, which contains all crawled profile info and relative photo paths
### Netlify deploy
In `~/Desktop/results`, run `netlify deploy`. Make a note of the domain it reports.
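
Once the site is deployed, the relative photo paths in `profiles.json` can be resolved against the domain noted above. The following is a minimal sketch rather than part of the project: it assumes `profiles.json` is an array of objects, each with a `photos` array of relative paths (a hypothetical field name), and it uses `https://example.netlify.app/` as a placeholder domain.

```ruby
require "json"
require "uri"

# Resolve each profile's relative photo paths against the deployed site.
# Assumption: every profile is a Hash with a "photos" array of relative
# paths. The real field names in profiles.json may differ.
def absolute_photo_urls(profiles_json, base_url)
  JSON.parse(profiles_json).map do |profile|
    photos = profile.fetch("photos", [])
    profile.merge("photos" => photos.map { |path| URI.join(base_url, path).to_s })
  end
end

# Placeholder data and domain, for illustration only.
sample = '[{"member_id": 1, "photos": ["1_10.jpg"]}]'
puts absolute_photo_urls(sample, "https://example.netlify.app/")
```

In practice the JSON string would come from fetching `profiles.json` from the deployed domain instead of the inline sample.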
## Workers
1. `nearby_crawler` - fetches nearby profiles and stores them, then queues a job to fetch each profile's photo JSON
1. `get_photo_jsons(member_id)` - returns the JSON describing a member's photos
1. `get_photo(url)` - persists the photo at the given URL

Output folder structure (in Docker):
`/results/`
- {member_id}.json
- {member_id}_{photo_id}.jpg
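
Given this layout, the consolidation step from the quick start could look roughly like the sketch below. It is not the project's actual `consolidate.rb`: the `consolidate` method, the `photos` key, and the array-shaped output are assumptions made for illustration, and each `{member_id}.json` is assumed to hold a single JSON object.

```ruby
require "json"

# Merge every {member_id}.json in results_dir into one profiles.json and
# attach the relative paths of that member's {member_id}_{photo_id}.jpg files.
# Assumption: the "photos" key and output shape are illustrative only.
def consolidate(results_dir)
  profile_files = Dir.glob(File.join(results_dir, "*.json"))
                     .reject { |file| File.basename(file) == "profiles.json" }
                     .sort

  profiles = profile_files.map do |file|
    member_id = File.basename(file, ".json")
    # The underscore in the pattern keeps member 1 from matching 11_5.jpg.
    photos = Dir.glob(File.join(results_dir, "#{member_id}_*.jpg"))
                .map { |path| File.basename(path) }
                .sort
    JSON.parse(File.read(file)).merge("photos" => photos)
  end

  File.write(File.join(results_dir, "profiles.json"), JSON.pretty_generate(profiles))
  profiles
end
```

Calling `consolidate("/results")` inside the container would write `/results/profiles.json`. Storing bare file names keeps the photo paths relative, so they stay valid wherever the folder is hosted.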