Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/busbud/coding-challenge-crawler-b
Last synced: 3 months ago
- Host: GitHub
- URL: https://github.com/busbud/coding-challenge-crawler-b
- Owner: busbud
- Created: 2014-01-13T18:23:20.000Z (almost 11 years ago)
- Default Branch: master
- Last Pushed: 2014-05-01T16:58:37.000Z (over 10 years ago)
- Last Synced: 2024-07-24T22:12:57.495Z (4 months ago)
- Language: Python
- Size: 300 KB
- Stars: 0
- Watchers: 2
- Forks: 6
- Open Issues: 0
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
- awesome-recruitment-tests - Busbud - Write a crawler for BoltBus that can extract a list of stops, a list of routes, and a list of departures. (Python)
README
# Objective
Write a crawler for BoltBus that can extract:
* list of stops
* list of routes
* list of departures

## Step 1: get stops
This process is done once to create a mapping between their stops and ours.
From this [page](http://bit.ly/1mbczAX), get the list of all bus stops, including the address, latitude and longitude of each. The output should be stored in `stops.json` and the schema should be something like this:
```json
[
  {
    "stop_name": "New York W 33rd St & 11-12th Ave (DC,BAL,BOS,PHL)",
    "stop_location": "611 W. 33rd Street, New York, NY, 10001",
    "lat": 40.755661,
    "long": -74.00337
  }
]
```

The step should be invoked like so:
```sh
python run.py --extract stops --output stops.json
```

## Step 2: get routes
This process is done once to get the list of all possible routes with departures from this operator.
From this [page](http://bit.ly/1mbczAX), get the list of all possible routes, i.e., all valid stop pairings. The output should be stored in `routes.json` and the schema should be something like this:
```json
[
  {
    "origin": "New York W 33rd St & 11-12th Ave (DC,BAL,BOS,PHL)",
    "destination": "Boston South Station - Gate 9 NYC-Gate 10 NWK/PHL"
  },
  {
    "origin": "New York 1st Ave Between 38th & 39th (To BOS)",
    "destination": "Boston South Station - Gate 9 NYC-Gate 10 NWK/PHL"
  }
]
```

The step should be invoked like so:
```sh
python run.py --extract routes --output routes.json
```

## Step 3: get departures
This process is done repeatedly to update our database of departures. As input, this function should accept an origin, a destination and a range of dates for which departures will be returned.
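One way to handle the date-range input is to expand it into one search per day. The following is only a minimal sketch under stated assumptions: the helper names `parse_day`, `iter_days` and `build_queries` are hypothetical, dates are assumed to arrive as `YYYY-MM-DD` strings, and the end date is assumed to be inclusive, which the brief does not specify.

```python
from datetime import datetime, timedelta


def parse_day(text):
    """Parse a YYYY-MM-DD string into a date (assumed input format)."""
    return datetime.strptime(text, "%Y-%m-%d").date()


def iter_days(start, end):
    """Yield every date from start to end, both ends included (assumption)."""
    current = start
    while current <= end:
        yield current
        current += timedelta(days=1)


def build_queries(origin, destination, startdate, enddate):
    """Return one (origin, destination, day) tuple per day in the range.

    Hypothetical helper: each tuple stands for one search to run
    against the operator's site.
    """
    start, end = parse_day(startdate), parse_day(enddate)
    if end < start:
        raise ValueError("enddate must not be before startdate")
    return [(origin, destination, day) for day in iter_days(start, end)]
```

Keeping the expansion separate from the crawling itself means a scheduled job could later call `build_queries` with a rolling window of dates without touching the scraping code.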
1. Go to this [page](http://bit.ly/1hXalGT)
1. For each route in the list generated in step 2, get the list of all one-way departures for one passenger.
1. For each departure, extract the following information:
    * departure time
    * arrival time
    * duration
    * adult one-way price

The output should be stored in `departures.json` and the schema should look something like this:
```json
[
  {
    "origin": "New York 1st Ave Between 38th & 39th (To BOS)",
    "destination": "Boston South Station - Gate 9 NYC-Gate 10 NWK/PHL",
    "departure_time": "2014-05-01T18:30:00",
    "arrival_time": "2014-05-01T22:45:00",
    "duration": "4:15",
    "price": 23.00
  }
]
```

The step should be invoked like so:
```sh
python run.py --extract departures --output departures.json --startdate 2013-11-13 --enddate 2013-11-20
```

# Non-functional requirements
* the code should be written in Python and compatible with Python 2.7
* the code should be hosted on GitHub, and the repo should be shared with Busbud
* the code should be written in a way that it can easily be extended to become a scheduled process that updates our database of departures
* any packages required must be installable via `pip install -r requirements.txt`, see [pip](http://www.pip-installer.org/en/latest/)
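
The three invocations above share the same flags, so a single entry point with a dispatch table is one possible layout for `run.py`. This is a sketch rather than a reference solution: the `extract_*` functions are hypothetical stubs that return empty lists where real crawling code would go, and requiring both date flags for `departures` is an assumption. It uses only the standard library (`argparse` and `json`), both available in Python 2.7.

```python
import argparse
import json


def extract_stops(args):
    # Hypothetical stub: real code would crawl the stops page.
    return []


def extract_routes(args):
    # Hypothetical stub: real code would enumerate valid stop pairings.
    return []


def extract_departures(args):
    # Hypothetical stub: real code would query each route for each date.
    return []


# Maps each --extract value to the function that produces its records.
EXTRACTORS = {
    "stops": extract_stops,
    "routes": extract_routes,
    "departures": extract_departures,
}


def build_parser():
    parser = argparse.ArgumentParser(description="BoltBus crawler (sketch)")
    parser.add_argument("--extract", required=True, choices=sorted(EXTRACTORS))
    parser.add_argument("--output", required=True)
    parser.add_argument("--startdate")
    parser.add_argument("--enddate")
    return parser


def run(argv=None):
    """Parse arguments, run the chosen extractor and write its JSON output."""
    parser = build_parser()
    args = parser.parse_args(argv)
    # Assumption: the date flags are mandatory only for departures.
    if args.extract == "departures" and not (args.startdate and args.enddate):
        parser.error("--startdate and --enddate are required for departures")
    records = EXTRACTORS[args.extract](args)
    with open(args.output, "w") as handle:
        json.dump(records, handle, indent=2)
    return records
```

Calling `run()` with no arguments from the script's entry point reads the flags from the command line. Because `run` also accepts an explicit argument list, a scheduler could import it and call it directly, which fits the requirement that the crawler be easy to turn into a scheduled process.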