https://github.com/gschier/atp-crawler
Simple web crawler to fetch all show titles of ATP podcasts (atp.fm)
- Host: GitHub
- URL: https://github.com/gschier/atp-crawler
- Owner: gschier
- Created: 2014-06-11T07:24:03.000Z (over 10 years ago)
- Default Branch: master
- Last Pushed: 2014-06-11T07:43:53.000Z (over 10 years ago)
- Last Synced: 2024-11-09T20:38:15.886Z (2 months ago)
- Language: JavaScript
- Size: 152 KB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
ATP.fm Web Crawler
==================

Simple web crawler to fetch all show titles of ATP podcasts (atp.fm)

50 lines of code on top of ~165,000 lines of dependencies (for Casey)
## What does it use?
- [request](https://github.com/mikeal/request) (HTTP client)
- [domino](https://github.com/fgnass/domino) (server-side DOM)
- [zepto-node](https://github.com/fgnass/zepto-node) (jQuery-like library)

## How does it work?
1. fetch base URL of atp.fm
2. download HTML
3. build DOM object with Domino
4. select show titles with Zepto
5. select next page link with Zepto
6. repeat with URL of next page

## Usage
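The six steps listed under "How does it work?" amount to a loop that follows "next page" links until none is left. Below is a minimal sketch of that loop, not the project's actual source. The names `crawl`, `fetchHtml` and `parsePage` are hypothetical, and the markup that `parsePage` looks for (`<h2 class="title">` for titles, `<a class="next">` for the pagination link) is an assumption for illustration, not the real structure of atp.fm. A regular expression stands in for the Domino/Zepto DOM queries so the sketch runs without dependencies.

```javascript
// Hypothetical sketch of the crawl loop; not the project's actual code.
// fetchHtml(url, callback) stands in for the HTTP client (request in the
// real project); parsePage(html) stands in for the Domino + Zepto queries.
function crawl(startUrl, fetchHtml, parsePage, done) {
  const titles = [];
  const seen = new Set(); // guards against pagination that loops back

  function visit(url) {
    // Stop when there is no next page or we have already been here.
    if (!url || seen.has(url)) return done(null, titles);
    seen.add(url);

    fetchHtml(url, (err, html) => {              // steps 1-2: fetch + download
      if (err) return done(err);
      const { pageTitles, nextHref } = parsePage(html); // steps 3-5
      titles.push(...pageTitles);
      // Step 6: resolve a possibly relative link against the current page.
      visit(nextHref ? new URL(nextHref, url).href : null);
    });
  }

  visit(startUrl);
}

// Stand-in parser for ASSUMED markup (class names are illustrative only).
function parsePage(html) {
  const pageTitles = [...html.matchAll(/<h2 class="title">([^<]*)<\/h2>/g)]
    .map((match) => match[1].trim());
  const next = html.match(/<a class="next" href="([^"]*)"/);
  return { pageTitles, nextHref: next ? next[1] : null };
}
```

Passing the fetcher and parser in as arguments keeps the pagination logic separate from the network and the DOM library, so the loop can be exercised against canned HTML. In the project itself those two roles are filled by request and by Domino with Zepto.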
```shell
$ npm install
$ npm start
```

## Todo
* make it fetch more metadata
## Author
[@GregorySchier](http://twitter.com/gregoryschier) - [schier.co](http://schier.co)