https://github.com/gschier/atp-crawler

Simple web crawler to fetch all show titles of ATP podcasts (atp.fm)

Host: GitHub
URL: https://github.com/gschier/atp-crawler
Owner: gschier
Created: 2014-06-11T07:24:03.000Z (over 10 years ago)
Default Branch: master
Last Pushed: 2014-06-11T07:43:53.000Z (over 10 years ago)
Last Synced: 2024-11-09T20:38:15.886Z (2 months ago)
Language: JavaScript
Size: 152 KB
Stars: 0
Watchers: 2
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

README

ATP.fm Web Crawler
==================

Simple web crawler to fetch all show titles of ATP podcasts (atp.fm)

50 lines of code on top of ~165,000 lines of dependencies (for Casey)

## What does it use?

- [request](https://github.com/mikeal/request) (HTTP client)

- [domino](https://github.com/fgnass/domino) (server-side DOM)

- [zepto-node](https://github.com/fgnass/zepto-node) (jQuery-like library)

## How does it work?

1. fetch base URL of atp.fm

2. download HTML

3. build DOM object with Domino

4. select show titles with Zepto

5. select next page link with Zepto

6. repeat with URL of next page
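
The steps above can be sketched as a small loop. This is a minimal, dependency-free illustration rather than the project's actual code: `crawl`, `fetchHtml`, `extract`, `extractWithRegex`, `fakePages`, and `fetchFake` are hypothetical names, the markup (`h2.title`, `a.next`) and the `example.test` URLs are assumptions, and a regular expression stands in for the Domino and Zepto steps so the sketch runs without installing anything.

```javascript
// Sketch of the crawl loop: fetch a page, extract titles and the next-page
// link, then repeat until no next link is found.
// `fetchHtml` and `extract` are hypothetical injection points, not names
// taken from the project.
async function crawl(startUrl, fetchHtml, extract, maxPages = 100) {
  const titles = [];
  const seen = new Set(); // guards against pagination loops
  let url = startUrl;

  while (url && !seen.has(url) && seen.size < maxPages) {
    seen.add(url);
    const html = await fetchHtml(url); // steps 1-2: fetch URL, download HTML
    const page = extract(html, url);   // steps 3-5: parse, select titles and next link
    titles.push(...page.titles);
    url = page.nextUrl;                // step 6: repeat with the next page
  }
  return titles;
}

// Stand-in extractor using regular expressions instead of a DOM.
// The markup it looks for (h2.title, a.next) is assumed for illustration.
function extractWithRegex(html, baseUrl) {
  const titles = [...html.matchAll(/<h2 class="title">([^<]*)<\/h2>/g)]
    .map((m) => m[1].trim());
  const next = html.match(/<a class="next" href="([^"]+)"/);
  return {
    titles,
    // Resolve relative links against the page they were found on.
    nextUrl: next ? new URL(next[1], baseUrl).href : null,
  };
}

// In-memory pages standing in for the real site.
const fakePages = {
  'http://example.test/':
    '<h2 class="title">Episode B</h2><a class="next" href="/page/2">Next</a>',
  'http://example.test/page/2': '<h2 class="title">Episode A</h2>',
};
const fetchFake = async (url) => fakePages[url];

crawl('http://example.test/', fetchFake, extractWithRegex)
  .then((titles) => console.log(titles));
```

Passing the fetch and extract steps in as functions keeps the loop itself independent of any HTTP or DOM library, so a real HTTP client and DOM-based selector could be swapped in without changing `crawl`.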

## Usage

```shell
$ npm install
$ npm start
```

## Todo

* make it fetch more metadata

## Author

[@GregorySchier](http://twitter.com/gregoryschier) - [schier.co](http://schier.co)