https://github.com/owainlewis/falkor

Open Source web scraping API. Falkor turns web pages into queryable JSON

# Falkor

A web service for turning HTML pages into traversable JSON documents

Falkor is at a very early stage of development. If you have a feature request, please create an issue on the project.

## Getting started

To run the server locally, build the jar and the Docker image, then start the container:

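```
# Build the standalone jar, then the Docker image
lein uberjar
docker build -t falkor .

# Publish port 5000 so the server is reachable from the host
# (assumes the container listens on 5000, as the URL below implies)
docker run -t -p 5000:5000 falkor

# Visit http://localhost:5000
```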

## Coming soon

+ Better error handling
+ CORS
+ Query filtering (return only certain attributes)
+ Fetching multiple elements in a single request (e.g. `[h1 > a, .subtitle]`)

## Usage

Get all the title links from the Reddit.com home page:

https://falkor-api.herokuapp.com/api/query?url=http://reddit.com&query=a.title

Grab all the news stories from Digg.com:

https://falkor-api.herokuapp.com/api/query?url=http://digg.com&query=.story-title%20a

Extract all the images from Digg.com:

https://falkor-api.herokuapp.com/api/query?url=http://digg.com&query=img[src]
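Selectors that contain spaces or brackets need percent-encoding before they go into the query string. The following is a minimal Python sketch of building such URLs. The endpoint and the `url` and `query` parameter names come from the examples above; the function name `build_query_url` is hypothetical, and the sketch assumes the service decodes percent-encoded parameters in the usual way.

```python
from urllib.parse import quote, urlencode

# Endpoint taken from the usage examples above.
FALKOR_ENDPOINT = "https://falkor-api.herokuapp.com/api/query"


def build_query_url(page_url: str, selector: str, endpoint: str = FALKOR_ENDPOINT) -> str:
    """Return a query URL for the given page and CSS selector.

    Both values are percent-encoded, so spaces become %20 and
    brackets become %5B / %5D.
    """
    params = {"url": page_url, "query": selector}
    # quote_via=quote encodes spaces as %20 rather than "+".
    return endpoint + "?" + urlencode(params, quote_via=quote)


for page, selector in [
    ("http://reddit.com", "a.title"),
    ("http://digg.com", ".story-title a"),
    ("http://digg.com", "img[src]"),
]:
    print(build_query_url(page, selector))
```

Passing a different `endpoint` (for example a locally running instance) reuses the same encoding logic.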

## TODO

Add filters to strip unneeded attributes from the results.

For example, to extract only the text of an element and ignore its other attributes:

```
&filter=[text]
```
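Until a server-side filter exists, the same effect can be approximated on the client. This is only a sketch: it assumes the response can be treated as a list of dictionaries, one per matched element, and every key in the sample data other than `text` is hypothetical. The helper name `filter_attributes` is also an illustration, not part of Falkor.

```python
from typing import Any, Dict, Iterable, List


def filter_attributes(
    elements: Iterable[Dict[str, Any]], keep: Iterable[str]
) -> List[Dict[str, Any]]:
    """Return copies of the elements containing only the keys in `keep`."""
    wanted = set(keep)
    return [
        {key: value for key, value in element.items() if key in wanted}
        for element in elements
    ]


# Hypothetical sample data; the real response shape may differ.
sample = [
    {"text": "First story", "href": "/a", "class": "title"},
    {"text": "Second story", "href": "/b", "class": "title"},
]

print(filter_attributes(sample, ["text"]))
```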

## License

Copyright © 2015 Forward Digital Limited

Distributed under the Eclipse Public License either version 1.0 or (at
your option) any later version.