https://github.com/owainlewis/falkor

Open Source web scraping API. Falkor turns web pages into queryable JSON

Host: GitHub
URL: https://github.com/owainlewis/falkor
Owner: owainlewis
License: epl-1.0
Created: 2015-06-13T18:27:42.000Z (almost 11 years ago)
Default Branch: master
Last Pushed: 2016-02-12T20:40:43.000Z (over 10 years ago)
Last Synced: 2025-03-27T23:33:05.774Z (about 1 year ago)
Topics: webscraping, webscrapper
Language: Clojure
Homepage:
Size: 21.5 KB
Stars: 188
Watchers: 11
Forks: 7
Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE

# Falkor

A web service for turning HTML pages into traversable JSON documents

Falkor is at a very early stage of development. If you have a feature request, please open an issue on the project.

## Getting started

To run the server locally:

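```
# Build a standalone jar with Leiningen
lein uberjar
# Build the Docker image, then run it with port 5000 published to the host
docker build -t falkor .
docker run -p 5000:5000 -t falkor

# Visit http://localhost:5000
```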

## Coming soon

+ Better error handling
+ CORS
+ Query filtering (return only certain attributes)
+ Fetching multiple elements in a single request (e.g. `[h1 > a, .subtitle]`)

## Usage

Get all the title links from the Reddit.com home page

https://falkor-api.herokuapp.com/api/query?url=http://reddit.com&query=a.title

Grab all the news stories from Digg.com

https://falkor-api.herokuapp.com/api/query?url=http://digg.com&query=.story-title%20a

Extract all the images from Digg.com

https://falkor-api.herokuapp.com/api/query?url=http://digg.com&query=img[src]
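The query endpoint takes two parameters: `url` (the page to scrape) and `query` (a CSS selector). The selector has to be URL-encoded, which is why the space in `.story-title a` appears as `%20`. Below is a minimal Python sketch that builds such a URL and fetches the result. It assumes a local instance at `http://localhost:5000` (as in Getting started) and a JSON response body whose exact shape is not documented here. The helpers `build_query_url` and `fetch` are hypothetical and not part of Falkor.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen

# Assumption: a Falkor instance running locally, as in "Getting started".
BASE = "http://localhost:5000"


def build_query_url(base: str, page_url: str, selector: str) -> str:
    """Build an /api/query URL with both parameters percent-encoded."""
    # quote (not quote_plus) turns spaces into %20; safe=":/" keeps the
    # target URL readable while '&', '?', '[' and ']' are still encoded.
    params = urlencode({"url": page_url, "query": selector},
                       safe=":/", quote_via=quote)
    return f"{base}/api/query?{params}"


def fetch(base: str, page_url: str, selector: str):
    """Fetch the query URL and decode the JSON body (shape not assumed)."""
    with urlopen(build_query_url(base, page_url, selector), timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Prints the same form as the Digg example above, pointed at localhost.
    print(build_query_url(BASE, "http://digg.com", ".story-title a"))
```

Encoding the target URL matters when it has its own query string. A literal `&` inside it must become `%26`, or the server would read the remainder as a separate parameter.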

## TODO

Add filters to remove some of the attribute cruft.

For example, to extract only an element's text and ignore its other attributes:

```
&filter=[text]
```
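Until such a filter exists, a similar effect can be approximated on the client. The following is a minimal Python sketch of the proposed `filter` semantics. It assumes, without confirmation from this README, that each matched element is returned as a JSON object whose keys include `text` and the element's attributes. The names `parse_filter` and `filter_fields` are hypothetical.

```python
from typing import Any, Iterable


def parse_filter(param: str) -> list[str]:
    """Parse a filter value such as "[text]" or "[text, href]" into field names."""
    inner = param.strip()
    if inner.startswith("[") and inner.endswith("]"):
        inner = inner[1:-1]
    # Drop empty entries so "[]" or a trailing comma yields no bogus names.
    return [name.strip() for name in inner.split(",") if name.strip()]


def filter_fields(elements: Iterable[dict[str, Any]],
                  keep: Iterable[str]) -> list[dict[str, Any]]:
    """Keep only the requested keys of each element object (assumed shape)."""
    wanted = set(keep)
    return [{k: v for k, v in el.items() if k in wanted} for el in elements]


if __name__ == "__main__":
    # Hypothetical element object; real Falkor output may differ.
    sample = [{"text": "Hello", "href": "/a", "class": "title"}]
    print(filter_fields(sample, parse_filter("[text]")))
```

Doing this server-side, as the TODO proposes, would also shrink the response payload, which a client-side filter cannot do.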

## License

Distributed under the Eclipse Public License either version 1.0 or (at
your option) any later version.
