Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/azubieta/appimages.scraper

Search for AppImage releases over the web.
https://github.com/azubieta/appimages.scraper

Last synced: 2 months ago
JSON representation

Search for AppImage releases over the web.

Awesome Lists containing this project

README

        

# appimages.scraper
Search for AppImage releases over the web.

### Dependencies
* Python 3.6
* Scrapy

### Run
* Normal run:`scrapy crawl generic.crawler -a project_file=./projects/org.appimage.appimaged.json`
* Output results to json:
`scrapy crawl appimage.github.io -o result.json -t json`

### Input
The scraper should be feed with a `project_file` which will be a json formatted file like the following:

```
{
"urls" : ["https://github.com/AppImage/AppImageKit/releases"]
}
```

**Missing fields?**

Sometimes authors doesnt provide good metadata about their project so we could help them by means of preset values.
Take a look in the following example at the `presets` field and to the `decription` field inside. It will be use
as a fallback value in case that the author forgets to fill that field.

```
{
"urls" : ["https://github.com/AppImage/AppImageKit/releases"]
"presets": {
"id" : "org.appimage.appimaged",
"description" : {"null": "Daemon to monitor AppImage files in the user home dir."}
}
}
```

**Multiple applications release in a single page ?**

No problem use the match field. It expects to be a python regex
that will be used to match the right AppImage download links for the app you are scraping.

```
{
"urls" : ["https://github.com/AppImage/AppImageKit/releases"],
"match" : ".*\/appimagetool.*",
"presets": {
"id" : "org.appimagekit.appimaged",
"description" : {"null": "Daemon to monitor AppImage files in the user home dir."}
}
}
```