https://github.com/azubieta/appimages.scraper
Search for AppImage releases over the web.
- Host: GitHub
- URL: https://github.com/azubieta/appimages.scraper
- Owner: azubieta
- License: gpl-3.0
- Created: 2018-05-02T17:51:38.000Z (over 6 years ago)
- Default Branch: master
- Last Pushed: 2018-10-25T10:42:12.000Z (about 6 years ago)
- Last Synced: 2024-11-01T03:32:24.410Z (2 months ago)
- Language: Python
- Size: 23.2 MB
- Stars: 11
- Watchers: 4
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- awesome-appimage - appimages.scraper - Search for AppImage releases over the web. (AppImage discovery / App scrapers)
README
# appimages.scraper
Search for AppImage releases over the web.

### Dependencies
* Python 3.6
* Scrapy

### Run
* Normal run: `scrapy crawl generic.crawler -a project_file=./projects/org.appimage.appimaged.json`
* Output results to JSON: `scrapy crawl appimage.github.io -o result.json -t json`

### Input
The scraper must be fed a `project_file`, which is a JSON-formatted file like the following:

```
{
"urls" : ["https://github.com/AppImage/AppImageKit/releases"]
}
```

**Missing fields?**
Sometimes authors don't provide complete metadata about their projects, so we can help them by supplying preset values.
In the following example, look at the `presets` field and at the `description` field inside it. That value is used
as a fallback in case the author forgets to fill in the field.

```
{
"urls" : ["https://github.com/AppImage/AppImageKit/releases"],
"presets": {
"id" : "org.appimage.appimaged",
"description" : {"null": "Daemon to monitor AppImage files in the user home dir."}
}
}
```

**Multiple application releases on a single page?**
No problem: use the `match` field. It expects a Python regular expression
that is used to select the right AppImage download links for the app you are scraping.

```
{
"urls" : ["https://github.com/AppImage/AppImageKit/releases"],
"match" : ".*\/appimagetool.*",
"presets": {
"id" : "org.appimagekit.appimaged",
"description" : {"null": "Daemon to monitor AppImage files in the user home dir."}
}
}
```
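To illustrate how the `urls`, `match`, and `presets` fields fit together, here is a small, self-contained Python sketch. It is not the scraper's actual code: `load_project`, `select_appimage_links`, and `apply_presets` are hypothetical helpers. It assumes `match` is applied with `re.match` to each candidate download URL and that a preset is used whenever the scraped value is missing or empty. The `{"null": ...}` description is copied through unchanged, without interpreting the `null` key. The sample project file is the last example above, kept in a raw string so JSON's `\/` escape reaches the parser intact, and the `example.com` links are made up.

```python
import json
import re
from typing import Any, Dict, List, Optional

# Last project-file example from this README (raw string keeps "\/" for JSON).
PROJECT = r'''
{
  "urls" : ["https://github.com/AppImage/AppImageKit/releases"],
  "match" : ".*\/appimagetool.*",
  "presets": {
    "id" : "org.appimagekit.appimaged",
    "description" : {"null": "Daemon to monitor AppImage files in the user home dir."}
  }
}
'''


def load_project(text: str) -> Dict[str, Any]:
    """Hypothetical helper: parse project-file JSON, require a non-empty "urls" list."""
    project = json.loads(text)
    urls = project.get("urls")
    if not isinstance(urls, list) or not urls:
        raise ValueError('project file needs a non-empty "urls" list')
    return project


def select_appimage_links(project: Dict[str, Any], links: List[str]) -> List[str]:
    """Hypothetical helper: keep only links accepted by the optional "match" regex."""
    pattern: Optional[str] = project.get("match")
    if pattern is None:
        return list(links)  # no filter configured: keep every link
    regex = re.compile(pattern)
    # Assumption: re.match, which anchors at the start of each link.
    return [link for link in links if regex.match(link)]


def apply_presets(project: Dict[str, Any], scraped: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: copy `scraped`, filling missing/falsy fields from "presets"."""
    result = dict(scraped)
    for key, value in project.get("presets", {}).items():
        if not result.get(key):  # missing, None, "" or another falsy value
            result[key] = value
    return result


if __name__ == "__main__":
    project = load_project(PROJECT)
    links = [  # made-up example URLs
        "https://example.com/download/appimagetool-x86_64.AppImage",
        "https://example.com/download/appimaged-x86_64.AppImage",
    ]
    print(select_appimage_links(project, links))
    # → ['https://example.com/download/appimagetool-x86_64.AppImage']
    print(apply_presets(project, {"name": "appimagetool", "description": None}))
```

Because `re.match` only anchors at the start of the string, a pattern such as `.*\/appimagetool.*` needs its leading `.*` to accept a file name that appears in the middle of a URL; `re.search` would make that prefix unnecessary.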