https://github.com/italia/publiccode-crawler
publiccode.yml crawler for the Open Source software catalog of Developers Italia
crawler developers-italia hacktoberfest publiccode publiccodeyml
- Host: GitHub
- URL: https://github.com/italia/publiccode-crawler
- Owner: italia
- License: agpl-3.0
- Created: 2018-03-21T09:27:28.000Z (about 8 years ago)
- Default Branch: main
- Last Pushed: 2024-07-05T09:48:06.000Z (almost 2 years ago)
- Last Synced: 2024-08-03T02:05:11.458Z (almost 2 years ago)
- Topics: crawler, developers-italia, hacktoberfest, publiccode, publiccodeyml
- Language: Go
- Homepage:
- Size: 15.7 MB
- Stars: 28
- Watchers: 13
- Forks: 52
- Open Issues: 16
Metadata Files:
- Readme: README.md
- License: LICENSE
- Authors: AUTHORS
# publiccode.yml crawler for the software catalog of Developers Italia
## Description
Developers Italia provides [a catalog of Free and Open Source](https://developers.italia.it/en/search)
software aimed at Public Administrations.
`publiccode-crawler` retrieves the `publiccode.yml` files from the
repositories of publishers found in the [Developers Italia API](https://github.com/italia/developers-italia-api).
## Setup and deployment processes
`publiccode-crawler` can either be run manually on the target machine or be deployed
as a Docker container.
### Manually configure and build
1. Rename `config.toml.example` to `config.toml` and set the variables.
   > **NOTE**: The application also accepts environment variables in place of
   > the `config.toml` file. Environment variables take priority over the
   > values in the configuration file.
2. Build the binary with `go build`
### Docker
You can build the Docker image using:
```console
docker build .
```
or use the image published to DockerHub:
```console
docker run -it italia/publiccode-crawler
```
## Commands
### `publiccode-crawler crawl`
Gets the list of publishers from `https://api.developers.italia.it/v1/publishers`
and crawls their repositories.
### `publiccode-crawler crawl publishers*.yml`
Reads the list of publishers from the files matching `publishers*.yml` and
crawls their repositories.
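
Because `publishers*.yml` is a glob pattern, the command can receive more than one file, assuming the shell expands the pattern before the crawler starts. The sketch below shows which names such a pattern selects, using Go's `filepath.Match`. The helper `matchingFiles` and the file names are hypothetical and are not taken from the repository.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// matchingFiles returns the names that match the glob pattern, mimicking
// the argument list a shell would produce after expanding the pattern.
func matchingFiles(pattern string, names []string) ([]string, error) {
	var matched []string
	for _, name := range names {
		ok, err := filepath.Match(pattern, name)
		if err != nil {
			return nil, err // the pattern itself is malformed
		}
		if ok {
			matched = append(matched, name)
		}
	}
	return matched, nil
}

func main() {
	// Hypothetical directory listing, for illustration only.
	names := []string{"publishers.yml", "publishers.extra.yml", "config.toml"}

	files, err := matchingFiles("publishers*.yml", names)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("files passed to crawl:", files)
}
```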
### `publiccode-crawler crawl-software <software-url> <publisher-id>`
Crawls only the software specified as a parameter.
It takes the software URL and its publisher ID as parameters.
Example: `publiccode-crawler crawl-software https://api.developers.italia.it/v1/software/a2ea59b0-87cd-4419-b93f-00bed8a7b859 edb66b3d-3e36-4b69-aba9-b7c4661b3fdd`
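
The two positional parameters could be validated along the following lines. This is a minimal sketch, not the crawler's actual argument handling: `parseCrawlSoftwareArgs` is a hypothetical helper, and the checks (an absolute http(s) URL and a non-empty publisher ID) are assumptions about what a reasonable validation would look like.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
)

// parseCrawlSoftwareArgs checks the two positional arguments:
// the software URL first, then the publisher ID.
func parseCrawlSoftwareArgs(args []string) (*url.URL, string, error) {
	if len(args) != 2 {
		return nil, "", errors.New("expected two arguments: software URL and publisher ID")
	}
	u, err := url.Parse(args[0])
	if err != nil {
		return nil, "", fmt.Errorf("invalid software URL: %w", err)
	}
	// Assumption: only absolute http(s) URLs are meaningful here.
	if (u.Scheme != "http" && u.Scheme != "https") || u.Host == "" {
		return nil, "", fmt.Errorf("software URL must be an absolute http(s) URL: %q", args[0])
	}
	if args[1] == "" {
		return nil, "", errors.New("publisher ID must not be empty")
	}
	return u, args[1], nil
}

func main() {
	// Arguments taken from the example invocation above.
	args := []string{
		"https://api.developers.italia.it/v1/software/a2ea59b0-87cd-4419-b93f-00bed8a7b859",
		"edb66b3d-3e36-4b69-aba9-b7c4661b3fdd",
	}
	u, publisherID, err := parseCrawlSoftwareArgs(args)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("software host:", u.Host)
	fmt.Println("publisher ID:", publisherID)
}
```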
### Other commands
* `publiccode-crawler download-publishers` downloads organizations and repositories from
the [onboarding portal repository](https://github.com/italia/developers-italia-onboarding)
and saves them to a publishers YAML file.
## See also
* [developers-italia-api](https://github.com/italia/developers-italia-api): the API
used to store the results of the crawling
* [publiccode-parser-go](https://github.com/italia/publiccode-parser-go): the Go
package for parsing publiccode.yml files
## Authors
[Developers Italia](https://developers.italia.it) is a project by
[AgID](https://www.agid.gov.it/) and the
[Italian Digital Team](https://teamdigitale.governo.it/), which developed the
crawler and maintains this repository.