https://github.com/meedan/alegre

A text and media analysis service for Meedan Check, a collaborative media annotation platform

alegre
------

A media similarity analysis service. Part of the [Check platform](https://meedan.com/check). Refer to the [main repository](https://github.com/meedan/check) for quick start instructions.

There is also an [overview of the similarity infrastructure](doc/meedan_similarity_infra_overview.md) and a more [detailed explanation of the process for each media type](doc/similarity-media-type-detail.md).

## Development

- Update your [virtual memory settings](https://www.elastic.co/guide/en/elasticsearch/reference/current/docker.html), e.g. by setting `vm.max_map_count=262144` in `/etc/sysctl.conf`. This can also be done in the Docker UI by adjusting the Resource settings to 12GB of memory and 128GB of disk.
- Ensure that the services needed are uncommented in the `docker-compose.yml` file. Specifically, to run the default tests the `xlm_r_bert_base_nli_stsb_mean_tokens`, `indian_sbert`, `video` and `audio` definitions are needed.
- `docker-compose build`
- `docker-compose up --abort-on-container-exit`
- Open http://localhost:3100 for the Alegre API

The Alegre API Swagger UI unfortunately [does not support sending body payloads to GET methods](https://github.com/swagger-api/swagger-ui/issues/2136). To test those API methods, you can still fill in your arguments and click "Execute". Swagger will fail, but it will show you a `curl` command that you can run in your console.
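As an alternative to copying the `curl` command, the same kind of request can be sent from Python. The following is a minimal sketch using only the standard library. The path and payload in the usage note are hypothetical placeholders, not documented Alegre endpoints; take the real values from the `curl` command that Swagger prints.

```python
import json
import urllib.request

ALEGRE_URL = "http://localhost:3100"  # local API address from the steps above


def build_get_with_body(path, payload):
    """Build a GET request that carries a JSON body."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        ALEGRE_URL + path,
        data=body,
        headers={"Content-Type": "application/json"},
        method="GET",  # without this, urllib defaults to POST when data is set
    )


def send_get_with_body(path, payload, timeout=30):
    """Send the request and return (status code, decoded response text)."""
    request = build_get_with_body(path, payload)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status, response.read().decode("utf-8")
```

With the containers running, a call such as `send_get_with_body("/some/endpoint/", {"text": "example"})` would send the request, with the placeholder path and payload replaced by the real ones.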

- Open http://localhost:5601 for the Kibana UI
- Open http://localhost:9200 for the Elasticsearch API
- `docker-compose exec alegre flask shell` to open a Python shell inside the Docker container with the app loaded
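
Before opening the URLs above, it can help to confirm that the three ports are accepting connections. The sketch below polls them with plain TCP connects. The service names are only labels chosen for this example, and the timeout values are arbitrary assumptions. An open port does not guarantee that the service behind it has finished starting.

```python
import socket
import time

# Ports taken from the steps above; the names are just labels for reporting.
SERVICES = {"alegre": 3100, "kibana": 5601, "elasticsearch": 9200}


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def wait_for_services(services, host="localhost", deadline_seconds=300, interval=5):
    """Poll until every port accepts connections; return the names still down."""
    deadline = time.monotonic() + deadline_seconds
    pending = dict(services)
    while pending:
        # Keep only the services whose port still refuses connections.
        pending = {name: port for name, port in pending.items()
                   if not port_is_open(host, port)}
        if not pending or time.monotonic() >= deadline:
            break
        time.sleep(interval)
    return sorted(pending)
```

Calling `wait_for_services(SERVICES)` returns an empty list once all three ports respond, or the names of the services that were still unreachable when the deadline passed.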

## Testing

- For the full set of tests to pass, some configuration secrets are required (e.g. Google Translate API keys)
- `docker-compose -f docker-compose.yml -f docker-test.yml up --abort-on-container-exit`
- Wait for the logs to settle, then in a different console:
- `docker-compose exec alegre make test`
- `docker-compose exec alegre coverage report`

To test individual modules:
- `docker-compose exec alegre bash` (opens a bash shell with appropriate environment in the docker container)
- `python manage.py test -p test_similarity.py`

## Troubleshooting

- If you're having trouble starting Elasticsearch on macOS, with the error `container_name exited with code 137` (the container was killed, typically for running out of memory), you will need to adjust your Docker settings, as per https://www.petefreitag.com/item/848.cfm
- Note that the alegre docker service definitions in the `alegre` repo may not align with the alegre service definitions in the `check` repository, so different variations of the service may be spun up depending on the directory where `docker-compose up` is executed.
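
If Elasticsearch still fails to start on a Linux host, one thing worth ruling out is whether the `vm.max_map_count` value from the Development section actually took effect. The sketch below reads the current kernel value from `/proc` and compares it with 262144. It assumes a Linux host; on macOS the setting lives inside Docker's virtual machine, so this check does not apply there.

```python
from pathlib import Path

REQUIRED_MAX_MAP_COUNT = 262144  # value from the Development section
PROC_PATH = Path("/proc/sys/vm/max_map_count")  # Linux only


def max_map_count_ok(path=PROC_PATH, required=REQUIRED_MAX_MAP_COUNT):
    """Return True or False for the kernel setting, or None if it cannot be read."""
    try:
        current = int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None  # file missing (non-Linux host) or unexpected content
    return current >= required
```

A result of `False` suggests the `sysctl` change has not been applied yet, and `None` means the value could not be read on this host.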

## Diagrams

NOTE: These diagrams need to be updated with the new endpoints from the Presto migration.

### Similarity-Related HTTP requests Alegre receives from Check API

![Similarity-Related HTTP requests Alegre receives from Check API](doc/elasticsearch_detail.png?raw=true "Similarity-Related HTTP requests Alegre receives from Check API")

(Source: https://docs.google.com/drawings/d/1-teqtZJfU4MSDUGVwWL9F4cXDKDnVObDYg3a9jJOP1Y/edit)

### Text Queries generated by Similarity Requests from Check API within Alegre

![Text Queries generated by Similarity Requests from Check API within Alegre](doc/alegre_parameter_breakdown.png?raw=true "Text Queries generated by Similarity Requests from Check API within Alegre")

(Source: https://docs.google.com/drawings/d/1jvwn5wM6T2jlnaS_fS7_u6sH02HVHi6L8Q9H_vD4SuY/edit)