https://github.com/clementsicard/un-semun
Repository for SemUN 🇺🇳 project
full-stack graph-db ner nlp united-nations
- Host: GitHub
- URL: https://github.com/clementsicard/un-semun
- Owner: ClementSicard
- Created: 2023-07-23T10:31:33.000Z (almost 3 years ago)
- Default Branch: main
- Last Pushed: 2023-11-02T16:35:52.000Z (over 2 years ago)
- Last Synced: 2023-11-03T17:02:22.913Z (over 2 years ago)
- Topics: full-stack, graph-db, ner, nlp, united-nations
- Language: Makefile
- Homepage:
- Size: 332 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# 🇺🇳 SemUN repository
Repository for the SemUN project. It consists of a Docker Compose stack with:
- An API ([`un-semun-api`](un-semun-api))
- A frontend ([`un-semun-front`](un-semun-front))
- An NLP pipeline ([`un-ml-pipeline`](un-semun-db))
- A Neo4j graph database (`neo4j` service in [`docker-compose.yml`](docker-compose.yml))
- Scripts to populate the database ([`un-semun-misc`](un-semun-misc))
- A scraper for the United Nations Digital Library
[OrbStack](https://orbstack.dev) · [Neo4j](https://neo4j.com) · [Hugging Face](https://huggingface.co) · [spaCy](https://spacy.io)
## Table of Contents
- [🇺🇳 SemUN repository](#-semun-repository)
  - [Table of Contents](#table-of-contents)
  - [Description \& Paper](#description--paper)
  - [Running the project](#running-the-project)
    - [Install requirements](#install-requirements)
    - [Run the project](#run-the-project)
    - [Stop the stack](#stop-the-stack)
  - [Ingest documents using the ML pipeline API](#ingest-documents-using-the-ml-pipeline-api)
### Description & Paper
- For more information on the project, please refer to the [project proposal](docs/project-proposal.pdf)
- For more details about the final result, please refer to the [paper](https://github.com/ClementSicard/un-semun-paper/blob/main/paper.pdf)
### Running the project
#### Install requirements
You need Docker (with Docker Compose) installed. I'm using [OrbStack](https://orbstack.dev/) as a Docker Desktop alternative on macOS, but a regular Docker installation works perfectly fine as well.
#### Run the project
Once Docker is set up, run:
```bash
# Start the containers
docker-compose up -d
```
Then open the frontend at [http://localhost:8080/](http://localhost:8080/) if you use Docker Desktop, or at [http://un-semun-frontend.un-semun.orb.local/](http://un-semun-frontend.un-semun.orb.local/) if you use OrbStack.
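The containers may need a few seconds before the frontend answers. The loop below is a minimal sketch for waiting until a URL responds, assuming `curl` is installed; `wait_for_url` and its `ATTEMPTS`/`DELAY` arguments are hypothetical and not part of the repository.

```bash
# Hypothetical helper, not part of the SemUN repository.
# Polls URL until it answers with a successful HTTP status,
# giving up after ATTEMPTS tries spaced DELAY seconds apart.
# Usage: wait_for_url URL [ATTEMPTS] [DELAY]
wait_for_url() {
  local url="$1" attempts="${2:-30}" delay="${3:-2}" i=0
  while [ "$i" -lt "$attempts" ]; do
    # --fail: non-zero exit on HTTP errors (status >= 400)
    # --max-time: do not hang on a service that never answers
    if curl --fail --silent --max-time 5 --output /dev/null "$url"; then
      echo "Up: $url"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "Gave up waiting for $url" >&2
  return 1
}
```

For example, `docker-compose up -d && wait_for_url http://localhost:8080/` starts the stack and returns once the Docker Desktop frontend URL responds (use the OrbStack URL instead if applicable).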
#### Stop the stack
To stop the stack, just run:
```bash
docker-compose down
```
You are all set!
### Ingest documents using the ML pipeline API
To ingest documents, you can use the ML pipeline API. You can find more information about it in the [`README.md`](https://github.com/ClementSicard/un-ml-pipeline/blob/main/README.md) of the `un-ml-pipeline` folder.
Send a `POST` request to the `/run` endpoint at `http://un-semun-api.un-semun.orb.local` (the OrbStack hostname) with a JSON body that is an array of objects, each containing a `recordId` field:
```json
[
{"recordId": ""},
{"recordId": ""},
{"recordId": ""},
...
]
```
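A minimal `curl` sketch of such a request, assuming the endpoint accepts `Content-Type: application/json` and that record IDs are plain UN Digital Library record numbers (so no JSON escaping is done). The helpers `build_run_payload` and `semun_run` and the `SEMUN_API_URL` variable are hypothetical, not part of the repository.

```bash
# Hypothetical helpers, not part of the SemUN repository.
# Base URL of the API; defaults to the OrbStack hostname used above.
SEMUN_API_URL="${SEMUN_API_URL:-http://un-semun-api.un-semun.orb.local}"

# Prints [{"recordId":"ID1"},{"recordId":"ID2"},...] for the IDs given
# as arguments. IDs are assumed to need no JSON escaping.
build_run_payload() {
  local first=1 id
  printf '['
  for id in "$@"; do
    [ "$first" -eq 1 ] || printf ','
    printf '{"recordId":"%s"}' "$id"
    first=0
  done
  printf ']'
}

# Sends the payload to the /run endpoint (curl reads the body from stdin via @-).
semun_run() {
  build_run_payload "$@" |
    curl --fail --silent --show-error -X POST \
      -H 'Content-Type: application/json' \
      --data-binary @- "$SEMUN_API_URL/run"
}
```

Usage: `semun_run 111111 222222`, where the arguments are placeholder record IDs. Set `SEMUN_API_URL` to wherever the API is reachable in your setup.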
You can also send a `POST` request to the `/run_search` endpoint at the same URL, passing a natural-language query for the UN Digital Library in the `q` field. The API then scrapes the matching results and ingests them into the database.
```json
{
"q": ""
}
```
You can also limit the number of results to scrape by adding an integer field `"n"` to the payload.
For instance:
```json
{
"q": "Women in peacekeeping",
"n": 256
}
```
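The search request can be sketched the same way, under the same assumptions (JSON content type, `curl` available). Here only backslashes and double quotes in the query are escaped, which is enough for ordinary queries but not a full JSON encoder; `build_search_payload`, `semun_search` and `SEMUN_API_URL` are hypothetical names.

```bash
# Hypothetical helpers, not part of the SemUN repository.
SEMUN_API_URL="${SEMUN_API_URL:-http://un-semun-api.un-semun.orb.local}"

# Prints {"q":"QUERY"} or, if a limit is given, {"q":"QUERY","n":LIMIT}.
# Only backslashes and double quotes are escaped in QUERY.
build_search_payload() {
  local q n
  q=$(printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g')
  n="${2:-}"
  if [ -n "$n" ]; then
    printf '{"q":"%s","n":%d}' "$q" "$n"
  else
    printf '{"q":"%s"}' "$q"
  fi
}

# Sends the query to the /run_search endpoint.
semun_search() {
  build_search_payload "$@" |
    curl --fail --silent --show-error -X POST \
      -H 'Content-Type: application/json' \
      --data-binary @- "$SEMUN_API_URL/run_search"
}
```

Usage: `semun_search "Women in peacekeeping" 256` sends the example payload shown above; omit the second argument to leave out `"n"`.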