https://github.com/opencitations/ramose

RAMOSE is an application for creating REST APIs on top of SPARQL endpoints

Host: GitHub
URL: https://github.com/opencitations/ramose
Owner: opencitations
License: isc
Created: 2018-05-24T19:46:19.000Z (about 8 years ago)
Default Branch: master
Last Pushed: 2024-11-05T14:42:04.000Z (over 1 year ago)
Last Synced: 2025-03-15T09:23:50.958Z (about 1 year ago)
Topics: linked-data-api, rest-api, semantic-web, sparql-endpoints
Language: Python
Homepage:
Size: 3.74 MB
Stars: 25
Watchers: 7
Forks: 4
Open Issues: 9
Metadata Files:
- Readme: README.md
- License: LICENSE


[![Python package](https://github.com/dbrembilla/ramose/actions/workflows/python-package.yml/badge.svg)](https://github.com/dbrembilla/ramose/actions/workflows/python-package.yml)
[![Coverage](./test/coverage/coverage.svg)](https://github.com/dbrembilla/ramose/actions/workflows/python-package.yml)
# Restful API Manager Over SPARQL Endpoints (RAMOSE)

Restful API Manager Over SPARQL Endpoints (RAMOSE) is an application that allows agile development and publication of documented RESTful APIs for querying SPARQL endpoints, according to a particular specification document.

## TOC

* [Configuration](#Configuration)
  * [Requirements](#Requirements)
  * [Arguments](#Arguments)
  * [Hashformat configuration file](#Hashformat-configuration-file)
  * [Addon python files](#Addon-python-files)
* [Run RAMOSE](#Run-RAMOSE)
  * [Run locally](#Run-locally)
  * [Run with webserver](#Run-with-webserver)
* [RAMOSE APIManager](#RAMOSE-APIManager)
* [Other functionalities and examples](#Other-functionalities-and-examples)
  * [Parameters and filters](#Parameters-and-filters)
  * [Examples](#Examples)

## Configuration

### Requirements

RAMOSE is compatible with Python 3.7 to 3.10. To install RAMOSE, use `pip install ramose` or `pip3 install ramose`. You can find the documentation [here](https://ramose.readthedocs.io/en/latest/).

### Arguments

The RAMOSE application accepts the following arguments:

```
-h, --help            show this help message and exit
-s SPEC, --spec SPEC  The file in hashformat containing the specification of the API.
-m METHOD, --method METHOD
                      The method to use to make a request to the API.
-c CALL, --call CALL  The URL to call for querying the API.
-f FORMAT, --format FORMAT
                      The format in which to get the response.
-d, --doc             Say to generate the HTML documentation of the API (if it is specified, all the arguments '-m', '-c', and '-f' won't be considered).
-o OUTPUT, --output OUTPUT
                      A file where to store the response.
-w WEBSERVER, --webserver WEBSERVER
                      The host:port where to deploy a Flask webserver for testing the API.
-css CSS, --css CSS   The path of a .css file for styling the API documentation (to be specified either with '-w' or with '-d' and '-o' arguments).
```

`-s` is a mandatory argument identifying the configuration file of the API (a hashformat specification file, `.hf`).

### Hashformat configuration file

A hashformat file (`.hf`) is a specification file that includes metadata about an API, the operations it supports, their descriptions, and instructions for performing them over a SPARQL endpoint. The `.hf` file is parsed by RAMOSE to perform the requested operations and to generate the documentation of the API.

The syntax is based on a simplified version of markdown and it includes one or more sections, separated by an empty line.

```
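#<field_name_1> <field_value_1>
#<field_name_2> <field_value_2>
#<field_name_3> <field_value_3>

#<field_name_n> <field_value_n>
...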
```

The first section of the specification includes mandatory information about the API, namely:

```
#url _partial URL of the API_
#type api _the type of section_
#base _URL base_
#method
#title
#description
#version
#license
#contacts _in the form [text](url)_
#endpoint
#addon _optional additional python module_
```

The field `#url` includes the partial URL of the API, while the field `#base` includes the URL base that can be shared with other services or APIs.

[N.B. Several APIs may coexist and be handled by RAMOSE, hence the path specified in the field `#url` corresponds to the unique identifier of the API.]

For example:

```
#url /api/v1
#type api
#base https://w3id.org/oc/wikidata
#method post
#title Wikidata REST API
#description A RAMOSE API implementation for Wikidata
#version 0.0.2
#license This document is licensed with a [Creative Commons Attribution 4.0 International License](https://creativecommons.org/licenses/by/4.0/legalcode), while the REST API itself has been created using [RAMOSE](https://github.com/opencitations/ramose), the *Restful API Manager Over SPARQL Endpoints* created by [Silvio Peroni](https://orcid.org/0000-0003-0530-4305), which is licensed with an [ISC license](https://opensource.org/licenses/ISC). All the data returned by this API are made freely available under a [Creative Commons public domain dedication (CC0)](https://creativecommons.org/publicdomain/zero/1.0/).
#contacts [contact@opencitations.net](mailto:contact@opencitations.net)
#endpoint https://query.wikidata.org/sparql
#addon test_addon
```
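To make the section and field layout concrete, here is a minimal sketch of how such a file could be read. It is not RAMOSE's own parser: the function name `parse_hashformat` is hypothetical, and the sketch assumes that a field starts with `#name value` and may continue on following non-blank lines, so multi-line values that contain empty lines are not handled.

```python
def parse_hashformat(text):
    """Split hashformat text into sections (hypothetical helper, not RAMOSE's parser).

    Each section becomes a dict mapping a field name to its value.
    """
    sections = []
    current = {}
    last_key = None
    for line in text.splitlines():
        if not line.strip():
            # An empty line closes the current section
            if current:
                sections.append(current)
            current, last_key = {}, None
        elif line.startswith("#"):
            # "#name value": the name ends at the first space
            key, _, value = line[1:].partition(" ")
            current[key] = value.strip()
            last_key = key
        elif last_key is not None:
            # A line without "#" continues the previous field
            current[last_key] += "\n" + line
    if current:
        sections.append(current)
    return sections


spec = """#url /api/v1
#type api
#base https://w3id.org/oc/wikidata
#method post

#url /metadata/{dois}
#type operation
#method get
"""

sections = parse_hashformat(spec)
print(sections[0]["url"])   # /api/v1
print(sections[1]["type"])  # operation
```

The first dictionary holds the API-level metadata, while each following one would describe a single operation.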

The other section(s) of the specification file detail the behaviour of the API for each allowed operation. Each operation corresponds to a section.

```
#url {var} _partial URL of operation and variable name_
#type operation _the type of section_
#{var} _optional validator of input variable_
#preprocess _methods for preprocessing defined in addon file_
#postprocess _methods for postprocessing defined in addon file_
#method
#description
#call
#field_type _list of (SPARQL query) variables and their type_
#output_json
#sparql _SPARQL query to be performed over the endpoint_
```

For example:

```
#url /metadata/{dois}
#type operation
#dois str(\"?10\..+[^_\"]((__|\" \")10\..+[^_])*\"?)
#preprocess upper(dois) --> split_dois(dois)
#postprocess distinct()
#method get
#description This operation retrieves the metadata for all the articles identified by the input DOIs.
#call /metadata/10.1108/jd-12-2013-0166__10.1038/nature12373
#field_type str(qid) str(author) datetime(year) str(title) str(source_title) str(source_id) str(volume) str(issue) str(page) str(doi) str(reference) int(citation_count)
#output_json [
{
"source_title": "Journal of Documentation",
"page": "253-277",
...
},
{
"source_title": "Nature",
"page": "54-58",
...
}
]
#sparql PREFIX wdt: <http://www.wikidata.org/prop/direct/>
SELECT ?author ?year ?title ?source_title ?volume ?issue ?page ?doi ?reference ?citation_count ?qid {
VALUES ?doi { [[dois]] }
?article wdt:P356 ?doi .

BIND(STRAFTER(str(?article), "http://www.wikidata.org/entity/") as ?qid) .

{
SELECT DISTINCT ?article (GROUP_CONCAT(?cited_doi; separator="; ") as ?reference) {
VALUES ?doi { [[dois]] }
?article wdt:P356 ?doi .
OPTIONAL {
?article wdt:P2860 ?cited .
OPTIONAL {
?cited wdt:P356 ?cited_doi .
}
}
} GROUP BY ?article
}
{
SELECT ?article ?doi (count(?doi) as ?citation_count) {
VALUES ?doi { [[dois]] }
?article wdt:P356 ?doi .
OPTIONAL { ?article ^wdt:P2860 ?other }
} GROUP BY ?article ?doi
}
OPTIONAL { ?article wdt:P1476 ?title }
OPTIONAL {
?article wdt:P577 ?date
BIND(SUBSTR(str(?date), 0, 5) as ?year)
}
OPTIONAL { ?article wdt:P1433/wdt:P1476 ?source_title }
OPTIONAL { ?article wdt:P478 ?volume }
OPTIONAL { ?article wdt:P433 ?issue }
OPTIONAL { ?article wdt:P304 ?page }
{
SELECT ?article ?doi (GROUP_CONCAT(?a; separator="; ") as ?author) {
VALUES ?doi { [[dois]] }

{
SELECT ?article ?doi ?a {
VALUES ?doi { [[dois]] }

?article wdt:P356 ?doi .

OPTIONAL {
?article wdt:P50 ?author_res .
?author_res wdt:P735/wdt:P1705 ?g_name ;
wdt:P734/wdt:P1705 ?f_name .
BIND(CONCAT(?f_name, ", ",?g_name) as ?a)
}
} GROUP BY ?article ?doi ?a ORDER BY DESC(?a)}
} GROUP BY ?article ?doi
}
} LIMIT 1000
```
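In the query above, `[[dois]]` marks the places where the (preprocessed) value of the URL parameter `dois` is inserted. The following sketch shows one way such a substitution could work. The helper name `fill_placeholders` is an assumption made for illustration and does not reflect how RAMOSE implements this step internally.

```python
import re


def fill_placeholders(query, params):
    """Replace every [[name]] in the query with params[name] (illustrative helper)."""
    def substitute(match):
        name = match.group(1)
        # Leave unknown placeholders untouched rather than failing
        return params.get(name, match.group(0))
    return re.sub(r"\[\[(\w+)\]\]", substitute, query)


query = "SELECT ?article { VALUES ?doi { [[dois]] } ?article wdt:P356 ?doi . }"
# Value as it might look after preprocessing: quoted DOIs separated by spaces
params = {"dois": '"10.1108/JD-12-2013-0166" "10.1038/NATURE12373"'}

filled = fill_placeholders(query, params)
print(filled)
```

Because the same placeholder can occur several times (as in the nested subqueries above), every occurrence is replaced with the same value.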

### Addon python files

Additional python modules can be added for preprocessing variables in the API URL call, and for postprocessing responses. In the specification file, addons are specified in the `#addon` field by recording the name of the python file.

**Preprocessing**

RAMOSE preprocesses the URL of the API call according to the functions specified in the `#preprocess` field (e.g. `#preprocess lower(doi)`). Each function is applied to the URL parameters given as its input (e.g. for "/api/v1/citations/10.1108/jd-12-2013-0166", converting the DOI to lowercase).

It is possible to run multiple functions sequentially by concatenating them with `-->` in the API specification document. In this case the output of the function `f_i` becomes the input operation URL of the function `f_i+1`.

Finally, it is worth mentioning that all the functions specified in the `#preprocess` field must return a tuple of strings defining how the particular value indicated by the URL parameter must be changed.
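As an illustration, an addon file might define preprocessing functions along these lines. The names `upper` and `split_dois` are taken from the `#preprocess` field of the example operation, but the bodies below are assumptions written for this sketch, not the code shipped with RAMOSE.

```python
def upper(s):
    """Return the uppercased value as a one-element tuple of strings."""
    return (s.upper(),)


def split_dois(s):
    """Turn 'doi1__doi2' into '"doi1" "doi2"', ready for a SPARQL VALUES clause."""
    quoted = " ".join('"%s"' % doi for doi in s.split("__"))
    return (quoted,)


# Chaining as in "#preprocess upper(dois) --> split_dois(dois)":
# the output of one function feeds the next one.
value = "10.1108/jd-12-2013-0166__10.1038/nature12373"
for func in (upper, split_dois):
    (value,) = func(value)

print(value)  # "10.1108/JD-12-2013-0166" "10.1038/NATURE12373"
```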

**Postprocessing**

RAMOSE takes the result table returned by the SPARQL query performed against the triplestore (as specified in an API operation) and changes some of the results according to the functions specified in the `#postprocess` field (e.g. `#postprocess remove_date("2018")`).

These functions can take parameters as input, but the first, implicit parameter is always the result table. This result table (i.e. a list of tuples) actually contains, in each cell, a tuple holding both the typed value and the plain value, which enables better comparisons and operations if needed. An example of such a result table is the following:

```
[
("id", "date"),
(("my_id_1", "my_id_1"), (datetime(2018, 3, 2), "2018-03-02")),
...
]
```

In addition, it is possible to run multiple functions sequentially by concatenating them with `-->` in the API specification document. In this case the output of the function `f_i` becomes the input result table of the function `f_i+1`.

The postprocess function should output a tuple containing the result and whether the function needs to return the type of values in the result.
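A postprocessing function could therefore look like the sketch below. The name `remove_date` comes from the example `#postprocess` value mentioned earlier, but its body, the sample rows, and the choice of `False` for the returned flag are assumptions made for illustration.

```python
from datetime import datetime


def remove_date(res, year):
    """Drop rows whose date falls in the given year (hypothetical addon function).

    `res` is the result table: a header tuple followed by rows whose cells
    are (typed value, plain value) tuples.
    """
    header, rows = res[0], res[1:]
    date_idx = header.index("date")
    kept = [row for row in rows if row[date_idx][0].year != int(year)]
    # Second element: flag telling whether values must be typed again
    # (False here, on the assumption that the cells are still typed)
    return [header] + kept, False


table = [
    ("id", "date"),
    (("my_id_1", "my_id_1"), (datetime(2018, 3, 2), "2018-03-02")),
    (("my_id_2", "my_id_2"), (datetime(2019, 7, 1), "2019-07-01")),
]

result, retype = remove_date(table, "2018")
print(len(result) - 1)  # 1 row left
```

Working on the typed value (`row[date_idx][0]`) is what makes the year comparison possible without parsing the plain string again.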

## Run RAMOSE

### Run locally

RAMOSE can be run via CLI by specifying the configuration file and the URL of the desired operation (including parameters). For example, run in the root directory:

```
python -m ramose -s <spec_file>.hf -c '<operation_url>?<parameters>'
```

Results are streamed in the shell in the following format:

```
# Response HTTP code: <status_code>
# Body: <response_body>
# Content-type: <content_type>
```

**Output formats.** RAMOSE returns responses in two formats, namely: `text/csv` and `application/json`. Formats can be specified as values of the argument `-f` or, alternatively, as parameters of the call. For example:

```
python -m ramose -f <csv|json> -s <spec_file>.hf -c '<operation_url>?<parameters>'

python -m ramose -s <spec_file>.hf -c '<operation_url>?format=<csv|json>'
```

If no format is specified, a JSON response is returned.

**Output.** To store responses in a local file, use the argument `-o` to specify the output file:

```
python -m ramose -s <spec_file>.hf -c '<operation_url>?<parameters>' -o '<file_name>.<format>'
```

**API Documentation.** To produce an HTML document including the automatically generated documentation of the API, use the arguments `-d` and `-o` to specify the output file:

```
python -m ramose -s <spec_file>.hf -d -o <doc_file>.html
```

### Run with webserver

Additionally, a Flask webserver is available for testing and debugging purposes by specifying as value of the argument `-w` the desired `<host>:<port>`. For example, to run your API in localhost:

```
python -m ramose -s .hf -w 127.0.0.1:8080
```

The web application includes:

* a basic dashboard for tracking API calls (available at `<host>:<port>/`)
* the documentation of the API (available at `<host>:<port>/<api_url>`)

The local API can be tested via browser or via curl:

```
curl -X GET --header "Accept: <content_type>" "http://<host>:<port>/<operation_url>?<parameters>"
```
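The same call can be prepared from Python with the standard library only. The sketch below builds the request without sending it. The helper name `build_request` is hypothetical, and the host, port and operation URL simply reuse values that appear elsewhere in this document.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_request(host, port, operation_url, params=None, accept="application/json"):
    """Build (without sending) a GET request for a locally served API."""
    url = "http://%s:%s%s" % (host, port, operation_url)
    if params:
        url += "?" + urlencode(params)
    return Request(url, headers={"Accept": accept}, method="GET")


req = build_request(
    "127.0.0.1", 8080,
    "/api/v1/metadata/10.1107/S0567740872003322",
    params={"format": "csv"},
    accept="text/csv",
)
print(req.full_url)

# To actually send it while the webserver is running:
# from urllib.request import urlopen
# body = urlopen(req).read().decode("utf-8")
```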

**Custom CSS.** Both when running via CLI and with the webserver, the path to a custom .css file can be specified in the `-css` argument to style the dashboard and documentation pages.

```
python -m ramose -s <spec_file>.hf -w 127.0.0.1:8080 -css <path/to/file.css>
```

## RAMOSE `APIManager`

RAMOSE allows developers to handle several APIs by instantiating the main class `APIManager` and initialising it with a specification file.

The method `get_op(op_complete_url)` takes as input the URL of the call (i.e. the API base URL plus the operation URL) and returns an object of type `Operation`. An `Operation` instance can be used to run the method `exec(method="get", content_type="application/json")`, which takes as input the HTTP method to use for the call and the content type to return, and executes the operation as indicated in the specification file, by running (in the following order):

1. the methods to preprocess the query (as defined in the specification file at `#{var}` and `#preprocess`);
2. the SPARQL query related to the operation called, by using the parameters indicated in the URL (`#sparql`);
3. the specification of all the types of the various rows returned (`#field_type`);
4. the methods to postprocess the result (`#postprocess`);
5. the application of the filter to remove, filter, sort the result (parameters);
6. the removal of the types added at the step 3, so as to have a data structure ready to be returned;
7. the conversion in the format requested by the user (`content_type`).

For example:

```
from ramose import APIManager

# Each specification file describes one API
api_manager = APIManager([ "1_v1.hf", "2_v1.hf" ])

api_base_1 = "..."
api_base_2 = "..."
operation_url_1 = "..."
operation_url_2 = "..."
request = "..."
call_1 = "%s/%s/%s" % (api_base_1, operation_url_1, request)
call_2 = "%s/%s/%s" % (api_base_2, operation_url_2, request)

op1 = api_manager.get_op(call_1)
status1, result1, result_format1 = op1.exec()

op2 = api_manager.get_op(call_2)
status2, result2, result_format2 = op2.exec()
```
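Steps 3 and 6 of the list above (adding and removing types) can be pictured with the following sketch. It is a simplified illustration rather than RAMOSE's internal code: the `CONVERTERS` mapping and the function names are hypothetical, only two field types are covered, and the citation counts are made up.

```python
# Hypothetical converters for two "#field_type" names
CONVERTERS = {"str": str, "int": int}


def add_types(rows, field_types):
    """Step 3: turn each plain value into a (typed value, plain value) pair."""
    typed = []
    for row in rows:
        typed.append(tuple(
            (CONVERTERS[field_types[i]](value), value)
            for i, value in enumerate(row)
        ))
    return typed


def remove_types(rows):
    """Step 6: keep only the plain value of every cell."""
    return [tuple(plain for _, plain in row) for row in rows]


field_types = ["str", "int"]  # e.g. "#field_type str(qid) int(citation_count)"
rows = [("Q29013687", "9"), ("Q57259020", "12")]  # counts are illustrative

typed_rows = add_types(rows, field_types)
# Typed values allow numeric rather than alphabetic ordering
typed_rows.sort(key=lambda row: row[1][0], reverse=True)
print(remove_types(typed_rows))  # [('Q57259020', '12'), ('Q29013687', '9')]
```

Sorting the plain strings instead would place `"9"` before `"12"` in descending order, which is why the typed values are kept around until the last step.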

## Other functionalities and examples

### Parameters and filters

Parameters can be used to filter and control the results returned by the API. They are passed as normal HTTP parameters in the URL of the call. They are:

* `require=<field_name>`: all the rows that have an empty value in the `<field_name>` specified are removed from the result set - e.g. `require=given_name` removes all the rows that do not have any string specified in the `given_name` field.

* `filter=<field_name>:<operator><value>`: only the rows compliant with `<value>` are kept in the result set. The parameter `<operator>` is not mandatory. If `<operator>` is not specified, `<value>` is interpreted as a regular expression, otherwise it is compared by means of the specified operation. Possible operators are "=", "<", and ">". For instance, `filter=title:semantics?` returns all the rows that contain the string "semantic" or "semantics" in the field title, while `filter=date:>2016-05` returns all the rows that have a date greater than May 2016.

* `sort=<order>(<field_name>)`: sort in ascending (`<order>` set to `"asc"`) or descending (`<order>` set to `"desc"`) order the rows in the result set according to the values in `<field_name>`. For instance, `sort=desc(date)` sorts all the rows according to the value specified in the field date in descending order.

* `format=<format_type>`: the final table is returned in the format specified in `<format_type>`, which can be either `"csv"` or `"json"` - e.g. `format=csv` returns the final table in CSV format. This parameter has higher priority than the type specified through the "Accept" header of the request. Thus, if the header of a request to the API specifies `Accept: text/csv` and the URL of such request includes `format=json`, the final table is returned in JSON.

* `json=<operation_type>("<separator>",<field>,<new_field_1>,<new_field_2>,...)`: in case a JSON format is requested in return, transform each row of the final JSON table according to the rule specified. If `<operation_type>` is set to `"array"`, the string value associated to the field name `<field>` is converted into an array by splitting the various textual parts by means of `<separator>`. For instance, considering the JSON table `[ { "names": "Doe, John; Doe, Jane" }, ... ]`, the execution of `array("; ",names)` returns `[ { "names": [ "Doe, John", "Doe, Jane" ] }, ... ]`. Instead, if `<operation_type>` is set to `"dict"`, the string value associated to the field name `<field>` is converted into a dictionary by splitting the various textual parts by means of `<separator>` and by associating the new fields `<new_field_1>`, `<new_field_2>`, etc., to these new parts. For instance, considering the JSON table `[ { "name": "Doe, John" }, ... ]`, the execution of `dict(", ",name,fname,gname)` returns `[ { "name": { "fname": "Doe", "gname": "John" } }, ... ]`.

It is possible to specify one or more filtering operations of the same kind (e.g. `require=given_name&require=family_name`). In addition, these filtering operations are applied in the order presented above: first all the `require` operations, then all the `filter` operations, followed by all the `sort` operations, and finally the `format` and the `json` operations (if applicable). Each of these rules works on the structure returned after the execution of the previous rule.

Example:

```
<operation_url>?require=doi&filter=date:>2015&sort=desc(date)
```
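The effect of such a chain of parameters can be sketched on a small table as follows. This is a simplified illustration under stated assumptions, not RAMOSE's implementation: the three helper functions are hypothetical, the rows are made-up data, and values are compared as plain strings rather than as typed values.

```python
import re


def apply_require(rows, field):
    """Drop rows with an empty value in `field`."""
    return [row for row in rows if row.get(field)]


def apply_filter(rows, field, expr):
    """Keep rows matching `expr`: '=', '<' or '>' plus a value, else a regex."""
    if expr[:1] in ("=", "<", ">"):
        op, value = expr[0], expr[1:]
        tests = {
            "=": lambda v: v == value,
            "<": lambda v: v < value,
            ">": lambda v: v > value,
        }
        return [row for row in rows if tests[op](row[field])]
    return [row for row in rows if re.search(expr, row[field])]


def apply_sort(rows, order, field):
    """Sort rows by `field` in 'asc' or 'desc' order."""
    return sorted(rows, key=lambda row: row[field], reverse=(order == "desc"))


rows = [
    {"doi": "10.1/a", "date": "2014-03"},
    {"doi": "", "date": "2018-01"},
    {"doi": "10.1/b", "date": "2016-07"},
    {"doi": "10.1/c", "date": "2019-11"},
]

# Equivalent of ?require=doi&filter=date:>2015&sort=desc(date)
rows = apply_require(rows, "doi")
rows = apply_filter(rows, "date", ">2015")
rows = apply_sort(rows, "desc", "date")
print([row["doi"] for row in rows])  # ['10.1/c', '10.1/b']
```

Each helper receives the table produced by the previous one, mirroring the fixed order in which the parameters are applied.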

### Examples

#### Query Wikidata endpoint from CLI

Use the following files to test the application.

* `test/ramose.py`
* `test/test.hf`
* `test/test_addon.py`

**Q1** Retrieve bibliographic metadata related to the work identified by the DOI `10.1107/S0567740872003322`:

```
python -m ramose -s test.hf -c '/api/v1/metadata/10.1107/S0567740872003322'
```

Returns:

```
# Response HTTP code: 200
# Body:
[
{
"author": "",
"year": "1972",
"title": "The crystal structure of tin(II) sulphate",
"source_title": "Acta crystallographica. Section B",
"volume": "28",
"issue": "3",
"page": "864-867",
"doi": "10.1107/S0567740872003322",
"reference": "",
"citation_count": "1",
"qid": "Q29013687"
}
]
# Content-type: application/json
```

**Q2** Retrieve bibliographic metadata of a list of works identified by their DOIs (separated by `__`, as specified in the field `#dois` of `test.hf`), and return data in `csv` format:

```
python -m ramose -s test.hf -c '/api/v1/metadata/10.1107/S0567740872003322__10.1007/BF02020444?format=csv'
```

Returns:

```
# Response HTTP code: 200
# Body:
author,year,title,source_title,volume,issue,page,doi,reference,citation_count,qid
,1972,The crystal structure of tin(II) sulphate,Acta crystallographica. Section B,28,3,864-867,10.1107/S0567740872003322,,1,Q29013687
"Erdős, Paul; Hajnal, András",1966,On chromatic number of graphs and set-systems,Acta Mathematica Hungarica,17,1-2,61-99,10.1007/BF02020444,10.4153/CJM-1959-003-9,1,Q57259020
# Content-type: text/csv
```
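A CSV body like the one above can be read with Python's standard `csv` module, which takes care of the quoted author field containing commas. The sketch below uses a shortened copy of that body (only four of the columns), so the `body` string is an abridged illustration rather than the exact response.

```python
import csv
import io

# Abridged copy of the response body shown above (four columns only)
body = (
    "author,year,title,doi\n"
    ",1972,The crystal structure of tin(II) sulphate,10.1107/S0567740872003322\n"
    '"Erdős, Paul; Hajnal, András",1966,'
    "On chromatic number of graphs and set-systems,10.1007/BF02020444\n"
)

rows = list(csv.DictReader(io.StringIO(body)))
for row in rows:
    print(row["year"], row["author"] or "(no author)")
```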

**Q3** Perform **Q2** and sort results by year in ascending order:

```
python -m ramose -s test.hf -c '/api/v1/metadata/10.1107/S0567740872003322__10.1007/BF02020444?format=csv&sort=asc(year)'
```

**Q4** Perform **Q3** but return in JSON format, and split authors' names by the separator `; `:

```
python -m ramose -s test.hf -c '/api/v1/metadata/10.1107/S0567740872003322__10.1007/BF02020444?format=json&sort=asc(year)&json=array("; ", author)'
```

Returns:

```
# Response HTTP code: 200
# Body:
[
{
"author": [
"Erdős, Paul",
"Hajnal, András"
],
"year": "1966",
"title": "On chromatic number of graphs and set-systems",
"source_title": "Acta Mathematica Hungarica",
"volume": "17",
"issue": "1-2",
"page": "61-99",
"doi": "10.1007/BF02020444",
"reference": "10.4153/CJM-1959-003-9",
"citation_count": "1",
"qid": "Q57259020"
},
{
"author": [],
"year": "1972",
"title": "The crystal structure of tin(II) sulphate",
"source_title": "Acta crystallographica. Section B",
"volume": "28",
"issue": "3",
"page": "864-867",
"doi": "10.1107/S0567740872003322",
"reference": "",
"citation_count": "1",
"qid": "Q29013687"
}
]
# Content-type: application/json
```
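The `json=array(...)` transformation used in **Q4**, together with its `dict` counterpart, can be approximated as follows. The function names `to_array` and `to_dict` are hypothetical and the code is only a sketch of the described behaviour, including the empty string becoming an empty list as in the output above.

```python
def to_array(rows, separator, field):
    """json=array(separator, field): split a string field into a list."""
    for row in rows:
        value = row[field]
        # An empty string yields an empty list, as in the output above
        row[field] = value.split(separator) if value else []
    return rows


def to_dict(rows, separator, field, *new_fields):
    """json=dict(separator, field, new_field_1, ...): split into named parts."""
    for row in rows:
        parts = row[field].split(separator)
        row[field] = dict(zip(new_fields, parts))
    return rows


authors = [{"author": "Erdős, Paul; Hajnal, András"}, {"author": ""}]
print(to_array(authors, "; ", "author"))

names = [{"name": "Doe, John"}]
print(to_dict(names, ", ", "name", "fname", "gname"))
```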

#### Query Wikidata endpoint from webserver

Perform **Q2** from the local webserver:

```
python -m ramose -s test.hf -w 127.0.0.1:8080
curl -X GET --header "Accept: text/csv" "http://localhost:8080/api/v1/metadata/10.1107/S0567740872003322__10.1007/BF02020444?format=csv"
```

The same query can be directly performed on the browser at `http://localhost:8080/api/v1/metadata/10.1107/S0567740872003322__10.1007/BF02020444?format=csv`
