Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

https://github.com/anapsid/anapsid

An adaptive query processing engine for SPARQL endpoints.
https://github.com/anapsid/anapsid

python query query-engine sparql sparql-endpoints

Last synced: about 2 months ago
JSON representation

An adaptive query processing engine for SPARQL endpoints.

Host: GitHub
URL: https://github.com/anapsid/anapsid
Owner: anapsid
Created: 2013-06-19T13:36:08.000Z (about 11 years ago)
Default Branch: master
Last Pushed: 2017-09-13T12:30:48.000Z (almost 7 years ago)
Last Synced: 2024-03-27T07:15:20.052Z (3 months ago)
Topics: python, query, query-engine, sparql, sparql-endpoints
Language: Python
Homepage: https://anapsid.github.io/anapsid/
Size: 814 KB
Stars: 18
Watchers: 9
Forks: 6
Open Issues: 2
Metadata Files:
- Readme: README.md

Lists

awesome-semantic-web - anapsid - An adaptive query processing engine for SPARQL endpoints. (SPARQL / Federated SPARQL)
repo-5491-awesome-semantic-web - anapsid - An adaptive query processing engine for SPARQL endpoints. (SPARQL / Federated SPARQL)
awesome-semantic-web - anapsid - An adaptive query processing engine for SPARQL endpoints. (SPARQL / Federated SPARQL)
awesome-semantic-web - anapsid - An adaptive query processing engine for SPARQL endpoints. (SPARQL / Federated SPARQL)
awesome-semantic-web - anapsid - An adaptive query processing engine for SPARQL endpoints. (SPARQL / Federated SPARQL)

README

        ANAPSID

=======

An adaptive query processing engine for SPARQL endpoints.

[1] Maribel Acosta, Maria-Esther Vidal, Tomas Lampo, Julio Castillo,

Edna Ruckhaus: ANAPSID: An Adaptive Query Processing Engine for SPARQL

Endpoints. International Semantic Web Conference (1) 2011: 18-34

[2] Gabriela Montoya, Maria-Esther Vidal, Maribel Acosta: A

Heuristic-Based Approach for Planning Federated SPARQL Queries. COLD

2012

Installing ANAPSID

==================

ANAPSID is known to run on Debian GNU/Linux and OS X. These instructions were tested 

on the latest Debian Stable and OS X. The recommended way to

execute ANAPSID is to use Python 2.7. 

1. Download ANAPSID.

   You can do this by cloning this repository using Git.

   `$ git clone https://github.com/anapsid/anapsid.git ~/anapsid`

   

   OR

   You can download the latest release from Github [here](https://github.com/anapsid/anapsid/releases) 

2. Go to your local copy of ANAPSID and run:

   `$ pip install -r requirements.txt`

   This will install ANAPSID's Python dependencies. Right now, the only library required to execute ANAPSID is ply 3.3 (https://pypi.python.org/pypi/ply/3.3)

3. When step 2 is done you can now install ANAPSID. This will install

   it only to your current user caged VirtualEnv as to prevent

   polluting Python's global site-packages.

   `$ python setup.py install`

4. Go ahead and move to the next section on configuring ANAPSID.

Setting up ANAPSID

==================

Running ANAPSID depends on a endpoint description file. This file

describes each endpoint URL and the predicates this endpoint

handles. ANAPSID comes bundled with a helper script to generate your

endpoints descriptions as to prevent errors.

1. Create a file, e.g. endpointsURLs.txt, with the URLs of your

   endpoints, one per line.

2. Run the script. It will contact each endpoint and retrieve their

   predicates, so it might take a while. This will save your endpoint

   descriptions on endpointsDescriptions.txt

   `$ get_predicates endpointsURLs.txt endpointsDescriptions.txt`

3. You are ready to run ANAPSID.

About supported endpoints

------------------------

ANAPSID currently supports endpoints that answer queries either on XML

or JSON. Expect hard failures if you intend to use ANAPSID on

endpoints that answer in any other format.

Running ANAPSID

===============

Once you have installed ANAPSID and retrieved endpoint descriptions,

you can run ANAPSID using our run_anapsid script.

`$ run_anapsid`

It will output a usage text and the options switches you can

select. We run our experiments, however, using the scripts bundled on

utils/ so you might want to check that out to get an idea.

ANAPSID Parameters

------------------

Alternatively,  you can execute the following command to run a given query with ANAPSID:

```

$python $ANAPSIDROOT/run_anapsid -e $ENDPOINTS -q $query -p  -s False 

-o False -d  -a True -w False [-k ]  [-V ] -r False

```

Where:

`$ANAPSIDROOT`: directory where ANAPSID is stored.

`$ENDPOINTS`: path and name of the file where the description of the endpoints is stored.

`$query`: path and name of the filw where the query is stored.

``: can be **b** if the plan is bushy, **ll** is the plan is left linear, and **naive** for naive binary tree plan. 

`-o`: can be True or False. **True** indicates that the input query is in SPARQL1-1 and no decomposition is needed; **False**, otherwise.

`-d`: indicates the type of Decomposition.  can be **SSGM** (Star Shaped

Group Multiple Endpoints), **SSGS** (Star Shaped Group Single Endpoint), **EG** (Exclusive Groups), Recommended SSGM.

`-a`: indicates if the adaptive operators will be used. Recommended value **True**.

`-w`:  can be True or False. Indicates if the cardinality of the queries will be estimated by contacting the sources (**True**) or by using a cost model (**False**). Turning True this feature may affect execution time.

`-w`: can be y or c. The value **y** indicates that the plan will be produced, while **c** asks that decomposition. This parameter is optional, and should be set up only if the plan of the query wants to be produced. 

`-r`: can be True or False. Use **True** if the answer of the query will be output and **False** if only a summary of the execution will be produced. 

`-V`: can be True or False. **True** indicates if the endpoints to contact are Virtuoso, **False** is of any other type, e.g., OWLIM.

Included query decomposing heuristics

=====================================

We include three heuristics used for decomposing queries to be

evaluated by a federation of endpoints. These are:

1. Exclusive Groups (EG).

2. Star-Shaped Group Single endpoint selection (SSGS). See [2].

3. Star-Shaped Group Multiple endpoint selection (SSGM). See [2].

 

Running FedBench with ANAPSID

=============================

FedBench (see http://fedbench.fluidops.net) is a benchmark for testing federated query processing on RDF data sets.

In order to execute ANAPSID, it is necessary first to provide the endpoint descriptions. Endpoint descriptions are of the form ` `. The file endpoints/endpointsFedBench provides 

the description of the endpoints of the dataset collections in FedBench. The current URLs of the endpoints

have to be included as follows:

```

  URL of the NYTime endpoint

  URL of the Chebi endpoint

  URL of the SW Dog Food endpoint

  URL of the Drugbank endpoint

  URL of the Jamendo endpoint

  URL of the Kegg endpoint

  URL of the LinkedMDB endpoint

  URL of the SP^2Bench 10M endpoint

  URL of the Geonames endpoint

```

The FedBench queries (see http://fedbench.fluidops.net/resource/Queries) are also available in the folder queries/fedbBench.  

About and Contact

=================

ANAPSID was developed at

[Universidad Simón Bolívar](http://www.usb.ve) as an ongoing academic effort. You

can contact the current maintainers by email at mvidal[at]ldc[dot]usb[dot]ve.

We strongly encourage you to please report any issues you have with

ANAPSID. You can do that over our contact email or creating a new

issue here on Github.

- Simón Castillo: scastillo [at] ldc [dot] usb [dot] ve

- Guillermo Palma: gpalma [at] ldc [dot] usb [dot] ve

- Maria-Esther Vidal: mvidal [at] ldc [dot] usb [dot] ve

- Gabriela Montoya: Gabriela [dot] Montoya [at] univ-nantes [dot] fr

- Maribel Acosta: maribel [dot] acosta [at] kit [dot] edu

License

=======

This work is licensed under [GNU/GPL v2](https://www.gnu.org/licenses/gpl-2.0.html).