https://github.com/hscells/groove
Query analysis pipeline framework
analysis boolean-query elaticsearch framework medline pipeline pubmed qpp
- Host: GitHub
- URL: https://github.com/hscells/groove
- Owner: hscells
- License: mit
- Created: 2017-11-06T05:58:30.000Z (about 7 years ago)
- Default Branch: master
- Last Pushed: 2022-02-02T07:10:18.000Z (almost 3 years ago)
- Last Synced: 2024-06-19T01:53:11.599Z (7 months ago)
- Topics: analysis, boolean-query, elaticsearch, framework, medline, pipeline, pubmed, qpp
- Language: Go
- Homepage: https://godoc.org/github.com/hscells/groove
- Size: 9.61 MB
- Stars: 9
- Watchers: 3
- Forks: 5
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# groove
[![GoDoc](https://godoc.org/github.com/hscells/groove?status.svg)](https://godoc.org/github.com/hscells/groove)
[![Go Report Card](https://goreportcard.com/badge/github.com/hscells/groove)](https://goreportcard.com/report/github.com/hscells/groove)
[![gocover](http://gocover.io/_badge/github.com/hscells/groove)](https://gocover.io/github.com/hscells/groove)

_Query analysis pipeline framework_
groove is a library for constructing query analysis pipelines. A groove pipeline comprises a query source (the
format of the queries), a statistics source (a source for computing information retrieval statistics), preprocessing
steps, any measurements to make, and any output formats.

The groove library is primarily used in [boogie](https://github.com/hscells/boogie), a front-end DSL for groove.
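To make the stages described above concrete, here is a minimal, self-contained sketch of how such a pipeline could be composed in plain Go. It is **not** groove's API: `Query`, `StatisticsSource`, `MapStats`, `Preprocessor`, `Measurement`, `Run`, and the toy measurements `QueryLength` and `UnseenTerms` are hypothetical names chosen for illustration, and the in-memory map stands in for a real statistics source such as Elasticsearch.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
	"unicode"
)

// Query is a hypothetical, simplified query: an identifier and its terms.
type Query struct {
	ID    string
	Terms []string
}

// StatisticsSource stands in for a source of retrieval statistics such as an
// Elasticsearch index (hypothetical interface).
type StatisticsSource interface {
	DocumentFrequency(term string) int
}

// MapStats is a toy in-memory statistics source: term -> document frequency.
type MapStats map[string]int

func (m MapStats) DocumentFrequency(term string) int { return m[term] }

// Preprocessor rewrites a query; Measurement scores a query against a statistics source.
type (
	Preprocessor func(Query) Query
	Measurement  func(Query, StatisticsSource) float64
)

// Lowercase converts every term to lower case.
func Lowercase(q Query) Query {
	out := Query{ID: q.ID}
	for _, t := range q.Terms {
		out.Terms = append(out.Terms, strings.ToLower(t))
	}
	return out
}

// AlphaNum removes non-alphanumeric characters and drops terms left empty.
func AlphaNum(q Query) Query {
	out := Query{ID: q.ID}
	for _, t := range q.Terms {
		clean := strings.Map(func(r rune) rune {
			if unicode.IsLetter(r) || unicode.IsDigit(r) {
				return r
			}
			return -1 // a negative return value drops the rune
		}, t)
		if clean != "" {
			out.Terms = append(out.Terms, clean)
		}
	}
	return out
}

// QueryLength counts the query's terms and ignores the statistics source.
func QueryLength(q Query, _ StatisticsSource) float64 { return float64(len(q.Terms)) }

// UnseenTerms counts the query terms that occur in no document.
func UnseenTerms(q Query, ss StatisticsSource) float64 {
	n := 0
	for _, t := range q.Terms {
		if ss.DocumentFrequency(t) == 0 {
			n++
		}
	}
	return float64(n)
}

// Run preprocesses each query in order, applies every measurement, and encodes
// the results as JSON: query ID -> measurement name -> score.
func Run(queries []Query, ss StatisticsSource, pre []Preprocessor, measures map[string]Measurement) ([]byte, error) {
	results := make(map[string]map[string]float64)
	for _, q := range queries {
		for _, p := range pre {
			q = p(q)
		}
		scores := make(map[string]float64)
		for name, m := range measures {
			scores[name] = m(q, ss)
		}
		results[q.ID] = scores
	}
	return json.Marshal(results)
}

func main() {
	queries := []Query{{ID: "1", Terms: []string{"Heart", "Attack!", "(aspirin)"}}}
	out, err := Run(queries, MapStats{"heart": 120, "attack": 45},
		[]Preprocessor{AlphaNum, Lowercase},
		map[string]Measurement{"length": QueryLength, "unseen": UnseenTerms})
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

Modelling each stage as a plain function type keeps the stages independently swappable, which is the kind of composition a pipeline framework like groove is meant to provide; the real library's types and option functions differ.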
If using groove as a Go library, refer to the simple example below, which loads Medline queries, analyses them using
Elasticsearch, and finally writes the results to a JSON file.

## API Usage
In the example below, we would like to use Elasticsearch to measure several query performance predictors on a set of
Medline queries. For the experiment, we would like to preprocess the queries so that each one contains only
alphanumeric characters and is in lowercase. Finally, we would like to output the results of the measures into a JSON
file. The pipeline also evaluates precision and recall and writes a TREC results file.

```go
// Assumes imports of "log" and "os", plus the groove packages referenced below.

// Construct the pipeline.
pipelineChannel := make(chan groove.Result)
p := pipeline.NewGroovePipeline(
	// Read the queries in Medline format.
	query.NewTransmuteQuerySource(query.MedlineTransmutePipeline),
	// Compute retrieval statistics from the "abstract" field of a local Elasticsearch index.
	stats.NewElasticsearchStatisticsSource(stats.ElasticsearchHosts("http://localhost:9200"),
		stats.ElasticsearchIndex("medline"),
		stats.ElasticsearchField("abstract"),
		stats.ElasticsearchScroll(true),
		stats.ElasticsearchSearchOptions(stats.SearchOptions{
			Size:    10000,
			RunName: "qpp",
		})),
	// Pre- and post-retrieval query performance predictors to measure.
	pipeline.Measurement(preqpp.AvgICTF, preqpp.SumIDF, preqpp.AvgIDF, preqpp.MaxIDF, preqpp.StdDevIDF, postqpp.ClarityScore),
	// Evaluation measures to compute.
	pipeline.Evaluation(eval.PrecisionEvaluator, eval.RecallEvaluator),
	// Output: JSON-formatted measurements and evaluations, plus a TREC results file.
	pipeline.MeasurementOutput(output.JsonMeasurementFormatter),
	pipeline.EvaluationOutput("medline.qrels", output.JsonEvaluationFormatter),
	pipeline.TrecOutput("medline_qpp.results"))

// Execute it on a directory of queries. A pipeline executes queries in parallel.
go p.Execute("./medline", pipelineChannel)

// Continue consuming results until the pipeline signals that it is done.
for {
	result := <-pipelineChannel
	if result.Type == groove.Done {
		break
	}
	switch result.Type {
	case groove.Measurement:
		// Write the formatted measurement output to disk.
		if err := os.WriteFile("medline_qpp.json", []byte(result.Measurements[0]), 0644); err != nil {
			log.Fatal(err)
		}
	case groove.Evaluation:
		// Write the formatted evaluation output to disk.
		if err := os.WriteFile("medline_qpp_eval.json", []byte(result.Evaluations[0]), 0644); err != nil {
			log.Fatal(err)
		}
	}
}
```

## Citing
If you use this work in a scientific publication, please cite:
```
@inproceedings{scells2018framework,
author = {Scells, Harrisen and Locke, Daniel and Zuccon, Guido},
title = {An Information Retrieval Experiment Framework for Domain Specific Applications},
booktitle = {The 41st International ACM SIGIR Conference on Research \& Development in Information Retrieval},
series = {SIGIR '18},
year = {2018},
}
```

## Logo
The Go gopher was created by [Renee French](https://reneefrench.blogspot.com/) and is licensed under the
[Creative Commons Attribution 3.0 license](https://creativecommons.org/licenses/by/3.0/).