{"id":18723801,"url":"https://github.com/hscells/groove","last_synced_at":"2025-04-12T15:20:23.471Z","repository":{"id":62691365,"uuid":"109653795","full_name":"hscells/groove","owner":"hscells","description":"Query analysis pipeline framework","archived":false,"fork":false,"pushed_at":"2022-02-02T07:10:18.000Z","size":10073,"stargazers_count":10,"open_issues_count":0,"forks_count":5,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-03-26T09:51:09.690Z","etag":null,"topics":["analysis","boolean-query","elaticsearch","framework","medline","pipeline","pubmed","qpp"],"latest_commit_sha":null,"homepage":"https://godoc.org/github.com/hscells/groove","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/hscells.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2017-11-06T05:58:30.000Z","updated_at":"2024-11-13T12:17:27.000Z","dependencies_parsed_at":"2022-11-04T13:50:44.505Z","dependency_job_id":null,"html_url":"https://github.com/hscells/groove","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hscells%2Fgroove","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hscells%2Fgroove/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hscells%2Fgroove/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hscells%2Fgroove/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/hscells","download_url":"https://codeload.github.com/hscells/groove/tar.gz/refs/heads/master","host":{"name
":"GitHub","url":"https://github.com","kind":"github","repositories_count":248586218,"owners_count":21128998,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["analysis","boolean-query","elaticsearch","framework","medline","pipeline","pubmed","qpp"],"created_at":"2024-11-07T13:51:38.819Z","updated_at":"2025-04-12T15:20:23.440Z","avatar_url":"https://github.com/hscells.png","language":"Go","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cimg height=\"200px\" src=\"gopher.png\" alt=\"gopher\" align=\"right\"/\u003e\n\n# groove\n\n[![GoDoc](https://godoc.org/github.com/hscells/groove?status.svg)](https://godoc.org/github.com/hscells/groove)\n[![Go Report Card](https://goreportcard.com/badge/github.com/hscells/groove)](https://goreportcard.com/report/github.com/hscells/groove)\n[![gocover](http://gocover.io/_badge/github.com/hscells/groove)](https://gocover.io/github.com/hscells/groove)\n\n_Query analysis pipeline framework_\n\ngroove is a library for pipeline construction for query analysis. 
The groove pipeline comprises a query source (the\nformat of the queries), a statistic source (a source for computing information retrieval statistics), preprocessing\nsteps, any measurements to make, and any output formats.\n\nThe groove library is primarily used in [boogie](https://github.com/hscells/boogie) which is a front-end DSL for groove.\nIf using groove as a Go library, refer to the simple example below which loads Medline queries and analyses them using\nElasticsearch and finally outputs the result into a JSON file.\n\n## API Usage\n\nIn the below example, we would like to use Elasticsearch to measure some query performance predictors on some Medline\nqueries. For the experiment, we would like to pre-process the queries by making each one only contain alpha-numeric\ncharacters, and in lowercase. Finally, we would like to output the results of the measures into a JSON file.\n\n```go\n// Construct the pipeline.\npipelineChannel := make(chan groove.Result)\np := pipeline.NewGroovePipeline(\n\tquery.NewTransmuteQuerySource(query.MedlineTransmutePipeline),\n\tstats.NewElasticsearchStatisticsSource(stats.ElasticsearchHosts(\"http://localhost:9200\"),\n\t\tstats.ElasticsearchIndex(\"medline\"),\n\t\tstats.ElasticsearchField(\"abstract\"),\n\t\tstats.ElasticsearchScroll(true),\n\t\tstats.ElasticsearchSearchOptions(stats.SearchOptions{\n\t\t\tSize:    10000,\n\t\t\tRunName: \"qpp\",\n\t\t})),\n\tpipeline.Measurement(preqpp.AvgICTF, preqpp.SumIDF, preqpp.AvgIDF, preqpp.MaxIDF, preqpp.StdDevIDF, postqpp.ClarityScore),\n\tpipeline.Evaluation(eval.PrecisionEvaluator, eval.RecallEvaluator),\n\tpipeline.MeasurementOutput(output.JsonMeasurementFormatter),\n\tpipeline.EvaluationOutput(\"medline.qrels\", output.JsonEvaluationFormatter),\n\tpipeline.TrecOutput(\"medline_qpp.results\"))\n\n// Execute it on a directory of queries. 
A pipeline executes queries in parallel.\ngo p.Execute(\"./medline\", pipelineChannel)\n\nfor {\n\t// Continue until completed.\n\tresult := \u003c-pipelineChannel\n\tif result.Type == groove.Done {\n\t\tbreak\n\t}\n\tswitch result.Type {\n\tcase groove.Measurement:\n\t\t// Process the measurement outputs.\n\t\terr := ioutil.WriteFile(\"medline_qpp.json\", bytes.NewBufferString(result.Measurements[0]).Bytes(), 0644)\n\t\tif err != nil {\n\t\t\tlog.Fatal(err)\n\t\t}\n\tcase groove.Evaluation:\n\t\t// Process the evaluation outputs.\n\t\terr := ioutil.WriteFile(\"medline_qpp_eval.json\", bytes.NewBufferString(result.Evaluations[0]).Bytes(), 0644)\n\t\tif err != nil {\n\t\t\tlog.Fatal(err)\n\t\t}\n\t}\n}\n```\n\n## Citing\n\nIf you use this work for scientific publication, please reference\n\n```\n@inproceedings{scells2018framework,\n author = {Scells, Harrisen and Locke, Daniel and Zuccon, Guido},\n title = {An Information Retrieval Experiment Framework for Domain Specific Applications},\n booktitle = {The 41st International ACM SIGIR Conference on Research \\\u0026 Development in Information Retrieval},\n series = {SIGIR '18},\n year = {2018},\n}\n```\n\n## Logo\n\nThe Go gopher was created by [Renee French](https://reneefrench.blogspot.com/), licensed under the\n[Creative Commons Attribution 3.0 license](https://creativecommons.org/licenses/by/3.0/).","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhscells%2Fgroove","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fhscells%2Fgroove","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhscells%2Fgroove/lists"}