{"id":21187046,"url":"https://github.com/bitfunnel/mg4j-workbench","last_synced_at":"2025-03-14T20:19:37.836Z","repository":{"id":91595992,"uuid":"89166593","full_name":"BitFunnel/mg4j-workbench","owner":"BitFunnel","description":"Java tools for evaluating BitFunnel performance compared to an mg4j baseline.","archived":false,"fork":false,"pushed_at":"2018-06-01T21:39:00.000Z","size":1720,"stargazers_count":1,"open_issues_count":23,"forks_count":2,"subscribers_count":4,"default_branch":"master","last_synced_at":"2025-01-21T13:07:27.616Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"lgpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/BitFunnel.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2017-04-23T19:24:34.000Z","updated_at":"2018-06-01T21:37:47.000Z","dependencies_parsed_at":null,"dependency_job_id":"d2dbfcfa-705f-4b3d-9a11-6218d6044419","html_url":"https://github.com/BitFunnel/mg4j-workbench","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/BitFunnel%2Fmg4j-workbench","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/BitFunnel%2Fmg4j-workbench/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/BitFunnel%2Fmg4j-workbench/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/BitFunnel%2Fmg4j-workbench/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/BitFunnel
","download_url":"https://codeload.github.com/BitFunnel/mg4j-workbench/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":243639569,"owners_count":20323516,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-20T18:28:04.992Z","updated_at":"2025-03-14T20:19:37.809Z","avatar_url":"https://github.com/BitFunnel.png","language":"Java","funding_links":[],"categories":[],"sub_categories":[],"readme":"# mg4j-workbench\nJava tools for evaluating BitFunnel performance compared to an mg4j baseline.\n\n## Building\n\n### Windows\n\n~~~\nchoco install java\nchoco install maven\nmvn package\n~~~\n\nTODO: set JAVA_HOME?\n\n### Linux\n\n~~~\nsudo add-apt-repository ppa:webupd8team/java\nsudo apt-get update\nsudo apt-get install oracle-java8-installer\nsudo apt-get install maven\nmvn package\n~~~\n\nTODO: set JAVA_HOME?\n\n### OSX\n\nComing soon.\n\n### IntelliJ\n\nImport pom.xml.\nBuild -\u003e Build Project\n\n// TODO: Describe step-by-step.\n// TODO: Add pictures.\n\n## Creating an mg4j collection.\n\n~~~\njava -cp target/mg4j-1.0-SNAPSHOT-jar-with-dependencies.jar \\\n     it.unimi.di.big.mg4j.document.TRECDocumentCollection \\\n     -f HtmlDocumentFactory -p encoding=iso-8859-1 d:\\data\\work\\out2.collection d:\\data\\gov2\\gx000\\gx000\\00.txt\n~~~\n\nTODO: -z parameter for gz files.\nTODO: substute \u003cCOLLECTION FILE\u003e \u003cGOV2 Files ...\u003e\n\n## Creating a BitFunnel chunk file from an mg4j collection.\n\n~~~\njava -cp target/mg4j-1.0-SNAPSHOT-jar-with-dependencies.jar \\\n     
org.bitfunnel.reproducibility.GenerateBitFunnelChunks \\\n      -S \u003ccollection file\u003e \u003cchunk file\u003e\n~~~\n\n## Building an mg4j index.\n\n~~~\njava -cp target/mg4j-1.0-SNAPSHOT-jar-with-dependencies.jar \\\n     it.unimi.di.big.mg4j.tool.IndexBuilder \\\n      --keep-batches --downcase -S d:\\data\\work\\out2.collection d:\\data\\work\\out2\n~~~\n\nTODO: Substitute \u003cCOLLECTION FILE\u003e \u003cBASENAME\u003e\nTODO: Add document filter parameter.\n\n\n## Processing a query log.\n\n~~~\njava -cp target/mg4j-1.0-SNAPSHOT-jar-with-dependencies.jar \\\n     org.bitfunnel.reproducibility.QueryLogRunner \\\n     \u003cindex base name\u003e \u003cquery log file\u003e \u003coutput file\u003e [-t threadCount]\n~~~\n\n## Exporting a Partitioned Elias-Fano Index\n\nIt is possible to export the mg4j index in a format usable by the\n[Partitioned Elias-Fano Index](https://github.com/BitFunnel/partitioned_elias_fano) project.\nThe optional `--index` flag exports the index. The optional `--queries` flag converts a\nquery log file for consumption by the Partitioned Elias-Fano Index. Two query files are\ngenerated. The first has queries whose terms have been replaced by their integer term id values.\nQueries with terms that are not in the index (and therefore don't have term id values) are\nfiltered out. The second query file has the plain text queries corresponding to those in the\nfile of term id queries.\n\n~~~\njava -cp target/mg4j-1.0-SNAPSHOT-jar-with-dependencies.jar \\\n     org.bitfunnel.reproducibility.IndexExporter \\\n     \u003cindex base name\u003e \u003coutput base name\u003e [--index] [--queries \u003cquery log file\u003e]\n~~~\n\n## Filtering Query Logs\nNote that one can use the `IndexExporter`, described in the previous section, to\ngenerate a filtered query log that contains only those queries whose terms all\nappear in the index. 
Just include the `--queries` parameter and remove the `--index`\nparameter.","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbitfunnel%2Fmg4j-workbench","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbitfunnel%2Fmg4j-workbench","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbitfunnel%2Fmg4j-workbench/lists"}