{"id":16750470,"url":"https://github.com/mberr/k-distance-prediction","last_synced_at":"2025-03-16T04:26:26.686Z","repository":{"id":109085140,"uuid":"195997307","full_name":"mberr/k-distance-prediction","owner":"mberr","description":"Source code for the paper \"k-Distance Approximation for Memory-Efficient RkNN Retrieval\"","archived":false,"fork":false,"pushed_at":"2019-07-09T11:51:32.000Z","size":163,"stargazers_count":2,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"master","last_synced_at":"2025-01-22T17:11:33.204Z","etag":null,"topics":["index","indexing","knn","query-processing","spatial-index"],"latest_commit_sha":null,"homepage":null,"language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/mberr.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2019-07-09T11:42:25.000Z","updated_at":"2022-09-22T21:15:11.000Z","dependencies_parsed_at":"2023-03-24T08:46:47.224Z","dependency_job_id":null,"html_url":"https://github.com/mberr/k-distance-prediction","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mberr%2Fk-distance-prediction","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mberr%2Fk-distance-prediction/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mberr%2Fk-distance-prediction/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mberr%2Fk-distance-prediction/manife
sts","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/mberr","download_url":"https://codeload.github.com/mberr/k-distance-prediction/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":243826111,"owners_count":20354217,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["index","indexing","knn","query-processing","spatial-index"],"created_at":"2024-10-13T02:28:15.015Z","updated_at":"2025-03-16T04:26:26.680Z","avatar_url":"https://github.com/mberr.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# k-Distance Approximation for Memory-Efficient RkNN Retrieval\nRepository containing the code for the paper\n\n__k-Distance Approximation for Memory-Efficient RkNN Retrieval__  \n_Max Berrendorf, Felix Borutta, and Peer Kröger_  \nSISAP'19\n\nIf you find this code useful, please consider citing us.\n```\n@InProceedings{kDistanceApproximation,\n    author=\"Berrendorf, Max\n    and Borutta, Felix\n    and Kr{\\\"o}ger, Peer\",\n    title=\"k-Distance Approximation for Memory-Efficient RkNN Retrieval\",\n    booktitle=\"Similarity Search and Applications\",\n    year=\"2019\",\n    publisher=\"Springer International Publishing\",\n}\n\n```\n\n## Data Preprocessing\nDownload road networks `OL`, `TG`, and `SF` from [here](https://www.cs.utah.edu/~lifeifei/SpatialDataset.htm).\nThen, run the preprocessing script to\n* generate synthetic datasets.
\n* for each dataset, compute and store\n  * a `BallTree` index for fast range queries in the evaluation code,\n  * the `k-distances` for building the index, and\n  * the `MRkNNCoP` tree coefficients.\nYou can use the command line argument `--index_root` to specify the directory where the data is stored (consuming approx. `1.1 GiB`), and `--model_root` to specify the directory where the MRkNNCoP tree coefficients are stored (consuming approx. `5.0 MiB`).\n```bash\npython3 preprocess.py --index_root=./index --model_root=./models\n```\nEach index is stored as a `pickle` file named `\u003cdataset_name\u003e.pkl`, and the k-distances are stored as sparse CSR matrices in an `HDF5` file in a custom format (cf. `persistence.py:save_csr_to_hdf`).\n\n\n## Model Selection\nAfter preprocessing, you can perform model selection to analyse the trade-off between model size and candidate set size.\nTo this end, run\n```bash\npython3 model_selection.py --index_root=./index --model_root=./models --output_root=./results\n```\nwhich reads the data and MRkNNCoP tree coefficients and trains numerous predefined models to predict the MRkNNCoP tree coefficients and thereby the k-distance.\nAs a result, the following files are produced for each dataset:\n* a CSV containing the error in coefficient prediction for all models,\n* a CSV containing the mean candidate set size for every model in the `model_size`-`mae` skyline, and\n* a pickle file containing the candidate set sizes for each model, each data point, and each value of k between 1 and K_MAX.\n\nThe output files consume approx. `4.0 GiB`.\n\n## Evaluation\nAfter training all models, use `notebooks/Evaluation.ipynb` to evaluate the results.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmberr%2Fk-distance-prediction","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmberr%2Fk-distance-prediction","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmberr%2Fk-distance-prediction/lists"}