{"id":17843795,"url":"https://github.com/mc-cat-tty/placerank","last_synced_at":"2025-08-14T02:32:52.359Z","repository":{"id":217898647,"uuid":"744687762","full_name":"mc-cat-tty/PlaceRank","owner":"mc-cat-tty","description":"Final assigment for \"Gestione dell'Informazione\" (\"Search Engines\") course @ UniMoRe","archived":false,"fork":false,"pushed_at":"2024-02-08T08:22:05.000Z","size":47454,"stargazers_count":3,"open_issues_count":0,"forks_count":1,"subscribers_count":2,"default_branch":"main","last_synced_at":"2024-10-28T00:21:28.889Z","etag":null,"topics":["airbnb","benchmarking","bert-embeddings","datasets","huggingface","huggingface-transformers","information-retrieval","insideairbnb-data","masked-language-models","ncurses","ranking-algorithm","search-engine","urwid","whoosh"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/mc-cat-tty.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-01-17T20:03:57.000Z","updated_at":"2024-02-08T06:55:59.000Z","dependencies_parsed_at":"2024-01-21T22:56:24.847Z","dependency_job_id":"82d3fde7-b2fb-4c65-aafc-86e1e84c30ad","html_url":"https://github.com/mc-cat-tty/PlaceRank","commit_stats":null,"previous_names":["mc-cat-tty/placerank"],"tags_count":2,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mc-cat-tty%2FPlaceRank","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mc-cat-tty%2FPlaceRank/tags","releases_url":"https://repos.ecosyste.m
s/api/v1/hosts/GitHub/repositories/mc-cat-tty%2FPlaceRank/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mc-cat-tty%2FPlaceRank/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/mc-cat-tty","download_url":"https://codeload.github.com/mc-cat-tty/PlaceRank/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":229795460,"owners_count":18125284,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["airbnb","benchmarking","bert-embeddings","datasets","huggingface","huggingface-transformers","information-retrieval","insideairbnb-data","masked-language-models","ncurses","ranking-algorithm","search-engine","urwid","whoosh"],"created_at":"2024-10-27T21:26:50.089Z","updated_at":"2024-12-15T07:41:18.417Z","avatar_url":"https://github.com/mc-cat-tty.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# PlaceRank\n\nSearch engine for AirBnB listings.\n\nFinal assignment of the \"Gestione dell'informazione\" course at University of Modena and Reggio Emilia. 
Academic year 2023-2024.\n\n## Bringup\nVersion **3.11** or later of the **Python** interpreter is required.\n\nTo try our not-so-SOTA search engine, run the following commands in a shell where the Python interpreter is available:\n```bash\n# INSTALL DEPENDENCIES\npython3 -m pip install -r requirements.txt\n\n# DOWNLOAD DATASET, CREATE INDEX, DOWNLOAD WORDNET AND BERT MODEL\npython3 -m setup\n```\n\nBe aware that `bert-large-uncased-whole-word-masking` can take up to 1.5 GB of disk space and 30 minutes to download.\n\nBy default, the model is stored in the _hf\\_cache_ folder.\n\nExperienced users may prefer to create a virtual environment first, so that all packages are installed inside it, and then follow the procedure above:\n```bash\npython3 -m venv venv\nsource venv/bin/activate\n```\n\n## Usage\nThe PlaceRank project comprises several modules, each with a specific and usually self-explanatory purpose. The most significant ones are:\n - `ir_model`, `models`, `sentiment` and `query_expansion` modules: contain the models and services that the user can experiment with through the following blocks\n - `tui` package: contains the view, presenter, event dispatcher and all the logic under the UI's hood\n - `benchmark` module: contains the implementation of some popular benchmarking metrics\n - `preprocessing`, `dataset`, `views`, `config` modules: contain the building blocks and convenience functions/classes for the entire project\n\n### TUI\nThe TUI (Terminal User Interface) is the front end of our project. 
Run the following command in a sufficiently large terminal window:\n```bash\npython3 -m placerank\n```\n\nIf anything about the interface is unclear, see the [help page](HELP.txt).\n\nNote that the application can take a few seconds to load, especially on the first run.\n\n\u003cimg src=\"assets/tui.png\" width=\"400px\"\u003e\n\u003cimg src=\"assets/tui2.png\" width=\"400px\"\u003e\n\n#### Common Exceptions\n`urwid.widget.widget.WidgetError: ... canvas when passed size ...`. This class of errors usually means that the terminal **window** is **too small** for the TUI to be rendered.\n\n### Benchmarks\nThe Benchmark module is designed to test the performance of an index against predefined queries. It includes functionality to load a benchmark dataset, test an index against the queries, and compute various evaluation metrics such as recall, precision, precision at rank r, average precision, mean average precision, F1 score, and the E-measure.\n\nTo use the Benchmark module, follow these steps:\n\nSet up the benchmarks:\n```bash\npython3 -m setup_benchmarks\n```\n\nCreate a Benchmark object:\n\n```python\nbench = Benchmark()\n```\n\nOpen the index:\n\n```python\n# open_dir is provided by Whoosh\nfrom whoosh.index import open_dir\n\nix = open_dir(\"index/benchmark\")\n```\n\nRun the benchmark against the index. This step is required before any metric can be computed.\n\n```python\nbench.test_against(ix)\n```\n\nPrint or use the computed metrics through the object's methods:\n\n```python\nprint(bench.precision())\nprint(bench.recall())\nprint(bench.precision_at_r())\nprint(bench.precision_at_recall_levels())\nprint(bench.average_precision())\nprint(bench.mean_average_precision())\nprint(bench.f1())\nprint(bench.e())\n```\n\nCalling the module `placerank.benchmark` from the command line computes all of the metrics above for the \"index/benchmark\" index, which is an inverted index built on InsideAirbnb Cambridge listings.\n\n### Reviews\n\nThe reviews dataset is used to compute the sentiment metric for each listing. 
Recent reviews weigh more on the score than older ones.\n\nTo compute the sentiment of each review, first build the reviews dataset with the `build_reviews_index` function of `placerank.dataset`.\nThe function builds a `defaultdict` whose keys are listing IDs and whose values are lists of tuples containing review information.\n\nThe dataset is saved to a `reviews.pickle` file; to load it, call the `load_reviews_index` function.\n\n## Contributors\n - Corradini Giulio\n - Mecatti Francesco\n - Stano Antonio\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmc-cat-tty%2Fplacerank","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmc-cat-tty%2Fplacerank","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmc-cat-tty%2Fplacerank/lists"}