{"id":27127664,"url":"https://github.com/dbarbosadev/qrlit","last_synced_at":"2026-05-15T22:05:25.550Z","repository":{"id":248581066,"uuid":"829091743","full_name":"DBarbosaDev/QRLIT","owner":"DBarbosaDev","description":"QRLIT: Quantum Reinforcement Learning for Database Index Tuning","archived":false,"fork":false,"pushed_at":"2024-11-23T14:03:19.000Z","size":26008,"stargazers_count":2,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-07-27T12:41:20.646Z","etag":null,"topics":["database-indexes","database-indexing","grover-algorithm","grovers-algorithm","qiskit","quantum-ai","quantum-algorithms","quantum-computing","quantum-reinforcement-learning","reinforcement-learning"],"latest_commit_sha":null,"homepage":"https://doi.org/10.3390/fi16120439","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/DBarbosaDev.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2024-07-15T18:32:07.000Z","updated_at":"2024-11-23T16:52:28.000Z","dependencies_parsed_at":"2024-07-15T23:05:42.712Z","dependency_job_id":"b5434284-7690-4aae-b02f-0e1170c9895f","html_url":"https://github.com/DBarbosaDev/QRLIT","commit_stats":null,"previous_names":["dbarbosadev/quantum-rl-db-indexing","dbarbosadev/qrlit"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/DBarbosaDev/QRLIT","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DBarbosaDev%2FQRLIT","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DBarb
osaDev%2FQRLIT/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DBarbosaDev%2FQRLIT/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DBarbosaDev%2FQRLIT/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/DBarbosaDev","download_url":"https://codeload.github.com/DBarbosaDev/QRLIT/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DBarbosaDev%2FQRLIT/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":279008297,"owners_count":26084427,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-11T02:00:06.511Z","response_time":55,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["database-indexes","database-indexing","grover-algorithm","grovers-algorithm","qiskit","quantum-ai","quantum-algorithms","quantum-computing","quantum-reinforcement-learning","reinforcement-learning"],"created_at":"2025-04-07T17:57:29.798Z","updated_at":"2025-10-11T18:06:33.497Z","avatar_url":"https://github.com/DBarbosaDev.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# [QRLIT: Quantum Reinforcement Learning for Database Index Tuning ](https://doi.org/10.3390/fi16120439)\n\nQRLIT is a Quantum-classical algorithm version of SMARTIX. 
This version is based on the source code available in the **rl-db-indexing** repository (https://github.com/Chotom/rl-db-indexing).\n\n### Cite this work with:\n\n```bibtex\n@Article{fi16120439,\n\tAUTHOR \t\t= {Barbosa, Diogo and Gruenwald, Le and D’Orazio, Laurent and Bernardino, Jorge},\n\tTITLE \t\t= {QRLIT: Quantum Reinforcement Learning for Database Index Tuning},\n\tJOURNAL \t= {Future Internet},\n\tVOLUME \t\t= {16},\n\tYEAR \t\t= {2024},\n\tNUMBER \t\t= {12},\n\tARTICLE-NUMBER \t= {439},\n\tURL \t\t= {https://www.mdpi.com/1999-5903/16/12/439},\n\tISSN \t\t= {1999-5903},\n\tABSTRACT \t= {Selecting indexes capable of reducing the cost of query processing in database systems is a challenging task, especially in large-scale applications. Quantum computing has been investigated with promising results in areas related to database management, such as query optimization, transaction scheduling, and index tuning. Promising results have also been seen when reinforcement learning is applied for database tuning in classical computing. However, there is no existing research with implementation details and experiment results for index tuning that takes advantage of both quantum computing and reinforcement learning. This paper proposes a new algorithm called QRLIT that uses the power of quantum computing and reinforcement learning for database index tuning. Experiments using the database TPC-H benchmark show that QRLIT exhibits superior performance and a faster convergence compared to its classical counterpart.},\n\tDOI \t\t= {10.3390/fi16120439}\n}\n```\n---\n\n#### Setup and training with Docker (Based on https://github.com/Chotom/rl-db-indexing)\n\nUse this only for testing, or if you are certain that you can provide stable server performance.\n\n1. Start the MySQL server and client.\n    ```shell\n    docker compose up -d\n    docker compose up -d --build # if docker images require to be rebuilt or created\n    ```\n\n2. 
Generate data and load the database to mysql_server from the client container.\n    ```shell\n    docker compose exec client python3 /project/cli/initiate_environment.py\n    ```\n\n3. Start training by running the script in the client container:\n    ```shell\n    docker compose exec client python3 /project/cli/run_quantum_train.py\n    ```\n\n---\n\n\n## rl-db-indexing (From https://github.com/Chotom/rl-db-indexing)\n\nDatabase index tuning with an agent using reinforcement learning techniques.\\\nTPC-H benchmark MySQL environment for reinforcement learning.\n\nSource code for paper: [https://doi.org/10.1007/978-3-031-42941-5_45](https://doi.org/10.1007/978-3-031-42941-5_45)\n\n[![License](https://img.shields.io/github/license/Chotom/rl-db-indexing)](https://github.com/Chotom/rl-db-indexing/blob/main/LICENSE)\n\n---\n\n### Setup and training\n\nHow to train an agent and set up an environment for the TPC-H database.\n\n1. Paste tpc-h-tool.zip, downloaded from the TPC website, into the [db_env/tpch/tpch_tools](db_env/tpch/tpch_tools) directory (tested with version\n   3.0.0): [download link](http://tpc.org/tpc_documents_current_versions/download_programs/tools-download-request5.asp?bm_type=TPC-H\u0026bm_vers=3.0.0\u0026mode=CURRENT-ONLY).\n\n2. Define the database size (in GB) and the number of parallel streams in [db_env/tpch/config.py](db_env/tpch/config.py).\n\n   You can check the recommended number of streams in the TPC-H documentation [here (page 96)](https://www.tpc.org/TPC_Documents_Current_Versions/pdf/TPC-H_v3.0.0.pdf#page=96). \n    ```python\n    SCALE_FACTOR = 0.1   # Should be \u003e=1, however we used 0.1 because of limited resources\n    STREAM_COUNT = 2     # 2 for 1 GB, we also used 2\n    ```\n\n3. 
Follow the further instructions:\n   - Docker:\n      - [Setup and training with Docker](#Setup-and-training-with-Docker)\n   - without Docker:\n     - [Server setup without Docker](#Server-setup-without-Docker)\n     - [Client setup without Docker](#Client-setup-without-Docker)\n\n---\n\n#### Setup and training with Docker\n\nUse this only for testing, or if you are certain that you can provide stable server performance.\n\n1. Start the MySQL server and client.\n    ```shell\n    docker compose up -d\n    docker compose up -d --build # if docker images require to be rebuilt or created\n    ```\n\n2. Generate data and load the database to mysql_server from the client container.\n    ```shell\n    docker compose exec client python3 /project/cli/initiate_environment.py\n    ```\n\n3. Start training by running the script in the client container:\n    ```shell\n    docker compose exec client python3 /project/cli/run_train.py\n    ```\n\n---\n\n#### Server setup without Docker\n\n1. Install and configure the MySQL server (version 8).\n\n2. Allow remote root connections for database operations during benchmarking (or create a different user):\n    ```MySQL\n    CREATE USER 'root'@'CLIENT_IP'  /* change CLIENT_IP to client IP address, or use 'root'@'%' for all addresses */\n    IDENTIFIED WITH caching_sha2_password BY '1234';\n    GRANT ALL PRIVILEGES ON *.* TO 'root'@'CLIENT_IP';  /* change CLIENT_IP */\n    ```\n---\n\n#### Client setup without Docker\n\nThe project currently supports only Linux clients because of DBGen. However, it should be possible to compile\nDBGen on Windows as well.\n\n1. Update the file [.env](.env) with the MySQL server data (user, password, server IP, port and name of the database that will be\nused for benchmarking).\n\n2. Install gcc, make, python (minimum version 3.8), pip and virtualenv.\n\n3. Move the project to the desired folder (for example /path/to/project).\n\n4. 
Create virtual environment (for example inside /path/to/project directory):\n    ```shell\n    virtualenv venv\n    ```\n\n5. Save project path to $PYTHONPATH:\n    ```shell\n    echo 'export PYTHONPATH=\"${PYTHONPATH}:/path/to/project/\"' \u003e\u003e venv/bin/activate\n    ```\n\n6. Activate virtual environment:\n    ```shell\n    source venv/bin/activate\n    ```\n\n7. Install requirements:\n    ```shell\n    pip install -r requirements.txt\n    ```\n\n8. Patch DBGen:\n    ```shell\n    python cli/patch_dbgen.py\n    ```\n\n9. Compile DBGen:\n    ```shell\n    make -C dbgen\n    ```\n\n10. Initiate environment - populate database with generated data:\n    ```shell\n    python cli/initiate_environment.py\n    ```\n\nThe environment is now ready and you can:\n\n- train agent, for example:\n    ```shell\n    nohup python cli/train_agent.py \u0026\u003e data/train.log \u0026\n    ```\n\n    It is possible to stop training at any time (without needing to reset the environment) by sending the script SIGINT\n    (CTRL+C). 
With the example training command shown above, this can be done with:\n    ```shell\n    kill -2 PID   # change PID to process id (can be established using 'ps' command)\n    ```\n\n- run single benchmark on current database configuration:\n    ```shell\n    python cli/run_benchmark.py\n    ```\n\n- reset database (without generating data for benchmark again):\n    ```shell\n    python cli/reset_environment.py\n    ```\n\n- reset index configuration to default state:\n    ```shell\n    python cli/reset_indexes.py\n    ```\n\n- set specific index configuration (run `python cli/set_index.py --help` to see column order):\n    ```shell\n    python cli/set_index.py 100000000000000000000000000000000000000000000\n    ```\n    \n---\n\n### Important notes\n\n#### DBGen bug\n\nThe BUGS file in the dbgen directory of the official TPC-H Tools mentions a bug named \"Problem #00062\", which states the\nfollowing:\n\n```\nbad update rollover after 1000 refreshes\nThis test uses tpcH scale 0.01. We've encountered\nan situation in which dbgen doesn't generate\nthe correct data for delete files delete.1000 and\nabove. In particular, file delete.1000 contains\nkeys to be deleted that have never been loaded.\nBecause of this problem, keys that should have been\ndeleted never are causing duplicate unique values\nto appear in the incremental loads after we cycle\nfrom the 4000th incremental update back around starting\nagain with the 1st one.\n```\n\nThe bug was closed, supposedly due to an unsupported scale factor. However, according to our observations, the bug still\nexists and affects every SCALE_FACTOR.\n\nBecause of this bug, the database keeps growing in size after 999 refresh functions. 
To continue with the\nexperiments, we had to implement a \"fixed\" way of generating data.\n\nOur way of \"fixing\" data uses only 1998 refreshes, after which the database is in its initial state (as opposed to 4000, which\nwould be the official number if the bug did not exist).\n\nIf the bug is ever fixed, update the MAX_REFRESH_FILE_INDEX variable in the file\n[db_env/tpch/config.py](db_env/tpch/config.py):\n\n```python\nMAX_REFRESH_FILE_INDEX = 4000 # 1998 changed to 4000\n```\n\nand the file [cli/initiate_environment.py](cli/initiate_environment.py) like so:\n```python\nfrom db_env.tpch.TpchGenerator import TpchGenerator\n\nif __name__ == '__main__':\n    generator = TpchGenerator()\n    generator.reset_db()\n    generator.generate_data()\n    generator.load_db()\n    generator.generate_refresh_data()           # uncomment this line\n    #generator.generate_fixed_refresh_data()    # comment or remove this line\n```\n\n##### How to confirm the bug\n\n1. Generate data with DBGen (note the officially supported 1 GB SCALE_FACTOR):\n    ```shell\n    ./dbgen -vf -s 1\n    ```\n\n2. Generate refresh data for 4000 refresh function pairs:\n    ```shell\n    ./dbgen -vf -U 4000 -s 1\n    ```\n\n3. The delete files from 1000 onwards do not delete anything, which can be checked with:\n    ```shell\n    grep -F -f delete.1000 ./orders.tbl* \n    ```\n\nThis demonstrates that, after refreshes 1-999 (which are assumed to be correct), delete files don't actually remove\nanything. At the same time, update files do add records, so the database grows in size, which skews the\nbenchmark. 
\n\n#### Database state file\n\nYou should never modify the file path/to/project/data/rf_db_index.txt.\n\nEvery executed refresh pair should increment the number stored in that file.\n\nAfter refresh pair number 4000 (or currently after refresh pair 1998) the database should be in its initial\nstate (note this has not been fully tested) - rf_db_index.txt should contain 1.\n\nIf you modify the file, you will lose information about the current database state, which may (and sooner or later will) lead to errors\nand force you to stop training and reset the database.\n\n---\n\n### Project structure\n\n#### agent\n\nPackage with agent to train.\n\n#### data\n\nDirectory to store project datafiles and agent data for analysis.\n\n#### data_analysis\n\nDirectory to store scripts used for analysis of agent training data.\n\n#### db_env\n\nPackage with database environment for agent.\n\n#### shared_utils\n\nPackage with utility functions and variables to use in the project.\n\n#### test\n\nRun tests with code coverage (run `pip install -r test/requirements.txt` to install test requirements):\n\n```shell\ncoverage run -m unittest\ncoverage report\n```\n\n---\n\n### Citing \"Intelligent Index Tuning Using Reinforcement Learning\"\n\nIf you use this repository in your research, please cite:\n```bibtex\n@inproceedings{10.1007/978-3-031-42941-5_45,\n\ttitle        = {Intelligent Index Tuning Using Reinforcement Learning},\n\tauthor       = {Matczak, Micha{\\l} and Czocha{\\'{n}}ski, Tomasz},\n\tyear         = 2023,\n\tbooktitle    = {New Trends in Database and Information Systems},\n\tpublisher    = {Springer Nature Switzerland},\n\taddress      = {Cham},\n\tpages        = {523--534},\n\tisbn         = {978-3-031-42941-5},\n\teditor       = {Abell{\\'o}, Alberto and Vassiliadis, Panos and Romero, Oscar and Wrembel, Robert and Bugiotti, Francesca and Gamper, Johann and Vargas Solar, Genoveva and Zumpano, Ester},\n\tabstract     = {Index tuning in databases is a critical task that can significantly impact 
database performance. However, the process of manually configuring indexes is often time-consuming and can be inefficient. In this study, we investigate the process of creating indexes in a database using reinforcement learning. Our research aims to develop an agent that can learn to make optimal decisions for configuring indexes in a chosen database. The paper also discusses an evaluation method to measure database performance. The adopted performance test provides necessary documentation, database schema (on which experiments will be performed) and auxiliary tools such as data generator. This benchmark evaluates a selected database management system in terms of loading, querying and processing power of multiple query streams at once. It is a comprehensive test which results, calculated on measured queries time, will be used in the reinforcement learning algorithm. Our results demonstrate that used index technique requires repeatable benchmark with stable environment and high compute power, which cause cost and time demand. The replication package for this paper is available at GitHub: https://github.com/Chotom/rl-db-indexing.}\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdbarbosadev%2Fqrlit","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdbarbosadev%2Fqrlit","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdbarbosadev%2Fqrlit/lists"}