{"id":28455722,"url":"https://github.com/questdb/mindsdb-tutorial","last_synced_at":"2025-06-27T02:31:30.534Z","repository":{"id":42003022,"uuid":"470170441","full_name":"questdb/mindsdb-tutorial","owner":"questdb","description":"A tutorial on integrating QuestDB with MindsDB to achieve wondrous ML feats","archived":false,"fork":false,"pushed_at":"2023-08-18T11:21:10.000Z","size":1810,"stargazers_count":27,"open_issues_count":0,"forks_count":4,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-06-06T22:11:17.526Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Dockerfile","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/questdb.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2022-03-15T13:24:35.000Z","updated_at":"2024-12-17T01:33:05.000Z","dependencies_parsed_at":"2022-08-12T02:00:55.497Z","dependency_job_id":null,"html_url":"https://github.com/questdb/mindsdb-tutorial","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/questdb/mindsdb-tutorial","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/questdb%2Fmindsdb-tutorial","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/questdb%2Fmindsdb-tutorial/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/questdb%2Fmindsdb-tutorial/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/questdb%2Fmindsdb-tutorial/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/questdb","download_url":"https://codeload.github.com/questdb/mindsdb
-tutorial/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/questdb%2Fmindsdb-tutorial/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":262177625,"owners_count":23270902,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-06-06T22:10:28.233Z","updated_at":"2025-06-27T02:31:30.525Z","avatar_url":"https://github.com/questdb.png","language":"Dockerfile","funding_links":[],"categories":[],"sub_categories":[],"readme":"\n# Using QuestDB as a datasource for MindsDB\n\n## Introduction\n\n[MindsDB](https://mindsdb.com/) enables you to use Machine Learning to ask predictive questions about your data \nand receive accurate answers from it, all **in SQL**. With MindsDB: \n\n- *Developers* can quickly add AI capabilities to their applications.\n- *Data scientists* can streamline MLOps by deploying ML models as \n  [AI Tables](https://docs.mindsdb.com/sql/tutorials/ai-tables/?h=ai#deep-dive-into-the-ai-tables).\n- *Data analysts* can easily make forecasts on complex data, such as multivariate time-series with high \n  cardinality, and visualize these in BI tools like [Grafana](https://grafana.com/), and \n  [Tableau](https://www.tableau.com/).\n \n[QuestDB](https://questdb.io/) is **the fastest open-source**, column-oriented SQL database for time-series data. 
\nIt has been designed and built for massively-parallelized vectorized execution and SIMD, as the de-facto backend \nfor demanding high-performance applications in financial services, IoT, IIoT, ML, MLOps, DevOps and observability. \nQuestDB implements **ANSI SQL with additional extensions for time-specific queries**, which make it simple to correlate \ndata from multiple sources using relational and time series [joins](https://questdb.io/docs/reference/sql/join), and \nexecute [aggregation functions](https://questdb.io/docs/reference/function/aggregation) with simplicity and speed.\nIn addition, QuestDB is resource efficient (comparatively cheaper than other projects to run in cloud environments), \nsimple to install, manage, and use, and stable in all [production environments](https://questdb.io/customers/).   \n\nCombining MindsDB and QuestDB gives you unbounded prediction ability with SQL. You can perform all the pre-processing \nof your data inside QuestDB using its powerful and [unique extended SQL](https://questdb.io/docs/concept/sql-extensions), \nand then you can access these data from MindsDB, in its own [also unique SQL](https://docs.mindsdb.com/sql/), to produce \npowerful ML models.\n\nThe main goal of this article is to gently introduce these two deep technologies and give you enough \nunderstanding to be able to undertake very ambitious ML projects. To that end we will:\n\n- Build a Docker image and spawn a container to run **MindsDB** and **QuestDB** together.\n- Add **QuestDB** as a datasource to **MindsDB** using a SQL statement.\n- Create a table and add data for a simple ML use case using **QuestDB**'s web console.\n- Connect to **MindsDB** using its web console to write some SQL.\n- Create a predictor for our ML use case.\n- Make some predictions about our data.\n\nHave fun!\n\n## Requirements\n\n[docker](https://docs.docker.com/) is required to create an image and run our container. 
\n\nSoftware repositories in case you are inclined to look under the hood (**Give us a star!**):\n- MindsDB: [https://github.com/mindsdb/mindsdb](https://github.com/mindsdb/mindsdb).\n- QuestDB: [https://github.com/questdb/questdb](https://github.com/questdb/questdb).\n\n## Running our Docker application\n\nBuild the image first, with command:\n\n```shell\ndocker build -t questdb/mindsdb:latest .\n```\n\nwhich allows us to start our service container with command:\n\n```shell\ndocker run --rm \\\n    -p 8812:8812 \\\n    -p 9009:9009 \\\n    -p 9000:9000 \\\n    -p 8888:8888 \\\n    -p 47334:47334 \\\n    -p 47335:47335 \\\n    -d \\\n    --name qmdb \\\n    questdb/mindsdb:latest\n```\n\nThe container is run as user `quest`. It takes about 10 seconds to become responsive, logs can be followed in the terminal:\n\n```shell\ndocker logs -f qmdb\n...\nhttp API: starting...\nmysql API: starting...\nmongodb API: starting...\n...\nmongodb API: started on 47336\nmysql API: started on 47335\nhttp API: started on 47334\n```\n\nThe container has these mount points:\n\n- **/home/quest**: User home dir.\n- **~/questdb/**:  QuestDB's root directory.\n- **~/questdb/db/**:  QuestDB's data root directory.\n- **~/backups/**: Directory for backups.\n- **~/csv/**: Directory for COPY operation.\n- **~/mindsdb/storage/**: MindsDB's data root directory.\n\nTo manage it as root:\n\n```shell\ndocker run --rm -it --name qmdb-cli -u 0 questdb/mindsdb:latest bash\n```\n\n## Adding data to QuestDB\n\nWe can access QuestDB's web console at [localhost:9000](http://localhost:9000):\n\n![QuestDB_web_console](images/questdb_web_console.png)\n\nand execute this DDL query to create a simple table (copy this query to the web console, select it and click `Run`):\n\n```sql\nCREATE TABLE IF NOT EXISTS house_rentals_data (\n    number_of_rooms INT,\n    number_of_bathrooms INT,\n    sqft INT,\n    location SYMBOL,\n    days_on_market INT,\n    initial_price FLOAT,\n    neighborhood SYMBOL,\n    
rental_price FLOAT,\n    ts TIMESTAMP\n) TIMESTAMP(ts) PARTITION BY YEAR;\n```\n\nWe can upload data from a [local CSV file](./sample_house_rentals_data.csv) to QuestDB:\n\n```shell\ncurl -F data=@sample_house_rentals_data.csv \"http://localhost:9000/imp?forceHeader=true\u0026name=house_rentals_data\"\n```\n\nMore information is available [here](https://questdb.io/docs/develop/insert-data#rest-api).\n\nWe could equally populate table `house_rentals_data` with random data ([excellent tutorial on this](https://questdb.io/tutorial/2022/03/14/mock-sql-timeseries-data-questdb/)):\n\n```sql\nINSERT INTO house_rentals_data SELECT * FROM (\n    SELECT \n        rnd_int(1,6,0),\n        rnd_int(1,3,0),\n        rnd_int(180,2000,0),\n        rnd_symbol('great', 'good', 'poor'),\n        rnd_int(1,20,0),\n        rnd_float(0) * 1000,\n        rnd_symbol('alcatraz_ave', 'berkeley_hills', 'downtown', 'south_side', 'thowsand_oaks', 'westbrae'),\n        rnd_float(0) * 1000 + 500,\n        timestamp_sequence(\n            to_timestamp('2021-01-01', 'yyyy-MM-dd'),\n            14400000000L\n        )\n    FROM long_sequence(100)\n);\n```\n\nEither way, this gives us 100 data points, one every 4 hours, from 2021-01-01T00:00:00.000000Z to 2021-01-17T12:00:00.000000Z (QuestDB's timestamps \nare UTC with microsecond precision), conveniently downloaded to file [sample_house_rentals_data.csv](sample_house_rentals_data.csv).\n \nNOTE: If you ran this query after the curl upload, the table now holds 200 rows. Data in QuestDB are immutable, so to start over you can `truncate table house_rentals_data` and run the curl \ncommand again.\n\n## Connecting to MindsDB\n\nWe can access MindsDB's web console at [localhost:47334](http://localhost:47334): \n\n![MindsDB_web_console](images/mindsdb_web_console.png)\n\nOnly two databases are relevant to us, **questdb** and **mindsdb**:\n\n```sql\nSHOW DATABASES;\n+--------------------+\n| Database           |\n+--------------------+\n| mindsdb            |\n| files              |\n| questdb            
|\n+--------------------+\n5 rows in set (0.34 sec) \n```\n\nTo see `questdb` as a database we need to add it by executing:\n  \n```sql\nCREATE DATABASE questdb\n    WITH ENGINE = \"questdb\",\n    PARAMETERS = {\n        \"user\": \"admin\",\n        \"password\": \"quest\",\n        \"host\": \"0.0.0.0\",\n        \"port\": \"8812\",\n        \"database\": \"questdb\"\n    };\n```\n\n### questdb \n\nThis is a read-only view on our QuestDB instance. We can query it leveraging the full power of \nQuestDB's unique SQL syntax because statements are sent from MindsDB to QuestDB without interpreting \nthem. It only works for *SELECT* statements: \n\n```sql\nSELECT * FROM questdb (\n    SELECT\n        ts,\n        neighborhood, \n        sum(days_on_market) DaysLive,\n        min(rental_price) MinRent,\n        max(rental_price) MaxRent,\n        avg(rental_price) AvgRent\n    FROM house_rentals_data\n    WHERE ts BETWEEN '2021-01-08' AND '2021-01-10'\n    SAMPLE BY 1d FILL (0, 0, 0, 0)\n);\n+--------------+----------------+----------+----------+----------+--------------------+\n| ts           | neighborhood   | DaysLive | MinRent  | MaxRent  | AvgRent            |\n+--------------+----------------+----------+----------+----------+--------------------+\n| 1610064000.0 | south_side     | 19       | 1285.338 | 1285.338 | 1285.338134765625  |\n| 1610064000.0 | downtown       | 7        | 1047.14  | 1047.14  | 1047.1396484375    |\n| 1610064000.0 | berkeley_hills | 17       | 727.52   | 727.52   | 727.5198974609375  |\n| 1610064000.0 | westbrae       | 36       | 1038.358 | 1047.342 | 1042.85009765625   |\n| 1610064000.0 | thowsand_oaks  | 5        | 1067.319 | 1067.319 | 1067.318603515625  |\n| 1610064000.0 | alcatraz_ave   | 0        | 0.0      | 0.0      | 0.0                |\n| 1610150400.0 | south_side     | 10       | 694.403  | 694.403  | 694.4031982421875  |\n| 1610150400.0 | downtown       | 16       | 546.798  | 643.204  | 595.0011291503906  |\n| 1610150400.0 | 
berkeley_hills | 4        | 1256.49  | 1256.49  | 1256.4903564453125 |\n| 1610150400.0 | westbrae       | 0        | 0.0      | 0.0      | 0.0                |\n| 1610150400.0 | thowsand_oaks  | 0        | 0.0      | 0.0      | 0.0                |\n| 1610150400.0 | alcatraz_ave   | 14       | 653.924  | 1250.477 | 952.2005004882812  |\n| 1610236800.0 | south_side     | 0        | 0.0      | 0.0      | 0.0                |\n| 1610236800.0 | downtown       | 9        | 1357.916 | 1357.916 | 1357.9158935546875 |\n| 1610236800.0 | berkeley_hills | 0        | 0.0      | 0.0      | 0.0                |\n| 1610236800.0 | westbrae       | 0        | 0.0      | 0.0      | 0.0                |\n| 1610236800.0 | thowsand_oaks  | 0        | 0.0      | 0.0      | 0.0                |\n| 1610236800.0 | alcatraz_ave   | 0        | 0.0      | 0.0      | 0.0                |\n+--------------+----------------+----------+----------+----------+--------------------+\n```\n  \nBeyond SELECT statements, for instance when we need to save the results of a query into a new table,\nwe need to use QuestDB's web console available at [localhost:9000](http://localhost:9000):\n\n```sql\nCREATE TABLE sample_query_results AS (\n    SELECT\n        ts,\n        neighborhood, \n        sum(days_on_market) DaysLive,\n        min(rental_price) MinRent,\n        max(rental_price) MaxRent,\n        avg(rental_price) AvgRent\n    FROM house_rentals_data\n    WHERE ts BETWEEN '2021-01-08' AND '2021-01-10'\n    SAMPLE BY 1d FILL (0, 0, 0, 0)\n) TIMESTAMP(ts) PARTITION BY MONTH;\n```\n\n### mindsdb\n\nContains the metadata tables necessary to create ML models:\n\n```sql\nUSE mindsdb;\nSHOW TABLES;\n+-------------------+\n| Tables_in_mindsdb |\n+-------------------+\n| models            |  \n| models_versions   |\n+-------------------+\n```\n\n## Creating a predictor model\n\nWe can create a predictor model `mindsdb.home_rentals_model_ts` to predict the `rental_price` \nfor a `neighborhood` considering the 
past 20 rows (`WINDOW 20`), and no additional features:\n\n```sql\nCREATE MODEL mindsdb.home_rentals_model_ts FROM questdb (\n    SELECT\n        neighborhood,\n        rental_price,\n        ts\n    FROM house_rentals_data\n) \nPREDICT rental_price ORDER BY ts GROUP BY neighborhood\nWINDOW 20 HORIZON 1;\n```\n\nThis triggers MindsDB to create/train the model based on the full data available from QuestDB's table \n`house_rentals_data` (100 rows) as a timeseries on column `ts`.\n\nYou can see the progress by monitoring the log output of the `qmdb` Docker container. Creating/training a \nmodel will take time proportional to the number of features, i.e. the cardinality of the source table as defined \nin the inner SELECT of the CREATE MODEL statement, and the size of the corpus, i.e. the number of rows. The \nmodel is a table in MindsDB:\n\n```sql\nSHOW TABLES;\n+-----------------------+\n| Tables_in_mindsdb     |\n+-----------------------+\n| information_schema    |    \n| models                |\n| model_versions        |\n| home_rentals_model_ts |\n+-----------------------+\n```\n\n## Describe the predictor model\n\nBy executing the DESCRIBE statement, we can get more information about the trained model, such as how the accuracy was calculated or which columns are important for the model.\n\n```sql\nDESCRIBE MODEL mindsdb.home_rentals_model_ts;\n*************************** 1. 
row ***************************\n        accuracies: {'complementary_smape_array_accuracy':0.859}\n           outputs: ['rental_price']\n            inputs: ['neighborhood', 'ts', '__mdb_ts_previous_rental_price']\n        datasource: home_rentals_model_ts\n             model: encoders --\u003e dtype_dict --\u003e dependency_dict --\u003e model --\u003e problem_definition --\u003e identifiers --\u003e imputers --\u003e accuracy_functions\n```\n\nOr, to see how the model encoded the data prior to training we can execute:\n\n```sql\nDESCRIBE MODEL mindsdb.home_rentals_model_ts.features;\n+--------------+-------------+------------------+---------+\n| column       | type        | encoder          | role    |\n+--------------+-------------+------------------+---------+\n| neighborhood | categorical | OneHotEncoder    | feature |\n| rental_price | float       | TsNumericEncoder | target  |\n| ts           | datetime    | ArrayEncoder     | feature |\n+--------------+-------------+------------------+---------+\n```\n\nAdditional information about the models and how they can be customized can be found on the [Lightwood docs](https://lightwood.io/).\n\n## Querying MindsDB for predictions\n\nThe latest `rental_price` value per `neighborhood` in table `questdb.house_rentals_data` \n(as per the [uploaded data](sample_house_rentals_data.csv)) can be obtained directly from QuestDB\nexecuting query:\n\n\n```sql\nSELECT * FROM questdb (\n    SELECT \n        neighborhood, \n        rental_price, \n        ts \n    FROM house_rentals_data \n    LATEST BY neighborhood\n);\n+----------------+--------------+--------------+\n| neighborhood   | rental_price | ts           |\n+----------------+--------------+--------------+\n| thowsand_oaks  | 1150.427     | 1610712000.0 |   (2021-01-15 12:00:00.0)\n| south_side     | 726.953      | 1610784000.0 |   (2021-01-16 08:00:00.0)\n| downtown       | 568.73       | 1610798400.0 |   (2021-01-16 12:00:00.0)\n| westbrae       | 543.83       | 
1610841600.0 |   (2021-01-17 00:00:00.0)\n| berkeley_hills | 559.928      | 1610870400.0 |   (2021-01-17 08:00:00.0)\n| alcatraz_ave   | 1268.529     | 1610884800.0 |   (2021-01-17 12:00:00.0)\n+----------------+--------------+--------------+\n```\n\nTo predict the next value:\n\n```sql\nSELECT \n    tb.ts,\n    tb.neighborhood,\n    tb.rental_price as predicted_rental_price,\n    tb.rental_price_explain as explanation\nFROM questdb.house_rentals_data AS ta\nJOIN mindsdb.home_rentals_model_ts AS tb\nWHERE ta.ts \u003e LATEST;\n+---------------------+----------------+------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+\n| ts                  | neighborhood   | predicted_rental_price | explanation                                                                                                                                                                              |\n+---------------------+----------------+------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+\n| 2021-01-17 00:00:00 | downtown       |      877.3007391233444 | {\"predicted_value\": 877.3007391233444, \"confidence\": 0.9991, \"anomaly\": null, \"truth\": null, \"confidence_lower_bound\": 379.43294697022424, \"confidence_upper_bound\": 1375.1685312764646} |\n| 2021-01-19 08:00:00 | westbrae       |      923.1387395936794 | {\"predicted_value\": 923.1387395936794, \"confidence\": 0.9991, \"anomaly\": null, \"truth\": null, \"confidence_lower_bound\": 385.8327438509463, \"confidence_upper_bound\": 1460.4447353364124}  |\n| 2021-01-15 16:00:00 | thowsand_oaks  |      1418.678199780345 | {\"predicted_value\": 1418.678199780345, \"confidence\": 0.9991, \"anomaly\": null, \"truth\": null, 
\"confidence_lower_bound\": 1335.4600013965369, \"confidence_upper_bound\": 1501.8963981641532} |\n| 2021-01-17 12:00:00 | berkeley_hills |      646.5979284300436 | {\"predicted_value\": 646.5979284300436, \"confidence\": 0.9991, \"anomaly\": null, \"truth\": null, \"confidence_lower_bound\": 303.253838410034, \"confidence_upper_bound\": 989.9420184500532}    |\n| 2021-01-18 12:00:00 | south_side     |       1422.69481363723 | {\"predicted_value\": 1422.69481363723, \"confidence\": 0.9991, \"anomaly\": null, \"truth\": null, \"confidence_lower_bound\": 129.97617491441304, \"confidence_upper_bound\": 2715.413452360047}   |\n| 2021-01-18 04:00:00 | alcatraz_ave   |      1305.009073065412 | {\"predicted_value\": 1305.009073065412, \"confidence\": 0.9991, \"anomaly\": null, \"truth\": null, \"confidence_lower_bound\": 879.0232742685288, \"confidence_upper_bound\": 1730.994871862295}   |\n+---------------------+----------------+------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+\n```\n\n\n# Summary\n\nIn this article, we have introduced **QuestDB** and **MindsDB** in a hands-on approach. QuestDB can help you store, \nanalyse, and transform timeseries data, while MindsDB can help you make predictions about it. Albeit simple, our use case \nshould have lowered the entry barrier to these two deep technologies, and now you can deepen your knowledge further by\nundertaking more ambitious ML projects. 
\n\n**Thank you for getting this far!!!**, if you liked this content we'd love to know your thoughts, please come and say \nhello in our welcoming communities: \n\n- [QuestDB Community Slack](https://slack.questdb.io/).\n- [MindsDB Community Slack](https://mindsdbcommunity.slack.com/join/shared_invite/zt-o8mrmx3l-5ai~5H66s6wlxFfBMVI6wQ#/shared-invite/email).\n\nFurther reading:\n\n- [QuestDB documentation](https://questdb.io/docs/introduction/).\n- [MindsDB documentation](https://docs.mindsdb.com/).\n\nSee you soon!\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fquestdb%2Fmindsdb-tutorial","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fquestdb%2Fmindsdb-tutorial","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fquestdb%2Fmindsdb-tutorial/lists"}