{"id":13453864,"url":"https://github.com/nshiab/simple-data-analysis","last_synced_at":"2026-02-23T17:30:25.657Z","repository":{"id":37239547,"uuid":"485817069","full_name":"nshiab/simple-data-analysis","owner":"nshiab","description":"Easy-to-use and high-performance TypeScript library for data analysis. Works with tabular, geospatial and vector data.","archived":false,"fork":false,"pushed_at":"2026-02-17T21:34:12.000Z","size":16948,"stargazers_count":329,"open_issues_count":4,"forks_count":21,"subscribers_count":8,"default_branch":"main","last_synced_at":"2026-02-17T22:40:39.848Z","etag":null,"topics":["ai","analysis","bun","data","data-analysis","data-science","deno","duckdb","geospatial","javascript","llm","machine-learning","node","node-js","nodejs","spatial","spatial-analysis","sql","typescript"],"latest_commit_sha":null,"homepage":"","language":"TypeScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/nshiab.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":"AGENTS.md","dco":null,"cla":null}},"created_at":"2022-04-26T14:16:54.000Z","updated_at":"2026-02-17T21:34:16.000Z","dependencies_parsed_at":"2024-01-18T16:46:24.510Z","dependency_job_id":"2f1dd3d9-5966-4c70-bd17-9f656cab46db","html_url":"https://github.com/nshiab/simple-data-analysis","commit_stats":{"total_commits":501,"total_committers":4,"mean_commits":125.25,"dds":0.04990019960079839,"last_synced_commit":"cd20d7ed4bae655e9114544d05d6a289a8d72b0a"},"previous_names":["nshiab/simple-data-analysis","ns
hiab/simple-data-analysis.js"],"tags_count":314,"template":false,"template_full_name":null,"purl":"pkg:github/nshiab/simple-data-analysis","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nshiab%2Fsimple-data-analysis","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nshiab%2Fsimple-data-analysis/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nshiab%2Fsimple-data-analysis/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nshiab%2Fsimple-data-analysis/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/nshiab","download_url":"https://codeload.github.com/nshiab/simple-data-analysis/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nshiab%2Fsimple-data-analysis/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29749054,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-23T07:44:07.782Z","status":"ssl_error","status_checked_at":"2026-02-23T07:44:07.432Z","response_time":90,"last_error":"SSL_read: unexpected eof while 
reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai","analysis","bun","data","data-analysis","data-science","deno","duckdb","geospatial","javascript","llm","machine-learning","node","node-js","nodejs","spatial","spatial-analysis","sql","typescript"],"created_at":"2024-07-31T08:00:48.651Z","updated_at":"2026-02-23T17:30:25.629Z","avatar_url":"https://github.com/nshiab.png","language":"TypeScript","readme":"# Simple data analysis (SDA)\n\nSDA is an easy-to-use and high-performance TypeScript library for data analysis.\nYou can use it with tabular and geospatial data.\n\nThe library is available on [JSR](https://jsr.io/@nshiab/simple-data-analysis)\nwith its [documentation](https://jsr.io/@nshiab/simple-data-analysis/doc).\n\nThe documentation is also available as the markdown file\n[llm.md](https://github.com/nshiab/simple-data-analysis/blob/main/llm.md), which\ncan be passed as context to improve the use of the library by AI coding\nassistants or agents.\n\nThe library is maintained by [Nael Shiab](http://naelshiab.com/), computational\njournalist and senior data producer for [CBC News](https://www.cbc.ca/news).\n\n\u003e [!TIP]\n\u003e To learn how to use SDA, check out\n\u003e [Code Like a Journalist](https://www.code-like-a-journalist.com/), a free and\n\u003e open-source data analysis and data visualization course in TypeScript.\n\nYou might also find the\n[journalism library](https://github.com/nshiab/journalism) interesting.\n\nIf you wish to contribute, please check 
the\n[guidelines](https://github.com/nshiab/simple-data-analysis/blob/main/CONTRIBUTING.md).\n\n## Quick setup\n\nCreate a folder and run [setup-sda](https://github.com/nshiab/setup-sda) in it\nwith:\n\n```bash\n# Deno \u003e= 2.2.x\ndeno -A jsr:@nshiab/setup-sda\n\n# Node.js \u003e= 22.6.x\nnpx setup-sda\n\n# Bun\nbunx --bun setup-sda\n```\n\nHere are the available options:\n\n- `--claude` or `--gemini` or `--copilot`: adds a `CLAUDE.md` or `GEMINI.md` or\n  `.github/copilot-instructions.md` file and extra documentation in `./docs` to\n  work efficiently with AI agents.\n- `--example`: adds example files\n- `--scrape`: adds web scraping dependencies\n- `--svelte`: adds a Svelte project\n- `--pages`: adds a GitHub Pages Actions workflow (works only with `--svelte`)\n- `--git`: initializes a git repository and commits the initial files\n- `--env`: adds a `.env` file for environment variables and loads them when\n  running\n\nYou can combine options. For example, this will install web scraping\ndependencies, set up a Svelte project with example files, initialize a git\nrepository, make a first commit, and add a GitHub Pages Actions workflow:\n\n```bash\ndeno -A jsr:@nshiab/setup-sda --scrape --svelte --example --pages --git\n```\n\n## Manual installation\n\nIf you want to add the library to an existing project, run this:\n\n```bash\n# Deno \u003e= 2.2.x\ndeno install --node-modules-dir=auto --allow-scripts=npm:playwright-chromium jsr:@nshiab/simple-data-analysis\n# To run with Deno\ndeno run -A main.ts\n\n# Node.js\nnpx jsr add @nshiab/simple-data-analysis\n\n# Bun\nbunx jsr add @nshiab/simple-data-analysis\n```\n\n## Core principles\n\nSDA was born out of the frustration of switching between Python, R, and\nJavaScript to produce data journalism projects. Usually, data crunching and\nanalysis are done with Python or R, and interactive data visualizations are\ncoded in JavaScript. However, being proficient in multiple programming languages\nis hard. 
Why can't we do everything in JS?\n\nThe missing piece in the JavaScript/TypeScript ecosystem was an easy-to-use and\nperformant library for data analysis. This is why SDA was created.\n\nThe library is based on [DuckDB](https://duckdb.org/), a fast in-process\nanalytical database. Under the hood, SDA sends SQL queries to be executed by\nDuckDB. We use [duckdb-node-neo](https://github.com/duckdb/duckdb-node-neo). For\ngeospatial computations, we rely on the\n[duckdb_spatial](https://github.com/duckdb/duckdb_spatial) extension.\n\nThe syntax and the available methods were inspired by\n[Pandas](https://github.com/pandas-dev/pandas) (Python) and the\n[Tidyverse](https://www.tidyverse.org/) (R).\n\nYou can also write your own SQL queries if you want to (check the\n[customQuery method](https://jsr.io/@nshiab/simple-data-analysis/doc/~/SimpleDB.prototype.customQuery))\nor use JavaScript to process your data (check the\n[updateWithJS method](https://jsr.io/@nshiab/simple-data-analysis/doc/~/SimpleTable.prototype.updateWithJS)).\n\nSeveral methods can also leverage LLMs (large language models). See\n[aiRowByRow](https://jsr.io/@nshiab/simple-data-analysis/doc/~/SimpleTable.prototype.aiRowByRow)\nfor cleaning, extracting, or categorizing data, and\n[aiQuery](https://jsr.io/@nshiab/simple-data-analysis/doc/~/SimpleTable.prototype.aiQuery)\nfor interacting with your data using natural language. For embeddings and\nsemantic search, have a look at\n[aiEmbeddings](https://jsr.io/@nshiab/simple-data-analysis/doc/~/SimpleTable.prototype.aiEmbeddings)\nand\n[aiVectorSimilarity](https://jsr.io/@nshiab/simple-data-analysis/doc/~/SimpleTable.prototype.aiVectorSimilarity).\n\nFeel free to start a conversation or open an issue. 
Check how you can\n[contribute](https://github.com/nshiab/simple-data-analysis/blob/main/CONTRIBUTING.md).\n\n## Performance\n\n### Tabular data\n\nTo test and compare the library's performance, we calculated the average\ntemperature per decade and city with the daily temperatures from the\n[Adjusted and Homogenized Canadian Climate Data](https://api.weather.gc.ca/collections/ahccd-annual).\nSee [this repository](https://github.com/nshiab/simple-data-analysis-benchmarks)\nfor the code.\n\nWe ran the same calculations with **simple-data-analysis** (Node.js, Bun, and\nDeno), **Pandas (Python)**, and the **tidyverse (R)**.\n\nIn each script, we:\n\n1. Loaded a CSV file (_Importing_)\n2. Selected four columns, removed rows with missing temperature, converted date\n   strings to dates and temperature strings to floats (_Cleaning_)\n3. Added a new column _decade_ and calculated the decade (_Modifying_)\n4. Calculated the average temperature per decade and city (_Summarizing_)\n5. Wrote the cleaned-up data used to compute the averages to a new CSV\n   file (_Writing_)\n\nEach script has been run ten times on a MacBook Pro (Apple M4 Max / 64 GB).\n\nWith _ahccd.csv_:\n\n- 1.7 GB\n- 773 cities\n- 20 columns\n- 22,051,025 rows\n\nThanks to DuckDB, **simple-data-analysis** is the fastest option.\n\n![A chart showing the processing duration of multiple scripts in various languages](./assets/big-file.png)\n\n### Geospatial data\n\nTo test the geospatial computation speed, we performed a spatial join to match\neach public tree in Montreal to its neighbourhood. We then counted the number of\ntrees in each neighbourhood. 
For more information, check this\n[repository](https://github.com/nshiab/simple-data-analysis-spatial-benchmarks).\n\nWith _trees.csv_:\n\n- 128 MB\n- 316,321 trees\n- 33 columns\n\nAnd _neighbourhoods.geojson_:\n\n- 991 KB\n- 91 neighbourhoods\n- 6 columns\n\nEach script has been run ten times on a MacBook Pro (Apple M4 Max / 64 GB).\n\nAs we can see, **simple-data-analysis** is also the fastest option here.\n\n![A chart showing the processing duration of multiple scripts in various languages, for geospatial computations](./assets/spatial.png)\n\nDuckDB, which powers SDA, can also be used with\n[Python](https://duckdb.org/docs/api/python/overview.html) and\n[R](https://duckdb.org/docs/api/r).\n\n## Examples\n\nIn this example, we load a CSV file with the latitude and longitude of 2023\nwildfires in Canada, create point geometries from it, do a spatial join with\nprovinces' boundaries, and then compute the number of fires and the total area\nburnt per province. We create charts and write the results to a file.\n\nIf you are using Deno, make sure to install and enable the\n[Deno extension](https://docs.deno.com/runtime/getting_started/setup_your_environment/).\n\n```ts\nimport { SimpleDB } from \"@nshiab/simple-data-analysis\";\nimport { barX, plot } from \"@observablehq/plot\";\n\n// We start a SimpleDB instance.\nconst sdb = new SimpleDB();\n\n// We create a new table\nconst fires = sdb.newTable(\"fires\");\n// We fetch the wildfires data. It's a csv.\nawait fires.loadData(\n  \"https://raw.githubusercontent.com/nshiab/simple-data-analysis/main/test/geodata/files/firesCanada2023.csv\",\n);\n// We create point geometries from the lat and lon columns\n// and we store the points in the new column geom\nawait fires.points(\"lat\", \"lon\", \"geom\");\n// We log the fires\nawait fires.logTable();\n\n// We create a new table\nconst provinces = sdb.newTable(\"provinces\");\n// We fetch the provinces' boundaries. 
It's a geojson.\nawait provinces.loadGeoData(\n  \"https://raw.githubusercontent.com/nshiab/simple-data-analysis/main/test/geodata/files/CanadianProvincesAndTerritories.json\",\n);\n// We log the provinces\nawait provinces.logTable();\n\n// We match fires with provinces\n// and we output the results into a new table.\n// By default, joinGeo will automatically look\n// for columns storing geometries in the tables,\n// do a left join, and put the results\n// in the left table. For non-spatial data,\n// you can use the method join.\nconst firesInsideProvinces = await fires.joinGeo(provinces, \"inside\", {\n  outputTable: \"firesInsideProvinces\",\n});\n\n// We summarize to count the number of fires\n// and sum up the area burnt in each province.\nawait firesInsideProvinces.summarize({\n  values: \"hectares\",\n  categories: \"nameEnglish\",\n  summaries: [\"count\", \"sum\"],\n  decimals: 0,\n});\n// We rename columns.\nawait firesInsideProvinces.renameColumns({\n  count: \"nbFires\",\n  sum: \"burntArea\",\n});\n// We want the province with\n// the greatest burnt area first.\nawait firesInsideProvinces.sort({ burntArea: \"desc\" });\n\n// We log the results. By default, the method\n// logs the first 10 rows, but there are 13\n// rows in our data. We also log the data types.\nawait firesInsideProvinces.logTable({ nbRowsToLog: 13, types: true });\n\n// We can also log a bar chart directly in the terminal...\nawait firesInsideProvinces.logBarChart(\"nameEnglish\", \"burntArea\");\n\n// ... 
or make a fancier chart\n// with Observable Plot (don't forget to install it)\n// and save it to a file.\nconst chart = (data: unknown[]) =\u003e\n  plot({\n    marginLeft: 170,\n    grid: true,\n    x: { tickFormat: (d) =\u003e `${d / 1_000_000}M`, label: \"Burnt area (ha)\" },\n    y: { label: null },\n    color: { scheme: \"Reds\" },\n    marks: [\n      barX(data, {\n        x: \"burntArea\",\n        y: \"nameEnglish\",\n        fill: \"burntArea\",\n        sort: { y: \"-x\" },\n      }),\n    ],\n  });\nawait firesInsideProvinces.writeChart(chart, \"./chart.png\");\n\n// And we can write the data to a parquet, json or csv file.\n// For geospatial data, you can use writeGeoData to\n// write geojson or geoparquet files.\nawait firesInsideProvinces.writeData(\"./firesInsideProvinces.parquet\");\n\n// We close everything.\nawait sdb.done();\n```\n\nHere's what you should see in your console if you run this script.\n\n![The console tab in VS Code showing the result of simple-data-analysis computations.](./assets/nodejs-console-with-chart.png)\n\nYou'll also find a `chart.png` file and a `firesInsideProvinces.parquet` file in\nyour folder.\n\n![A chart showing the burnt area of wildfires in Canadian provinces.](./assets/chart.png)\n\n## More on charts and maps\n\nYou can easily display charts and maps directly in the terminal with the\n[`logBarChart`](https://jsr.io/@nshiab/simple-data-analysis/doc/~/SimpleTable.prototype.logBarChart),\n[`logDotChart`](https://jsr.io/@nshiab/simple-data-analysis/doc/~/SimpleTable.prototype.logDotChart),\n[`logLineChart`](https://jsr.io/@nshiab/simple-data-analysis/doc/~/SimpleTable.prototype.logLineChart)\nand\n[`logHistogram`](https://jsr.io/@nshiab/simple-data-analysis/doc/~/SimpleTable.prototype.logHistogram)\nmethods.\n\nBut you can also save [Observable Plot](https://github.com/observablehq/plot)\ncharts as image files (`.png`, `.jpeg` or `.svg`) 
with\n[`writeChart`](https://jsr.io/@nshiab/simple-data-analysis/doc/~/SimpleTable.prototype.writeChart).\n\nHere's an example.\n\n```ts\nimport { SimpleDB } from \"@nshiab/simple-data-analysis\";\nimport { dodgeX, dot, plot } from \"@observablehq/plot\";\n\nconst sdb = new SimpleDB();\nconst table = sdb.newTable();\n\nawait table.loadData(\n  \"https://raw.githubusercontent.com/nshiab/simple-data-analysis/main/test/geodata/files/firesCanada2023.csv\",\n);\n// We keep only the fires that are larger than 1 hectare.\nawait table.filter(`hectares \u003e 1`);\n// We rename the causes.\nawait table.replace(\"cause\", { \"H\": \"Human\", \"N\": \"Natural\", \"U\": \"Unknown\" });\nawait table.logTable();\n\n// Let's create a beeswarm chart with a log scale.\n// We facet over the causes.\nconst chart = (data: unknown[]) =\u003e\n  plot({\n    height: 600,\n    width: 800,\n    color: { legend: true },\n    y: { type: \"log\", label: \"Hectares\" },\n    r: { range: [1, 20] },\n    marks: [\n      dot(\n        data,\n        dodgeX(\"middle\", {\n          fx: \"cause\",\n          y: \"hectares\",\n          fill: \"cause\",\n          r: \"hectares\",\n        }),\n      ),\n    ],\n  });\n\nconst path = \"./chart.png\";\n\nawait table.writeChart(chart, path);\n\nawait sdb.done();\n```\n\n![Beeswarm chart showing the size of wildfires in Canada in 2023.](./assets/beeswarm.png)\n\nIf you want to create [Observable Plot](https://github.com/observablehq/plot)\nmaps, you can use\n[`writeMap`](https://jsr.io/@nshiab/simple-data-analysis/doc/~/SimpleTable.prototype.writeMap).\n\nHere's an example.\n\n```ts\nimport { SimpleDB } from \"@nshiab/simple-data-analysis\";\nimport { geo, plot } from \"@observablehq/plot\";\n\nconst sdb = new SimpleDB();\nconst provinces = sdb.newTable(\"provinces\");\n\n// We fetch the Canadian provinces boundaries.\nawait provinces.loadGeoData(\n  
\"https://raw.githubusercontent.com/nshiab/simple-data-analysis/main/test/geodata/files/CanadianProvincesAndTerritories.json\",\n);\nawait provinces.logTable();\n\n// We fetch the fires.\nconst fires = sdb.newTable(\"fires\");\nawait fires.loadData(\n  \"https://raw.githubusercontent.com/nshiab/simple-data-analysis/main/test/geodata/files/firesCanada2023.csv\",\n);\n// We create a new column to store the points as geometries.\nawait fires.points(\"lat\", \"lon\", \"geom\");\n// We rename the causes, select the columns of interest,\n// and keep only fires larger than 0 hectares.\nawait fires.replace(\"cause\", { \"H\": \"Human\", \"N\": \"Natural\", \"U\": \"Unknown\" });\nawait fires.selectColumns([\"geom\", \"hectares\", \"cause\"]);\nawait fires.filter(`hectares \u003e 0`);\nawait fires.logTable();\n\n// Now, we want the provinces and the fires in the same table\n// to draw our map with the writeMap method.\n// First, we clone the provinces table.\nconst provincesAndFires = await provinces.cloneTable({\n  outputTable: \"provincesAndFires\",\n});\n// Now we can insert the fires into the provincesAndFires table.\n// By default, SDA will throw an error if the tables don't have the\n// same columns. So we set the unifyColumns option to true.\nawait provincesAndFires.insertTables(fires, { unifyColumns: true });\n// To make our lives easier, we add a column to\n// distinguish between provinces and fires.\nawait provincesAndFires.addColumn(\"isFire\", \"boolean\", `hectares \u003e 0`);\nawait provincesAndFires.logTable();\n\n// This is our function to draw the map, using the Plot library.\n// The geoData will come from our provincesAndFires table\n// as GeoJSON data. 
Each row of the table is a feature, and each\n// feature has properties matching the columns of the table.\nconst map = (geoData: {\n  features: {\n    properties: { [key: string]: unknown };\n  }[];\n}) =\u003e {\n  const fires = geoData.features.filter((d) =\u003e d.properties.isFire);\n  const provinces = geoData.features.filter((d) =\u003e !d.properties.isFire);\n\n  return plot({\n    projection: {\n      type: \"conic-conformal\",\n      rotate: [100, -60],\n      domain: geoData,\n    },\n    color: {\n      legend: true,\n    },\n    r: { range: [0.5, 25] },\n    marks: [\n      geo(provinces, {\n        stroke: \"lightgray\",\n        fill: \"whitesmoke\",\n      }),\n      geo(fires, {\n        r: \"hectares\",\n        fill: \"cause\",\n        fillOpacity: 0.25,\n        stroke: \"cause\",\n        strokeOpacity: 0.5,\n      }),\n    ],\n  });\n};\n\n// This is the path where the map will be saved.\nconst path = \"./map.png\";\n\n// Now we can call writeMap.\nawait provincesAndFires.writeMap(map, path);\n\nawait sdb.done();\n```\n\n![Map showing the wildfires in Canada in 2023.](./assets/map.png)\n\n## Caching fetched and computed data\n\nInstead of running the same code over and over again, you can cache the results.\nThis can speed up your workflow, especially when fetching data or performing\ncomputationally expensive operations.\n\nHere's the previous example adapted to cache data. For more information, check\nthe\n[cache method documentation](https://nshiab.github.io/simple-data-analysis/classes/SimpleTable.html#cache).\n\nThe data is cached in the hidden folder `.sda-cache` at the root of your code\nrepository. Make sure to add it to your `.gitignore`. 
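\n\nFor example, you can add this line to the `.gitignore` file at the root of your\nproject:\n\n```\n.sda-cache\n```\n\n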
If you want to clean your\ncache, just delete the folder.\n\nIf you set up with `setup-sda` (see _Quick setup_ at the top), `.sda-cache` is\nautomatically added to your `.gitignore` and you can use `npm run clean` or\n`bun run clean` or `deno task clean` to clear the cache.\n\n```ts\nimport { SimpleDB } from \"@nshiab/simple-data-analysis\";\n\n// We enable two options to make our lives easier.\n// cacheVerbose will log information about the cached\n// data, and logDuration will log the total duration between\n// the creation of this SimpleDB instance and its last operation.\nconst sdb = new SimpleDB({ cacheVerbose: true, logDuration: true });\n\nconst fires = sdb.newTable(\"fires\");\n\n// We cache these steps with a ttl of 60 seconds.\n// On the first run, the data will be fetched\n// and stored in the hidden folder .sda-cache.\n// If you rerun the script less than 60 seconds\n// later, the data won't be fetched but loaded\n// from the local cache. However, if you run the\n// code after 60 seconds, the data will be\n// considered outdated and fetched again.\n// After another 60 seconds, the new data in the cache will\n// expire again. This is useful when working with scraped data.\n// If you update the code passed to the cache method,\n// everything starts over.\nawait fires.cache(\n  async () =\u003e {\n    await fires.loadData(\n      \"https://raw.githubusercontent.com/nshiab/simple-data-analysis/main/test/geodata/files/firesCanada2023.csv\",\n    );\n    await fires.points(\"lat\", \"lon\", \"geom\");\n  },\n  { ttl: 60 },\n);\n\nconst provinces = sdb.newTable(\"provinces\");\n\n// Same thing here, except there is no ttl option,\n// so the cached data will never expire unless you delete\n// the hidden folder .sda-cache. 
Again, if you update\n// the code passed to the cache method, everything\n// starts over.\nawait provinces.cache(async () =\u003e {\n  await provinces.loadGeoData(\n    \"https://raw.githubusercontent.com/nshiab/simple-data-analysis/main/test/geodata/files/CanadianProvincesAndTerritories.json\",\n  );\n});\n\nconst firesInsideProvinces = sdb.newTable(\"firesInsideProvinces\");\n\n// While caching is quite useful when fetching data,\n// it's also handy for computationally expensive\n// operations like joins and summaries.\n// Since the fires table has a ttl of 60 seconds\n// and we depend on it here, we need a ttl equal\n// or lower. Otherwise, we won't work with\n// up-to-date data.\nawait firesInsideProvinces.cache(\n  async () =\u003e {\n    await fires.joinGeo(provinces, \"inside\", {\n      outputTable: \"firesInsideProvinces\",\n    });\n    await firesInsideProvinces.removeMissing();\n    await firesInsideProvinces.summarize({\n      values: \"hectares\",\n      categories: \"nameEnglish\",\n      summaries: [\"count\", \"sum\"],\n      decimals: 0,\n    });\n    await firesInsideProvinces.renameColumns({\n      count: \"nbFires\",\n      sum: \"burntArea\",\n    });\n    await firesInsideProvinces.sort({ burntArea: \"desc\" });\n  },\n  { ttl: 60 },\n);\n\nawait firesInsideProvinces.logTable({ nbRowsToLog: 13, types: true });\nawait firesInsideProvinces.logBarChart(\"nameEnglish\", \"burntArea\");\n\n// It's important to call done() at the end.\n// This method will remove the unused files\n// in the cache. It will also log the total duration\n// if the logDuration option was set to true.\nawait sdb.done();\n```\n\nAfter the first run, here's what you'll see in your terminal. For each\n`cache()`, a file storing the results has been written in `.sda-cache`.\n\nThe whole script took around a second to complete.\n\n```\nNothing in cache. Running and storing in cache.\nDuration: 311 ms. Wrote ./.sda-cache/fires.ff...68f.geojson.\n\nNothing in cache. 
Running and storing in cache.\nDuration: 397 ms. Wrote ./.sda-cache/provinces.42...55.geojson.\n\nNothing in cache. Running and storing in cache.\nDuration: 49 ms. Wrote ./.sda-cache/firesInsideProvinces.71...a8.parquet.\n\ntable firesInsideProvinces:\n┌─────────┬────────────┬─────────────────────────────┬─────────┬───────────┐\n│ (index) │ value      │ nameEnglish                 │ nbFires │ burntArea │\n├─────────┼────────────┼─────────────────────────────┼─────────┼───────────┤\n│ 0       │ 'hectares' │ 'Quebec'                    │ 706     │ 5024737   │\n│ 1       │ 'hectares' │ 'Northwest Territories'     │ 314     │ 4253907   │\n│ 2       │ 'hectares' │ 'Alberta'                   │ 1208    │ 3214444   │\n│ 3       │ 'hectares' │ 'British Columbia'          │ 2496    │ 2856625   │\n│ 4       │ 'hectares' │ 'Saskatchewan'              │ 560     │ 1801903   │\n│ 5       │ 'hectares' │ 'Ontario'                   │ 741     │ 441581    │\n│ 6       │ 'hectares' │ 'Yukon'                     │ 227     │ 395461    │\n│ 7       │ 'hectares' │ 'Manitoba'                  │ 301     │ 199200    │\n│ 8       │ 'hectares' │ 'Nova Scotia'               │ 208     │ 25017     │\n│ 9       │ 'hectares' │ 'Newfoundland and Labrador' │ 85      │ 21833     │\n│ 10      │ 'hectares' │ 'Nunavut'                   │ 1       │ 2700      │\n│ 11      │ 'hectares' │ 'New Brunswick'             │ 202     │ 854       │\n│ 12      │ 'hectares' │ null                        │ 124     │ 258       │\n└─────────┴────────────┴─────────────────────────────┴─────────┴───────────┘\n13 rows in total (nbRowsToLog: 13)\n\nSimpleDB - Done in 891 ms\n```\n\nIf you run the script less than 60 seconds after the first run, here's what\nyou'll see.\n\nThanks to caching, the script ran five times faster!\n\n```\nFound ./.sda-cache/fires.ff...8f.geojson in cache.\nttl of 60 sec has not expired. The creation date is July 5, 2024, at 4:25 p.m.. There are 11 sec, 491 ms left.\nData loaded in 151 ms. 
Running the computations took 311 ms last time. You saved 160 ms.\n\nFound ./.sda-cache/provinces.42...55.geojson in cache.\nData loaded in 8 ms. Running the computations took 397 ms last time. You saved 389 ms.\n\nFound ./.sda-cache/firesInsideProvinces.71...a8.parquet in cache.\nttl of 60 sec has not expired. The creation date is July 5, 2024, at 4:25 p.m.. There are 11 sec, 792 ms left.\nData loaded in 1 ms. Running the computations took 49 ms last time. You saved 48 ms.\n\ntable firesInsideProvinces:\n┌─────────┬────────────┬─────────────────────────────┬─────────┬───────────┐\n│ (index) │ value      │ nameEnglish                 │ nbFires │ burntArea │\n├─────────┼────────────┼─────────────────────────────┼─────────┼───────────┤\n│ 0       │ 'hectares' │ 'Quebec'                    │ 706     │ 5024737   │\n│ 1       │ 'hectares' │ 'Northwest Territories'     │ 314     │ 4253907   │\n│ 2       │ 'hectares' │ 'Alberta'                   │ 1208    │ 3214444   │\n│ 3       │ 'hectares' │ 'British Columbia'          │ 2496    │ 2856625   │\n│ 4       │ 'hectares' │ 'Saskatchewan'              │ 560     │ 1801903   │\n│ 5       │ 'hectares' │ 'Ontario'                   │ 741     │ 441581    │\n│ 6       │ 'hectares' │ 'Yukon'                     │ 227     │ 395461    │\n│ 7       │ 'hectares' │ 'Manitoba'                  │ 301     │ 199200    │\n│ 8       │ 'hectares' │ 'Nova Scotia'               │ 208     │ 25017     │\n│ 9       │ 'hectares' │ 'Newfoundland and Labrador' │ 85      │ 21833     │\n│ 10      │ 'hectares' │ 'Nunavut'                   │ 1       │ 2700      │\n│ 11      │ 'hectares' │ 'New Brunswick'             │ 202     │ 854       │\n│ 12      │ 'hectares' │ null                        │ 124     │ 258       │\n└─────────┴────────────┴─────────────────────────────┴─────────┴───────────┘\n13 rows in total (nbRowsToLog: 13)\n\nSimpleDB - Done in 184 ms / You saved 707 ms by using the cache\n```\n\nAnd if you run the script 60 seconds later, the 
fires and join/summary caches\nwill have expired, but not the provinces one. Some of the code will have run,\nbut not everything. The script still ran 1.5 times faster. This is quite handy\nin complex analyses with big datasets. The less you wait, the more fun you have!\n\n```\nFound ./.sda-cache/fires.ff...8f.geojson in cache.\nttl of 60 sec has expired. The creation date is July 5, 2024, at 4:25 p.m.. It was 4 min, 1 sec, 172 ms ago.\nRunning and storing in cache.\nDuration: 424 ms. Wrote ./.sda-cache/fires.ff...8f.geojson.\n\nFound ./.sda-cache/provinces.42...55.geojson in cache.\nData loaded in 10 ms. Running the computations took 397 ms last time. You saved 387 ms.\n\nFound ./.sda-cache/firesInsideProvinces.71...a8.parquet in cache.\nttl of 60 sec has expired. The creation date is July 5, 2024, at 4:25 p.m.. It was 4 min, 1 sec, 239 ms ago.\nRunning and storing in cache.\nDuration: 42 ms. Wrote ./.sda-cache/firesInsideProvinces.71...a8.parquet.\n\ntable firesInsideProvinces:\n┌─────────┬────────────┬─────────────────────────────┬─────────┬───────────┐\n│ (index) │ value      │ nameEnglish                 │ nbFires │ burntArea │\n├─────────┼────────────┼─────────────────────────────┼─────────┼───────────┤\n│ 0       │ 'hectares' │ 'Quebec'                    │ 706     │ 5024737   │\n│ 1       │ 'hectares' │ 'Northwest Territories'     │ 314     │ 4253907   │\n│ 2       │ 'hectares' │ 'Alberta'                   │ 1208    │ 3214444   │\n│ 3       │ 'hectares' │ 'British Columbia'          │ 2496    │ 2856625   │\n│ 4       │ 'hectares' │ 'Saskatchewan'              │ 560     │ 1801903   │\n│ 5       │ 'hectares' │ 'Ontario'                   │ 741     │ 441581    │\n│ 6       │ 'hectares' │ 'Yukon'                     │ 227     │ 395461    │\n│ 7       │ 'hectares' │ 'Manitoba'                  │ 301     │ 199200    │\n│ 8       │ 'hectares' │ 'Nova Scotia'               │ 208     │ 25017     │\n│ 9       │ 'hectares' │ 'Newfoundland and Labrador' │ 85      │ 
21833     │\n│ 10      │ 'hectares' │ 'Nunavut'                   │ 1       │ 2700      │\n│ 11      │ 'hectares' │ 'New Brunswick'             │ 202     │ 854       │\n│ 12      │ 'hectares' │ null                        │ 124     │ 258       │\n└─────────┴────────────┴─────────────────────────────┴─────────┴───────────┘\n13 rows in total (nbRowsToLog: 13)\n\nSimpleDB - Done in 594 ms / You saved 297 ms by using the cache\n```\n","funding_links":[],"categories":["Libraries Powered by DuckDB","analysis"],"sub_categories":["Web Clients"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnshiab%2Fsimple-data-analysis","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fnshiab%2Fsimple-data-analysis","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnshiab%2Fsimple-data-analysis/lists"}