{"id":13671569,"url":"https://github.com/alttch/myval","last_synced_at":"2025-03-29T22:30:39.037Z","repository":{"id":153033227,"uuid":"627931720","full_name":"alttch/myval","owner":"alttch","description":"Lightweight Apache Arrow data frame for Rust","archived":false,"fork":false,"pushed_at":"2023-05-06T08:49:25.000Z","size":224,"stargazers_count":61,"open_issues_count":0,"forks_count":3,"subscribers_count":1,"default_branch":"main","last_synced_at":"2024-04-26T02:04:06.276Z","etag":null,"topics":["arrow","dataframe","ipc","rust"],"latest_commit_sha":null,"homepage":"","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/alttch.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2023-04-14T14:14:25.000Z","updated_at":"2024-04-24T06:15:02.000Z","dependencies_parsed_at":"2023-05-27T12:00:24.991Z","dependency_job_id":null,"html_url":"https://github.com/alttch/myval","commit_stats":null,"previous_names":["alttch/arrow-util"],"tags_count":6,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/alttch%2Fmyval","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/alttch%2Fmyval/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/alttch%2Fmyval/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/alttch%2Fmyval/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/alttch","download_url":"https://codeload.github.com/alttch/myval/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories
_count":246254077,"owners_count":20747946,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["arrow","dataframe","ipc","rust"],"created_at":"2024-08-02T09:01:13.498Z","updated_at":"2025-03-29T22:30:38.482Z","avatar_url":"https://github.com/alttch.png","language":"Rust","funding_links":[],"categories":["Rust"],"sub_categories":[],"readme":"\u003ch2\u003e\n  Myval - a lightweight Apache Arrow data frame for Rust\n  \u003ca href=\"https://crates.io/crates/myval\"\u003e\u003cimg alt=\"crates.io page\" src=\"https://img.shields.io/crates/v/myval.svg\"\u003e\u003c/img\u003e\u003c/a\u003e\n  \u003ca href=\"https://docs.rs/myval\"\u003e\u003cimg alt=\"docs.rs page\" src=\"https://docs.rs/myval/badge.svg\"\u003e\u003c/img\u003e\u003c/a\u003e\n\u003c/h2\u003e\n\n\u003cimg src=\"https://raw.githubusercontent.com/alttch/myval/main/media/myval_w.png\"\nwidth=\"200\" /\u003e\n\n## What is Myval?\n\nMýval (pronounced as [m'ival]) is translated from Czech as raccoon.\n\n## Why not a bear-name?\n\nThe common name for raccoon in Czech is \"medvídek mýval\" which can be\ntranslated as \"little bear\".\n\n## But there is Polars?\n\nMyval is not a competitor of Polars. 
Myval is a lightweight Arrow data frame\nfocused on in-place data transformation and IPC.\n\nBecause Arrow has a standardized data layout, data frames can be\nconverted to Polars and vice versa with zero copy:\n\n```rust,ignore\nlet polars_df = polars::frame::DataFrame::from(myval_df);\nlet myval_df = myval::DataFrame::from(polars_df);\n```\n\nLike Polars, Myval is based on [arrow2](https://crates.io/crates/arrow2).\n\n## Some tricks\n\n### IPC\n\nSuppose an Arrow stream block (Schema+Chunk) has been received, e.g. via RPC\nor Pub/Sub. Convert the block into a Myval data frame:\n\n```rust,ignore\nlet df = myval::DataFrame::from_ipc_block(\u0026buf).unwrap();\n```\n\nNeed to send a data frame back? Convert it to an Arrow stream block with a single\nline of code:\n\n```rust,ignore\nlet buf = df.into_ipc_block().unwrap();\n```\n\nNeed to send the data in slices? No problem: there are methods that return\nsliced series, sliced data frames or IPC chunks.\n\n### Overriding data types\n\nSuppose there is an i64 column \"time\" that contains nanosecond timestamps.\nLet us override its data type:\n\n```rust,ignore\nuse myval::{DataType, TimeUnit};\n\ndf.set_data_type(\"time\",\n    DataType::Timestamp(TimeUnit::Nanosecond, None)).unwrap();\n```\n\n### Parsing numbers from strings\n\nSuppose there is a utf8 column \"value\" that should be parsed to floats:\n\n```rust,ignore\ndf.parse::\u003cf64\u003e(\"value\").unwrap();\n```\n\n### Basic in-place math\n\n```rust,ignore\ndf.add(\"col\", 1_000i64).unwrap();\ndf.sub(\"col\", 1_000i64).unwrap();\ndf.mul(\"col\", 1_000i64).unwrap();\ndf.div(\"col\", 1_000i64).unwrap();\n```\n\n### Custom in-place transformations\n\n```rust,ignore\ndf.apply(\"time\", |time| time.map(|t: i64| t / 1_000)).unwrap();\n```\n\n### Horizontal join\n\n```rust,ignore\ndf.join(df2).unwrap();\n```\n\n### Concatenation\n\n```rust,ignore\nlet merged = myval::concat(\u0026[\u0026df1, \u0026df2, \u0026df3]).unwrap();\n```\n\n### Set 
column ordering\n\nSuppose a Myval data frame with columns \"voltage\", \"temp1\", \"temp2\",\n\"temp3\" has received its data from a server column by column, in random\norder. Let us restore the expected ordering:\n\n```rust,ignore\ndf.set_ordering(\u0026[\"voltage\", \"temp1\", \"temp2\", \"temp3\"]);\n```\n\n### From/to JSON\n\nMyval data frames can be parsed from a\n[serde_json](https://crates.io/crates/serde_json) Value (map only) or converted\nto a Value (map/array). This requires the \"json\" crate feature:\n\n```rust,ignore\nuse myval::DataType;\n\n// create Object value from a data frame, converted to serde_json::Map\nlet val = serde_json::Value::Object(df.to_json_map().unwrap());\n// define JSON parser\nlet mut parser = myval::convert::json::Parser::new()\n    .with_type_mapping(\"name\", DataType::LargeUtf8);\n// add more columns if required\nparser = parser.with_type_mapping(\"time\", DataType::Int64);\nparser = parser.with_type_mapping(\"status\", DataType::Int32);\nlet parsed_df = parser.parse_value(val).unwrap();\n```\n\n* Some data types (e.g. Timestamp) cannot be parsed correctly from Value\nobjects; use DataFrame methods to convert such columns to the required types.\n\n* If a column is defined in a json::Parser object but missing from the Value, it is\ncreated null-filled.\n\n### Others\n\nCheck the documentation: \u003chttps://docs.rs/myval\u003e\n\n## Working with databases\n\nArrow provides several ways to work with databases. 
Myval additionally provides\ntools for working with PostgreSQL databases via the popular\n[sqlx](https://crates.io/crates/sqlx) crate (the \"postgres\" feature must be\nenabled):\n\n### Fetching data from a database\n\n```rust,ignore\nuse futures::stream::TryStreamExt;\nuse sqlx::postgres::PgPoolOptions;\n\nlet pool = PgPoolOptions::new()\n    .connect(\"postgres://postgres:welcome@localhost/postgres\")\n    .await.unwrap();\nlet max_size = 100_000;\nlet mut stream = myval::db::postgres::fetch(\n    \"select * from test\".to_owned(), Some(max_size), pool.clone());\n// the stream yields data frames one by one; each data frame is at most\n// max_size bytes\nwhile let Some(df) = stream.try_next().await.unwrap() {\n    // do some stuff\n}\n```\n\nWhy does the stream object require a PgPool? There is one important reason: such\nstream objects are static and can be stored anywhere, e.g. used as cursors in a\nclient-server architecture.\n\n### Pushing data into a database\n\n#### Server\n\n```rust,ignore\nlet df = DataFrame::from_ipc_block(payload).unwrap();\n// The first received data frame must have a \"database\" field in its schema\n// metadata. 
Subsequent data frames may omit it.\nif let Some(dbparams) = df.metadata().get(\"database\") {\n    let params: myval::db::postgres::Params = serde_json::from_str(dbparams)\n        .unwrap();\n    let processed_rows: usize = myval::db::postgres::push(\u0026df, \u0026params,\n        \u0026pool).await.unwrap();\n}\n```\n\n#### Client\n\nLet us push a Polars data frame into a PostgreSQL database:\n\n```rust,ignore\nuse serde_json::json;\n\nlet mut df = myval::DataFrame::from(polars_df);\ndf.metadata_mut().insert(\n    // set \"database\" metadata field\n    \"database\".to_owned(),\n    serde_json::to_string(\u0026json!({\n        // table, required\n        \"table\": \"test\",\n        // PostgreSQL schema, optional\n        \"postgres\": { \"schema\": \"public\" },\n        // keys, required if the table has keys/unique indexes\n        \"keys\": [\"id\"],\n        // some field parameters\n        \"fields\": {\n            // another way to declare a key field\n            //\"id\": { \"key\": true },\n            // the following data frame columns contain strings which must be\n            // sent to the database as JSON (for json/jsonb PostgreSQL types)\n            \"data1\": { \"json\": true },\n            \"data2\": { \"json\": true }\n        }\n    })).unwrap(),\n);\n// send the data frame to the server in one or multiple chunks/blocks\n```\n\n#### PostgreSQL types supported\n\n* BOOL, INT2 (16-bit int), INT4 (32-bit int), INT8 (64-bit int), FLOAT4 (32-bit\nfloat), FLOAT8 (64-bit float)\n\n* TIMESTAMP, TIMESTAMPTZ (time zone information is discarded as Arrow arrays\ncannot have different time zones for individual records)\n\n* CHAR, VARCHAR\n\n* JSON/JSONB (encoded to strings as LargeUtf8 when fetched)\n\n## General limitations\n\n* Myval is not designed for data engineering. Use Polars for that.\n\n* Myval series can contain a single chunk only and there are no plans to extend\nthis. 
When a Polars data frame with multiple chunks is converted to Myval, the\nchunks are automatically aggregated.\n\n* Some features (conversion to Polars, PostgreSQL) are experimental; use them at\nyour own risk.\n\n## About\n\nMyval is part of the [EVA ICS Machine Learning\nkit](https://info.bma.ai/en/actual/eva-mlkit/index.html) developed by [Bohemia\nAutomation](https://www.bohemia-automation.com).\n\n[Bohemia Automation](https://www.bohemia-automation.com) /\n[Altertech](https://www.altertech.com) is a group of companies with 15+ years\nof experience in enterprise automation and industrial IoT. Our setups\ninclude power plants, factories and urban infrastructure. The largest of them have\n1M+ sensors and controlled devices, and the bar rises higher and higher every\nday.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Falttch%2Fmyval","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Falttch%2Fmyval","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Falttch%2Fmyval/lists"}