{"id":35252173,"url":"https://github.com/cool-japan/pandrs","last_synced_at":"2026-04-04T12:59:13.140Z","repository":{"id":285922568,"uuid":"959743699","full_name":"cool-japan/pandrs","owner":"cool-japan","description":"DataFrame library for data analysis implemented in Rust. It has features and design inspired by Python's pandas library, combining fast data processing with type safety.","archived":false,"fork":false,"pushed_at":"2026-03-27T14:24:52.000Z","size":2663,"stargazers_count":10,"open_issues_count":3,"forks_count":2,"subscribers_count":2,"default_branch":"master","last_synced_at":"2026-04-04T12:59:03.759Z","etag":null,"topics":["data-analysis","data-science","datafrane","pandas","rust","rust-lang"],"latest_commit_sha":null,"homepage":"https://crates.io/crates/pandrs","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/cool-japan.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":".github/FUNDING.yml","license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":"SECURITY_FIX_REPORT.md","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null},"funding":{"github":"cool-japan"}},"created_at":"2025-04-03T09:28:21.000Z","updated_at":"2026-03-27T14:24:52.000Z","dependencies_parsed_at":"2025-12-30T23:05:30.101Z","dependency_job_id":null,"html_url":"https://github.com/cool-japan/pandrs","commit_stats":null,"previous_names":["cool-japan/pandrs"],"tags_count":11,"template":false,"template_full_name":null,"purl":"pkg:github/cool-japan/pandrs","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cool-japan%2Fpand
rs","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cool-japan%2Fpandrs/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cool-japan%2Fpandrs/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cool-japan%2Fpandrs/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/cool-japan","download_url":"https://codeload.github.com/cool-japan/pandrs/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cool-japan%2Fpandrs/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31400460,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-04T10:20:44.708Z","status":"ssl_error","status_checked_at":"2026-04-04T10:20:06.846Z","response_time":60,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data-analysis","data-science","datafrane","pandas","rust","rust-lang"],"created_at":"2025-12-30T06:59:20.212Z","updated_at":"2026-04-04T12:59:13.134Z","avatar_url":"https://github.com/cool-japan.png","language":"Rust","funding_links":["https://github.com/sponsors/cool-japan"],"categories":[],"sub_categories":[],"readme":"# PandRS\n\n[![Crate](https://img.shields.io/crates/v/pandrs.svg)](https://crates.io/crates/pandrs)\n[![License: 
Apache-2.0](https://img.shields.io/badge/License-Apache--2.0-blue.svg)](https://www.apache.org/licenses/LICENSE-2.0)\n[![Documentation](https://docs.rs/pandrs/badge.svg)](https://docs.rs/pandrs)\n![Tests](https://img.shields.io/badge/tests-1794%20passing-brightgreen.svg)\n\nA high-performance DataFrame library for Rust, providing a pandas-like API with advanced features including SIMD optimization, parallel processing, and distributed computing capabilities.\n\n\u003e **Version 0.3.0 - March 2026**: PandRS is under active development with ongoing quality improvements. With **1794 tests passing**, enhanced documentation, and optimized performance, PandRS delivers a robust pandas-like experience for Rust developers.\n\n## Code Quality Highlights\n\n- **Comprehensive Testing**: 1794 tests passing (nextest) plus 128 doc tests\n- **Active Development**: Ongoing improvements to error handling and code quality (622 Rust files, 198,745 lines of code)\n- **Production-Ready Error Handling**: Established error handling patterns with descriptive messages\n\n## Overview\n\nPandRS is a comprehensive data manipulation library that brings the power and familiarity of pandas to the Rust ecosystem. 
Built with performance, safety, and ease of use in mind, it provides:\n\n- **Type-safe operations** leveraging Rust's ownership system\n- **High-performance computing** through SIMD vectorization and parallel processing\n- **Memory-efficient design** with columnar storage and string pooling\n- **Comprehensive functionality** matching pandas' core features\n- **Seamless interoperability** with Python, Arrow, and various data formats\n\n## Quick Start\n\n```rust\nuse pandrs::{DataFrame, Series};\nuse std::collections::HashMap;\n\n// Create a DataFrame\nlet mut df = DataFrame::new();\ndf.add_column(\"name\".to_string(),\n    Series::from_vec(vec![\"Alice\", \"Bob\", \"Carol\"], Some(\"name\")))?;\n// Illustrative grouping key so the groupby below has a column to work on\ndf.add_column(\"department\".to_string(),\n    Series::from_vec(vec![\"Engineering\", \"Sales\", \"Engineering\"], Some(\"department\")))?;\ndf.add_column(\"age\".to_string(),\n    Series::from_vec(vec![30, 25, 35], Some(\"age\")))?;\ndf.add_column(\"salary\".to_string(),\n    Series::from_vec(vec![75000.0, 65000.0, 85000.0], Some(\"salary\")))?;\n\n// Filter rows, compute a column mean, and aggregate per department\nlet filtered = df.filter(\"age \u003e 25\")?;\nlet mean_salary = df.column(\"salary\")?.mean()?;\nlet grouped = df.groupby(vec![\"department\"])?.agg(HashMap::from([\n    (\"salary\".to_string(), vec![\"mean\", \"sum\"]),\n    (\"age\".to_string(), vec![\"max\"])\n]))?;\n```\n\n## Core Features\n\n### Data Structures\n\n- **Series**: One-dimensional labeled array capable of holding any data type\n- **DataFrame**: Two-dimensional, size-mutable, heterogeneous tabular data structure\n- **MultiIndex**: Hierarchical indexing for advanced data organization\n- **Categorical**: Memory-efficient representation for string data with limited cardinality\n\n### Data Types\n\n- Numeric: `i32`, `i64`, `f32`, `f64`, `u32`, `u64`\n- String: UTF-8 encoded with automatic string pooling\n- Boolean: Native boolean support\n- DateTime: Timezone-aware datetime with nanosecond precision\n- Categorical: Efficient storage for repeated string values\n- Missing Values: First-class `NA` support across all types\n\n### Operations\n\n#### Data Manipulation\n- Column 
addition, removal, and renaming\n- Row and column selection with boolean indexing\n- Sorting by single or multiple columns\n- Duplicate detection and removal\n- Data type conversion and casting\n\n#### Aggregation \u0026 Grouping\n- GroupBy operations with multiple aggregation functions\n- Window functions (rolling, expanding, exponentially weighted)\n- Pivot tables and cross-tabulation\n- Custom aggregation functions\n\n#### Joining \u0026 Merging\n- Inner, left, right, and outer joins\n- Merge on single or multiple keys\n- Concat operations with axis control\n- Append with automatic index alignment\n\n#### Time Series\n- DateTime indexing and slicing\n- Resampling and frequency conversion\n- Time zone handling and conversion\n- Date range generation\n- Business day calculations\n\n### Performance Optimizations\n\n#### SIMD Vectorization\n- Automatic SIMD optimization for numerical operations\n- Hand-tuned implementations for common operations\n- Support for AVX2 and AVX-512 instruction sets\n\n#### Parallel Processing\n- Multi-threaded execution for large datasets\n- Configurable thread pool sizing\n- Parallel aggregations and transformations\n- Load-balanced work distribution\n\n#### Memory Efficiency\n- Columnar storage format\n- String interning with global string pool\n- Copy-on-write semantics\n- Memory-mapped file support\n- Lazy evaluation for chain operations\n\n### I/O Capabilities\n\n#### File Formats\n- **CSV**: Fast parallel CSV reader/writer\n- **Parquet**: Apache Parquet with compression support\n- **JSON**: Both records and columnar JSON formats\n- **Excel**: XLSX/XLS read/write with multi-sheet support\n- **Arrow**: Zero-copy Arrow integration\n\n#### Cloud Storage\n- AWS S3\n- Google Cloud Storage\n- Azure Blob Storage\n- HTTP/HTTPS endpoints\n\n### Security Features\n\nEnterprise-grade security features for data protection and access control:\n\n#### Authentication \u0026 Authorization\n- **JWT (JSON Web Tokens)**: Stateless authentication with 
token validation\n- **OAuth 2.0**: Industry-standard authorization framework\n- **API Key Management**: Secure API key generation and validation\n- **Session Management**: User session tracking and lifecycle management\n\n#### Access Control\n- **Role-Based Access Control (RBAC)**: Fine-grained permission management\n- **Multi-tenancy Support**: Isolated data access per tenant\n- **Resource-level Permissions**: Control access to specific datasets and operations\n\n#### Security Monitoring\n- **Audit Logging**: Comprehensive tracking of data access and modifications\n- **Security Events**: Real-time monitoring of authentication and authorization events\n- **Compliance Support**: Features designed to meet security compliance requirements\n\nSee `examples/security_jwt_oauth_example.rs` and `examples/security_rbac_example.rs` for implementation details.\n\n### Real-Time Analytics\n\nBuilt-in analytics engine for monitoring and performance tracking:\n\n#### Metrics Collection\n- **Counters**: Track cumulative values and event counts\n- **Gauges**: Monitor current values and resource levels\n- **Histograms**: Measure distribution of values over time\n- **Timers**: Track operation durations and performance\n\n#### Operation Tracking\n- **DataFrame Operations**: Monitor query execution and data transformations\n- **Resource Monitoring**: Track memory usage, CPU utilization, and I/O operations\n- **Performance Profiling**: Identify bottlenecks and optimization opportunities\n\n#### Alert Management\n- **Threshold-based Alerts**: Trigger notifications when metrics exceed limits\n- **Custom Alert Rules**: Define complex alerting conditions\n- **Alert History**: Track and analyze past alerts\n\n#### Visualization\n- **Real-time Dashboards**: Monitor system health and performance metrics\n- **Metric Aggregation**: Combine and analyze metrics across dimensions\n- **Export Capabilities**: Export metrics to external monitoring systems\n\nSee 
`examples/analytics_dashboard_example.rs` for comprehensive usage examples.\n\n### Machine Learning\n\nAdvanced machine learning capabilities integrated with DataFrame operations:\n\n#### Supervised Learning\n- **Decision Trees**: Classification and regression with interpretable models\n- **Random Forests**: Ensemble methods for improved accuracy\n- **Gradient Boosting**: High-performance boosting algorithms\n- **Neural Networks**: Deep learning with configurable architectures\n\n#### Time Series Forecasting\n- **ARIMA Models**: AutoRegressive Integrated Moving Average\n- **Exponential Smoothing**: Trend and seasonality modeling\n- **Prophet Integration**: Facebook's forecasting library support\n- **Feature Engineering**: Automatic lag features and date components\n\n#### Model Pipeline\n- **Feature Preprocessing**: Scaling, normalization, and encoding\n- **Model Training**: Unified API for training various algorithms\n- **Cross-validation**: K-fold and time series cross-validation\n- **Hyperparameter Tuning**: Grid search and random search optimization\n\nSee `examples/ml_neural_network_example.rs`, `examples/ml_decision_tree_example.rs`,\n`examples/ml_random_forest_example.rs`, `examples/ml_gradient_boosting_example.rs`,\nand `examples/time_series_forecasting_example.rs` for detailed examples.\n\n## Installation\n\nAdd to your `Cargo.toml`:\n\n```toml\n[dependencies]\npandrs = \"0.3.0\"\n```\n\n### Feature Flags\n\nEnable additional functionality with feature flags:\n\n```toml\n[dependencies]\npandrs = { version = \"0.3.0\", features = [\"optimized\"] }\n```\n\nAvailable features:\n- **Core features:**\n  - `optimized`: Performance optimizations and SIMD\n  - `backward_compat`: Backward compatibility support\n- **Data formats:**\n  - `parquet`: Parquet file support\n  - `excel`: Excel file support\n- **Advanced features:**\n  - `distributed`: Distributed computing with DataFusion\n  - `visualization`: Plotting capabilities\n  - `streaming`: Real-time data 
processing\n  - `serving`: Model serving and deployment\n  - `scirs2`: SciRS2 scientific computing integration\n- **Experimental:**\n  - `cuda`: GPU acceleration (requires CUDA toolkit)\n  - `wasm`: WebAssembly compilation support\n  - `jit`: Just-in-time compilation\n\n## Performance Benchmarks\n\nPerformance comparison with pandas (Python) and Polars (Rust):\n\n| Operation | PandRS | Pandas | Polars | Speedup vs Pandas |\n|-----------|--------|--------|--------|-------------------|\n| CSV Read (1M rows) | 0.18s | 0.92s | 0.15s | 5.1x |\n| GroupBy Sum | 0.09s | 0.31s | 0.08s | 3.4x |\n| Join Operations | 0.21s | 0.87s | 0.19s | 4.1x |\n| String Operations | 0.14s | 1.23s | 0.16s | 8.8x |\n| Rolling Window | 0.11s | 0.43s | 0.12s | 3.9x |\n\n*Benchmarks performed on AMD Ryzen 9 5950X, 64GB RAM, NVMe SSD*\n\n## Documentation\n\n- [API Documentation](https://docs.rs/pandrs)\n- [User Guide](https://github.com/cool-japan/pandrs/wiki)\n- [Examples](https://github.com/cool-japan/pandrs/tree/master/examples)\n- [Migration from Pandas](https://github.com/cool-japan/pandrs/wiki/Migration-Guide)\n\n## Examples\n\nThe `examples/` directory contains comprehensive examples demonstrating all major features:\n\n### Data Manipulation \u0026 Analysis\n- **Basic Operations**: `groupby_example.rs`, `transform_example.rs`, `pivot_example.rs`\n- **Time Series**: `time_series_example.rs`, `time_series_forecasting_example.rs`, `datetime_accessor_example.rs`\n- **Window Operations**: `window_operations_example.rs`, `comprehensive_window_example.rs`, `dataframe_window_example.rs`\n- **Multi-Index**: `multi_index_example.rs`, `hierarchical_groupby_example.rs`, `nested_group_operations_example.rs`\n- **Categorical Data**: `categorical_example.rs`, `categorical_na_example.rs`\n\n### Machine Learning\n- **Neural Networks**: `ml_neural_network_example.rs`\n- **Decision Trees**: `ml_decision_tree_example.rs`\n- **Random Forests**: `ml_random_forest_example.rs`\n- **Gradient Boosting**: 
`ml_gradient_boosting_example.rs`\n- **ML Pipelines**: `optimized_ml_pipeline_example.rs`, `optimized_ml_feature_engineering_example.rs`\n- **Specialized ML**: `optimized_ml_clustering_example.rs`, `optimized_ml_anomaly_detection_example.rs`, `optimized_ml_dimension_reduction_example.rs`\n\n### Security \u0026 Authentication\n- **JWT \u0026 OAuth 2.0**: `security_jwt_oauth_example.rs`\n- **Role-Based Access Control**: `security_rbac_example.rs`\n\n### Real-Time Analytics\n- **Analytics Dashboard**: `analytics_dashboard_example.rs`\n\n### I/O \u0026 Data Formats\n- **CSV**: Examples integrated into basic operations\n- **Parquet**: `parquet_example.rs`, `parquet_advanced_example.rs`, `parquet_advanced_features_example.rs`\n- **Excel**: `excel_multisheet_example.rs`, `excel_advanced_features_example.rs`\n\n### Performance \u0026 Optimization\n- **SIMD \u0026 Parallel**: `parallel_example.rs`, `optimized_dataframe_example.rs`, `optimized_large_dataset_example.rs`\n- **GPU Acceleration**: `gpu_dataframe_example.rs`, `gpu_ml_example.rs`, `gpu_benchmark_example.rs`\n- **Distributed Computing**: `distributed_example.rs`, `distributed_window_example.rs`, `distributed_fault_tolerance_example.rs`\n- **JIT Compilation**: `jit_parallel_example.rs`, `jit_window_operations_example.rs`\n- **Streaming**: `streaming_example.rs`\n\n### Visualization\n- **Plotters Integration**: `visualization_plotters_example.rs`, `plotters_visualization_example.rs`, `enhanced_visualization_example.rs`\n\n### Basic Data Analysis\n\n```rust\nuse pandrs::prelude::*;\nuse std::collections::HashMap;\n\nlet df = DataFrame::read_csv(\"data.csv\", CsvReadOptions::default())?;\n\n// Basic statistics\nlet stats = df.describe()?;\nprintln!(\"Data statistics:\\n{}\", stats);\n\n// Filtering and aggregation\nlet result = df\n    .filter(\"age \u003e= 18 \u0026\u0026 income \u003e 50000\")?\n    .groupby(vec![\"city\", \"occupation\"])?\n    .agg(HashMap::from([\n        (\"income\".to_string(), vec![\"mean\", \"median\", \"std\"]),\n        
(\"age\".to_string(), vec![\"mean\"])\n    ]))?\n    .sort_values(vec![\"income_mean\"], vec![false])?;\n```\n\n### Time Series Analysis\n\n```rust\nuse pandrs::prelude::*;\nuse chrono::{Duration, Utc};\nuse std::collections::HashMap;\n\nlet mut df = DataFrame::read_csv(\"timeseries.csv\", CsvReadOptions::default())?;\ndf.set_index(\"timestamp\")?;\n\n// Resample to daily frequency\nlet daily = df.resample(\"D\")?.mean()?;\n\n// Calculate rolling statistics\nlet rolling_stats = daily\n    .rolling(RollingOptions {\n        window: 7,\n        min_periods: Some(1),\n        center: false,\n    })?\n    .agg(HashMap::from([\n        (\"value\".to_string(), vec![\"mean\", \"std\"]),\n    ]))?;\n\n// Exponentially weighted moving average\nlet ewm = daily.ewm(EwmOptions {\n    span: Some(10.0),\n    ..Default::default()\n})?;\n```\n\n### Machine Learning Pipeline\n\n```rust\nuse pandrs::prelude::*;\n\n// Load and preprocess data\nlet df = DataFrame::read_parquet(\"features.parquet\")?;\n\n// Handle missing values\nlet df_filled = df.fillna(FillNaOptions::Forward)?;\n\n// Encode categorical variables\nlet df_encoded = df_filled.get_dummies(vec![\"category1\", \"category2\"], None)?;\n\n// Normalize numerical features\nlet features = vec![\"feature1\", \"feature2\", \"feature3\"];\nlet df_normalized = df_encoded.apply_columns(\u0026features, |series| {\n    let mean = series.mean()?;\n    let std = series.std(1)?;\n    series.sub_scalar(mean)?.div_scalar(std)\n})?;\n\n// Split features and target\nlet X = df_normalized.drop(vec![\"target\"])?;\nlet y = df_normalized.column(\"target\")?;\n```\n\n## Contributing\n\nWe welcome contributions! 
Please see our [Contributing Guide](CONTRIBUTING.md) for details.\n\n### Development Setup\n\n```bash\n# Clone the repository\ngit clone https://github.com/cool-japan/pandrs\ncd pandrs\n\n# Install development dependencies\ncargo install cargo-nextest cargo-criterion\n\n# Run tests\ncargo nextest run\n\n# Run benchmarks\ncargo criterion\n\n# Check code quality\ncargo clippy -- -D warnings\ncargo fmt -- --check\n```\n\n## Sponsorship\n\nPandRS is developed and maintained by **COOLJAPAN OU (Team Kitasan)**.\n\nIf you find PandRS useful, please consider sponsoring the project to support continued development of the Pure Rust ecosystem.\n\n[![Sponsor](https://img.shields.io/badge/Sponsor-%E2%9D%A4-red?logo=github)](https://github.com/sponsors/cool-japan)\n\n**[https://github.com/sponsors/cool-japan](https://github.com/sponsors/cool-japan)**\n\nYour sponsorship helps us:\n- Maintain and improve the COOLJAPAN ecosystem\n- Keep the entire ecosystem (OxiBLAS, OxiFFT, SciRS2, etc.) 100% Pure Rust\n- Provide long-term support and security updates\n\n## License\n\nLicensed under the Apache License, Version 2.0 ([LICENSE](LICENSE) or \u003chttp://www.apache.org/licenses/LICENSE-2.0\u003e).\n\n## Acknowledgments\n\nPandRS is inspired by the excellent pandas library and incorporates ideas from:\n- [Pandas](https://pandas.pydata.org/) - API design and functionality\n- [Polars](https://www.pola.rs/) - Performance optimizations\n- [Apache Arrow](https://arrow.apache.org/) - Columnar format\n- [DataFusion](https://arrow.apache.org/datafusion/) - Query engine\n\n## Support\n\n- [Issue Tracker](https://github.com/cool-japan/pandrs/issues)\n- [Discussions](https://github.com/cool-japan/pandrs/discussions)\n- [Stack Overflow](https://stackoverflow.com/questions/tagged/pandrs)\n\n---\n\nPandRS is a COOLJAPAN project, bringing high-performance data analysis to the Rust 
ecosystem.","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcool-japan%2Fpandrs","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcool-japan%2Fpandrs","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcool-japan%2Fpandrs/lists"}