https://github.com/1biot/fiquela
json json-parser neon parser php query sql xml xml-parser yaml yaml-parser
- Host: GitHub
- URL: https://github.com/1biot/fiquela
- Owner: 1biot
- License: mit
- Created: 2024-12-19T09:03:30.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2025-01-28T11:07:29.000Z (about 1 year ago)
- Last Synced: 2025-01-28T12:22:14.380Z (about 1 year ago)
- Topics: json, json-parser, neon, parser, php, query, sql, xml, xml-parser, yaml, yaml-parser
- Language: PHP
- Homepage:
- Size: 5.26 MB
- Stars: 2
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE
# FiQueLa: File Query Language
> _[fi-kju-ela]_

**F**i**Q**ue**L**a lets you query files like a database, with SQL-like syntax or a fluent PHP API.
Filter, join, group, aggregate, and transform data from **XML**, **XLS**, **ODS**, **CSV**, **JSON**, **NDJSON**,
**YAML**, **NEON**, and **HTTP access logs** without setting up a database. It is built for real-world data processing
with streaming support, explain/debug tooling, and strongly typed operators.
**Features**:
- 📂 **Supports multiple formats**: Work seamlessly with XML, CSV, JSON, NDJSON, YAML, NEON, XLS, ODS, and HTTP access logs.
- 🛠️ **SQL-inspired syntax**: Perform `SELECT`, `JOIN`, `WHERE`, `GROUP BY`, `HAVING`, `ORDER BY` and more.
- ✍️ **Flexible querying**: Write SQL-like strings or use the fluent API.
- 📊 **Powerful expressions and functions**: Use `CASE WHEN`, `IF`, grouped conditions, `XOR`, `REGEXP`, aggregates, and utility functions.
- 🚀 **Stream-first processing**: Optimized for large JSON, XML, and CSV files with low memory pressure where possible.
- 🧑‍💻 **Developer-friendly**: Map results to DTOs for easier data manipulation.
- ⭐ **Unified API across all supported formats**: Use a consistent API for all your data needs.
### Why FiQueLa
- Query files with familiar SQL concepts while keeping everything in PHP.
- Join data across sources and formats in one query.
- Handle advanced logic with nested condition groups and statement functions.
- Inspect execution using explain output and debugger tooling.
**Table of Contents**:
- [Overview](#1-overview)
- [Installation](#2-installation)
- [Supported Formats](#3-supported-formats)
- [Getting Started](#4-getting-started)
- [Documentation](#5-documentation)
  - [Opening Files](docs/opening-files.md)
  - [Fluent API](docs/fluent-api.md)
  - [File Query Language](docs/file-query-language.md)
  - [Fetching Data](docs/fetching-data.md)
  - [Query Life Cycle](docs/query-life-cycle.md)
  - [Query Inspection and Benchmarking](docs/query-inspection-and-benchmarking.md)
  - [API Reference](docs/api-reference.md)
- [Examples](#6-examples)
- [Ecosystem](#7-ecosystem)
  - [FiQueLa CLI](#fiquela-cli)
  - [FiQueLa API](#fiquela-api)
  - [FiQueLa Studio](#fiquela-studio)
- [Known issues](#8-known-issues)
- [Roadmap](#9-roadmap)
- [Contributions](#10-contributions)
## 1. Overview
Why limit SQL to databases when it can be just as effective for structured files?
**F**i**Q**ue**L**a brings SQL-like querying to file-based data and keeps your workflow fully in PHP.
Key highlights:
- **Universal querying**: Filter, sort, join, and aggregate data across multiple file formats.
- **Real SQL-like behavior**: Use `GROUP BY`, `HAVING`, nested conditions, `CASE WHEN`, `IF`, and many built-in functions.
- **Flexible integration**: Query through the fluent API or SQL-like strings, whichever matches your use case.
- **Operational tooling**: Use debugger and explain plans to understand performance and execution.
Use **F**i**Q**ue**L**a to:
- Simplify data extraction and analysis from structured files.
- Combine data from multiple sources with ease.
- Create lightweight data processing pipelines without a full-fledged database.
**F**i**Q**ue**L**a empowers developers to unlock the potential of file-based data with the familiar and expressive language of SQL.
## 2. Installation
Install via [Composer](https://getcomposer.org/):
```bash
composer require 1biot/fiquela
```
Install packages for optional features:
```bash
composer require tracy/tracy
```
### Dependencies
- **`league/csv`**: Required for CSV file support.
- **`halaxa/json-machine`**: Required for JSON stream support.
- **`symfony/yaml`**: Required for YAML file support.
- **`nette/neon`**: Required for NEON file support.
- **`openspout/openspout`**: Required for XLSX and ODS file support.
- **`tracy/tracy`**: Optional; needed only for the debugger.
## 3. Supported Formats
| Format | Name | Class | File Support | String Support |
|-------------|-------------------------|-------------------------|--------------|----------------|
| `csv` | CSV | `FQL\Stream\Csv` | ✅ | ❌ |
| `xml` | XML | `FQL\Stream\Xml` | ✅ | ❌ |
| `xls` | XLS/XLSX | `FQL\Stream\Xls` | ✅ | ❌ |
| `ods` | ODS | `FQL\Stream\Ods` | ✅ | ❌ |
| `jsonFile` | JSON Stream | `FQL\Stream\JsonStream` | ✅ | ❌ |
| `ndJson` | Newline Delimited JSON | `FQL\Stream\NDJson` | ✅ | ❌ |
| `json` | JSON (json_decode) | `FQL\Stream\Json` | ✅ | ✅ |
| `yaml` | YAML | `FQL\Stream\Yaml` | ✅ | ✅ |
| `neon` | NEON | `FQL\Stream\Neon` | ✅ | ✅ |
| `log` | HTTP Access Log | `FQL\Stream\AccessLog` | ✅ | ❌ |
### Directory provider
`FQL\Stream\Dir` is a special provider that lets you use a directory as a data source.
Queries against it can cover all files in the directory recursively.
## 4. Getting Started
Here’s a quick example of how **F**i**Q**ue**L**a can simplify your data queries:
```php
use FQL\Enum;
use FQL\Query;

// Open the XML file and iterate over the SHOP.SHOPITEM nodes
$results = Query\Provider::fromFileQuery('(./path/to/file.xml).SHOP.SHOPITEM')
    ->selectAll()
    ->where('EAN', Enum\Operator::EQUAL, '1234567891011')
    ->or('PRICE', Enum\Operator::LESS_THAN_OR_EQUAL, 200)
    ->orderBy('PRICE')->desc()
    ->limit(10)
    ->execute()
    ->fetchAll();

// Materialize the result iterator for printing
print_r(iterator_to_array($results));
```
This query returns rows that match either a specific EAN or a price threshold, sorted by price in descending order and limited to 10 records.
Or using the FQL syntax:
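```php
use FQL\Query;
// Same query as the fluent example, written as an FQL string
$query = <<<FQL
    SELECT * FROM (./path/to/file.xml).SHOP.SHOPITEM
    WHERE EAN = "1234567891011" OR PRICE <= 200
    ORDER BY PRICE DESC LIMIT 10
FQL;
// Provider::fql() is assumed to be the entry point for FQL strings; see docs/file-query-language.md
$results = Query\Provider::fql($query)
    ->execute()
    ->fetchAll();
print_r(iterator_to_array($results));
```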
Output:
```php
Array
(
    [0] => Array
        (
            [NAME] => "Product 1"
            [EAN] => "1234567891011"
            [PRICE] => 300.00
        )
    [1] => Array
        (
            [NAME] => "Product 2"
            [EAN] => "1234567891012"
            [PRICE] => 150.00
        )
    [2] => Array
        (
            [NAME] => "Product 3"
            [EAN] => "1234567891013"
            [PRICE] => 150.00
        )
    ...
)
```
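To make the semantics of the query above concrete, here is a minimal plain-PHP sketch of the same filter, sort, and limit steps applied to an in-memory array. It does not use FiQueLa and says nothing about how the library is implemented; the sample rows (including the hypothetical "Product 4") are made up for illustration.

```php
<?php
// Hypothetical sample rows standing in for the SHOPITEM nodes of the XML file.
$rows = [
    ['NAME' => 'Product 1', 'EAN' => '1234567891011', 'PRICE' => 300.00],
    ['NAME' => 'Product 2', 'EAN' => '1234567891012', 'PRICE' => 150.00],
    ['NAME' => 'Product 3', 'EAN' => '1234567891013', 'PRICE' => 150.00],
    ['NAME' => 'Product 4', 'EAN' => '1234567891014', 'PRICE' => 450.00],
];

// WHERE EAN = "1234567891011" OR PRICE <= 200
$matched = array_filter(
    $rows,
    fn (array $row): bool => $row['EAN'] === '1234567891011' || $row['PRICE'] <= 200
);

// ORDER BY PRICE DESC (usort also reindexes the filtered array)
usort($matched, fn (array $a, array $b): int => $b['PRICE'] <=> $a['PRICE']);

// LIMIT 10
$results = array_slice($matched, 0, 10);

print_r(array_column($results, 'NAME'));
```

"Product 1" is kept because of its EAN even though its price is above 200, while "Product 4" matches neither condition and is dropped.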
## 5. Documentation
For more details about **F**i**Q**ue**L**a and its capabilities, explore the documentation sections.
- [Opening Files](docs/opening-files.md)
- [Fluent API](docs/fluent-api.md)
- [File Query Language](docs/file-query-language.md)
- [Fetching Data](docs/fetching-data.md)
- [Query Life Cycle](docs/query-life-cycle.md)
- [Query Inspection and Benchmarking](docs/query-inspection-and-benchmarking.md)
- [API Reference](docs/api-reference.md)
## 6. Examples
Check the examples and run them using Composer. All examples use `\FQL\Query\Debugger` and its methods
`inspectQuery`, `inspectSql`, `inspectStreamSql`, or `benchmarkQuery` to show the results.
```bash
composer examples
# or
composer example:csv
composer example:join
composer example:json
composer example:neon
composer example:sql
composer example:xml
composer example:yaml
composer example:explain
```
See the **Examples** step in this [GitHub Actions run](https://github.com/1biot/fiquela/actions/runs/12992585648/job/36232767074), or run
`composer example:csv`; the output will look like this:
```
=========================
### Debugger started: ###
=========================
> Memory usage (MB): 1.3191 (emalloc)
> Memory peak usage (MB): 1.7326 (emalloc)
------------------------------
> Execution time (s): 8.5E-5
> Execution time (ms): 0.085
> Execution time (µs): 85
> Execution memory peak usage (MB): 0
=========================
### Inspecting query: ###
=========================
==================
### SQL query: ###
==================
> SELECT
> ean ,
> defaultCategory ,
> EXPLODE(defaultCategory, " > ") AS categoryArray ,
> price ,
> ROUND(price, 2) AS price_rounded ,
> MOD(price, 100) AS modulo_100 ,
> MOD(price, 54) AS modulo_54
> FROM csv(products-w-1250.csv, "windows-1250", ";").*
> GROUP BY defaultCategory
> ORDER BY defaultCategory DESC
================
### Results: ###
================
> Result class: FQL\Results\InMemory
> Results size memory (KB): 3.55
> Result exists: true
> Result count: 15
========================
### Fetch first row: ###
========================
array (7)
'ean' => 5010232964877
'defaultCategory' => 'Testování > Drogerie'
'categoryArray' => array (2)
| 0 => 'Testování'
| 1 => 'Drogerie'
'price' => 121.0
'price_rounded' => 121.0
'modulo_100' => 21.0
'modulo_54' => 13.0
>>> SPLIT TIME <<<
> Memory usage (MB): 3.1451 (emalloc)
> Memory peak usage (MB): 3.2262 (emalloc)
------------------------------
> Execution time (s): 0.040016
> Execution time (ms): 40.016
> Execution time (µs): 40016
> Execution memory peak usage (MB): 1.4936
========================
### Benchmark Query: ###
========================
> 2 500 iterations
==================
### SQL query: ###
==================
> SELECT
> ean ,
> defaultCategory ,
> EXPLODE(defaultCategory, " > ") AS categoryArray ,
> price ,
> ROUND(price, 2) AS price_rounded ,
> MOD(price, 100) AS modulo_100 ,
> MOD(price, 54) AS modulo_54
> FROM csv(products-w-1250.csv, "windows-1250", ";").*
> GROUP BY defaultCategory
> ORDER BY defaultCategory DESC
=========================
### STREAM BENCHMARK: ###
=========================
> Size (KB): 2.78
> Count: 15
> Iterated results: 37 500
>>> SPLIT TIME <<<
> Memory usage (MB): 3.1347 (emalloc)
> Memory peak usage (MB): 3.2262 (emalloc)
------------------------------
> Execution time (s): 36.402098
> Execution time (ms): 36402.098
> Execution time (µs): 36402098
> Execution memory peak usage (MB): 0
============================
### IN_MEMORY BENCHMARK: ###
============================
> Size (KB): 3.55
> Count: 15
> Iterated results: 37 500
>>> SPLIT TIME <<<
> Memory usage (MB): 3.1451 (emalloc)
> Memory peak usage (MB): 3.2262 (emalloc)
------------------------------
> Execution time (s): 0.01743
> Execution time (ms): 17.43
> Execution time (µs): 17430
> Execution memory peak usage (MB): 0
=======================
### Debugger ended: ###
=======================
> Memory usage (MB): 3.1343 (emalloc)
> Memory peak usage (MB): 3.2262 (emalloc)
------------------------------
> Final execution time (s): 36.459756
> Final execution time (ms): 36459.756
> Final execution time (µs): 36459756
```
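The computed columns in the "Fetch first row" section above can be cross-checked with PHP built-ins. This sketch assumes that FQL's `EXPLODE`, `ROUND`, and `MOD` behave like PHP's `explode`, `round`, and `fmod` for this row, which is consistent with the output shown but is not taken from the library's source.

```php
<?php
// Input values copied from the first fetched row in the debugger output.
$row = ['defaultCategory' => 'Testování > Drogerie', 'price' => 121.0];

$computed = [
    // EXPLODE(defaultCategory, " > ") AS categoryArray
    'categoryArray' => explode(' > ', $row['defaultCategory']),
    // ROUND(price, 2) AS price_rounded
    'price_rounded' => round($row['price'], 2),
    // MOD(price, 100) AS modulo_100
    'modulo_100' => fmod($row['price'], 100),
    // MOD(price, 54) AS modulo_54
    'modulo_54' => fmod($row['price'], 54),
];

var_dump($computed);
```

The results match the row in the output: a two-element category array, `121.0`, `21.0`, and `13.0`.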
## 7. Ecosystem
FiQueLa is more than just a PHP library. It comes with a CLI tool, a REST API server, and a web-based query explorer.
### FiQueLa CLI
[**fiquela-cli**](https://github.com/1biot/fiquela-cli) is a command-line tool for querying structured files directly from the terminal. It supports local file querying, remote API connections, and an interactive REPL mode with paginated table output.
```bash
# Install
curl -fsSL https://raw.githubusercontent.com/1biot/fiquela-cli/main/install.sh | bash
# Query a local file
fiquela-cli --file=data.csv "SELECT name, price FROM * WHERE price > 100;"
# Interactive mode
fiquela-cli --file=data.csv
```
Requires PHP 8.2+ with readline, curl, and zlib extensions.
### FiQueLa API
[**fiquela-api**](https://github.com/1biot/fiquela-api) is a RESTful server for querying structured files over HTTP. It provides file management, query execution, result export (CSV, TSV, JSON), and query history tracking with JWT authentication.
[Deploy to DigitalOcean](https://cloud.digitalocean.com/apps/new?repo=https://github.com/1biot/fiquela-api/tree/main?refcode=92025543cb9f)
Requires credentials configuration via environment variables. Optionally enable S3 backup (Cloudflare R2) for file storage. For more information visit the [repository](https://github.com/1biot/fiquela-api?tab=readme-ov-file#-credentials).
Key endpoints: `POST /api/auth/login` for JWT authentication, `POST /api/v1/query` for executing queries, `GET /api/v1/files` for file management, `GET /api/v1/export/{hash}` for downloading results. All endpoints except login require an `Authorization: Bearer <token>` header.
### FiQueLa Studio
[**studio.fiquela.io**](https://studio.fiquela.io) is a web-based visual query explorer for building and running FQL queries interactively. Requires a running [FiQueLa API](#fiquela-api) instance to connect to.
## 8. Known issues
- ⚠️ The `JOIN` and `ORDER BY` clauses are not memory efficient, because joining or sorting data requires
loading all of it into memory. This may cause memory issues with large datasets. Everything else is ⚡️ fast.
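
The difference between streamable and non-streamable operations can be illustrated with PHP generators. This is a generic sketch, not FiQueLa's implementation; `readRows`, `wherePriceBelow`, and `orderByPrice` are hypothetical helper names, and the generated rows stand in for a large file.

```php
<?php
// Hypothetical row source standing in for a large file read row by row.
function readRows(int $count): Generator
{
    for ($i = 1; $i <= $count; $i++) {
        yield ['id' => $i, 'price' => ($i * 37) % 100];
    }
}

// Filtering can stay lazy: each row is inspected once and then released.
function wherePriceBelow(iterable $rows, int $limit): Generator
{
    foreach ($rows as $row) {
        if ($row['price'] < $limit) {
            yield $row;
        }
    }
}

// Sorting cannot: the smallest row might be the last one read,
// so every row is buffered before the first result can be returned.
function orderByPrice(iterable $rows): array
{
    $buffer = is_array($rows) ? $rows : iterator_to_array($rows, false);
    usort($buffer, fn (array $a, array $b): int => $a['price'] <=> $b['price']);

    return $buffer;
}

$cheap = wherePriceBelow(readRows(1000), 10); // a generator; no row has been read yet
$sorted = orderByPrice(readRows(1000));       // all 1000 rows are held in memory at once
```

With a filter, memory use stays roughly constant regardless of input size; with a sort, it grows with the number of rows, which is why large inputs are the problem case.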
## 9. Roadmap
- [x] ~~**Operator BETWEEN**: Add operator `BETWEEN` for filtering data and add support for dates and ranges.~~
- [x] ~~**XLS/XLSX**: Add Excel file support.~~
- [x] ~~**Custom cast type**: Add support for custom cast type for `SELECT` clause.~~
- [x] ~~**Add explain method**: Add method `explain()` for explaining query execution from actual query debugger and provide more complex information about query.~~
- [x] ~~**PHPStan 8**: Fix all PHPStan 8 errors.~~
- [x] ~~**Tests**: Increase test coverage (80%+).~~
- [x] ~~**Optimize GROUP BY**: Optimize `GROUP BY` for more memory efficient data processing.~~
- [x] ~~**DELETE, UPDATE, INSERT**: Support for manipulating data in files.~~ ~~Replaced by support for exporting data to files (CSV, NDJSON, MessagePack, and more) via the `INTO` clause.~~
- [ ] **More file formats**: Add support for [MessagePack](https://msgpack.org/), [Parquet](https://parquet.apache.org/docs/file-format/), [INI](https://en.wikipedia.org/wiki/INI_file), and [TOML](https://toml.io/en/).
- [ ] **Documentation**: Create detailed guides and examples for advanced use cases.
- [ ] **Tests**: Increase test coverage (90%+).
- [ ] **Hashmap cache**: Add hashmap cache (Redis, Memcache) for more memory efficient data processing.
## 10. Contributions
If you have suggestions or would like to contribute to these features, feel free to open an issue or a pull request!
**How to contribute:**
- Fork the repository
- Create a new branch
- Make your changes
- Create a pull request
- All tests must pass
- Wait for approval
- 🚀