https://github.com/batchlabs/charlatan
SQL-like Query Language
https://github.com/batchlabs/charlatan
Last synced: 8 months ago
JSON representation
SQL-like Query Language
- Host: GitHub
- URL: https://github.com/batchlabs/charlatan
- Owner: BatchLabs
- License: mit
- Created: 2015-07-28T14:21:43.000Z (almost 11 years ago)
- Default Branch: master
- Last Pushed: 2016-12-28T17:54:14.000Z (over 9 years ago)
- Last Synced: 2024-06-20T13:41:07.620Z (almost 2 years ago)
- Language: Go
- Homepage: https://godoc.org/github.com/BatchLabs/charlatan
- Size: 152 KB
- Stars: 55
- Watchers: 17
- Forks: 4
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# Charlatan
[](https://travis-ci.org/BatchLabs/charlatan)
**Charlatan** is a query engine for lists or streams of records in different
formats. It natively supports CSV and JSON formats but can easily be extended
to others.
It supports an SQL-like query language that is defined below. Queries are
applied to records to extract values depending on zero or more criteria.
## Query Syntax
```
SELECT FROM [ WHERE ] [ STARTING AT ] [ LIMIT [,] ]
```
- `` is a list of comma-separated field names. Each field name must
exist in the source. When reading CSV files, the field names are the column
names, while when reading JSON they represent keys.
- `` is the filename from which the data is read. The API is agnostique
on this and one can implement support for any source type.
- `` is a SQL-like value, which can be either a constant (e.g.
`WHERE 1`), a field (e.g. `WHERE archived`) or any operation using comparison
operators (`=`, `!=`, `<`, `<=`, `>`, `>=`, `AND`, `OR`) and optionally
parentheses (e.g. `WHERE (foo > 2) AND (bar = "yo")`). The parser allows to
use `&&` instead of `AND` and `||` instead of `OR`. It also support inclusive
range tests, like `WHERE age BETWEEN 20 AND 30`.
- `LIMIT N` can be used to keep only the first N matched records. It also
support the MySQL way to specify offsets: `LIMIT M, N` can be used to get the
first N matched records after the M-th.
- `STARTING AT ` can be used to skip the first N records. It’s
equivalent to the `` field of the `LIMIT` clause, and if both clauses
are used in a query, the last one will be used.
Constant values include strings, integers, floats, booleans and the `null`
value.
### Examples
```sql
SELECT CountryName FROM sample/csv/population.csv WHERE Year = 2010 AND Value > 50000000 AND Value < 70000000
SELECT name, age FROM sample/json/people.jsons WHERE stats.walking > 30 AND stats.biking < 300
SELECT name, age FROM sample/json/people.jsons WHERE stats.walking BETWEEN 20 AND 100 LIMIT 10, 5
```
### Type Coercion Rules
* int: same value if the constant is an integer. Truncated value if it’s a
float. `1` if it’s a `true` boolean. `0` for everything else.
* float: same value if the constant is an integer or a float. `1.0` if it’s a
`true` boolean. `0.0` for everything else.
* boolean: `true` if it’s a string (even if it’s empty), a `true` boolean, a
non-zero integer or float. `false` for everything else.
* string: the string representation of the constant. `null` becomes `"null"`
These rules mean that e.g. `WHERE 0` is equivalent to `WHERE false` and
`WHERE ""` is equivalent to `WHERE true`.
## API
The library is responsible for parsing the query and executing against records.
Everything else is up to you, including how fields are retrieved from records.
Note: code examples below don’t include error handling for clarity purposes.
```go
// parse the query
query, _ := charlatan.QueryFromString("SELECT foo FROM myfile.json WHERE foo > 2")
// open the source file
reader, _ := os.Open(query.From())
defer reader.Close()
// skip lines if the query contains "STARTING AT "
skip := query.StartingAt()
decoder := json.NewDecoder(reader)
for {
// here we use STARTING AT to skip all lines, not only the ones that match
// the query. This is not the usual behavior, but we can do whatever we
// want here.
skip--
if skip >= 0 {
continue
}
// get a new JSON record
r, err := record.NewJSONRecordFromDecoder(decoder)
if err == io.EOF {
break
}
// evaluate the query against the record to test if it matches
if match, _ := query.Evaluate(r); !match {
continue
}
// extract the values and print them
values, _ := query.FieldsValues(r)
fmt.Printf("%v\n", values)
}
```
Two record types are included: `JSONRecord` and `CSVRecord`. Implementing a
record only requires one method: `Find(*Field) (*Const, error)`, which takes a
field and return its value.
As an example, let’s implement a `LineRecord` that’ll be used to get specific
characters on each line of a file, `c0` being the first character:
```go
type LineRecord struct { Line string }
func (r *LineRecord) Find(f *charlatan.Field) (*charlatan.Const, error) {
// this is the field value we must return
name := f.Name()
// we reject fields that doesn't start with 'c'
if len(name) < 2 || name[0] != 'c' {
return nil, fmt.Errorf("Unknown field '%s'", name)
}
// we extract the character index from the field name.
index, err := strconv.ParseInt(name[1:], 10, 64)
if err != nil {
return nil, err
}
// let's not be too strict and accept out-of-range indexes
if index < 0 || index >= int64(len(r.Line)) {
return charlatan.StringConst(""), nil
}
return charlatan.StringConst(fmt.Sprintf("%c", r.Line[index])), nil
}
```
One can now loop over a file’s content, construct `LineRecord`s from its lines
and evaluate queries against them:
```go
query, _ := charlatan.QueryFromString("SELECT c1 FROM myfile WHERE c0 = 'a'")
f, _ := os.Open(query.From())
defer f.Close()
s := bufio.NewScanner(f)
for s.Scan() {
r := &LineRecord{Line: s.Text()}
if m, _ := query.Evaluate(r); !m {
continue
}
values, _ := query.FieldsValues(r)
fmt.Printf("%v\n", values)
}
```
### Examples
Two examples are included in the repository under `sample/csv/` and
`sample/json/`.
### Authors
- [Nicolas DOUILLET](https://github.com/minimarcel)
- [Vincent RISCHMANN](https://github.com/vrischmann)
- [Baptiste FONTAINE](https://github.com/bfontaine)