
## V1: A Visual Query Language for Property Graphs

Copyright © 2017-2025 [Lior Kogan](https://www.linkedin.com/in/liorkogan) (koganlior1 [at] gmail [dot] com)

The "V1" name is a trademark of Lior Kogan.

This work is licensed under a [Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License](https://creativecommons.org/licenses/by-nc-sa/4.0/). Commercial licenses and software tools are also available from Lior Kogan.

![CC](https://upload.wikimedia.org/wikipedia/commons/1/12/Cc-by-nc-sa_icon.svg)

---

This page contains the description and the specifications of the V1 language. This content is periodically released in [arXiv](https://arxiv.org/abs/1710.04470).

V1 is named after the primary visual cortex in our brain, also known as visual area one (V1).

V1 is dedicated to my ancestors all the way back and their descendants all the way forth.

Feedback, questions, corrections, and suggestions are welcome.

---

## Table of Contents

- [Introduction](#introduction)
- [The Property Graph Mathematical Structure](#the-property-graph-mathematical-structure)
- [The Property Graph Data Model](#the-property-graph-data-model)
- [The Property Graph Schema](#the-property-graph-schema)
- [Patterns and Pattern Languages](#patterns-and-pattern-languages)
- [A Song of Ice and Fire](#a-song-of-ice-and-fire)
- [V1 Basics](#v1-basics)
- [Expressions and Expression Constraints](#expressions-and-expression-constraints)
- [Data Types, Operators, and Functions](#data-types-operators-and-functions)
- [Quantifiers](#quantifiers)
- [Entity-Tags](#entity-tags)
- [Negator](#negator)
- [Relationship/Path-Negator](#relationshippath-negator)
- [Combiner](#combiner)
- [Chains, Horizontal Quantifiers, and Horizontal Combiner](#chains-horizontal-quantifiers-and-horizontal-combiner)
- [Latent Pattern-Entities](#latent-pattern-entities)
- [Optional Components](#optional-components)
- [Untyped Entities](#untyped-entities)
- [Entity Type-Tags](#entity-type-tags)
- [Untyped Relationships](#untyped-relationships)
- [Relationship Type-Tags](#relationship-type-tags)
- [Null Entities](#null-entities)
- [Paths](#paths)
- [Shortest Paths](#shortest-paths)
- [Path Patterns](#path-patterns)
- [Referencing Expression-Tags](#referencing-expression-tags)
- [Aggregators](#aggregators)
- [A1 Aggregator](#a1-aggregator)
- [A2 Aggregator](#a2-aggregator)
- [A3 Aggregator](#a3-aggregator)
- [Min/Max Aggregators](#minmax-aggregators)
- [M1 Aggregator](#m1-aggregator)
- [M2 Aggregator](#m2-aggregator)
- [M3 Aggregator](#m3-aggregator)
- [R1 Aggregator](#r1-aggregator)
- [Aggregator Chains](#aggregator-chains)
- [Aggregator Sequences](#aggregator-sequences)
- [Extended Aggregators](#extended-aggregators)
- [Extended A1 Aggregator](#extended-a1-aggregator)
- [Extended A2 Aggregator](#extended-a2-aggregator)
- [Extended A3 Aggregator](#extended-a3-aggregator)
- [Extended M1 Aggregator](#extended-m1-aggregator)
- [Extended M2 Aggregator](#extended-m2-aggregator)
- [Extended M3 Aggregator](#extended-m3-aggregator)
- [Extended R1 Aggregator](#extended-r1-aggregator)
- [Multivalued Functions and Expressions](#multivalued-functions-and-expressions)
- [Application: Spatiotemporality](#application-spatiotemporality)

## Introduction

The _property graph_ is an increasingly popular data model. Pattern construction and pattern matching are important tasks when dealing with property graphs. Given a property graph schema 𝑆, a property graph 𝐺 conforming to 𝑆, and a query pattern 𝑃 conforming to 𝑆, all expressed in language 𝐿 = (𝐿𝑆, 𝐿𝐺, 𝐿𝑃, 𝐿𝑅), _pattern matching_ is the process of finding, transforming, merging, and annotating subgraphs of 𝐺 that match 𝑃. The syntaxes of sublanguages 𝐿𝑆, 𝐿𝐺, 𝐿𝑃, and 𝐿𝑅 define what and how symbols can be combined to form well-formed schemas, graphs, patterns, and query results, respectively. A semantics of 𝐿𝑃 is a mapping (𝑆, 𝐺, 𝑃) → 𝑅: which subgraphs of 𝐺 match 𝑃 and how to transform, merge, and annotate them. Expressive pattern languages support topological constraints, property value constraints, negations, quantifications, aggregations, and path semantics. _Calculated properties_ may be defined for vertices, edges, and subgraphs, and constraints may be imposed on their evaluation result.

Many query posers are professionals (e.g., researchers, analysts, or investigators) who construct patterns as part of their daily work (e.g., investigative analytics). Such domain experts would like to construct patterns with minimal effort, minimal trial and error, and in a manner that is coherent with the way they think. The ability to express patterns in a way that is aligned with their mental processes is crucial to the flow of their work and to the quality of the insights they can draw. Many domain experts will not use textual property graph query languages (e.g., [Gremlin](https://arxiv.org/abs/1508.03843), [GSQL](https://arxiv.org/abs/1901.08248), [Cypher](https://dl.acm.org/citation.cfm?id=3190657), [PGQL](https://dl.acm.org/citation.cfm?id=2960421), [G-CORE](https://arxiv.org/abs/1712.01550), and [GQL](https://en.wikipedia.org/wiki/Graph_Query_Language)) either because it can be too hard for someone with little or no programming or scripting skills, or because it requires them to spend too much time on the technicalities and distracts them from their line of inquiry. As a result, they are forced to use only a predefined set of query templates or work in concert with technical experts. Both solutions are far from satisfying.

Since the pattern perception capabilities of the human visual cortex are remarkable, it is natural for query patterns to be expressed visually. Indeed, five of the abovementioned languages use 'ASCII art syntax' for expressing topological constraints. Needless to say, this type of 'visualization' is quite limited. While the use of ASCII art declined during the 1990s in favor of graphical images, query languages began to adopt ASCII art only recently. Visual (graphical, diagrammatic) query languages have the potential to be much more 'user-friendly' than their textual counterparts in the sense that patterns may be constructed and understood much more quickly and with much less mental effort. Given a schema, interactive tools can allow query posers to construct valid patterns with minimal typing. A long-standing challenge is to design a visual query language that is generic, has rich expressive power, and is highly receptive and productive. V1 attempts to answer this challenge.

V1 is a declarative visual pattern query language for schema-based property graphs. V1 supports property graphs with mixed (both directed and undirected) edges, multivalued and composite properties, and _null_ property values. V1 supports temporal data types, operators, and functions and can be extended to support additional data types, operators, and functions (one spatiotemporal model is presented). V1 is generic, concise, has rich expressive power, and is highly receptive and productive.

---

The term _property graph_ refers to both a mathematical structure and a data model; both are described below.

## The Property Graph Mathematical Structure

An [_undirected graph_](https://en.wikipedia.org/wiki/Graph_(discrete_mathematics)) is an ordered triple 𝐺ᵤ = (𝑉, 𝐸, _ψₑ_), a [_directed graph_](https://en.wikipedia.org/wiki/Directed_graph) (_digraph_, _oriented graph_) is an ordered triple 𝐺d = (𝑉, 𝐴, _ψₐ_), and a [_mixed graph_](https://en.wikipedia.org/wiki/Mixed_graph) is an ordered quintuple 𝐺ₘ = (𝑉, 𝐸, 𝐴, _ψₑ_, _ψₐ_), where 𝑉 is a nonempty set whose elements are called [_vertices_](https://en.wikipedia.org/wiki/Vertex_(graph_theory)) (_nodes_, _dots_, _points_), 𝐸 is a set whose elements are called _undirected edges_ (_undirected links_, _undirected lines_), and 𝐴 is a set whose elements are called _directed edges_ (_directed links_, _directed lines_, _arcs_, _arrows_). The sets in each graph type are pairwise disjoint. _ψₑ: E → { {u,v}: u,v ∈ V }_ and _ψₐ: A → { (u,v): u,v ∈ V }_ are total functions that map each undirected and directed edge, respectively, to an unordered or ordered pair of vertices. These functions are called _incidence functions_ because they associate each edge with its incident vertices.

Given undirected edge 𝑒: _ψₑ_(𝑒) = {𝑢,𝑣}, we say that 𝑒 _connects_ (_joins_, is _between_) 𝑢 and 𝑣, and is _incident to_ both 𝑢 and 𝑣. 𝑢 and 𝑣 are _adjacent_ (_neighbors_), and _connected by_ (_joined by_, _incident to_, _ends of_) 𝑒. Likewise, given directed edge 𝑎: _ψₐ_(𝑎) = (𝑢,𝑣), 𝑎 is a directed edge _from_ 𝑢 _to_ 𝑣, _connects_ (_joins_) 𝑢 to 𝑣, _leaves_ (_outgoing edge of_, _outgoing edge from_, _incident from_) 𝑢, and _enters_ (_incoming edge of_, _incoming edge to_, _incident to_) 𝑣. 𝑢 is _adjacent to_ (_in-adjacent to_, _direct predecessor of_) 𝑣, and is the _tail_ of (_source of_, _initial vertex of_, _incident to_) 𝑎. 𝑣 is _adjacent from_ (_out-adjacent to_, _direct successor of_) 𝑢, and is the _head of_ (_target of_, _terminal vertex of_, _incident from_) 𝑎. We say that two distinct edges are _incident edges_ (_adjacent edges_) if they share a vertex 𝑣. Furthermore, they are _successive edges_ (_consecutive edges_) if either: at least one is undirected, or both are directed and 𝑣 is the head of one and the tail of the other.

A [_loop_](https://en.wikipedia.org/wiki/Loop_(graph_theory)) (_self-edge_, _self-loop_, _buckle_) is an undirected edge 𝑒: _ψₑ_(𝑒) = {𝑢,𝑢} (as a multiset) or a directed edge 𝑎: _ψₐ_(𝑎) = (𝑢,𝑢) connecting a vertex with itself. [_Multiple edges_](https://en.wikipedia.org/wiki/Multiple_edges) (_parallel edges_) are two or more undirected edges connecting the same unordered pair of vertices or directed edges connecting the same ordered pair of vertices. A _simple graph_ disallows loops and multiple edges, while a [_pseudograph_](http://mathworld.wolfram.com/Pseudograph.html) allows both.
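
The definitions above can be made concrete with a small sketch. The Python snippet below is illustrative only and is not part of V1: the vertex and edge names are hypothetical, and storing an unordered pair as a sorted tuple is just one convenient way to represent the multiset {u,u} of a loop.

```python
# Hypothetical toy mixed graph G_m = (V, E, A, psi_e, psi_a).
V = {"u", "v", "w"}
E = {"e1", "e2", "e3"}  # undirected edges
A = {"a1", "a2", "a3"}  # directed edges

# Incidence functions: total maps from edges to pairs of vertices.
# Assumption: an unordered pair is stored as a sorted tuple, so {u,v} and
# {v,u} coincide and the loop {w,w} (a multiset) stays representable.
psi_e = {"e1": ("u", "v"), "e2": ("u", "v"), "e3": ("w", "w")}
psi_a = {"a1": ("u", "v"), "a2": ("v", "u"), "a3": ("u", "v")}


def loops(psi):
    """Return the edges that connect a vertex with itself."""
    return {edge for edge, (x, y) in psi.items() if x == y}


def multiple_edges(psi):
    """Return the groups of two or more edges mapped to the same pair."""
    groups = {}
    for edge, pair in psi.items():
        groups.setdefault(pair, set()).add(edge)
    return [group for group in groups.values() if len(group) > 1]


def is_simple(psi_e, psi_a):
    """A simple graph has neither loops nor multiple edges."""
    return not any(
        loops(psi) or multiple_edges(psi) for psi in (psi_e, psi_a)
    )


print(loops(psi_e))           # the undirected loop e3
print(multiple_edges(psi_a))  # a1 and a3 are parallel; a2 is not
print(is_simple(psi_e, psi_a))
```

Note that `a1` and `a2` are not multiple edges: they connect the same vertices but as different ordered pairs.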

An _attributed graph_ is a generic term referring to graphs in which an attribute (_single-attributed graph_) or a collection (e.g., a set, a bag, or a list) of attributes (_multi-attributed graph_) may be associated with each vertex (_vertex-attributed graph_), edge (_edge-attributed graph_), or the graph itself. An _attribute_ may be a nominal value, an ordinal value, a key-value pair, or any other annotation.

A _property graph_ (_PG_, _labeled property graph_, _LPG_) is a vertex-multi-attributed and edge-multi-attributed extension of a mixed pseudograph. It supports the following features:

- Vertex and edge labels: Each vertex and each edge is associated with an attribute called _label_ (i.e., _vertex-labeled graph_ and _edge-labeled graph_). The sets of labels assigned to vertices, undirected edges, and directed edges are required to be pairwise disjoint.

- Vertex and edge properties: Each vertex and each edge is associated with a finite set of attributes called _properties_. Each property is an ordered pair (𝑛,𝑣), where 𝑛 is a _property name_ and 𝑣 is a _property value_. For each vertex or edge, property names are required to be pairwise distinct.

Formally, extending the definition of a mixed graph, a _property graph_ is an ordered septuple 𝐺ₚ = (𝑉, 𝐸, 𝐴, _ψₑ_, _ψₐ_, _λ_, _σ_), where _λ_: 𝑉 ∪ 𝐸 ∪ 𝐴 → 𝐿 is a total function mapping each vertex and edge to a label, and _σ_: 𝑉 ∪ 𝐸 ∪ 𝐴 → 2^(𝑃ₙ × 𝑃ᵥ) is a total function mapping each vertex and edge to a set of properties. _λ_ and _σ_ are called the _labeling function_ and the _property function_, respectively. 𝐿, 𝑃ₙ, and 𝑃ᵥ are the global domains over which a property graph is defined: 𝐿 denotes the set of possible labels, 𝑃ₙ the set of possible property names, and 𝑃ᵥ the set of possible property values. Typically, 𝐿 and 𝑃ₙ are subsets of Σ*, the set of finite strings over a given alphabet Σ, while 𝑃ᵥ includes supported value types (e.g., strings, integers, dates). The label sets ⋃(v∈𝑉) {λ(v)}, ⋃(e∈𝐸) {λ(e)}, and ⋃(a∈𝐴) {λ(a)} are pairwise disjoint. For each x ∈ 𝑉∪𝐸∪𝐴, let σ(x) = {(𝑛₁,𝑣₁), (𝑛₂,𝑣₂), ..., (𝑛ₖ,𝑣ₖ)}; then the property names 𝑛₁, 𝑛₂, ..., 𝑛ₖ are pairwise distinct.
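
As a rough illustration of the two well-formedness conditions stated above (pairwise disjoint label sets and pairwise distinct property names), here is a minimal Python sketch. The graph contents and the function names `labels_pairwise_disjoint` and `property_names_distinct` are hypothetical and chosen only for this example.

```python
# Hypothetical toy property graph G_p = (V, E, A, psi_e, psi_a, lam, sigma).
V = {"p1", "p2", "h1"}
E = {"f1"}  # undirected edges
A = {"o1"}  # directed edges
psi_e = {"f1": ("p1", "p2")}
psi_a = {"o1": ("p1", "h1")}

# Labeling function: every vertex and edge has exactly one label.
lam = {"p1": "Person", "p2": "Person", "h1": "Horse",
       "f1": "friend of", "o1": "owns"}

# Property function: every vertex and edge has a set of (name, value) pairs.
sigma = {
    "p1": {("birthdate", "0970-01-01")},
    "p2": set(),
    "h1": {("color", "white"), ("weight", 450)},
    "f1": set(),
    "o1": set(),
}


def labels_pairwise_disjoint(V, E, A, lam):
    """Labels used for vertices, undirected edges, and directed edges
    must not overlap."""
    vertex_labels = {lam[x] for x in V}
    undirected_labels = {lam[x] for x in E}
    directed_labels = {lam[x] for x in A}
    return (vertex_labels.isdisjoint(undirected_labels)
            and vertex_labels.isdisjoint(directed_labels)
            and undirected_labels.isdisjoint(directed_labels))


def property_names_distinct(properties):
    """Within one vertex or edge, no property name may occur twice."""
    names = [name for name, _ in properties]
    return len(names) == len(set(names))


print(labels_pairwise_disjoint(V, E, A, lam))
print(all(property_names_distinct(sigma[x]) for x in V | E | A))
```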

## The Property Graph Data Model

At its core, _data_ is a representation of _information_, where each _data element_ (_datum_) is an atomic unit of data, corresponding to a basic unit of information. A _[data model](https://en.wikipedia.org/wiki/Data_model)_ provides a comprehensive specification for data elements, defining their meaning (semantics), their organization (structure), and how they relate to one another. This specification typically encompasses the following components:

- An _[upper ontology](https://en.wikipedia.org/wiki/Upper_ontology)_ (_top-level ontology_, _foundation ontology_): this is an [ontology](https://en.wikipedia.org/wiki/Ontology_(information_science)) composed of general, foundational, domain-independent concepts that describe data elements (e.g., entity, relationship, type, feature) and their interconnections.

- A _structure_: this is a framework (e.g., [mathematical](https://en.wikipedia.org/wiki/Mathematical_structure), [lexical](https://en.wikipedia.org/wiki/Lexical_grammar), [diagrammatic](https://en.wikipedia.org/wiki/Data_structure_diagram)) used for organizing these data elements.

The _property graph data model_ comprises the following **concepts**:

- An _entity_ represents information about a physical, conceptual, virtual, or fictional _particular_ (e.g., a certain person, guild, or dragon).

- A _relationship_ (_binary relationship_) represents information about an _association_ or _interaction_ between a pair of entities, or between an entity and itself. Each relationship is either _directional_ (_unidirectional_, _asymmetric_) (e.g., an _owns_ relationship between a _Person_ entity and a _Horse_ entity, an _offspring_ relationship between two _Person_ entities) or bidirectional (_non-directional_, _symmetric_, _reciprocal_) (e.g., a _friend of_ relationship between two _Person_ entities).

- Each entity and relationship is associated with a set of _features_ (_characteristics_). Each feature is defined by an immutable name (e.g., _birthdate_ for a _Person_ entity, _timeframe_ for an _owns_ association) and a value (e.g., _weight_= 450). For any given entity or relationship, feature names are pairwise distinct.

- Each entity and relationship has a single, immutable _type_ (e.g., _Person_, _owns_, _erupts_).

Types may be assigned according to different universals (i.e., shared qualities), such as _person_ entities, _red_ entities, or _owner_ entities. In general, entities of the same type are assumed to be _semantically homogeneous_; the same assumption applies to relationships of the same type.
In the context of property graphs, _semantic homogeneity_ entails the following regularities, which describe typical patterns rather than strict constraints:

- _Repetition of existence_: each entity type and relationship type typically classifies multiple instances. That is, a type represents a set of entities or relationships rather than a single instance.
- _Repetition of features_: entities of the same entity type are typically associated with features of the same names; the same holds for relationships.
- _Repetition of feature domains_: for a given feature name, the type of its value is typically consistent across instances of the same entity or relationship type (e.g., if _weight_ is an integer for one _Horse_, it is an integer for other _Horse_ entities as well).
- _Repetition of relationships_:
  - for directional relationships, instances of the same relationship type typically connect entities whose types form the same ordered pair of entity types.
  - for bidirectional relationships, instances of the same relationship type typically connect entities whose types form the same unordered pair of entity types.

The property graph data model can thus represent _heterogeneous graphs_, that is, graphs that may contain multiple types of entities (_multi-modal graphs_) and multiple types of relationships (_multi-relational graphs_). In addition, entities and relationships may each be associated with multiple features (_multifeatured graphs_).

The property graph data model is a _metamodel_, as it does not specify types of entities or relationships, nor does it define particular sets of features. It is therefore _domain-agnostic_. Instead, domain-specific concepts may be specified and enforced by means of a _property graph schema_ (see next section), which in turn constrains the structure and interpretation of concrete property graph instances.

The _property graph data model_ comprises the following **structure**:

- All data elements are organized within a single property graph mathematical structure.
- A _null vertex_ is a vertex with no properties and a null label. Each null vertex is connected to exactly one edge. An edge connecting two null vertices is not allowed.
- Any vertex other than a null vertex represents an entity. The vertex label is an integer or a nonempty string identifying the _entity type_ (e.g., _Person_, _Guild_, _Dragon_).
- An undirected edge {𝑢, 𝑣}, where neither 𝑢 nor 𝑣 is a null vertex, represents a bidirectional relationship between the entities represented by 𝑢 and 𝑣.
- A directed edge (𝑢, 𝑣), where neither 𝑢 nor 𝑣 is a null vertex, represents a directional relationship from the entity represented by 𝑢 to the entity represented by 𝑣.
- An edge incident to exactly one null vertex represents a relationship between an entity and a nonspecific entity. This allows modeling cases where one participant is unknown or irrelevant, while the existence of the relationship and the values of its features are known. For example, a horse may be known to have been owned during certain timeframes, even if the owners are unknown or unimportant.
- Each edge has a label that is an integer or a nonempty string identifying its _relationship type_ (e.g., _owns_, _member of_).
- A directed edge represents a directional relationship, whereas an undirected edge represents a bidirectional relationship.
- Properties and subproperties represent features and subfeatures of entities and relationships. For any entity or relationship, property names are pairwise distinct strings or integers, identifying feature names. Property values represent the corresponding feature values. For example, a _Person_ entity may have a _name_ property with _first_ and _last_ subproperties; an _owns_ relationship may have a _timeframe_ property.

- Each feature value is of a data type corresponding to a value type supported by the model. In this paper, we will use the following data types:
  - _basic data types_: _int_, _float_, _date_, _datetime_, _duration_, and _string_.
  - A _multivalue_ is a set, a bag, or a list of values. All values are of the same basic data type (e.g., each value is a _string_), the same multivalue type (e.g., each value is a set(_string_)), or the same composite type (e.g., each value is a {_first_: _string_, _last_: _string_} composite).
  - A _multivalue type_ is defined by the collection type (set, bag, or list) and the data type of its elements. The definition must be nonrecursive.
  - A _map_ is a set of (name, value) pairs in which the names are pairwise distinct strings or integers identifying the subfeature names, and the values are the respective subfeature values.
  - A _map type_ is defined by a set of (name, data type) pairs. The definition must be nonrecursive. A _composite_ is a map in which each value is of a basic data type, a multivalue type, or a composite type.

A _basic property_ is a property whose value is of a basic data type. A _multivalued property_ is a property whose value is a multivalue (e.g., set, bag, or list), e.g., _titles_: _set_(_string_) = {"Her Majesty", "Her Royal Highness"}. A _composite property_ is a property whose value is composite, e.g., _name_: (_first_: _string_, _last_: _string_) = ("Brandon", "Stark"). Each member of a composite property is called a _subproperty_.
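
To illustrate how basic, multivalue, and composite types nest, the following Python sketch checks a value against a type descriptor. The descriptor encoding (a string for a basic type, a `(collection, element type)` tuple for a multivalue type, a dict for a composite type) and the function name `conforms` are assumptions made for this example; V1 does not prescribe them. _null_ values are deliberately left out of the sketch.

```python
from datetime import date, datetime, timedelta

# Assumed mapping from the basic data types to Python classes.
BASIC = {
    "int": int, "float": float, "date": date,
    "datetime": datetime, "duration": timedelta, "string": str,
}
# A bag is modeled as a Python list here (order is simply ignored).
COLLECTIONS = {"set": (set, frozenset), "bag": list, "list": list}


def conforms(value, dtype):
    """Return True if value matches the (hypothetical) type descriptor."""
    if isinstance(dtype, str):  # basic data type
        if isinstance(value, bool):
            return False  # bool is a subclass of int in Python
        if dtype == "date" and isinstance(value, datetime):
            return False  # datetime is a subclass of date in Python
        return isinstance(value, BASIC[dtype])
    if isinstance(dtype, tuple):  # multivalue type
        collection, element_type = dtype
        return (isinstance(value, COLLECTIONS[collection])
                and all(conforms(v, element_type) for v in value))
    if isinstance(dtype, dict):  # composite type
        return (isinstance(value, dict)
                and value.keys() == dtype.keys()
                and all(conforms(value[n], t) for n, t in dtype.items()))
    raise TypeError(f"unsupported type descriptor: {dtype!r}")


TITLES_TYPE = ("set", "string")
NAME_TYPE = {"first": "string", "last": "string"}

print(conforms({"Her Majesty", "Her Royal Highness"}, TITLES_TYPE))
print(conforms({"first": "Brandon", "last": "Stark"}, NAME_TYPE))
print(conforms({"first": "Brandon"}, NAME_TYPE))  # missing subproperty
```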

- _null_ is a valid value for each _nullable_ property and subproperty, regardless of its data type. A _null-valued_ [sub]property indicates that a [sub]feature value is not specified.

Several different interpretations can be associated with a _null_ value. Following the terminology introduced by [Codd](https://dl.acm.org/doi/10.1145/16301.16303) and adopted by many authors, a _null_ value is either
- _Applicable missing_ – at present, a value is applicable (applies to the particular entity or relationship) but unknown (whatever the reason, the graph does not have the value). E.g., the temperature 1000 years ago today; the phone number of a person who owns a phone, but the number is unknown; an answer to a question, where the questionee refused to answer.
- _Inapplicable_ – at present, no value is applicable. E.g., the temperature tomorrow; previous citizenship when there is none; direct manager of the CEO; a new hire's not-yet-assigned employee ID; a phone number of a person who does not own a phone; an answer to a question, where the question was not posed to the questionee.

[Zaniolo](https://www.sciencedirect.com/science/article/pii/0022000084900801) proposed a third basic interpretation of _null_ values:

- _No information_ – at present, the applicability of the unspecified value is unknown. E.g., a person's phone number – when it is unknown whether the person owns a phone; an answer to a question – when it is unknown if the question was posed to the questionee.

Codd, Zaniolo, and many others proposed using two or more types of _null_ instead of a 'generic' _null_, but this approach remains mainly theoretical. In practice, _null_ values often have no consistent semantics. For a _birth date_ property, a _null_ value would likely represent an unknown birth date, but for a _death date_ property, a _null_ value may represent that the date on which the person died is unknown (_applicable missing_), that the person is still alive (_inapplicable_), or that it is unknown if the person is still alive (_no information_).

Though the semantics of _null_ values is not always defined as part of the data model, nor as part of the _data schema_, it still must be well-defined for query languages' operators and functions. E.g., what is the result of (yesterday's date < person's death date) when the _death date_ is _null_? Often, _null_ values represent _applicable missing_ and _no information_, while _magic values_ (e.g., "9999-12-31" for dates) represent _inapplicable values_. In addition, a _sorting comparison operator_ is usually well-defined for _null_ values and may differ from the standard comparison operator (e.g., should _The five persons with the earliest birth date_ return persons with a _null_ birth date?).
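
The two questions above can be answered in more than one way. The Python sketch below shows one possible convention only: a comparison that yields "unknown" (`None`) when an operand is _null_, in the style of three-valued logic, and a separate sorting key that places _null_ values last. This is an assumption made for illustration, not the semantics defined by V1; the names `lt` and `nulls_last` are hypothetical.

```python
from datetime import date


def lt(a, b):
    """Three-valued '<': the result is unknown (None) if either operand
    is null, and an ordinary bool otherwise."""
    if a is None or b is None:
        return None
    return a < b


def nulls_last(value):
    """Sorting key that differs from the comparison above: nulls are
    ordered after every non-null date instead of being 'unknown'."""
    # Tuples compare element by element, so False (non-null) sorts
    # before True (null); date.min is only a placeholder for nulls.
    return (value is None, date.min if value is None else value)


yesterday = date(2025, 1, 1)
death_date = None  # null: still alive, unknown, or no information
print(lt(yesterday, death_date))  # unknown under this convention

birth_dates = [date(980, 1, 1), None, date(970, 1, 1)]
print(sorted(birth_dates, key=nulls_last))
```

Under this convention, a query for the persons with the earliest birth dates would return persons with a _null_ birth date only after all persons with a known one.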

- Should new information prove that two or more vertices represent the same entity, these vertices should be merged. Similarly, should new information prove that two or more edges represent the same relationship, these edges should be merged.

- Should new information prove that a vertex represents two or more entities, this vertex should be split. Similarly, should new information prove that an edge represents two or more relationships, this edge should be split.

- Any pair of vertices, except null vertices, must be _distinguishable_, which means that vertices' _identifiers_ must be pairwise distinct, or there should be no pair of vertices with identical type, property values, and relationships. Similarly, any pair of edges must be _distinguishable_, which means that edges' _identifiers_ must be pairwise distinct, or there should be no pair of edges with identical type and property values that connect the same pair of vertices or the same vertex and a null vertex. An _identifier_ is a set of properties (often just an automatically generated index) that collectively uniquely identifies the element.

𝑛-ary relationships, where 𝑛 > 2, are not supported. However, this poses no expressivity limitation since any 𝑛-ary relationship, 𝑛 > 2, can be reframed as an entity and 𝑛 binary relationships. Consider, for example, a ternary relationship, where Person 𝐴 sells Horse 𝐻 to Person 𝐵. Instead, one can reframe this data as a _Sale_ entity 𝑆, a _seller_ relationship from 𝑆 to 𝐴, a _buyer_ relationship from 𝑆 to 𝐵, and an _asset_ relationship from 𝑆 to 𝐻.

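The reframing of the ternary sale relationship described above can be sketched in a few lines of Python. The function name `reify`, the identifier scheme, and the tuple layout are hypothetical choices made for this example.

```python
import itertools

_ids = itertools.count(1)  # hypothetical generator of entity identifiers


def reify(entity_type, roles):
    """Reframe an n-ary relationship as one entity plus n directed
    binary relationships, one per role.

    roles maps a relationship type (the role) to the participating entity.
    Returns the new vertex as (id, type) and the edges as
    (source id, relationship type, target id) triples.
    """
    entity_id = f"{entity_type}-{next(_ids)}"
    vertex = (entity_id, entity_type)
    edges = [(entity_id, role, participant)
             for role, participant in roles.items()]
    return vertex, edges


# Person A sells Horse H to Person B.
vertex, edges = reify("Sale", {"seller": "A", "buyer": "B", "asset": "H"})
print(vertex)
for edge in edges:
    print(edge)
```
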
The term _property graph_ was introduced by [Rodriguez](https://arxiv.org/abs/1006.2361) and [Neubauer](https://arxiv.org/abs/1004.1001), though other terms were used to describe similar data models. [Tsai and Fu's](https://ieeexplore.ieee.org/document/4310127) _attributed relational graph_ is a directed multigraph in which both nodes and edges have labels, and each label defines a set of numerical or logical attributes. [Shao et al.](https://ieeexplore.ieee.org/abstract/document/7953521) used the term _Heterogeneous graph_ for the same construct. [Gallagher](http://www.aaai.org/Papers/Symposia/Fall/2006/FS-06-02/FS06-02-007.pdf) used the term _data graph_ to refer to graphs in which vertices and/or edges may be typed and/or attributed. [Singh et al.](http://ieeexplore.ieee.org/abstract/document/4272051/) used the term _M*3_ (multi-modal, multi-relational, multifeatured) _network_ to refer to graphs with multiple entity-types, multiple relationship-types, and multiple descriptive features for nodes and edges. [Krause et al.](https://link.springer.com/chapter/10.1007/978-3-319-40530-8_10) used the term _typed graph_ to refer to graphs with typed nodes, typed edges, and typed node properties.

Various extensions were proposed, including:
- Instead of a single label, each vertex has a (possibly empty) set of labels (_vertex multi-labeled graph_); entities are _multi-typed_
- Instead of a single label, each edge has a (possibly empty) set of labels (_edge multi-labeled graph_); relationships are _multi-typed_
- Directional relationship types naming: instead of a name for only one direction (e.g., _owns_), a unique name is defined for each direction (e.g., _owns_, _owned by_; _parent of_, _offspring of_)
- [_Property hypergraphs_](https://link.springer.com/chapter/10.1007%2F978-3-319-26148-5_21) (_hyperedges_ represent 𝑛-ary relationships)
- Schema-level and data-level _metaproperties_ (properties of properties – e.g., units of measure, accuracy, reliability)
- [EPGM – Extended Property Graph Model](https://dbs.uni-leipzig.de/file/EPGM.pdf), in which _logical graphs_ consist of subsets of a shared set of vertices and a shared set of edges. In addition, logical graphs have types and properties.
- Support of _derivation_ (_specialization_) of entity-types, relationship-types, and property types

## The Property Graph Schema

A _schema_ is a model for describing the structure of information in a certain domain using a certain data model. A _property graph schema_ defines the entity-types, the relationship-types, and the properties thereof.

The property graph data model is _schema-optional_. Each property graph may be:
* _Schema-free_ (_schemaless_, _schema-independent_). A schema-free property graph neither defines nor enforces entity or relationship types. Each vertex and edge, regardless of its label, may have properties with any name and of any data type.
* _Schema-based_ (_schema-strict_, _schema-driven_, _schema-full_, _schema-dependent_). A schema-based property graph is a property graph conforming to a given schema.
* _Schema-mixed_ (_schema-hybrid_), where a schema is defined, but additional elements (e.g., additional properties) may be used.

It is much easier to define patterns when the information is presented consistently. For example, to match patterns such as _Any person who owns a white horse_, one would first:

* Define entity-types _Person_ and _Horse_
* Define a relationship-type _owns_ that holds from entities of type _Person_ to entities of type _Horse_
* For the _Horse_ entity-type, define a _color_ property with a nominal data type
* Ensure that all the information is structured accordingly

Though proposed property graph and property graph schema definitions have much in common (see [Angles](http://ceur-ws.org/Vol-2100/paper26.pdf), [Wu](https://arxiv.org/abs/1810.08755), [Hartig and Hidders](https://dl.acm.org/citation.cfm?id=3327964.3328495), and [Angles et al.](https://ieeexplore.ieee.org/document/9088985)), to date, there is neither a de jure nor a de facto standard definition (and hence, no standard property graph schema definition language).

The following _property graph schema model_ is assumed in this paper:

A _property graph schema_ is defined by:
* A finite set of user-defined data types (on top of the built-in data types)
  * _Categorical_ (_nominal_ and _ordinal_) data types (e.g., _gender_: nominal _{male, female}_)
  * _Multivalued property data types_ (e.g., _nicknames_: set(_string_))
  * _Composite data types_ (e.g., _name_ {_first_: _string_, _last_: _string_})
  * _Interval data types_ (e.g., _span_: interval(_date_))
* A finite set of entity-types. For each entity-type:
  * A unique name
  * A set of properties. For each property:
    * A unique name
    * A data type
    * For properties and subproperties with numeric data types, intervals of numeric data types, or multivalued numeric data types: an optional schema-level _units_ metaproperty representing units of measure (e.g., Kg, cm, seconds)
* A finite set of relationship-types. For each relationship-type:
  * The relationship-type's directionality: directional or bidirectional
  * A unique name
  * A set of pairs of entity-types for which the relationship-type is applicable (e.g., _owns_: {(_Person_, _Horse_), (_Person_, _Dragon_)}). When a pair is of the same type (e.g., (_Dragon_, _Dragon_)), loops can be allowed or disallowed
  * A set of properties, similar to entity-types' properties

A predefined property-less entity-type _Null_ serves the purpose of realizing relationships to unknown or unimportant entities: sometimes, a real entity is unknown or unimportant, but the existence of a relationship and the values of the relationship's properties are important. For example, we may know that a certain dragon was owned in a given timeframe, but we do not know or do not care who owned it. Still, we want to be able to store and query such information. _owns_: {(_Person_, _Dragon_), (_Guild_, _Dragon_), (_Null_, _Dragon_)} allows us to realize this.
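
A schema following this model could be represented as plain data. The Python sketch below is a hypothetical encoding (the dictionary layout, the `member of` pair, and the function name `edge_allowed` are assumptions for illustration) that checks whether a relationship of a given type may connect two entity-types, including the predefined _Null_ entity-type.

```python
# Hypothetical encoding of a small property graph schema.
SCHEMA = {
    "entity_types": {
        "Person": {"name": "string", "birthDate": "date"},
        "Guild": {"name": "string"},
        "Dragon": {"name": "string"},
        "Null": {},  # predefined and property-less
    },
    "relationship_types": {
        "owns": {
            "directional": True,
            "pairs": {("Person", "Dragon"), ("Guild", "Dragon"),
                      ("Null", "Dragon")},
            "properties": {"tf": "interval(date)"},
        },
        "member of": {
            "directional": True,
            "pairs": {("Person", "Guild")},
            "properties": {},
        },
    },
}


def edge_allowed(schema, relationship_type, source_type, target_type):
    """Check a (source, target) entity-type pair against the schema."""
    rel = schema["relationship_types"].get(relationship_type)
    if rel is None:
        return False
    pair = (source_type, target_type)
    if rel["directional"]:
        return pair in rel["pairs"]
    # For a bidirectional relationship-type the pair is unordered.
    return pair in rel["pairs"] or pair[::-1] in rel["pairs"]


print(edge_allowed(SCHEMA, "owns", "Null", "Dragon"))    # owner unknown
print(edge_allowed(SCHEMA, "owns", "Dragon", "Person"))  # wrong direction
```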

Property graph schema definitions may vary in many aspects, including:

* Supported schema constraints (properties _uniqueness_ and _nullability_, _property value constraints_, _relationships cardinality constraints_, disallow loops for certain relationship-types, etc.)
* Supported ways to declare user-defined data types
* Properties may be either:
  * Defined globally and assigned to one or more entity/relationship-types, or
  * Defined per entity/relationship-type: different entity/relationship-types may have a property with the same name but with a different data type

V1 can be utilized with most definitions with minimal changes.

## Patterns and Pattern Languages

A _pattern_ defines a set of topological and property value constraints on property graphs. Each property (sub)graph either _matches_ the pattern or not. For some patterns, a given (sub)graph may match a pattern in more than one way.

When a pattern is described in a natural language, it may be ambiguous or inaccurate. Nevertheless, all the patterns below are described in English. After all, to allow a reader to gain an intuitive understanding of a formal language, one has to use a natural language.

Here are two examples:

* _P1: Any person who owns at least five white horses_ (see Q101)

_P1_ defines the set of (sub)graphs in which

- There is a vertex 𝑝 with a label _Person_
- There are 𝑛 ≥ 5 vertices ℎ₁..ℎₙ, each with a label _Horse_
- Each of ℎ₁..ℎₙ has a _color_ property, and its value is _white_
- There are relationships from 𝑝 to ℎ₁..ℎₙ, each with a label _owns_

Note that the pattern's description ignores temporal aspects. Maybe a person has owned a horse, owns it, or will own it. Assuming that the owns relationship has a timeframe property, a more accurate description would be _Any person who has 'owns' relationships with at least five white horses_. Maybe we are looking for _Any person who currently owns at least five white horses_ or for _Any person who at some timepoint owned at least five white horses_. If, for example, a horse's color may change over time, or if a horse may turn into a unicorn, we might want to rephrase the pattern.

* _P2: Any person whose date of birth is between January 1, 970 and January 1, 980, who owns a white horse, who owns a dragon whose name starts with 'M', that over the last month froze at least three dragons belonging to members of the Masons Guild_

_P2_ defines the set of (sub)graphs in which

- There is a vertex 𝑝 with a label _Person_
- 𝑝 has a _birthDate_ property of type _date_, and its value is between January 1, 970 and January 1, 980
- There is at least one vertex ℎ with a label _Horse_
- There is a relationship from 𝑝 to ℎ with a label _owns_
- ℎ has a _color_ property, and its value is _white_
- There is at least one vertex 𝑑 with a label _Dragon_
- There is a relationship from 𝑝 to 𝑑 with a label _owns_
- 𝑑 has a _name_ property with a value that starts with 'M'
- There are 𝑚 ≥ 3 vertices, 𝑑₁..𝑑ₘ, each with a label _Dragon_
- There are relationships from 𝑑 to each of 𝑑₁..𝑑ₘ, each with a label _freezes_
- Each of these relationships has a _tf_ property (stands for "timeframe") with a _since_ subproperty whose value is in the range [_now_ - _months_(1) .. _now_]
- There is a vertex 𝑔 with a label _Guild_
- 𝑔 has a _name_ property, and its value is _Masons_
- There are 𝑛 ≥ 1 vertices 𝑞₁..𝑞ₙ, each with a label _Person_
- There are relationships from each of 𝑞₁..𝑞ₙ to 𝑔, each with a label _member of_
- There are relationships from each of 𝑞₁..𝑞ₙ to one or more of 𝑑₁..𝑑ₘ, each with a label _owns_. Each of 𝑑₁..𝑑ₘ is connected by at least one of these relationships

The terms _entity_ and _relationship_ denote both pattern elements and graph elements. When the context may be ambiguous, we use the terms _pattern-entity_ and _pattern-relationship_ to refer to pattern elements and the terms _graph-entity_ and _graph-relationship_ to refer to graph elements.

Given a property graph schema 𝑆, a property graph 𝐺 conforming to 𝑆, and a query pattern 𝑃 conforming to 𝑆, all expressed in language 𝐿 = (𝐿𝑆, 𝐿𝐺, 𝐿𝑃, 𝐿𝑅), _pattern matching_ is the process of finding, transforming, merging, and annotating subgraphs of 𝐺 that match 𝑃. The syntaxes of sublanguages 𝐿𝑆, 𝐿𝐺, 𝐿𝑃, and 𝐿𝑅 define which symbols may be combined, and how, to form well-formed schemas, graphs, patterns, and query results, respectively. A semantics of 𝐿𝑃 is a mapping (𝑆, 𝐺, 𝑃) → 𝑅: which subgraphs of 𝐺 match 𝑃 and how to transform, merge, and annotate them.

Any valid subgraph that matches the pattern is called _an assignment_. We use _assignment to_ 𝑋, where 𝑋 is a pattern-entity, a pattern-relationship, or a set thereof, to denote the graph-entity, the graph-relationship, or the set thereof that matches 𝑋 as part of an assignment.

In the patterns given below, unless otherwise stated, each reported assignment should include the graph-entity assigned to each mentioned pattern-entity and the graph-relationship assigned to each mentioned pattern-relationship. Hence, any reported assignment to _P1_ should be composed of:

- A _Person_ graph-entity
- Five or more _Horse_ graph-entities, each of which has a _color_ property, and its value is _white_
- The _owns_ graph-relationships between the _Person_ graph-entity and those _Horse_ graph-entities
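
The composition of a reported assignment to _P1_ can be illustrated with a minimal Python sketch. The in-memory graph layout, the identifiers (`p1`, `h1`, `r1`, ...), and the function `match_p1` are assumptions made for illustration only; they are not part of V1. For each person, the sketch reports the largest matching subgraph (all of that person's white horses), which is only one of the possible reporting choices.

```python
from collections import defaultdict

# Toy in-memory property graph (hypothetical layout, not a V1 format).
entities = {
    "p1": {"type": "Person"},
    "p2": {"type": "Person"},
    **{f"h{i}": {"type": "Horse", "color": "white"} for i in range(1, 7)},
    "h7": {"type": "Horse", "color": "black"},
}
# Each relationship: (relationship id, source id, relationship-type, target id)
relationships = [(f"r{i}", "p1", "owns", f"h{i}") for i in range(1, 8)]
relationships.append(("r8", "p2", "owns", "h1"))


def match_p1(entities, relationships, min_horses=5):
    """Report, per person, the person, the owned white horses, and the owns relationships."""
    owned = defaultdict(list)
    for rel_id, src, rel_type, dst in relationships:
        if (rel_type == "owns"
                and entities[src]["type"] == "Person"
                and entities[dst]["type"] == "Horse"
                and entities[dst].get("color") == "white"):
            owned[src].append((rel_id, dst))
    reported = []
    for person, pairs in owned.items():
        horses = sorted({horse for _, horse in pairs})
        if len(horses) >= min_horses:
            reported.append({
                "person": person,
                "horses": horses,
                "owns": sorted(rel for rel, _ in pairs),
            })
    return reported


assignments = match_p1(entities, relationships)
```

In this toy graph only `p1` qualifies: the black horse `h7` is filtered out by the property value constraint, and `p2` owns a single white horse.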

Consider the following alternative patterns:

* _P1': Any person who owns at least five white horses. Report only the person_
* _P1'': Any person who owns at least five white horses. Report only the horses_
* _P1''': Any person who owns at least five white horses. Report the person and five of his horses_

A query may be:

* A decision query: does at least one assignment exist?
* A counting query: how many assignments exist?
* A counting-decision query: are there at least 𝑘 assignments?
* A reporting query:
  * Report [all / up to 𝑘] subgraphs of 𝐺, each of which is an assignment
  * Report subgraphs of 𝐺, each of which is a union of assignments, e.g., the union of all assignments with identical assignments to all entities (and different assignments to relationships)
  * Report a single subgraph of 𝐺, composed of the union of all assignments. This is sometimes preferred since it avoids a combinatorial explosion for many queries (e.g., if a person owns ten white horses, any subset of five of the person's horses composes an assignment to _P1'''_). However, for some patterns, individual assignments cannot be deduced from their union.

Implementations may support one or more of the above.
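
The combinatorial explosion mentioned for _P1'''_ can be made concrete with a small Python sketch. The horse identifiers and the threshold `k` are hypothetical; the sketch simply enumerates every five-horse subset of ten white horses owned by one person and contrasts the number of individual assignments with the size of their union.

```python
from itertools import combinations
from math import comb

# Ten white horses owned by a single person (hypothetical identifiers).
horses = [f"h{i}" for i in range(1, 11)]

# Every subset of five horses, together with the person, is an assignment to P1'''.
assignments = list(combinations(horses, 5))

decision = len(assignments) >= 1           # decision query
count = len(assignments)                   # counting query
k = 100                                    # hypothetical threshold
counting_decision = count >= k             # counting-decision query
union_of_all = set().union(*assignments)   # reporting the union of all assignments
```

Here `count` equals C(10, 5) = 252, while the union contains only the ten horses, which is why reporting a single union subgraph can be far more compact than reporting each assignment.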

V1 introduces the concept of _calculated properties_ - non-inherent properties of graph-entities, graph-relationships, and subgraphs, defined as part of a pattern. Each calculated property's evaluation result can be part of the reported query results, extending V1 capabilities beyond 'simple' pattern matching. For example, _The average number of horse ownerships per person_, a calculated property of the set of all graph-entities of type _Person_, can be defined as part of a pattern (see Q356).

Pattern languages differ in many aspects, including:

* _Genericity_ - [general-purpose](https://en.wikipedia.org/wiki/General-purpose_language) (e.g., schema-driven) vs. [domain-specific](https://en.wikipedia.org/wiki/Domain-specific_language)
* _Pattern representation_ - _textual_ vs. _[visual](https://en.wikipedia.org/wiki/Visual_programming_language)_ (_graphical_, _diagrammatic_)
* _Receptivity_ and _Productivity_ (i.e., _readability_ and _writability_) - how intuitive and straightforward is it to understand existing patterns and construct new ones
* _Conciseness_ - the fewness of symbols and symbol types required for expressing patterns
* _Aesthetics_ - the quality of patterns being visually appealing
* _Declarative / Imperative_ - _[Declarative](https://en.wikipedia.org/wiki/Declarative_programming)_ languages describe patterns but do not specify how to match them. _[Imperative](https://en.wikipedia.org/wiki/Imperative_programming)_ languages describe patterns in terms of the steps required to match them on a given computational machine model (e.g., the [Gremlin Traversal Machine](https://arxiv.org/abs/1508.03843)). Languages may provide both declarative and imperative constructs.
* _[Expressive power](https://en.wikipedia.org/wiki/Expressive_power_(computer_science))_ - the breadth of patterns that can be expressed.
Unless a pattern language (declarative or imperative) is Turing-complete, there will always be computable patterns that cannot be expressed.

There are always tradeoffs, especially between _receptivity and productivity_ and _expressive power_. Quoting Perlis' 54th and 55th [epigrams of programming](https://en.wikipedia.org/wiki/Epigrams_on_Programming): _"Beware of the Turing tar-pit in which everything is possible but nothing of interest is easy."_ and _"A LISP programmer knows the value of everything, but the cost of nothing."_ Perlis' 93rd and 26th epigrams are also worth quoting here: _"When someone says, 'I want a programming language in which I need only say what I wish done,' give him a lollipop."_ and _"There will always be things we wish to say in our programs that in all known languages can only be said poorly."_ Though these epigrams refer to programming languages, they are equally valid for property graph query languages.

## A Song of Ice and Fire

We will use the following scenario, loosely based on [George R. R. Martin's _A Song of Ice and Fire_](http://www.georgerrmartin.com/bibliography/), to demonstrate the language's expressive power:

The subjects of [Sarnor](http://awoiaf.westeros.org/index.php/Kingdom_of_Sarnor), [Omber](http://awoiaf.westeros.org/index.php/Kingdom_of_Omber), and the other kingdoms of the [known world](http://awoiaf.westeros.org/index.php/Known_world) love their [horses](http://awoiaf.westeros.org/index.php/Horse). There is one thing they adore even more - that is their [dragons](http://awoiaf.westeros.org/index.php/Dragon). They own dragons of ice and fire. Like all well-behaved dragons, their dragons love to play. Dragons always play in pairs. When playing, dragons often get furious, fire at each other (fire breath), and freeze one another (cold breath). Dragons usually freeze one another for several minutes. However, on occasion, when they are furious, they can freeze one another for several hours. The subjects enjoy watching their dragons play. Fascinated by these magnificent creatures, they have composed myriads of scrolls detailing each fire and cold breath over the last thousand years. The kings of Sarnor and Omber regularly pose queries about their history. Often, it takes the royal historians and analysts several days to come up with answers, during which the kings tend to get impatient. Lately, the [high king of Sarnor](http://awoiaf.westeros.org/index.php/High_King_of_Sarnor) posed a very complex query. After waiting for results for more than two moons, he ordered the chief analyst to be executed. He then summoned his chief mechanics and ordered them to develop an apparatus that he could use to pose queries and get results quickly.

The engineers started by collecting all queries posed by their master over the last few years. Then they constructed a property graph schema over which these queries can be expressed.

The schema was composed of the following entity-types (and their properties):

* ***Person***: _name_ {_first_: _string_, _last_: _string_}, _gender_: _nominal {male, female}_, _birthDate_: _date_, _deathDate_: _date_, _height_: _int_ [cm]
* ***Dragon***: _name_: _string_, _color_: _nominal {black, white, ...}_
* ***Horse***: _name_: _string_, _color_: _nominal {black, white, ...}_, _weight_: _int_ [kg]
* ***Guild***: _name_: _string_
* ***Kingdom***: _name_: _string_

the following directional relationship-types (and their properties):

* ***owns***: {(_Person_, _Horse_), (_Person_, _Dragon_), (_Guild_, _Horse_), (_Guild_, _Dragon_)} - _df_: _dateframe_

When the person is still the owner and the ownership has no defined termination date (the value is _inapplicable_), df.till is 31/12/9999.

* ***fires at***: {(_Dragon_, _Dragon_)} - _time_: _datetime_; no loops allowed
* ***freezes***: {(_Dragon_, _Dragon_)} - _tf_: _datetimeframe_; no loops allowed

tf is set only after the freeze has ended. tf.since and tf.till are _non-nullable_.

* ***offspring of***: {(_Person_, _Person_)}; no loops allowed
* ***member of***: {(_Person_, _Guild_)} - _df_: _dateframe_

When the person is still a member and the membership has no defined termination date (the value is _inapplicable_), df.till is 31/12/9999.

* ***subject of***: {(_Person_, _Kingdom_)}

and of the following bidirectional relationship-type (and its properties):

* ***friend of***: {(_Person_, _Person_)} - _since_: _date_; no loops allowed

_Person_'s name is a composite property. The _date_, _datetime_, _dateframe_, and _datetimeframe_ data types are defined in [Data Types, Operators, and Functions](#data-types-operators-and-functions).

The engineers then represented the whole known history using a property graph conforming to this schema.

## V1 Basics

The following sections describe the syntax and the semantics of the V1 language. We start with the basics, adding more language elements as we go along.

**Note:** V1 has two equivalent syntaxes for expressing patterns: A _visual syntax_ - described below, and a _textual syntax_ (JSON-based) - summarized [here](https://github.com/LiorKogan/V1/blob/master/JSON%20pattern.md). There is a bijective mapping between patterns expressed in these two syntaxes. Sample textual patterns are available [here](https://github.com/LiorKogan/V1/tree/master/JSON%20Patterns). A V1 schema for _A Song of Ice and Fire_ is available [here](https://github.com/LiorKogan/V1/blob/master/Dragons%20Schema.json).

![V1](Pictures/BB01.png)

![V1](Pictures/BB04.png)

Patterns are generally read from left to right. Each pattern starts with **a small black diamond**, denoting the pattern start. The most straightforward patterns are structured as a sequence of rectangles, where consecutive rectangles are connected with an arrow or a line.

Yellow, blue, and red rectangles represent _concrete_, _typed_ and _untyped_ entities, respectively. The terms _concrete entity_, _typed entity_, and _untyped entity_ refer to pattern entities only (and not to graph entities).

**A yellow rectangle** represents a _concrete entity_: a specific person, a specific horse, etc. A concrete entity has a single assignment - a specific graph-entity. The text inside the rectangle denotes the entity-type and the value of a _visualization expression_ defined for this entity-type. For example, the visualization-expression for the _Person_ entity-type may be: _name.first_ ∥ ' ' ∥ _name.last_ and its value, for a specific graph-entity, would be 'Brandon Stark'.

**A blue rectangle** represents a _typed entity_. The text inside the rectangle denotes an entity-type. Only graph-entities of this type may be assigned to the pattern-entity.

**A red rectangle** represents an _untyped entity_. Graph-entities of different types may be assigned to an untyped entity. An optional text inside the rectangle denotes an entity-type constraint (See [Untyped Entities](#untyped-entities)).

Two consecutive rectangles can be connected with:

* A horizontal **black arrow**, representing a _directional typed relationship_,
* A horizontal **black line**, representing either a _bidirectional typed relationship_ or a directional typed relationship for which either direction is acceptable,
* A horizontal **red arrow**, representing an _untyped directional relationship_ (see [Untyped Relationships](#untyped-relationships)),
* A horizontal **red line**, representing either an _untyped bidirectional relationship_ or an _untyped directional relationship_ where either direction is acceptable, or
* A horizontal **blue line**, representing a _pattern-path_ (see [Paths](#paths))

The terms _typed relationship_ and _untyped relationship_ refer only to pattern relationships. The term _path_ may refer to both _graph-path_ and _pattern-path_.

Each black arrow/line has a label on top. The label denotes a relationship-type. For arrows - the label is aligned to the arrow's origin. For lines - the label is centered. Only graph-relationships of this type can match the pattern-relationship.

A pattern matching engine would look in the property graph for assignments for every blue rectangle, red rectangle, black arrow, and black line. Graph entities are assigned to pattern entities. Graph relationships are assigned to pattern relationships. An assignment to the pattern is a set of graph-entities and graph-relationships that matches the whole pattern.

The relationship-type between any two entities must be valid with respect to the schema.
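
As a rough illustration of this validity check, the following Python sketch encodes the relationship-types of the _A Song of Ice and Fire_ schema as sets of entity-type pairs. The dictionary layout and the function `is_valid` are assumptions made for illustration; they do not reflect the V1 schema file format.

```python
# Relationship-types and their applicable entity-type pairs, taken from the schema above.
SCHEMA = {
    "owns": {("Person", "Horse"), ("Person", "Dragon"),
             ("Guild", "Horse"), ("Guild", "Dragon")},
    "fires at": {("Dragon", "Dragon")},
    "freezes": {("Dragon", "Dragon")},
    "offspring of": {("Person", "Person")},
    "member of": {("Person", "Guild")},
    "subject of": {("Person", "Kingdom")},
    "friend of": {("Person", "Person")},
}
BIDIRECTIONAL = {"friend of"}


def is_valid(source_type, rel_type, target_type, either_direction=False):
    """Check a pattern-relationship against the schema.

    either_direction=True models a black line drawn for a directional
    relationship-type, where both directions are acceptable.
    """
    pairs = SCHEMA.get(rel_type, set())
    if (source_type, target_type) in pairs:
        return True
    if either_direction or rel_type in BIDIRECTIONAL:
        return (target_type, source_type) in pairs
    return False
```

For example, `is_valid("Person", "owns", "Dragon")` holds, whereas an arrow from a _Dragon_ to a _Person_ labeled _owns_ would be rejected unless it is drawn as a line.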

_**Q1:** Any dragon owned by Brandon Stark_ (two versions)

![V1](Pictures/Q001-1.png)

![V1](Pictures/Q001-2.png)

_**Q2:** Any dragon C that at least once had been frozen by a dragon owned by Brandon Stark_

![V1](Pictures/Q002.png)

_**Q184:** Any dragon C that at least once froze a dragon owned by Brandon Stark or was frozen by a dragon owned by Brandon Stark_

![V1](Pictures/Q184.png)

Both directions of the _freezes_ relationship are acceptable. Therefore - a line (instead of an arrow) is used in the pattern.

## Expressions and Expression Constraints

![V1](Pictures/BB02.png)

A **green rectangle** represents an expression. The rectangle contains:

- An _expression-tag_ ('{xt}') (see [Expression-Tags](#referencing-expression-tags))
- An expression ('_expr_')
- An optional _constraint_ on the result of the evaluation of the expression, composed of:
  - A constraint operator
  - A constraint expression (except when the constraint operator is _is null_ or _not null_)
- When units of measure are defined for the expression (based on the units of measure of the properties and the operators that compose the expression) - they are depicted as well (see Q117, Q304, Q265, Q95).

_expr_ is an entity's expression, a relationship's expression, a Cartesian product's expression, or a global expression.

- An _entity's expression_

The expression is composed of or depends on at least one property (inherent or calculated) of the connected entity and no properties of other entities/relationships.

The green rectangle is connected to a pattern-entity (concrete, typed, or untyped) on its left.

An expression-tag of an entity's expression is _a property of_ each unique assignment to the pattern-entity.

- A _relationship's expression_

The expression is composed of or depends on at least one property (inherent or calculated) of the connected relationship and no properties of other entities/relationships.

The green rectangle is connected to a pattern-relationship on its top.

An expression-tag of a relationship's expression is _a property of_ each unique assignment to the pattern-relationship (Note that it is not a property of an assignment to the Cartesian product of the two related entities).

- A _Cartesian product's expression_

The expression is composed of or depends on properties (inherent or calculated) of at least two entities (see {3} in Q207, {2}, {3} and {4} in Q340), at least two relationships (see {2} in Q267v2), or at least one entity and one relationship (see {1} in Q115v2, {2} and {4} in G3).

The green rectangle is connected to one of the entities/relationships or located at the same level as the leftmost entities (when there is no single leftmost entity) (see Q207).

An expression-tag of a Cartesian product's expression is _a property of_ each unique assignment to the Cartesian product.

See also _extended Cartesian product's expression_ in [Extended Aggregators](#extended-aggregators).

- A _global expression_

The expression is composed of and depends on no entity's expression, relationship's expression, nor Cartesian product's expression.

The green rectangle is located at the same level as the leftmost entities (see Q375).

A global expression is a _global property_.

Any expression-tag of an expression that is not a property name, a subproperty name, or a constant is called _a calculated property_.

An _expression_ is

- A literal (_string_, _integer_, or _float_),

Note: _date_, _datetime_, and _duration_ literals are represented using the functions _date_(_string_), _datetime_(_string_) and _duration_(_string_), respectively. In visual syntax, these function names are omitted, and expressions are formatted according to the regional settings (see Q8),

- <_inherent property name_> (of a connected entity/relationship)

(valid for a Cartesian product's expression only if it is connected to an entity/relationship),

- <_inherent property name_>.<_subproperty name_>[.<_subproperty name_> ...] (of a connected entity/relationship)

(valid for a Cartesian product's expression only if it is connected to an entity/relationship),

- An _expression-tag_ or an _aggregation-tag_ (e.g., '{1}'),
- _op expr_, where _op_ is a unary operator (e.g., '- {1}'),
- _expr op expr_, where _op_ is a binary operator (e.g., '3 + {1}'),
- (_expr_),
- '𝑓' where 𝑓 is a parameterless function (e.g., '_now_'. See G11),
- '𝑓(e1, e2, ...)' where 𝑓 is a function with at least one parameter and e1, e2, ... are expressions (see Q353),
- 'e1.𝑓' - equivalent to 𝑓(e1), where 𝑓 is a function with one parameter and e1 is an expression,
- 'e1.𝑓(e2, e3, ...)' - equivalent to 𝑓(e1, e2, e3, ...), where 𝑓 is a function with more than one parameter and e1, e2, e3, ... are expressions,
- An _interval expression_ (see Q327),
- A _set expression_ (see Q318),
- A _bag expression_ (see Q315), or
- A _list expression_

An [_interval_](https://en.wikipedia.org/wiki/Interval_(mathematics)) can be explicitly constructed using the following syntaxes:

* (_expr1_ .. _expr2_) - an open interval
* (_expr1_ .. _expr2_] - a half-open interval
* [_expr1_ .. _expr2_) - a half-open interval
* [_expr1_ .. _expr2_] - a closed interval

Both _expr1_ and _expr2_ are of the same ordinal data type.

If _expr1_ > _expr2_ - the interval is an _empty interval_.

If _expr1_ = _expr2_ - (_expr1_ .. _expr2_), (_expr1_ .. _expr2_], [_expr1_ .. _expr2_) are _empty intervals_.

If _expr1_, _expr2_, or both are evaluated to _null_ - the interval is evaluated to _null_.
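
The three interval rules above can be sketched in Python as follows. The class `Interval` and the helper `make_interval` are hypothetical names, `None` stands in for _null_, and integer bounds are used for simplicity; the sketch implements only the emptiness and _null_ rules stated here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    lo: int
    hi: int
    lo_closed: bool = True
    hi_closed: bool = True

    def is_empty(self):
        # expr1 > expr2: always an empty interval
        if self.lo > self.hi:
            return True
        # expr1 = expr2: empty unless both ends are closed
        if self.lo == self.hi:
            return not (self.lo_closed and self.hi_closed)
        return False


def make_interval(lo, hi, lo_closed=True, hi_closed=True):
    """Construct an interval; a null (None) bound makes the whole interval null."""
    if lo is None or hi is None:
        return None
    return Interval(lo, hi, lo_closed, hi_closed)
```

With these definitions, `make_interval(3, 3)` models the closed interval [3 .. 3], which is not empty, while `make_interval(3, 3, hi_closed=False)` models [3 .. 3), which is.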

A _set_ is an unordered collection of zero or more _non-null_ values (called _elements_) of the same data type in which each element may occur only once.

A set can be explicitly constructed using the following syntax:

* {_expr_, _expr_, ...} - zero or more expressions of the same data type. _null_ values are ignored. Duplicate values are merged.

A comma after the last element is optional, except for a single-element set where it is mandatory.

A _bag_ (_multiset_) is an unordered collection of zero or more _non-null_ values (called _elements_) of the same data type in which elements may occur more than once.

A bag can be explicitly constructed using the following syntax:

* [_expr_, _expr_, ...] - zero or more expressions of the same data type. _null_ values are ignored.

A comma after the last element is optional.

Bag elements are unordered, and duplicates are allowed. A bag may not contain _null_ values.

A _list_ is an ordered collection of values (called _elements_) of the same data type in which elements may occur more than once.

A list can be explicitly constructed using the following syntax:

* (_expr_, _expr_, ...) - zero or more expressions of the same data type.

A comma after the last element is optional, except for a single-element list where it is mandatory.
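
The differences between the three collection constructors can be summarized with a short Python sketch. The helper names are hypothetical, `None` stands in for _null_, and the check that all elements share one data type is omitted. Because the list definition above does not restrict elements to _non-null_ values, the sketch keeps them in lists.

```python
from collections import Counter


def make_set(*values):
    """Set constructor: null values are ignored, duplicate values are merged."""
    return frozenset(v for v in values if v is not None)


def make_bag(*values):
    """Bag constructor: null values are ignored, duplicates are counted."""
    return Counter(v for v in values if v is not None)


def make_list(*values):
    """List constructor: order and duplicates are preserved."""
    return tuple(values)
```

For the same inputs `1, None, 1, 2`, the set holds two elements, the bag holds three (with `1` occurring twice), and the list holds all four values in order.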

Expressions must match the data types defined for each operator and function.

A constraint filters assignments; an assignment is valid only if the result of the expression's evaluation _satisfies_ the constraint.

A constraint cannot be defined for a concrete entity's expression.

For untyped entities, expressions can be composed only of properties common to all valid entity-types. Valid entity-types for an untyped entity are defined implicitly (according to the types of the pattern-entities and pattern-relationships which are connected to the untyped entity) or explicitly (using entity-type constraints - see later) (see Q291).

A subproperty of a composite property is denoted as <_property name_>.<_subproperty name_> (e.g., _name.first_, _tf.since_).

_**Q3:** Any person whose first name is Brandon who owns a dragon_ (version 1)

![V1](Pictures/Q003-1.png)

{1} is a property of each unique assignment to B.

_**Q190:** Any person who became a dragon owner in 1011 or later_ (two versions)

![V1](Pictures/Q190-1.png)

{1} is a property of each unique assignment to the _owns_ relationship.

![V1](Pictures/Q190-2.png)

_year_ is a function (see next section).

All V1 constraint operators, except _is null_ and _not null_, are first evaluated using [Kleene's three-valued logic](https://www.jstor.org/stable/2267778) (3VL) to _true_, _false_, or _unknown_, and then mapped to a two-valued logic: the constraint is either _satisfied_ or _not satisfied_.

Each constraint operator, except _is null_ and _not null_, can be either blue or red.

![V1](Pictures/BB10-1.png)

- **A blue constraint operator**: the constraint is satisfied if and only if it is evaluated to _true_
- **A red constraint operator**: the constraint is satisfied if and only if it is evaluated to _true_ or to _unknown_

The following constraint operators can be only blue:

![V1](Pictures/BB10-2.png)

* An _is null_ constraint is satisfied if and only if the expression is evaluated to _null_
* A _not null_ constraint is satisfied if and only if the expression is not evaluated to _null_
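
The two-step evaluation (three-valued result first, then mapping to satisfied / not satisfied) can be sketched in Python. The enum `TV` and the functions `eq` and `satisfied` are hypothetical names, `None` stands in for _null_, and only the = operator is shown, without the per-type exceptions defined for individual constraint operators.

```python
from enum import Enum


class TV(Enum):
    TRUE = "true"
    FALSE = "false"
    UNKNOWN = "unknown"


def eq(left, right):
    """The = constraint operator under three-valued logic (no per-type exceptions)."""
    if left is None or right is None:
        return TV.UNKNOWN
    return TV.TRUE if left == right else TV.FALSE


def satisfied(result, color="blue"):
    """Map a three-valued result to satisfied / not satisfied."""
    if color == "blue":
        return result is TV.TRUE
    if color == "red":
        return result in (TV.TRUE, TV.UNKNOWN)
    raise ValueError("color must be 'blue' or 'red'")


def is_null(value):
    return value is None


def not_null(value):
    return value is not None
```

A horse whose _color_ is _null_ therefore fails a blue `color = 'white'` constraint but passes the red one, which is the practical difference between the two colors.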

## Data Types, Operators, and Functions

All V1's operators and functions must be well-defined when one or more of the operands or parameters are _null_ or evaluated to _null_. _Null-valued_ [sub]properties are interpreted as _applicable missing or no information_ (e.g., 1 + _null_ = _null_; max(5, _null_) = _null_).

Since V1 is schema-based, there is no need to define the behavior of each operator for any combination of operand types (and similarly, for each function for any combination of parameter types). When types do not match the definition (e.g., 5 > 'abc', _round_('abc')) - the query is invalid. Type mismatch should be detected during query analysis. In addition, interactive pattern-building tools should disallow the construction of such queries.

One design goal of V1 is to make it applicable to many property graph database management systems. Implementations may support different data types, operators, and functions than those presented here.

To present V1, we use the following data types, operators, and functions:

**Built-in basic data types:**

|Type | Notes
|-----------------------------|-----------------------------
| _int_ | Integer
| _float_ | Floating-point
| _date_ | Date. For simplicity, we will not consider time zones here.
| _datetime_ | Date and time. For simplicity, we will not consider time zones here.
| _duration_ | Can be negative
| _string_ | Unicode string

**Built-in composite data types:**

|Type | Notes
|-----------------------------|-----------------------------
| _dateframe_ | {_since_: _date_, _till_: _date_}
| _datetimeframe_ | {_since_: _datetime_, _till_: _datetime_}

**Literals:**

|Type | Examples
|-----------------------------|-----------------------------
| _integer_ | 12, -3
| _float_ | 3., 3.12, -1.78e-6, NaN, -INF, +INF
| _string_ | "", "abc", 'abc'

We will use _St_, _Bt_, and _Lt_ to denote a set, a bag, and a list of elements of type 𝑡, respectively, and _It_ to denote an interval of ordinal type 𝑡.

**Operators:**

|Operator (_op_)                   | Operands and result type (result may be _null_ as well)
|----------------------------------|-----------------------------
| +, - (unary)                     | _op_ _int_ → _int_<br>_op_ _float_ → _float_<br>If the operand is NaN - the result is NaN. Otherwise, if it is _null_ - the result is _null_
| +, - (binary)                    | _int_ _op_ _int_ → _int_<br>_float_ _op_ _float_ → _float_<br>_duration_ _op_ _duration_ → _duration_<br>_duration_ + _date_ → _date_<br>_date_ _op_ _duration_ → _date_<br>_duration_ + _datetime_ → _datetime_<br>_datetime_ _op_ _duration_ → _datetime_<br>If one or both operands are NaN - the result is NaN. Otherwise, if one or both operands are _null_ - the result is _null_
| *                                | _int_ * _int_ → _int_<br>_float_ * _float_ → _float_<br>_float_ * _duration_ → _duration_<br>_duration_ * _float_ → _duration_<br>If one or both operands are _null_ - the result is _null_
| /                                | _int_ / _int_ → _int_ (truncated towards zero)<br>_float_ / _float_ → _float_<br>_duration_ / _float_ → _duration_<br>If one or both operands are _null_ - the result is _null_
| % (modulo)                       | _int_ % _int_ → _int_ (remainder has the same sign as the dividend)<br>If one or both operands are _null_ - the result is _null_
| ∪, ∩, -, △<br>(union, intersection, difference, symmetric difference) | _St op St_ → _St_ (𝑡 is any type)<br>_Bt op Bt_ → _Bt_ (𝑡 is any type) (see Q377)<br>If one or both operands are _null_ - the result is _null_
| ∥ (concatenation)                | string ∥ string → string. 𝑠 ∥ _null_ = _null_ ∥ 𝑠 = _null_<br>_Lt_ ∥ _Lt_ → _Lt_ (𝑡 is any type). _null_ ∥ 𝐿 = 𝐿 ∥ _null_ = _null_<br>𝑡 ∥ _Lt_ → _Lt_ (𝑡 is any type). _null_ ∥ 𝐿 = (_null_, ...). 𝑡 ∥ _null_ = _null_<br>_Lt_ ∥ 𝑡 → _Lt_ (𝑡 is any type). 𝐿 ∥ _null_ = (..., _null_). _null_ ∥ 𝑡 = _null_
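
The integer rules for / and % differ from the built-in behavior of some host languages, so a short Python sketch may help. Python's own `//` rounds towards negative infinity and its `%` follows the sign of the divisor, which is why the helpers below (hypothetical names `v1_div` and `v1_mod`, with `None` standing in for _null_) do not use them directly on signed operands. Division by zero is not addressed here and would raise a Python exception.

```python
def v1_div(a, b):
    """int / int, truncated towards zero; null if either operand is null."""
    if a is None or b is None:
        return None
    quotient = abs(a) // abs(b)
    # Negate the magnitude when the operands have different signs.
    return quotient if (a < 0) == (b < 0) else -quotient


def v1_mod(a, b):
    """int % int, remainder takes the sign of the dividend; null if either operand is null."""
    if a is None or b is None:
        return None
    return a - b * v1_div(a, b)
```

For instance, `v1_div(-7, 2)` gives -3 and `v1_mod(-7, 2)` gives -1, whereas plain Python evaluates `-7 // 2` to -4 and `-7 % 2` to 1.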

**Constraint Operators:**

|Operator (_op_) | Operands type (result is false / true / unknown)
|---------------------------------------|-----------------------------
| is null, not null (unary) | any_type _op_
An empty set / bag / list is not a _null_ value.
| =, β‰  | both operands of the same type (any type)
_unknown_ if at least one operand is _null_
Exceptions for _float_:
(NaN = _null_) = _false_; (NaN β‰  _null_) = _true_
| <, >, ≀, β‰₯ | both operands of the same ordinal type:
_int_ / _float_ / _date_ / _datetime_ / _duration_ / _string_ / other ordinal
_unknown_ if at least one operand is _null_.
Exceptions for _string_:
("" ≀ _null_) = (_null_ β‰₯ "") = _true_
("" > _null_) = (_null_ < "") = _false_
("" > _null_) = (_null_ < "") = _false_
Exceptions for _float_:
(_null_ ≀ NaN) = (_null_ β‰₯ NaN) = (_null_ < NaN) = (_null_ > NaN) = _false_
(NaN ≀ _null_) = (NaN β‰₯ _null_) = (NaN < _null_) = (NaN > _null_) = _false_
Exceptions for bounded types with no NaN value:
(_lb_ ≀ _null_) = (_null_ β‰₯ _lb_) = (_ub_ β‰₯ _null_) = (_null_ ≀ _ub_) = _true_
(_ub_ < _null_) = (_null_ > _ub_) = (_lb_ > _null_) = (_null_ < _lb_) = _false_
where _lb_ is the lower bound (e.g., INT_MIN for _integer_) and _hb_ is the upper bound (e.g., INT_MAX for _integer_)
| ∈, βˆ‰ ([not] in) | left operand: any type 𝑑. right operand: _St_ / _Bt_ / _Lt_
_unknown_ if at least one operand is _null_. Exceptions:
(_null_ ∈ {}/[]/()) = _false_; (_null_ βˆ‰ {}/[]/()) = _true_

left operand: any ordinal type 𝑑. right operand : _It_
_unknown_ if at least one operand is _null_. Exceptions:
(_null_ ∈ _empty interval_) = false; (_null_ βˆ‰ _empty interval_) = _true_
𝑑 is _int_: (_null_ ∈ [INT_MIN, INT_MAX]) = _true_; (_null_ βˆ‰ [INT_MIN, INT_MAX]) = _false_
| βˆ‹, ∌ ([not] contains) | right operand: any type 𝑑. left operand: _St_ / _Bt_ / _Lt_
_unknown_ if at least one operand is _null_. Exceptions:
({}/[]/() βˆ‹ _null_) = _false_; ({}/[]/() ∌ _null_) = _true_

right operand: any ordinal type 𝑑. left operand : _It_
_unknown_ if at least one operand is _null_. Exceptions:
(_empty interval_ βˆ‹ _null_) = _false_; (_empty interval_ ∌ _null_) = _true_
𝑑 is _int_: ([INT_MIN, INT_MAX] βˆ‹ _null_) = _true_; ([INT_MIN, INT_MAX] ∌ _null_) = _false_
| βŠ†, ⊈ ([not] sub of)
βŠ‚, βŠ„ ([not] proper sub of) | both operands: _string_
_unknown_ if at least one operand is _null_. Exceptions:
(_null_ βŠ‚ "") = _false_; (_null_ βŠ„ "") = _true_
("" βŠ† _null_) = _true_; ("" ⊈ _null_) = _false_

both operands of the same type: _St_ / _Bt_ / _Lt_ (t is any type)
_unknown_ if at least one operand is _null_. Exceptions:
(_null_ βŠ‚ {}/[]/()) = _false_; (_null_ βŠ„ {}/[]/()) = _true_
({}/[]/() βŠ† _null_) = _true_; ({}/[]/() ⊈ _null_) = _false_

𝑑 is ordinal, and both operands of the same type: _It_
_unknown_ if at least one operand is _null_. Exceptions:
(_null_ βŠ‚ _empty interval_) = _false_; (_null_ βŠ„ _empty interval_) = _true_
(_empty interval_ βŠ† _null_) = _true_; (_empty interval_ ⊈ _null_) = _false_
𝑑 is _int_: (_null_ βŠ‚ [INT_MIN, INT_MAX]) = _true_; (_null_ βŠ„ [INT_MIN, INT_MAX]) = _false_
| ⊇, ⊉ ([not] super of)<br>⊃, ⊅ ([not] proper super of) | both operands: _string_<br>_unknown_ if at least one operand is _null_. Exceptions:<br>(_null_ ⊇ "") = _true_; (_null_ ⊉ "") = _false_<br>("" ⊃ _null_) = _false_; ("" ⊅ _null_) = _true_<br><br>both operands of the same type: _St_ / _Bt_ / _Lt_ (𝑡 is any type)<br>_unknown_ if at least one operand is _null_. Exceptions:<br>(_null_ ⊇ {}/[]/()) = _true_; (_null_ ⊉ {}/[]/()) = _false_<br>({}/[]/() ⊃ _null_) = _false_; ({}/[]/() ⊅ _null_) = _true_<br><br>𝑡 is ordinal, and both operands of the same type: _It_<br>_unknown_ if at least one operand is _null_. Exceptions:<br>(_null_ ⊇ _empty interval_) = _true_; (_null_ ⊉ _empty interval_) = _false_<br>(_empty interval_ ⊃ _null_) = _false_; (_empty interval_ ⊅ _null_) = _true_<br>𝑡 is _int_: ([INT_MIN, INT_MAX] ⊃ _null_) = _true_; ([INT_MIN, INT_MAX] ⊅ _null_) = _false_
| ⊳, ⋫ ([not] starts with) | both operands: _string_<br>_unknown_ if at least one operand is _null_. Exceptions:<br>(_null_ ⊳ "") = _true_; (_null_ ⋫ "") = _false_<br><br>left operand: _Lt_. right operand: 𝑡 (𝑡 is any type)<br>_unknown_ if at least one operand is _null_. Exceptions:<br>(() ⊳ _null_) = _false_; (() ⋫ _null_) = _true_<br><br>left operand: _Lt_. right operand: _Lt_ (𝑡 is any type)<br>_unknown_ if at least one operand is _null_. Exceptions:<br>(_null_ ⊳ ()) = _true_; (_null_ ⋫ ()) = _false_
| ⊲, ⋪ ([not] ends with) | both operands: _string_<br>_unknown_ if at least one operand is _null_. Exceptions:<br>(_null_ ⊲ "") = _true_; (_null_ ⋪ "") = _false_<br><br>left operand: _Lt_. right operand: 𝑡 (𝑡 is any type)<br>_unknown_ if at least one operand is _null_. Exceptions:<br>(() ⊲ _null_) = _false_; (() ⋪ _null_) = _true_<br><br>left operand: _Lt_. right operand: _Lt_ (𝑡 is any type)<br>_unknown_ if at least one operand is _null_. Exceptions:<br>(_null_ ⊲ ()) = _true_; (_null_ ⋪ ()) = _false_
| ≍, ≭ ([not] match) | both operands: _string_ (right operand is a regex string)<br>_unknown_ if at least one operand is _null_. Exceptions:<br>(_null_ ≍ "") = _true_; (_null_ ≭ "") = _false_
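
The null-handling rules above can be read as three-valued logic with a few definite exceptions. The following is a minimal Python sketch of `∋`, `∌`, and the string form of `⊳`, not a reference implementation. It assumes that `None` stands in for both _null_ and _unknown_, that collections themselves hold no _null_ elements, and the function names `contains`, `not_contains`, and `starts_with` are made up for illustration.

```python
from typing import Any, Collection, Optional


def contains(collection: Optional[Collection[Any]], element: Any) -> Optional[bool]:
    """Three-valued '∋': returns True, False, or None (standing in for unknown)."""
    if collection is not None and len(collection) == 0:
        # Exception from the table: an empty set/bag/list contains nothing,
        # so the answer is a definite false even when the right operand is null.
        return False
    if collection is None or element is None:
        return None  # unknown: at least one operand is null
    return element in collection


def not_contains(collection: Optional[Collection[Any]], element: Any) -> Optional[bool]:
    """Three-valued '∌': the negation of contains; unknown stays unknown."""
    result = contains(collection, element)
    return None if result is None else not result


def starts_with(text: Optional[str], prefix: Optional[str]) -> Optional[bool]:
    """Three-valued '⊳' for strings."""
    if prefix == "":
        # Exception from the table: every string starts with "",
        # so (null ⊳ "") is a definite true.
        return True
    if text is None or prefix is None:
        return None  # unknown
    return text.startswith(prefix)
```

Checking the exceptions first, before the generic null test, is what keeps the definite answers from being swallowed by the _unknown_ rule.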

**Implicit Type Coercion**

|From type (_t1_) | To type (_t2_) | Examples
|---------------------|-----------------------| -----
|_int_                | _float_               | 5 + 3. → 8.
|_date_               | _datetime_            | _date_("2018-04-05") = _datetime_("2018-04-05T00:00:00") → true
|_dateframe_          | _datetimeframe_       | _dateframe_("2018-04-05", "2018-04-08") = _datetimeframe_("2018-04-05T00:00:00", "2018-04-08T23:59:59.999999999")
|_dateframe_          | interval(_date_)      | _df_._duration_, where _df_ is a _dateframe_ property
|_datetimeframe_      | interval(_datetime_)  | _tf_._duration_, where _tf_ is a _datetimeframe_ property
|interval(_date_)     | _dateframe_           | (_date_("2018-04-05") .. _date_("2018-05-05")).since → _date_("2018-04-05")
|interval(_datetime_) | _datetimeframe_       | (_date_("2018-04-05") .. _datetime_("2018-05-05T00:00")).since → _datetime_("2018-04-05T00:00")

Also, based on any of these coercion rules:

|From type | To type | Examples
|---------------------|-----------------------------| -----
|{} (empty set)       | set(_t2_) of any type _t2_  | {} ∪ {5} → {5}
|[] (empty bag)       | bag(_t2_) of any type _t2_  | [] ∪ [5] → [5] (see Q349)
|() (empty list)      | list(_t2_) of any type _t2_ | () ∥ (5,) → (5,)
|set(_t1_)            | set(_t2_)                   | {3, 5, 8} ∪ {3., 8.} → {3., 5., 8.}
|bag(_t1_)            | bag(_t2_)                   | [3, 5, 8] ∪ [3., 8.] → [3., 3., 5., 8., 8.]
|list(_t1_)           | list(_t2_)                  | (3, 5, 8) ∥ (3., 8.) → (3., 5., 8., 3., 8.)
|interval(_t1_)       | interval(_t2_)              | (3 .. 8) = (3. .. 8.) → true<br>(3 .. 8) ⊃ (3. .. 5.) → true
|{_t1_, _t2_, ...}    | set(_t2_) *                 | {3, 5.} → {3., 5.}
|[_t1_, _t2_, ...]    | bag(_t2_) *                 | [3, 5.] → [3., 5.]
|(_t1_, _t2_, ...)    | list(_t2_) *                | (3, 5.) → (3., 5.)
|_t1_ .. _t2_         | interval(_t2_) *            | [3 .. 5.) → [3. .. 5.)

\* where there is implicit type coercion from t1 to t2
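
As a rough illustration of the starred rows, the sketch below widens the elements of a mixed literal to a common type in Python. It covers only the _int_ → _float_ and _date_ → _datetime_ rules; the helper names `common_type`, `widen`, and `coerce_elements` are hypothetical, and mapping a _date_ to midnight follows the `2018-04-05T00:00:00` example in the first table.

```python
from datetime import date, datetime, time


def common_type(values):
    """Pick the type every element can be implicitly coerced to."""
    types = {type(v) for v in values}
    if len(types) == 1:
        return next(iter(types))
    if types == {int, float}:
        return float
    if types == {date, datetime}:
        return datetime
    raise TypeError(f"no implicit coercion among {sorted(t.__name__ for t in types)}")


def widen(value, target):
    """Coerce one value to the target type, if a rule applies."""
    if target is float and type(value) is int:
        return float(value)
    if target is datetime and type(value) is date:
        return datetime.combine(value, time.min)  # a date becomes midnight of that day
    return value


def coerce_elements(values):
    """E.g. the literal (3, 5.) becomes (3., 5.)."""
    if not values:
        return []
    target = common_type(values)
    return [widen(v, target) for v in values]
```

For example, `coerce_elements([3, 5.0])` yields two floats, mirroring `(3, 5.) → (3., 5.)`, while a mix with no rule (say an _int_ and a _string_) raises a `TypeError`.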

**Functions**

Functions over _float_ expressions:

|Function | Notes
|----------------------------------|-----------------------------
| _trunc_(_float_) → _int_         | truncates toward zero
| _round_(_float_) → _int_         | rounds to the nearest integer (see G13)
| _mRound_(_float_, _int_) → _int_ | rounds to the nearest multiple of a given integer (see G9, G10)
| _seconds_(_float_) → _duration_  | (e.g., _seconds_(6) is a duration of 6 seconds)
| _minutes_(_float_) → _duration_  | See G10
| _hours_(_float_) → _duration_    |
| _days_(_float_) → _duration_     | See Q216, Q289
| _weeks_(_float_) → _duration_    | One week = 7 days
| _months_(_float_) → _duration_   | One month = 30.4367 days (see Q110)
| _years_(_float_) → _duration_    | One year = 365.24 days (see Q317)

Functions over _string_ expressions:

|Function | Notes
|-----------------------------------|-----------------------------
| _length_(_string_) → _int_        | See Q255
| _toLower_(_string_) → _string_    | See Q308
| _date_(_string_) → _date_         | String format: YYYY-MM-DD (e.g., "2018-04-23")<br>In visual syntax, the function name is omitted, and the string is formatted according to the regional settings.
| _datetime_(_string_) → _datetime_ | String format: YYYY-MM-DDTHH:MM[:SS[.sss]]<br>(e.g., "2018-04-23T12:34:00")<br>In visual syntax, the function name is omitted, and the string is formatted according to the regional settings.
| _duration_(_string_) → _duration_ | String format adapted from Cypher: P[nY][nM][nW][nD][T[nH][nM][nS]]<br>(e.g., "P1Y2M10DT12H45M30.25S")<br>Y: years, M: months, W: weeks, D: days, H: hours, M: minutes, S: seconds<br>In visual syntax, the function name is omitted, and the string is formatted.
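
To make the duration string format concrete, here is a small Python sketch of a parser for `P[nY][nM][nW][nD][T[nH][nM][nS]]`. It is an illustration under stated assumptions, not V1's own parser: it assumes every component may carry a fractional part, that a string with no components at all is rejected, and that years and months are converted with the 365.24-day and 30.4367-day constants listed for the _years_ and _months_ functions. The name `parse_duration` is hypothetical.

```python
import re
from datetime import timedelta

DAYS_PER_YEAR = 365.24     # constant listed for years(float)
DAYS_PER_MONTH = 30.4367   # constant listed for months(float)

_NUM = r"(\d+(?:\.\d+)?)"
_DURATION_RE = re.compile(
    rf"^P(?:{_NUM}Y)?(?:{_NUM}M)?(?:{_NUM}W)?(?:{_NUM}D)?"
    rf"(?:T(?:{_NUM}H)?(?:{_NUM}M)?(?:{_NUM}S)?)?$"
)


def parse_duration(text: str) -> timedelta:
    """Parse a duration string such as 'P1Y2M10DT12H45M30.25S'."""
    match = _DURATION_RE.match(text)
    if match is None or not any(match.groups()):
        raise ValueError(f"not a duration string: {text!r}")
    # Missing components count as zero.
    years, months, weeks, days, hours, minutes, seconds = (
        float(g) if g else 0.0 for g in match.groups()
    )
    total_days = years * DAYS_PER_YEAR + months * DAYS_PER_MONTH + weeks * 7 + days
    return timedelta(days=total_days, hours=hours, minutes=minutes, seconds=seconds)
```

The `T` separator is what distinguishes the two meanings of `M`: before it, `M` is months; after it, minutes. So `parse_duration("P1M")` and `parse_duration("PT1M")` give very different results.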

Functions over _datetime_ expressions:

|Function | Notes
|----------------------------------------------------|------
| _date_(_datetime_) → _date_                        | See Q158
| _year_(_datetime_) → _int_                         | See Q185
| _month_(_datetime_) → _int_                        | The month of the year (1-12)
| _day_(_datetime_) → _int_                          | The day of the month (1-31)
| _hour_(_datetime_) → _int_                         | 0-23 (see G4)
| _minute_(_datetime_) → _int_                       | 0-59
| _sec_(_datetime_) → _int_                          | 0-59
| _yearsSinceEpoch_(_datetime_) → _float_            |
| _monthsSinceEpoch_(_datetime_) → _float_           |
| _weeksSinceEpoch_(_datetime_) → _float_            |
| _daysSinceEpoch_(_datetime_) → _float_             |
| _hoursSinceEpoch_(_datetime_) → _float_            |
| _minsSinceEpoch_(_datetime_) → _float_             |
| _span_(_datetime_, _datetime_) → _duration_        | Positive difference (see Q374)

Functions over _date_ expressions:

|Function | Notes
|----------------------------------------------------|------
| _year_(_date_) → _int_                             |
| _month_(_date_) → _int_                            | The month of the year (1-12)
| _day_(_date_) → _int_                              | The day of the month (1-31)
| _yearsSinceEpoch_(_date_) → _float_                |
| _monthsSinceEpoch_(_date_) → _float_               |
| _weeksSinceEpoch_(_date_) → _float_                |
| _daysSinceEpoch_(_date_) → _float_                 |

Functions over _dateframe_ expressions:

|Function | Notes
|----------------------------------------------------|------
| _duration_(_dateframe_) → _duration_               |
| _overlap_(_dateframe_, _dateframe_) → _duration_   | Always non-negative (see Q267v2)

Functions over _datetimeframe_ expressions:

|Function | Notes
|----------------------------------------------------------|------
| _duration_(_datetimeframe_) → _duration_                 | See Q110
| _overlap_(_datetimeframe_, _datetimeframe_) → _duration_ | Always non-negative

Functions over _duration_ expressions:

|Function | Notes
|----------------------------------------------------|------
| _years_(_duration_) → _float_                      | One year = 365.24 days
| _months_(_duration_) → _float_                     | One month = 30.4367 days
| _weeks_(_duration_) → _float_                      | One week = 7 days
| _days_(_duration_) → _float_                       | See Q328
| _hours_(_duration_) → _float_                      |
| _minutes_(_duration_) → _float_                    |
| _seconds_(_duration_) → _float_                    |
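
The fixed conversion constants make the _float_ → _duration_ and _duration_ → _float_ functions simple inverses of each other. The Python sketch below models a _duration_ as a `datetime.timedelta`, which is an assumption made for illustration, and the `in_*` names are hypothetical stand-ins for the _duration_ → _float_ overloads.

```python
from datetime import timedelta

DAYS_PER_MONTH = 30.4367
DAYS_PER_YEAR = 365.24


def months(x: float) -> timedelta:
    """months(float) → duration, using one month = 30.4367 days."""
    return timedelta(days=x * DAYS_PER_MONTH)


def years(x: float) -> timedelta:
    """years(float) → duration, using one year = 365.24 days."""
    return timedelta(days=x * DAYS_PER_YEAR)


def in_days(d: timedelta) -> float:
    """days(duration) → float."""
    return d / timedelta(days=1)


def in_months(d: timedelta) -> float:
    """months(duration) → float."""
    return d / timedelta(days=DAYS_PER_MONTH)


def in_years(d: timedelta) -> float:
    """years(duration) → float."""
    return d / timedelta(days=DAYS_PER_YEAR)
```

Note that the two constants are rounded independently: 12 × 30.4367 = 365.2404 days, so `in_years(months(12))` comes out marginally above 1 rather than exactly 1.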

Functions over _set_ expressions:

|Function | Notes
|------------------------------|-----------------------------
| _count_(_St_) → _int_        | number of elements
| _bag_(_St_) → _Bt_           | set to bag
| _list_(_St_) → _Lt_          | set to list
| _el_(_St_) ⇉ 𝑡               | (see [Multivalued Functions and Expressions](#multivalued-functions-and-expressions))
| _subset_(_St_) ⇉ _St_        | (see [Multivalued Functions and Expressions](#multivalued-functions-and-expressions))
| _min_(_St_) → 𝑡<br>_max_(_St_) → 𝑡 | 𝑡 is an ordinal type<br>_null_ when _St_ is _null_ or when it is empty
| _avg_(_St_) → 𝑡 or _float_   | 𝑡 is an ordinal type<br>if 𝑡 is _int_ - the result is _float_<br>_null_ when _St_ is _null_ or when it is empty
| _sum_(_St_) → 𝑡              | 𝑡 is _int_ / _float_ / _duration_ (not _date_ / _time_ / _datetime_)<br>_null_ when _St_ is _null_; zero when it is empty
| _min_(_St, n: int_) → _St_<br>_max_(_St, n: int_) → _St_ | Set of (up to) _max_(0, 𝑛) smallest/largest values<br>𝑡 is an ordinal type<br>_null_ when _St_ is _null_, {} when it is empty
| _overlap_(_Sdateframe_) → _duration_<br>_overlap_(_Sdatetimeframe_) → _duration_ | The duration of the overlap between all members of 𝑆<br>Always non-negative (see Q371)
| _union_(_SSt_) → _St_<br>_union_(_SBt_) → _Bt_ | The union of all members of a set of sets/bags (𝑡 is any type)
| _intersection_(_SSt_) → _St_<br>_intersection_(_SBt_) → _Bt_ | The intersection of all members of a set of sets/bags (𝑡 is any type)

Functions over _bag_ expressions:

|Function | Notes
|-----------------------------------|-----------------------------
| _count_(_Bt_) → _int_             | number of elements
| _distinct_(_Bt_) → _int_          | number of distinct elements
| _multiplicity_(_Bt_, 𝑡) → _int_   | number of times 𝑡 occurs in _Bt_
| _set_(_Bt_) → _St_                | bag to set
| _list_(_Bt_) → _Lt_               | bag to list
| _el_(_Bt_) ⇉ 𝑡                    | (see [Multivalued Functions and Expressions](#multivalued-functions-and-expressions))
| _subbag_(_Bt_) ⇉ _Bt_             | (see [Multivalued Functions and Expressions](#multivalued-functions-and-expressions))
| _min_(_Bt_) → 𝑡<br>_max_(_Bt_) → 𝑡 | 𝑡 is an ordinal type<br>_null_ when _Bt_ is _null_ or when it is empty
| _avg_(_Bt_) → 𝑡 or _float_        | 𝑡 is an ordinal type<br>if 𝑡 is _int_ - the result is _float_<br>_null_ when _Bt_ is _null_ or when it is empty
| _sum_(_Bt_) → 𝑡                   | 𝑡 is _int_ / _float_ / _duration_ (not _date_ / _time_ / _datetime_)<br>_null_ when _Bt_ is _null_; zero when it is empty
| _min_(_Bt, n: int_) → _Bt_<br>_max_(_Bt, n: int_) → _Bt_ | Bag of (up to) _max_(0, 𝑛) smallest/largest values (see Q377)<br>𝑡 is an ordinal type<br>_null_ when _Bt_ is _null_, [] when it is empty
| _overlap_(_Bdateframe_) → _duration_<br>_overlap_(_Bdatetimeframe_) → _duration_ | The duration of the overlap between all members of 𝐵<br>Always non-negative
| _union_(_BSt_) → _St_<br>_union_(_BBt_) → _Bt_ | The union of all members of a bag of sets/bags (𝑡 is any type)
| _intersection_(_BSt_) → _St_<br>_intersection_(_BBt_) → _Bt_ | The intersection of all members of a bag of sets/bags (𝑡 is any type)

Functions over _list_ expressions (𝑡):

|Function | Notes
|-----------------------------------|-----------------------------
| _count_(_Lt_) → _int_             | number of elements
| _distinct_(_Lt_) → _int_          | number of distinct _non-null_ elements
| _multiplicity_(_Lt_, 𝑡) → _int_   | number of times 𝑡 occurs in _Lt_
| _at_(_Lt, n: int_) → 𝑡            | 𝑛'th element (1-based)<br>_null_ if 𝑛 is out of range
| _set_(_Lt_) → _St_                | list to set
| _bag_(_Lt_) → _Bt_                | list to bag
| _min_(_Lt_) → 𝑡<br>_max_(_Lt_) → 𝑡 | 𝑡 is an ordinal type<br>_null_ values are ignored<br>_null_ when _Lt_ is _null_ or when it contains no _non-null_ elements
| _avg_(_Lt_) → 𝑡 or _float_        | 𝑡 is an ordinal type<br>if 𝑡 is _int_ - the result is _float_<br>_null_ values are ignored<br>_null_ when _Lt_ is _null_ or when it contains no _non-null_ elements
| _sum_(_Lt_) → 𝑡                   | 𝑡 is _int_ / _float_ / _duration_ (not _date_ / _time_ / _datetime_)<br>_null_ values are ignored<br>zero when _Lt_ is _null_ or when it contains no _non-null_ elements
| _min_(_Lt, n: int_) → _Lt_<br>_max_(_Lt, n: int_) → _Lt_ | List of (up to) _max_(0, 𝑛) smallest/largest values<br>𝑡 is an ordinal type<br>_null_ values are ignored<br>_null_ when _Lt_ is _null_, () when it contains no _non-null_ elements
| _sort_(_Lt_) → _Lt_               | Sorted list<br>𝑡 is an ordinal type
| _invsort_(_Lt_) → _Lt_            | Inverse-sorted list<br>𝑡 is an ordinal type

Functions over _interval_ expressions:

|Function | Notes
|-----------------------------|-----------------------------
| _lb_(_It_) → 𝑡              | Lower bound<br>_null_ when _It_ is _null_
| _up_(_It_) → 𝑡              | Upper bound<br>_null_ when _It_ is _null_
| _set_(_It_) → _St_          | Interval to set<br>𝑡 is discrete (_int_, _datetime_, or another ordinal type)<br>_null_ when _It_ is _null_
| _bag_(_It_) → _Bt_          | Interval to bag<br>𝑡 is discrete (_int_, _datetime_, or another ordinal type)<br>_null_ when _It_ is _null_
| _list_(_It_) → _Lt_         | Interval to list<br>𝑡 is discrete (_int_, _datetime_, or another ordinal type)<br>_null_ when _It_ is _null_

Other functions:

|Function | Notes
|----------------------------------------------------|------
| _now_ → _datetime_                                 | See Q8v2, G11
| _today_ → _date_                                   | See Q328
| _date_(year, month, day) → _date_                  | Construct date using three integers (see Q353)<br>_null_ when at least one value is _null_
| _min_(𝑡, 𝑡, ...) → 𝑡<br>_max_(𝑡, 𝑡, ...) → 𝑡 | One or more values of the same ordinal type<br>_null_ when at least one value is _null_<br>_min_({𝑡, 𝑡, …}) and _max_({𝑡, 𝑡, …}) ignore _null_ values (see Q317)

Implementations may support _[opaque data types](https://en.wikipedia.org/wiki/Opaque_data_type)_ - data types for which the internal data representation is not exposed. For each opaque data type - a set of functions and operators may be defined (see _location_ data type in [Application: Spatiotemporality](#application-spatiotemporality)).

## Quantifiers

A **vertical purple rectangle** represents a _vertical quantifier_. The text inside the rectangle denotes the quantifier type.

Vertical quantifiers (or simply 'quantifiers') add much expressive power, including more complex topological constraints, more than one entity's expression constraint, and alternative subpatterns.

_**Q3:** Any person whose first name is Brandon who owns a dragon_ (version 2)

![V1](Pictures/Q003-2.png)

_**Q219:** Any person who owns a white horse weighing more than 200 Kg_

![V1](Pictures/Q219.png)

_**Q304:** Any person who owns a white horse and who owns a horse weighing more than 200 Kg_

![V1](Pictures/Q304.png)

The same graph-entity may match more than one pattern-entity. For example, either the same horse or different horses may be assigned to B and C (this can be avoided: see *identicality, nonidenticality*, and *order constraints* later on). Similarly, the same graph-relationship may match more than one pattern-relationship.

A vertical quantifier has one connection on its left side and zero or more branches on its right side. On its left side is an entity, a quantifier, or the pattern's start. Except at the pattern's start, a quantifier may be wrapped with an 'O' (see Q147, Q360v2).

When a quantifier (or the rightmost quantifier in a sequence of quantifiers) is directly right of the pattern start, each branch may start with:

* An entity (see Q108),
* A Cartesian product's expression (see Q207)
* A global expression (see Q375), or
* A quantifier (see Q332v2)

When a quantifier (or the rightmost quantifier in a sequence of quantifiers) is directly right of an entity, each branch may start with:

* A relationship/path
  * optionally, with a relationship/path-negator (see Q358, Q359)
  * optionally, with a negator or with an 'O' (see Q358, Q359)
  * optionally, with relationship's expressions (see Q339)
  * optionally, with aggregators (see Q125),
* An entity's expression (see Q3v2),
* A Cartesian product's expression (see Q340), or
* A quantifier (see Q8)

The following branches do not affect the quantifier's evaluation:

* Any branch composed of an entity's expression with no constraint (see Q109)
* Any branch that starts with an 'O' (see Q148)

Each such branch is marked with a **white triangle**.

All other branches affect the quantifier's evaluation. Let 𝑏 denote the number of such branches.

We will name the left side of the quantifier _the left component_, and anything that follows a branch, up to the branch's end, _a right component_.

12 quantifier types are defined:

![V1](Pictures/BB03.png)

* **All** (denoted '&')

If 𝑏 is zero, an assignment matches the pattern if and only if it matches the left component.
Otherwise - An assignment matches the pattern if and only if it matches the whole pattern.

* **Some** (denoted '|')

If 𝑏 is zero, no assignment matches the pattern. Otherwise - An assignment matches pattern 𝑃 if and only if it matches pattern 𝑄 where

* 𝑄's left component is identical to 𝑃's
* 𝑄 has 𝑖 right components identical to 𝑃's, _1 ≤ i ≤ b_, and no other right components
* The quantifier is replaced with an _All_ quantifier

* **Not all (but more than 0)** (denoted by an '&' with stroke)

If 𝑏 is zero, no assignment matches the pattern. Otherwise - An assignment 𝐴 matches pattern 𝑃 if and only if it matches pattern 𝑄 where

* 𝑄's left component is identical to 𝑃's
* 𝑄 has 𝑖 right components identical to 𝑃's, _1 ≤ i < b_, and no other right components
* The quantifier is replaced with an _All_ quantifier

and there is no assignment 𝐵 with a similar left component as 𝐴's that matches pattern 𝑅 where

* 𝑅's left component and all its right components are identical to 𝑃's
* The quantifier is replaced with an _All_ quantifier

* **None** (denoted '0')

If 𝑏 is zero, an assignment matches the pattern if and only if it matches the left component. Otherwise - An assignment 𝐴 matches pattern 𝑃 if and only if it matches pattern 𝑄 where

* 𝑄's left component is identical to 𝑃's
* The quantifier and the right components are removed

and there is no assignment 𝐵 with a similar left component as 𝐴's that matches pattern 𝑅 where

* 𝑅's left component is identical to 𝑃's
* 𝑅 has 𝑖 right components identical to 𝑃's, _1 ≤ i ≤ b_, and no other right components
* The quantifier is replaced with an _All_ quantifier

The _None_ quantifier may not start a pattern.

* **= 𝑛**; 𝑏 ≥ 1, 𝑛 ∈ [1, 𝑏]

An assignment 𝐴 matches pattern 𝑃 if and only if it matches pattern 𝑄 where

* 𝑄's left component is identical to 𝑃's
* 𝑄 has 𝑛 right components identical to 𝑃's, and no other right components
* The quantifier is replaced with an _All_ quantifier

and, if _n ≠ b_, there is no assignment 𝐵 with a similar left component as 𝐴's that matches pattern 𝑅 where

* 𝑅's left component is identical to 𝑃's
* 𝑅 has 𝑖 right components identical to 𝑃's, 𝑖 > 𝑛, and no other right components
* The quantifier is replaced with an _All_ quantifier

* **> 𝑛**; 𝑏 ≥ 2, 𝑛 ∈ [0, 𝑏-1]

An assignment 𝐴 matches pattern 𝑃 if and only if it matches pattern 𝑄 where

* 𝑄's left component is identical to 𝑃's
* 𝑄 has 𝑖 right components identical to 𝑃's, 𝑛 < 𝑖 ≤ 𝑏, and no other right components
* The quantifier is replaced with an _All_ quantifier

* **≥ 𝑛**; 𝑏 ≥ 2, 𝑛 ∈ [1, 𝑏]

An assignment 𝐴 matches pattern 𝑃 if and only if it matches pattern 𝑄 where

* 𝑄's left component is identical to 𝑃's
* 𝑄 has 𝑖 right components identical to 𝑃's, 𝑛 ≤ 𝑖 ≤ 𝑏, and no other right components
* The quantifier is replaced with an _All_ quantifier

* **< 𝑛 (but more than 0)**; 𝑏 ≥ 2, 𝑛 ∈ [2, 𝑏]

An assignment 𝐴 matches pattern 𝑃 if and only if it matches pattern 𝑄 where

* 𝑄's left component is identical to 𝑃's
* 𝑄 has 𝑖 right components identical to 𝑃's, 1 ≤ 𝑖 < 𝑛, and no other right components
* The quantifier is replaced with an _All_ quantifier

and there is no assignment 𝐵 with a similar left component as 𝐴's that matches pattern 𝑅 where

* 𝑅's left component is identical to 𝑃's
* 𝑅 has 𝑖 right components identical to 𝑃's, 𝑖 ≥ 𝑛, and no other right components
* The quantifier is replaced with an _All_ quantifier

* **≤ 𝑛 (but more than 0)**; 𝑏 ≥ 2, 𝑛 ∈ [1, 𝑏]

An assignment 𝐴 matches pattern 𝑃 if and only if it matches pattern 𝑄 where

* 𝑄's left component is identical to 𝑃's
* 𝑄 has 𝑖 right components identical to 𝑃's, 1 ≤ 𝑖 ≤ 𝑛, and no other right components
* The quantifier is replaced with an _All_ quantifier

and there is no assignment 𝐵 with a similar left component as 𝐴's that matches pattern 𝑅 where

* 𝑅's left component is identical to 𝑃's
* 𝑅 has 𝑖 right components identical to 𝑃's, 𝑖 > 𝑛, and no other right components
* The quantifier is replaced with an _All_ quantifier

* **≠ 𝑛 (but more than 0)**; 𝑏 ≥ 2, 𝑛 ∈ [1, 𝑏]

≡ (_< n_) ∨ (_> n_)

* **𝑛1..𝑛2**; 𝑏 ≥ 2, 𝑛1 ∈ [1, 𝑏], 𝑛2 ∈ [2, 𝑏], 𝑛1 < 𝑛2

An assignment 𝐴 matches pattern 𝑃 if and only if it matches pattern 𝑄 where

* 𝑄's left component is identical to 𝑃's
* 𝑄 has 𝑖 right components identical to 𝑃's, 𝑛1 ≤ 𝑖 ≤ 𝑛2, and no other right components
* The quantifier is replaced with an _All_ quantifier

and there is no assignment 𝐵 with a similar left component as 𝐴's that matches pattern 𝑅 where

* 𝑅's left component is identical to 𝑃's
* 𝑅 has 𝑖 right components identical to 𝑃's, 𝑖 > 𝑛2, and no other right components
* The quantifier is replaced with an _All_ quantifier

* **∉ 𝑛1..𝑛2 (but more than 0)**; 𝑏 ≥ 4, 𝑛1 ∈ [2, 𝑏-1], 𝑛2 ∈ [3, 𝑏], 𝑛1 < 𝑛2

≡ (< 𝑛1) ∨ (> 𝑛2)

The order of the branches does not affect the evaluation result.
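
The twelve definitions can be condensed into a single predicate if we make one simplifying assumption: for a fixed assignment to the left component, each branch can be satisfied independently of the others, so only the number of satisfiable branches matters. Identicality, nonidenticality, or order constraints that link branches would break this assumption. The Python sketch below is written under it; the function name `quantifier_holds`, the `kind` labels, and the parameter names are hypothetical and are not part of V1.

```python
def quantifier_holds(kind: str, satisfied: int, b: int, n: int = 0, n2: int = 0) -> bool:
    """Decide a vertical quantifier for one left-component assignment.

    satisfied -- number of counted branches that can be matched (0..b)
    b         -- number of branches that affect the evaluation
    n, n2     -- the quantifier's bound(s); n plays the role of n1 for ranges
    """
    m = satisfied
    if kind == "all":
        return m == b              # b == 0: only the left component matters
    if kind == "some":
        return m >= 1              # b == 0: nothing can match
    if kind == "not_all":
        return 1 <= m < b
    if kind == "none":
        return m == 0
    if kind == "=":
        return m == n
    if kind == ">":
        return m > n
    if kind == ">=":
        return m >= n
    if kind == "<":
        return 1 <= m < n          # "but more than 0"
    if kind == "<=":
        return 1 <= m <= n
    if kind == "!=":
        return m >= 1 and m != n   # (< n) or (> n)
    if kind == "range":
        return n <= m <= n2
    if kind == "not_in_range":
        return 1 <= m < n or m > n2
    raise ValueError(f"unknown quantifier kind: {kind!r}")
```

For example, with three counted branches of which two can be matched, `quantifier_holds("not_all", 2, 3)` is true, while `quantifier_holds("all", 2, 3)` is false. Because only the count enters the predicate, the sketch also reflects that branch order is irrelevant.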

_**Q8:** Any person born before 970 and passed away or whose father was born not later than January 1, 950_ (two versions)

![V1](Pictures/Q008-1.png)

The person's death date is not _null_, nor is it _inapplicable_. When A's death date is _null_, 'deathDate ≠ 31/12/9999' is evaluated to _unknown_, and the constraint is not satisfied.

The following pattern also requires that A's death date is not a future date:

![V1](Pictures/Q008-2.png)

_**Q11:** Any current member of the Masons Guild who, on or after January 1, 1011, befriended someone who had left the Saddlers guild or the Blacksmiths guild in June 1010 or later_

![V1](Pictures/Q011.png)

The constraint 'member of df.till = 31/12/9999' means an _inapplicable_ membership end date – the person is currently a member of the Masons Guild.

## Entity-Tags

The letter in the top-left corner of each pattern-entity rectangle (concrete, typed, or untyped) is called an _entity-tag_. Entity-tags are also included in query results: any graph-entity in a query result is annotated with the same tag as the pattern-entity to which it was assigned so that the query poser can understand why any given entity is part of the result. As part of the result, a graph-entity may be annotated with more than one entity-tag, as it may be assigned to several pattern-entities (in the same assignment or in different assignments - when assignments are merged).

Entity-tags may be referenced:

* in *identicality, nonidenticality*, and *order constraints*
* in an aggregator _per_ clause
* in an A1/M1/M2/M3 "_et × et × ..._" clause
* in an M1 aggregator "with min/max ..." clause (see Q196)

***Identicality constraints*** can be used when the same graph-entity should be assigned to:
- Several typed entities of the same type
- Several untyped entities

_**Q4:** Any person A whose dragon was frozen by a dragon owned by (at least) one of A's parents_

![V1](Pictures/Q004.png)

Entity-tag 'B' is used to enforce identical assignment to two _Dragon_ pattern-entities.

Entity-tags with identicality constraints are depicted in green.

_**Q9:** Any pair of dragons (A, B) where A froze B in both 980 and 984_

![V1](Pictures/Q009.png)

The same visual notation is also used when the same concrete entity appears more than once (see Q25v2, Q26v2)

***Nonidenticality constraints*** can be used when different graph-entities should be assigned to typed entities of the same type or to untyped entities.

_**Q5:** Any person A whose dragon was frozen by a dragon owned by two of A's parents_ (version 1)

![V1](Pictures/Q005-1.png)

Without the nonidenticality constraint, the same parent could be assigned to both D and E.

Nonidenticality constraints are depicted in red ('≠X'), where X is another entity-tag. Several nonidenticality constraints may be defined for the same pattern-entity, e.g., '≠A,≠C' (see Q57).

_**Q6:** Any person A whose dragon was frozen by two dragons - one owned by one of A's parents, the other owned by another parent (none, one, or both dragons may be owned by both parents)_

![V1](Pictures/Q006.png)

_**Q7:** Any person A whose dragon was either (i) frozen by a dragon owned by two of A's parents or (ii) frozen by two dragons - one owned by one of A's parents and the other owned by A's other parent_

![V1](Pictures/Q007.png)

_**Q24:** Any person A who has (at least) two parents and owns a dragon that was frozen by a dragon neither of A's parents owns_

![V1](Pictures/Q024.png)

Q24 demonstrates the usage of both identicality and nonidenticality constraints for the same pattern-entity.

Consider Q5v1, Q6, Q7, and Q24. For any given assignment, there is another assignment where the two parents are switched (for example, in Q5v1, the assignments to D and E are switched). Such redundant assignments are usually undesired. Using _order constraints_, we can avoid such redundancies (see Q5v2).
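
The effect of each kind of constraint on the number of assignments can be seen in a toy enumeration. The Python sketch below uses two made-up parent names and models an order constraint as a comparison on an arbitrary total order over entities (here, string order). That modeling is an assumption for illustration only; it is not taken from V1's own definition of order constraints.

```python
from itertools import product

# Hypothetical parents of person A; any two distinct graph-entities would do.
parents = ["Alys", "Brandon"]

# No constraint: D and E may both be assigned the same parent.
unconstrained = list(product(parents, repeat=2))

# Nonidenticality (D ≠ E): the same-parent assignments disappear,
# but each pair still shows up twice, once per ordering.
nonidentical = [(d, e) for d, e in unconstrained if d != e]

# Order constraint, modeled as D < E: one assignment per unordered pair remains.
ordered = [(d, e) for d, e in nonidentical if d < e]

print(len(unconstrained), len(nonidentical), len(ordered))
```

With two parents the counts drop from 4 to 2 to 1. The same reasoning scales: for 𝑘 mutually interchangeable pattern-entities, ordering them collapses the 𝑘! permutations of each match into one.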

Also, consider the following pattern: _Any three persons A, B, and C, who are pairwise friends_. If persons (A1, B1, C1) compose an assignment, so do (A1, C1, B1), (B1, A1, C1), and all other permutations. Such a factorial increase in the number of assignments is usually undesired. Using _order constraints_, we can express patterns such as _Any three persons A