https://github.com/aclai-lab/soledata.jl
Manage logical datasets!
https://github.com/aclai-lab/soledata.jl
machine-learning multimodal-data unstructured-data
Last synced: about 7 hours ago
JSON representation
Manage logical datasets!
- Host: GitHub
- URL: https://github.com/aclai-lab/soledata.jl
- Owner: aclai-lab
- License: mit
- Created: 2021-10-19T09:40:34.000Z (about 4 years ago)
- Default Branch: main
- Last Pushed: 2025-03-26T09:17:38.000Z (10 months ago)
- Last Synced: 2025-03-26T10:25:28.342Z (10 months ago)
- Topics: machine-learning, multimodal-data, unstructured-data
- Language: Julia
- Homepage:
- Size: 1.83 MB
- Stars: 12
- Watchers: 4
- Forks: 2
- Open Issues: 4
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# SoleData.jl – Datasets for data-driven symbolic AI
[](https://aclai-lab.github.io/SoleData.jl)
[](https://aclai-lab.github.io/SoleData.jl/dev)
[](https://github.com/aclai-lab/SoleData.jl/actions/workflows/ci.yml)
[](https://codecov.io/gh/aclai-lab/SoleData.jl)
## In a nutshell
Learning logical models (that is, models with logical formulas as antecedents)
[often](https://scholar.google.com/scholar?q=Multi-Models+and+Multi-Formulas+Finite+Model+Checking+for+Modal+Logic+Formulas+Induction.)
requires performing [model checking](https://en.wikipedia.org/wiki/Model_checking) many times.
*SoleData.jl* provides *logiset* (that is, sets of logical interpretations) structures that are
optimized for for checking many formulas.
Logisets are the symbolic counterpart to Machine Learning datasets.
## Examples
## Propositional Logic
Symbolic AI treats tabular dataset (e.g., the [Iris flower dataset](https://en.wikipedia.org/wiki/Iris_flower_data_set))
as a set of propositional interpretations (or propositional *logiset*), onto which formulas of propositional logic are interpreted.
```julia-repl
julia> using SoleData, MLJBase;
julia> X = PropositionalLogiset(MLJBase.load_iris())
PropositionalLogiset (6.16 KBs)
├ # instances: 150
├ # features: 5
└ Table: ...
julia> φ = parseformula(
"sepal_length > 5.8 ∧ sepal_width < 3.0 ∨ target == \"setosa\"";
atom_parser = a->Atom(parsecondition(SoleData.ScalarCondition, a; featuretype = SoleData.VariableValue))
)
SyntaxBranch: (sepal_length > 5.8 ∧ sepal_width < 3.0) ∨ target == setosa
julia> check(φ, X, 10) # Check the formula on a single instance
true
julia> satmask = check(φ, X); # Check the formula on the whole dataset
julia> slicedataset(X, satmask)
PropositionalLogiset (3.66 KBs)
├ # instances: 79
├ # features: 5
└ Table: ...
julia> slicedataset(X, (!).(satmask))
PropositionalLogiset (3.38 KBs)
├ # instances: 71
├ # features: 5
└ Table: ...
```
## Modal Logic
Symbolic AI treats non-tabular datasets (e.g., datasets of time-series or images) as sets of interpretations (*logisets*) of more-than-propositional logics,
that can express *relational patterns*.
In the following example, a time-series dataset such as [NATOPS](http://www.timeseriesclassification.com/description.php?Dataset=NATOPS) is interpreted via a modal logic formalism based on intervals and [Allen's (or Interval Algebra) relations](https://en.wikipedia.org/wiki/Allen%27s_interval_algebra).
On each time series in NATOPS, we hereby check the following temporal property, encoded via a modal logical formula:
*"there an interval where V1 is always higher than -0.54, and such that there exists a later interval where either V3 is lower than -0.78, or V5 is higher than -0.84."*
```julia-repl
julia> Xdf, y = SoleData.load_arff_dataset("NATOPS");
julia> X = scalarlogiset(Xdf)
SupportedLogiset with 1 support (343.08 MBs)
├ worldtype: SoleLogics.Interval{Int64}
├ featvaltype: Float64
├ featuretype: SoleData.AbstractUnivariateFeature
├ frametype: SoleLogics.FullDimensionalFrame{1, SoleLogics.Interval{Int64}}
├ # instances: 360
├ usesfullmemo: true
├[BASE] UniformFullDimensionalLogiset of channel size (51,) (342.91 MBs)
│ ├ size × eltype: (51, 51, 360, 48) × Float64
│ └ features: 48 -> SoleData.AbstractUnivariateFeature[max[V1], min[V1], max[V2], min[V2], ..., min[V22], max[V23], min[V23], max[V24], min[V24]]
└[SUPPORT 1] FullMemoset (0 memoized values, 174.42 KBs))
julia> φ = parseformula(
"⟨G⟩(min[V1] > -0.54 ∧ ⟨L⟩(max[V3] < -0.78 ∨ min[V5] > -0.84))",
SoleLogics.diamondsandboxes(SoleLogics.IARelations);
atom_parser = a->Atom(parsecondition(SoleData.ScalarCondition, a; featvaltype = Float64)),
);
SyntaxBranch: ⟨G⟩(min[V1] > -0.54 ∧ ⟨L⟩(max[V3] < -0.78 ∨ min[V5] > -0.84))
julia> syntaxstring(φ; variable_names_map = names(Xdf)) |> println
⟨G⟩(min[X[Hand tip l]] > -0.54 ∧ ⟨L⟩(max[Z[Hand tip l]] < -0.78 ∨ min[Y[Hand tip r]] > -0.84))
julia> check(φ, X) # Query each instance
360-element Vector{Bool}:
1
1
1
1
1
1
1
1
0
1
0
1
0
1
0
1
1
1
1
...
```
## About
The package is developed by the [ACLAI Lab](https://aclai.unife.it/en/) @ University of
Ferrara.
*SoleData.jl* provides the data layer for
[*Sole.jl*](https://github.com/aclai-lab/Sole.jl), an open-source framework for
*symbolic machine learning*.
