https://github.com/otrecoding/optimaltransportdataintegration.jl
Data integration using optimal transport theory
data-integration julia-language optimal-transport
- Host: GitHub
- URL: https://github.com/otrecoding/optimaltransportdataintegration.jl
- Owner: otrecoding
- License: mit
- Created: 2024-05-17T15:25:28.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2025-03-11T10:27:50.000Z (7 months ago)
- Last Synced: 2025-03-11T11:32:37.746Z (7 months ago)
- Topics: data-integration, julia-language, optimal-transport
- Language: Julia
- Homepage: https://otrecoding.github.io/OptimalTransportDataIntegration.jl/
- Size: 364 KB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 2
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# OptimalTransportDataIntegration.jl
[CI](https://github.com/otrecoding/OptimalTransportDataIntegration.jl/actions/workflows/ci.yml)
[codecov](https://codecov.io/gh/otrecoding/OptimalTransportDataIntegration.jl)
[docs (dev)](https://otrecoding.github.io/OptimalTransportDataIntegration.jl/dev)

This package implements a statistical matching strategy based on
optimal transport theory to integrate different data sources.
These data sources relate to the same target population and share
a subset of covariates, while each data source has its own
distinct subset of variables. After recoding, you get a single
data set in which all the variables, coming from the different
sources, are jointly available.

This package is derived from
[OTRecod.jl](https://github.com/otrecoding/OTRecod.jl), where the joint
distribution of shared and distinct variables is transported within
the data sources. Here, the method also transports the distribution
of shared and distinct variables, and additionally estimates a function
to predict the missing variables.

## Installation

The package runs on Julia 1.1 and above.
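
For non-interactive setups (a script or a CI job, for instance), the same installation can presumably be done through Julia's standard `Pkg` API instead of the REPL. The snippet below is a minimal sketch of that route. It assumes network access to GitHub and uses only the repository URL given in this README:

```julia
using Pkg  # Julia's built-in package manager (standard library)

# Add the package directly from its Git repository URL.
# Pkg.PackageSpec(url = ...) is the programmatic counterpart of `pkg> add <url>`.
Pkg.add(Pkg.PackageSpec(url = "https://github.com/otrecoding/OptimalTransportDataIntegration.jl"))
```

Either route should leave the package available to `using OptimalTransportDataIntegration` in the active environment.
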
In a Julia session, switch to `pkg>` mode to add the package:

```julia
julia>] # switch to pkg> mode
pkg> add https://github.com/otrecoding/OptimalTransportDataIntegration.jl
```

To run an example:

```julia
using OptimalTransportDataIntegration # import the package
params = DataParameters() # Create the parameters set
rng = DiscreteDataGenerator(params) # Create the random generator
data = generate(rng) # Generate a dataset
result = otrecod(data, JointOTBetweenBases()) # Perform the statistical matching
println(accuracy(result)) # Print the accuracy for each distinct variable and the total accuracy
```

It is possible to use continuous explanatory variables by creating the generator with

```julia
rng = ContinuousDataGenerator(params)
```

Outcomes are always categorical: Y outcome levels are 1:4 and Z outcome levels are 1:3.