https://github.com/bptlab/fiber2xes

This project contains a python utility intended to use EHR data coming from fiber to create .xes event logs.
Host: GitHub
URL: https://github.com/bptlab/fiber2xes
Owner: bptlab
Created: 2020-06-02T05:34:00.000Z (about 6 years ago)
Default Branch: master
Last Pushed: 2021-12-02T11:39:46.000Z (over 4 years ago)
Last Synced: 2025-01-07T19:51:51.409Z (over 1 year ago)
Language: Python
Size: 4.49 MB
Stars: 2
Watchers: 4
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

# fiber2xes

This project contains a Python utility that uses data coming from `fiber` to create `.xes` event logs.

To use this tool you need access to the Mount Sinai Data Warehouse.

## Installation

Follow these steps to install `fiber2xes`:

1. Install `fiber` following its [installation guide](https://gitlab.hpi.de/fiber/fiber).

2. Download and install Spark 3.1.2 following the official [installation guide](https://spark.apache.org/downloads.html). [This](https://www.tutorialspoint.com/pyspark/pyspark_environment_setup.htm) page gives a concise overview of how to set up the Spark environment. Make sure that both the `SPARK_HOME` and `JAVA_HOME` environment variables are set correctly and exported. If the available Spark version changes, the `pyspark` version required by this package, as well as the one in the Docker image, must be changed accordingly.

3. Install `fiber2xes` with pip:

```bash
pip install git+https://gitlab.hpi.de/pm1920/fiber2xes.git
```
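Before running `fiber2xes`, it can help to confirm that the environment from step 2 is in place. The following is a minimal standard-library sketch; `missing_spark_env` is a hypothetical helper, not part of the package, and it only checks that `SPARK_HOME` and `JAVA_HOME` point to existing directories, not that the installed Spark version is 3.1.2.

```python
import os
from typing import List, Mapping, Optional


def missing_spark_env(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the Spark-related variables that are unset or not directories.

    SPARK_HOME and JAVA_HOME are the two variables named in installation step 2.
    """
    env = os.environ if env is None else env
    problems = []
    for name in ("SPARK_HOME", "JAVA_HOME"):
        value = env.get(name)
        # A variable counts as missing if it is unset, empty, or not an existing directory.
        if not value or not os.path.isdir(value):
            problems.append(name)
    return problems


if __name__ == "__main__":
    missing = missing_spark_env()
    if missing:
        print("Please set and export: " + ", ".join(missing))
    else:
        print("SPARK_HOME and JAVA_HOME point to existing directories.")
```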

For development and testing, all dev dependencies can be installed with:

```bash
pip install -e .[dev]
```

If you're using `zsh`, escape the square brackets: `pip install -e .\[dev\]`

If you run into version or dependency conflicts with `fiber`, run the following command in the `fiber` directory:

```bash
sed -i 's/==/>=/' requirements.txt
```

This relaxes `fiber`'s exact version pins (`==`) to minimum versions (`>=`), so that installing `fiber2xes` can override them with the versions it needs.
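On macOS and other BSD systems, `sed -i` expects a backup-suffix argument, so the command above may fail there. A portable sketch of the same edit in Python follows; the function names are hypothetical helpers, and like the `sed` expression (without the `g` flag) it replaces only the first `==` on each line.

```python
from pathlib import Path


def relax_pins(text: str) -> str:
    """Mirror `sed 's/==/>=/'`: replace the first '==' on each line with '>='."""
    return "".join(
        line.replace("==", ">=", 1) for line in text.splitlines(keepends=True)
    )


def relax_requirements_file(path: str = "requirements.txt") -> None:
    # Rewrite the file in place, as `sed -i` does.
    req = Path(path)
    req.write_text(relax_pins(req.read_text()))
```

Call `relax_requirements_file()` from within the `fiber` directory to apply it.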

## Example

After completing all installation steps, you can run `example.py`, a demo file that gives a short overview of how to use `fiber2xes`:

```bash
python3 ./example.py
```

This example creates a sample cohort, builds an MRN-based event log from it and saves the log to the repository's root directory as `./log__mrn_5.xes`. This file can then be used for process mining.
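To take a quick look at a generated file before loading it into a process mining tool, a small standard-library sketch can count its traces and events. It assumes the usual XES layout (`<trace>` elements directly under `<log>`, `<event>` elements directly under each `<trace>`) and ignores any XML namespace; the function names are hypothetical helpers, not part of `fiber2xes`.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def _local(tag: str) -> str:
    # Drop an XML namespace prefix such as "{http://www.xes-standard.org/}".
    return tag.rsplit("}", 1)[-1]


def summarize_xes_root(root: ET.Element) -> Dict[str, int]:
    """Count the traces under <log> and the events under those traces."""
    traces = [el for el in root if _local(el.tag) == "trace"]
    events = sum(1 for trace in traces for el in trace if _local(el.tag) == "event")
    return {"traces": len(traces), "events": events}


def summarize_xes_file(path: str) -> Dict[str, int]:
    return summarize_xes_root(ET.parse(path).getroot())
```

For the example above, `summarize_xes_file("log__mrn_5.xes")` returns the trace and event counts of the generated log.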

## Interface

The package offers two methods, one to create an event log and one to serialise it, as well as filters for traces and events.

The following sections describe them in more detail.

### Log creation

To create a log from a `fiber` cohort, call the `cohort_to_event_log` method:

```python
import multiprocessing

from fiber2xes import cohort_to_event_log

# cohort: a fiber cohort; trace_type: "mrn" or "visit".
# All keyword arguments are shown with their default values.
log = cohort_to_event_log(
    cohort,
    trace_type,
    verbose=False,
    remove_unlisted=True,
    remove_duplicates=True,
    event_filter=None,
    trace_filter=None,
    cores=multiprocessing.cpu_count(),
    window_size=500,
    abstraction_path=None,
    abstraction_exact_match=False,
    abstraction_delimiter=";",
    include_anamnesis_events=True,
    duplicate_event_identifier="BACK PAIN",
    event_identifier_to_merge="CHRONIC LOW BACK PAIN",
    perform_complex_duplicate_detection=False,
)
```

Parameters:

- **cohort**: The `fiber` cohort containing the patients

- **trace_type**: The type of a trace (`mrn` or `visit`)

- **verbose=False**: Whether events should contain the original, non-abstracted values (default False)

- **remove_unlisted=True**: Whether a trace should contain only listed events (default True)

- **remove_duplicates=True**: Whether duplicate events should be removed (default True)

- **event_filter=None**: A custom filter applied to events (default None)

- **trace_filter=None**: A custom filter applied to traces (default None)

- **cores=multiprocessing.cpu_count()**: The number of cores used to process the cohort (default: the number of CPUs)

- **window_size=500**: The number of patients per window (default 500)

- **abstraction_path=None**: The path to the abstraction file (default None)

- **abstraction_exact_match=False**: Whether the abstraction algorithm should only abstract exact matches (default False)

- **abstraction_delimiter=";"**: The delimiter used in the abstraction file (default `;`)

- **include_anamnesis_events=True**: Whether anamnesis events should be included in the log (default True)

- **duplicate_event_identifier="BACK PAIN"**: Event identifier to be analysed separately for duplicates (default "BACK PAIN")

- **event_identifier_to_merge="CHRONIC LOW BACK PAIN"**: Event identifier used for the separately identified duplicates (default "CHRONIC LOW BACK PAIN")

- **perform_complex_duplicate_detection=False**: Whether complex time- and lifecycle-based duplicate detection should be performed (default False)
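In XES, a trace is the sequence of events belonging to one case. Conceptually, `trace_type` chooses the case identifier: with `mrn` all events of a patient form one trace, with `visit` each visit becomes its own trace. The sketch below illustrates that grouping on plain dictionaries; it is not the `fiber2xes` implementation, and the record keys (`mrn`, `visit_id`, `timestamp`, `activity`) as well as the function name are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical record field used as the case identifier for each trace type.
CASE_FIELD = {"mrn": "mrn", "visit": "visit_id"}


def group_into_traces(events: List[dict], trace_type: str) -> Dict[str, List[str]]:
    """Group event records into traces, one per patient (mrn) or per visit."""
    if trace_type not in CASE_FIELD:
        raise ValueError(f"trace_type must be 'mrn' or 'visit', got {trace_type!r}")
    key_field = CASE_FIELD[trace_type]
    traces: Dict[str, List[str]] = defaultdict(list)
    # Sort by timestamp so each trace lists its activities in temporal order.
    for event in sorted(events, key=lambda e: e["timestamp"]):
        traces[event[key_field]].append(event["activity"])
    return dict(traces)
```

With `mrn`, a patient with two visits yields a single trace; with `visit`, the same events are split into two traces.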

### Log serialisation

The method `save_event_log_to_file` serialises a created log to a file.

```python
from fiber2xes import save_event_log_to_file

# `log` is the log returned by cohort_to_event_log
save_event_log_to_file(log, file_path)
```

Parameters:

- **log**: The log generated by the `cohort_to_event_log` method

- **file_path**: The path (including the file name) of the output file

### Trace and event filtering

The trace and event filters make it possible to filter traces or events during log creation.

The following conditions are available:

- [Diagnosis](https://gitlab.hpi.de/pm1920/fiber2xes#diagnosis)

- [Material](https://gitlab.hpi.de/pm1920/fiber2xes#material)

- [Procedure](https://gitlab.hpi.de/pm1920/fiber2xes#procedure)

- [Time](https://gitlab.hpi.de/pm1920/fiber2xes#time)

- [Generic](https://gitlab.hpi.de/pm1920/fiber2xes#generic)

These can be combined by [And](https://gitlab.hpi.de/pm1920/fiber2xes#and), [Or](https://gitlab.hpi.de/pm1920/fiber2xes#or) and [Not](https://gitlab.hpi.de/pm1920/fiber2xes#not) operations.

#### Diagnosis

A filter for a specific diagnosis given by the code.

```python
from fiber2xes.filter.condition import Diagnosis

diagnosis_filter = Diagnosis(diagnosis_code)
```

Parameter:

- **diagnosis_code**: The diagnosis code

#### Material

A filter for a specific material given by the code.

```python
from fiber2xes.filter.condition import Material

material_filter = Material(material_code)
```

Parameter:

- **material_code**: The material code

#### Procedure

A filter for a specific procedure given by the code.

```python
from fiber2xes.filter.condition import Procedure

procedure_filter = Procedure(procedure_code)
```

Parameter:

- **procedure_code**: The procedure code

#### Time

A filter for traces based on timing conditions (see parameters).

```python
from fiber2xes.filter.condition import Time

time_filter = Time(one_event_after=None, one_event_before=None, all_events_after=None, all_events_before=None)
```

Parameters:

- **one_event_after**: The trace is relevant if at least one event of the trace occurred after the given date

- **one_event_before**: The trace is relevant if at least one event of the trace occurred before the given date

- **all_events_after**: The trace is relevant if all events of the trace occurred after the given date

- **all_events_before**: The trace is relevant if all events of the trace occurred before the given date
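To make the four conditions concrete, here is an illustrative, standalone reading of them for a single trace, given its event dates. It is not the `fiber2xes` implementation: the function name is hypothetical, and it assumes strict comparisons and that every parameter that is set must hold at the same time.

```python
from datetime import date
from typing import Iterable, Optional


def time_condition_holds(
    event_dates: Iterable[date],
    one_event_after: Optional[date] = None,
    one_event_before: Optional[date] = None,
    all_events_after: Optional[date] = None,
    all_events_before: Optional[date] = None,
) -> bool:
    """Illustrative reading of the four Time parameters for one trace.

    Assumptions: comparisons are strict and all parameters that are set must hold.
    """
    dates = list(event_dates)
    checks = []
    if one_event_after is not None:
        checks.append(any(d > one_event_after for d in dates))
    if one_event_before is not None:
        checks.append(any(d < one_event_before for d in dates))
    if all_events_after is not None:
        checks.append(all(d > all_events_after for d in dates))
    if all_events_before is not None:
        checks.append(all(d < all_events_before for d in dates))
    # With no parameters set, every trace is considered relevant.
    return all(checks)
```

The "one event" variants use `any`, the "all events" variants use `all`, which is the only difference between each pair.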

#### Generic

A filter for traces or events based on the given lambda expression. The lambda expression receives the trace or event as its argument and should return `True` or `False`: `True` marks the trace or event as relevant, `False` as not relevant.

```python
from fiber2xes.filter.condition import Generic

generic_filter = Generic(lambda_expression)
```

Parameter:

- **lambda_expression**: The lambda expression that is applied to every trace or event

#### And

An aggregation of two other filters with a logical _and_ as aggregation function.

```python
from fiber2xes.filter.operator import And

and_filter = And(filter1, filter2)
```

Parameter:

- **filter1** and **filter2**: Two other trace or event filters which will be aggregated by a logical *and*.

#### Or

An aggregation of two other filters with a logical _or_ as aggregation function.

```python
from fiber2xes.filter.operator import Or

or_filter = Or(filter1, filter2)
```

Parameter:

- **filter1** and **filter2**: Two other trace or event filters which will be aggregated by a logical *or*.

#### Not

An inverter of the result of another filter.

```python
from fiber2xes.filter.operator import Not

negated_filter = Not(filter)
```

Parameter:

- **filter**: The result of the given filter will be negated.
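The operators combine conditions the same way boolean operators combine predicates. To illustrate only these semantics, here is a self-contained sketch that models each filter as a plain Python predicate; it is not `fiber2xes` code, and the helper names (`generic`, `and_`, `or_`, `not_`) and event keys (`kind`, `code`) are hypothetical.

```python
from typing import Any, Callable

# A filter is modelled here as a predicate on a trace or event (hypothetical model).
Predicate = Callable[[Any], bool]


def generic(expression: Predicate) -> Predicate:
    # Like Generic: the lambda itself decides whether an item is relevant.
    return lambda item: bool(expression(item))


def and_(filter1: Predicate, filter2: Predicate) -> Predicate:
    return lambda item: filter1(item) and filter2(item)


def or_(filter1: Predicate, filter2: Predicate) -> Predicate:
    return lambda item: filter1(item) or filter2(item)


def not_(inner: Predicate) -> Predicate:
    return lambda item: not inner(item)


# Hypothetical event records with "kind" and "code" fields.
events = [
    {"kind": "diagnosis", "code": "D1"},
    {"kind": "procedure", "code": "P1"},
    {"kind": "material", "code": "M1"},
]
is_diagnosis = generic(lambda e: e["kind"] == "diagnosis")
is_procedure = generic(lambda e: e["kind"] == "procedure")

# Keep only events that are neither diagnoses nor procedures.
keep = not_(or_(is_diagnosis, is_procedure))
kept = [e["code"] for e in events if keep(e)]
```

With the documented classes, the same combination would be written as `Not(Or(..., ...))` around `Generic` conditions.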

## Spark Configuration

This pipeline tool uses Spark to transform large event data sets. For local development, or when running the tool on hardware with different resources, it can be sensible to adjust memory requirements and other Spark configuration options. To do so, use the `.env` file in the project's root directory to override the default options passed to the Spark calls.
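The variable names that are actually read are defined by the project's own `.env` file. As a minimal sketch, assuming the file uses simple `KEY=VALUE` lines, the overrides can be read for inspection before a run like this; `parse_env` is a hypothetical helper and the keys in the test data are made up.

```python
from typing import Dict


def parse_env(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines as found in a .env file."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines, comments and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        # Remove optional surrounding quotes from the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values
```

For anything beyond simple lines (escapes, multi-line values), the `python-dotenv` package's `dotenv_values` is a more robust choice.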

## Contribution

To contribute, please fork this repository and create a merge request.

Assign one of the developers of this project for a review.

Please always add a short introduction to your submission that explains the reason for it.