https://github.com/bptlab/fiber2xes
This project contains a python utility intended to use EHR data coming from fiber to create .xes event logs.
- Host: GitHub
- URL: https://github.com/bptlab/fiber2xes
- Owner: bptlab
- Created: 2020-06-02T05:34:00.000Z (about 6 years ago)
- Default Branch: master
- Last Pushed: 2021-12-02T11:39:46.000Z (over 4 years ago)
- Last Synced: 2025-01-07T19:51:51.409Z (over 1 year ago)
- Language: Python
- Homepage:
- Size: 4.49 MB
- Stars: 2
- Watchers: 4
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# fiber2xes
This project contains a Python utility that uses data coming from `fiber` to create `.xes` event logs.
To use this tool you need access to the Mount Sinai Data Warehouse.
## Installation
Follow these steps to install `fiber2xes`:
1. Install `fiber` according to its [installation guide](https://gitlab.hpi.de/fiber/fiber).
2. Download and install Spark 3.1.2 according to the [installation guide](https://spark.apache.org/downloads.html). [This page](https://www.tutorialspoint.com/pyspark/pyspark_environment_setup.htm) provides a concise overview of how the Spark environment can be set up. Make sure that both the `SPARK_HOME` and `JAVA_HOME` environment variables are correctly set and exported. If the available Spark version changes, the `pyspark` version of this package, as well as that of the Docker image, needs to be changed accordingly.
3. Run the pip installation to install `fiber2xes`:
```bash
pip install git+https://gitlab.hpi.de/pm1920/fiber2xes.git
```
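Before installing, it can help to confirm that the two environment variables from step 2 are actually set. The following is a minimal sketch; the helper `missing_env_vars` is an illustration and not part of `fiber2xes`.

```python
import os


def missing_env_vars(names, environ=None):
    """Return the names that are unset or empty in the given environment."""
    env = os.environ if environ is None else environ
    return [name for name in names if not env.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars(["SPARK_HOME", "JAVA_HOME"])
    if missing:
        print("Please set and export: " + ", ".join(missing))
    else:
        print("SPARK_HOME and JAVA_HOME are set.")
```

The check only verifies that the variables are non-empty, not that they point to valid installations.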
For development and testing, all dev dependencies can be installed using:
```bash
pip install -e .[dev]
```
If you're using `zsh`, escape the square brackets: `pip install -e .\[dev\]`.
In case you encounter version or dependency issues in relation to `fiber`, it is advisable to run
```bash
sed -i 's/==/>=/' requirements.txt
```
in the `fiber` directory. This relaxes the pinned versions in `fiber` so that the installation of `fiber2xes` can override them with the dependency versions it needs.
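To make the effect of the `sed` command concrete, here is a small Python sketch of the same substitution. Without the `g` flag, `sed` replaces only the first `==` on each line, which the sketch mirrors. The function name and the sample requirement lines are made up for illustration.

```python
def relax_pins(requirements_text):
    """Replace the first '==' on every line with '>=', like sed 's/==/>=/'."""
    return "\n".join(
        line.replace("==", ">=", 1) for line in requirements_text.split("\n")
    )


# Hypothetical requirement lines, not the actual contents of fiber's file.
sample = "pandas==1.0.0\nnumpy==1.18.1\n# a comment without a pin"
print(relax_pins(sample))
```

Every exact pin becomes a lower bound, so pip is free to pick any newer version that another package requires.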
## Example
After completing all installation steps, you can run `example.py`, a demo file that gives a short overview of how `fiber2xes` can be executed:
```bash
python3 ./example.py
```
This example creates a sample cohort for an MRN-based event log, which is extracted and saved to the repository's root directory as `./log__mrn_5.xes`. This file can then be used for process mining.
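As a quick sanity check before process mining, an `.xes` file can be inspected with Python's standard library, since XES is an XML format built from `log`, `trace` and `event` elements. The sketch below is only an illustration: the inline sample log is made up, and the exact attributes that `fiber2xes` writes may differ.

```python
import xml.etree.ElementTree as ET

# A tiny, made-up XES log used only to demonstrate the parsing.
SAMPLE_XES = """<log xes.version="1.0" xmlns="http://www.xes-standard.org/">
  <trace>
    <string key="concept:name" value="trace-1"/>
    <event><string key="concept:name" value="Event A"/></event>
    <event><string key="concept:name" value="Event B"/></event>
  </trace>
  <trace>
    <string key="concept:name" value="trace-2"/>
    <event><string key="concept:name" value="Event A"/></event>
  </trace>
</log>
"""


def local_name(tag):
    """Strip an XML namespace prefix such as '{http://...}trace'."""
    return tag.rsplit("}", 1)[-1]


def summarise_xes(root):
    """Return a list of (trace name, number of events) for each trace."""
    summary = []
    for trace in root:
        if local_name(trace.tag) != "trace":
            continue
        name = None
        events = 0
        for child in trace:
            tag = local_name(child.tag)
            if tag == "event":
                events += 1
            elif tag == "string" and child.get("key") == "concept:name":
                name = child.get("value")
        summary.append((name, events))
    return summary


print(summarise_xes(ET.fromstring(SAMPLE_XES)))
# For a real file, use: ET.parse("./log__mrn_5.xes").getroot()
```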
## Interface
The package offers two methods, one for event log creation and one for serialisation, as well as filters for traces and events.
The following sections describe them in more detail.
### Log creation
To create a log from a fiber cohort, call the `cohort_to_event_log` method:
```python
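import multiprocessing

from fiber2xes import cohort_to_event_log

# `cohort` is a fiber cohort and `trace_type` selects `mrn` or `visit` traces.
# All keyword arguments below are shown with their default values.
log = cohort_to_event_log(
    cohort,
    trace_type,
    verbose=False,
    remove_unlisted=True,
    remove_duplicates=True,
    event_filter=None,
    trace_filter=None,
    cores=multiprocessing.cpu_count(),
    window_size=500,
    abstraction_path=None,
    abstraction_exact_match=False,
    abstraction_delimiter=";",
    include_anamnesis_events=True,
    duplicate_event_identifier="BACK PAIN",
    event_identifier_to_merge="CHRONIC LOW BACK PAIN",
    perform_complex_duplicate_detection=False,
)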
```
Parameters:
- **cohort**: The fiber cohort containing the patients
- **trace_type**: The type of a trace (`mrn` or `visit`)
- **verbose=False**: Whether the events should contain the original, non-abstracted values (default False)
- **remove_unlisted=True**: Whether a trace should only contain listed events (default True)
- **remove_duplicates=True**: Whether duplicate events should be removed (default True)
- **event_filter=None**: A custom filter for events (default None)
- **trace_filter=None**: A custom filter for traces (default None)
- **cores=multiprocessing.cpu_count()**: The number of cores used to process the cohort (default: number of CPUs)
- **window_size=500**: The number of patients per window (default 500)
- **abstraction_path=None**: The path to the abstraction file (default None)
- **abstraction_exact_match=False**: Whether the abstraction algorithm should only abstract exact matches (default False)
- **abstraction_delimiter=";"**: The delimiter of the abstraction file (default `;`)
- **include_anamnesis_events=True**: Whether anamnesis events should be included in the log (default True)
- **duplicate_event_identifier="BACK PAIN"**: Event identifier to be analysed separately for duplicates (default "BACK PAIN")
- **event_identifier_to_merge="CHRONIC LOW BACK PAIN"**: Event identifier to be used for separately identified duplicates (default "CHRONIC LOW BACK PAIN")
- **perform_complex_duplicate_detection=False**: Whether complex time- and lifecycle-based duplicate detection should be performed (default False)
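As a rough illustration of what `window_size` means, the sketch below splits a list of hypothetical patient identifiers into windows of a fixed size. It shows the general idea of windowing only and is not the library's actual implementation.

```python
def split_into_windows(items, window_size):
    """Split a list into consecutive windows of at most window_size items."""
    if window_size < 1:
        raise ValueError("window_size must be at least 1")
    return [items[i:i + window_size] for i in range(0, len(items), window_size)]


patient_ids = list(range(1200))  # hypothetical patient identifiers
windows = split_into_windows(patient_ids, 500)
print([len(window) for window in windows])
```

With 1200 patients and the default window size of 500, this yields two full windows and a final, smaller one.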
### Log serialisation
The method `save_event_log_to_file` serialises a created log to a file.
```python
from fiber2xes import save_event_log_to_file
save_event_log_to_file(log, file_path)
```
Parameters:
- **log**: The log generated by the `cohort_to_event_log` method
- **file_path**: The file path / name
### Trace and event filtering
Trace and event filters make it possible to filter traces or events while the log is being created.
The following conditions are available:
- [Diagnosis](https://gitlab.hpi.de/pm1920/fiber2xes#diagnosis)
- [Material](https://gitlab.hpi.de/pm1920/fiber2xes#material)
- [Procedure](https://gitlab.hpi.de/pm1920/fiber2xes#procedure)
- [Time](https://gitlab.hpi.de/pm1920/fiber2xes#time)
- [Generic](https://gitlab.hpi.de/pm1920/fiber2xes#generic)
These can be combined by [And](https://gitlab.hpi.de/pm1920/fiber2xes#and), [Or](https://gitlab.hpi.de/pm1920/fiber2xes#or) and [Not](https://gitlab.hpi.de/pm1920/fiber2xes#not) operations.
#### Diagnosis
A filter for a specific diagnosis given by the code.
```python
from fiber2xes.filter.condition import Diagnosis
filter = Diagnosis(diagnosis_code)
```
Parameter:
- **diagnosis_code**: The diagnosis code
#### Material
A filter for a specific material given by the code.
```python
from fiber2xes.filter.condition import Material
filter = Material(material_code)
```
Parameter:
- **material_code**: The material code
#### Procedure
A filter for a specific procedure given by the code.
```python
from fiber2xes.filter.condition import Procedure
filter = Procedure(procedure_code)
```
Parameter:
- **procedure_code**: The procedure code
#### Time
A filter for traces based on timing conditions (see parameters).
```python
from fiber2xes.filter.condition import Time
filter = Time(one_event_after=None, one_event_before=None, all_events_after=None, all_events_before=None)
```
Parameters:
- **one_event_after**: The trace is relevant if at least one event of the trace occurred after the given date
- **one_event_before**: The trace is relevant if at least one event of the trace occurred before the given date
- **all_events_after**: The trace is relevant if all events of the trace occurred after the given date
- **all_events_before**: The trace is relevant if all events of the trace occurred before the given date
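The four timing conditions can be pictured with a small, self-contained sketch based on `any` and `all`. It does not use `fiber2xes` itself. The function name, the strict comparisons, and the choice to require every given condition at once are assumptions of this sketch rather than documented behaviour.

```python
from datetime import datetime


def matches_time_conditions(timestamps, one_event_after=None, one_event_before=None,
                            all_events_after=None, all_events_before=None):
    """Check a trace's event timestamps against the optional conditions.

    Conditions left as None are ignored. Strict comparisons are an
    assumption of this sketch.
    """
    if one_event_after is not None and not any(t > one_event_after for t in timestamps):
        return False
    if one_event_before is not None and not any(t < one_event_before for t in timestamps):
        return False
    if all_events_after is not None and not all(t > all_events_after for t in timestamps):
        return False
    if all_events_before is not None and not all(t < all_events_before for t in timestamps):
        return False
    return True


# Hypothetical event timestamps of a single trace.
trace = [datetime(2019, 1, 10), datetime(2019, 6, 1), datetime(2020, 2, 3)]
print(matches_time_conditions(trace, one_event_after=datetime(2020, 1, 1)))
print(matches_time_conditions(trace, all_events_before=datetime(2020, 1, 1)))
```

The first call accepts the trace because one event lies in 2020. The second rejects it for the same reason.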
#### Generic
A filter for traces or events based on the given lambda expression. The lambda expression receives the trace or event as its parameter and should return `True` or `False`. If it returns `True`, the trace or event is relevant; otherwise it is not.
```python
from fiber2xes.filter.condition import Generic
filter = Generic(lambda_expression)
```
Parameter:
- **lambda_expression**: The lambda expression which will be applied on all traces and events
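The idea behind a lambda-based filter can be sketched in plain Python. The traces below are hypothetical lists of event names, and `apply_filter` is an illustrative helper; the real trace and event objects passed to the lambda may look different.

```python
def apply_filter(items, predicate):
    """Keep only the items for which the predicate returns True."""
    return [item for item in items if predicate(item)]


# Hypothetical traces, each represented as a list of event names.
traces = [
    ["Admission", "Diagnosis", "Discharge"],
    ["Admission"],
    ["Admission", "Procedure"],
]

# Keep traces that contain at least two events.
relevant = apply_filter(traces, lambda trace: len(trace) >= 2)
print(relevant)
```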
#### And
An aggregation of two other filters with a logical _and_ as aggregation function.
```python
from fiber2xes.filter.operator import And
filter = And(filter1, filter2)
```
Parameters:
- **filter1** and **filter2**: Two other trace or event filters which will be aggregated by a logical *and*.
#### Or
An aggregation of two other filters with a logical _or_ as aggregation function.
```python
from fiber2xes.filter.operator import Or
filter = Or(filter1, filter2)
```
Parameters:
- **filter1** and **filter2**: Two other trace or event filters which will be aggregated by a logical *or*.
#### Not
An inverter of the result of another filter.
```python
from fiber2xes.filter.operator import Not
filter = Not(filter)
```
Parameter:
- **filter**: The result of the given filter will be negated.
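How the logical operators compose can be illustrated without the library, using plain predicate functions. The helpers `both`, `either` and `negate` and the event dictionaries are hypothetical stand-ins chosen for this sketch; they only mirror the semantics of `And`, `Or` and `Not` described above.

```python
def both(first, second):
    """Logical and: relevant only if both predicates accept the item."""
    return lambda item: first(item) and second(item)


def either(first, second):
    """Logical or: relevant if at least one predicate accepts the item."""
    return lambda item: first(item) or second(item)


def negate(predicate):
    """Logical not: inverts the result of the given predicate."""
    return lambda item: not predicate(item)


def is_diagnosis(event):
    return event["type"] == "diagnosis"


def is_procedure(event):
    return event["type"] == "procedure"


# Hypothetical events; the real event objects may look different.
events = [
    {"type": "diagnosis", "code": "D1"},
    {"type": "procedure", "code": "P1"},
    {"type": "material", "code": "M1"},
]

# Keep everything that is neither a diagnosis nor a procedure.
keep = negate(either(is_diagnosis, is_procedure))
print([event["code"] for event in events if keep(event)])
```

Because each combinator returns another predicate, filters can be nested to any depth.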
## Spark Configuration
This pipeline tool uses Spark to transform large event data sets. For local development, or when running the tool on differently equipped hardware,
it can be sensible to change memory requirements and other Spark configuration options. The `.env` file in the project's root directory can be used
to override the default options passed to the Spark calls.
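The override mechanism can be pictured as `KEY=VALUE` pairs from the `.env` file replacing default values. The sketch below uses a small hand-written parser. The variable names `DRIVER_MEMORY` and `EXECUTOR_MEMORY` and their values are hypothetical, so check the `.env` file in the repository for the names the tool actually reads.

```python
def parse_env(text):
    """Parse KEY=VALUE lines, skipping blank lines and '#' comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        values[key.strip()] = value.strip()
    return values


# Hypothetical defaults and variable names, for illustration only.
defaults = {"DRIVER_MEMORY": "4g", "EXECUTOR_MEMORY": "4g"}
env_text = "# local development\nDRIVER_MEMORY=2g\n"

# Values from the .env text take precedence over the defaults.
config = {**defaults, **parse_env(env_text)}
print(config)
```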
## Contribution
To contribute, please fork this repository and create a merge request.
Assign one of the developers of this project as a reviewer.
Please always include a short description of your submission that explains the reason for it.