An open API service indexing awesome lists of open source software.

https://github.com/saswatpadhi/excelsynth

An enumerative synthesizer for recovering Excel formulas from CSVs.
https://github.com/saswatpadhi/excelsynth

csv excel excel-formulas program-synthesis synthesizer

Last synced: 3 months ago
JSON representation

An enumerative synthesizer for recovering Excel formulas from CSVs.

Awesome Lists containing this project

README

        

ExcelSynth

==========

[![](https://img.shields.io/travis/SaswatPadhi/ExcelSynth/master.svg?logo=travis&style=popout&label=Travis+Build)][travis]
 
[![](https://img.shields.io/docker/cloud/build/padhi/excelsynth.svg?logo=docker&style=popout&label=Docker+Image)][docker-hub]

An enumerative synthesizer for recovering Excel formulas from CSVs.




Input: CSV File
(samples/example.csv)


Output: Formula Mask








Col 1 , Col 2 , Col 3 , Col 4 , Col 5
Row 2 , 1. , 10. , 9.5 , 24.
Row 3 , 23. , 12. , 0.5 , 35.
Row 4 , 22. , 2. , -9. , 24.
Row 5 , -1. , 6. , 6.5 , 5.
Row 6 , 59. , 0. , -29.5 , 41.
Row 7 , 11. , -2. , -7.5 , 9.
Row 8 , 115. , 14. , -43.5 , 23.






, , , ,
, , , =(C2-(B2/(1.+1.)) ,
, , , =(C3-(B3/(1.+1.)) ,
, , , =(C4-(B4/(1.+1.)) ,
, , , =(C5-(B5/(1.+1.)) ,
, , , =(C6-(B6/(1.+1.)) ,
, , , =(C7-(B7/(1.+1.)) ,
, =SUM(B2:B7) , =(SUM(C2:C7)/(1.+1.)) , =(C8-(B8/(1.+1.)) , =AVERAGE(E2:E7)




----

## Installation

0. [Get `docker` for your OS](https://docs.docker.com/install).
1. Pull the docker image[#](#note_1): `docker pull padhi/excelsynth`.
2. Run a container over the image: `docker run -it padhi/excelsynth`.

This would give you a `bash` shell within ExcelSynth directory.
3. To run ExcelSynth on `samples/unit_test.csv`, execute: `dune exec bin/App.exe -- samples/unit_test.csv`
4. To run the unit tests, execute: `dune runtest`

# Alternatively, you could also build the Docker image locally:

```bash
docker build -t padhi/excelsynth github.com/SaswatPadhi/ExcelSynth
```

## Usage

### Formula Synthesis from CSV

```text
$ dune exec bin/App.exe -- -h
Synthesize Excel formulas for a CSV file.

App.exe [flag] ... FILENAME

=== flags ===

[-check-last-col-aggregations BOOLEAN] synthesize aggregation formulas for
cells in the last column
[-check-last-row-aggregations BOOLEAN] synthesize aggregation formulas for
cells in the last row
[-check-pointwise-col-operations BOOLEAN] synthesize pointwise
transformations for columns
[-check-pointwise-row-operations BOOLEAN] synthesize pointwise
transformations for rows
[-constant FLOAT] ... additional Boolean/numeric/string
constants
[-disable-constant-solutions BOOLEAN] disable constant formulas (e.g.
=0.0) for cells
[-enable-2d-aggregation BOOLEAN] use 2D ranges in aggregation
operations
[-enable-booleans BOOLEAN] enable Boolean and conditional
expressions
[-log-path FILENAME] enable logging and output to the
specified path
[-mask-path FILENAME] a known formula mask for the CSV
file
[-max-expr-size INTEGER] maximum cost (AST size) of
expressions to explore
[-max-threads INTEGER] maximum number of threads to create
[-range STRING] a range (in RC:R'C' format) that
bounds the synthesis space
[-relative-error FLOAT] the fractional relative error
allowed in float comparisons
[-restrict-to-top-left-data BOOLEAN] only use data to the top left of a
cell in formulas
[-type-error-threshold FLOAT] maximum fraction of cells that may
be ignored due to type errors
[-value-error-threshold FLOAT] maximum fraction of cells that may
be ignored due to value errors
```

### Bulk Processing ([`scripts/evaluate.sh`](scripts/evaluate.sh))

The following **input** directory structure is required:

```text

|
+-- table_ranges.csv <--- Contains table ranges for CSV files
|
+-- evaluated_csvs <--- Contains fully evaluated CSV files
| |
| +-- .csv
| |
| `-- .csv
|
+-- formula_csvs <--- Contains CSV files with formulas
|
+--
.csv
|
`-- .csv
```

The following **output** data is generated within this directory:

```text

|
: · · ·
|
: · · ·
|
+-- extracted_masks <--- Contents generated by scripts/extract_mask.py
| |
| +--
.csv <--- Ground truth mask from `../formula_csvs/.csv`
| |
| `-- .csv
|
+-- recovered_masks <--- Contents generated by scripts/recover_mask.py
| |
| +-- Baseline <--- Unrestricted synthesis (over whole sheet)
| | |
| | +--
.csv <--- Synthesized mask from `../evaluated_csvs/.csv`
| | |
| | `-- .csv
| |
: : · · ·
| |
| `-- <--- Synthesis restricted to tables from
| |
| +--
.csv <--- Synthesized mask from `../evaluated_csvs/.csv`
| |
| `-- .csv
|
`-- comparison_masks <--- Contents generated by scripts/compare_masks.py
|
+-- full <--- All cells within a recovered mask are checked
| |
| +-- Baseline <--- Evaluation of unrestricted-synthesis masks
| | |
| | +--
.csv <--- Evaluation of `../recovered_masks/Baseline/.csv`
| | |
| | `-- .csv
| |
: : · · ·
| |
| `-- <--- Evaluation of restricted-synthesis masks
| |
| +--
.csv
| |
| `-- .csv
|
`-- in-table <--- Only in-table cells are checked
|
+-- Baseline
| |
| +--
.csv
| |
| `-- .csv
|
: · · ·
|
`--
|
+--
.csv
|
`-- .csv
```

#### Extract A Formula Masks from CSVs

```text
$ python3 scripts/extract_mask.py -h
usage: extract_mask.py [-h] -i INPUT_DIR -o OUTPUT_DIR

optional arguments:
-h, --help show this help message and exit

-i INPUT_DIR, --input-dir INPUT_DIR
-o OUTPUT_DIR, --output-dir OUTPUT_DIR

$ python3 scripts/extract_mask.py -i data/formula_csvs -o data/extracted_masks
```

#### Recover Formula Masks from CSVs

```text
$ python3 scripts/recover_mask.py -h
usage: recover_mask.py [-h] -e EVAL_CSV_DIR -o OUTPUT_DIR -c TABLE_RANGE_COLUMN
tables_data_csv

positional arguments:
tables_data_csv

optional arguments:
-h, --help show this help message and exit

-e EVAL_CSV_DIR, --eval-csv-dir EVAL_CSV_DIR
-o OUTPUT_DIR, --output-dir OUTPUT_DIR
-c TABLE_RANGE_COLUMN, --table-range-column TABLE_RANGE_COLUMN

$ python3 scripts/recover_mask.py -e data/evaluated_csvs -o data/recovered_masks \
-c 1 data/table_ranges.csv
```

`tables_data_csv` has extracted table ranges in `TABLE_RANGE_COLUMN` (0-indexed).

#### Compare Formula Masks

```text
$ python3 scripts/compare_masks.py -h
usage: compare_masks.py [-h] -g GROUND_TRUTH_DIR -p PREDICTION_DIR -o OUTPUT_DIR

optional arguments:
-h, --help show this help message and exit

-g GROUND_TRUTH_DIR, --ground-truth-dir GROUND_TRUTH_DIR
-p PREDICTION_DIR, --prediction-dir PREDICTION_DIR
-o OUTPUT_DIR, --output-dir OUTPUT_DIR

$ python3 scripts/compare_mask.py -g data/extracted_masks \
-p data/recovered_masks/table_detector_1 \
-o data/comparison_masks/table_detector_1
```

[docker-hub]: https://hub.docker.com/r/padhi/excelsynth
[travis]: https://travis-ci.org/SaswatPadhi/ExcelSynth