https://github.com/saswatpadhi/excelsynth
An enumerative synthesizer for recovering Excel formulas from CSVs.
https://github.com/saswatpadhi/excelsynth
csv excel excel-formulas program-synthesis synthesizer
Last synced: 3 months ago
JSON representation
An enumerative synthesizer for recovering Excel formulas from CSVs.
- Host: GitHub
- URL: https://github.com/saswatpadhi/excelsynth
- Owner: SaswatPadhi
- License: mit
- Created: 2020-02-19T23:26:24.000Z (over 5 years ago)
- Default Branch: master
- Last Pushed: 2020-06-19T01:14:00.000Z (about 5 years ago)
- Last Synced: 2025-02-10T23:38:23.701Z (5 months ago)
- Topics: csv, excel, excel-formulas, program-synthesis, synthesizer
- Language: OCaml
- Homepage:
- Size: 189 KB
- Stars: 1
- Watchers: 2
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
[][travis]
[][docker-hub]An enumerative synthesizer for recovering Excel formulas from CSVs.
Input: CSV File
(samples/example.csv)
Output: Formula Mask
Col 1 , Col 2 , Col 3 , Col 4 , Col 5
Row 2 , 1. , 10. , 9.5 , 24.
Row 3 , 23. , 12. , 0.5 , 35.
Row 4 , 22. , 2. , -9. , 24.
Row 5 , -1. , 6. , 6.5 , 5.
Row 6 , 59. , 0. , -29.5 , 41.
Row 7 , 11. , -2. , -7.5 , 9.
Row 8 , 115. , 14. , -43.5 , 23.
, , , ,
, , , =(C2-(B2/(1.+1.)) ,
, , , =(C3-(B3/(1.+1.)) ,
, , , =(C4-(B4/(1.+1.)) ,
, , , =(C5-(B5/(1.+1.)) ,
, , , =(C6-(B6/(1.+1.)) ,
, , , =(C7-(B7/(1.+1.)) ,
, =SUM(B2:B7) , =(SUM(C2:C7)/(1.+1.)) , =(C8-(B8/(1.+1.)) , =AVERAGE(E2:E7)
----
## Installation
0. [Get `docker` for your OS](https://docs.docker.com/install).
1. Pull the docker image[#](#note_1): `docker pull padhi/excelsynth`.
2. Run a container over the image: `docker run -it padhi/excelsynth`.
This would give you a `bash` shell within ExcelSynth directory.
3. To run ExcelSynth on `samples/unit_test.csv`, execute: `dune exec bin/App.exe -- samples/unit_test.csv`
4. To run the unit tests, execute: `dune runtest`# Alternatively, you could also build the Docker image locally:
```bash
docker build -t padhi/excelsynth github.com/SaswatPadhi/ExcelSynth
```## Usage
### Formula Synthesis from CSV
```text
$ dune exec bin/App.exe -- -h
Synthesize Excel formulas for a CSV file.App.exe [flag] ... FILENAME
=== flags ===
[-check-last-col-aggregations BOOLEAN] synthesize aggregation formulas for
cells in the last column
[-check-last-row-aggregations BOOLEAN] synthesize aggregation formulas for
cells in the last row
[-check-pointwise-col-operations BOOLEAN] synthesize pointwise
transformations for columns
[-check-pointwise-row-operations BOOLEAN] synthesize pointwise
transformations for rows
[-constant FLOAT] ... additional Boolean/numeric/string
constants
[-disable-constant-solutions BOOLEAN] disable constant formulas (e.g.
=0.0) for cells
[-enable-2d-aggregation BOOLEAN] use 2D ranges in aggregation
operations
[-enable-booleans BOOLEAN] enable Boolean and conditional
expressions
[-log-path FILENAME] enable logging and output to the
specified path
[-mask-path FILENAME] a known formula mask for the CSV
file
[-max-expr-size INTEGER] maximum cost (AST size) of
expressions to explore
[-max-threads INTEGER] maximum number of threads to create
[-range STRING] a range (in RC:R'C' format) that
bounds the synthesis space
[-relative-error FLOAT] the fractional relative error
allowed in float comparisons
[-restrict-to-top-left-data BOOLEAN] only use data to the top left of a
cell in formulas
[-type-error-threshold FLOAT] maximum fraction of cells that may
be ignored due to type errors
[-value-error-threshold FLOAT] maximum fraction of cells that may
be ignored due to value errors
```### Bulk Processing ([`scripts/evaluate.sh`](scripts/evaluate.sh))
The following **input** directory structure is required:
```text
|
+-- table_ranges.csv <--- Contains table ranges for CSV files
|
+-- evaluated_csvs <--- Contains fully evaluated CSV files
| |
| +-- .csv
| |
| `-- .csv
|
+-- formula_csvs <--- Contains CSV files with formulas
|
+-- .csv
|
`-- .csv
```The following **output** data is generated within this directory:
```text
|
: · · ·
|
: · · ·
|
+-- extracted_masks <--- Contents generated by scripts/extract_mask.py
| |
| +-- .csv <--- Ground truth mask from `../formula_csvs/.csv`
| |
| `-- .csv
|
+-- recovered_masks <--- Contents generated by scripts/recover_mask.py
| |
| +-- Baseline <--- Unrestricted synthesis (over whole sheet)
| | |
| | +-- .csv <--- Synthesized mask from `../evaluated_csvs/.csv`
| | |
| | `-- .csv
| |
: : · · ·
| |
| `-- <--- Synthesis restricted to tables from
| |
| +-- .csv <--- Synthesized mask from `../evaluated_csvs/.csv`
| |
| `-- .csv
|
`-- comparison_masks <--- Contents generated by scripts/compare_masks.py
|
+-- full <--- All cells within a recovered mask are checked
| |
| +-- Baseline <--- Evaluation of unrestricted-synthesis masks
| | |
| | +-- .csv <--- Evaluation of `../recovered_masks/Baseline/.csv`
| | |
| | `-- .csv
| |
: : · · ·
| |
| `-- <--- Evaluation of restricted-synthesis masks
| |
| +-- .csv
| |
| `-- .csv
|
`-- in-table <--- Only in-table cells are checked
|
+-- Baseline
| |
| +-- .csv
| |
| `-- .csv
|
: · · ·
|
`--
|
+-- .csv
|
`-- .csv
```#### Extract A Formula Masks from CSVs
```text
$ python3 scripts/extract_mask.py -h
usage: extract_mask.py [-h] -i INPUT_DIR -o OUTPUT_DIRoptional arguments:
-h, --help show this help message and exit-i INPUT_DIR, --input-dir INPUT_DIR
-o OUTPUT_DIR, --output-dir OUTPUT_DIR$ python3 scripts/extract_mask.py -i data/formula_csvs -o data/extracted_masks
```#### Recover Formula Masks from CSVs
```text
$ python3 scripts/recover_mask.py -h
usage: recover_mask.py [-h] -e EVAL_CSV_DIR -o OUTPUT_DIR -c TABLE_RANGE_COLUMN
tables_data_csvpositional arguments:
tables_data_csvoptional arguments:
-h, --help show this help message and exit-e EVAL_CSV_DIR, --eval-csv-dir EVAL_CSV_DIR
-o OUTPUT_DIR, --output-dir OUTPUT_DIR
-c TABLE_RANGE_COLUMN, --table-range-column TABLE_RANGE_COLUMN$ python3 scripts/recover_mask.py -e data/evaluated_csvs -o data/recovered_masks \
-c 1 data/table_ranges.csv
````tables_data_csv` has extracted table ranges in `TABLE_RANGE_COLUMN` (0-indexed).
#### Compare Formula Masks
```text
$ python3 scripts/compare_masks.py -h
usage: compare_masks.py [-h] -g GROUND_TRUTH_DIR -p PREDICTION_DIR -o OUTPUT_DIRoptional arguments:
-h, --help show this help message and exit-g GROUND_TRUTH_DIR, --ground-truth-dir GROUND_TRUTH_DIR
-p PREDICTION_DIR, --prediction-dir PREDICTION_DIR
-o OUTPUT_DIR, --output-dir OUTPUT_DIR$ python3 scripts/compare_mask.py -g data/extracted_masks \
-p data/recovered_masks/table_detector_1 \
-o data/comparison_masks/table_detector_1
```[docker-hub]: https://hub.docker.com/r/padhi/excelsynth
[travis]: https://travis-ci.org/SaswatPadhi/ExcelSynth