{"id":23209869,"url":"https://github.com/saswatpadhi/excelsynth","last_synced_at":"2025-04-05T12:23:14.149Z","repository":{"id":145751131,"uuid":"241749022","full_name":"SaswatPadhi/ExcelSynth","owner":"SaswatPadhi","description":"An enumerative synthesizer for recovering Excel formulas from CSVs.","archived":false,"fork":false,"pushed_at":"2020-06-19T01:14:00.000Z","size":194,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-02-10T23:38:23.701Z","etag":null,"topics":["csv","excel","excel-formulas","program-synthesis","synthesizer"],"latest_commit_sha":null,"homepage":"","language":"OCaml","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/SaswatPadhi.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-02-19T23:26:24.000Z","updated_at":"2022-07-03T02:13:23.000Z","dependencies_parsed_at":"2023-04-06T09:53:39.357Z","dependency_job_id":null,"html_url":"https://github.com/SaswatPadhi/ExcelSynth","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SaswatPadhi%2FExcelSynth","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SaswatPadhi%2FExcelSynth/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SaswatPadhi%2FExcelSynth/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SaswatPadhi%2FExcelSynth/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hos
ts/GitHub/owners/SaswatPadhi","download_url":"https://codeload.github.com/SaswatPadhi/ExcelSynth/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247334091,"owners_count":20922139,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["csv","excel","excel-formulas","program-synthesis","synthesizer"],"created_at":"2024-12-18T18:29:58.381Z","updated_at":"2025-04-05T12:23:14.118Z","avatar_url":"https://github.com/SaswatPadhi.png","language":"OCaml","funding_links":[],"categories":[],"sub_categories":[],"readme":"ExcelSynth\n\u003ca href=\"https://microbadger.com/images/padhi/excelsynth\"\u003e\u003cimg align=\"right\" src=\"https://img.shields.io/microbadger/image-size/padhi/excelsynth.svg?style=flat\u0026label=docker\"\u003e\u003c/img\u003e\u003c/a\u003e\n==========\n\n[![](https://img.shields.io/travis/SaswatPadhi/ExcelSynth/master.svg?logo=travis\u0026style=popout\u0026label=Travis+Build)][travis]\n\u0026nbsp;\n[![](https://img.shields.io/docker/cloud/build/padhi/excelsynth.svg?logo=docker\u0026style=popout\u0026label=Docker+Image)][docker-hub]\n\nAn enumerative synthesizer for recovering Excel formulas from CSVs.\n\n\u003ctable\u003e\n   \u003cthead\u003e\n   \u003ctr\u003e\n      \u003cth align='center'\u003e\n        Input: CSV File\n        (\u003ca href='https://github.com/SaswatPadhi/ExcelSynth/blob/master/samples/example.csv'\u003esamples/example.csv\u003c/a\u003e)\n      \u003c/th\u003e\n      \u003cth align='center'\u003e\n        Output: Formula Mask\n      \u003c/th\u003e\n   \u003c/tr\u003e\n   
\u003c/thead\u003e\n   \u003ctbody\u003e\n      \u003ctr\u003e\n         \u003ctd\u003e\n            \u003csub\u003e\u003cpre lang='text'\u003e\n                \u003ccode\u003e\nCol 1  ,  Col 2  ,  Col 3  ,  Col 4  ,  Col 5\nRow 2  ,  1.     ,  10.    ,  9.5    ,  24.\nRow 3  ,  23.    ,  12.    ,  0.5    ,  35.\nRow 4  ,  22.    ,  2.     ,  -9.    ,  24.\nRow 5  ,  -1.    ,  6.     ,  6.5    ,  5.\nRow 6  ,  59.    ,  0.     ,  -29.5  ,  41.\nRow 7  ,  11.    ,  -2.    ,  -7.5   ,  9.\nRow 8  ,  115.   ,  14.    ,  -43.5  ,  23.\n                \u003c/code\u003e\n            \u003c/pre\u003e\u003c/sub\u003e\n         \u003c/td\u003e\n         \u003ctd\u003e\n            \u003csub\u003e\u003cpre lang='text'\u003e\n                \u003ccode\u003e\n ,             ,                       ,                   ,\n ,             ,                       , =(C2-(B2/(1.+1.)) ,\n ,             ,                       , =(C3-(B3/(1.+1.)) ,\n ,             ,                       , =(C4-(B4/(1.+1.)) ,\n ,             ,                       , =(C5-(B5/(1.+1.)) ,\n ,             ,                       , =(C6-(B6/(1.+1.)) ,\n ,             ,                       , =(C7-(B7/(1.+1.)) ,\n , =SUM(B2:B7) , =(SUM(C2:C7)/(1.+1.)) , =(C8-(B8/(1.+1.)) , =AVERAGE(E2:E7)\n                \u003c/code\u003e\n            \u003c/pre\u003e\u003c/sub\u003e\n         \u003c/td\u003e\n      \u003c/tr\u003e\n   \u003c/tbody\u003e\n\u003c/table\u003e\n\n----\n\n## Installation\n\n0. [Get `docker` for your OS](https://docs.docker.com/install).\n1. Pull the Docker image\u003csup\u003e[#](#note_1)\u003c/sup\u003e: `docker pull padhi/excelsynth`.\n2. Run a container from the image: `docker run -it padhi/excelsynth`.\u003cbr\u003e\n   This gives you a `bash` shell within the ExcelSynth directory.\n3. To run ExcelSynth on `samples/unit_test.csv`, execute: `dune exec bin/App.exe -- samples/unit_test.csv`\n4. 
To run the unit tests, execute: `dune runtest`\n\n\u003ca name=\"note_1\"\u003e\u003csup\u003e#\u003c/sup\u003e\u003c/a\u003e Alternatively, you could also build the Docker image locally:\n\n```bash\ndocker build -t padhi/excelsynth github.com/SaswatPadhi/ExcelSynth\n```\n\n## Usage\n\n### Formula Synthesis from CSV\n\n```text\n$ dune exec bin/App.exe -- -h\nSynthesize Excel formulas for a CSV file.\n\n  App.exe [flag] ... FILENAME\n\n=== flags ===\n\n  [-check-last-col-aggregations BOOLEAN]     synthesize aggregation formulas for\n                                             cells in the last column\n  [-check-last-row-aggregations BOOLEAN]     synthesize aggregation formulas for\n                                             cells in the last row\n  [-check-pointwise-col-operations BOOLEAN]  synthesize pointwise\n                                             transformations for columns\n  [-check-pointwise-row-operations BOOLEAN]  synthesize pointwise\n                                             transformations for rows\n  [-constant FLOAT] ...                      
additional Boolean/numeric/string\n                                             constants\n  [-disable-constant-solutions BOOLEAN]      disable constant formulas (e.g.\n                                             =0.0) for cells\n  [-enable-2d-aggregation BOOLEAN]           use 2D ranges in aggregation\n                                             operations\n  [-enable-booleans BOOLEAN]                 enable Boolean and conditional\n                                             expressions\n  [-log-path FILENAME]                       enable logging and output to the\n                                             specified path\n  [-mask-path FILENAME]                      a known formula mask for the CSV\n                                             file\n  [-max-expr-size INTEGER]                   maximum cost (AST size) of\n                                             expressions to explore\n  [-max-threads INTEGER]                     maximum number of threads to create\n  [-range STRING]                            a range (in RC:R'C' format) that\n                                             bounds the synthesis space\n  [-relative-error FLOAT]                    the fractional relative error\n                                             allowed in float comparisons\n  [-restrict-to-top-left-data BOOLEAN]       only use data to the top left of a\n                                             cell in formulas\n  [-type-error-threshold FLOAT]              maximum fraction of cells that may\n                                             be ignored due to type errors\n  [-value-error-threshold FLOAT]             maximum fraction of cells that may\n                                             be ignored due to value errors\n```\n\n### Bulk Processing ([`scripts/evaluate.sh`](scripts/evaluate.sh))\n\nThe following **input** directory structure is required:\n\n```text\n\u003cdata\u003e\n |\n +-- table_ranges.csv              \u003c--- Contains table ranges for CSV 
files\n |\n +-- evaluated_csvs                \u003c--- Contains fully evaluated CSV files\n |    |\n |    +-- \u003ca\u003e.csv\n |    |\n |    `-- \u003cb\u003e.csv\n |\n +-- formula_csvs                  \u003c--- Contains CSV files with formulas\n      |\n      +-- \u003ca\u003e.csv\n      |\n      `-- \u003cb\u003e.csv\n```\n\nThe following **output** data is generated within this directory:\n\n```text\n\u003cdata\u003e\n |\n : · · ·\n |\n : · · ·\n |\n +-- extracted_masks               \u003c--- Contents generated by scripts/extract_mask.py\n |    |\n |    +-- \u003ca\u003e.csv                  \u003c--- Ground truth mask from `../formula_csvs/\u003ca\u003e.csv`\n |    |\n |    `-- \u003cb\u003e.csv\n |\n +-- recovered_masks               \u003c--- Contents generated by scripts/recover_mask.py\n |    |\n |    +-- Baseline                 \u003c--- Unrestricted synthesis (over whole sheet)\n |    |    |\n |    |    +-- \u003ca\u003e.csv             \u003c--- Synthesized mask from `../evaluated_csvs/\u003ca\u003e.csv`\n |    |    |\n |    |    `-- \u003cb\u003e.csv\n |    |\n :    : · · ·\n |    |\n |    `-- \u003ctable_detector_n\u003e       \u003c--- Synthesis restricted to tables from \u003ctable_detector_n\u003e\n |         |\n |         +-- \u003ca\u003e.csv             \u003c--- Synthesized mask from `../evaluated_csvs/\u003ca\u003e.csv`\n |         |\n |         `-- \u003cb\u003e.csv\n |\n `-- comparison_masks              \u003c--- Contents generated by scripts/compare_masks.py\n      |\n      +-- full                     \u003c--- All cells within a recovered mask are checked\n      |    |\n      |    +-- Baseline            \u003c--- Evaluation of unrestricted-synthesis masks\n      |    |    |\n      |    |    +-- \u003ca\u003e.csv        \u003c--- Evaluation of `../recovered_masks/Baseline/\u003ca\u003e.csv`\n      |    |    |\n      |    |    `-- \u003cb\u003e.csv\n      |    |\n      :    : · · ·\n      |    |\n      |    `-- 
\u003ctable_detector_n\u003e  \u003c--- Evaluation of restricted-synthesis masks\n      |         |\n      |         +-- \u003ca\u003e.csv\n      |         |\n      |         `-- \u003cb\u003e.csv\n      |\n      `-- in-table                 \u003c--- Only in-table cells are checked\n           |\n           +-- Baseline\n           |    |\n           |    +-- \u003ca\u003e.csv\n           |    |\n           |    `-- \u003cb\u003e.csv\n           |\n           : · · ·\n           |\n           `-- \u003ctable_detector_n\u003e\n                |\n                +-- \u003ca\u003e.csv\n                |\n                `-- \u003cb\u003e.csv\n```\n\n#### Extract Formula Masks from CSVs\n\n```text\n$ python3 scripts/extract_mask.py -h\nusage: extract_mask.py [-h] -i INPUT_DIR -o OUTPUT_DIR\n\noptional arguments:\n  -h, --help            show this help message and exit\n\n  -i INPUT_DIR, --input-dir INPUT_DIR\n  -o OUTPUT_DIR, --output-dir OUTPUT_DIR\n\n$ python3 scripts/extract_mask.py -i data/formula_csvs -o data/extracted_masks\n```\n\n#### Recover Formula Masks from CSVs\n\n```text\n$ python3 scripts/recover_mask.py -h\nusage: recover_mask.py [-h] -e EVAL_CSV_DIR -o OUTPUT_DIR -c TABLE_RANGE_COLUMN\n                       tables_data_csv\n\npositional arguments:\n  tables_data_csv\n\noptional arguments:\n  -h, --help            show this help message and exit\n\n  -e EVAL_CSV_DIR, --eval-csv-dir EVAL_CSV_DIR\n  -o OUTPUT_DIR, --output-dir OUTPUT_DIR\n  -c TABLE_RANGE_COLUMN, --table-range-column TABLE_RANGE_COLUMN\n\n$ python3 scripts/recover_mask.py -e data/evaluated_csvs -o data/recovered_masks \\\n                                  -c 1 data/table_ranges.csv\n```\n\n`tables_data_csv` lists the extracted table ranges in column `TABLE_RANGE_COLUMN` (0-indexed).\n\n#### Compare Formula Masks\n\n```text\n$ python3 scripts/compare_masks.py -h\nusage: compare_masks.py [-h] -g GROUND_TRUTH_DIR -p PREDICTION_DIR -o OUTPUT_DIR\n\noptional arguments:\n  -h, --help            show 
this help message and exit\n\n  -g GROUND_TRUTH_DIR, --ground-truth-dir GROUND_TRUTH_DIR\n  -p PREDICTION_DIR, --prediction-dir PREDICTION_DIR\n  -o OUTPUT_DIR, --output-dir OUTPUT_DIR\n\n$ python3 scripts/compare_masks.py -g data/extracted_masks \\\n                                   -p data/recovered_masks/table_detector_1 \\\n                                   -o data/comparison_masks/table_detector_1\n```\n\n[docker-hub]:         https://hub.docker.com/r/padhi/excelsynth\n[travis]:             https://travis-ci.org/SaswatPadhi/ExcelSynth\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsaswatpadhi%2Fexcelsynth","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsaswatpadhi%2Fexcelsynth","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsaswatpadhi%2Fexcelsynth/lists"}