{"id":35280718,"url":"https://github.com/hawkfish/textform","last_synced_at":"2026-03-17T16:32:20.817Z","repository":{"id":47201031,"uuid":"355590081","full_name":"hawkfish/textform","owner":"hawkfish","description":"A data transformation pipeline library based on Potter's Wheel.","archived":false,"fork":false,"pushed_at":"2021-09-08T18:35:52.000Z","size":476,"stargazers_count":7,"open_issues_count":5,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-02-19T09:32:41.863Z","etag":null,"topics":["csv-converter","data-transformation-pipeline","json-converter","markdown-converter","pipelines","potter-wheel","record-stream","text-processing","textform","wrangling","wrangling-data"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/hawkfish.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2021-04-07T15:12:57.000Z","updated_at":"2024-04-12T01:19:15.000Z","dependencies_parsed_at":"2022-09-10T02:22:00.985Z","dependency_job_id":null,"html_url":"https://github.com/hawkfish/textform","commit_stats":null,"previous_names":[],"tags_count":8,"template":false,"template_full_name":null,"purl":"pkg:github/hawkfish/textform","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hawkfish%2Ftextform","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hawkfish%2Ftextform/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hawkfish%2Ftextform/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hawkfish%2Ftextform/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/hawkfish",
"download_url":"https://codeload.github.com/hawkfish/textform/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hawkfish%2Ftextform/sbom","scorecard":{"id":457784,"data":{"date":"2025-08-11","repo":{"name":"github.com/hawkfish/textform","commit":"d2859f799291739d3289d323102a0eb6a70fd8fd"},"scorecard":{"version":"v5.2.1-40-gf6ed084d","commit":"f6ed084d17c9236477efd66e5b258b9d4cc7b389"},"score":3.4,"checks":[{"name":"Token-Permissions","score":0,"reason":"detected GitHub workflow tokens with excessive permissions","details":["Warn: no topLevel permission defined: .github/workflows/publish-to-test-pypi.yml:1","Info: no jobLevel write permissions found"],"documentation":{"short":"Determines if the project's workflows follow the principle of least privilege.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#token-permissions"}},{"name":"Maintained","score":0,"reason":"0 commit(s) and 0 issue activity found in the last 90 days -- score normalized to 0","details":null,"documentation":{"short":"Determines if the project is \"actively maintained\".","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#maintained"}},{"name":"Code-Review","score":0,"reason":"Found 0/19 approved changesets -- score normalized to 0","details":null,"documentation":{"short":"Determines if the project requires human code review before pull requests (aka merge requests) are merged.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#code-review"}},{"name":"Dangerous-Workflow","score":10,"reason":"no dangerous workflow patterns detected","details":null,"documentation":{"short":"Determines if the project's GitHub Action workflows avoid dangerous 
patterns.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#dangerous-workflow"}},{"name":"CII-Best-Practices","score":0,"reason":"no effort to earn an OpenSSF best practices badge detected","details":null,"documentation":{"short":"Determines if the project has an OpenSSF (formerly CII) Best Practices Badge.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#cii-best-practices"}},{"name":"Binary-Artifacts","score":10,"reason":"no binaries found in the repo","details":null,"documentation":{"short":"Determines if the project has generated executable (binary) artifacts in the source repository.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#binary-artifacts"}},{"name":"Pinned-Dependencies","score":0,"reason":"dependency not pinned by hash detected -- score normalized to 0","details":["Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/publish-to-test-pypi.yml:17: update your workflow using https://app.stepsecurity.io/secureworkflow/hawkfish/textform/publish-to-test-pypi.yml/main?enable=pin","Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/publish-to-test-pypi.yml:19: update your workflow using https://app.stepsecurity.io/secureworkflow/hawkfish/textform/publish-to-test-pypi.yml/main?enable=pin","Warn: third-party GitHubAction not pinned by hash: .github/workflows/publish-to-test-pypi.yml:38: update your workflow using https://app.stepsecurity.io/secureworkflow/hawkfish/textform/publish-to-test-pypi.yml/main?enable=pin","Warn: third-party GitHubAction not pinned by hash: .github/workflows/publish-to-test-pypi.yml:44: update your workflow using https://app.stepsecurity.io/secureworkflow/hawkfish/textform/publish-to-test-pypi.yml/main?enable=pin","Warn: pipCommand not pinned by hash: .github/workflows/publish-to-test-pypi.yml:24","Info:   0 out of   2 GitHub-owned 
GitHubAction dependencies pinned","Info:   0 out of   2 third-party GitHubAction dependencies pinned","Info:   0 out of   1 pipCommand dependencies pinned"],"documentation":{"short":"Determines if the project has declared and pinned the dependencies of its build process.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#pinned-dependencies"}},{"name":"Security-Policy","score":0,"reason":"security policy file not detected","details":["Warn: no security policy file detected","Warn: no security file to analyze","Warn: no security file to analyze","Warn: no security file to analyze"],"documentation":{"short":"Determines if the project has published a security policy.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#security-policy"}},{"name":"Fuzzing","score":0,"reason":"project is not fuzzed","details":["Warn: no fuzzer integrations found"],"documentation":{"short":"Determines if the project uses fuzzing.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#fuzzing"}},{"name":"Vulnerabilities","score":10,"reason":"0 existing vulnerabilities detected","details":null,"documentation":{"short":"Determines if the project has open, known unfixed vulnerabilities.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#vulnerabilities"}},{"name":"License","score":10,"reason":"license file detected","details":["Info: project has a license file: LICENSE:0","Info: FSF or OSI recognized license: MIT License: LICENSE:0"],"documentation":{"short":"Determines if the project has defined a license.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#license"}},{"name":"Packaging","score":-1,"reason":"packaging workflow not detected","details":["Warn: no GitHub/GitLab publishing workflow detected."],"documentation":{"short":"Determines 
if the project is published as a package that others can easily download, install, easily update, and uninstall.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#packaging"}},{"name":"Signed-Releases","score":-1,"reason":"no releases found","details":null,"documentation":{"short":"Determines if the project cryptographically signs release artifacts.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#signed-releases"}},{"name":"Branch-Protection","score":0,"reason":"branch protection not enabled on development/release branches","details":["Warn: branch protection not enabled for branch 'main'"],"documentation":{"short":"Determines if the default and release branches are protected with GitHub's branch protection settings.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#branch-protection"}},{"name":"SAST","score":0,"reason":"SAST tool is not run on all commits -- score normalized to 0","details":["Warn: 0 commits out of 15 are checked with a SAST tool"],"documentation":{"short":"Determines if the project uses static code analysis.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#sast"}}]},"last_synced_at":"2025-08-19T10:12:37.184Z","repository_id":47201031,"created_at":"2025-08-19T10:12:37.184Z","updated_at":"2025-08-19T10:12:37.184Z"},"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":30627165,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-03-17T14:16:03.965Z","status":"ssl_error","status_checked_at":"2026-03-17T14:16:03.380Z","response_time":56,"last_error":"SSL_read: unexpected eof while 
reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["csv-converter","data-transformation-pipeline","json-converter","markdown-converter","pipelines","potter-wheel","record-stream","text-processing","textform","wrangling","wrangling-data"],"created_at":"2025-12-30T14:37:17.074Z","updated_at":"2026-03-17T16:32:20.813Z","avatar_url":"https://github.com/hawkfish.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# textform\n\nA data transformation pipeline module based on the seminal [Potter's Wheel](http://control.cs.berkeley.edu/pwheel-vldb.pdf) data wrangling formalism. The name is a portmanteau of \"text\" and \"transform\".\n\n## Overview\n\n`textform` (abbreviated `txf`) is a text-oriented data transformation module. With it, you can create sequential record-processing _pipelines_ that convert data from (say) lines of text into records and then route the final record stream to another use (e.g., writing the records to a `csv` file).\n\nPipelines are constructed from a sequence of _transforms_, each of which takes in a record and modifies it in some way. For example, the `Split` transform replaces an input field with several new fields derived by splitting the input on a pattern.\n\nWhile inspired by the Potter's Wheel transform list, `textform` is designed for practical everyday use. This means it also includes transforms for limiting the number of rows, writing intermediate results to files, and capturing fields with regular expressions.\n\n## Audience\n\nHow do I know if `textform` is right for me? 
The simplest use case is when you want to use Python's `DictReader`, but the file isn't a `csv`. With `textform`, you can write a pipeline that produces the same records `DictReader` would.\n\nMore complex use cases can be built on top of this kind of record stream. Reshaping, computing values, splitting, dividing, merging, filling in blanks, and other data cleaning and preparation tasks can all be implemented with `textform`. In effect, a pipeline is an executable, reusable description of a text file's format.\n\n## Example\n\nI created `textform` because I had worked on [a similar research system](https://tc19.tableau.com/learn/sessions/lets-get-physical-preparing-data-text-files) in the past and had two text files produced by the [DuckDB](https://github.com/duckdb/duckdb) performance test suite that I needed to convert into `csv` files. The input looked like this:\n\n```\n------------------\n|| Q01_PARALLEL ||\n------------------\nCold Run...Done!\nRun 1/5...0.12345\nRun 2/5...0.12345\nRun 3/5...0.12345\nRun 4/5...0.12345\nRun 5/5...0.12345\n------------------\n|| Q02_PARALLEL ||\n------------------\n...\n```\n\nThis file is essentially a sequence of run records grouped under higher-level attributes (the query name and execution mode). Instead of writing a one-off Python script, I decided to write some simple transforms and build a pipeline, which looked like this:\n\n```py\nimport sys\n\nfrom textform import *  # assumption: the transforms are importable from the package top level\n\np = Text(sys.stdin, 'Line')                         # Read a line\np = Add(p, 'Branch', sys.argv[1])                   # Tag the file with the branch name\np = Match(p, 'Line', r'------', invert=True)        # Remove horizontal lines\np = Divide(p, 'Line', 'Query', 'Run', r'Q')         # Separate the query names from the run data\np = Fill(p, 'Query', '00')                          # Fill down the blank query names\np = Capture(p, 'Query', ('Query',), r'\\|\\|\\s+Q(\\w+)\\s+\\|\\|')  # Capture the query name (e.g. 01_PARALLEL)\n# Split the execution mode from the query name\np = Split(p, 'Query', ('Query', 'Mode',), r'_', ('00', 'SERIAL',))\np = Cast(p, 'Query', int)                           # Cast the query number to an integer\np = Match(p, 'Run', r'\\d')                          # Filter to the runs with data\n# Capture the run components (the literal \"...\" separator is escaped)\np = Capture(p, 'Run', ('Run #', 'Run Count', 'Time',), r'(\\d+)/(\\d+)\\.\\.\\.(\\d+\\.\\d+)')\np = Cast(p, 'Run #', int)                           # Cast the run components\np = Cast(p, 'Run Count', int)\np = Cast(p, 'Time', float)\np = Write(p, sys.stdout)                            # Write the records to stdout as a csv\np.pump()\n```\n\nThe script reads the performance output from stdin and takes the branch name as its first argument, so it can be invoked as:\n\n```shell\n$ python3 pipeline.py master \u003c performance.txt \u003e performance.csv\n```\n\n## Contributing\n\nYou know the drill: fork, branch, test, and submit a PR. This is a completely open-source, free-as-in-beer project.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhawkfish%2Ftextform","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fhawkfish%2Ftextform","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhawkfish%2Ftextform/lists"}