{"id":16958460,"url":"https://github.com/dohliam/html-table2text","last_synced_at":"2025-04-14T09:20:29.318Z","repository":{"id":88992046,"uuid":"203299441","full_name":"dohliam/html-table2text","owner":"dohliam","description":"HTML Table to Text - Extract and convert HTML tables to plain text formats","archived":false,"fork":false,"pushed_at":"2023-11-12T14:32:20.000Z","size":10,"stargazers_count":3,"open_issues_count":1,"forks_count":2,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-03-27T22:42:39.820Z","etag":null,"topics":["asciidoctor-converter","csv","csv-converter","csv-format","html-conversion","html-converter","html-tables","html-to-markdown","markdown-converter","tsv-converter"],"latest_commit_sha":null,"homepage":"","language":"Ruby","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/dohliam.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2019-08-20T04:26:59.000Z","updated_at":"2022-10-07T09:42:35.000Z","dependencies_parsed_at":"2024-11-27T20:40:16.001Z","dependency_job_id":null,"html_url":"https://github.com/dohliam/html-table2text","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dohliam%2Fhtml-table2text","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dohliam%2Fhtml-table2text/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dohliam%2Fhtml-table2text/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitH
ub/repositories/dohliam%2Fhtml-table2text/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/dohliam","download_url":"https://codeload.github.com/dohliam/html-table2text/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248852186,"owners_count":21171843,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["asciidoctor-converter","csv","csv-converter","csv-format","html-conversion","html-converter","html-tables","html-to-markdown","markdown-converter","tsv-converter"],"created_at":"2024-10-13T22:42:39.622Z","updated_at":"2025-04-14T09:20:29.306Z","avatar_url":"https://github.com/dohliam.png","language":"Ruby","readme":"# HTML Table to Text - Extract and convert HTML tables to plain text formats\n\nThis is a simple (optionally interactive) script that can extract any or all tables from a given HTML file or URL. The data can be output to CSV (comma-separated values), TSV (tab-separated values), Markdown, Asciidoc, or raw HTML.\n\n## Requirements\n\nThis script relies on [Nokogiri](https://nokogiri.org/) to parse HTML. 
You can install it with:\n\n    gem install nokogiri\n\nMarkdown conversion uses [reverse_markdown](https://github.com/xijo/reverse_markdown), which can be installed the same way:\n\n    gem install reverse_markdown\n\nAsciidoc conversion uses the [reverse_adoc](https://github.com/metanorma/reverse_adoc) gem:\n\n    gem install reverse_adoc\n\n## Usage\n\nTo extract tables from an arbitrary URL, just run the `webtable_to_text.rb` script with the `-u` option followed by the URL:\n\n    ./webtable_to_text.rb -u [URL]\n\nFor example:\n\n    ./webtable_to_text.rb -u \"https://en.wikipedia.org/wiki/Gabon\"\n\nThis will print out all the tables found on the specified page.\n\nTo output a specific table only, use the `-n` option, followed by the number of the table:\n\n    ./webtable_to_text.rb -u \"https://en.wikipedia.org/wiki/Gabon\" -n 3\n\nThe script also works with local files, using the `-f` option, e.g.:\n\n    ./webtable_to_text.rb -f some_file.html\n\n### Interactive mode\n\nTo use interactive mode, add the `-i` option to the command and specify a URL or file as normal. For example:\n\n    ./webtable_to_text.rb -u \"https://en.wikipedia.org/wiki/Gabon\" -i\n\nThis will print a message with the total number of tables found in the document. If you enter a number at the prompt, it will print the corresponding table. 
Otherwise, pressing ENTER or RETURN will print all tables found.\n\nFor example, pressing `3` will print something like the following:\n\n    Population in Gabon\n    Year\tMillion \n    1950\t0.5 \n    2000\t1.2 \n    2016\t2\n\nTo run tests, just enter the following command:\n\n    ruby tests.rb\n\n### Options\n\nThe following options are available:\n\n* `-A`, `--all`: _Print all tables found on the specified page_\n* `-a`, `--asciidoc`: _Output in asciidoc/asciidoctor format_\n* `-c`, `--csv`: _Output in CSV / comma separated values format_\n* `-f`, `--file FILE`: _Specify HTML input file as source for extracting tables_\n* `-h`, `--help`: _Print help text_\n* `-i`, `--interactive`: _Interactive mode_\n* `-m`, `--markdown`: _Output in markdown format_\n* `-n`, `--number NUM`: _Print specific table number only; separate multiple numbers with commas_\n* `-o`, `--output FILE`: _Specify output file (default: output to STDOUT)_\n* `-r`, `--raw`: _Output raw table HTML_\n* `-t`, `--tsv`: _Output in TSV / tab separated values format (default)_\n* `-u`, `--url URL`: _Specify URL as source for extracting tables_\n\n## To do\n\n* ~~add options and non-interactive mode~~\n* ~~output to raw HTML~~\n* ~~output to Markdown~~\n* ~~output to AsciiDoc~~\n\n## Credits\n\n* Table extraction to CSV based on [this Stack Overflow answer](https://stackoverflow.com/a/1403325) by user audiodude.\n\n## License\n\nMIT.\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdohliam%2Fhtml-table2text","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdohliam%2Fhtml-table2text","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdohliam%2Fhtml-table2text/lists"}