{"id":35175629,"url":"https://github.com/lignum-vitae/pyrolysate","last_synced_at":"2026-04-29T03:04:10.548Z","repository":{"id":274661526,"uuid":"893958109","full_name":"lignum-vitae/pyrolysate","owner":"lignum-vitae","description":"API and CLI that convert URLs and emails to CSV, JSON, or text with outputs to console or to a file","archived":false,"fork":false,"pushed_at":"2026-04-02T03:27:01.000Z","size":249,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2026-04-02T16:46:10.896Z","etag":null,"topics":["csv","email","json","python","python3","url","url-parser"],"latest_commit_sha":null,"homepage":"https://pypi.org/project/pyrolysate/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"cc0-1.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/lignum-vitae.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2024-11-25T14:01:22.000Z","updated_at":"2026-04-02T03:27:04.000Z","dependencies_parsed_at":"2025-10-07T23:27:51.395Z","dependency_job_id":"68ba258a-865d-48bc-88c4-ec6454cc22af","html_url":"https://github.com/lignum-vitae/pyrolysate","commit_stats":null,"previous_names":["dawnandrew100/pyrolysate","lignum-vitae/pyrolysate"],"tags_count":4,"template":false,"template_full_name":null,"purl":"pkg:github/lignum-vitae/pyrolysate","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lignum-vitae%2Fpyrolysate","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lignum-vitae%2Fpyrolysate/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lignum-vitae%2Fpyrolysate/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lignum-vitae%2Fpyrolysate/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/lignum-vitae","download_url":"https://codeload.github.com/lignum-vitae/pyrolysate/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lignum-vitae%2Fpyrolysate/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32408447,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-29T02:37:21.628Z","status":"ssl_error","status_checked_at":"2026-04-29T02:36:50.947Z","response_time":110,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["csv","email","json","python","python3","url","url-parser"],"created_at":"2025-12-28T22:15:02.256Z","updated_at":"2026-04-29T03:04:10.540Z","avatar_url":"https://github.com/lignum-vitae.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"[![Static Badge](https://img.shields.io/badge/Project_Name-Pyrolysate-blue)](https://github.com/lignum-vitae/pyrolysate)\n[![Python Version from PEP 621 TOML](https://img.shields.io/python/required-version-toml?tomlFilePath=https%3A%2F%2Fraw.githubusercontent.com%2Flignum-vitae%2Fpyrolysate%2Fmain%2Fpyproject.toml)](https://github.com/lignum-vitae/pyrolysate)\n[![PyPI version](https://img.shields.io/pypi/v/pyrolysate.svg)](https://pypi.org/project/pyrolysate/)\n[![GitHub License](https://img.shields.io/github/license/lignum-vitae/pyrolysate)](https://github.com/lignum-vitae/pyrolysate/blob/main/LICENSE)\n[![GitHub branch check runs](https://img.shields.io/github/check-runs/lignum-vitae/pyrolysate/main)](https://github.com/lignum-vitae/pyrolysate)\n\n# Pyrolysate\n\nPyrolysate is a Python library and CLI tool for parsing and validating URLs and\nemail addresses.\nIt breaks down URLs and emails into their component parts, validates against\nIANA's official TLD list,\nand outputs structured data in JSON, CSV, or text format.\n\nThe library offers both a programmer-friendly API and a command-line interface,\nmaking it suitable for both development integration and quick data processing tasks.\nIt handles single entries or large datasets efficiently using Python's\ngenerator functionality,\nand provides flexible input/output options including file processing with\ncustom delimiters.\n\n## Features\n\n### URL Parsing\n\n- Extract scheme, subdomain, domain, TLD, port, path, query, and fragment components\n- Support for complex URL patterns including ports, queries, and fragments\n- Support for IP addresses in URLs\n- Support for both direct input and file processing via CLI or API\n- Output as JSON, CSV, or text format through CLI or API\n\n### Email Parsing\n\n- Extract local, mail server, and domain components\n- Support for plus addressing (e.g., user+tag@domain.com)\n- Support for both direct input and file processing via CLI or API\n- Output as JSON, CSV, or text format through CLI or API\n\n### Top Level Domain Validation\n\n- Automatic updates from IANA's official TLD list\n- Local TLD file caching for offline use\n- Fallback to common TLDs if both online and local sources fail\n\n### Flexible Input/Output\n\n- Process single or multiple entries\n- Support for government domain emails (.gov.tld)\n- Custom delimiters for file input\n- Multiple output formats with .txt format as default (JSON, CSV, text)\n- Pretty-printed or minified JSON output\n- Console output or file saving options\n- Memory-efficient processing of large datasets using Python generators\n- Support for compressed input files:\n  - ZIP archives (processes all text files within .zip)\n  - GZIP (.gz)\n  - BZIP2 (.bz2)\n  - LZMA (.xz, .lzma)\n\n### Developer Friendly\n\n- Type hints for better IDE support\n- Comprehensive docstrings\n- Modular design for easy integration\n- Command-line interface for quick testing\n\n## API Reference\n\n### Email Class\n\n| Method                                           | Parameters                                              | Description                    |\n|---------------------                             |---------------------                                    |-----------------               |\n| `parse_email(email_str)`                         | `email_str: str`                                        | Parses single email address    |\n| `parse_email_array(emails)`                      | `emails: list[str]`                                     | Parses list of email addresses |\n| `to_json(emails, prettify=True)`                 | `emails: str\\|list[str]`, `prettify: bool`              | Converts to JSON format        |\n| `to_json_file(file_name, emails, prettify=True)` | `file_name: str`, `emails: list[str]`, `prettify: bool` | Converts and saves JSON to file|\n| `to_csv(emails)`                                 | `emails: str\\|list[str]`                                | Converts to CSV format         |\n| `to_csv_file(file_name, emails)`                 | `file_name: str`, `emails: list[str]`                   | Converts and saves CSV to file |\n\n### URL Class\n\n| Method                                         | Parameters                                            | Description                                               |\n|------------------                              |----------------------                                 |-------------------                                        |\n| `parse_url(url_str, tlds=[])`                  | `url_str: str`, `tlds: list[str]`                     | Parses single URL                                         |\n| `parse_url_array(urls, tlds=[])`               | `urls: list[str]`, `tlds: list[str]`                  | Parses list of URLs                                       |\n| `to_json(urls, prettify=True)`                 | `urls: str\\|list[str]`, `prettify: bool`              | Converts to JSON format                                   |\n| `to_json_file(file_name, urls, prettify=True)` | `file_name: str`, `urls: list[str]`, `prettify: bool` | Converts and saves JSON to file                           |\n| `to_csv(urls)`                                 | `urls: str\\|list[str]`                                | Converts to CSV format                                    |\n| `to_csv_file(file_name, urls)`                 | `file_name: str`, `urls: list[str]`                   | Converts and saves CSV to file                            |\n\n### Miscellaneous\n\n| Method                                          | Parameters                               | Description                                                                          |\n|------------------                               |----------------------                    |-------------------------                                                             |\n| `file_to_list(input_file_name, delimiter='\\n')` | `input_file_name: str`, `delimiter: str` | Parses input file into python list by delimiter                                      |\n| `get_tlds_from_iana`                            |                                          | Fetches latest top level domains from IANA                                           |\n| `get_tlds_from_local`                           | `path_to_tlds_file: str`                 | Fetches tlds from local file. Defaults to project's local file if path not specified |\n\n## CLI Reference\n\n| Argument               | Type   | Value when argument is omitted| Description                        |\n|------------------------|--------|--------------------           |------------------------------------|\n| `target`               | `str`  | `None`                        | Email or URL string(s) to process  |\n| `-u`, `--url`          | `flag` | `False`                       | Specify URL input                  |\n| `-e`, `--email`        | `flag` | `False`                       | Specify Email input                |\n| `-i`, `--input_file`   | `str`  | `None`                        | Input file name with extension     |\n| `-o`, `--output_file`  | `str`  | `None`                        | Output file name without extension |\n| `-c`, `--csv`          | `flag` | `False`                       | Save output as CSV format          |\n| `-j`, `--json`         | `flag` | `False`                       | Save output as JSON format         |\n| `-np`, `--no_prettify` | `flag` | `False`                       | Turn off prettified JSON output    |\n| `-d`, `--delimiter`    | `str`  | `'\\n'`                        | Delimiter for input file parsing   |\n\n### Input File Support\n\n| Format | Extension  | Description                    |\n|--------|------------|--------------------------------|\n| Text   | .txt       | Plain text files               |\n| Log    | .log       | Plain text log files           |\n| CSV    | .csv       | Comma-separated values         |\n| ZIP    | .zip       | Archives containing text files |\n| GZIP   | .gz        | GZIP compressed files          |\n| BZIP2  | .bz2       | BZIP2 compressed files         |\n| LZMA   | .xz, .lzma | LZMA compressed files          |\n\n## Output Types\n\n### Email Parse Output\n\n| Field        | Description                   | Example            |\n|--------------|-------------------------------|--------------------|\n| input        | Full email                    | user+tag@gmail.com |\n| local        | Part before + or @ symbol     | user               |\n| plus_address | Optional part between + and @ | tag                |\n| mail_server  | Domain before TLD             | gmail              |\n| domain       | Top-level domain              | com                |\n\nExample output:\n\n```json\n{\"user+tag@gmail.com\":\n    {\n    \"local\": \"user\",\n    \"plus_address\": \"tag\",\n    \"mail_server\": \"gmail\",\n    \"domain\": \"com\"\n    }\n}\n```\n\n```csv\nemail,local,plus_address,mail_server,domain\nuser+tag@gmail.com,user,tag,gmail,com\n```\n\n### URL Parse Output\n\n| Field               | Description      | Example   |\n|--------------       |---------------   |---------  |\n| scheme              | Protocol         | https     |\n| subdomain           | Domain prefix    | www       |\n| second_level_domain | Main domain      | example   |\n| top_level_domain    | Domain suffix    | com       |\n| port                | Port number      | 443       |\n| path                | URL path         | blog/post |\n| query               | Query parameters | q=test    |\n| fragment            | URL fragment     | section1  |\n\nExample output:\n\n```json\n{\"https://www.example.com:443/blog/post?q=test#section1\":\n    {\n    \"scheme\": \"https\",\n    \"subdomain\": \"www\",\n    \"second_level_domain\": \"example\",\n    \"top_level_domain\": \"com\",\n    \"port\": \"443\",\n    \"path\": \"blog/post\",\n    \"query\": \"q=test\",\n    \"fragment\": \"section1\"\n    }\n}\n```\n\n```csv\nurl,scheme,subdomain,second_level_domain,top_level_domain,port,path,query,fragment\nhttps://www.example.com:443/blog/post?q=test#section1,https,www,example,com,443,blog/post,q=test,section1\n```\n\n## 🚀 Installation\n\n### From PyPI\n\n```bash\npip install pyrolysate\n```\n\n### For Development\n\n1. **Clone the repository**\n\n```bash\ngit clone https://github.com/dawnandrew100/pyrolysate.git\ncd pyrolysate\n```\n\n2. **Create and activate a virtual environment**\n\n```bash\n# Using hatch (recommended)\nhatch env create\n\n# Or using venv\npython -m venv .venv\n# Windows\n.venv\\Scripts\\activate\n# Unix/MacOS\nsource .venv/bin/activate\n```\n\n3. **Install in development mode**\n\n```bash\n# Using hatch\nhatch run dev\n\n# Or using pip\npip install -e .\n```\n\n### Verify Installation\n\n```bash\n# Using hatch (recommended)\nhatch run pyro -u example.com\n\n# Or using the CLI directly\npyro -u example.com\n```\n\nThe CLI command `pyro` will be available after installation. If the command isn't found, ensure Python's Scripts directory is in your PATH.\n\n## Usage\n\n### Input File Parsing\n\n```python\nfrom pyrolysate import file_to_list\n```\n\n#### Parse file with default newline delimiter\n\n```python\nurls = file_to_list(\"urls.txt\")\n```\n\n#### Parse file with custom delimiter\n\n```python\nemails = file_to_list(\"emails.csv\", delimiter=\",\")\n```\n\n### Supported Outputs\n\n- JSON (prettified or minified)\n- CSV\n- Text (default)\n- File output with custom naming\n- Console output\n\n### Email Parsing\n\n```python\nfrom pyrolysate import email\n```\n\n#### Parse single email\n\n```python\nresult = email.parse_email(\"user@example.com\")\n```\n\n#### Parse plus addressed email\n\n```python\nresult = email.parse_email(\"user+tag@example.com\")\n```\n\n#### Parse multiple emails\n\n```python\nemails = [\"user1@example.com\", \"user2@agency.gov.uk\"]\nresult = email.parse_email_array(emails)\n```\n\n#### Convert to JSON\n\n```python\njson_output = email.to_json(\"user@example.com\")\njson_output = email.to_json([\"user1@example.com\", \"user2@example.com\"])\n```\n\n#### Save to JSON file\n\n```python\nemail.to_json_file(\"output\", \"user@example.com\")\nemail.to_json_file(\"output\", [\"user1@example.com\", \"user2@test.org\"])\n```\n\n#### Convert to CSV\n\n```python\ncsv_output = email.to_csv(\"user@example.com\")\ncsv_output = email.to_csv([\"user1@example.com\", \"user2@test.org\"])\n\n```\n\n#### Save to CSV file\n\n```python\nemail.to_csv_file(\"output\", \"user@example.com\")\nemail.to_csv_file(\"output\", [\"user1@example.com\", \"user2@test.org\"])\n```\n\n### URL Parsing\n\n```python\nfrom pyrolysate import url\n```\n\n#### Parse single URL\n\n```python\nresult = url.parse_url(\"https://www.example.com/path?q=test#fragment\")\n```\n\n#### Parse multiple URLs\n\n```python\nurls = [\"example.com\", \"https://www.test.org\"]\nresult = url.parse_url_array(urls)\n```\n\n#### Convert to JSON\n\n```python\njson_output = url.to_json(\"example.com\")\njson_output = url.to_json([\"example.com\", \"test.org\"])\n```\n\n#### Save to JSON file\n\n```python\nurl.to_json_file(\"output\", \"example.com\")\nurl.to_json_file(\"output\", [\"example.com\", \"test.org\"])\n```\n\n#### Convert to CSV\n\n```python\ncsv_output = url.to_csv(\"example.com\")\ncsv_output = url.to_csv([\"example.com\", \"test.org\"])\n\n```\n\n#### Save to CSV file\n\n```python\nurl.to_csv_file(\"output\", \"example.com\")\nurl.to_csv_file(\"output\", [\"example.com\", \"test.org\"])\n```\n\n### Command Line Interface\n\n#### CLI help\n\n```bash\npyro -h\n```\n\n#### Parse single URL\n\n```bash\npyro -u example.com\n```\n\n#### Parse multiple URLs\n\n```bash\npyro -u example1.com example2.com\n```\n\n#### Parse URLs from file (one per line by default)\n\n```bash\npyro -u -i urls.txt\n```\n\n#### Parse URLs from CSV file with comma delimiter\n\n```bash\npyro -u -i urls.csv -d \",\"\n```\n\n#### Parse email with plus addressing\n\n```bash\npyro -e user+newsletter@example.com\n```\n\n#### Parse multiple emails and save as JSON\n\n```bash\npyro -e user1@example.com user2@example.com -j -o output\n```\n\n#### Parse URLs from file and save as CSV\n\n```bash\npyro -u -i urls.txt -c -o parsed_urls\n```\n\n#### Parse emails from file with comma delimiter\n\n```bash\npyro -e -i emails.txt -d \",\" -o output\n```\n\n#### Parse emails with non-prettified JSON output\n\n```bash\npyro -e user@example.com -j -np\n```\n\n#### Parse different file types\n\n```bash\n# Parse log file\npyro -u -i server.log\n\n# Parse compressed log file\npyro -u -i server.log.gz\n\n# Parse BZIP2 compressed file\npyro -e -i emails.txt.bz2\n\n# Parse ZIP archive containing logs and text files\npyro -u -i archive.zip\n```\n\n## Supported Formats\n\n### Email Formats\n\n- Standard: `example@mail.com`\n- Plus Addresses: `example+tag@mail.com`\n- Government: `example@agency.gov.uk`\n\n### URL Formats\n\n- Basic: `example.com`\n- With subdomain: `www.example.com`\n- With scheme: `https://example.org`\n- With path: `example.com/path/to/file.txt`\n- With port: `example.com:8080`\n- With query: `example.com/search?q=test`\n- With fragment: `example.com#section1`\n- IP addresses: `192.168.1.1:8080`\n- Government domains: `agency.gov.uk`\n- Full complex URLs: `https://www.example.gov.uk:8080/path?q=test#section1`\n\n### Input File Support\n\n- Plain text files (.txt)\n- Plain text log files (.log)\n- Comma-separated values (.csv)\n- ZIP archives containing text files (.zip)\n- GZIP compressed files (.gz)\n- BZIP2 compressed files (.bz2)\n- LZMA compressed files (.xz, .lzma)\n\n#### ZIP Archive Support\n\n- Processes all text files within the archive (.txt, .csv, .log)\n- Handles nested directories\n- Continues processing if some files are corrupted\n- UTF-8 encoding expected for text files\n\n### Outputs\n\n- Text file (default)\n- JSON file (prettified or minified)\n- CSV file\n- Console output\n\n\u003e [!IMPORTANT]\n\u003e This library handles email address comments by removing them\n\u003e from the final output\n\n\u003e [!CAUTION]\n\u003e - This library does not specially handle emails containing double quotes.\n\u003e Double quotes are valid in the local part of an email, but many modern\n\u003e email systems either block or mark emails with quotes as spam.\n\u003e - Make sure that `requests` is installed before running `get_tlds_from_iana`.\n\n\u003e [!WARNING]\n\u003e This library is designed and tested to handle http and https urls. Other forms of url may return undefined results.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flignum-vitae%2Fpyrolysate","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flignum-vitae%2Fpyrolysate","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flignum-vitae%2Fpyrolysate/lists"}