{"id":30669035,"url":"https://github.com/anthonybench/datapeek","last_synced_at":"2026-03-02T22:46:04.633Z","repository":{"id":142304117,"uuid":"582159753","full_name":"anthonybench/datapeek","owner":"anthonybench","description":"Peek summary of datafile in a succinct, opinionated manner.","archived":false,"fork":false,"pushed_at":"2025-05-23T22:56:17.000Z","size":78,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-08-31T09:33:37.894Z","etag":null,"topics":["cli","data","data-analysis"],"latest_commit_sha":null,"homepage":"https://pypi.org/project/sleepydatapeek/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/anthonybench.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2022-12-25T23:18:56.000Z","updated_at":"2025-06-25T18:52:37.000Z","dependencies_parsed_at":null,"dependency_job_id":"0a60a62d-7c5c-43fe-aeb5-3030a4ba970f","html_url":"https://github.com/anthonybench/datapeek","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/anthonybench/datapeek","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/anthonybench%2Fdatapeek","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/anthonybench%2Fdatapeek/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/anthonybench%2Fdatapeek/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/anthonybench%2Fdatapeek
/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/anthonybench","download_url":"https://codeload.github.com/anthonybench/datapeek/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/anthonybench%2Fdatapeek/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":273060932,"owners_count":25038594,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-31T02:00:09.071Z","response_time":79,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cli","data","data-analysis"],"created_at":"2025-09-01T01:01:14.263Z","updated_at":"2026-03-02T22:46:04.565Z","avatar_url":"https://github.com/anthonybench.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# **sleepydatapeek**\n*A quick way to peek at local datafiles.*\n\n\u003cbr /\u003e\n\n## **Welcome to sleepydatapeek!**\nOne often needs to spit out a configurable preview of a data file. It would also be nice if said tool could detect and read several formats automatically.\\\n**`sleepydatapeek`** has entered the chat!\n\nQuickly summarize data files of type:\n- `csv`\n- `parquet`\n- `json`\n- `pkl`\n- `xlsx`\n\nAnd glance metadata for files:\n- `pdf`\n- `png`\n- `jpg`|`jpeg`\n\n\u003e ℹ️ Note that this tool presumes format by file extension. 
If you leave out extensions, or give csv data a `.json` extension for funsies, then you're being silly.\n\n\u003e ℹ️ Because metadata formats differ across file types, the way metadata is presented varies too.\n\n\u003e ℹ️ For further configuration options, see the [sleepyconfig](#sleepyconfig) section below.\n\n\u003cbr /\u003e\n\n## **Get Started 🚀**\n\n```sh\npip install sleepydatapeek\npip install --upgrade sleepydatapeek\n\npython -m sleepydatapeek --help\npython -m sleepydatapeek data.csv\npython -m sleepydatapeek doc.pdf\n```\n\n\u003cbr /\u003e\n\n## **Usage ⚙**\n\nSet an alias in your shell environment, for example:\n```sh\nalias datapeek='python -m sleepydatapeek'\n```\n\nPresuming you've named said alias `datapeek`, peek at a data file:\n```sh\n$ datapeek data.xlsx\n\n════════════════════ data.xlsx ════════════════════\n      Unnamed: 0    CustomerID  ProductName      Quantity  OrderDate      Price\n--  ------------  ------------  -------------  ----------  -----------  -------\n 0             0           101  Laptop                  2  2023-10-26      1200\n 1             1           102  Mouse                   1  2023-10-26        25\n 2             2           103  Keyboard                1  2023-10-27        50\n 3             3           104  Monitor                 1  2023-10-27       300\n 4             4           105  Headphones              3  2023-10-28        80\n\n═══Summary Stats\n╭──────────────┬─────────────────╮\n│ Index Column │ (no_name):int64 │\n├──────────────┼─────────────────┤\n│ Row Count    │ 30              │\n├──────────────┼─────────────────┤\n│ Column Count │ 6               │\n├──────────────┼─────────────────┤\n│ Memory Usage │ \u003c 0.00 bytes    │\n╰──────────────┴─────────────────╯\n\n═══Schema\n╭─────────────┬────────╮\n│ Unnamed: 0  │ int64  │\n├─────────────┼────────┤\n│ CustomerID  │ int64  │\n├─────────────┼────────┤\n│ ProductName │ object │\n├─────────────┼────────┤\n│ Quantity    │ int64  
│\n├─────────────┼────────┤\n│ OrderDate   │ object │\n├─────────────┼────────┤\n│ Price       │ int64  │\n╰─────────────┴────────╯\n═══════════════════════════════════════════════════\n\n```\n\nOptionally, you can also get group-by counts for distinct values of a given column:\n```sh\n$ datapeek test.xlsx --groupby-count-column=ProductName\n\n# typical output (elided)\n\n═══Groupby Counts\n  (row counts for distinct values of ProductName)\n╭──────────────┬───╮\n│ Laptop       │ 3 │\n├──────────────┼───┤\n│ Mouse        │ 3 │\n├──────────────┼───┤\n│ Keyboard     │ 3 │\n├──────────────┼───┤\n│ Monitor      │ 3 │\n├──────────────┼───┤\n│ Headphones   │ 3 │\n├──────────────┼───┤\n│ USB Drive    │ 3 │\n├──────────────┼───┤\n│ Printer      │ 3 │\n├──────────────┼───┤\n│ Webcam       │ 3 │\n├──────────────┼───┤\n│ Speakers     │ 3 │\n├──────────────┼───┤\n│ External HDD │ 3 │\n╰──────────────┴───╯\n═══════════════════════════════════════════════════\n\n```\n\nYou can check metadata for certain file types too:\n```txt\n$ datapeek resume.pdf\n\n📄test.pdf\n╭──────────────┬─────────────────────────────────╮\n│ CreationDate │ D:20250306111007-06'00'         │\n├──────────────┼─────────────────────────────────┤\n│ Creator      │ Adobe InDesign 20.1 (Macintosh) │\n├──────────────┼─────────────────────────────────┤\n│ ModDate      │ D:20250306111048-06'00'         │\n├──────────────┼─────────────────────────────────┤\n│ Producer     │ Adobe PDF Library 17.0          │\n├──────────────┼─────────────────────────────────┤\n│ Trapped      │ /False                          │\n├──────────────┼─────────────────────────────────┤\n│ Length       │ 48 pages                        │\n╰──────────────┴─────────────────────────────────╯\n\n```\n\n\u003cbr /\u003e\n\n## **SleepyConfig**\nYou can personalize a few aspects of datapeek's behavior via a file strictly named `~/.sleepyconfig/params.yml`. 
Paste the following into said file, and tinker to your liking:\n```yml\ndatapeek_sample_size: 5\ndatapeek_table_style: 'rounded_grid'\ndatapeek_max_terminal_width: 80\n```\n\nAll other *sleepytools* use this file as well. Browse [my PyPI](https://pypi.org/user/sleepyboy/) if you're interested!\n\n\u003cbr /\u003e\n\n## **Technologies 🧰**\n\n  - [Pandas](https://pandas.pydata.org/docs/)\n  - [Tabulate](https://pypi.org/project/tabulate/)\n  - [Typer](https://typer.tiangolo.com/)\n  - [PyArrow](https://arrow.apache.org/docs/python/index.html)\n  - [openpyxl](https://pypi.org/project/openpyxl/)\n  - [PyPDF2](https://pypdf2.readthedocs.io/en/stable/)\n  - [Pillow](https://pypi.org/project/pillow/)\n\n\u003cbr /\u003e\n\n## **Contribute 🤝**\n\nIf you have thoughts on how to make the tool more pragmatic, submit a PR 😊.\n\nTo add support for more data/file types:\n1. append the extension name to `supported_formats` in `sleepydatapeek_toolchain/params.py`\n2. add a detection logic branch to the `main` function in `sleepydatapeek_toolchain/command_logic.py`\n3. update this readme\n\n\u003cbr /\u003e\n\n## **License, Stats, Author 📜**\n\n\u003cimg align=\"right\" alt=\"example image tag\" src=\"https://i.imgur.com/ZHnNGeO.png\" width=\"200\" /\u003e\n\n\u003c!-- badge cluster --\u003e\n![PyPI - License](https://img.shields.io/pypi/l/sleepydatapeek?style=plastic)\n![PyPI - Version](https://img.shields.io/pypi/v/sleepydatapeek)\n![GitHub repo size](https://img.shields.io/github/repo-size/anthonybench/datapeek)\n\u003c!-- / --\u003e\n\nSee [License](LICENSE) for the full license text.\n\nThis package was authored by *Isaac Yep*. 
\\\n👉 [GitHub](https://github.com/anthonybench/datapeek) \\\n👉 [PyPI](https://pypi.org/project/sleepydatapeek/)","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fanthonybench%2Fdatapeek","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fanthonybench%2Fdatapeek","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fanthonybench%2Fdatapeek/lists"}