{"id":50615298,"url":"https://github.com/howwohmm/fetchgram","last_synced_at":"2026-06-06T07:30:59.062Z","repository":{"id":361836744,"uuid":"1256055072","full_name":"howwohmm/fetchgram","owner":"howwohmm","description":"era-adjusted Instagram content intelligence — scrape any public profile, OCR every image, measure what actually works. free, local, no API keys.","archived":false,"fork":false,"pushed_at":"2026-06-01T12:29:11.000Z","size":33,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-06-01T14:15:50.772Z","etag":null,"topics":["analytics","cli","content-strategy","data","instagram","ocr","python","scraper"],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/howwohmm.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-06-01T12:22:08.000Z","updated_at":"2026-06-01T12:29:50.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/howwohmm/fetchgram","commit_stats":null,"previous_names":["howwohmm/fetchgram"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/howwohmm/fetchgram","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/howwohmm%2Ffetchgram","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/howwohmm%2Ffetchgram/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/howwohmm%2Ffetchgram/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/howwohmm%2Ffetchgram/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/howwohmm","download_url":"https://codeload.github.com/howwohmm/fetchgram/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/howwohmm%2Ffetchgram/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33973868,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-06T02:00:07.033Z","response_time":107,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["analytics","cli","content-strategy","data","instagram","ocr","python","scraper"],"created_at":"2026-06-06T07:30:58.435Z","updated_at":"2026-06-06T07:30:59.051Z","avatar_url":"https://github.com/howwohmm.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# fetchgram\n\n**see what _actually_ works on any instagram account.**\n\n`fetchgram` pulls any public profile, reads the text off every image, and scores each post against its own era — so a growing account never gets mistaken for good content. free, local, no api keys.\n\n`free` · `runs locally` · `no api keys` · `mac · linux · windows` · `MIT`\n\n```\n$ pipx install git+https://github.com/howwohmm/fetchgram\n$ fetchgram analyze nike\n```\n\n---\n\n## the 11.5× that wasn't\n\nyou've seen the thread: _\"carousels get 11× more likes — just post more carousels.\"_\n\ni ran that exact claim on a real wellness brand (904 posts). here's what the raw numbers said, vs what was actually true once you divide out the account's growth that year:\n\n```\ncontent type     raw lift          era-adjusted        the truth\n─────────────────────────────────────────────────────────────────\ncarousel         11.5×        →    1.01×               dead average\nsingle image      0.09×       →    1.00×               dead average\nreel              1.02×       →    0.99×               dead average\n\nconfound  (post age ↔ likes):   -0.864   →   -0.017     (removed)\n```\n\nthe famous \"carousel lift\" was **~90% just the account tripling its following** that year. measured against its own contemporaries, the format did *nothing*.\n\n**raw engagement is a clock, not a verdict.** almost every \"what works on instagram\" take is survivorship bias wearing a lab coat. fetchgram is the tool that controls for it — and that table above is real, unedited output.\n\n---\n\n## the fix (it's embarrassingly simple)\n\ncompare each post to what that same account posted **±45 days around it** — not its all-time average. growth and recency cancel out, and you start measuring the content instead of the calendar.\n\nthat's it. that's the whole trick. most tools skip it.\n\n## what it does\n\n- **reads the words inside the images.** most scrapers grab the caption and stop. fetchgram OCRs every frame — quote graphics, carousel slides, on-image text all become searchable data. Apple Vision on mac, tesseract everywhere else.\n- **scores each post against its own era.** the part most tools skip (see above). you see content effects, not calendar effects.\n- **free, local, yours.** no api keys, no cloud, no account. runs on your machine; the data never leaves it.\n- **clean data, ready for anything.** out comes a text corpus, a training-ready `jsonl`, a `metrics.json`, and a plain-english `SIGNAL.md`. drop any of it into an LLM and ask your own questions.\n\n## 60 seconds to your first teardown\n\n```\npipx install git+https://github.com/howwohmm/fetchgram\ninstaloader -l YOUR_IG_USERNAME          # log in once — Instagram blocks anonymous access\nfetchgram analyze nike --login YOUR_IG_USERNAME\n```\n\none command runs the whole pipeline: scrape → OCR → corpus → era-adjusted report. open `fetchgram-data/nike/signal/SIGNAL.md` and read.\n\n\u003e **now run it on the account whose advice you've been copying.** if their \"secret\" survives era-adjustment — great, copy away. if it evaporates like the 11.5× did, you just saved yourself a quarter of wasted posting. either way, tell me what you find.\n\n## use it as a Claude skill\n\nprefer talking to it? drop [`skills/fetchgram/SKILL.md`](skills/fetchgram/SKILL.md) into `~/.claude/skills/fetchgram/` and just say:\n\n\u003e **analyze @nike**\n\nClaude installs the CLI if needed, runs the pipeline, and hands you the era-adjusted teardown (and a clean report). the CLI is the engine; the skill is the conversational layer.\n\n## how it works\n\n1. **scrape** — `fetchgram analyze \u003chandle\u003e` pulls the profile, images only, rate-limit friendly.\n2. **read** — OCRs every image, groups carousel slides, builds the corpus.\n3. **signal** — era-adjusts the engagement and writes you the report.\n\n## install\n\n```\npipx install git+https://github.com/howwohmm/fetchgram\n```\n\nor with pip:\n\n```\npip install git+https://github.com/howwohmm/fetchgram\n```\n\n\u003e _PyPI release (`pipx install fetchgram`) coming soon._\n\n**OCR dependencies:**\n\n| platform | what to do |\n|---|---|\n| macOS | just works — fetchgram compiles the bundled Apple Vision binary on first run (needs Xcode CLI tools: `xcode-select --install`) |\n| Linux / Windows | `pip install pytesseract` + install `tesseract-ocr` from your package manager (`apt install tesseract-ocr` / `choco install tesseract`) |\n| anywhere | `--ocr none` to skip OCR and use captions only |\n\n## usage\n\n```\n# full pipeline (recommended)\nfetchgram analyze \u003chandle\u003e [--login U] [--count N] [--out DIR] [--ocr auto|vision|tesseract|none]\n\n# individual steps\nfetchgram scrape  \u003chandle\u003e [--login U] [--count N] [--out DIR] [--force]\nfetchgram ocr     \u003chandle\u003e [--out DIR] [--ocr ...]\nfetchgram metrics \u003chandle\u003e [--out DIR]\n```\n\n```\nfetchgram analyze nike --login you           # log in first — IG 403s anonymous\nfetchgram analyze patagonia --login myuser   # logged in = more posts, less throttling\nfetchgram analyze someaccount --count 200    # last 200 posts only\nfetchgram analyze brand --ocr none           # captions only (fast)\nfetchgram analyze brand --out ~/data/ig      # custom output dir\n```\n\n\u003e **login is required.** Instagram now 403s anonymous graphql requests. log in once with `instaloader -l \u003cyour_ig_username\u003e` (creates a reusable session), then pass `--login \u003cyour_ig_username\u003e`. use your own account, at a sane volume — heavy scraping gets the session throttled.\n\n## output layout\n\n```\nfetchgram-data/\u003chandle\u003e/\n├── raw/        the posts (instaloader format)\n├── text/\n│   ├── corpus.txt        full text corpus\n│   ├── training.jsonl    one record per post (training-ready)\n│   └── posts/            per-post readable .txt files\n└── signal/\n    ├── metrics.json      every number (structured)\n    └── SIGNAL.md         human-readable report\n```\n\n### `training.jsonl` record schema\n\n```json\n{\n  \"date\":       \"2024-03-15T12:30:00+00:00\",\n  \"shortcode\":  \"ABC123\",\n  \"url\":        \"https://www.instagram.com/p/ABC123/\",\n  \"likes\":      4821,\n  \"comments\":   63,\n  \"num_slides\": 5,\n  \"images\":     [\"2024-03-15_12-30-00_UTC_1.jpg\", \"...\"],\n  \"caption\":    \"the caption text\",\n  \"slides\":     [\"slide 1 OCR text\", \"slide 2 OCR text\", \"...\"],\n  \"text\":       \"merged slides + caption (clean, boilerplate stripped)\"\n}\n```\n\n### what `SIGNAL.md` tells you\n\n- **confound correction** — `spearman(age, likes)` before and after era-normalisation. close to 0 after = the growth confound is gone.\n- **content type** — reel vs single vs carousel: raw lift vs era-adjusted lift vs p90 ceiling.\n- **vocabulary** — Monroe z log-odds, top-tercile vs bottom-tercile era posts. the words that travel with hits (and flops), cleaned of stopwords / calendar / brand noise.\n- **CTA, caption length, weekday, opening word** — all era-adjusted, most groups gated at n≥20 (hook-words at n≥12).\n- **exemplars** — current-scale hits (era ≥ 1.5× *and* likes ≥ median) and the top era-outliers.\n- **raw engagement stats** — gini, p90/p99, virality multiple, hit rate.\n\n## fair use\n\npublic profiles only. rate-limited. for research and personal use. respect Instagram's terms of service — don't be weird with it. fetchgram is a measurement tool, not a growth-hack farm.\n\n## contributing\n\nPRs welcome. keep deps minimal (`instaloader` + `pillow` + `pytesseract` + stdlib only).\n\n## license\n\nMIT — see [LICENSE](LICENSE).\n\n---\n\n_if fetchgram changed how you read a feed, a ⭐ helps the next person find it._\n\n**the obvious numbers lie.**\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhowwohmm%2Ffetchgram","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fhowwohmm%2Ffetchgram","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhowwohmm%2Ffetchgram/lists"}