{"id":22619584,"url":"https://github.com/davemolk/gogetjs","last_synced_at":"2025-04-11T15:21:03.923Z","repository":{"id":41195689,"uuid":"494649757","full_name":"davemolk/goGetJS","owner":"davemolk","description":"a tool for extracting, searching, and saving JavaScript files (with optional headless browser)","archived":false,"fork":false,"pushed_at":"2022-09-15T11:44:11.000Z","size":990,"stargazers_count":42,"open_issues_count":0,"forks_count":8,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-03-25T11:22:07.546Z","etag":null,"topics":["extract","go","golang","goquery","hacking","javascript","osint","parser","pentesters","playwright","recon","scraping"],"latest_commit_sha":null,"homepage":"","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/davemolk.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2022-05-21T01:27:00.000Z","updated_at":"2025-02-15T14:49:43.000Z","dependencies_parsed_at":"2022-07-14T10:21:49.598Z","dependency_job_id":null,"html_url":"https://github.com/davemolk/goGetJS","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/davemolk%2FgoGetJS","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/davemolk%2FgoGetJS/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/davemolk%2FgoGetJS/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/davemolk%2FgoGetJS/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/davemolk","download_url":"https://codeload.github.com/davemolk/goGetJS/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248429129,"owners_count":21101786,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["extract","go","golang","goquery","hacking","javascript","osint","parser","pentesters","playwright","recon","scraping"],"created_at":"2024-12-08T22:06:30.711Z","updated_at":"2025-04-11T15:21:03.899Z","avatar_url":"https://github.com/davemolk.png","language":"Go","funding_links":[],"categories":[],"sub_categories":[],"readme":"# goGetJS\n[![License](https://img.shields.io/badge/License-MIT-blue.svg)](http://opensource.org/licenses/MIT)\n[![Go Report Card](https://goreportcard.com/badge/github.com/davemolk/goGetJS)](https://goreportcard.com/report/github.com/davemolk/goGetJS)\n[![contributions welcome](https://img.shields.io/badge/contributions-welcome-brightgreen.svg?style=flat)](https://github.com/davemolk/goGetJS/issues)\n\ngoGetJS extracts, searches, and saves JavaScript files. Includes an optional chromium headless browser (playwright) for dealing with JavaScript-heavy sites.\n\n![demo](demo.gif)\n\n## Overview\n* goGetJS scrapes a given page for script tags, visits each src, and writes the contents to an individual file.\n* If a script tag doesn't include an src attribute, goGetJS scrapes everything between the script tags and writes the contents to an individual file.\n* All src are also saved to a text file.\n* goGetJS (optionally) uses playwright to handle JavaScript-heavy sites and retrieve async scripts. Use -b.\n* Add some extra waiting time with -ew to allow the network to settle and grab those longer loading async scripts.\n* Use -term, -regex, and -terms, respectively, to scan each script for a specific word, with a regular expression, or with a list of words (input as a file).\n* goGetJS does not follow redirects by default, but this can be toggled with -redirect=true.\n\n## Example Usages (use browser and search each script for a list of terms in search.txt)\n```\ngo run ./cmd/goGetJS -u https://go.dev -b -terms search.txt\n```\n```\necho https://go.dev | goGetJS -b -terms search.txt\n```\n\n## Command-line Options\n```\nUsage of goGetJS:\n  -b bool\n    \tUse chromium headless browser (powered by playwright). Default is false.\n  -bt int\n    \tTimeout for headless browser. Default is 10000 ms. Must also activate browser via -b.\n  -ew int\n    \tPlaywright considers a page loaded after the network has been idle for at least 500ms. Use this flag (in ms) to add time. \n  -proxy string\n    \tProxy to use on requests.\n  -redirect bool\n    \tAllow redirects. Default is false.\n  -regex string\n    \tParse each script for the supplied regular expression. Any matches will be saved and exported as a json file.\n  -rt int\n    \tTimeout for retries. Default is 1000ms.\n  -t int\n    \tRequest timeout (in milliseconds). Default is 5000.\n  -term\tstring\n        Parse each script for the supplied word. Any matches will be saved and exported as a json file.\n  -terms string\n    \tName of .txt file containing a list of search terms (one per line). Any matches will be saved and exported as a json file. \n  -u string\n    \tURL to extract JS files from.\n```\n\n## Installation\nFirst, you'll need to [install go](https://golang.org/doc/install).\n\nThen run this command to download + compile goGetJS:\n```\ngo install github.com/davemolk/goGetJS/cmd/goGetJS@latest\n```\n\n## Additional Notes\n* goGetJS names JavaScript files with ```fName := regexp.MustCompile(`[\\w-\u0026]+(\\.js)?$`)```. Most scripts play nice, but those that don't are still saved. Each saved script has the full URL prepended to the file.\n* Occasionally, an src will link to an empty page. These are automatically retried (set a timeout for these retries with -rt). Typically, these pages are legitimately blank, causing the number of saved files printed to the terminal to be fewer than the number of processed files. Sometimes we're lucky though, and the successful retry will be searched and saved.\n\n## Changelog\n*    **2022-09-15** : Release 1.0. \n*    **2022-08-26** : Add proxy, redirect, and rt flags. Refactor client creation. Improve error handling throughout. \n*    **2022-08-20** : Move from %v to %w for handling errors with fmt.Errorf. Move everything to milliseconds.\n\n## Support\n* Like goGetJS? Use it, star it, and share with your friends!\n    - Let me know what you're up to so I can feature your work here.\n* Want to see a particular feature? Found a bug? Question about usage or documentation?\n    - Please raise an issue.\n* Pull request?\n    - Please discuss in an issue first. \n\n## Built With\n* https://github.com/PuerkitoBio/goquery\n* https://github.com/playwright-community/playwright-go\n\n## License\n* goGetJS is released under the MIT license. See [LICENSE](LICENSE) for details.","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdavemolk%2Fgogetjs","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdavemolk%2Fgogetjs","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdavemolk%2Fgogetjs/lists"}