{"id":22719072,"url":"https://github.com/lemariva/squirelcrawl","last_synced_at":"2026-03-08T15:35:39.672Z","repository":{"id":50582891,"uuid":"105665143","full_name":"lemariva/SquirelCrawl","owner":"lemariva","description":"This code compress a webpage into an html file. Images are converted to base64 and integrated together with CSS files in the html. Useful for webpages on microcontrollers (or low memory devices), a complete offline copy of a webpage etc.","archived":false,"fork":false,"pushed_at":"2017-10-03T21:01:46.000Z","size":22,"stargazers_count":19,"open_issues_count":0,"forks_count":4,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-04-13T17:58:16.842Z","etag":null,"topics":["compression-algorithm","esp32","micropython","website","wifi-hacking"],"latest_commit_sha":null,"homepage":"https://lemariva.com","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/lemariva.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2017-10-03T14:59:23.000Z","updated_at":"2024-12-21T03:15:45.000Z","dependencies_parsed_at":"2022-09-11T07:01:40.375Z","dependency_job_id":null,"html_url":"https://github.com/lemariva/SquirelCrawl","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/lemariva/SquirelCrawl","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lemariva%2FSquirelCrawl","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lemariva%2FSquirelCrawl/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lemariva%2FSquirelCrawl/releases","manifests_url":"http
s://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lemariva%2FSquirelCrawl/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/lemariva","download_url":"https://codeload.github.com/lemariva/SquirelCrawl/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lemariva%2FSquirelCrawl/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":260029128,"owners_count":22948103,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["compression-algorithm","esp32","micropython","website","wifi-hacking"],"created_at":"2024-12-10T14:11:27.098Z","updated_at":"2025-10-19T20:48:50.475Z","avatar_url":"https://github.com/lemariva.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# SquirelCrawl\n\nThis code compresses a webpage into a single html file. Images are converted to base64 and embedded, together with the CSS files, in the html. This is useful for webpages served from microcontrollers (or other low-memory devices), for a complete offline copy of a webpage, etc.\n\nRequirements\n-------------------\nThe application was tested using [Python 2.7](https://www.python.org/download/releases/2.7/). 
The following libraries are required:\n\n* [beautifulsoup4](https://pypi.python.org/pypi/beautifulsoup4)\n* [requests](http://docs.python-requests.org/en/master/)\n* [cssutils](https://pypi.python.org/pypi/cssutils/)\n* [tinycss](https://pypi.python.org/pypi/tinycss)\n* [htmlmin](https://pypi.python.org/pypi/htmlmin/)\n* [jsmin](https://pypi.python.org/pypi/jsmin)\n* [mincss2](https://pypi.python.org/pypi/mincss) (modified mincss library included)\n* [pillow](https://pypi.python.org/pypi/Pillow)\n\nThese can be installed using [pip](https://packaging.python.org/tutorials/installing-packages/) as:\n```\npip install \u003clibrary\u003e\n```\nA tutorial for installing `pip` on Windows can be found [here](https://github.com/BurntSushi/nfldb/wiki/Python-\u0026-pip-Windows-installation). `pip` can be downloaded from [get-pip](https://pip.pypa.io/en/stable/installing/).\n\nUse\n--------------------\n```\npython squirelcrawl --url \u003chttp(s)://...\u003e --path \u003cfolder\u003e\n```\nThe optional arguments are the following:\n* `-iq (def.: 5)`: images are compressed before being converted to base64; this option sets the image quality used for the compression. The Pillow library performs the compression;\n* `-csd (def.: 0)`: the CSS is crawled for image links (basically `background(-image): url(...)`). By default, only the classes used in the html file are searched. If this option is set, all classes are crawled, which may take a long time;\n* `-ie (def.: 1)`: the converted images are saved as `txt` files in the `base64/` folder;\n* `--mcss (def.: 0)`: unused CSS classes are removed using the `mincss` library. This option substantially reduces the size of the resulting html, but it does not always work well;\n* `--cjs (def.: 0)`: if set, removes all `\u003cscript\u003e...\u003c/script\u003e` sections. 
Be careful: Bootstrap CSS may need some JavaScript to render correctly;\n* `--cmeta (def.: 0)`: if set, removes all `\u003cmeta .../\u003e` sections;\n* `--clink (def.: 0)`: if set, removes all `\u003clink .../\u003e` sections;\n* `--clink (def.: 0)`: if set, removes all `\u003ca ...\u003e...\u003c/a\u003e` sections (including their text);\n* `--calinkref (def.: 1)`: if set, replaces the `href` of every `\u003ca\u003e` section with `javascript:a_links()`, so that a JavaScript function can handle clicks;\n* `--d (def.: 1)`: if set, debugging info is displayed;\n* `--overlay (def.: 0)`: combines the overlay(-body).[css, js, html] files into the html file (**).\n\n(**) `squirelcrawl.py` requires the following files if the `--overlay` option is set to `1`:\n\n* `overlay.html` - contains two overlay divs, for submit and link purposes respectively (included at the end of the body section)\n* `overlay.css` - contains the styles for `overlay.html` (included in the header section)\n* `overlay.js` - contains the JavaScript necessary for an overlay (included in the header section)\n* `overlay-body.js` - adds the form actions that display the overlay (included at the end of the body section)\n\nThe files `overlay.html` and `overlay.css` can be modified by the user to add content to the webpage.\n\nThe `path` folder is created and all related files are saved under it. Images and CSS files are saved in almost the same file structure as the website. If the `-ie` option is set, a `base64` folder is created and the base64-converted images are saved there as `txt` files. Two files are generated (if `--mcss` is set):\n* `index.html`: the compressed webpage, including the CSS and the images as base64 strings.\n* `index-compressed.html`: same as `index.html`, but the `\u003cstyle\u003e` sections (which include the CSS files) are compressed using the `mincss` library. The resulting file is very small, but it does not always render correctly (style problems). 
\n\nDisclaimers\n------------\nThe author of the code assumes no responsibility for users' decision-making and their code usage. \n\nLicense\n--------------\nApache 2.0\n\nChangelog\n-----------\nRevision 0.1: Initial submission.","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flemariva%2Fsquirelcrawl","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flemariva%2Fsquirelcrawl","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flemariva%2Fsquirelcrawl/lists"}