{"id":17609892,"url":"https://github.com/emmiegit/wikidot-css-extractor","last_synced_at":"2026-03-04T05:31:18.721Z","repository":{"id":65712543,"uuid":"403172415","full_name":"emmiegit/wikidot-css-extractor","owner":"emmiegit","description":"Extracts styling and includes from SCP Wiki pages for Technical Team use.","archived":false,"fork":false,"pushed_at":"2025-12-31T23:34:10.000Z","size":911180,"stargazers_count":6,"open_issues_count":0,"forks_count":2,"subscribers_count":1,"default_branch":"main","last_synced_at":"2026-01-05T14:17:41.540Z","etag":null,"topics":["crom","scp-wiki","wikidot"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/emmiegit.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2021-09-04T23:07:55.000Z","updated_at":"2025-12-31T22:44:25.000Z","dependencies_parsed_at":"2023-02-19T04:31:07.997Z","dependency_job_id":"84ae67c3-7976-47be-87f9-8be6547fdbee","html_url":"https://github.com/emmiegit/wikidot-css-extractor","commit_stats":null,"previous_names":[],"tags_count":22,"template":false,"template_full_name":null,"purl":"pkg:github/emmiegit/wikidot-css-extractor","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/emmiegit%2Fwikidot-css-extractor","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/emmiegit%2Fwikidot-css-extractor/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/emm
iegit%2Fwikidot-css-extractor/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/emmiegit%2Fwikidot-css-extractor/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/emmiegit","download_url":"https://codeload.github.com/emmiegit/wikidot-css-extractor/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/emmiegit%2Fwikidot-css-extractor/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":30072491,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-03-04T05:13:31.218Z","status":"ssl_error","status_checked_at":"2026-03-04T05:10:24.293Z","response_time":59,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["crom","scp-wiki","wikidot"],"created_at":"2024-10-22T17:10:33.256Z","updated_at":"2026-03-04T05:31:18.688Z","avatar_url":"https://github.com/emmiegit.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# wikidot-css-extractor\n\nAn ad hoc system to pull style and component info from pages on the SCP Wiki. It uses the [Crom API](https://api.crom.avn.sh/) as its data source, so it can easily be adapted for any wiki that is backed up by Crom.\n\nIt looks for inline styling, CSS modules, included pages, and CSS classes. 
In the absence of proper Wikidot tools for understanding what styling is used on a site, this can help fill that gap.\n\nRequires Python 3.8+.\n\n**You can see the collected data here: https://emmiegit.github.io/wikidot-css-extractor/**\n\n### Execution\n\n#### Setup\n\nFirst, you need to install all the Python dependencies:\n\n```\n$ pip install -r requirements.txt\n```\n\nThen you need to edit `config.toml` to have the settings appropriate for your site.\nSee `config-en.toml` for an example with EN, or `config-all.toml` for one that pulls all sites.\nWhichever file you use must be copied or symlinked to `config.toml` to work.\nUsually this just means editing `sites` to list the Wikidot names for your site (e.g. `fondationscp` for FR).\n\n#### Fetch\n\nFor any of the other tools to work, you need a downloaded local copy of all the page sources.\nYou can pull this using `fetch.py`. This can take several minutes, depending on the size of your site.\n\n```\n$ ./fetch.py\n```\n\nThere will now be a JSON file in `output/` with the filename specified in `config.toml` (default `output/results.json`).\n\n#### Search\n\nIf you are interested in searching through the gathered JSON data, you can use `grep.py`. 
(See also: [grep](https://en.wikipedia.org/wiki/Grep))  \nHere is its usage information:\n\n```\nusage: grep.py [-h] [-i] [-v] [--compact] [--color {always,never,auto}] pattern [path]\n\ngrep for wikidot sites\n\npositional arguments:\n  pattern               The regular expression to search for\n  path                  The file containing page sources to look through\n\noptional arguments:\n  -h, --help            show this help message and exit\n  -i, --ignore-case     Whether to ignore case when searching\n  -v, --invert-match    Invert the sense of matching, selecting all lines which don't match\n  --compact             Whether to display the results in compact / line mode\n  --color {always,never,auto}, --colour {always,never,auto}\n                        Whether to use colors to highlight results\n```\n\nFor example:\n\n```\n$ ./grep.py -i 'module redirect'\n```\n\nThis finds all instances of \"module redirect\" across all pages, case-insensitively.\n\n#### HTML Report\n\nTo generate the publicly visible HTML report, run the builder:\n\n```\n$ ./build.py\n```\n\nThe generated HTML files are in `output/`.\n\n#### Publishing to GitHub Pages\n\nIf this repository is a fork that you can push to, you can publish a [GitHub Pages](https://pages.github.com/) site using:\n\n```\n$ ./publish.sh\n```\n\nThis may take some time due to the size of the files. The large JSON blob (`output/results.json`) is _not_ uploaded.\n\n### Composition\n\nThis repository has a few scripts:\n\n* `fetch.py` retrieves all page sources via the Crom API, extracting styles and other information.\n* `build.py` builds a static HTML page which contains the scraped information in a readable way. Presently this information is hosted on this repository's GitHub Pages site.\n* `publish.sh` takes the data created by `fetch.py` and `build.py` and pushes them to the `gh-pages` branch. 
You can do this manually, if you prefer.\n* `grep.py` permits searching over all pages, as if using `grep` over a Wikidot site.\n\nPreviously it made use of these scripts:\n\n* `scraper.js` runs through pages and looks for any CSS. Any styles, as well as the entire page sources, are written to `extracted-styles.json`.\n* `merge.js` is able to merge different JSON files into one. Because the scraper can resume incomplete jobs (anything remaining in `extracted-styles.json`), this can be used to combine incomplete results.\n\nThis was prior to the switch to using the Crom API to retrieve Wikidot page sources instead of relying on scraping.\n\n### Licensing\n\nThis code is available under the terms of the MIT License.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Femmiegit%2Fwikidot-css-extractor","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Femmiegit%2Fwikidot-css-extractor","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Femmiegit%2Fwikidot-css-extractor/lists"}