{"id":23752460,"url":"https://github.com/livelace/girie","last_synced_at":"2026-03-07T12:03:21.991Z","repository":{"id":64302499,"uuid":"330120275","full_name":"livelace/girie","owner":"livelace","description":"girie (\"go\" + \"kirie\") is a tool for data/metadata extraction from web pages.","archived":false,"fork":false,"pushed_at":"2024-07-13T06:07:47.000Z","size":58,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-03-04T18:25:54.641Z","etag":null,"topics":["api","dataset","etl","graphql","jsonld","microdata","microservice","nlp","opengraph","rdfa","scrapping"],"latest_commit_sha":null,"homepage":"","language":"Go","has_issues":false,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/livelace.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2021-01-16T08:43:25.000Z","updated_at":"2024-07-13T06:07:50.000Z","dependencies_parsed_at":"2023-01-15T09:45:23.913Z","dependency_job_id":null,"html_url":"https://github.com/livelace/girie","commit_stats":null,"previous_names":[],"tags_count":6,"template":false,"template_full_name":null,"purl":"pkg:github/livelace/girie","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/livelace%2Fgirie","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/livelace%2Fgirie/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/livelace%2Fgirie/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/livelace%2Fgirie/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/livelace","download_url":"https://codeload.github.com
/livelace/girie/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/livelace%2Fgirie/sbom","scorecard":{"id":594254,"data":{"date":"2025-08-11","repo":{"name":"github.com/livelace/girie","commit":"fe34b84fffa6ac6eb9ffbc833f5e988cb813b1e0"},"scorecard":{"version":"v5.2.1-40-gf6ed084d","commit":"f6ed084d17c9236477efd66e5b258b9d4cc7b389"},"score":2.5,"checks":[{"name":"Token-Permissions","score":-1,"reason":"No tokens found","details":null,"documentation":{"short":"Determines if the project's workflows follow the principle of least privilege.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#token-permissions"}},{"name":"Packaging","score":-1,"reason":"packaging workflow not detected","details":["Warn: no GitHub/GitLab publishing workflow detected."],"documentation":{"short":"Determines if the project is published as a package that others can easily download, install, easily update, and uninstall.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#packaging"}},{"name":"SAST","score":0,"reason":"no SAST tool detected","details":["Warn: no pull requests merged into dev branch"],"documentation":{"short":"Determines if the project uses static code analysis.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#sast"}},{"name":"Dangerous-Workflow","score":-1,"reason":"no workflows found","details":null,"documentation":{"short":"Determines if the project's GitHub Action workflows avoid dangerous patterns.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#dangerous-workflow"}},{"name":"Maintained","score":0,"reason":"0 commit(s) and 0 issue activity found in the last 90 days -- score normalized to 0","details":null,"documentation":{"short":"Determines if the project is \"actively 
maintained\".","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#maintained"}},{"name":"Binary-Artifacts","score":10,"reason":"no binaries found in the repo","details":null,"documentation":{"short":"Determines if the project has generated executable (binary) artifacts in the source repository.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#binary-artifacts"}},{"name":"Code-Review","score":0,"reason":"Found 0/30 approved changesets -- score normalized to 0","details":null,"documentation":{"short":"Determines if the project requires human code review before pull requests (aka merge requests) are merged.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#code-review"}},{"name":"Pinned-Dependencies","score":-1,"reason":"no dependencies found","details":null,"documentation":{"short":"Determines if the project has declared and pinned the dependencies of its build process.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#pinned-dependencies"}},{"name":"CII-Best-Practices","score":0,"reason":"no effort to earn an OpenSSF best practices badge detected","details":null,"documentation":{"short":"Determines if the project has an OpenSSF (formerly CII) Best Practices Badge.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#cii-best-practices"}},{"name":"Security-Policy","score":0,"reason":"security policy file not detected","details":["Warn: no security policy file detected","Warn: no security file to analyze","Warn: no security file to analyze","Warn: no security file to analyze"],"documentation":{"short":"Determines if the project has published a security 
policy.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#security-policy"}},{"name":"Fuzzing","score":0,"reason":"project is not fuzzed","details":["Warn: no fuzzer integrations found"],"documentation":{"short":"Determines if the project uses fuzzing.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#fuzzing"}},{"name":"License","score":10,"reason":"license file detected","details":["Info: project has a license file: LICENSE:0","Info: FSF or OSI recognized license: MIT License: LICENSE:0"],"documentation":{"short":"Determines if the project has defined a license.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#license"}},{"name":"Signed-Releases","score":0,"reason":"Project has not signed or included provenance with any releases.","details":["Warn: release artifact v1.5.1 not signed: https://api.github.com/repos/livelace/girie/releases/65063896","Warn: release artifact v1.5.1 does not have provenance: https://api.github.com/repos/livelace/girie/releases/65063896"],"documentation":{"short":"Determines if the project cryptographically signs release artifacts.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#signed-releases"}},{"name":"Branch-Protection","score":0,"reason":"branch protection not enabled on development/release branches","details":["Warn: branch protection not enabled for branch 'master'"],"documentation":{"short":"Determines if the default and release branches are protected with GitHub's branch protection settings.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#branch-protection"}},{"name":"Vulnerabilities","score":8,"reason":"2 existing vulnerabilities detected","details":["Warn: Project is vulnerable to: GO-2023-1737 / GHSA-2c4m-59x9-fr2g","Warn: Project is vulnerable to: 
GO-2022-0588 / GHSA-x95h-979x-cf3j"],"documentation":{"short":"Determines if the project has open, known unfixed vulnerabilities.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#vulnerabilities"}}]},"last_synced_at":"2025-08-20T22:45:23.975Z","repository_id":64302499,"created_at":"2025-08-20T22:45:23.975Z","updated_at":"2025-08-20T22:45:23.975Z"},"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":30212491,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-03-07T09:02:10.694Z","status":"ssl_error","status_checked_at":"2026-03-07T09:02:08.429Z","response_time":53,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["api","dataset","etl","graphql","jsonld","microdata","microservice","nlp","opengraph","rdfa","scrapping"],"created_at":"2024-12-31T17:52:09.471Z","updated_at":"2026-03-07T12:03:21.959Z","avatar_url":"https://github.com/livelace.png","language":"Go","funding_links":[],"categories":[],"sub_categories":[],"readme":"# girie\n\n***girie*** (\"go\" + \"kirie\") is a tool for data/metadata extraction from web pages.\n\n### Main goals:\n\n* Provide a microservice with a [GraphQL](https://en.wikipedia.org/wiki/GraphQL) API for ETL pipelines.  
\n* Provide a plugin endpoint for another tool, [gosquito](https://github.com/livelace/gosquito).\n\n### Features:\n\n* Extract the primary article ([boilerpipe](https://github.com/kohlschutter/boilerpipe), [go-domdistiller](https://github.com/markusmobius/go-domdistiller)) from a web page (HTML and text).\n* Extract [JSON-LD](https://en.wikipedia.org/wiki/JSON-LD).\n* Extract [Microdata](https://en.wikipedia.org/wiki/Microdata_(HTML)).\n* Extract [Open Graph](https://en.wikipedia.org/wiki/Facebook_Platform#Open_Graph_protocol).\n* Extract [RDFa](https://en.wikipedia.org/wiki/RDFa).\n* Extract images from an entire page or from a page's article.\n\n### Quick start:\n\n```shell script\n# Start the daemon:\nuser@localhost ~ $ docker run --name girie -ti --rm ghcr.io/livelace/girie:v1.5.0\nINFO[16.01.2021 11:38:59.101] girie v1.5.0      \nWARN[16.01.2021 11:38:59.102] config error       error=\"Config File \\\"config.toml\\\" Not Found in \\\"[/etc/girie]\\\"\"\nINFO[16.01.2021 11:38:59.102] listen :8080 \n\n# Get the container's IP address (used as the API host):\nSERVER=`docker inspect -f \"{{range.NetworkSettings.Networks}}{{.IPAddress}}{{end}}\" girie`\n\n\n# GET + URL:\nuser@localhost ~ $ docker exec girie curl -s -L -g --request GET \\\n'http://127.0.0.1:8080/api/?query={data(url:\"https://iz.ru/1091344/2020-11-24/effektivnost-vaktciny-sputnik-v-prevysila-95\"){article{text_spans{lang,text,tokens_amount}}}}' | jq  \n\n\n# POST + URL:\nQUERY=`cat \u003c\u003c EOF\n{\n    \"query\": \"{\n        data(url: \\\"https://iz.ru/1091344/2020-11-24/effektivnost-vaktciny-sputnik-v-prevysila-95\\\") {\n            page{\n                images{alt,height,src,width}\n            }\n        }\n    }\"\n}\nEOF\n`\n\nQUERY=`echo $QUERY | tr -d \" \\n\"`\n\ncurl -s -L -X POST \"http://${SERVER}:8080/api/?retry=3\u0026timeout=3\" \\\n--header \"Content-Type: application/json\" \\\n--data-raw \"${QUERY}\" | jq  \n\n\n# POST + HTML:\nBASE64=`curl -s \"https://iz.ru/1091344/2020-11-24/effektivnost-vaktciny-sputnik-v-prevysila-95\" 
| base64 -w0`\n\nQUERY=`cat \u003c\u003c EOF\n{\n    \"query\": \"{\n        data(html: \\\"${BASE64}\\\") {\n            article{\n                html,\n                images{alt,height,src,width},\n                text,\n                text_spans{lang,text,tokens_amount},\n                text_spans_append{lang,text,tokens_amount},\n                text_spans_block{lang,text,tokens_amount},\n            },\n            html,\n            url,\n            page{\n                html,\n                jsonld,\n                images{alt,height,src,width},\n                lang,\n                microdata,\n                opengraph,\n                rdfa,\n                text,\n                title\n            }\n        }\n    }\"\n}\nEOF\n`\n\necho $QUERY | tr -d \" \\n\" \u003e \"/tmp/query.json\"\n\ncurl -s -L -X POST \"http://${SERVER}:8080/api/?retry=3\u0026timeout=3\" \\\n--header \"Content-Type: application/json\" \\\n--data \"@/tmp/query.json\" | jq  \n```\n\n\n### Config example:\n\n```toml\n[default]\n\n# Option priority order (top -\u003e down):\n# 1. Configuration file.\n# 2. Environment variables.\n# 3. Query options.\n\n# env: GIRIE_LISTEN=\":8080\"\n# listen = \":8080\"\n\n# env: GIRIE_PROXY=\"http://127.0.0.1:3128\"\n# url: http://127.0.0.1:8080/api/?proxy=\"http://127.0.0.1:3128\"\n# proxy = \"http://127.0.0.1:3128\"\n\n# env: GIRIE_RETRY=2\n# url: http://127.0.0.1:8080/api/?retry=2\n# retry = 2\n\n# env: GIRIE_TIMEOUT=2\n# url: http://127.0.0.1:8080/api/?timeout=2\n# timeout = 10\n\n# env: GIRIE_USER_AGENT=\"girie v1.5.0\"\n# url: http://127.0.0.1:8080/api/?user_agent=\"curl 3000\"\n# user_agent = \"girie v1.5.0\"\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flivelace%2Fgirie","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flivelace%2Fgirie","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flivelace%2Fgirie/lists"}