{"id":37065494,"url":"https://github.com/tac0x2a/lake_weed","last_synced_at":"2026-01-14T07:40:05.040Z","repository":{"id":53781198,"uuid":"215789043","full_name":"tac0x2a/lake_weed","owner":"tac0x2a","description":"Lake Weed is elastic converter for JSON, JSON Lines, and CSV string to use for constructin RDB query. ","archived":false,"fork":false,"pushed_at":"2021-03-15T14:33:51.000Z","size":119,"stargazers_count":4,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-10-30T07:55:59.238Z","etag":null,"topics":["clickhouse","csv","json","json-lines","pypi"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/tac0x2a.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2019-10-17T12:38:57.000Z","updated_at":"2025-01-22T19:05:50.000Z","dependencies_parsed_at":"2022-08-24T01:10:34.668Z","dependency_job_id":null,"html_url":"https://github.com/tac0x2a/lake_weed","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/tac0x2a/lake_weed","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tac0x2a%2Flake_weed","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tac0x2a%2Flake_weed/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tac0x2a%2Flake_weed/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tac0x2a%2Flake_weed/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/tac0x2a","download_url":"https://codeload.github.com/tac0x2a/lake
_weed/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tac0x2a%2Flake_weed/sbom","scorecard":{"id":865195,"data":{"date":"2025-08-18","repo":{"name":"github.com/tac0x2a/lake_weed","commit":"2247849c0adfe944d475bddacbe7dfbbd137a4aa"},"scorecard":{"version":"v5.2.1-41-g40576783","commit":"40576783fda6698350fcbbeaea760ff827433034"},"score":3.4,"checks":[{"name":"Dangerous-Workflow","score":10,"reason":"no dangerous workflow patterns detected","details":null,"documentation":{"short":"Determines if the project's GitHub Action workflows avoid dangerous patterns.","url":"https://github.com/ossf/scorecard/blob/40576783fda6698350fcbbeaea760ff827433034/docs/checks.md#dangerous-workflow"}},{"name":"Maintained","score":0,"reason":"0 commit(s) and 0 issue activity found in the last 90 days -- score normalized to 0","details":null,"documentation":{"short":"Determines if the project is \"actively maintained\".","url":"https://github.com/ossf/scorecard/blob/40576783fda6698350fcbbeaea760ff827433034/docs/checks.md#maintained"}},{"name":"Code-Review","score":0,"reason":"Found 0/12 approved changesets -- score normalized to 0","details":null,"documentation":{"short":"Determines if the project requires human code review before pull requests (aka merge requests) are merged.","url":"https://github.com/ossf/scorecard/blob/40576783fda6698350fcbbeaea760ff827433034/docs/checks.md#code-review"}},{"name":"Packaging","score":-1,"reason":"packaging workflow not detected","details":["Warn: no GitHub/GitLab publishing workflow detected."],"documentation":{"short":"Determines if the project is published as a package that others can easily download, install, easily update, and uninstall.","url":"https://github.com/ossf/scorecard/blob/40576783fda6698350fcbbeaea760ff827433034/docs/checks.md#packaging"}},{"name":"Token-Permissions","score":0,"reason":"detected GitHub workflow tokens with excessive permissions","details":["Warn: no topLevel permission 
defined: .github/workflows/python-app.yml:1","Info: no jobLevel write permissions found"],"documentation":{"short":"Determines if the project's workflows follow the principle of least privilege.","url":"https://github.com/ossf/scorecard/blob/40576783fda6698350fcbbeaea760ff827433034/docs/checks.md#token-permissions"}},{"name":"Binary-Artifacts","score":10,"reason":"no binaries found in the repo","details":null,"documentation":{"short":"Determines if the project has generated executable (binary) artifacts in the source repository.","url":"https://github.com/ossf/scorecard/blob/40576783fda6698350fcbbeaea760ff827433034/docs/checks.md#binary-artifacts"}},{"name":"Pinned-Dependencies","score":0,"reason":"dependency not pinned by hash detected -- score normalized to 0","details":["Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/python-app.yml:18: update your workflow using https://app.stepsecurity.io/secureworkflow/tac0x2a/lake_weed/python-app.yml/master?enable=pin","Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/python-app.yml:20: update your workflow using https://app.stepsecurity.io/secureworkflow/tac0x2a/lake_weed/python-app.yml/master?enable=pin","Warn: pipCommand not pinned by hash: .github/workflows/python-app.yml:25","Warn: pipCommand not pinned by hash: .github/workflows/python-app.yml:26","Warn: pipCommand not pinned by hash: .github/workflows/python-app.yml:27","Info:   0 out of   2 GitHub-owned GitHubAction dependencies pinned","Info:   0 out of   3 pipCommand dependencies pinned"],"documentation":{"short":"Determines if the project has declared and pinned the dependencies of its build process.","url":"https://github.com/ossf/scorecard/blob/40576783fda6698350fcbbeaea760ff827433034/docs/checks.md#pinned-dependencies"}},{"name":"CII-Best-Practices","score":0,"reason":"no effort to earn an OpenSSF best practices badge detected","details":null,"documentation":{"short":"Determines if the project has an OpenSSF (formerly 
CII) Best Practices Badge.","url":"https://github.com/ossf/scorecard/blob/40576783fda6698350fcbbeaea760ff827433034/docs/checks.md#cii-best-practices"}},{"name":"Security-Policy","score":0,"reason":"security policy file not detected","details":["Warn: no security policy file detected","Warn: no security file to analyze","Warn: no security file to analyze","Warn: no security file to analyze"],"documentation":{"short":"Determines if the project has published a security policy.","url":"https://github.com/ossf/scorecard/blob/40576783fda6698350fcbbeaea760ff827433034/docs/checks.md#security-policy"}},{"name":"Fuzzing","score":0,"reason":"project is not fuzzed","details":["Warn: no fuzzer integrations found"],"documentation":{"short":"Determines if the project uses fuzzing.","url":"https://github.com/ossf/scorecard/blob/40576783fda6698350fcbbeaea760ff827433034/docs/checks.md#fuzzing"}},{"name":"License","score":10,"reason":"license file detected","details":["Info: project has a license file: LICENSE:0","Info: FSF or OSI recognized license: MIT License: LICENSE:0"],"documentation":{"short":"Determines if the project has defined a license.","url":"https://github.com/ossf/scorecard/blob/40576783fda6698350fcbbeaea760ff827433034/docs/checks.md#license"}},{"name":"Vulnerabilities","score":10,"reason":"0 existing vulnerabilities detected","details":null,"documentation":{"short":"Determines if the project has open, known unfixed vulnerabilities.","url":"https://github.com/ossf/scorecard/blob/40576783fda6698350fcbbeaea760ff827433034/docs/checks.md#vulnerabilities"}},{"name":"Signed-Releases","score":-1,"reason":"no releases found","details":null,"documentation":{"short":"Determines if the project cryptographically signs release artifacts.","url":"https://github.com/ossf/scorecard/blob/40576783fda6698350fcbbeaea760ff827433034/docs/checks.md#signed-releases"}},{"name":"Branch-Protection","score":0,"reason":"branch protection not enabled on development/release 
branches","details":["Warn: branch protection not enabled for branch 'master'"],"documentation":{"short":"Determines if the default and release branches are protected with GitHub's branch protection settings.","url":"https://github.com/ossf/scorecard/blob/40576783fda6698350fcbbeaea760ff827433034/docs/checks.md#branch-protection"}},{"name":"SAST","score":0,"reason":"SAST tool is not run on all commits -- score normalized to 0","details":["Warn: 0 commits out of 24 are checked with a SAST tool"],"documentation":{"short":"Determines if the project uses static code analysis.","url":"https://github.com/ossf/scorecard/blob/40576783fda6698350fcbbeaea760ff827433034/docs/checks.md#sast"}}]},"last_synced_at":"2025-08-24T02:30:36.918Z","repository_id":53781198,"created_at":"2025-08-24T02:30:36.918Z","updated_at":"2025-08-24T02:30:36.918Z"},"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28413470,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-14T05:26:33.345Z","status":"ssl_error","status_checked_at":"2026-01-14T05:21:57.251Z","response_time":107,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while 
reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["clickhouse","csv","json","json-lines","pypi"],"created_at":"2026-01-14T07:40:04.490Z","updated_at":"2026-01-14T07:40:05.007Z","avatar_url":"https://github.com/tac0x2a.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Lake Weed\n\n![Python application](https://github.com/tac0x2a/lake_weed/workflows/Python%20application/badge.svg)\n\n![Lake Weed](./doc/img/lakeweed_s.png)\n\nLake Weed is an elastic converter for JSON, JSON Lines, and CSV strings, intended for constructing RDB queries.\nJust pass in a source string to get the schema and the converted values.\n\n# Usage\n\n## Install package\n\n```\npip install lakeweed\n```\n\nPyPI: https://pypi.org/project/lakeweed/\n\n## Example(Json test to ClickHouse)\n\n```py\nfrom lakeweed import clickhouse\n\nsrc_json = \"\"\"\n{\n  \"array\" : [1,2,3],\n  \"array_in_array\" : [[1.1, 2.2], [3.3, 4.4]],\n  \"nested_map\" : {\"value\" : [[1,2], [3,4]]},\n  \"map_in_array\"  : [{\"v\":1}, {\"v\":2}],\n  \"dates\" : [\"2019/09/15 14:50:03.101 +0900\", \"2019/09/15 14:50:03.202 +0900\"],\n  \"date\"  : {\n    \"as_datetime\": \"2019/09/15 14:50:03.042042043 +0900\",\n    \"as_string\"  : \"2019/09/15 14:50:03.042042043 +0900\"\n  },\n  \"str\"   : \"Hello, LakeWeed\"\n}\n\"\"\"\n\n# lakeweed guesses value types automatically.\n# You can override the type of a field if you want.\nmy_types = {\n    \"date__as_string\": \"str\"\n}\n\n(columns, types, values) = clickhouse.data_string2type_value(src_json, 
specified_types=my_types)\n\nprint(columns)\n# (\n#   'array',\n#   'array_in_array',\n#   'nested_map__value',\n#   'map_in_array',\n#   'dates',\n#   'date__as_datetime',\n#   'date__as_string',\n#   'str'\n# )\n\nprint(types)\n# (\n#   'Array(Float64)',\n#   'Array(String)',\n#   'Array(String)',\n#   'Array(String)',\n#   'Array(DateTime64(6))',\n#   'DateTime64(6)',\n#   'String',\n#   'String'\n# )\n\nprint(values)\n# [(\n#   [1.0, 2.0, 3.0],\n#   ['[1.1, 2.2]', '[3.3, 4.4]'],\n#   ['[1, 2]', '[3, 4]'],\n#   ['{\"v\": 1}', '{\"v\": 2}'],\n#   [\n#     datetime.datetime(2019, 9, 15, 14, 50, 3, 101000, tzinfo=tzoffset(None, 32400)),\n#     datetime.datetime(2019, 9, 15, 14, 50, 3, 202000, tzinfo=tzoffset(None, 32400))\n#   ],\n#   datetime.datetime(2019, 9, 15, 14, 50, 3, 42042, tzinfo=tzoffset(None, 32400)),\n#   '2019/09/15 14:50:03.042042043 +0900',\n#   'Hello, LakeWeed'\n# )]\n```\n\n## Example(CSV test to ClickHouse)\n\n```py\nfrom lakeweed import clickhouse\n\nsrc_csv = \"\"\"\nf,b,d\n42,true,2019/09/15 14:50:03.101 +0900\n\"42\",\"true\",2019/12/15 14:50:03.101 +0900\n\"\"\"\n\n(columns, types, values) = clickhouse.data_string2type_value(src_csv)\n\nprint(columns)\n# ('f', 'b', 'd')\n\nprint(types)\n# ('Float64', 'UInt8', 'DateTime64(6)')\n\nprint(values)\n# [\n#   (42.0, 1, datetime.datetime(2019, 9, 15, 14, 50, 3, 101000, tzinfo=tzoffset(None, 32400))),\n#   (42.0, 1, datetime.datetime(2019, 12, 15, 14, 50, 3, 101000, tzinfo=tzoffset(None, 32400)))\n# ]\n```\n\n## Example(Json lines test to ClickHouse)\n\nLake Weed converts each row of JSON Lines in the same way as a single JSON document.\nIt automatically selects a type that can store all the data. 
For example, if a field mixes numbers and strings, it selects the String type, which can store both.\n\n```py\nfrom lakeweed import clickhouse\n\nsrc_json_lines = \"\"\"\n{\"f\": 42,   \"b\": true,   \"d\": \"2019/09/15 14:50:03.101 +0900\"}\n{\"f\": \"42\", \"b\": \"true\", \"d\": \"2019/12/15 14:50:03.101 +0900\"}\n\"\"\"\n\n(columns, types, values) = clickhouse.data_string2type_value(src_json_lines)\n\nprint(columns)\n# ('f', 'b', 'd')\n\nprint(types)\n# ('String', 'String', 'DateTime64(6)')\n\nprint(values)\n# [\n#   ('42', 'true', datetime.datetime(2019, 9, 15, 14, 50, 3, 101000, tzinfo=tzoffset(None, 32400))),\n#   ('42', 'true', datetime.datetime(2019, 12, 15, 14, 50, 3, 101000, tzinfo=tzoffset(None, 32400)))\n# ]\n```\n\n# Type\n\n## Lake Weed types\n\n- `Int`\n- `Float`\n- `Bool`\n- `String`\n- `DateTime` (nanosecond precision)\n- `Array[Int]`\n- `Array[Float]`\n- `Array[Bool]`\n- `Array[String]`\n- `Array[DateTime]`\n\nPython built-in data types are used for the Int, Float, Bool and String types. By default, numeric values (Int or Float) are always treated as Float.\nDateTime is an extension of `datetime.datetime` that also holds nanoseconds. 
Please see the `DateTimeWithNS` type.\n`Array[]` supports the primitive types above.\n\n## Specified Types\n\nBy default, lakeweed guesses value types automatically.\nTo force a specific type for a field, pass it in the `specified_types` argument.\n\n```python\nmy_types = {\n    \"date__as_string\": \"str\" # field name : specified type name\n}\n(columns, types, values) = clickhouse.data_string2type_value(src_json, specified_types=my_types)\n```\n\nYou can use the following types.\n\n| Specified Type String (ignore case) | Lake Weed Type |\n| :---------------------------------: | :------------: |\n|                `INT`                |     `Int`      |\n|              `INTEGER`              |     `Int`      |\n|               `FLOAT`               |    `Float`     |\n|              `DOUBLE`               |    `Float`     |\n|               `BOOL`                |     `Bool`     |\n|              `BOOLEAN`              |     `Bool`     |\n|             `DATETIME`              |   `DateTime`   |\n|                `STR`                |    `String`    |\n|              `STRING`               |    `String`    |\n\nIf a cast fails, the value will be NULL.\n\n## Output Data Type\n\n### ClickHouse\n\n|    Source Type    | [ClickHouse Data Types](https://clickhouse.tech/docs/en/sql-reference/data-types/) |\n| ---------------: | :--------------------------------------------------------------------------------- |\n|       `Int`       | `Int64`                                                                            |\n|      `Float`      | `Float64`                                                                          |\n|      `Bool`       | `UInt8` (True: 1, False: 0)                                                        |\n|     `String`      | `String`                                                                           |\n|    `DateTime`     | `DateTime64(6)` (Nanosecond digits are dropped.)                                   
|\n|   `Array(Int)`    | `Array(Int64)`                                                                     |\n|  `Array(Float)`   | `Array(Float64)`                                                                   |\n|   `Array(Bool)`   | `Array(UInt8)`                                                                     |\n|  `Array(String)`  | `Array(String)`                                                                    |\n| `Array(DateTime)` | `Array(DateTime64(6))`                                                             |\n\n# Release PyPI\n\n## Setup\n\n### Create `~/.pypirc`\n\n```ini\n[distutils]\nindex-servers =\n  pypi\n  testpypi\n\n[pypi]\nrepository: https://upload.pypi.org/legacy/\nusername: \u003cProduction Account Name\u003e\npassword: \u003cPassword\u003e\n\n[testpypi]\nrepository: https://test.pypi.org/legacy/\nusername: \u003cTesting Account Name\u003e\npassword: \u003cPassword\u003e\n```\n\n### Install packages for build and deploy\n\n```sh\npip install wheel twine\n```\n\n## Build and Deploy\n\n### Make Package\n\n```sh\nrm -f -r lakeweed.egg-info/* dist/*\npython setup.py sdist bdist_wheel\n```\n\n### Local testing\n\n```sh\npython setup.py develop\n```\n\n### Deploy to PyPI\n\n```sh\n# for testing\ntwine upload --repository testpypi dist/*\n# open https://test.pypi.org/project/lakeweed/\n\n# for production\ntwine upload --repository pypi dist/*\n# open https://pypi.org/project/lakeweed/\n```\n\n# Contributing\n\n1. Fork it ( https://github.com/tac0x2a/lake_weed/fork )\n2. Create your feature branch (`git checkout -b my-new-feature`)\n3. Commit your changes (`git commit -am 'Add some feature'`)\n4. Push to the branch (`git push origin my-new-feature`)\n5. 
Create a new Pull Request\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftac0x2a%2Flake_weed","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftac0x2a%2Flake_weed","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftac0x2a%2Flake_weed/lists"}