{"id":31805175,"url":"https://github.com/klima7/pol-spider","last_synced_at":"2025-10-30T08:41:20.744Z","repository":{"id":232364689,"uuid":"698295913","full_name":"klima7/Pol-Spider","owner":"klima7","description":"Polish translation of spider dataset.","archived":false,"fork":false,"pushed_at":"2024-05-12T17:36:14.000Z","size":13491,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2024-05-12T18:32:46.571Z","etag":null,"topics":["machine-learning","polish","spider","text-to-sql","text2sql","translation"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/klima7.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-09-29T15:37:46.000Z","updated_at":"2024-05-12T18:32:49.016Z","dependencies_parsed_at":"2024-05-12T18:43:02.029Z","dependency_job_id":null,"html_url":"https://github.com/klima7/Pol-Spider","commit_stats":null,"previous_names":["klima7/polish-spider","klima7/pol-spider"],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/klima7/Pol-Spider","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/klima7%2FPol-Spider","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/klima7%2FPol-Spider/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/klima7%2FPol-Spider/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/klima7%2FPol-Spider/manifests","owner_url":"https://repos.ecosyste.ms
/api/v1/hosts/GitHub/owners/klima7","download_url":"https://codeload.github.com/klima7/Pol-Spider/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/klima7%2FPol-Spider/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":279005957,"owners_count":26084004,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-11T02:00:06.511Z","response_time":55,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["machine-learning","polish","spider","text-to-sql","text2sql","translation"],"created_at":"2025-10-11T02:47:00.905Z","updated_at":"2025-10-11T02:47:06.559Z","avatar_url":"https://github.com/klima7.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Pol-Spider 🕷️\n\nThis repository provides translation of [Spider](https://yale-lily.github.io/spider), [CoSQL](https://yale-lily.github.io/cosql), [SParC](https://yale-lily.github.io/sparc), [Spider-DK](https://github.com/ygan/Spider-DK), [Spider-Syn](https://github.com/ygan/Spider-Syn) datasets into Polish and code for some experiments.\n\n📄 Associated master thesis: [download link](https://github.com/klima7/Master-Thesis/releases/download/submit/master-thesis.pdf).\n\n## Ready datasets\nPolish translations are ready to download from [Hugging Face Datasets](https://huggingface.co/datasets/klima7/Pol-Spider/tree/main) 🤗\n\n## 
Datasets synthesis\nThe `datasets` directory contains scripts for dataset synthesis.\n\n### Setup environment\n```bash\n# clone repository\ngit clone https://github.com/klima7/Polish-Spider\n\n# create environment\nconda create -n pol-spider python=3.19\nconda activate pol-spider\npip install -r requirements.txt\n\n# download spacy model\npython -m spacy download xx_sent_ud_sm\n```\n\nThen download the original English databases from [here](https://huggingface.co/datasets/klima7/Pol-Spider/blob/main/_database.zip) and place them inside `datasets/components/database`.\n\n### Example dataset synthesis\nSynthesize a dataset named `pol-spider-en`, based on samples from `spider`: translate questions to Polish, apply `context-curated` translation to schema names, and translate strings in SQL queries to Polish:\n```bash\npython datasets/scripts/synthesize.py spider pol-spider-en \\\n  --question-lang pl \\\n  --schema-translation context-curated \\\n  --query-lang pl \\\n  --with-db\n```\n\n### Joining datasets\nCreate the `pol-spider` dataset by joining `pol-spider-en` and `pol-spider-pl`:\n```bash\npython datasets/scripts/join.py pol-spider pol-spider-en pol-spider-pl\n```\n\n## App\nThe `app` directory contains a Streamlit app that makes it easy to use the `C3SQL` and `RESDSQL` models.\n\n![app_image](app/image.png)\n\n### Starting app\nTo use the `RESDSQL` model, first download the weights from [Hugging Face](https://huggingface.co/klima7/Pol-Spider-App) 🤗 and place them inside `app/models`.\n```bash\ncd app\ndocker compose up --build\n```\n\n## Experiments\nThe `experiments` directory contains dockerized code for experiments with `RAT-SQL`, `BRIDGE`, `RESDSQL`, and `C3`.\n\n## Evaluation\nThe `evaluation` directory contains code for calculating 
metrics.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fklima7%2Fpol-spider","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fklima7%2Fpol-spider","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fklima7%2Fpol-spider/lists"}