{"id":50643388,"url":"https://github.com/impresso/impresso-mulitlingual-dictionary-annotations","last_synced_at":"2026-06-07T10:31:08.907Z","repository":{"id":362290935,"uuid":"1252315596","full_name":"impresso/impresso-mulitlingual-dictionary-annotations","owner":"impresso","description":null,"archived":false,"fork":false,"pushed_at":"2026-06-03T14:12:47.000Z","size":113,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-06-03T14:15:31.369Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/impresso.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-05-28T11:55:53.000Z","updated_at":"2026-06-03T14:05:48.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/impresso/impresso-mulitlingual-dictionary-annotations","commit_stats":null,"previous_names":["impresso/impresso-mulitlingual-dictionary-annotations"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/impresso/impresso-mulitlingual-dictionary-annotations","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/impresso%2Fimpresso-mulitlingual-dictionary-annotations","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/impresso%2Fimpresso-mulitlingual-dictionary-annotations/tags","releases_url":"https://repos.ecosyste.ms/api
/v1/hosts/GitHub/repositories/impresso%2Fimpresso-mulitlingual-dictionary-annotations/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/impresso%2Fimpresso-mulitlingual-dictionary-annotations/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/impresso","download_url":"https://codeload.github.com/impresso/impresso-mulitlingual-dictionary-annotations/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/impresso%2Fimpresso-mulitlingual-dictionary-annotations/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34018404,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-07T02:00:07.652Z","response_time":124,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2026-06-07T10:31:08.245Z","updated_at":"2026-06-07T10:31:08.901Z","avatar_url":"https://github.com/impresso.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Multilingual Dictionary Seed Annotation\n\nSmall terminal tool for validating German-to-target word pairs.\n\nPurpose: create better seed word pairs for aligning monolingual static word embeddings, improving the final multilingual embedding space and dictionary.\n\n## Files\n\n- `pivot_seed_candidates_1to1_clustered_500x4.jsonl`: candidate word pairs\n- 
`annotate_seed_candidates.py`: terminal annotation script\n- `annotations/seed_annotations.json`: shared output file created while annotating\n\n## Run\n\nFirst time only, clone the repository:\n\n```bash\ngit clone git@github.com:impresso/impresso-mulitlingual-dictionary-annotations.git\ncd impresso-mulitlingual-dictionary-annotations\n```\n\nBefore starting, always pull the latest annotations:\n\n```bash\ngit pull\n```\n\nThen start the annotation script:\n\n```bash\npython annotate_seed_candidates.py\n```\n\nPlease read the instructions printed by the script before starting.\n\nThe displayed words are normalized forms, not necessarily original surface forms.\n\nFor each language pair, enter how many new examples to annotate. Enter `0` to skip a pair.\n\nDuring annotation:\n\n- `t` = correct translation\n- `f` = wrong translation\n- `s` = skip if you do not know the word or are very unsure; it does not count, and another random pair is shown\n- `b` = go back\n- `q` = quit and save\n\nSkipped pairs are not saved as annotations, so the number you enter means the number of `t`/`f` decisions you will contribute.\n\nAnnotation rules:\n\n- Focus on the semantics of the two words. If the target word is overall a correct semantic translation of the source word, mark it as true.\n- Ignore capitalization and OCR/spelling errors if the intended word is clear.\n- Ignore inflectional differences if the meaning is otherwise correct: tense, singular/plural, gender, and grammatical case such as nominative, accusative, dative, or genitive.\n- If either word is in the wrong language for its column, mark it as false.\n- If the two words are identical, mark it as false.\n- For words with multiple meanings, judge the most common meaning of each word. 
Mark true if the common meanings match.\n- Mark false if the match only works through a rare or unusual meaning of one word.\n\n## After Annotating\n\nPush your changes so the next annotator starts from the latest file:\n\n```bash\ngit add annotations/seed_annotations.json\ngit commit -m \"Added X new checked pairs - NAME\"\ngit push\n```\n\nReplace `X` with the number of new `t`/`f` decisions you added, and replace `NAME` with your name.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fimpresso%2Fimpresso-mulitlingual-dictionary-annotations","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fimpresso%2Fimpresso-mulitlingual-dictionary-annotations","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fimpresso%2Fimpresso-mulitlingual-dictionary-annotations/lists"}