{"id":18507900,"url":"https://github.com/educationaltestingservice/toefl-spell","last_synced_at":"2026-01-23T07:06:24.319Z","repository":{"id":48978081,"uuid":"190021949","full_name":"EducationalTestingService/TOEFL-Spell","owner":"EducationalTestingService","description":"Corpus of Annotations for Misspelings ","archived":false,"fork":false,"pushed_at":"2023-07-31T15:34:36.000Z","size":180,"stargazers_count":27,"open_issues_count":1,"forks_count":3,"subscribers_count":0,"default_branch":"master","last_synced_at":"2025-10-11T13:55:28.947Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/EducationalTestingService.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2019-06-03T14:29:22.000Z","updated_at":"2025-10-03T08:24:58.000Z","dependencies_parsed_at":"2022-08-30T05:52:01.394Z","dependency_job_id":null,"html_url":"https://github.com/EducationalTestingService/TOEFL-Spell","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/EducationalTestingService/TOEFL-Spell","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/EducationalTestingService%2FTOEFL-Spell","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/EducationalTestingService%2FTOEFL-Spell/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/EducationalTestingService%2FTOEFL-Spell/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/EducationalTestingService%2FTOEFL-Spell/manifests","owner_url":"https://repos.ecosyste.ms/a
pi/v1/hosts/GitHub/owners/EducationalTestingService","download_url":"https://codeload.github.com/EducationalTestingService/TOEFL-Spell/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/EducationalTestingService%2FTOEFL-Spell/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28682293,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-23T05:48:07.525Z","status":"ssl_error","status_checked_at":"2026-01-23T05:48:07.129Z","response_time":59,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-06T15:12:47.441Z","updated_at":"2026-01-23T07:06:24.283Z","avatar_url":"https://github.com/EducationalTestingService.png","language":null,"funding_links":[],"categories":[],"sub_categories":[],"readme":"# TOEFL-Spell\nA dataset of Spelling Annotations for English language learner essays written for TOEFL exams.\n\nThis repository contains the TOEFL-Spell annotated data set. 
The data is described in the paper\n*A Benchmark Corpus of English Misspellings and a Minimally-supervised Model for Spelling Correction*\n [(Flor, Fried \u0026 Rozovskaya, 2019)](https://www.aclweb.org/anthology).\n \n\nThe TOEFL-Spell data set contains annotations of 6000+ spelling errors from\nessays written by non-native speakers of English taking the TOEFL iBT test.\n\nWe based our data set on the publicly available ETS\nCorpus of Non-Native Written English, a.k.a. TOEFL11,\nwhich contains 12,100 essays from 11 first language backgrounds.\nWe sampled 883 essays from that corpus and manually annotated them for spelling errors.\n\nWe provide two files: FilesCounts.tsv and Annotations.tsv (both contain tab-separated values; the first line is a header).\n(FilesCounts.csv and Annotations.csv are now also available; they contain the same data in CSV format.)\n\n*FilesCounts.tsv* contains the names of the annotated files (essays) and the count of spelling errors for each. Note that 35 essays had no spelling errors.\n\nThe file *Annotations.tsv* contains tab-separated annotations for all the data.\nEach annotation appears on a separate line, like this:\n\nFilename | OffsetSpan | Misspelling | Type | Correction\n-------- | ---------- | ----------- | ---- | ----------\n1004135 |\t1186-1193\t| beacuse |\tM\t| because\n\nThe value of the *Filename* field matches the corresponding text file in the full TOEFL11 corpus.\nThe value of the *OffsetSpan* field gives the start and end offsets of the misspelling in the original text file.\n\nTo view the annotations in the full context of the original essays (or to run your own experiments),\nyou will need to obtain the essays from the Linguistic Data Consortium (LDC Catalog Number: [LDC2014T06](https://catalog.ldc.upenn.edu/LDC2014T06)) and link them to the annotations via filenames and offset values.\n\n### A note on 'types' of misspellings ###\n\nThe article mentions 6121 misspellings, each of which is a single-token non-word.\nThose are marked as type M in the 
annotation.\nThe annotation file has 112 additional misspellings, with other type names (M2, MWM, MWM2), which were marked for additional research.\nOnly type M misspellings were used in system evaluations reported in the paper.\n\nQuestions? Send email to mflor@ets.org\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Feducationaltestingservice%2Ftoefl-spell","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Feducationaltestingservice%2Ftoefl-spell","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Feducationaltestingservice%2Ftoefl-spell/lists"}