{"id":16662216,"url":"https://github.com/arademaker/sick","last_synced_at":"2026-04-27T10:31:12.917Z","repository":{"id":66932315,"uuid":"279628843","full_name":"arademaker/sick","owner":"arademaker","description":null,"archived":false,"fork":false,"pushed_at":"2022-06-29T18:39:11.000Z","size":17891,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-12-27T20:58:26.650Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/arademaker.png","metadata":{"files":{"readme":"README.org","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-07-14T15:50:49.000Z","updated_at":"2022-06-29T18:47:52.000Z","dependencies_parsed_at":"2023-05-14T16:00:14.760Z","dependency_job_id":null,"html_url":"https://github.com/arademaker/sick","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/arademaker/sick","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/arademaker%2Fsick","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/arademaker%2Fsick/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/arademaker%2Fsick/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/arademaker%2Fsick/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/arademaker","download_url":"https://codeload.github.com/arademaker/sick/tar.gz/refs/heads/master","sbom_url":"https://repos.ecos
yste.ms/api/v1/hosts/GitHub/repositories/arademaker%2Fsick/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32333193,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-26T23:26:28.701Z","status":"online","status_checked_at":"2026-04-27T02:00:06.769Z","response_time":128,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-10-12T10:37:18.293Z","updated_at":"2026-04-27T10:31:12.891Z","avatar_url":"https://github.com/arademaker.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"#+title: treebanking SICK dataset with ERG\n\n** data\n\n- https://www.aclweb.org/anthology/L14-1314/\n- http://marcobaroni.org/composes/sick.html\n- https://zenodo.org/record/2787612#.X0E1SS2z31A\n\nalso mentioned at http://nlpprogress.com/english/semantic_textual_similarity.html\n\nSICK contains 9,841 pairs of sentences, the text file contains one line for each pair:\n\n#+BEGIN_EXAMPLE\n            pair_ID: 1\n         sentence_A: A group of kids is playing in a yard and an old man is standing in the background\n         sentence_B: A group of boys in a yard is playing and a man is standing in the background\n   entailment_label: NEUTRAL\n  relatedness_score: 4.5\n      entailment_AB: A_neutral_B\n      entailment_BA: B_neutral_A\nsentence_A_original: A group of children playing in a yard, a man in the background.\nsentence_B_original: A group of 
children playing in a yard, a man in the background.\n sentence_A_dataset: FLICKR\n sentence_B_dataset: FLICKR\n        SemEval_set: TRAIN\n#+END_EXAMPLE\n\nConsidering the sentences only for treebanking, we have many\nrepetitions. For instance, the sentence 'A man is playing a guitar'\noccurs in 63 pairs.\n\n#+BEGIN_EXAMPLE\n% awk -F \"\\t\" -v OFS=\"\\n\" 'NR \u003e 1 {print $2,$3}' SICK.txt | wc -l\n   19680\n% awk -F \"\\t\" -v OFS=\"\\n\" 'NR \u003e 1 {print $2,$3}' SICK.txt | sort | uniq | wc -l\n    6076\n% awk -F \"\\t\" -v OFS=\"\\n\" 'NR \u003e 1 {print $2,$3,$8,$9}' SICK.txt | sort | uniq | wc -l\n    7985\n#+END_EXAMPLE\n\n** data preparation\n\n   1. obtain the SICK.txt file (note that I made a few manual edits\n      to fix errors in the original SICK.txt)\n      \n   2. create the profiles by running data/compact.sh\n      \n   3. process the profiles with ACE/ERG (see data/proc-profile.sh)\n\n   4. create the data/sample.txt from the sentences.txt\n      \n** grammar compilation\n\ngrammar compilation (trunk version):\n\n#+BEGIN_SRC bash\nace -g ~/hpsg/terg/ace/config.tdl -G erg.dat\n#+END_SRC\n\n** profile and fftb treebanking\n\nprofile construction:\n\n#+BEGIN_SRC bash\nmkprof -r ~/logon/lingo/erg/tsdb/gold/mrs/relations -i data/sample.txt data/golden\nart -a \"ace -g erg.dat -O --disable-generalization\" -f data/golden\n#+END_SRC\n\nwith ACE/PyDelphin:\n\n#+BEGIN_SRC bash\ndelphin mkprof --input sample.txt --relations ~/hpsg/logon/lingo/lkb/src/tsdb/skeletons/english/Relations --skeleton data/golden\ndelphin process data/golden -g erg.dat --full-forest --options='--disable-generalization'\n#+END_SRC\n\ntreebanking:\n\n#+BEGIN_SRC bash\nfftb -g erg.dat --webdir /usr/local/fftb/assets/ data/sample\n#+END_SRC\n\nThe annotation was done in approximately 
6 hours.\n\n** profile processing with ACE\n\n#+BEGIN_SRC bash\ndelphin process -g erg.dat -o \"-n 1\" -s data/golden data/parsed\n#+END_SRC\n\n** comparing the profiles\n\n#+BEGIN_SRC bash\n% delphin edm golden parsed\nPrecision:\t0.9637710992177851\n   Recall:\t0.9683557394002068\n  F-score:\t0.9660579799855565\n#+END_SRC\n\n\n** solver for underspecified scopes\n\n   We used https://github.com/coli-saar/utool to solve the\n   underspecified scopes of quantifiers. This process actually tests\n   the consistency of the MRS structures.\n\n   Download utool and start the server with:\n\n   : java -Xmx8g -server -jar utool/Utool-3.4.jar server --logging --warmup\n\n   then execute:\n\n   : python solver.py \u003e solver.txt\n\n   \n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Farademaker%2Fsick","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Farademaker%2Fsick","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Farademaker%2Fsick/lists"}