{"id":23467391,"url":"https://github.com/ucsd-progsys/yunounderstand-data","last_synced_at":"2025-04-12T19:00:13.305Z","repository":{"id":148842948,"uuid":"91727005","full_name":"ucsd-progsys/yunounderstand-data","owner":"ucsd-progsys","description":"A collection of novice interactions with the OCaml top-level.","archived":false,"fork":false,"pushed_at":"2017-06-12T20:53:47.000Z","size":17327,"stargazers_count":4,"open_issues_count":0,"forks_count":0,"subscribers_count":4,"default_branch":"master","last_synced_at":"2025-04-12T18:59:56.556Z","etag":null,"topics":["functional-programming","homework","homework-problem","ocaml","user-study"],"latest_commit_sha":null,"homepage":null,"language":"OCaml","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ucsd-progsys.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2017-05-18T18:46:35.000Z","updated_at":"2021-03-29T14:44:51.000Z","dependencies_parsed_at":null,"dependency_job_id":"c2418bf4-f74c-4dab-a080-d6f7ccf07944","html_url":"https://github.com/ucsd-progsys/yunounderstand-data","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ucsd-progsys%2Fyunounderstand-data","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ucsd-progsys%2Fyunounderstand-data/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ucsd-progsys%2Fyunounderstand-data/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/u
csd-progsys%2Fyunounderstand-data/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ucsd-progsys","download_url":"https://codeload.github.com/ucsd-progsys/yunounderstand-data/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248618280,"owners_count":21134200,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["functional-programming","homework","homework-problem","ocaml","user-study"],"created_at":"2024-12-24T12:31:34.011Z","updated_at":"2025-04-12T19:00:08.273Z","avatar_url":"https://github.com/ucsd-progsys.png","language":"OCaml","funding_links":[],"categories":[],"sub_categories":[],"readme":"A Collection of Novice Interactions with the OCaml Top-Level System [![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.806813.svg)](https://doi.org/10.5281/zenodo.806813)\n==============\n\nWe recruited around 50 students each across two instances of the CSE 130\n(Undergraduate Programming Languages) course at UC San Diego (IRB #140608) \nto use an instrumented version of the [ocaml-top] editor, which logged\neach of their interactions with the top-level system while they worked\non the first three homework assignments.\n\n[ocaml-top]: https://www.typerex.org/ocaml-top.html\n\nWe have released this data as a [CC0] work; you are free to use it\nfor your own research or however else you please. 
We just ask that if\nyour use leads to a publication, please cite the dataset as follows.\n\n[CC0]: https://creativecommons.org/publicdomain/zero/1.0/\n\n``` bibtex\n@misc{yunounderstand,\n  author       = {Eric L Seidel and Ranjit Jhala},\n  title        = {A Collection of Novice Interactions with the {OCaml} {Top-Level} System},\n  month        = jun,\n  year         = 2017,\n  doi          = {10.5281/zenodo.806814},\n  url          = {https://doi.org/10.5281/zenodo.806814}\n}\n```\n\n\nRaw Data\n--------\n\nEach JSON file in `data/raw/{sp14,fa15}/` contains one object of the\nfollowing form on each line. Note that this means the file as a whole is\nNOT valid JSON; each line must be parsed individually as a JSON object.\n\n```\n{\n    \"file\": \"hw1.ml\" | \"hw2.ml\" | \"hw3.ml\",\n    \"time\": number,\n    \"body\": string,\n    \"cursor\": number,\n    \"event\": {\n        \"type\": \"abort\" | \"eval\" | \"stop\",\n        \"region\": {\n            \"start\": number,\n            \"stop\": number\n        }\n    },\n    \"ocaml\": [{\n        \"in\": string,\n        \"out\": string,\n        \"type\": \"scope\" | \"syntax\" | \"type\" | \"\",\n        \"min\": string\n    }]\n}\n```\n\nThe `\"file\"` field is the name of the file and should be one of\n`\"hw1.ml\"`, `\"hw2.ml\"`, or `\"hw3.ml\"`. The `\"time\"` field is the UNIX\ntimestamp of the event. The `\"body\"` field is the contents of the file\nat that point. The `\"cursor\"` field tracks the location of the cursor as\nan offset into the body.\n\nThe `\"event\"` field describes what type of event occurred. The\npossibilities for the `\"type\"` field are:\n\n- `\"abort\"`: the student aborted the current computation, i.e. sent a\n  SIGINT to the top-level.\n- `\"eval\"`: the student sent some definitions to be evaluated. In this\n  case the `\"event\"` field will also contain a `\"region\"` object with\n  `\"start\"` and `\"stop\"` offsets into the body, indicating what program\n  text was evaluated. 
\n- `\"stop\"`: the student restarted the top-level.\n\nIn the case of an `\"eval\"` event, the object will also contain an\n`\"ocaml\"` array, which contains the list of definitions that were sent\nto the OCaml top-level. Each item in the list is an object containing:\n\n- `\"in\"`: a single definition sent to OCaml.\n- `\"out\"`: OCaml's response. We only captured the error responses, so\n  this field will often be empty.\n- `\"type\"`: a classification of the response. `\"scope\"` indicates an\n  unbound variable error, `\"syntax\"` a syntax error, and `\"type\"` a type\n  error. The empty string implies there was no error.\n- `\"min\"`: a self-contained program with the **minimal** set of definitions. \n  This is extremely useful because the students were interacting with the OCaml\n  **interpreter** rather than the compiler, and would send individual (groups of)\n  definitions to the interpreter rather than compiling the entire file. Thus,\n  we cannot expect the contents of the `\"in\"` field to constitute a complete,\n  **closed** program.\n\n  We were not always able to produce such a minimal program, so this field \n  will sometimes be empty.\n  \n**NOTE:** the offsets into the body (`\"cursor\"`, `\"start\"`, and `\"stop\"`)\nare not always reliable; we have observed cases where they do not match up \nwith the actual text that was sent to OCaml.\n  \n\nDerived Data\n------------\n\nWe include two derived datasets.\n\n1. A collection of distinct, minimal (i.e. derived from the `\"min\"`\n   field above), ill-typed programs, located in\n   `data/derived/{sp14,fa15,comb}/prog`, extracted from the SP14\n   (resp. FA15 and combined) dataset. The programs are further grouped\n   into `cnstr` and `unify` folders. The `cnstr` folder contains\n   programs with errors that are explained pretty well by OCaml\n   (e.g. 
\"This constructor takes 3 arguments but only 2 were supplied\").\n   The `unify` folder contains more general unification errors, which make up\n   the vast majority.\n   \n   For convenience, each `.ml` file is paired with a `.ml.out` file\n   containing the OCaml compiler's error message, and a `.orig.ml` file\n   containing the unminimized program (i.e. the `\"body\"` field above).\n   \n   These programs are produced by the python3 script `scripts/extract_programs.py`.\n   We provide Makefile targets `progs{-fa15,-sp14,-comb}` for convenience.\n\n2. A collection of ill-typed programs paired with their subsequent\n   **fixes**, located in `data/derived/{sp14,fa15}/pairs.json`. These\n   files again contain a sequence of JSON objects, one per line, with the\n   following structure:\n   \n   ```\n   {\n       \"index\": number,\n       \"hw\": \"hw1\" | \"hw2\" | \"hw3\",\n       \"problem\": string,\n       \"bad\": string,\n       \"fix\": string\n   }\n   ```\n   \n   The `\"hw\"` and `\"problem\"` fields specify which homework and problem\n   the student was working on. The `\"bad\"` field contains the ill-typed\n   program, and the `\"fix\"` field contains the student's fix. We define\n   a \"fix\" to an ill-typed program as the first subsequent program the \n   student submitted to the interpreter that \n   (1) we can determine to be solving the same homework problem, and \n   (2) has the correct type. \n   Both `\"bad\"` and `\"fix\"` contain the minimal programs from the `\"min\"` field above.\n   \n   We determine which homework problem a program is solving by looking\n   at the names of the defined functions (this works because the\n   programs are already minimized). 
We determine whether a fix is valid\n   by checking that it has the expected type (we, of course, know the\n   expected types of all homework programs).\n\n   These programs are produced by the python3 script `scripts/extract_pairs.py`.\n   We provide Makefile targets `pairs{-fa15,-sp14}` for convenience.\n   **NOTE:** `extract_pairs.py` expects `ocaml` to be on your `PATH` as it will\n   check that the \"fixes\" typecheck against the expected type for each homework\n   problem.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fucsd-progsys%2Fyunounderstand-data","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fucsd-progsys%2Fyunounderstand-data","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fucsd-progsys%2Fyunounderstand-data/lists"}