{"id":49231359,"url":"https://github.com/acqdiv/acqdiv","last_synced_at":"2026-04-24T12:07:18.000Z","repository":{"id":57408004,"uuid":"221018208","full_name":"acqdiv/acqdiv","owner":"acqdiv","description":"Pipeline for the ACQDIV Corpus Database","archived":false,"fork":false,"pushed_at":"2021-01-26T13:39:32.000Z","size":2720,"stargazers_count":1,"open_issues_count":7,"forks_count":3,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-12-21T23:06:52.705Z","etag":null,"topics":["child-language","corpora","corpus-linguistics","cross-linguistic-data","databases","language-acquisition","linguistics","linguistics-databases","typology"],"latest_commit_sha":null,"homepage":"https://www.acqdiv.uzh.ch","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/acqdiv.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2019-11-11T16:05:22.000Z","updated_at":"2025-08-18T13:57:56.000Z","dependencies_parsed_at":"2022-09-26T17:10:54.301Z","dependency_job_id":null,"html_url":"https://github.com/acqdiv/acqdiv","commit_stats":null,"previous_names":[],"tags_count":5,"template":false,"template_full_name":null,"purl":"pkg:github/acqdiv/acqdiv","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/acqdiv%2Facqdiv","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/acqdiv%2Facqdiv/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/acqdiv%2Facqdiv/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/acqdiv%2Facqdiv/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/acqdiv"
,"download_url":"https://codeload.github.com/acqdiv/acqdiv/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/acqdiv%2Facqdiv/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32222535,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-24T10:26:35.452Z","status":"ssl_error","status_checked_at":"2026-04-24T10:25:27.643Z","response_time":64,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["child-language","corpora","corpus-linguistics","cross-linguistic-data","databases","language-acquisition","linguistics","linguistics-databases","typology"],"created_at":"2026-04-24T12:07:17.204Z","updated_at":"2026-04-24T12:07:17.989Z","avatar_url":"https://github.com/acqdiv.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# ACQDIV\n[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.3558643.svg)](https://doi.org/10.5281/zenodo.3558643)\n[![PyPI version](https://badge.fury.io/py/acqdiv.svg)](https://badge.fury.io/py/acqdiv)\n\n[![CircleCI](https://circleci.com/gh/acqdiv/acqdiv.svg?style=svg)](https://circleci.com/gh/acqdiv/acqdiv)\n\nThis repository contains the code and configuration files for transforming \nthe child language acquisition corpora into the ACQDIV database.\n\n## Publication\nIf you use the database 
in your research, please cite as follows:  \n```\nJancso, Anna, Steven Moran, and Sabine Stoll.\n\"The ACQDIV Corpus Database and Aggregation Pipeline.\"\nProceedings of The 12th Language Resources and Evaluation Conference. 2020.\n```\n[Link to Paper](http://www.lrec-conf.org/proceedings/lrec2020/pdf/2020.lrec-1.20.pdf)\n\n\n## Resources\n\nDownload the ACQDIV database (only public corpora):\n\n[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.3558641.svg)](https://doi.org/10.5281/zenodo.3558641)\n\nTo request access to the full database including the private corpora (for\nresearch purposes only!), \nplease refer to \n[Sabine Stoll](https://www.psycholinguistics.uzh.ch/en/stoll.html).\nIn case of technical questions, please open an issue on this repository.\n\n--------------\n\n## Corpora\n\nOur full database consists of the following corpora:\n\n| Corpus                                                                                                                    | ISO | Public | # Words   | \n|---------------------------------------------------------------------------------------------------------------------------|:---:|:------:|---------:| \n| Chintang Language Corpus                                                                                                  | ctn | no     | 987'673   | \n| [Cree Child Language Acquisition Study (CCLAS) Corpus](https://phonbank.talkbank.org/access/Other/Cree/CCLAS.html)        | cre | yes    | 44'751    | \n| [English Manchester Corpus](https://childes.talkbank.org/access/Eng-UK/Manchester.html)                                   | eng | yes    | 2'016'043  | \n| [MPI-EVA Jakarta Child Language Database](https://archive.mpi.nl/islandora/object/lat%253A1839_00_0000_0000_0022_6164_B)  | ind | yes    | 2'489'329  | \n| Allen Inuktitut Child Language Corpus                                                                                     | ike | no     | 71'191    | \n| [MiiPro Japanese 
Corpus](https://childes.talkbank.org/access/Japanese/MiiPro.html)                                        | jpn | yes    | 1'011'670  | \n| [Miyata Japanese Corpus](https://childes.talkbank.org/access/Japanese/Miyata.html)                                        | jpn | yes    | 373'021   | \n| Ku Waru Child Language Socialization Study                                                                                | mux | yes    | 65'723    | \n| [Sarvasy Nungon Corpus](https://childes.talkbank.org/access/Other/Nungon/Sarvasy.html)                                    | yuw | yes    | 19'659    | \n| Qaqet Child Language Documentation                                                                                        | byx | no     | 56'239    | \n| Stoll Russian Corpus                                                                                                      | rus | no     | 2'029'704  | \n| [Demuth Sesotho Corpus](https://childes.talkbank.org/access/Other/Sesotho/Demuth.html)                                    | sot | yes    | 177'963   | \n| Tuatschin Corpus                                                                                                          | roh | no     | 118'310   | \n| Koç University Longitudinal Language Development Database                                                                 | tur | no     | 1'120'077  | \n| Pfeiler Yucatec Child Language Corpus                                                                                     | yua | no     | 262'382   | \n| **Total**                                                                                                                 |     |        | **10'843'735** |\n\n--------------\n\n## Running the pipeline\n\nFor Windows users, follow the installation/run instructions here: [https://github.com/acqdiv/acqdiv/wiki/Installation-Run-instructions-for-Windows](https://github.com/acqdiv/acqdiv/wiki/Installation-Run-instructions-for-Windows)\n\nFor Mac and Linux users, 
continue here to run the pipeline yourself:\n\n### Install the package\n\nCreate a virtual environment [optional]:\n\n```shell script\npython3 -m venv venv\nsource venv/bin/activate\n```\n\nYou can install the package from PyPI or directly from source:\n\n**PyPI**\n\n`pip install acqdiv`\n\n**From source**\n\n```shell script\n# Clone Repository\ngit clone git@github.com:acqdiv/acqdiv.git\ncd acqdiv\n\n# Install package (for users!)\npip install .\n\n# Developer mode (for developers!)\npip install -r requirements.txt\n```\n\n### Get the corpora\n\nRun the following script to download the public corpora:\n\n`python util/download_public_corpora.py`\n\nThe corpora are in the folder `corpora`. \n\nFor the private corpora, either place the session files in `corpora/\u003ccorpus_name\u003e/{cha|toolbox}/` \nand the metadata files (only Toolbox corpora) in `corpora/\u003ccorpus_name\u003e/imdi/` or \nedit the paths to those files in `config.ini` (see below).\n\n### Generate the database\n\nGet the configuration file `src/acqdiv/config.ini` and specify the absolute\npaths (without trailing slashes) for the corpora directory (`corpora_dir`) and \nthe directory where the database should be written to (`db_dir`):\n```ini\n[.global]\n# directory containing corpora\ncorpora_dir = /absolute/path/to/corpora/dir\n# directory where the database is written to\ndb_dir = /absolute/path/to/database/dir\n...\n```\n\nOptionally adapt the paths for the individual corpora (`sessions` and `metadata_dir`).\n\nRun the pipeline specifying the absolute path to the configuration file:  \n`acqdiv load -c /absolute/path/to/config.ini`\n\n### Generate the R object\n\nInstall the dependencies:\n```\n$ R\n\u003e install.packages(\"RSQLite\")\n\u003e install.packages(\"rlang\")\n```\n\nNavigate to `src/acqdiv/database` and run:\n```\nRscript sqlite_to_r.R /absolute/path/to/sqlite-DB\n```\n\n### Run tests\n\nRun the unit tests:  \n`pytest tests/unittests`  \n\nRun the integrity tests on the 
database:  \n`pytest tests/systemtests`\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Facqdiv%2Facqdiv","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Facqdiv%2Facqdiv","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Facqdiv%2Facqdiv/lists"}