{"id":16894760,"url":"https://github.com/ddbourgin/bookworm_db","last_synced_at":"2026-05-05T20:34:07.001Z","repository":{"id":134549107,"uuid":"50977445","full_name":"ddbourgin/bookworm_db","owner":"ddbourgin","description":"Modifications to database construction scripts. Forked from https://github.com/Bookworm-project/BookwormDB","archived":false,"fork":false,"pushed_at":"2016-02-05T23:38:21.000Z","size":59,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-01-25T10:42:26.160Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ddbourgin.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2016-02-03T05:51:50.000Z","updated_at":"2020-09-23T13:14:43.000Z","dependencies_parsed_at":"2023-05-10T22:45:50.593Z","dependency_job_id":null,"html_url":"https://github.com/ddbourgin/bookworm_db","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ddbourgin%2Fbookworm_db","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ddbourgin%2Fbookworm_db/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ddbourgin%2Fbookworm_db/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ddbourgin%2Fbookworm_db/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ddbourgin","download_url":"https://codeload.github.com/ddbourgin/bookworm_db/tar.gz/refs/heads/master","host":
{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":244591485,"owners_count":20477710,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-10-13T17:19:51.061Z","updated_at":"2026-05-05T20:34:06.961Z","avatar_url":"https://github.com/ddbourgin.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# bookworm_db\nModifications to database construction scripts. Forked from https://github.com/Bookworm-project/BookwormDB\n\nThere are 4 major \"moving parts\" for the bookworm:\n\n1. *The bookworm data*\n  - This is handled by the code in the [`force_align`](https://github.com/ddbourgin/force_align) repo. \n  - Briefly, this entails downloading the audio, matching it to its transcript, transcribing it phonetically, and then organizing the two transcriptions into a bookworm-readable format\n\n2. 
*The bookworm database*\n  - This is handled by the code in the [`bookworm_db`](https://github.com/ddbourgin/bookworm_db) repo\n  - This code takes the bookworm data produced via the force_align code and organizes it into a SQL database for use with the bookworm API\n  * Important pieces of code here are\n    - `Makefile`                      -\u003e High level overview of database construction\n    - `bookworm/tokenizer.py`         -\u003e Contains the regexes used for tokenizing the bookworm data\n    - `bookworm/CreateDatabase.py`    -\u003e Includes rules and SQL calls for constructing the tables in the database\n    - `OneClick.py`                   -\u003e Calls the functions in CreateDatabase during db construction\n\n3. *The bookworm API*\n  - This is handled by the code in the [`bookworm_api`](https://github.com/ddbourgin/bookworm_api) repo\n  - This code is the interface between the bookworm browser/gui and the bookworm database as constructed using the code in bookworm_db\n  * Important pieces of code here\n    - `dbbindings.py`                 -\u003e This is the script that receives queries from the front-end, sends them along to the API, and returns the results\n    - `bookworm/general_API.py`       -\u003e The general API for organizing and parsing database queries. Makes use of the `userquery` class in `SQLAPI.py` to actually query the database.\n    - `bookworm/SQLAPI.py`            -\u003e Defines the `userqueries` class for querying the bookworm database and parsing the response\n\n4. *The bookworm GUI*\n  - This is handled by the code in the [`bookworm_gui`](https://github.com/ddbourgin/bookworm_gui) repo\n  - This is the front-end for the bookworm browser. 
The majority of the processing is handled in `bookworm_gui/static/js/a.js`\n  * Important pieces of code here are\n    - `index.html`\n    - `static/js/a.js`        -\u003e This is where the calls to the API are constructed; handles look + feel of the interface, as well as query highlighting and phoneme vs. word database selection (for now - this should be moved to server-side eventually).\n    - `static/options.json`   -\u003e The config file containing the default values for the front-end, as well as lookup tables for translating database ids into display names.\n\n\n## Workflow:\n1. Construct a bookworm data zip\n  - For the formatting requirements, refer to: https://bookworm-project.github.io/Docs/Requirements.html\n\n2. Initialize a server (I usually use the AWS EC2 Ubuntu free tier). Ensure that permissions are set to allow unrestricted access to the HTTP and HTTPS ports. If the bookworm is large, make sure to allocate an appropriate swapfile to avoid segfaults during database construction.\n\n3. SSH into the server and clone the [`bookworm_db`](https://github.com/ddbourgin/bookworm_db) repo into `/var/www/`:\n  ```shell\n  sudo apt-get install git # if you're using Ubuntu\n  cd /var/www/\n  sudo git clone https://github.com/ddbourgin/bookworm_db.git\n  ```\n4. Make a directory `files` in `bookworm_db` and rename the `bookworm_db` directory to your bookworm database name. For example, if your bookworm DB is named `My_BW_DB_Name`, you would run\n  ```shell\n  sudo mkdir /var/www/bookworm_db/files\n  sudo mv /var/www/bookworm_db /var/www/My_BW_DB_Name\n  ```\n\n5. From the `/var/www/` directory, run the script `deploy_bw.sh` located in the renamed database directory. This will install the necessary bookworm dependencies and set up the MySQL server/config files for bookworm access.\n  ```shell\n  sudo sh My_BW_DB_Name/deploy_bw.sh\n  ```\n\n6. From the `/var/www/` directory, download the zip file containing the bookworm data you created in step 1. 
I typically upload the file to Dropbox and use `wget` to download:\n  ```shell\n  cd /var/www/\n  sudo wget Link_to_Bookworm_Data_Zip\n  sudo unzip *.zip\n  sudo rm *.zip\n  ```\n7. Copy the `texts` and `metadata` folders in your unzipped `Bookworm_Data_Folder` to the `files` directory created in step 4.\n  - We assume here that your data folder is organized as\n  ```\n  Bookworm_Data_Folder/\n    | -- texts/\n    |  | input.txt\n    | -- metadata/\n    |  | jsoncatalog.txt\n    |  | field_descriptions.json\n  ```\n  - If this is so, then you can simply run the following from the `/var/www/` directory\n  ```shell\n  sudo mv Bookworm_Data_Folder/texts My_BW_DB_Name/files/\n  sudo mv Bookworm_Data_Folder/metadata My_BW_DB_Name/files/\n  sudo rm -rf Bookworm_Data_Folder\n  ```\n8. Construct the database:\n```shell\ncd /var/www/My_BW_DB_Name/\nsudo make all\n```\n9. Follow the on-screen instructions. If all has gone well, this will result in a completed Bookworm database.\n\n## TODO:\n1. Add code for creating pause and word:pronunciation tables to `CreateDatabase.py`\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fddbourgin%2Fbookworm_db","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fddbourgin%2Fbookworm_db","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fddbourgin%2Fbookworm_db/lists"}