{"id":13671154,"url":"https://github.com/rcdilorenzo/ecce","last_synced_at":"2026-01-18T06:01:20.693Z","repository":{"id":37595922,"uuid":"176339855","full_name":"rcdilorenzo/ecce","owner":"rcdilorenzo","description":"ML Prediction of Bible Topics and Passages (Python / React)","archived":false,"fork":false,"pushed_at":"2022-12-09T16:32:18.000Z","size":7583,"stargazers_count":49,"open_issues_count":34,"forks_count":13,"subscribers_count":8,"default_branch":"master","last_synced_at":"2025-04-27T14:38:40.787Z","etag":null,"topics":["data-science","fastapi","fully-connected-network","interactive-visualizations","keras-tensorflow","reactjs"],"latest_commit_sha":null,"homepage":"","language":"JavaScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/rcdilorenzo.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2019-03-18T17:48:09.000Z","updated_at":"2025-02-05T04:35:22.000Z","dependencies_parsed_at":"2023-01-25T22:32:00.665Z","dependency_job_id":null,"html_url":"https://github.com/rcdilorenzo/ecce","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/rcdilorenzo/ecce","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rcdilorenzo%2Fecce","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rcdilorenzo%2Fecce/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rcdilorenzo%2Fecce/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rcdilorenzo%2Fecce/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/rcdilorenzo","download_url":"http
s://codeload.github.com/rcdilorenzo/ecce/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rcdilorenzo%2Fecce/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28531991,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-18T00:39:45.795Z","status":"online","status_checked_at":"2026-01-18T02:00:07.578Z","response_time":98,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data-science","fastapi","fully-connected-network","interactive-visualizations","keras-tensorflow","reactjs"],"created_at":"2024-08-02T09:01:00.979Z","updated_at":"2026-01-18T06:01:20.677Z","avatar_url":"https://github.com/rcdilorenzo.png","language":"JavaScript","funding_links":[],"categories":["JavaScript","data-science"],"sub_categories":[],"readme":"# Exploratory Core Concept Extraction (Ecce)\n\n![GPLv3](https://img.shields.io/badge/license-GPLv3-blue.svg)\n![last commit](https://img.shields.io/github/last-commit/rcdilorenzo/ecce.svg)\n\n[![Screenshot](https://user-images.githubusercontent.com/634167/56903950-46009900-6a6b-11e9-8d8e-51b6fdf21a4c.png)](https://ecce.rcd.ai)\n\n## Introduction\n\n_ecce_ = \"behold\" (Latin)\n\n\u003e Deuteronomy 5:24 (ESV)\n\u003e\n\u003e And you said, ‘Behold, the Lord our God has shown us his glory and greatness, and we have heard\nhis voice out of the midst of the fire. 
This day we have seen God speak with man, and man still\nlive.\n\nFor thousands of years, people have studied the Bible from countless perspectives with diverse\napproaches towards various goals. As a Christian myself, I have read, discussed, and learned\nfrom it both in personal study and through others. With the plethora of related documents in\nthe form of commentaries, topical indexes, dictionaries, and cross-references, the Bible has\nbeen scoured from cover to cover throughout the ages.\n\nThe application of this project is two-fold. The first objective is to create a visual exploration of\nthe topics from the Bible. If time permits, this would be accomplished using an interactive\nwebsite that gives users a way to see related passages that were only previously linked in a\nmanual fashion. Second, the trained network will be used to predict both related topics and\nScripture references from arbitrary text (similar in form to Bible verses).\n\n## Overview\n\nThis project is the intersection and analysis of three data sources: English\nStandard Version (ESV Bible translation), Nave's Topical Index, and Treasury of\nScripture Knowledge (TSK, cross-references). The actual data processing and\nentire flow of the project can be found in the [rendered notebook](ecce.ipynb).\nAdditional interactive exploratory data analysis can be found in [several React\ncomponents from the web app](https://ecce.rcd.ai/eda). The primary interaction\nin the web app flows through two models. The topic model combines ESV verse text\nwith a filtered list of Nave's topics (at least 30 verses per topic). The\ncluster model combines ESV verse text with the cross-references from TSK such\nthat groups of passages can be predicted.\n\n[![Image](https://user-images.githubusercontent.com/634167/57022702-e7b7ef80-6bfd-11e9-97c0-6f3705767adc.png)](https://ecce.rcd.ai/eda)\n\nIn addition, I presented this project in my final semester for an M.S. in Data\nScience. 
The slides I used to present can be found in the repository:\n\n[\u003cimg src=\"https://user-images.githubusercontent.com/634167/57033003-b2b99600-6c19-11e9-9857-21636c775c5f.png\" alt=\"Slides\" width=\"60%\" /\u003e](./ecce-slides.pdf)\n\n## Data Sources\n\n**English Standard Version.** Text from English Standard Version (2001) is\nemployed using JSON from [honza/bibles](https://github.com/honza/bibles). [All\ncopyrights remain with\nCrossway](https://www.esv.org/resources/esv-global-study-bible/copyright-page/).\u003csup\u003e1\u003c/sup\u003e\nPassages longer than three verses are truncated in the interface and link\ndirectly to BibleGateway.\n\n**Nave's Topical Index.** Topics were extracted from text files assembled by the\nfolks behind JustVerses.com from the original, public domain PDF. Although three\nlevels of data are available (topics, categories, and sub-topics), the primary\nfocus was the top-level topics with a total of ~4,200 topics that intersected\nwith verses available from the ESV.\n\n**Treasury of Scripture Knowledge.** Cross-references were also extracted from\ntext files downloaded from JustVerses.com from the original, public domain data.\nThese verses were associated with the ESV text by validating the references from\njust over 63,500 cross-reference clusters.\n\n## Topic Model\n\n![nave-diagram](https://user-images.githubusercontent.com/634167/56922117-72c9a600-6a95-11e9-96ba-a18e63bb0b9c.png)\n\n## Cluster (Passage) Model\n\n![tsk-diagram](https://user-images.githubusercontent.com/634167/56922159-8c6aed80-6a95-11e9-8cc5-9de40903d173.png)\n\n## Results\n\nBoth of the highest performing models ended up being extremely large\nfully-connected neural networks although multiple types of recurrent\narchitectures were explored (LSTMs and GRUs) with word embeddings from\n[GloVe](https://nlp.stanford.edu/projects/glove/). The topic model came in at\n435MB with 36,315,622 parameters with an input size of 13,337 and an output of\n853 topics. 
The cluster model was 2.3GB with 191,259,581 parameters with an\ninput size of 150 (truncated SVD of encoded word vocabulary) and an output of\n63,581 clusters of cross-references.\n\n### Topic Model\n\nUsing data from Nave's Topical Index (~4,200 topics without filtering), all of\nthe following model revisions were trained on 21,106 verses, validated on 3,725\nverses, and evaluated on 6,208 verses.\n\nName            |  Categorical Accuracy  |  Notes\n----------------|------------------------|-----------------------------------------------------------------\nlstm-base       |  2.95%                 |  sequence of words, no word embeddings, ~4200 possible topics\nlstm-b4cab4     |  5.72%                 |  tuned and tweaked, reduce to ~850 topics, word embeddings from glove.42B.300d (includes 92.55% of ESV words)\nsvd-bow-cb8915  |  6.91%                 |  switch to truncated SVD with bag-of-words\nsvd-bow-52a075  |  6.62%                 |  additional experiments, exclude top two topics\nsvd-bow-88bf90  |  8.21%                 |  make SVD 200 components (102% of last model size)\nsvd-bow-ced288  |  7.06%                 |  make SVD 150 components (200 was too big for initial production machine)\nnave-4576e8     |  13.61%                |  properly filter topics and remove SVD due to smaller model size, use vocabulary count vectorizer as direct input\n\n\n\n### Cluster Model\n\nThe cluster model was trained on cross-references from the Treasury of Scripture\nKnowledge. 
All of the following model revisions were trained on 20,837 verses,\nvalidated on 2,678 verses, and evaluated on 6,129 verses (70%-10%-20% split).\n\nName                |  Categorical Accuracy  |  Notes\n--------------------|------------------------|-----------------------------------------------------------\ntsk-cluster-87b509  |  0.25%                 |  initial fully-connected model\ntsk-cluster-f13345  |  0.33%                 |  add dropout layers and tweak architecture\ntsk-cluster-1d7203  |  1.05%                 |  fix verses to have multiple uuids\ntsk-cluster-26869f  |  1.14%                 |  add hidden layer and overfit with 10 patience epochs\ntsk-cluster-4e1698  |  1.16%                 |  make SVD 200 components (doubled model size)\ntsk-cluster-47f717  |  1.24%                 |  make SVD 150 components (200 was too big for production)\ntsk-cluster-8a1db9  |  1.32%                 |  change epoch patience to 2 instead of 3\n\n## Usage\n\nIf you're interested in running the project or extending the existing work, you'll need to do the following setup the first time.\n\n```bash\n# Download sources\n./download.sh\n\n# Install Python version and setup dependencies\npyenv install 3.6.8\npyenv virtualenv 3.6.8 $(cat .python-version)\npip install -r requirements.txt\n\n# Download spaCy model\npython -m spacy download en\n```\n\nWith this setup complete, some of the primary ways you'd want to\ninteract with the code are provided by the command line utility that\nincludes documentation for each command.\n\n```\n❯ python -m ecce -h\nusage: __main__.py [-h]\n                   {nave-export,topic-export,train-nave,train-tsk,predict-nave,predict-tsk}\n                   ...\n\npositional arguments:\n  {nave-export,topic-export,train-nave,train-tsk,predict-nave,predict-tsk}\n    nave-export         Export processed data from Nave's Topical Index\n    topic-export        Preprocess topics and export with ESV text\n    train-nave          Train an neural network model 
on Nave data\n    train-tsk           Train cluster model on TSK data\n    predict-nave        (REPL) Predict topics based on text\n    predict-tsk         (REPL) Predict TSK clusters based on text\n\noptional arguments:\n  -h, --help            show this help message and exit\n```\n\n## Additional Information\n\nEcce: ML Prediction of Bible Topics and Passages\n\nCopyright (C) 2019 Christian Di Lorenzo\n\nThis program is free software: you can redistribute it and/or modify\nit under the terms of the GNU General Public License as published by\nthe Free Software Foundation, either version 3 of the License, or\n(at your option) any later version.\n\nThis program is distributed in the hope that it will be useful,\nbut WITHOUT ANY WARRANTY; without even the implied warranty of\nMERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the\nGNU General Public License for more details.\n\nYou should have received a copy of the GNU General Public License\nalong with this program.  If not, see \u003chttps://www.gnu.org/licenses/\u003e.\n\n\u003cbr\u003e\n\u003cbr\u003e\n\n\u003csup\u003e1\u003c/sup\u003e *If you believe that the use of ESV text is in violation of\ncopyrights, please send me a direct message with your reasoning so that I can\nremain above board. My current understanding is that using the 2001 version is\nnot prohibitive in the manner I am using it assuming the entire application is\nopen, noncommercial, and not exposing entire books of the Bible.*\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frcdilorenzo%2Fecce","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Frcdilorenzo%2Fecce","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frcdilorenzo%2Fecce/lists"}