{"id":14155080,"url":"https://github.com/nexus-stc/stc","last_synced_at":"2025-04-12T21:28:49.046Z","repository":{"id":156060941,"uuid":"626313678","full_name":"nexus-stc/stc","owner":"nexus-stc","description":"Distributed free search engine and AI tools that grant access to knowledge","archived":false,"fork":false,"pushed_at":"2023-11-24T07:18:14.000Z","size":2473,"stargazers_count":428,"open_issues_count":17,"forks_count":36,"subscribers_count":10,"default_branch":"master","last_synced_at":"2025-04-04T01:14:09.295Z","etag":null,"topics":["books","database","ipfs","knowledge","scholarly-articles","summa"],"latest_commit_sha":null,"homepage":"http://standard-template-construct.org","language":"TypeScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/nexus-stc.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-04-11T08:09:43.000Z","updated_at":"2025-04-02T04:57:36.000Z","dependencies_parsed_at":"2024-08-17T08:04:40.364Z","dependency_job_id":"d31c6dd8-af3d-450b-a2dc-31fb35b4056d","html_url":"https://github.com/nexus-stc/stc","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nexus-stc%2Fstc","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nexus-stc%2Fstc/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nexus-stc%2Fstc/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nexus-stc%2Fstc/manifests","owner_url":"https://repos.eco
syste.ms/api/v1/hosts/GitHub/owners/nexus-stc","download_url":"https://codeload.github.com/nexus-stc/stc/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248634038,"owners_count":21136966,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["books","database","ipfs","knowledge","scholarly-articles","summa"],"created_at":"2024-08-17T08:01:48.312Z","updated_at":"2025-04-12T21:28:49.012Z","avatar_url":"https://github.com/nexus-stc.png","language":"TypeScript","funding_links":[],"categories":["TypeScript","OpenSource","books"],"sub_categories":["OpenSource Bots"],"readme":"# Standard Template Construct\n\nWelcome, developer!\nYou've arrived at the repository for [STC](https://libstc.cc), the library, search engine and AI tooling offering free access to academic knowledge and works of fiction.\n\n![](/web/public/favicon.svg)\n\n[STC](https://libstc.cc) | [Help Center](https://libstc.cc/#/help)\n\n## Getting Started\n\n- Explore our search features at [Web STC](https://libstc.cc), or through one of the Telegram bots listed in the bio of our [channel](https://t.me/nexus_search) (not an ad, just a safety measure)\n- [Discover](https://libstc.cc/#/help/replicate) how to set up your own STC instance, enabling you to enjoy the same search capabilities in your local environment\n- Learn [how to access a large corpus](/geck) of high-quality scholarly texts using Python and [use it in AI apps](/cybrex)\n\n## Details\n\nIn essence, STC is the [Summa](https://github.com/izihawa/summa) search engine coupled with databanks. 
\nThese databanks reside on [IPFS](https://ipfs.tech/) in a format that can be searched without downloading the entire dataset. \nThe search engine library can function as a standalone server, an embeddable Python library (requiring no additional software!), and a WASM-compiled module that can be used in a browser.\nThe last option makes it possible to embed the search engine in a static site that can itself be deployed over IPFS. This is how [Web STC](https://libstc.cc) is served.\n\nPutting everything on IPFS allows you to open STC in your browser or on your server and avoid the use of centralized servers that may lose or censor data.\n\n## Components\n\n- [Web STC](/web) is a browser-based interface with an embedded search engine that can be entirely deployed on IPFS and used in browsers\n- [GECK](/geck) is a Python library and Bash tool for setting up and interacting with STC programmatically\n- [Cybrex AI](/cybrex) library pairs STC with AI tools such as OpenAI or free LLMs for processing stored data\n- [STC Hub API](https://libstc.cc/#/help/stc-hub-api) is a plain API for accessing scholarly publications by their DOIs through `kubo` command-line tools or even through HTTP.\n- [Telegram Nexus Bot](/tgbot) allows users to access STC via Telegram, one of the most popular messaging platforms.\n\n## Roadmap\n\n| Part                | Task                                                                                                                                         | Description                                                                                                                                                                                                                           
|\n|---------------------|----------------------------------------------------------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|\n| Library Stewardship |                                                                                                                                              |                                                                                                                                                                                                                                       | \n|                     | ✅ Assimilation of LibGen corpus                                                                                                              | Transition of all items to `nexus_science`                                                                                                                                                                                            |\n|                     | 🚧 Assimilation of SciMag corpus                                                                                                             | Significant task of transferring scimag corpus to IPFS                                                                                                                                                                                |\n|                     | ✅ Structured content                                                                                                                         | Enhance GROBID extraction (headers + content) and store content in structured_content JSON column. 
Extract entities for cross-linking in Web STC                                                                                      |\n|                     | 🚧 Implementing classification ([articles](https://github.com/nexus-stc/stc/issues/12), [books](https://github.com/nexus-stc/stc/issues/13)) |                                                                                                                                                                                                                                       |\n| Web STC             |                                                                                                                                              |                                                                                                                                                                                                                                       |\n|                     | UX improvement                                                                                                                               | STC often requires loading of large data chunks, currently reflected only by a spinner. The UX needs improvement. Following structured content implementation, we can highlight headers and generate cross-links in abstracts/content |\n|                     | Enhancing availability                                                                                                                       | Further testing needed on diverse devices and networks                                                                                                                                                                                |\n|                     | Bookshelf                                                                                                                                    | STC has all tools for generating bookshelves that may offer users high-quality suggestions on read. 
                                                                                                                                  |\n| Cybrex AI           |                                                                                                                                              |                                                                                                                                                                                                                                       |\n|                     | First-class support of local LLM                                                                                                             | Extensive testing of prompts with documents is required to identify the smallest model capable of efficiently executing QA and summarization tasks. Most 13-15B models are currently failing (quantized, on CPU)                      |\n|                     | Building an embeddings dataset                                                                                                               | The goal is to build a comprehensive dataset with DOIs and document embeddings. Currently, the Instructor XL model appears most promising, but further testing is necessary                                                           |\n|                     | Refining and fixing metadata ([cleaning `content`](https://github.com/nexus-stc/stc/issues/14))                                              | Areas for improvement include: detected language, tags, keywords, automated abstracts, Dewey classification                                                                                                                           |\n|                     | Build QA on local LLM                                                                                                                        | Such a system should be independently operable and also accessible via Telegram.                    
                                                                                                                                  |\n|                     | Fine-tuning LLMs on STC                                                                                                                      |                                                                                                                                                                                                                                       |\n| Distribution        |                                                                                                                                              |                                                                                                                                                                                                                                       |\n|                     | Building STC Box                                                                                                                             | Develop and maintain a definitive guide and scripts for replicating and launching STC on compact devices like PI computers or TV Boxes                                                                                                |\n|                     | Global replication                                                                                                                           | The goal is to replicate STC (including the search database and papers) a minimum of 100 times across at least 30 countries                                                                                                           |\n|                     | Establishing Frontier Outposts                                                                                                               | Investigate strategies to replicate STC on an orbiting satellite or another planet in the solar 
system (Mars or Europa preferred)                                                                                                     |\n| Communities         |                                                                                                                                              |                                                                                                                                                                                                                                       |\n|                     | ✅ [Forming Science Communities on Telegram](https://t.me/+CVQ4OIRoU85hZDc8)                                                                  | Initiate the first version of Telegram-based forums focusing on specific scientific topics                                                                                                                                            |\n|                     | Addressing Copyright Issues                                                                                                                  | Organize more activities aimed at challenging the copyright laws for scholarly and educational writings                                                                                                                               |\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnexus-stc%2Fstc","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fnexus-stc%2Fstc","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnexus-stc%2Fstc/lists"}