{"id":13645117,"url":"https://github.com/benthecoder/classgpt","last_synced_at":"2025-04-21T13:31:49.041Z","repository":{"id":124124400,"uuid":"609343897","full_name":"benthecoder/ClassGPT","owner":"benthecoder","description":"ChatGPT for lecture slides","archived":false,"fork":false,"pushed_at":"2023-07-30T00:15:32.000Z","size":1206,"stargazers_count":209,"open_issues_count":0,"forks_count":29,"subscribers_count":9,"default_branch":"main","last_synced_at":"2024-04-24T08:31:02.603Z","etag":null,"topics":["chatgpt","gpt","langchain","llama-index","openai","python"],"latest_commit_sha":null,"homepage":"https://benneo.super.site/","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/benthecoder.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2023-03-03T23:01:43.000Z","updated_at":"2024-04-09T06:23:37.000Z","dependencies_parsed_at":null,"dependency_job_id":"a6ee1ff8-5d09-4d8d-9323-097ca38d1029","html_url":"https://github.com/benthecoder/ClassGPT","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/benthecoder%2FClassGPT","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/benthecoder%2FClassGPT/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/benthecoder%2FClassGPT/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/benthecoder%2FClassGPT/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/benthecoder","download_url":"https://
codeload.github.com/benthecoder/ClassGPT/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":223867808,"owners_count":17216961,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["chatgpt","gpt","langchain","llama-index","openai","python"],"created_at":"2024-08-02T01:02:27.712Z","updated_at":"2024-11-09T18:30:18.766Z","avatar_url":"https://github.com/benthecoder.png","language":"Jupyter Notebook","funding_links":[],"categories":["Langchain"],"sub_categories":[],"readme":"# ClassGPT\n\n\u003e ChatGPT for my lecture slides\n\n\u003cimg width=\"1251\" alt=\"SCR-20230307-isgj\" src=\"https://user-images.githubusercontent.com/49143413/223467346-473681e4-6203-4f31-a1f1-253829d4768a.png\"\u003e\n\nBuilt with [Streamlit](https://github.com/streamlit/streamlit), powered by [LlamaIndex](https://github.com/jerryjliu/gpt_index) and [LangChain](https://github.com/hwchase17/langchain).\n\nUses the latest [ChatGPT API](https://platform.openai.com/docs/guides/chat) from [OpenAI](https://openai.com/).\n\nInspired by [AthensGPT](http://athensgpt.com/)\n\n## App Demo\n\nhttps://user-images.githubusercontent.com/49143413/222878151-42354446-5234-41fa-ad36-002dd74a5408.mp4\n\n## How this works\n\n1. Parses pdf with [pypdf](https://pypi.org/project/pypdf/)\n2. 
Index Construction with LlamaIndex's `GPTSimpleVectorIndex`\n   - the `text-embedding-ada-002` model is used to create embeddings\n   - see [vector store index](https://gpt-index.readthedocs.io/en/latest/guides/index_guide.html#vector-store-index) page to learn more\n   - here's a [sample index](notebooks/index.json)\n3. indexes and files are stored on s3\n4. Query the index\n   - uses the latest ChatGPT model `gpt-3.5-turbo`\n\n## Usage\n\n### Configuration and secrets\n\n1. configure aws ([quickstart](https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-quickstart.html))\n\n```bash\n    aws configure\n```\n\n2. create an s3 bucket with a unique name\n\n3. Change the bucket name in the codebase (look for `bucket_name = \"classgpt\"`) to whatever you created.\n\n4. rename `.env.local.example` to `.env` and add your openai credentials\n\n### Locally\n\n1. create python env\n\n```bash\n    conda create -n classgpt python=3.9\n    conda activate classgpt\n```\n\n2. install dependencies\n\n```bash\n    pip install -r requirements.txt\n```\n\n3. 
run streamlit app\n\n```bash\n    cd app/\n    streamlit run app/01_❓_Ask.py\n```\n\n### Docker\n\nAlternatively, you can use Docker\n\n```bash\n    docker compose up\n```\n\nThen open up a new tab and navigate to \u003chttp://localhost:8501/\u003e\n\n## TODO\n\n- [ ] local mode for app (no s3)\n  - [ ] global variable use_s3 to toggle between local and s3 mode\n- [ ] deploy app to streamlit cloud\n  - [ ] have input box for openai key\n  - [ ] uses pyarrow local FS to store files\n- [ ] update code for new langchain update\n- [ ] Custom prompts and tweak settings\n  - [ ] create a settings page for tweaking model parameters and provide custom prompts [example](https://github.com/hayabhay/whisper-ui)\n- [ ] Add ability to query on multiple files\n  - [ ] Compose indices of multiple lectures and query on all of them\n  - [ ] loop through all existing indices, create the ones that haven't been created, and compose them together\n  - references\n    - [Composability — LlamaIndex documentation](https://gpt-index.readthedocs.io/en/latest/how_to/composability.html)\n    - [gpt_index/ComposableIndices.ipynb](https://github.com/jerryjliu/gpt_index/blob/main/examples/composable_indices/ComposableIndices.ipynb)\n    - [Test Complex Queries over Multiple Documents](https://colab.research.google.com/drive/1IJAKd1HIe-LvFRQmd3BCDDIsq6CpOwBj?usp=sharing)\n\n## FAQ\n\n### Tokens\n\nTokens can be thought of as pieces of words. Before the API processes the prompts, the input is broken down into tokens. These tokens are not cut up exactly where the words start or end - tokens can include trailing spaces and even sub-words. 
Here are some helpful rules of thumb for understanding tokens in terms of lengths:\n\n- 1 token ~= 4 chars in English\n- 1 token ~= ¾ words\n- 100 tokens ~= 75 words\n- 1-2 sentences ~= 30 tokens\n- 1 paragraph ~= 100 tokens\n- 1,500 words ~= 2048 tokens\n\nTry the [OpenAI Tokenizer tool](https://platform.openai.com/tokenizer)\n\n[Source](https://help.openai.com/en/articles/4936856-what-are-tokens-and-how-to-count-them)\n\n### Embeddings\n\nAn embedding is a vector (list) of floating point numbers. The distance between two vectors measures their relatedness. Small distances suggest high relatedness and large distances suggest low relatedness.\n\nFor `text-embedding-ada-002`, cost is $0.0004 / 1k tokens or 3000 pages/dollar\n\n- [Embeddings - OpenAI API](https://platform.openai.com/docs/guides/embeddings/use-cases)\n- [What Are Word and Sentence Embeddings?](https://txt.cohere.ai/sentence-word-embeddings/)\n\n### Models\n\nFor `gpt-3.5-turbo` model (ChatGPT API) cost is `$0.002 / 1K tokens`\n\nFor `text-davinci-003` model, cost is `$0.02 / 1K tokens`\n\n- [Chat completion - OpenAI API](https://platform.openai.com/docs/guides/chat)\n\n## References\n\n### Streamlit\n\n- [Increase upload limit of st.file_uploader](https://docs.streamlit.io/knowledge-base/deploy/increase-file-uploader-limit-streamlit-cloud)\n- [st.cache_resource - Streamlit Docs](https://docs.streamlit.io/library/api-reference/performance/st.cache_resource)\n- [Session State](https://docs.streamlit.io/library/api-reference/session-state)\n- [hayabhay/whisper-ui: Streamlit UI for OpenAI's Whisper](https://github.com/hayabhay/whisper-ui)\n\n### Deployment\n\n- [Streamlit Deployment Guide (wiki) - 🚀 Deployment - Streamlit](https://discuss.streamlit.io/t/streamlit-deployment-guide-wiki/5099)\n- [How to Deploy a streamlit application to AWS? 
Part-3](https://www.youtube.com/watch?v=Jc5GI3v2jtE)\n\n### LlamaIndex\n\n- [LlamaIndex Usage Pattern](https://gpt-index.readthedocs.io/en/latest/guides/usage_pattern.html#)\n- [Saving index](https://gpt-index.readthedocs.io/en/latest/guides/usage_pattern.html#optional-save-the-index-for-future-use)\n\nLoading data\n\n- [PDF Loader](https://llamahub.ai/l/file-pdf)\n- [llama-hub github repo](https://github.com/emptycrown/llama-hub/tree/main)\n- [document class](https://github.com/jerryjliu/gpt_index/blob/f07050b84309d53842a3552d3546e765012d168c/gpt_index/readers/schema/base.py#L4)\n- [PDFReader class](https://github.com/emptycrown/llama-hub/blob/main/loader_hub/file/pdf/base.py)\n\nmultimodal\n\n- [llama_index/Multimodal.ipynb at main](https://github.com/jerryjliu/llama_index/blob/main/examples/multimodal/Multimodal.ipynb)\n\nChatGPT\n\n- [gpt_index/SimpleIndexDemo-ChatGPT.ipynb](https://github.com/jerryjliu/gpt_index/blob/main/examples/vector_indices/SimpleIndexDemo-ChatGPT.ipynb)\n\n### Langchain\n\n- [gpt_index/LangchainDemo.ipynb](https://github.com/jerryjliu/gpt_index/blob/main/examples/langchain_demo/LangchainDemo.ipynb)\n- [OpenAIChat](https://langchain.readthedocs.io/en/latest/modules/llms/integrations/openaichat.html)\n\n### Boto3\n\n- [boto3 file_upload does it check if file exists](https://stackoverflow.com/questions/44978426/boto3-file-upload-does-it-check-if-file-exists)\n- [Boto 3: Resource vs Client](https://www.learnaws.org/2021/02/24/boto3-resource-client/)\n- [Writing json to file in s3 bucket](https://stackoverflow.com/questions/46844263/writing-json-to-file-in-s3-bucket)\n\n### Docker stuff\n\n- [amazon web services - What is the best way to pass AWS credentials to a Docker container?](https://stackoverflow.com/questions/36354423/what-is-the-best-way-to-pass-aws-credentials-to-a-docker-container)\n- [docker-compose up failing due to: error: can't find Rust compiler · Issue #572 · 
acheong08/ChatGPT](https://github.com/acheong08/ChatGPT/issues/572)\n- [linux - When installing Rust toolchain in Docker, Bash `source` command doesn't work](https://stackoverflow.com/questions/49676490/when-installing-rust-toolchain-in-docker-bash-source-command-doesnt-work)\n- [software installation - How to install a package with apt without the \"Do you want to continue [Y/n]?\" prompt? - Ask Ubuntu](https://askubuntu.com/questions/523962/how-to-install-a-package-with-apt-without-the-do-you-want-to-continue-y-n-p)\n- [How to use sudo inside a docker container?](https://stackoverflow.com/questions/25845538/how-to-use-sudo-inside-a-docker-container)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbenthecoder%2Fclassgpt","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbenthecoder%2Fclassgpt","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbenthecoder%2Fclassgpt/lists"}