{"id":28224524,"url":"https://github.com/incept5/ai-learning","last_synced_at":"2025-06-18T17:35:01.936Z","repository":{"id":290948921,"uuid":"976077864","full_name":"Incept5/ai-learning","owner":"Incept5","description":"This repository contains a collection of demos and examples for learning about Natural Language Processing (NLP) and Large Language Models (LLMs). It includes examples of tokenization, embeddings, sentiment analysis, and more","archived":false,"fork":false,"pushed_at":"2025-05-01T13:32:29.000Z","size":164,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":6,"default_branch":"main","last_synced_at":"2025-06-12T07:43:11.635Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Incept5.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-05-01T13:00:30.000Z","updated_at":"2025-05-01T13:32:32.000Z","dependencies_parsed_at":"2025-05-01T14:38:39.938Z","dependency_job_id":"300c1419-ee23-4891-82bf-f504736e840b","html_url":"https://github.com/Incept5/ai-learning","commit_stats":null,"previous_names":["incept5/ai-learning"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/Incept5/ai-learning","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Incept5%2Fai-learning","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Incept5%2Fai-learning/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Incept5%2Fai-learning
/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Incept5%2Fai-learning/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Incept5","download_url":"https://codeload.github.com/Incept5/ai-learning/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Incept5%2Fai-learning/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":260599146,"owners_count":23034452,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-05-18T09:12:08.109Z","updated_at":"2025-06-18T17:34:56.917Z","avatar_url":"https://github.com/Incept5.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# NLP and LLM Learning Repository\n\nThis repository contains a collection of demos and examples for learning about Natural Language Processing (NLP) and Large Language Models (LLMs). 
It includes examples of tokenization, embeddings, sentiment analysis, and more.\n\n## Repository Structure\n\nThe repository is organized into the following main sections:\n\n### Tokenization\n\nThe `tokenization` directory contains demos and examples related to tokenization, which is the process of breaking text into tokens that can be processed by language models.\n\n- [TOKENIZATION_MODELS.md](tokenization/TOKENIZATION_MODELS.md) - Guide to different tokenization models\n- [llm-tokenization-demo.py](tokenization/llm-tokenization-demo.py) - Comprehensive demo comparing GPT-2 and BERT tokenizers\n- [simple_tokenization.py](tokenization/simple_tokenization.py) - Basic tokenization using BERT tokenizer\n- [simple_tokenization_02.py](tokenization/simple_tokenization_02.py) - Comparison between GPT-2 and OPT tokenizers\n- [tiktoken_demo.py](tokenization/tiktoken_demo.py) - Demo of OpenAI's tiktoken library\n- [sentencepiece_demo.py](tokenization/sentencepiece_demo.py) - Demo of Google's SentencePiece tokenizer\n- [tokenization_comparison.py](tokenization/tokenization_comparison.py) - Comparison of different tokenization approaches\n\n### Embeddings\n\nThe `embeddings` directory contains demos and examples related to text embeddings, which are vector representations of text that capture semantic meaning.\n\n- [EMBEDDING_MODELS.md](embeddings/EMBEDDING_MODELS.md) - Guide to different embedding models\n- [embeddings_demo.py](embeddings/embeddings_demo.py) - Demo using SentenceTransformers models\n- [huggingface_embeddings_demo.py](embeddings/huggingface_embeddings_demo.py) - Demo using Hugging Face Transformers models\n- [openai_embeddings_demo.py](embeddings/openai_embeddings_demo.py) - Demo using OpenAI's embedding models\n- [ollama_embedding.py](embeddings/ollama_embedding.py) - Demo using Ollama's local embedding models\n\n### Other Demos\n\n- [gp2_demo.py](gp2_demo.py) - Demo showcasing GPT-2 limitations and hallucinations\n- [sentiment_01.py](sentiment_01.py) - 
Simple sentiment analysis using a pre-trained model\n\n## Setup Instructions\n\n### Prerequisites\n\n- Python 3.8 or higher\n- pip (Python package installer)\n\n### Installation\n\n1. Clone this repository:\n   ```bash\n   git clone https://github.com/Incept5/ai-learning.git\n   cd ai-learning\n   ```\n\n2. Create and activate a virtual environment (recommended):\n   ```bash\n   python -m venv venv\n   \n   # On Windows\n   venv\\Scripts\\activate\n   \n   # On macOS/Linux\n   source venv/bin/activate\n   ```\n\n3. Install the required packages:\n   ```bash\n   pip install -r requirements.txt\n   ```\n\n4. Set up environment variables (for API-based demos):\n   ```bash\n   # Copy the example .env file\n   cp .env.example .env\n   \n   # Edit the .env file with your API keys\n   # Get your OpenAI API key from: https://platform.openai.com/account/api-keys\n   # Get your Hugging Face token from: https://huggingface.co/settings/tokens\n   ```\n\n## Running the Demos\n\n### Tokenization Demos\n\n```bash\n# Basic tokenization demo\npython tokenization/simple_tokenization.py\n\n# Comprehensive tokenization comparison\npython tokenization/llm-tokenization-demo.py\n\n# OpenAI's tiktoken demo\npython tokenization/tiktoken_demo.py\n\n# SentencePiece demo\npython tokenization/sentencepiece_demo.py\n```\n\n### Embedding Demos\n\n```bash\n# List available SentenceTransformers models\npython embeddings/embeddings_demo.py --list-models\n\n# Run embedding demo with a specific model\npython embeddings/embeddings_demo.py --model all-mpnet-base-v2\n\n# OpenAI embeddings demo (requires API key)\npython embeddings/openai_embeddings_demo.py\n\n# Hugging Face embeddings demo\npython embeddings/huggingface_embeddings_demo.py\n\n# Ollama embeddings demo (requires Ollama installation)\npython embeddings/ollama_embedding.py\n```\n\n### Other Demos\n\n```bash\n# GPT-2 limitations demo\npython gp2_demo.py\n\n# Sentiment analysis demo\npython sentiment_01.py\n```\n\n## Additional Resources\n\n- 
[Hugging Face Transformers Documentation](https://huggingface.co/docs/transformers/index)\n- [SentenceTransformers Documentation](https://www.sbert.net/)\n- [OpenAI API Documentation](https://platform.openai.com/docs/api-reference)\n- [Ollama Documentation](https://ollama.com/docs)\n\n## Contributing\n\nContributions are welcome! Please feel free to submit a Pull Request.\n\n## License\n\nThis project is licensed under the MIT License - see the LICENSE file for details.","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fincept5%2Fai-learning","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fincept5%2Fai-learning","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fincept5%2Fai-learning/lists"}