https://github.com/sanchariii/c4gt-23-proposal
Implementation Of An Asynchronous Document Embedding Uploader API
fastapi huggingface-transformers nlp-machine-learning uvicorn
- Host: GitHub
- URL: https://github.com/sanchariii/c4gt-23-proposal
- Owner: Sanchariii
- Created: 2023-06-09T06:20:07.000Z (about 3 years ago)
- Default Branch: main
- Last Pushed: 2023-06-11T10:18:52.000Z (about 3 years ago)
- Last Synced: 2025-03-01T20:47:32.010Z (over 1 year ago)
- Topics: fastapi, huggingface-transformers, nlp-machine-learning, uvicorn
- Homepage:
- Size: 217 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# C4GT'23-Proposal
Proposal for [C4GT](https://www.codeforgovtech.in/) 2023 | Project Idea: Document Uploader | Date: 11 June 2023
# Synopsis
AI tools have become central to today's technology landscape. They use machine learning to improve decision-making, automate tasks, and draw conclusions from data, and their applications range from natural language processing to computer vision. Natural Language Processing (NLP) techniques in particular allow machines to interpret, analyse, and generate human language, which makes tasks such as text classification, sentiment analysis, and language translation easier.
The proposed idea uses Python, a CLI, Natural Language Processing (NLP), system design, Hugging Face Transformers, FastAPI, and Uvicorn to build an asynchronous Document Embedding Uploader API. The project, AsyncDocEmbed, aims to simplify document embedding and sharing. It offers a simple API for uploading files, creating embeddings for document chunks asynchronously, and downloading the complete file at any time. By combining Hugging Face Transformers with NLP techniques, AsyncDocEmbed lets users extract useful representations from their documents and apply them in a variety of applications. The proposal discusses the planned implementation details and scalable high-level designs (HLDs).
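The upload, embed, and download flow described above can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not the proposal's implementation: `DocumentStore`, `fake_embedding`, and `demo` are hypothetical names, the in-memory dictionaries stand in for temporary storage, and the hash-based vector stands in for a real model call. In the proposed service, the methods would sit behind FastAPI endpoints served by Uvicorn.

```python
import asyncio
import hashlib
import uuid


def fake_embedding(data, dim=4):
    """Placeholder embedding (assumption): derive a small vector from a hash."""
    digest = hashlib.sha256(data).digest()
    return [byte / 255 for byte in digest[:dim]]


class DocumentStore:
    """Hypothetical in-memory stand-in for temporary chunk storage."""

    def __init__(self):
        self.chunks = {}      # doc_id -> {chunk index: bytes}
        self.embeddings = {}  # doc_id -> {chunk index: vector}
        self.tasks = {}       # doc_id -> list of pending embedding tasks

    async def upload_chunk(self, doc_id, index, data):
        """Store a chunk and schedule its embedding without waiting for it."""
        self.chunks.setdefault(doc_id, {})[index] = data
        task = asyncio.create_task(self._embed(doc_id, index, data))
        self.tasks.setdefault(doc_id, []).append(task)
        return {"doc_id": doc_id, "chunk": index, "status": "accepted"}

    async def _embed(self, doc_id, index, data):
        # Run the (potentially slow) embedding in a worker thread so the
        # event loop stays free to accept further uploads.
        vector = await asyncio.to_thread(fake_embedding, data)
        self.embeddings.setdefault(doc_id, {})[index] = vector

    async def wait_for_embeddings(self, doc_id):
        """Block until every scheduled embedding for this document is done."""
        await asyncio.gather(*self.tasks.get(doc_id, []))

    def download(self, doc_id):
        """Reassemble the full file from its chunks in index order."""
        parts = self.chunks[doc_id]
        return b"".join(parts[i] for i in sorted(parts))


async def demo():
    store = DocumentStore()
    doc_id = str(uuid.uuid4())
    pieces = [b"Hello, ", b"asynchronous ", b"world."]
    # Chunks may arrive out of order; the index restores the original order.
    for index in (2, 0, 1):
        await store.upload_chunk(doc_id, index, pieces[index])
    await store.wait_for_embeddings(doc_id)
    return store, doc_id


if __name__ == "__main__":
    store, doc_id = asyncio.run(demo())
    print(store.download(doc_id))
```

Each upload returns as soon as the chunk is stored, while the embedding runs as a background task. A production design would likely replace the in-process task list with a persistent job queue, which this sketch does not attempt to model.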
The project's core functionality rests on uploading and processing document chunks asynchronously. Users upload document chunks through the API, and the data is held in secure temporary storage. The uploaded chunks are then processed with the Hugging Face Transformers library, using NLP techniques such as tokenization and embedding creation. Because these embeddings capture the semantic meaning and context of each chunk, users can build further analysis and search on top of them.
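To make the chunk processing step concrete, here is a small sketch of splitting a document into overlapping chunks, tokenizing each chunk, and pooling per-token vectors into one embedding per chunk. Everything here is an assumption made for illustration: the function names are hypothetical, the regex tokenizer and hash-based token vectors stand in for a Hugging Face tokenizer and model, and mean pooling is one common way to reduce token vectors to a single vector, not necessarily the method the proposal adopts.

```python
import hashlib
import re


def split_into_chunks(text, max_tokens=8, overlap=2):
    """Split text into word-level chunks that share `overlap` words."""
    if not 0 <= overlap < max_tokens:
        raise ValueError("overlap must be non-negative and smaller than max_tokens")
    words = text.split()
    step = max_tokens - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_tokens]))
        if start + max_tokens >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks


def tokenize(chunk):
    """Toy tokenizer (assumption): lowercase word tokens."""
    return re.findall(r"\w+", chunk.lower())


def token_vector(token, dim=8):
    """Placeholder for a model's per-token output (assumption)."""
    digest = hashlib.sha256(token.encode("utf-8")).digest()
    return [byte / 255.0 for byte in digest[:dim]]


def embed_chunk(chunk, dim=8):
    """Mean-pool the token vectors of a chunk into one fixed-size vector."""
    tokens = tokenize(chunk)
    if not tokens:
        return [0.0] * dim
    vectors = [token_vector(token, dim) for token in tokens]
    return [sum(column) / len(vectors) for column in zip(*vectors)]


if __name__ == "__main__":
    document = " ".join(f"word{i}" for i in range(20))
    for chunk in split_into_chunks(document):
        print(len(embed_chunk(chunk)), chunk)
```

The overlap between neighbouring chunks is a design choice that keeps some shared context across chunk boundaries. With a real model, `token_vector` would be replaced by the model's hidden states, and padding tokens would need to be excluded from the average.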
Keywords: Asynchronous, Document Uploader, API, NLP, Scalable
The full proposal is available here: [Sanchari Ray C4GT 2023 Proposal](https://drive.google.com/file/d/1N-aeZRIrF57aO3yKJRNPxS6in0xSMOAa/view?usp=sharing)
# Thank-You Note
I want to express my sincere gratitude to all the mentors involved and to the entire community for giving me the chance to work on such a creative project. I will do my best to contribute everything I can to this community. Thank you for patiently answering all of my questions while I was writing this proposal, and for the mentorship and direction you provide as we develop our knowledge and skills. I'm eager to learn new things and look forward to the journey ahead.