https://github.com/divakarkumarp/phi-3-vision-ms-multimodal
Phi-3-Vision-128K-Instruct Demo
- Host: GitHub
- URL: https://github.com/divakarkumarp/phi-3-vision-ms-multimodal
- Owner: divakarkumarp
- License: MIT
- Created: 2024-06-06T17:31:27.000Z (12 months ago)
- Default Branch: main
- Last Pushed: 2024-06-08T12:39:08.000Z (11 months ago)
- Last Synced: 2025-01-22T08:13:18.324Z (4 months ago)
- Topics: phi-3-vision, python
- Language: Jupyter Notebook
- Homepage:
- Size: 604 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Phi-3-Vision-Microsoft-Multimodal
Phi-3 Vision is Microsoft's first multimodal model, bringing together language and vision capabilities and supporting a context length of 128K tokens. The model underwent a rigorous enhancement process, incorporating both supervised fine-tuning and direct preference optimization, to ensure precise instruction adherence and robust safety measures.
[Demo with Huggingface🤗](https://github.com/divakarkumarp/Phi-3-Vision-MS-Multimodal/blob/main/Phi_3_vision_128k_instruct.ipynb)

* Hugging Face🤗 : [Phi-3-vision-128k-instruct model card](https://huggingface.co/microsoft/Phi-3-vision-128k-instruct?library-transformers)
* Hugging Face🤗 : [Llama-3-8B-Instruct-Gradient-1048k](https://huggingface.co/gradientai/Llama-3-8B-Instruct-Gradient-1048k)
* Hugging Face🤗 : [Llama 3 in the Transformers docs](https://huggingface.co/docs/transformers/main/en/model_doc/llama3)
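
The kind of image-question round trip the demo notebook performs can be sketched roughly as follows with the `transformers` library. This is a minimal illustration, not the repo's own code: `build_messages` and `describe_image` are hypothetical helper names, and the dtype, device, and generation parameters are assumptions.

```python
# Sketch: querying microsoft/Phi-3-vision-128k-instruct about an image
# via Hugging Face transformers. Helper names are illustrative only.
from typing import Dict, List

MODEL_ID = "microsoft/Phi-3-vision-128k-instruct"


def build_messages(question: str, n_images: int = 1) -> List[Dict[str, str]]:
    """Build a user chat message with one <|image_k|> placeholder per image,
    the format Phi-3-Vision's chat template expects."""
    placeholders = "".join(f"<|image_{k}|>\n" for k in range(1, n_images + 1))
    return [{"role": "user", "content": placeholders + question}]


def describe_image(image, question: str) -> str:
    """Run one image-question inference (downloads several GB of weights)."""
    import torch
    from transformers import AutoModelForCausalLM, AutoProcessor

    # trust_remote_code is required because the model ships custom code.
    processor = AutoProcessor.from_pretrained(MODEL_ID, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        trust_remote_code=True,
        torch_dtype=torch.float16,  # assumed precision; fp32 also works
        device_map="auto",
    )

    prompt = processor.tokenizer.apply_chat_template(
        build_messages(question), tokenize=False, add_generation_prompt=True
    )
    inputs = processor(prompt, [image], return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=256, do_sample=False)

    # Drop the prompt tokens so only the generated answer is decoded.
    answer_ids = out[:, inputs["input_ids"].shape[1]:]
    return processor.batch_decode(answer_ids, skip_special_tokens=True)[0]
```

With an image loaded via `PIL.Image.open(...)`, a call like `describe_image(img, "What is shown in this image?")` returns the model's answer as plain text.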