{"id":23646897,"url":"https://github.com/maxidonkey/delphihuggingface","last_synced_at":"2026-01-26T23:29:29.664Z","repository":{"id":269644937,"uuid":"906989414","full_name":"MaxiDonkey/DelphiHuggingFace","owner":"MaxiDonkey","description":"The Hugging Face API wrapper for Delphi leverages cutting-edge models to deliver powerful features, including object detection, music generation, text classification, sentiment analysis, image segmentation, speech-to-text transcription, and text generation. ","archived":false,"fork":false,"pushed_at":"2025-01-06T14:09:12.000Z","size":681,"stargazers_count":1,"open_issues_count":0,"forks_count":1,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-01-06T15:22:47.611Z","etag":null,"topics":["api-wrapper","audio-classification","bert","chatbot","delphi","gpt","huggingface","image-classification","image-prompting","music-generation","object-detection","text-classification"],"latest_commit_sha":null,"homepage":"","language":"Pascal","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/MaxiDonkey.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-12-22T14:04:00.000Z","updated_at":"2025-01-06T14:07:03.000Z","dependencies_parsed_at":null,"dependency_job_id":"69749135-f793-49e3-a270-0af730a81b7a","html_url":"https://github.com/MaxiDonkey/DelphiHuggingFace","commit_stats":null,"previous_names":["maxidonkey/delphihuggingface"],"tags_count":2,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MaxiDonkey%2FDelphiHuggingFace",
"tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MaxiDonkey%2FDelphiHuggingFace/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MaxiDonkey%2FDelphiHuggingFace/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MaxiDonkey%2FDelphiHuggingFace/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/MaxiDonkey","download_url":"https://codeload.github.com/MaxiDonkey/DelphiHuggingFace/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":239599078,"owners_count":19665911,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["api-wrapper","audio-classification","bert","chatbot","delphi","gpt","huggingface","image-classification","image-prompting","music-generation","object-detection","text-classification"],"created_at":"2024-12-28T13:47:31.174Z","updated_at":"2025-11-12T03:30:16.691Z","avatar_url":"https://github.com/MaxiDonkey.png","language":"Pascal","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Delphi Hugging Face API\n\n___\n![GitHub](https://img.shields.io/badge/IDE%20Version-Delphi%2010.3/11/12-yellow)\n![GitHub](https://img.shields.io/badge/platform-all%20platforms-green)\n![GitHub](https://img.shields.io/badge/Updated%20the%2012/22/2024-blue)\n\n\u003cbr/\u003e\n\u003cbr/\u003e\n\n- [Introduction](#Introduction)\n    - [Resources available on Hugging Face Hub](#Resources-available-on-Hugging-Face-Hub)\n    - [Serverless Inference API](#Serverless-Inference-API)\n    - [Advantages of using 
Hugging Face Hub](#Advantages-of-using-Hugging-Face-Hub)\n    - [Rate Limits and Supported Models](#Rate-Limits-and-Supported-Models)\n    - [Licenses and Compliance](#Licenses-and-Compliance)\n    - [Tutorial content](#Tutorial-content)\n- [Remarks](#remarks)\n- [Tools for simplifying this tutorial](#Tools-for-simplifying-this-tutorial)\n- [Asynchronous callback mode management](#Asynchronous-callback-mode-management)\n- [Exploration Journey](#Exploration-Journey)\n    - [Initialization](#initialization)\n    - [Hugging Face Models Overview](#Hugging-Face-Models-Overview)\n        - [Model inference WARM COLD](#Model-inference-WARM-COLD)\n    - [Music-gen](#Music-gen)\n    - [Image object detection](#Image-object-detection)\n    - [Text To Sentiment analysis](#Text-To-Sentiment-analysis)\n    - [Audio classification](#Audio-classification)\n        - [Speech emotion recognition](#speech-emotion-recognition)\n        - [Gender recognition](#Gender-recognition)\n    - [Image classification](#Image-classification)\n    - [Image Segmentation](#Image-Segmentation)\n    - [Zero-Shot classification](#Zero-Shot-classification)\n    - [Token Classification](#Token-Classification)\n    - [Question Answering](#Question-Answering)\n    - [Table Question Answering](#Table-Question-Answering)\n    - [Fill-mask](#Fill-mask)\n    - [Text Classification](#Text-Classification)\n    - [Summarization](#Summarization)\n- [Common Ground Functionalities Across API Ecosystems](#Common-Ground-Functionalities-Across-API-Ecosystems)\n    - [Embeddings](#Embeddings)\n    - [Chat](#Chat)\n        - [Multi Turn Conversation](#Multi-Turn-Conversation)\n        - [Streamed Multi Turn Conversation](#Streamed-Multi-Turn-Conversation)\n        - [Vision](#Vision)\n        - [Use tools](#Use-tools)\n    - [Text Generation](#Text-Generation)\n    - [Translation](#Translation)\n    - [Image Generation](#Image-Generation)\n    - [Text-to-Speech](#Text-to-Speech)\n    - [Automatic Speech 
Recognition](#Automatic-Speech-Recognition)\n- [Contributing](#contributing)\n- [License](#license)\n \n\u003cbr/\u003e\n\u003cbr/\u003e\n\n\n# Introduction\n\n**Hugging Face Hub** is an open-source collaborative platform dedicated to democratizing access to artificial intelligence (AI) technologies. This platform hosts a vast collection of models, datasets, and interactive applications, facilitating the exploration, experimentation, and integration of AI solutions into various projects.\n[Official page](https://huggingface.co/docs/hub/index)\n\n## Resources available on Hugging Face Hub\n\n- **Models:** The Hub offers a multitude of pre-trained models covering domains such as natural language processing (NLP), computer vision, and audio recognition. These models are suited for various tasks, including text generation, classification, object detection, and speech transcription. \n- **Datasets:** A diverse library of datasets is available for training and evaluating your own models, providing a foundation for developing customized solutions. \n- **Spaces:** The Hub hosts interactive applications that allow you to visualize and test models directly from a browser. These spaces are useful for demonstrating model capabilities or conducting quick analyses. \n\n\u003cbr/\u003e\n\n## Serverless Inference API\n\nHugging Face Hub offers a serverless Inference API, enabling rapid integration of AI models into your projects without the need for complex infrastructure management.\n\n\u003cbr/\u003e\n\n## Advantages of using Hugging Face Hub\n\n- **Time-saving:** Models are ready to use, eliminating the need to train or deploy them locally, which accelerates the development of applications.\n- **Scalability:** The Hub's infrastructure ensures automatic scaling, load balancing, and efficient caching.\n\n\u003cbr/\u003e\n\nIn summary, **Hugging Face Hub** is a resource for integrating AI models into projects. 
With its serverless Inference API and collection of ready-to-use resources, it offers a solution to enhance applications with AI capabilities while simplifying their implementation and maintenance.\n\n\u003cbr/\u003e\n\n## Rate Limits and Supported Models\n\nBy subscribing, you gain access to thousands of models. You can explore the benefits of individual, professional, and enterprise subscriptions by following the links below:\n\n- [Rate limits](https://huggingface.co/docs/api-inference/rate-limits)\n- [Supported models](https://huggingface.co/docs/api-inference/supported-models)\n\n\u003cbr/\u003e\n\n## Licenses and Compliance\n\nWhen integrating models or datasets from **Hugging Face Hub** into your projects, it is crucial to pay close attention to the associated licenses. Every resource hosted on the platform comes with a specific license that outlines the terms of use, modification, and distribution. A thorough understanding of these licenses is essential to ensure the legal and ethical compliance of your developments.\n\n**Why is this important?**\n\n- **Legal compliance:** Using a resource without adhering to its license terms can lead to legal violations, exposing your project to potential risks.\n- **Respect for creators' rights:** Licenses protect the rights of creators. By respecting them, you acknowledge and honor their work.\n- **Transparency and ethics:** Following the conditions of licenses promotes responsible and ethical use of open-source technologies.\n\nRefer to the `Model Card` or `Dataset Card` for each model or dataset used in your application.\n\n\u003cbr/\u003e\n\n## Tutorial content\n\nThe **Hugging Face Hub** provides open-source libraries such as `Transformers`, enables integration with `Gradio`, and offers evaluation tools like `Evaluate`. 
However, these aspects will not be covered in this tutorial, as they are beyond the scope of this document.\n\nInstead, this tutorial will focus on using the APIs with Delphi, highlighting key features such as image and sound classification, music generation (`music-gen`), sentiment analysis, object detection in images, image segmentation, and all natural language processing (NLP) functions.\n\n\u003cbr/\u003e\n\n# Remarks\n\n\u003e [!IMPORTANT]\n\u003e\n\u003e This is an unofficial library. **Hugging Face** does not provide any official library for `Delphi`.\n\u003e This repository contains a `Delphi` implementation of the [Hugging Face](https://huggingface.co/docs/api-inference) public API.\n\n\u003cbr/\u003e\n\n# Tools for simplifying this tutorial\n\nTo simplify the example code provided in this tutorial, I have included two units in the source code: `VCL.HuggingFace.Tutorial` and `FMX.HuggingFace.Tutorial`. Depending on the option you choose to test the provided source code, you will need to instantiate either the `TVCLHuggingFaceSender` or `TFMXHuggingFaceSender` class in the application's `OnCreate` event, as follows:\n\n\u003e[!TIP]\n\u003e```Pascal\n\u003e//uses VCL.HuggingFace.Tutorial;\n\u003e\n\u003e  HFTutorial := TVCLHuggingFaceSender.Create(Memo1, Image1, Image2, MediaPlayer1);\n\u003e```\n\u003e\n\u003eor\n\u003e\n\u003e```Pascal\n\u003e//uses FMX.HuggingFace.Tutorial;\n\u003e\n\u003e  HFTutorial := TFMXHuggingFaceSender.Create(Memo1, Image1, Image2, MediaPlayer1);\n\u003e```\n\u003e\n\nMake sure to add a `TMemo`, two `TImage` and a `TMediaPlayer` component to your form beforehand.\n\n\u003cbr/\u003e\n\n# Asynchronous callback mode management\n\nIn the context of asynchronous methods, for a method that does not involve streaming, callbacks use the following generic record: `TAsynCallBack\u003cT\u003e = record` defined in the `HuggingFace.Async.Support.pas` unit. 
This record exposes the following properties:\n\n```Pascal\n   TAsynCallBack\u003cT\u003e = record\n   ... \n       Sender: TObject;\n       OnStart: TProc\u003cTObject\u003e;\n       OnSuccess: TProc\u003cTObject, T\u003e;\n       OnError: TProc\u003cTObject, string\u003e; \n```\n\u003cbr/\u003e\n\nFor methods requiring streaming, callbacks use the generic record `TAsynStreamCallBack\u003cT\u003e = record`, also defined in the `HuggingFace.Async.Support.pas` unit. This record exposes the following properties:\n\n```Pascal\n   TAsynStreamCallBack\u003cT\u003e = record\n   ... \n       Sender: TObject;\n       OnStart: TProc\u003cTObject\u003e;\n       OnSuccess: TProc\u003cTObject\u003e;\n       OnProgress: TProc\u003cTObject, T\u003e;\n       OnError: TProc\u003cTObject, string\u003e;\n       OnCancellation: TProc\u003cTObject\u003e;\n       OnDoCancel: TFunc\u003cBoolean\u003e;\n```\n\nThe name of each property is self-explanatory; if needed, refer to the internal documentation for more details.\n\n\u003cbr/\u003e\n\n# Exploration Journey\n\nThis part of the document is designed to reflect the path I took while uncovering the features and possibilities of `Hugging Face Hub APIs`. Rather than presenting a rigid tutorial, I chose to structure it as an **Exploration Journey** to capture the iterative, curious, and hands-on process of discovery. 
Each step builds on the previous one, showcasing not only what I found but how I approached and learned from the API ecosystem.\n\n\n## Initialization\n\nTo initialize the API instance, you need to [obtain an API key from Hugging Face](https://huggingface.co/settings/tokens).\n\nOnce you have a token, you can initialize the `IHuggingFace` interface, which serves as the entry point to the API.\n\n\u003e [!NOTE]\n\u003e```Pascal\n\u003euses HuggingFace;\n\u003e\n\u003evar HuggingFace := THuggingFaceFactory.CreateInstance(API_KEY);\n\u003e```\n\nAccessing the `list of models` or retrieving the `description of a specific model` uses a different endpoint from the inference API endpoint. To instantiate the interface for that endpoint, use the following code:\n\n```Pascal\nuses HuggingFace;\n\nvar HFHub := THuggingFaceFactory.CreateInstance(API_KEY, True);\n```\n\n\u003e[!Warning]\n\u003e To use the examples provided in this tutorial, especially to work with asynchronous methods, I recommend defining the HuggingFace interface with the widest possible scope.\n\u003e\u003cbr/\u003e\n\u003e So, set `HuggingFace := THuggingFaceFactory.CreateInstance(My_Key);` in the `OnCreate` event of your application.\n\u003e\u003cbr\u003e\n\u003eWhere `HuggingFace: IHuggingFace;`\n\n\u003cbr/\u003e\n\n## Hugging Face Models Overview\n\nA filtered list of models can be obtained directly from the [playground](https://huggingface.co/spaces/enzostvs/hub-api-playground) or from the model search page on the [website](https://huggingface.co/models). \n\u003cbr/\u003e\u003cbr/\u003e\nUsing **Delphi**, this list can also be retrieved programmatically. To support filtering, the `TFetchParams` class, implemented in the `HuggingFace.Hub.Support` unit, must be used. 
This class accurately mirrors all parameters supported by the `/api/models` endpoint.\n\n\n\u003cbr/\u003e\n\n**Synchronously code example**\n\n```Pascal\n// uses HuggingFace, HuggingFace.Types, HuggingFace.Aggregator, FMX.HuggingFace.Tutorial; \n\n  var Models := HFHub.Hub.FetchModels(HFTutorial.UrlNext,\n    procedure (Params: TFetchParams)\n    begin\n      Params.Limit(50);\n      Params.Filter('eng,text-generation');\n    end);\n  try\n    Display(HFTutorial, Models);\n  finally\n    Models.Free;\n  end;\n```\n\n- **Remark :** A paginated result will be returned, containing 50 models per page. \nThe `HFTutorial.UrlNext` variable will store the URL of the next page. By re-executing this code, the next 50 results will be retrieved and displayed.\n\n\u003cbr/\u003e\n\n**Asynchronously code example**\n\n```Pascal\n// uses HuggingFace, HuggingFace.Types, HuggingFace.Aggregator, FMX.HuggingFace.Tutorial; \n\n  HFHub.Hub.FetchModels(HFTutorial.UrlNext,\n    procedure (Params: TFetchParams)\n    begin\n      Params.Limit(50);\n      Params.Filter('text-to-audio');\n    end,\n    function : TAsynModels\n    begin\n      Result.Sender := HFTutorial;\n      Result.OnSuccess := Display;\n      Result.OnError := Display;\n    end);\n```\n\n\u003e[!TIP]\n\u003e The filter parameter queries the `Tags` field in the models' JSON format. 
Use a comma to separate different `Tags` values to include them in the same filter.\n\u003e\n\n\u003cbr/\u003e\n\nTo visualize a model's data, utilize its model ID with the FetchModel method :\n\n```Pascal\n  //Synchronously\n  function FetchModel(const RepoId: string): TRepoModel; overload;\n\n  //Asynchronously\n  procedure FetchModel(const RepoId: string; CallBacks: TFunc\u003cTAsynRepoModel\u003e); overload;\n```\n\n\u003cbr/\u003e\n\n### Model inference WARM COLD\n\nThe ML ecosystem evolves rapidly, and the Inference API provides access to models highly valued by the community, selected based on their recent popularity (likes, downloads, and usage). As a result, the available models may be replaced at any time without prior notice. Hugging Face strives to keep the most recent and popular models ready for immediate use.\n\nThe following distinctions are made:\n\n- **Warm models:** models that are ready to use.\n- **Cold models:** models that require loading before use.\n- **Frozen models:** models currently unavailable for use via the API.\n\nWhen invoking a model in the `COLD` state, it needs to be reloaded, which may result in a 503 error. 
In this case, you must wait before retrying the request with the same model.\nTo avoid the 503 error and wait for the model to reload and transition to the `WARM` state, you can add the following line of code:\n\n```Pascal\n  HuggingFace.WaitForModel := True;\n```\n\nNote : By default, the value of `WaitForModel` is set to False.\n\nRefer to [official documentation](https://huggingface.co/docs/api-inference/parameters)\n\n\u003cbr/\u003e\n\n## Music-gen\n\n[MusicGen](https://huggingface.co/facebook/musicgen-small) is a text-to-music model capable of generating high-quality music samples conditioned on text descriptions or audio prompts.\n\n**Asynchronously code example**\n\n```Pascal\n// uses HuggingFace, HuggingFace.Types, HuggingFace.Aggregator, FMX.HuggingFace.Tutorial; \n\n  HuggingFace.UseCache := False;  //Disable caching\n  HuggingFace.WaitForModel := True;  //Enable waiting for model reloading\n  HFTutorial.FileName := 'music.mp3';\n\n  HuggingFace.Text.TextToAudio(\n    procedure (Params: TTextToAudioParam)\n    begin\n      Params.Model('facebook/musicgen-small');\n      Params.Inputs('Pop music style with bass guitar');\n    end,\n    function : TAsynTextToSpeech\n    begin\n      Result.Sender := HFTutorial;\n      Result.OnStart := Start;\n      Result.OnSuccess := Display;\n      Result.OnError := Display;\n    end);\n```\n\n\u003cbr/\u003e\n\n## Image object detection\n\nFor more details about the `object-detection` task, check out its [dedicated page](https://huggingface.co/tasks/object-detection)! You will find examples and related materials.\n\n\u003e[!NOTE]\n\u003e In the field of `Object Detection`, over 2,913 pre-trained models are available. 
\n\u003e\n\n[DEtection TRansformer (DETR) model](https://huggingface.co/facebook/detr-resnet-50) trained end-to-end on COCO 2017 object detection (118k annotated images).\nThe DETR model is an encoder-decoder transformer with a convolutional backbone.\n\n**Asynchronously code example**\n\n```Pascal\n// uses HuggingFace, HuggingFace.Types, HuggingFace.Aggregator, FMX.HuggingFace.Tutorial; \n\n  var ImageFilePath := 'Z:\\My_Folder\\Images\\My_Image.jpg';\n  HFTutorial.LoadImageFromFile(ImageFilePath);\n  HuggingFace.WaitForModel := True;\n\n  HuggingFace.Image.ObjectDetection(\n    procedure (Params: TObjectDetectionParam)\n    begin\n      Params.Model('facebook/detr-resnet-50');\n      Params.Inputs(ImageFilePath);\n    end,\n    function : TAsynObjectDetection\n    begin\n      Result.Sender := HFTutorial;\n      Result.OnSuccess := Display;\n      Result.OnError := Display;\n    end);\n```\n\n![Object detection](/../main/images/ObjectDetection.png?raw=true \"Object detection\")\n\n\u003cbr/\u003e\n\n## Text To Sentiment analysis\n\nThis is a [RoBERTa-base model](https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest) trained on ~124M tweets from January 2018 to December 2021, and finetuned for sentiment analysis with the TweetEval benchmark. \n\n- **Reference Paper:** [TimeLMs paper](https://arxiv.org/abs/2202.03829).\n- **Git Repo:** [TimeLMs official repository](https://github.com/cardiffnlp/timelms).\n\nLabels: 0 -\u003e Negative; 1 -\u003e Neutral; 2 -\u003e Positive\n\n**Asynchronously code example**\n\n```Pascal\n// uses HuggingFace, HuggingFace.Types, HuggingFace.Aggregator, FMX.HuggingFace.Tutorial; \n\n  HuggingFace.WaitForModel := True;\n\n  HuggingFace.Text. 
SentimentAnalysis(\n    procedure (Params: TSentimentAnalysisParams)\n    begin\n      Params.Model('cardiffnlp/twitter-roberta-base-sentiment-latest');\n      Params.Inputs('Today is a great day');\n    end,\n    function : TAsynSentimentAnalysis\n    begin\n      Result.Sender := HFTutorial;\n      Result.OnSuccess := Display;\n      Result.OnError := Display;\n    end);\n```\n\n\u003cbr/\u003e\n\n## Audio classification\n\nFor more details about the `audio-classification` task, check out its [dedicated page](https://huggingface.co/tasks/audio-classification)! You will find examples and related materials.\n\n\u003cbr/\u003e\n\n\u003e[!NOTE]\n\u003e In the field of `Audio Classification`, over 2,859 pre-trained models are available. \n\u003e\n\n### Speech emotion recognition\n\n[Speech Emotion Recognition By Fine-Tuning Wav2Vec 2.0](https://huggingface.co/ehcalabres/wav2vec2-lg-xlsr-en-speech-emotion-recognition) \u003cbr/\u003e\nThe model is a fine-tuned version of `jonatasgrosman/wav2vec2-large-xlsr-53-english` for a Speech Emotion Recognition (SER) task.\n\nThe dataset used to fine-tune the original pre-trained model is the RAVDESS dataset. 
This dataset provides 1440 samples of recordings from actors performing on 8 different emotions in English, which are:\n\n```Python\n  emotions = ['angry', 'calm', 'disgust', 'fearful', 'happy', 'neutral', 'sad', 'surprised']\n```\n\n**Asynchronously code example**\n\n```Pascal\n// uses HuggingFace, HuggingFace.Types, HuggingFace.Aggregator, FMX.HuggingFace.Tutorial; \n\n  HuggingFace.WaitForModel := True;\n\n  HuggingFace.Audio.Classification(\n    procedure (Params: TAudioClassificationParam)\n    begin\n      Params.Model('ehcalabres/wav2vec2-lg-xlsr-en-speech-emotion-recognition');\n      Params.Inputs('SpeechRecorded.wav');\n    end,\n    function : TAsynAudioClassification\n    begin\n      Result.Sender := HFTutorial;\n      Result.OnSuccess := Display;\n      Result.OnError := Display;\n    end);\n```\n\n\u003cbr/\u003e\n\n### Gender recognition\n\n[wav2vec2-large-xlsr-53-gender-recognition-librispeech](https://huggingface.co/alefiury/wav2vec2-large-xlsr-53-gender-recognition-librispeech) \u003cbr/\u003e\u003cbr/\u003e\nThis model is a fine-tuned version of facebook/wav2vec2-xls-r-300m on Librispeech-clean-100 for gender recognition.\n\n**Asynchronously code example**\n\n```Pascal\n// uses HuggingFace, HuggingFace.Types, HuggingFace.Aggregator, FMX.HuggingFace.Tutorial; \n\n  HuggingFace.WaitForModel := True;\n\n  HuggingFace.Audio.Classification(\n    procedure (Params: TAudioClassificationParam)\n    begin\n      Params.Model('alefiury/wav2vec2-large-xlsr-53-gender-recognition-librispeech');\n      Params.Inputs('SpeechRecorded.wav');\n    end,\n    function : TAsynAudioClassification\n    begin\n      Result.Sender := HFTutorial;\n      Result.OnSuccess := Display;\n      Result.OnError := Display;\n    end);\n```\n\n\u003cbr/\u003e\n\n## Image classification\n\nFor more details about the `image-classification` task, check out its [dedicated page](https://huggingface.co/tasks/image-classification)! 
You will find examples and related materials.\n\n\u003e[!NOTE]\n\u003e In the field of `image classification`, over 15,000 pre-trained models are available.\n\u003e\n\n[ResNet-50 v1.5](https://huggingface.co/microsoft/resnet-50) \u003cbr/\u003e\nResNet model pre-trained on ImageNet-1k at resolution 224x224. It was introduced in the paper [Deep Residual Learning for Image Recognition](https://arxiv.org/abs/1512.03385) by He et al.\n\nResNet (Residual Network) is a convolutional neural network that popularized the concepts of residual learning and skip connections. These make it possible to train much deeper models.\n\n**Asynchronously code example**\n\n```Pascal\n// uses HuggingFace, HuggingFace.Types, HuggingFace.Aggregator, FMX.HuggingFace.Tutorial; \n\n  var ImageFilePath := 'images\\tiger.jpg';\n  HFTutorial.LoadImageFromFile(ImageFilePath);\n  HuggingFace.WaitForModel := True;\n\n  HuggingFace.Image.Classification(\n    procedure (Params: TImageClassificationParam)\n    begin\n      Params.Model('microsoft/resnet-50');\n      //Params.Model('google/vit-base-patch16-224');  //Can be used too\n      Params.Inputs(ImageFilePath);\n    end,\n    function : TAsynImageClassification\n    begin\n      Result.Sender := HFTutorial;\n      Result.OnSuccess := Display;\n      Result.OnError := Display;\n    end);\n```\n\n[Vision Transformer (base-sized model)](https://huggingface.co/google/vit-base-patch16-224) \u003cbr/\u003e\nVision Transformer (ViT) model pre-trained on ImageNet-21k (14 million images, 21,843 classes) at resolution 224x224, and fine-tuned on ImageNet 2012 (1 million images, 1,000 classes) at resolution 224x224. It was introduced in the paper [An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale](https://arxiv.org/abs/2010.11929) by Dosovitskiy et al. and first released in the [google-research/vision_transformer](https://github.com/google-research/vision_transformer) repository. 
\n\n\u003cbr/\u003e\n\n## Image Segmentation\n\nFor more details about the `image-segmentation` task, check out its [dedicated page](https://huggingface.co/tasks/image-segmentation)! You will find examples and related materials.\n\n\u003e[!NOTE]\n\u003e In the field of `image segmentation`, over 1,093 pre-trained models are available. Each model is distinguished by specific skills.\n\u003e\n\n[openmmlab/upernet-convnext-small](https://huggingface.co/openmmlab/upernet-convnext-small) \u003cbr/\u003e\nUperNet framework for semantic segmentation, leveraging a ConvNeXt backbone. UperNet was introduced in the paper [Unified Perceptual Parsing for Scene Understanding](https://arxiv.org/abs/1807.10221) by Xiao et al.\n\n**Asynchronously code example**\n\n```Pascal\n// uses HuggingFace, HuggingFace.Types, HuggingFace.Aggregator, FMX.HuggingFace.Tutorial; \n\n  var ImageFilePath := 'images\\tiger.jpg';\n  HFTutorial.LoadImageFromFile(ImageFilePath);\n  HuggingFace.WaitForModel := True;\n\n  HuggingFace.Image.Segmentation(\n    procedure (Params: TImageSegmentationParam)\n    begin\n      Params.Model('openmmlab/upernet-convnext-small');\n      Params.Inputs(ImageFilePath);\n    end,\n    function : TAsynImageSegmentation\n    begin\n      Result.Sender := HFTutorial;\n      Result.OnSuccess := Display;\n      Result.OnError := Display;\n    end);\n```\n\n![Image segmentation](/../main/images/ImageSegmentation.png?raw=true \"Image segmentation\")\n\n\u003cbr/\u003e\n\nOther models that you can easily test. 
It is up to you to choose the most suitable image:\n- [jonathandinu/face-parsing](https://huggingface.co/jonathandinu/face-parsing)\n- [nvidia/segformer-b1-finetuned-cityscapes-1024-1024](https://huggingface.co/nvidia/segformer-b1-finetuned-cityscapes-1024-1024)\n- [google/deeplabv3_mobilenet_v2_1.0_513](https://huggingface.co/google/deeplabv3_mobilenet_v2_1.0_513)\n- [facebook/mask2former-swin-large-cityscapes-semantic](https://huggingface.co/facebook/mask2former-swin-large-cityscapes-semantic)\n\n\u003cbr/\u003e\n\n## Zero-Shot classification\n\nFor more details about the `zero-shot-classification` task, check out its [dedicated page](https://huggingface.co/tasks/zero-shot-classification)! You will find examples and related materials.\n\n\u003e[!NOTE]\n\u003e In the field of `Zero-shot classification`, over 337 pre-trained models are available. \n\u003e\n\n[facebook/bart-large-mnli](https://huggingface.co/facebook/bart-large-mnli) \u003cbr/\u003e\nThis is the checkpoint for bart-large after being trained on the MultiNLI (MNLI) dataset.\n\nAdditional information about this model:\n- The [bart-large](https://huggingface.co/facebook/bart-large) model page\n- BART: [Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension](https://arxiv.org/abs/1910.13461)\n\n**Asynchronously code example**\n\n```Pascal\n// uses HuggingFace, HuggingFace.Types, HuggingFace.Aggregator, FMX.HuggingFace.Tutorial; \n\n  HuggingFace.WaitForModel := True;\n\n  HuggingFace.Text.ZeroShotClassification(\n    procedure (Params: TZeroShotClassificationParam)\n    begin\n      Params.Model('facebook/bart-large-mnli');\n      Params.Inputs('Hi, I recently bought a device from your company but it is not working as advertised and I would like to get reimbursed!');\n      Params.Parameters(\n        procedure (var Params: TZeroShotClassificationParameters)\n        begin\n          Params.CandidateLabels(['refund', 'legal', 'faq'])\n        end);\n   
 end,\n    function : TAsynZeroShotClassification\n    begin\n      Result.Sender := HFTutorial;\n      Result.OnSuccess := Display;\n      Result.OnError := Display;\n    end);\n```\n\nOther models that you can easily test:\n- [valhalla/distilbart-mnli-12-9](https://huggingface.co/valhalla/distilbart-mnli-12-9)\n- [MoritzLaurer/mDeBERTa-v3-base-mnli-xnli](https://huggingface.co/MoritzLaurer/mDeBERTa-v3-base-mnli-xnli)\n\n\u003cbr/\u003e\n\n## Token Classification\n\nFor more details about the `token-classification` task, check out its [dedicated page](https://huggingface.co/tasks/token-classification)! You will find examples and related materials.\n\n\u003e[!NOTE]\n\u003e In the field of `Token Classification`, over 20,755 pre-trained models are available. \n\u003e\n\n[FacebookAI/xlm-roberta-large-finetuned-conll03-english](https://huggingface.co/FacebookAI/xlm-roberta-large-finetuned-conll03-english) \u003cbr/\u003e\nThe model can be used for token classification, a natural language understanding task in which a label is assigned to some tokens in a text. 
\u003cbr/\u003e\nSee [associated paper](https://arxiv.org/abs/1911.02116)\n\n**Asynchronously code example**\n\n```Pascal\n// uses HuggingFace, HuggingFace.Types, HuggingFace.Aggregator, FMX.HuggingFace.Tutorial; \n\n  HuggingFace.WaitForModel := True;\n\n  HuggingFace.Text.TokenClassification(\n    procedure (Params: TTokenClassificationParam)\n    begin\n      Params.Model('FacebookAI/xlm-roberta-large-finetuned-conll03-english');\n      //Params.Model('dslim/bert-base-NER');  //Can be used too\n      Params.Inputs('My name is Sarah Jessica Parker but you can call me Jessica');\n    end,\n    function : TAsynTokenClassification\n    begin\n      Result.Sender := HFTutorial;\n      Result.OnSuccess := Display;\n      Result.OnError := Display;\n    end);\n```\n\n\u003cbr/\u003e\n\n## Question Answering\n\nFor more details about the `question-answering` task, check out its [dedicated page](https://huggingface.co/tasks/question-answering)! You will find examples and related materials.\n\n\u003e[!NOTE]\n\u003e In the field of `Question Answering`, over 12,683 pre-trained models are available. \n\u003e\n\n[deepset/roberta-base-squad2](https://huggingface.co/deepset/roberta-base-squad2) \u003cbr/\u003e\nThis is the [roberta-base model](https://huggingface.co/FacebookAI/roberta-base), fine-tuned using the [SQuAD2.0 dataset](https://huggingface.co/datasets/rajpurkar/squad_v2). It's been trained on question-answer pairs, including unanswerable questions, for the task of Extractive Question Answering. 
\u003cbr/\u003e\nSee [associated paper](https://arxiv.org/abs/1907.11692)\n\n\u003cbr/\u003e\n\n**Asynchronously code example**\n\n```Pascal\n// uses HuggingFace, HuggingFace.Types, HuggingFace.Aggregator, FMX.HuggingFace.Tutorial; \n\n  HuggingFace.WaitForModel := True;\n\n  HuggingFace.Text.QuestionAnswering(\n    procedure (Params: TQuestionAnsweringParam)\n    begin\n      Params.Model('deepset/roberta-base-squad2');\n      Params.Inputs('What is my name?', 'My name is Clara and I live in Berkeley.');\n      Params.Parameters(\n        procedure (var Params: TQuestionAnsweringParameters)\n        begin\n          Params.TopK(3);\n        end);\n    end,\n    function : TAsynQuestionAnswering\n    begin\n      Result.Sender := HFTutorial;\n      Result.OnSuccess := Display;\n      Result.OnError := Display;\n    end);\n```\n\n\u003cbr/\u003e\n\n## Table Question Answering\n\nFor more details about the `table-question-answering` task, check out its [dedicated page](https://huggingface.co/tasks/table-question-answering)! You will find examples and related materials.\n\n\u003e[!NOTE]\n\u003e In the field of `Table Question Answering`, over 133 pre-trained models are available. \n\u003e\n\n\u003cbr/\u003e\n\n[google/tapas-base-finetuned-wtq](https://huggingface.co/google/tapas-base-finetuned-wtq) \u003cbr/\u003e\n[TAPAS](https://github.com/google-research/tapas) is a BERT-like transformers model pretrained on a large corpus of English data from Wikipedia in a self-supervised fashion. This means it was pretrained on the raw tables and associated texts only, with no humans labelling them in any way (which is why it can use lots of publicly available data) with an automatic process to generate inputs and labels from those texts. 
\n\n\n**Asynchronously code example**\n\n```Pascal\n// uses HuggingFace, HuggingFace.Types, HuggingFace.Aggregator, FMX.HuggingFace.Tutorial; \n\n  HuggingFace.WaitForModel := True;\n\n  HuggingFace.Text.TableQuestionAnswering(\n    procedure (Params: TTableQAParam)\n    begin\n      Params.Model('google/tapas-base-finetuned-wtq');\n      Params.Inputs(\n        'How many stars does the tokenizers repository have?',\n        [ TRow.Create('Repository', ['Transformers', 'Datasets', 'Tokenizers']),\n          TRow.Create('Stars', ['36542', '4512', '3934']),\n          TRow.Create('Contributors', ['651', '77', '34']),\n          TRow.Create('Programming language',\n             [ 'Python',\n               'Python',\n               'Rust, Python and NodeJS'\n             ])\n        ]);\n    end,\n    function : TAsynTableQA\n    begin\n      Result.Sender := HFTutorial;\n      Result.OnSuccess := Display;\n      Result.OnError := Display;\n    end);\n```\n\n\u003cbr/\u003e\n\n## Fill-mask\n\nFor more details about the `fill-mask` task, check out its [dedicated page](https://huggingface.co/tasks/fill-mask)! You will find examples and related materials.\n\n\u003e[!NOTE]\n\u003e In the field of `Fill-mask`, over 13,570 pre-trained models are available. \n\u003e\n\n[google-bert/bert-base-uncased](https://huggingface.co/google-bert/bert-base-uncased) \u003cbr/\u003e\nPretrained model on English language using a masked language modeling (MLM) objective. It was introduced in [this paper](https://arxiv.org/abs/1810.04805) and first released in [this repository](https://github.com/google-research/bert). 
This model is uncased: it does not distinguish between english and English.\n\n\u003cbr/\u003e\n\n\n**Asynchronously code example**\n\n```Pascal\n// uses HuggingFace, HuggingFace.Types, HuggingFace.Aggregator, FMX.HuggingFace.Tutorial; \n\n  HuggingFace.API.WaitForModel := True;\n\n  HuggingFace.Mask.Fill(\n    procedure (Params: TMaskParam)\n    begin\n      Params.Model('google-bert/bert-base-uncased');\n      Params.Inputs('The answer to the universe is [MASK].');\n      Params.Parameters(['infinite', 'big', 'amazing', 'no', '42']);\n    end,\n    function : TAsynMask\n    begin\n      Result.Sender := HFTutorial;\n      Result.OnSuccess := Display;\n      Result.OnError := Display;\n    end);\n```\n\n\u003cbr/\u003e\n\n## Text Classification\n\nFor more details about the `text-classification` task, check out its [dedicated page](https://huggingface.co/tasks/text-classification)! You will find examples and related materials.\n\n\u003e[!NOTE]\n\u003e In the field of `Text Classification`, over 77,280 pre-trained models are available. \n\u003e\n\n\u003cbr/\u003e\n\n[distilbert/distilbert-base-uncased-finetuned-sst-2-english](https://huggingface.co/distilbert/distilbert-base-uncased-finetuned-sst-2-english) \u003cbr/\u003e\nThis model is a fine-tuned checkpoint of DistilBERT-base-uncased, trained on SST-2. It reaches an accuracy of 91.3 on the dev set (for comparison, the BERT bert-base-uncased version reaches an accuracy of 92.7). 
\u003cbr/\u003e\nFor more details about DistilBERT, we encourage you to check out this [model card](https://huggingface.co/distilbert/distilbert-base-uncased).\n\n**Asynchronously code example**\n\n```Pascal\n// uses HuggingFace, HuggingFace.Types, HuggingFace.Aggregator, FMX.HuggingFace.Tutorial; \n\n  HuggingFace.WaitForModel := True;\n\n  HuggingFace.Text.TextClassification(\n    procedure (Params: TTextClassificationParam)\n    begin\n      Params.Model('distilbert/distilbert-base-uncased-finetuned-sst-2-english');\n      Params.Inputs('I like you. I love you.');\n    end,\n    function : TAsynTextClassification\n    begin\n      Result.Sender := HFTutorial;\n      Result.OnSuccess := Display;\n      Result.OnError := Display;\n    end);\n```\nThis example returns a positive or negative label depending on the sentiment of the input text.\n\n- Use the model [papluca/xlm-roberta-base-language-detection](https://huggingface.co/papluca/xlm-roberta-base-language-detection) as a language detector.\n- Use the model [cardiffnlp/twitter-roberta-base-sentiment-latest](https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest) for sentiment analysis.\n\n\u003cbr/\u003e\n\n## Summarization\n\nSummarization is the task of producing a shorter version of a document while preserving its important information. Some models can extract text from the original input, while other models can generate entirely new text.\n\nFor more details about the `summarization` task, check out its [dedicated page](https://huggingface.co/tasks/summarization)! You will find examples and related materials.\n\n\u003e[!NOTE]\n\u003e In the field of `Summarization`, over 2,130 pre-trained models are available. \n\u003e\n\n\u003cbr/\u003e\n\n[facebook/bart-large-cnn](https://huggingface.co/facebook/bart-large-cnn) \u003cbr/\u003e\nBART is a transformer encoder-decoder (seq2seq) model with a bidirectional (BERT-like) encoder and an autoregressive (GPT-like) decoder. 
BART is pre-trained by (1) corrupting text with an arbitrary noising function, and (2) learning a model to reconstruct the original text.\n\n**Asynchronously code example**\n\n```Pascal\n// uses HuggingFace, HuggingFace.Types, HuggingFace.Aggregator, FMX.HuggingFace.Tutorial; \n\n  HuggingFace.WaitForModel := True;\n\n  HuggingFace.Text.Summarization(\n    procedure (Params: TSummarizationParam)\n    begin\n      Params.Model('facebook/bart-large-cnn');\n      Params.Inputs('The tower is 324 metres (1,063 ft) tall, about the same height as an 81-storey building, and the tallest structure in Paris. Its base is square, measuring 125 metres (410 ft) on each side. During its construction, the Eiffel Tower surpassed the Washington Monument to become the tallest man-made structure in the world, a title it held for 41 years until the Chrysler Building in New York City was finished in 1930. It was the first structure to reach a height of 300 metres. Due to the addition of a broadcasting aerial at the top of the tower in 1957, it is now taller than the Chrysler Building by 5.2 metres (17 ft). Excluding transmitters, the Eiffel Tower is the second tallest free-standing structure in France after the Millau Viaduct.');\n    end,\n    function : TAsynSummarization\n    begin\n      Result.Sender := HFTutorial;\n      Result.OnSuccess := Display;\n      Result.OnError := Display;\n    end);\n```\n\n\u003cbr/\u003e\n\n# Common Ground Functionalities Across API Ecosystems\n\nIn the previous chapter, **Exploration Journey**, I walked through the unique features of `Hugging Face Hub APIs`, focusing on what makes them stand out. As I kept exploring, I noticed some strong overlaps with other platforms like `OpenAI`, `Anthropic`, and `Gemini`. That’s where Common Ground comes in. This chapter is about zooming out to look at those shared functionalities and seeing how these ecosystems stack up against each other. 
By focusing on what they have in common, we can get a clearer picture of the API landscape as a whole.\n\n\u003cbr/\u003e\n\n## Embeddings\n\nFeature extraction is the task of converting a text into a vector (often called “embedding”).\n\n**Example applications:**\n- Retrieving the most relevant documents for a query (for RAG applications).\n- Reranking a list of documents based on their similarity to a query.\n- Calculating the similarity between two sentences.\n\nFor more details about the `Embeddings` task, check out its [dedicated page](https://huggingface.co/tasks/feature-extraction)! You will find examples and related materials.\n\n\u003e[!NOTE]\n\u003e In the field of `Embeddings` over 7,400 pre-trained models are available. \n\u003e\n\n\u003cbr/\u003e\n\n[mixedbread-ai/mxbai-embed-large-v1](https://huggingface.co/mixedbread-ai/mxbai-embed-large-v1) : Produce sentence embeddings.\n\n**Asynchronously code example**\n\n```Pascal\n// uses HuggingFace, HuggingFace.Types, HuggingFace.Aggregator, FMX.HuggingFace.Tutorial; \n\n  HuggingFace.API.WaitForModel := True;\n\n  HuggingFace.Embeddings.Create(\n    procedure (Params: TEmbeddingParams)\n    begin\n      Params.Model('mixedbread-ai/mxbai-embed-large-v1');\n      Params.Inputs('Today is a sunny day and I will get some ice cream.');\n    end,\n    function : TAsynEmbeddings\n    begin\n      Result.Sender := HFTutorial;\n      Result.OnSuccess := Display;\n      Result.OnError := Display;\n    end);\n```\n\n\u003cbr/\u003e\n\n## Chat\n\nGenerate responses in a conversational context using a list of messages as input. This capability supports both conversational Language Models (LLMs) and Vision-Language Models (VLMs), bridging text-based and `image-to-text` functionalities. 
It is a specialized subtask within [`text generation`](https://huggingface.co/docs/api-inference/tasks/text-generation) and [`image-text-to-text`](https://huggingface.co/docs/api-inference/tasks/image-text-to-text) processing.\n\nRecommended Models :\n\nConversational Large Language Models (LLMs)\n- [google/gemma-2-2b-it](https://huggingface.co/google/gemma-2-2b-it): A robust text-generation model optimized for instruction following.\n- [meta-llama/Meta-Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct): A highly capable model for generating text and adhering to instructions.\n- [microsoft/Phi-3-mini-4k-instruct](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct): A compact yet efficient text-generation model.\n- [Qwen/Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct): A reliable model for text generation and instruction compliance.\n\nConversational Vision-Language Models (VLMs)\n- [meta-llama/Llama-3.2-11B-Vision-Instruct](https://huggingface.co/meta-llama/Llama-3.2-11B-Vision-Instruct): A powerful vision-language model with excellent capabilities in visual comprehension and reasoning.\n- [Qwen/Qwen2-VL-7B-Instruct](https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct): A strong model designed for image-text-to-text tasks.\n\n\u003cbr/\u003e\n\n### Multi Turn Conversation\n\nGenerate text based on a prompt. For more details about the `text-generation` task, check out its [dedicated page](https://huggingface.co/tasks/text-generation)! You will find examples and related materials.\n\n\u003e[!NOTE]\n\u003e In the field of `text-generation` over 163,600 pre-trained models are available. 
\n\u003e\n\n**Synchronously code example**\n\n```Pascal\n// uses HuggingFace, HuggingFace.Types, HuggingFace.Aggregator, FMX.HuggingFace.Tutorial; \n\n  HuggingFace.UseCache := False;\n\n  var Chat := HuggingFace.Chat.Completion(\n    procedure (Params: TChatPayload)\n    begin\n      Params.Model('microsoft/Phi-3-mini-4k-instruct');\n      Params.Messages([\n         TPayload.User('Hello'),\n         TPayload.Assistant('Great to meet you. What would you like to know?'),\n         TPayload.User('I have two dogs in my house. How many paws are in my house?')\n      ]);\n      Params.MaxTokens(1024);\n    end);\n  try\n    Display(Memo1, Chat);\n  finally\n    Chat.Free;\n  end;\n```\n\n\u003cbr/\u003e\n\n**Asynchronously code example**\n\n```Pascal\n// uses HuggingFace, HuggingFace.Types, HuggingFace.Aggregator, FMX.HuggingFace.Tutorial; \n\n  HuggingFace.UseCache := False;\n\n  HuggingFace.Chat.Completion(\n    procedure (Params: TChatPayload)\n    begin\n      Params.Model('microsoft/Phi-3-mini-4k-instruct');\n      Params.Messages([\n         TPayload.User('Hello'),\n         TPayload.Assistant('Great to meet you. What would you like to know?'),\n         TPayload.User('I have two dogs in my house. How many paws are in my house?')\n      ]);\n      Params.MaxTokens(1024);\n    end,\n    function : TAsynChat\n    begin\n      Result.Sender := HFTutorial;\n      Result.OnSuccess := Display;\n      Result.OnError := Display;\n    end);\n```\n\n\u003cbr/\u003e\n\n### Streamed Multi Turn Conversation\n\n**Synchronously code example**\n\n```Pascal\n// uses HuggingFace, HuggingFace.Types, HuggingFace.Aggregator, FMX.HuggingFace.Tutorial; \n  \n  HuggingFace.UseCache := False;\n\n  HuggingFace.Chat.CompletionStream(\n    procedure (Params: TChatPayload)\n    begin\n      Params.Model('microsoft/Phi-3.5-mini-instruct');\n      Params.Messages([\n         TPayload.User('Hello'),\n         TPayload.Assistant('Great to meet you. 
What would you like to know?'),\n         TPayload.User('I have two dogs in my house. How many paws are in my house?')\n      ]);\n      Params.Stream(True);\n      Params.MaxTokens(1024);\n    end,\n    procedure (var Chat: TChat; IsDone: Boolean; var Cancel: Boolean)\n    begin\n      if Assigned(Chat) and not IsDone then\n        begin\n          DisplayStream(HFTutorial, Chat);\n          Application.ProcessMessages;\n        end;\n    end);\n```\n\n\u003cbr/\u003e\n\n**Asynchronously code example**\n\n```Pascal\n// uses HuggingFace, HuggingFace.Types, HuggingFace.Aggregator, FMX.HuggingFace.Tutorial; \n\n  HuggingFace.UseCache := False;\n\n  HuggingFace.Chat.CompletionStream(\n    procedure (Params: TChatPayload)\n    begin\n      Params.Model('microsoft/Phi-3.5-mini-instruct');\n      Params.Messages([\n         TPayload.User('Hello'),\n         TPayload.Assistant('Great to meet you. What would you like to know?'),\n         TPayload.User('I have two dogs in my house. How many paws are in my house?')\n      ]);\n      Params.Stream(True);\n      Params.MaxTokens(1024);\n    end,\n    function : TAsynChatStream\n    begin\n      Result.Sender := HFTutorial;\n      Result.OnProgress := DisplayStream;\n      Result.OnError := DisplayStream;\n    end);  \n```\n\n\u003cbr/\u003e\n\n### Vision\n\nModels that combine image and text inputs, often referred to as `vision-language` models (VLMs), generate text outputs based on both an image and a text prompt. Unlike traditional `image-to-text` models, which are primarily designed for specific tasks like image captioning, VLMs incorporate an additional layer of versatility by accepting text prompts. Some of these models are even trained to process entire conversations as input, enabling a broader range of applications.\n\nFor more details about the `image-text-to-text` task, check out its [dedicated page](https://huggingface.co/tasks/image-text-to-text)! 
You will find examples and related materials.\n\n\u003e[!NOTE]\n\u003e In the field of `image-text-to-text` over 5,750 pre-trained models are available. \n\u003e\n\n\u003cbr/\u003e\n\n**Synchronously streamed code example**\n\n```Pascal\n// uses HuggingFace, HuggingFace.Types, HuggingFace.Aggregator, FMX.HuggingFace.Tutorial; \n\n  HuggingFace.UseCache := False;\n  var ImageFilePath := 'https://tripfixers.com/wp-content/uploads/2019/11/eiffel-tower-with-snow.jpeg';\n\n  HuggingFace.Chat.CompletionStream(\n    procedure (Params: TChatPayload)\n    begin\n      Params.Model('meta-llama/Llama-3.2-11B-Vision-Instruct');\n      Params.Messages([TPayload.User('Describe the image ?', [ImageFilePath])]);\n      Params.Stream(True);\n      Params.MaxTokens(1024);\n    end,\n    procedure (var Chat: TChat; IsDone: Boolean; var Cancel: Boolean)\n    begin\n      if Assigned(Chat) and not IsDone then\n        begin\n          DisplayStream(HFTutorial, Chat);\n          Application.ProcessMessages;\n        end;\n    end);\n```\n\n\u003cbr/\u003e\n\n**Asynchronously streamed code example**\n\n```Pascal\n// uses HuggingFace, HuggingFace.Types, HuggingFace.Aggregator, FMX.HuggingFace.Tutorial; \n\n  HuggingFace.UseCache := False;\n  var ImageFilePath := 'https://tripfixers.com/wp-content/uploads/2019/11/eiffel-tower-with-snow.jpeg';\n  \n    HuggingFace.Chat.CompletionStream(\n    procedure (Params: TChatPayload)\n    begin\n      Params.Model('meta-llama/Llama-3.2-11B-Vision-Instruct');\n      Params.Messages([TPayload.User('Describe the image ?', [ImageFilePath])]);\n      Params.Stream(True);\n      Params.MaxTokens(1024);\n    end,\n    function : TAsynChatStream\n    begin\n      Result.Sender := HFTutorial;\n      Result.OnProgress := DisplayStream;\n      Result.OnError := DisplayStream;\n    end);\n```\n\n\u003cbr/\u003e\n\n### Use tools\n\nWhat is the weather in Paris ?\n\nThe tool schema used :\n```Json\n  {\n    \"type\": \"object\",\n    \"properties\": {\n         
\"location\": {\n             \"type\": \"string\",\n             \"description\": \"The city and department, e.g. Marseille, 13\"\n         },\n         \"unit\": {\n             \"type\": \"string\",\n             \"enum\": [\"celsius\", \"fahrenheit\"]\n         }\n     },\n     \"required\": [\"location\"]\n  }\n```\n\n\u003cbr/\u003e\n\n1. We will use the `TWeatherReportFunction` plugin defined in the `HuggingFace.Functions.Example` unit.\n\n```Delphi\n  var Weather: IFunctionCore := TWeatherReportFunction.Create;\n```\n\n2. We then define a method to display the result of the query using the `Weather` tool.\n\n```Delphi\nprocedure TMyForm.FuncExecuteStream(Sender: TObject; Text: string);\nbegin\n  HuggingFace.WaitForModel := True;\n  HuggingFace.UseCache := False;\n  HuggingFace.Chat.CompletionStream(\n    procedure (Params: TChatPayload)\n    begin\n      Params.Model('mistralai/Mixtral-8x7B-Instruct-v0.1');\n      Params.Messages([\n        TPayload.System('You are a fun and entertaining weather presenter.'),\n        TPayload.User(Text)]);\n      Params.Stream(True);\n      Params.MaxTokens(1024);\n    end,\n    function : TAsynChatStream\n    begin\n      Result.Sender := HFTutorial;\n      Result.OnProgress := DisplayStream;\n      Result.OnError := DisplayStream;\n    end);\nend;\n```\n\n3. 
Building the query using the `Weather` tool\n\n**Synchronously code example**\n\n```Pascal\n// uses HuggingFace, HuggingFace.Types, HuggingFace.Aggregator, FMX.HuggingFace.Tutorial, HuggingFace.Functions.Example; \n\n  HuggingFace.WaitForModel := True;\n  var Weather: IFunctionCore := TWeatherReportFunction.Create;\n  HFTutorial.Func := Weather;\n  HFTutorial.FuncProc := FuncExecuteStream;\n\n  var Chat := HuggingFace.Chat.Completion(\n    procedure (Params: TChatPayload)\n    begin\n      Params.Model('mistralai/Mixtral-8x7B-Instruct-v0.1');\n      Params.Messages([TPayload.User('What is the weather in Paris ?')]);\n      Params.Tools([Weather]);\n      Params.MaxTokens(1024);\n    end);\n  try\n    Display(Memo1, Chat);\n  finally\n    Chat.Free;\n  end;\n```\n\n\u003cbr/\u003e\n\n**Asynchronously code example**\n\n```Pascal\n// uses HuggingFace, HuggingFace.Types, HuggingFace.Aggregator, FMX.HuggingFace.Tutorial, HuggingFace.Functions.Example; \n\n  HuggingFace.WaitForModel := True;\n  var Weather: IFunctionCore := TWeatherReportFunction.Create;\n  HFTutorial.Func := Weather;\n  HFTutorial.FuncProc := FuncExecuteStream;\n\n  HuggingFace.Chat.Completion(\n    procedure (Params: TChatPayload)\n    begin\n      Params.Model('mistralai/Mixtral-8x7B-Instruct-v0.1');\n      Params.Messages([TPayload.User('What is the weather in Paris ?')]);\n      Params.Tools([Weather]);\n      Params.MaxTokens(1024);\n    end,\n    function : TAsynChat\n    begin\n      Result.Sender := HFTutorial;\n      Result.OnSuccess := Display;\n      Result.OnError := Display;\n    end);\n```\n\n\u003cbr/\u003e\n\n## Text Generation\n\nGenerate text based on a prompt.\n\nIf you are interested in a Chat Completion task, which generates a response based on a list of messages, check out the [`chat-completion`](#Chat) task.\n\nFor more details about the `text-generation` task, check out its [dedicated page](https://huggingface.co/tasks/text-generation)! 
You will find examples and related materials.\n\n\u003cbr/\u003e\n\n**Synchronously code example**\n\n```Pascal\n// uses HuggingFace, HuggingFace.Types, HuggingFace.Aggregator, FMX.HuggingFace.Tutorial;\n\n  HuggingFace.WaitForModel := True;\n  HuggingFace.UseCache := False;\n\n  var Generation := HuggingFace.Text.Generation(\n    procedure (Params: TTextGenerationParam)\n    begin\n      Params.Model('google/gemma-2-2b-it');\n      Params.Inputs('Can you please let us know more details about your');\n      Params.Parameters(\n        procedure (var Params: TTextGenerationParameters)\n        begin\n          Params.MaxNewTokens(1024);\n          Params.DoSample(True);\n          Params.DecoderInputDetails(True);\n        end);\n    end);\n  try\n    Display(HFTutorial, Generation);\n  finally\n    Generation.Free;\n  end;\n```\n\n\u003cbr/\u003e\n\n**Asynchronously code example**\n\n```Pascal\n// uses HuggingFace, HuggingFace.Types, HuggingFace.Aggregator, FMX.HuggingFace.Tutorial;\n\n  HuggingFace.WaitForModel := True;\n  HuggingFace.UseCache := False;\n\n  HuggingFace.Text.Generation(\n    procedure (Params: TTextGenerationParam)\n    begin\n      Params.Model('google/gemma-2-2b-it');\n      Params.Inputs('Can you please let us know more details about your');\n      Params.Parameters(\n        procedure (var Params: TTextGenerationParameters)\n        begin\n          Params.MaxNewTokens(1024);\n          Params.DoSample(True);\n          Params.DecoderInputDetails(True);\n        end);\n    end,\n    function : TAsynTextGeneration\n    begin\n      Result.Sender := HFTutorial;\n      Result.OnSuccess := Display;\n      Result.OnError := Display;\n    end);\n```\n\n\u003cbr/\u003e\n\n**Asynchronously streamed code example**\n\n```Pascal\n// uses HuggingFace, HuggingFace.Types, HuggingFace.Aggregator, FMX.HuggingFace.Tutorial;\n\n  HuggingFace.WaitForModel := True;\n  HuggingFace.UseCache := False;\n\n  HuggingFace.Text.GenerationStream(\n    procedure (Params: 
TTextGenerationParam)\n    begin\n      Params.Model('google/gemma-2-2b-it');\n      Params.Inputs('Can you please let us know more details about your');\n      Params.Stream(True);\n    end,\n    function : TAsynTextGenerationStream\n    begin\n      Result.Sender := HFTutorial;\n      Result.OnProgress := DisplayStream;\n      Result.OnError := Display;\n    end);\n```\n\n\u003cbr/\u003e\n\n## Translation\n\nTranslation is the task of converting text from one language to another.\n\nFor more details about the `translation` task, check out its [dedicated page](https://huggingface.co/tasks/translation)! You will find examples and related materials.\n\n\u003e[!NOTE]\n\u003e In the field of `translation` over 5,079 pre-trained models are available. \n\u003e\n\n\u003cbr/\u003e\n\n**Asynchronously code example**\n\n```Pascal\n// uses HuggingFace, HuggingFace.Types, HuggingFace.Aggregator, FMX.HuggingFace.Tutorial;\n\n  HuggingFace.WaitForModel := True;\n\n  // French to English translation\n  HuggingFace.Text.Translation(\n    procedure (Params: TTranslationParam)\n    begin\n      Params.Model('Helsinki-NLP/opus-mt-fr-en');\n      Params.Inputs('Je n''aurais pas dû abuser du chocolat, je crois que je vais le regretter.');\n      Params.Parameters(\n        procedure (var Params: TTranslationParameters)\n        begin\n          Params.SrcLang('french');\n          Params.TgtLang('english');\n        end);\n    end,\n    function : TAsynTranslation\n    begin\n      Result.Sender := HFTutorial;\n      Result.OnSuccess := Display;\n      Result.OnError := Display;\n    end);\n```\n\n\u003cbr/\u003e\n\n## Image Generation\n\nGenerate an image based on a given text prompt.\n\nFor more details about the `text-to-image` task, check out its [dedicated page](https://huggingface.co/tasks/text-to-image)! You will find examples and related materials.\n\n\u003e[!NOTE]\n\u003e In the field of `text-to-image` over 50,539 pre-trained models are available. 
\n\u003e\n\n\u003cbr/\u003e\n\n**Asynchronously code example**\n\n```Pascal\n// uses HuggingFace, HuggingFace.Types, HuggingFace.Aggregator, FMX.HuggingFace.Tutorial;\n\n  HuggingFace.WaitForModel := True;\n  HuggingFace.API.UseCache := False;\n  HFTutorial.FileName := 'Quarter.png';\n\n  HuggingFace.Text.TextToImage(\n    procedure (Params: TTextToImageParam)\n    begin\n      Params.Model('stabilityai/stable-diffusion-3-medium-diffusers');\n      Params.Inputs('A quarter dollar coin placed on a wooden floor in a close-up view');\n    end,\n    function : TAsynTextToImage\n    begin\n      Result.Sender := HFTutorial;\n      Result.OnStart := Start;\n      Result.OnSuccess := Display;\n      Result.OnError := Display;\n    end);\n```\n\n\u003cbr/\u003e\n\n## Text-to-Speech\n\nConvert text into spoken audio.\n\n\u003e[!NOTE]\n\u003e In the field of `text-to-speech` over 2,273 pre-trained models are available. \n\u003e\n\n\u003cbr/\u003e\n\n**Asynchronously code example**\n\n```Pascal\n// uses HuggingFace, HuggingFace.Types, HuggingFace.Aggregator, FMX.HuggingFace.Tutorial;\n\n  HFTutorial.FileName := 'temp.mp3';\n  HuggingFace.WaitForModel := True;\n\n  HuggingFace.Text.TextToSpeech(\n    procedure (Params: TTextToSpeechParam)\n    begin\n      Params.Model('facebook/mms-tts-eng');\n      Params.Inputs('Hello and welcome. 
It''s nice to meet you.');\n    end,\n    function : TAsynTextToSpeech\n    begin\n      Result.Sender := HFTutorial;\n      Result.OnStart := Start;\n      Result.OnSuccess := Display;\n      Result.OnError := Display;\n    end);\n```\n\n\u003cbr/\u003e\n\n## Automatic Speech Recognition\n\nAutomatic Speech Recognition (ASR), often referred to as Speech to Text (STT), involves converting spoken audio into written text.\n\nUse Cases:\n- Converting a podcast into text format\n- Creating a voice assistant system\n- Producing subtitles for video content\n\nFor more details about the `automatic-speech-recognition` task, check out its [dedicated page](https://huggingface.co/tasks/automatic-speech-recognition)! You will find examples and related materials.\n\n\u003e[!NOTE]\n\u003e In the field of `speech-to-text` over 21,386 pre-trained models are available. \n\u003e\n\nSuggested Models:\n- [openai/whisper-large-v3](https://huggingface.co/openai/whisper-large-v3): An advanced ASR model developed by OpenAI.\n- [nvidia/canary-1b](https://huggingface.co/nvidia/canary-1b): A robust model supporting multilingual ASR and speech translation, designed by Nvidia.\n- [pyannote/speaker-diarization-3.1](https://huggingface.co/pyannote/speaker-diarization-3.1): A highly effective model for distinguishing and labeling different speakers in audio recordings.\n\n\u003cbr/\u003e\n\n**Asynchronously code example**\n\n```Pascal\n// uses HuggingFace, HuggingFace.Types, HuggingFace.Aggregator, FMX.HuggingFace.Tutorial;\n\n  HuggingFace.API.WaitForModel := True;\n\n  HuggingFace.Audio.AudioToText(\n    procedure (Params: TAudioToTextParam)\n    begin\n      Params.Model('openai/whisper-large-v3-turbo');\n      Params.Inputs('SpeechRecorded.wav');\n      Params.GenerationParameters(\n        procedure (var Params: TGenerationParameters)\n        begin\n          Params.MaxLength(10);\n        end);\n    end,\n    function : TAsynAudioToText\n    begin\n      Result.Sender := HFTutorial;\n      
Result.OnSuccess := Display;\n      Result.OnError := Display;\n    end);\n```\nRemark: To run this example, you must first record some speech into a file named `SpeechRecorded.wav`.\n\n# Contributing\n\nPull requests are welcome. If you're planning to make a major change, please open an issue first to discuss your proposed changes.\n\n\u003cbr/\u003e\n\n# License\n\nThis project is licensed under the [MIT](https://choosealicense.com/licenses/mit/) License.","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmaxidonkey%2Fdelphihuggingface","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmaxidonkey%2Fdelphihuggingface","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmaxidonkey%2Fdelphihuggingface/lists"}