https://github.com/maxidonkey/delphihuggingface

The Hugging Face API wrapper for Delphi leverages cutting-edge models to deliver powerful features, including object detection, music generation, text classification, sentiment analysis, image segmentation, speech-to-text transcription, and text generation.
api-wrapper audio-classification bert chatbot delphi gpt huggingface image-classification image-prompting music-generation object-detection text-classification
Last synced: 5 months ago
Host: GitHub
URL: https://github.com/maxidonkey/delphihuggingface
Owner: MaxiDonkey
License: mit
Created: 2024-12-22T14:04:00.000Z (over 1 year ago)
Default Branch: main
Last Pushed: 2025-01-06T14:09:12.000Z (over 1 year ago)
Last Synced: 2025-01-06T15:22:47.611Z (over 1 year ago)
Topics: api-wrapper, audio-classification, bert, chatbot, delphi, gpt, huggingface, image-classification, image-prompting, music-generation, object-detection, text-classification
Language: Pascal
Homepage:
Size: 665 KB
Stars: 1
Watchers: 1
Forks: 1
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

README

# Delphi Hugging Face API

___

![GitHub](https://img.shields.io/badge/IDE%20Version-Delphi%2010.3/11/12-yellow)

![GitHub](https://img.shields.io/badge/platform-all%20platforms-green)

![GitHub](https://img.shields.io/badge/Updated%20the%2012/22/2024-blue)







- [Introduction](#Introduction)

    - [Resources available on Hugging Face Hub](#Resources-available-on-Hugging-Face-Hub)

    - [Serverless Inference API](#Serverless-Inference-API)

    - [Advantages of using Hugging Face Hub](#Advantages-of-using-Hugging-Face-Hub)

    - [Rate Limits and Supported Models](#Rate-Limits-and-Supported-Models)

    - [Licenses and Compliance](#Licenses-and-Compliance)

    - [Tutorial content](#Tutorial-content)

- [Remarks](#remarks)

- [Tools for simplifying this tutorial](#Tools-for-simplifying-this-tutorial)

- [Asynchronous callback mode management](#Asynchronous-callback-mode-management)

- [Exploration Journey](#Exploration-Journey)

    - [Initialization](#initialization)

    - [Hugging Face Models Overview](#Hugging-Face-Models-Overview)

        - [Model inference WARM COLD](#Model-inference-WARM-COLD)

    - [Music-gen](#Music-gen)

    - [Image object detection](#Image-object-detection)

    - [Text To Sentiment analysis](#Text-To-Sentiment-analysis)

    - [Audio classification](#Audio-classification)

        - [Speech emotion recognition](#speech-emotion-recognition)

        - [Gender recognition](#Gender-recognition)

    - [Image classification](#Image-classification)

    - [Image Segmentation](#Image-Segmentation)

    - [Zero-Shot classification](#Zero-Shot-classification)

    - [Token Classification](#Token-Classification)

    - [Question Answering](#Question-Answering)

    - [Table Question Answering](#Table-Question-Answering)

    - [Fill-mask](#Fill-mask)

    - [Text Classification](#Text-Classification)

    - [Summarization](#Summarization)

- [Common Ground Functionalities Across API Ecosystems](#Common-Ground-Functionalities-Across-API-Ecosystems)

    - [Embeddings](#Embeddings)

    - [Chat](#Chat)

        - [Multi Turn Conversation](#Multi-Turn-Conversation)

        - [Streamed Multi Turn Conversation](#Streamed-Multi-Turn-Conversation)

        - [Vision](#Vision)

        - [Use tools](#Use-tools)

    - [Text Generation](#Text-Generation)

    - [Translation](#Translation)

    - [Image Generation](#Image-Generation)

    - [Text-to-Speech](#Text-to-Speech)

    - [Automatic Speech Recognition](#Automatic-Speech-Recognition)

- [Contributing](#contributing)

- [License](#license)

 







# Introduction

**Hugging Face Hub** is an open-source collaborative platform dedicated to democratizing access to artificial intelligence (AI) technologies. This platform hosts a vast collection of models, datasets, and interactive applications, facilitating the exploration, experimentation, and integration of AI solutions into various projects.

[Official page](https://huggingface.co/docs/hub/index)

## Resources available on Hugging Face Hub

- **Models:** The Hub offers a multitude of pre-trained models covering domains such as natural language processing (NLP), computer vision, and audio recognition. These models are suited for various tasks, including text generation, classification, object detection, and speech transcription. 

- **Datasets:** A diverse library of datasets is available for training and evaluating your own models, providing a foundation for developing customized solutions. 

- **Spaces:** The Hub hosts interactive applications that allow you to visualize and test models directly from a browser. These spaces are useful for demonstrating model capabilities or conducting quick analyses. 




## Serverless Inference API

Hugging Face Hub offers a serverless Inference API, enabling rapid integration of AI models into your projects without the need for complex infrastructure management.




## Advantages of using Hugging Face Hub

- **Time-saving:** Models are ready to use, eliminating the need to train or deploy them locally, which accelerates the development of applications.

- **Scalability:** The Hub's infrastructure ensures automatic scaling, load balancing, and efficient caching.




In summary, **Hugging Face Hub** is a resource for integrating AI models into projects. With its serverless Inference API and collection of ready-to-use resources, it offers a solution for enhancing applications with AI capabilities while simplifying their implementation and maintenance.




## Rate Limits and Supported Models

By subscribing, you gain access to thousands of models. You can explore the benefits of individual, professional, and enterprise subscriptions by following the links below:

- [Rate limits](https://huggingface.co/docs/api-inference/rate-limits)

- [Supported models](https://huggingface.co/docs/api-inference/supported-models)




## Licenses and Compliance

When integrating models or datasets from **Hugging Face Hub** into your projects, it is crucial to pay close attention to the associated licenses. Every resource hosted on the platform comes with a specific license that outlines the terms of use, modification, and distribution. A thorough understanding of these licenses is essential to ensure the legal and ethical compliance of your developments.

**Why is this important?**

- **Legal compliance:** Using a resource without adhering to its license terms can lead to legal violations, exposing your project to potential risks.

- **Respect for creators' rights:** Licenses protect the rights of creators. By respecting them, you acknowledge and honor their work.

- **Transparency and ethics:** Following the conditions of licenses promotes responsible and ethical use of open-source technologies.

Refer to the `Model Card` or `Dataset Card` for each model or dataset used in your application.




## Tutorial content

The **Hugging Face Hub** provides open-source libraries such as `Transformers`, enables integration with `Gradio`, and offers evaluation tools like `Evaluate`. However, these aspects will not be covered in this tutorial, as they are beyond the scope of this document.

Instead, this tutorial will focus on using the APIs with Delphi, highlighting key features such as image and sound classification, music generation (`music-gen`), sentiment analysis, object detection in images, image segmentation, and all natural language processing (NLP) functions.




# Remarks

> [!IMPORTANT]
>
> This is an unofficial library. **Hugging Face** does not provide any official library for `Delphi`.
> This repository contains a `Delphi` implementation of the [Hugging Face](https://huggingface.co/docs/api-inference) public API.




# Tools for simplifying this tutorial

To simplify the code examples provided in this tutorial, I have included two units in the source code: `VCL.HuggingFace.Tutorial` and `FMX.HuggingFace.Tutorial`. Depending on the framework you choose to test the provided source code, you will need to instantiate either the `TVCLHuggingFaceSender` or the `TFMXHuggingFaceSender` class in the application's `OnCreate` event, as follows:

>[!TIP]
>```Pascal
>//uses VCL.HuggingFace.Tutorial;
>
>  HFTutorial := TVCLHuggingFaceSender.Create(Memo1, Image1, Image2, MediaPlayer1);
>```
>
>or
>
>```Pascal
>//uses FMX.HuggingFace.Tutorial;
>
>  HFTutorial := TFMXHuggingFaceSender.Create(Memo1, Image1, Image2, MediaPlayer1);
>```

Make sure to add a `TMemo`, two `TImage` components, and a `TMediaPlayer` to your form beforehand.




# Asynchronous callback mode management

In the context of asynchronous methods, for a method that does not involve streaming, callbacks use the following generic record: `TAsynCallBack = record` defined in the `HuggingFace.Async.Support.pas` unit. This record exposes the following properties:

```Pascal
   TAsynCallBack = record
   ...
       Sender: TObject;
       OnStart: TProc;
       OnSuccess: TProc;
       OnError: TProc;
```




For methods requiring streaming, callbacks use the generic record `TAsynStreamCallBack = record`, also defined in the `HuggingFace.Async.Support.pas` unit. This record exposes the following properties:

```Pascal
   TAsynStreamCallBack = record
   ...
       Sender: TObject;
       OnStart: TProc;
       OnSuccess: TProc;
       OnProgress: TProc;
       OnError: TProc;
       OnCancellation: TProc;
       OnDoCancel: TFunc;
```

The name of each property is self-explanatory; if needed, refer to the internal documentation for more details.
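
To make the pattern concrete, here is a minimal, self-contained sketch of how a callback record of this kind can be filled in and consumed. `TDemoCallBack<T>` and `RunWithCallbacks` are hypothetical names invented for this illustration, the parameter lists of the three events are assumptions, and the work runs synchronously to keep the example short. Refer to `HuggingFace.Async.Support.pas` for the actual declarations.

```Pascal
program CallbackSketch;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

type
  // Hypothetical stand-in for a callback record; not the library's declaration.
  TDemoCallBack<T> = record
    Sender: TObject;
    OnStart: TProc<TObject>;
    OnSuccess: TProc<TObject, T>;
    OnError: TProc<TObject, string>;
  end;

// Runs Work and reports through the callbacks built by the CallBacks function.
procedure RunWithCallbacks(const Work: TFunc<string>;
  const CallBacks: TFunc<TDemoCallBack<string>>);
var
  CB: TDemoCallBack<string>;
begin
  CB := CallBacks();
  if Assigned(CB.OnStart) then
    CB.OnStart(CB.Sender);
  try
    var Value := Work();
    if Assigned(CB.OnSuccess) then
      CB.OnSuccess(CB.Sender, Value);
  except
    on E: Exception do
      if Assigned(CB.OnError) then
        CB.OnError(CB.Sender, E.Message);
  end;
end;

begin
  RunWithCallbacks(
    function: string
    begin
      Result := 'done';
    end,
    function: TDemoCallBack<string>
    begin
      Result.Sender := nil;
      Result.OnStart :=
        procedure(Sender: TObject)
        begin
          Writeln('started');
        end;
      Result.OnSuccess :=
        procedure(Sender: TObject; Value: string)
        begin
          Writeln('success: ', Value);
        end;
      Result.OnError :=
        procedure(Sender: TObject; Msg: string)
        begin
          Writeln('error: ', Msg);
        end;
    end);
end.
```

The caller only assigns the events it cares about; any event left unassigned is simply skipped.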




# Exploration Journey

This part of the document reflects the path I took while uncovering the features and possibilities of the `Hugging Face Hub APIs`. Rather than presenting a rigid tutorial, I chose to structure it as an **Exploration Journey** to capture the iterative, curious, and hands-on process of discovery. Each step builds on the previous one, showcasing not only what I found but also how I approached and learned from the API ecosystem.

## Initialization

To initialize the API instance, you need to [obtain an API key from Hugging Face](https://huggingface.co/settings/tokens).

Once you have a token, you can initialize the `IHuggingFace` interface, which serves as the entry point to the API.

> [!NOTE]
>```Pascal
>uses HuggingFace;
>
>var HuggingFace := THuggingFaceFactory.CreateInstance(API_KEY);
>```

When accessing the `list of models` or retrieving the `description of a specific model`, a different endpoint is used than the API endpoint. To instantiate this interface, use the following code:

```Pascal

uses HuggingFace;

var HFHub := THuggingFaceFactory.CreateInstance(API_KEY, True);

```

>[!WARNING]
> To use the examples provided in this tutorial, especially the asynchronous methods, I recommend declaring the `HuggingFace` interface with the widest possible scope.
>
> So, set `HuggingFace := THuggingFaceFactory.CreateInstance(My_Key);` in the `OnCreate` event of your application,
> where `HuggingFace: IHuggingFace;`.




## Hugging Face Models Overview

A filtered list of models can be obtained directly from the [playground](https://huggingface.co/spaces/enzostvs/hub-api-playground) or from the model search page on the [website](https://huggingface.co/models).





Using **Delphi**, this list can also be retrieved programmatically. To support filtering, the `TFetchParams` class, implemented in the `HuggingFace.Hub.Support` unit, must be used. This class accurately mirrors all parameters supported by the `/api/models` endpoint.




**Synchronous code example**

```Pascal

// uses HuggingFace, HuggingFace.Types, HuggingFace.Aggregator, FMX.HuggingFace.Tutorial; 

  var Models := HFHub.Hub.FetchModels(HFTutorial.UrlNext,

    procedure (Params: TFetchParams)

    begin

      Params.Limit(50);

      Params.Filter('eng,text-generation');

    end);

  try

    Display(HFTutorial, Models);

  finally

    Models.Free;

  end;

```

- **Remark:** The result is paginated, with 50 models per page.

The `HFTutorial.UrlNext` variable will store the URL of the next page. By re-executing this code, the next 50 results will be retrieved and displayed.
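
The same "next URL" idea can be turned into a loop. The sketch below is self-contained and does not call the library: `FetchPage` is a hypothetical simulated fetch, the catalogue size is invented, and it assumes that the next-page value becomes empty after the last page, which is an assumption rather than documented behavior of `HFTutorial.UrlNext`.

```Pascal
program PaginationSketch;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

const
  TOTAL_MODELS = 120;  // simulated catalogue size (invented for the example)
  PAGE_SIZE = 50;

// Simulated page fetch: returns the number of items on the page and sets
// UrlNext to the following cursor, or to '' after the last page.
function FetchPage(var UrlNext: string): Integer;
begin
  var Offset := StrToIntDef(UrlNext, 0);
  Result := TOTAL_MODELS - Offset;
  if Result > PAGE_SIZE then
    Result := PAGE_SIZE;
  if Offset + Result < TOTAL_MODELS then
    UrlNext := IntToStr(Offset + Result)
  else
    UrlNext := '';
end;

begin
  var UrlNext: string := '';
  var Total := 0;
  repeat
    Total := Total + FetchPage(UrlNext);
  until UrlNext.IsEmpty;
  Writeln(Total);  // prints 120 (pages of 50, 50 and 20 items)
end.
```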




**Asynchronous code example**

```Pascal

// uses HuggingFace, HuggingFace.Types, HuggingFace.Aggregator, FMX.HuggingFace.Tutorial; 

  HFHub.Hub.FetchModels(HFTutorial.UrlNext,

    procedure (Params: TFetchParams)

    begin

      Params.Limit(50);

      Params.Filter('text-to-audio');

    end,

    function : TAsynModels

    begin

      Result.Sender := HFTutorial;

      Result.OnSuccess := Display;

      Result.OnError := Display;

    end);

```

>[!TIP]

> The filter parameter queries the `Tags` field in the models' JSON format. Use a comma to separate different `Tags` values to include them in the same filter.

>
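
When the tags come from user input or a list, the comma-separated filter string can be built programmatically. This is a small sketch only: `BuildTagFilter` is a hypothetical helper, not part of the library, and it simply trims each value and skips empty ones before joining.

```Pascal
program TagFilterSketch;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

// Joins tag values into a comma-separated filter string,
// trimming whitespace and skipping empty entries.
function BuildTagFilter(const Tags: array of string): string;
begin
  Result := '';
  for var Tag in Tags do
  begin
    var Clean := Tag.Trim;
    if Clean.IsEmpty then
      Continue;
    if not Result.IsEmpty then
      Result := Result + ',';
    Result := Result + Clean;
  end;
end;

begin
  // prints: eng,text-generation
  Writeln(BuildTagFilter(['eng', ' text-generation ', '']));
end.
```

The resulting string has the same shape as the value passed to `Params.Filter` in the examples above.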




To retrieve the data of a single model, pass its model ID to the `FetchModel` method:

```Pascal

  //Synchronously

  function FetchModel(const RepoId: string): TRepoModel; overload;

  //Asynchronously

  procedure FetchModel(const RepoId: string; CallBacks: TFunc); overload;

```




### Model inference WARM COLD

The ML ecosystem evolves rapidly, and the Inference API provides access to models highly valued by the community, selected based on their recent popularity (likes, downloads, and usage). As a result, the available models may be replaced at any time without prior notice. Hugging Face strives to keep the most recent and popular models ready for immediate use.

The following distinctions are made:

- **Warm models:** models that are ready to use.

- **Cold models:** models that require loading before use.

- **Frozen models:** models currently unavailable for use via the API.

When invoking a model in the `COLD` state, it needs to be reloaded, which may result in a 503 error. In this case, you must wait before retrying the request with the same model.

To avoid the 503 error and wait for the model to reload and transition to the `WARM` state, you can add the following line of code:

```Pascal

  HuggingFace.WaitForModel := True;

```

Note: By default, `WaitForModel` is set to `False`.

Refer to the [official documentation](https://huggingface.co/docs/api-inference/parameters).
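
If you prefer to handle the 503 response yourself instead of enabling `WaitForModel`, the "wait, then retry" step can be written as a small loop. The sketch below is self-contained and uses a simulated request: `FakeRequest`, `RetryWhileLoading`, the attempt count, and the delays are all hypothetical choices made for the illustration, not part of the library.

```Pascal
program RetrySketch;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

const
  HTTP_OK = 200;
  HTTP_SERVICE_UNAVAILABLE = 503;

type
  TStatusRequest = function: Integer;

var
  Calls: Integer = 0;

// Simulated request: answers 503 twice (model still loading), then 200.
function FakeRequest: Integer;
begin
  Inc(Calls);
  if Calls < 3 then
    Result := HTTP_SERVICE_UNAVAILABLE
  else
    Result := HTTP_OK;
end;

// Calls Request until it stops returning 503 or MaxAttempts is reached,
// doubling the pause between attempts. Returns the last status code.
function RetryWhileLoading(Request: TStatusRequest; MaxAttempts: Integer;
  DelayMs: Cardinal): Integer;
begin
  Result := HTTP_SERVICE_UNAVAILABLE;
  for var Attempt := 1 to MaxAttempts do
  begin
    Result := Request();
    if Result <> HTTP_SERVICE_UNAVAILABLE then
      Exit;
    if Attempt < MaxAttempts then
    begin
      Sleep(DelayMs);
      DelayMs := DelayMs * 2;
    end;
  end;
end;

begin
  var Status := RetryWhileLoading(FakeRequest, 5, 10);
  // prints: status=200 after 3 call(s)
  Writeln('status=', Status, ' after ', Calls, ' call(s)');
end.
```

In a real application the delays would be much longer than the few milliseconds used here, since loading a cold model can take a while.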




## Music-gen

[MusicGen](https://huggingface.co/facebook/musicgen-small) is a text-to-music model capable of generating high-quality music samples conditioned on text descriptions or audio prompts.

**Asynchronous code example**

```Pascal

// uses HuggingFace, HuggingFace.Types, HuggingFace.Aggregator, FMX.HuggingFace.Tutorial; 

  HuggingFace.UseCache := False;  //Disable caching

  HuggingFace.WaitForModel := True;  //Enable waiting for model reloading

  HFTutorial.FileName := 'music.mp3';

  HuggingFace.Text.TextToAudio(

    procedure (Params: TTextToAudioParam)

    begin

      Params.Model('facebook/musicgen-small');

      Params.Inputs('Pop music style with bass guitar');

    end,

    function : TAsynTextToSpeech

    begin

      Result.Sender := HFTutorial;

      Result.OnStart := Start;

      Result.OnSuccess := Display;

      Result.OnError := Display;

    end);

```




## Image object detection

For more details about the `object-detection` task, check out its [dedicated page](https://huggingface.co/tasks/object-detection)! You will find examples and related materials.

>[!NOTE]

> In the field of `Object Detection`, over 2,913 pre-trained models are available. 

>

[DEtection TRansformer (DETR) model](https://huggingface.co/facebook/detr-resnet-50) trained end-to-end on COCO 2017 object detection (118k annotated images).

The DETR model is an encoder-decoder transformer with a convolutional backbone.

**Asynchronous code example**

```Pascal

// uses HuggingFace, HuggingFace.Types, HuggingFace.Aggregator, FMX.HuggingFace.Tutorial; 

  var ImageFilePath := 'Z:\My_Folder\Images\My_Image.jpg';

  HFTutorial.LoadImageFromFile(ImageFilePath);

  HuggingFace.WaitForModel := True;

  HuggingFace.Image.ObjectDetection(

    procedure (Params: TObjectDetectionParam)

    begin

      Params.Model('facebook/detr-resnet-50');

      Params.Inputs(ImageFilePath);

    end,

    function : TAsynObjectDetection

    begin

      Result.Sender := HFTutorial;

      Result.OnSuccess := Display;

      Result.OnError := Display;

    end);

```

![Object detection](/../main/images/ObjectDetection.png?raw=true "Object detection")




## Text To Sentiment analysis

This is a [RoBERTa-base model](https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest) trained on ~124M tweets from January 2018 to December 2021, and finetuned for sentiment analysis with the TweetEval benchmark. 

- **Reference Paper:** [TimeLMs paper](https://arxiv.org/abs/2202.03829).

- **Git Repo:** [TimeLMs official repository](https://github.com/cardiffnlp/timelms).

Labels: 0 -> Negative; 1 -> Neutral; 2 -> Positive
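
If your code ends up with the numeric label index rather than its name, the mapping above can be expressed as a small lookup. This is a minimal sketch: `SentimentName` is a hypothetical helper, and whether a given response exposes the index or the name directly depends on the model and is not assumed here.

```Pascal
program SentimentLabelSketch;

{$APPTYPE CONSOLE}

// Maps the numeric label index listed above to its name.
function SentimentName(LabelIndex: Integer): string;
begin
  case LabelIndex of
    0: Result := 'Negative';
    1: Result := 'Neutral';
    2: Result := 'Positive';
  else
    Result := 'Unknown';
  end;
end;

begin
  Writeln(SentimentName(2));  // prints: Positive
end.
```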

**Asynchronous code example**

```Pascal

// uses HuggingFace, HuggingFace.Types, HuggingFace.Aggregator, FMX.HuggingFace.Tutorial; 

  HuggingFace.WaitForModel := True;

  HuggingFace.Text.SentimentAnalysis(

    procedure (Params: TSentimentAnalysisParams)

    begin

      Params.Model('cardiffnlp/twitter-roberta-base-sentiment-latest');

      Params.Inputs('Today is a great day');

    end,

    function : TAsynSentimentAnalysis

    begin

      Result.Sender := HFTutorial;

      Result.OnSuccess := Display;

      Result.OnError := Display;

    end);

```




## Audio classification

For more details about the `audio-classification` task, check out its [dedicated page](https://huggingface.co/tasks/audio-classification)! You will find examples and related materials.




>[!NOTE]

> In the field of `Audio Classification`, over 2,859 pre-trained models are available. 

>

### Speech emotion recognition

[Speech Emotion Recognition By Fine-Tuning Wav2Vec 2.0](https://huggingface.co/ehcalabres/wav2vec2-lg-xlsr-en-speech-emotion-recognition) 


The model is a fine-tuned version of `jonatasgrosman/wav2vec2-large-xlsr-53-english` for a Speech Emotion Recognition (SER) task.

The dataset used to fine-tune the original pre-trained model is the RAVDESS dataset. This dataset provides 1440 samples of recordings from actors performing on 8 different emotions in English, which are:

```Python

  emotions = ['angry', 'calm', 'disgust', 'fearful', 'happy', 'neutral', 'sad', 'surprised']

```

**Asynchronous code example**

```Pascal

// uses HuggingFace, HuggingFace.Types, HuggingFace.Aggregator, FMX.HuggingFace.Tutorial; 

  HuggingFace.WaitForModel := True;

  HuggingFace.Audio.Classification(

    procedure (Params: TAudioClassificationParam)

    begin

      Params.Model('ehcalabres/wav2vec2-lg-xlsr-en-speech-emotion-recognition');

      Params.Inputs('SpeechRecorded.wav');

    end,

    function : TAsynAudioClassification

    begin

      Result.Sender := HFTutorial;

      Result.OnSuccess := Display;

      Result.OnError := Display;

    end);

```




### Gender recognition

[wav2vec2-large-xlsr-53-gender-recognition-librispeech](https://huggingface.co/alefiury/wav2vec2-large-xlsr-53-gender-recognition-librispeech) 



This model is a fine-tuned version of facebook/wav2vec2-xls-r-300m on Librispeech-clean-100 for gender recognition.

**Asynchronous code example**

```Pascal

// uses HuggingFace, HuggingFace.Types, HuggingFace.Aggregator, FMX.HuggingFace.Tutorial; 

  HuggingFace.WaitForModel := True;

  HuggingFace.Audio.Classification(

    procedure (Params: TAudioClassificationParam)

    begin

      Params.Model('alefiury/wav2vec2-large-xlsr-53-gender-recognition-librispeech');

      Params.Inputs('SpeechRecorded.wav');

    end,

    function : TAsynAudioClassification

    begin

      Result.Sender := HFTutorial;

      Result.OnSuccess := Display;

      Result.OnError := Display;

    end);

```




## Image classification

For more details about the `image-classification` task, check out its [dedicated page](https://huggingface.co/tasks/image-classification)! You will find examples and related materials.

>[!NOTE]

> In the field of `image classification`, over 15,000 pre-trained models are available.

>

[ResNet-50 v1.5](https://huggingface.co/microsoft/resnet-50) 


ResNet model pre-trained on ImageNet-1k at resolution 224x224. It was introduced in the paper [Deep Residual Learning for Image Recognition](https://arxiv.org/abs/1512.03385) by He et al.

ResNet (Residual Network) is a convolutional neural network that democratized the concepts of residual learning and skip connections. These make it possible to train much deeper models.

**Asynchronous code example**

```Pascal

// uses HuggingFace, HuggingFace.Types, HuggingFace.Aggregator, FMX.HuggingFace.Tutorial; 

  var ImageFilePath := 'images\tiger.jpg';

  HFTutorial.LoadImageFromFile(ImageFilePath);

  HuggingFace.WaitForModel := True;

  HuggingFace.Image.Classification(

    procedure (Params: TImageClassificationParam)

    begin

      Params.Model('microsoft/resnet-50');

      //Params.Model('google/vit-base-patch16-224');  //Can be used too

      Params.Inputs(ImageFilePath);

    end,

    function : TAsynImageClassification

    begin

      Result.Sender := HFTutorial;

      Result.OnSuccess := Display;

      Result.OnError := Display;

    end);

```

[Vision Transformer (base-sized model)](https://huggingface.co/google/vit-base-patch16-224)

Vision Transformer (ViT) model pre-trained on ImageNet-21k (14 million images, 21,843 classes) at resolution 224x224, and fine-tuned on ImageNet 2012 (1 million images, 1,000 classes) at resolution 224x224. It was introduced in the paper [An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale](https://arxiv.org/abs/2010.11929) by Dosovitskiy et al. and first released in the authors' original repository.




## Image Segmentation

For more details about the `image-segmentation` task, check out its [dedicated page](https://huggingface.co/tasks/image-segmentation)! You will find examples and related materials.

>[!NOTE]

> In the field of `image segmentation`, over 1,093 pre-trained models are available. Each model is distinguished by specific skills.

>

[openmmlab/upernet-convnext-small](https://huggingface.co/openmmlab/upernet-convnext-small) 


UperNet framework for semantic segmentation, leveraging a ConvNeXt backbone. UperNet was introduced in the paper [Unified Perceptual Parsing for Scene Understanding](https://arxiv.org/abs/1807.10221) by Xiao et al.

**Asynchronous code example**

```Pascal

// uses HuggingFace, HuggingFace.Types, HuggingFace.Aggregator, FMX.HuggingFace.Tutorial; 

  var ImageFilePath := 'images\tiger.jpg';

  HFTutorial.LoadImageFromFile(ImageFilePath);

  HuggingFace.WaitForModel := True;

  HuggingFace.Image.Segmentation(

    procedure (Params: TImageSegmentationParam)

    begin

      Params.Model('openmmlab/upernet-convnext-small');

      Params.Inputs(ImageFilePath);

    end,

    function : TAsynImageSegmentation

    begin

      Result.Sender := HFTutorial;

      Result.OnSuccess := Display;

      Result.OnError := Display;

    end);

```

![Image segmentation](/../main/images/ImageSegmentation.png?raw=true "Image segmentation")




Other models that you can easily test (it is up to you to choose an image suited to each model):

- [jonathandinu/face-parsing](https://huggingface.co/jonathandinu/face-parsing)

- [nvidia/segformer-b1-finetuned-cityscapes-1024-1024](https://huggingface.co/nvidia/segformer-b1-finetuned-cityscapes-1024-1024)

- [google/deeplabv3_mobilenet_v2_1.0_513](https://huggingface.co/google/deeplabv3_mobilenet_v2_1.0_513)

- [facebook/mask2former-swin-large-cityscapes-semantic](https://huggingface.co/facebook/mask2former-swin-large-cityscapes-semantic)




## Zero-Shot classification

For more details about the `zero-shot-classification` task, check out its [dedicated page](https://huggingface.co/tasks/zero-shot-classification)! You will find examples and related materials.

>[!NOTE]

> In the field of `Zero-shot classification`, over 337 pre-trained models are available. 

>

[facebook/bart-large-mnli](https://huggingface.co/facebook/bart-large-mnli) 


This is the checkpoint for bart-large after being trained on the MultiNLI (MNLI) dataset.

Additional information about this model:

- The [bart-large](https://huggingface.co/facebook/bart-large) model page

- BART: [Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension](https://arxiv.org/abs/1910.13461)

**Asynchronous code example**

```Pascal

// uses HuggingFace, HuggingFace.Types, HuggingFace.Aggregator, FMX.HuggingFace.Tutorial; 

  HuggingFace.WaitForModel := True;

  HuggingFace.Text.ZeroShotClassification(

    procedure (Params: TZeroShotClassificationParam)

    begin

      Params.Model('facebook/bart-large-mnli');

      Params.Inputs('Hi, I recently bought a device from your company but it is not working as advertised and I would like to get reimbursed!');

      Params.Parameters(

        procedure (var Params: TZeroShotClassificationParameters)

        begin

          Params.CandidateLabels(['refund', 'legal', 'faq'])

        end);

    end,

    function : TAsynZeroShotClassification

    begin

      Result.Sender := HFTutorial;

      Result.OnSuccess := Display;

      Result.OnError := Display;

    end);

```

Other models that you can easily test:

- [valhalla/distilbart-mnli-12-9](https://huggingface.co/valhalla/distilbart-mnli-12-9)

- [MoritzLaurer/mDeBERTa-v3-base-mnli-xnli](https://huggingface.co/MoritzLaurer/mDeBERTa-v3-base-mnli-xnli)




## Token Classification

For more details about the `token-classification` task, check out its [dedicated page](https://huggingface.co/tasks/token-classification)! You will find examples and related materials.

>[!NOTE]

> In the field of `Token Classification`, over 20,755 pre-trained models are available. 

>

[FacebookAI/xlm-roberta-large-finetuned-conll03-english](https://huggingface.co/FacebookAI/xlm-roberta-large-finetuned-conll03-english) 


The model can be used for token classification, a natural language understanding task in which a label is assigned to some tokens in a text. 


See [associated paper](https://arxiv.org/abs/1911.02116)

**Asynchronous code example**

```Pascal

// uses HuggingFace, HuggingFace.Types, HuggingFace.Aggregator, FMX.HuggingFace.Tutorial; 

  HuggingFace.WaitForModel := True;

  HuggingFace.Text.TokenClassification(

    procedure (Params: TTokenClassificationParam)

    begin

      Params.Model('FacebookAI/xlm-roberta-large-finetuned-conll03-english');

      //Params.Model('dslim/bert-base-NER');  //Can be used too

      Params.Inputs('My name is Sarah Jessica Parker but you can call me Jessica');

    end,

    function : TAsynTokenClassification

    begin

      Result.Sender := HFTutorial;

      Result.OnSuccess := Display;

      Result.OnError := Display;

    end);

```




## Question Answering

For more details about the `question-answering` task, check out its [dedicated page](https://huggingface.co/tasks/question-answering)! You will find examples and related materials.

>[!NOTE]

> In the field of `Question Answering`, over 12,683 pre-trained models are available. 

>

[deepset/roberta-base-squad2](https://huggingface.co/deepset/roberta-base-squad2) 


This is the [roberta-base model](https://huggingface.co/FacebookAI/roberta-base), fine-tuned using the [SQuAD2.0 dataset](https://huggingface.co/datasets/rajpurkar/squad_v2). It's been trained on question-answer pairs, including unanswerable questions, for the task of Extractive Question Answering. 


See [associated paper](https://arxiv.org/abs/1907.11692)




**Asynchronous code example**

```Pascal

// uses HuggingFace, HuggingFace.Types, HuggingFace.Aggregator, FMX.HuggingFace.Tutorial; 

  HuggingFace.WaitForModel := True;

  HuggingFace.Text.QuestionAnswering(

    procedure (Params: TQuestionAnsweringParam)

    begin

      Params.Model('deepset/roberta-base-squad2');

      Params.Inputs('What is my name?', 'My name is Clara and I live in Berkeley.');

      Params.Parameters(

        procedure (var Params: TQuestionAnsweringParameters)

        begin

          Params.TopK(3);

        end);

    end,

    function : TAsynQuestionAnswering

    begin

      Result.Sender := HFTutorial;

      Result.OnSuccess := Display;

      Result.OnError := Display;

    end);

```




## Table Question Answering

For more details about the `table-question-answering` task, check out its [dedicated page](https://huggingface.co/tasks/table-question-answering)! You will find examples and related materials.

>[!NOTE]

> In the field of `Table Question Answering`, over 133 pre-trained models are available. 

>




[google/tapas-base-finetuned-wtq](https://huggingface.co/google/tapas-base-finetuned-wtq) 


[TAPAS](https://github.com/google-research/tapas) is a BERT-like transformers model pretrained on a large corpus of English data from Wikipedia in a self-supervised fashion. This means it was pretrained on the raw tables and associated texts only, with no humans labelling them in any way (which is why it can use lots of publicly available data) with an automatic process to generate inputs and labels from those texts. 

**Asynchronous code example**

```Pascal

// uses HuggingFace, HuggingFace.Types, HuggingFace.Aggregator, FMX.HuggingFace.Tutorial; 

  HuggingFace.WaitForModel := True;

  HuggingFace.Text.TableQuestionAnswering(

    procedure (Params: TTableQAParam)

    begin

      Params.Model('google/tapas-base-finetuned-wtq');

      Params.Inputs(

        'How many stars does the tokenizers repository have?',

        [ TRow.Create('Repository', ['Transformers', 'Datasets', 'Tokenizers']),

          TRow.Create('Stars', ['36542', '4512', '3934']),

          TRow.Create('Contributors', ['651', '77', '34']),

          TRow.Create('Programming language',

             [ 'Python',

               'Python',

               'Rust, Python and NodeJS'

             ])

        ]);

    end,

    function : TAsynTableQA

    begin

      Result.Sender := HFTutorial;

      Result.OnSuccess := Display;

      Result.OnError := Display;

    end);

```
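
For reference, the Inference API's table question answering task expects the table as a JSON object that maps each column name to the list of its cell values, next to the query. The sketch below builds such a payload with `System.JSON`, independently of this wrapper; `Column` and `BuildTableQuestion` are hypothetical helpers, and how the library itself serializes `TRow` values is not shown here.

```Pascal
program TablePayloadSketch;

{$APPTYPE CONSOLE}

uses
  System.JSON;

// Builds one table column as a JSON array of strings.
function Column(const Values: array of string): TJSONArray;
begin
  Result := TJSONArray.Create;
  for var Value in Values do
    Result.Add(Value);
end;

// Builds {"inputs": {"query": ..., "table": {column name: [cells...]}}}.
function BuildTableQuestion(const Query: string): TJSONObject;
begin
  var Table := TJSONObject.Create;
  Table.AddPair('Repository', Column(['Transformers', 'Datasets', 'Tokenizers']));
  Table.AddPair('Stars', Column(['36542', '4512', '3934']));

  var Inputs := TJSONObject.Create;
  Inputs.AddPair('query', Query);
  Inputs.AddPair('table', Table);

  Result := TJSONObject.Create;
  Result.AddPair('inputs', Inputs);
end;

begin
  var Payload := BuildTableQuestion(
    'How many stars does the tokenizers repository have?');
  try
    Writeln(Payload.ToJSON);
  finally
    Payload.Free;  // also frees the nested objects it owns
  end;
end.
```

Note that every cell is a string, including the numeric ones, which matches the values used in the example above.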




## Fill-mask

For more details about the `fill-mask` task, check out its [dedicated page](https://huggingface.co/tasks/fill-mask)! You will find examples and related materials.

>[!NOTE]

> In the field of `Fill-mask`, over 13,570 pre-trained models are available. 

>

[google-bert/bert-base-uncased](https://huggingface.co/google-bert/bert-base-uncased) 


Pretrained model on English language using a masked language modeling (MLM) objective. It was introduced in [this paper](https://arxiv.org/abs/1810.04805) and first released in [this repository](https://github.com/google-research/bert). This model is uncased: it does not make a difference between english and English.




**Asynchronous code example**

```Pascal

// uses HuggingFace, HuggingFace.Types, HuggingFace.Aggregator, FMX.HuggingFace.Tutorial; 

  HuggingFace.WaitForModel := True;

  HuggingFace.Mask.Fill(

    procedure (Params: TMaskParam)

    begin

      Params.Model('google-bert/bert-base-uncased');

      Params.Inputs('The answer to the universe is [MASK].');

      Params.Parameters(['infinite', 'big', 'amazing', 'no', '42']);

    end,

    function : TAsynMask

    begin

      Result.Sender := HFTutorial;

      Result.OnSuccess := Display;

      Result.OnError := Display;

    end);

```




## Text Classification

For more details about the `text-classification` task, check out its [dedicated page](https://huggingface.co/tasks/text-classification)! You will find examples and related materials.

>[!NOTE]

> In the field of `Text Classification`, over 77,280 pre-trained models are available. 

>




[distilbert/distilbert-base-uncased-finetuned-sst-2-english](https://huggingface.co/distilbert/distilbert-base-uncased-finetuned-sst-2-english) 


This model is a fine-tuned checkpoint of DistilBERT-base-uncased, trained on SST-2. It reaches an accuracy of 91.3 on the dev set (for comparison, the BERT bert-base-uncased version reaches an accuracy of 92.7).


For more details about DistilBERT, we encourage you to check out this [model card](https://huggingface.co/distilbert/distilbert-base-uncased).

**Asynchronous code example**

```Pascal

// uses HuggingFace, HuggingFace.Types, HuggingFace.Aggregator, FMX.HuggingFace.Tutorial; 

  HuggingFace.WaitForModel := True;

  HuggingFace.Text.TextClassification(

    procedure (Params: TTextClassificationParam)

    begin

      Params.Model('distilbert/distilbert-base-uncased-finetuned-sst-2-english');

      Params.Inputs('I like you. I love you.');

    end,

    function : TAsynTextClassification

    begin

      Result.Sender := HFTutorial;

      Result.OnSuccess := Display;

      Result.OnError := Display;

    end);

```

This code example returns a `POSITIVE` or `NEGATIVE` label, depending on the sentiment of the input text.

- Use the model: [papluca/xlm-roberta-base-language-detection](https://huggingface.co/papluca/xlm-roberta-base-language-detection) as a language detector.

- Use the model: [cardiffnlp/twitter-roberta-base-sentiment-latest](https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest) for sentiment analysis.
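To illustrate how a classification result is read, here is a minimal sketch that picks the highest-scoring label from a raw JSON response. It assumes the response is an array of `{"label", "score"}` objects, possibly wrapped in an outer array, which is the shape the Serverless Inference API documents for this task. The wrapper deserializes responses itself, so `TopLabel` is a hypothetical helper shown only for illustration.

```Pascal
program TopLabelDemo;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.JSON;

// Hypothetical helper: returns the label with the highest score,
// or '' when the input is not a JSON array of label/score objects.
function TopLabel(const Json: string): string;
var
  Root, Item: TJSONValue;
  Items: TJSONArray;
  Lbl: string;
  Score, Best: Double;
begin
  Result := '';
  Best := -1;
  Root := TJSONObject.ParseJSONValue(Json);
  if not Assigned(Root) then
    Exit;
  try
    if not (Root is TJSONArray) then
      Exit;
    Items := TJSONArray(Root);
    // Assumption: results may be wrapped in an outer array (one entry per input).
    if (Items.Count > 0) and (Items.Items[0] is TJSONArray) then
      Items := TJSONArray(Items.Items[0]);
    for Item in Items do
      if Item.TryGetValue<string>('label', Lbl) and
         Item.TryGetValue<Double>('score', Score) and (Score > Best) then
      begin
        Best := Score;
        Result := Lbl;
      end;
  finally
    Root.Free;
  end;
end;

begin
  Writeln(TopLabel('[[{"label":"POSITIVE","score":0.9998},' +
                   '{"label":"NEGATIVE","score":0.0002}]]'));
end.
```

The same helper would work for the language-detection and sentiment models listed above, provided they return the same label/score structure.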




## Summarization

Summarization is the task of producing a shorter version of a document while preserving its important information. Some models can extract text from the original input, while other models can generate entirely new text.

For more details about the `summarization` task, check out its [dedicated page](https://huggingface.co/tasks/summarization)! You will find examples and related materials.

>[!NOTE]

> In the field of `Summarization`, over 2,130 pre-trained models are available. 

>




[facebook/bart-large-cnn](https://huggingface.co/facebook/bart-large-cnn) 


BART is a transformer encoder-decoder (seq2seq) model with a bidirectional (BERT-like) encoder and an autoregressive (GPT-like) decoder. BART is pre-trained by (1) corrupting text with an arbitrary noising function, and (2) learning a model to reconstruct the original text.

**Asynchronous code example**

```Pascal

// uses HuggingFace, HuggingFace.Types, HuggingFace.Aggregator, FMX.HuggingFace.Tutorial; 

  HuggingFace.WaitForModel := True;

  HuggingFace.Text.Summarization(

    procedure (Params: TSummarizationParam)

    begin

      Params.Model('facebook/bart-large-cnn');

      Params.Inputs('The tower is 324 metres (1,063 ft) tall, about the same height as an 81-storey building, and the tallest structure in Paris. Its base is square, measuring 125 metres (410 ft) on each side. During its construction, the Eiffel Tower surpassed the Washington Monument to become the tallest man-made structure in the world, a title it held for 41 years until the Chrysler Building in New York City was finished in 1930. It was the first structure to reach a height of 300 metres. Due to the addition of a broadcasting aerial at the top of the tower in 1957, it is now taller than the Chrysler Building by 5.2 metres (17 ft). Excluding transmitters, the Eiffel Tower is the second tallest free-standing structure in France after the Millau Viaduct.');

    end,

    function : TAsynSummarization

    begin

      Result.Sender := HFTutorial;

      Result.OnSuccess := Display;

      Result.OnError := Display;

    end);

```




# Common Ground Functionalities Across API Ecosystems

In the previous chapter, **Exploration Journey**, I walked through the unique features of `Hugging Face Hub APIs`, focusing on what makes them stand out. As I kept exploring, I noticed some strong overlaps with other platforms like `OpenAI`, `Anthropic`, and `Gemini`. That’s where Common Ground comes in. This chapter is about zooming out to look at those shared functionalities and seeing how these ecosystems stack up against each other. By focusing on what they have in common, we can get a clearer picture of the API landscape as a whole.




## Embeddings

Feature extraction is the task of converting a text into a vector (often called “embedding”).

**Example applications:**

- Retrieving the most relevant documents for a query (for RAG applications).

- Reranking a list of documents based on their similarity to a query.

- Calculating the similarity between two sentences.

For more details about the `Embeddings` task, check out its [dedicated page](https://huggingface.co/tasks/feature-extraction)! You will find examples and related materials.

>[!NOTE]

> In the field of `Embeddings` over 7,400 pre-trained models are available. 

>




[mixedbread-ai/mxbai-embed-large-v1](https://huggingface.co/mixedbread-ai/mxbai-embed-large-v1): produces sentence embeddings.

**Asynchronous code example**

```Pascal

// uses HuggingFace, HuggingFace.Types, HuggingFace.Aggregator, FMX.HuggingFace.Tutorial; 

  HuggingFace.API.WaitForModel := True;

  HuggingFace.Embeddings.Create(

    procedure (Params: TEmbeddingParams)

    begin

      Params.Model('mixedbread-ai/mxbai-embed-large-v1');

      Params.Inputs('Today is a sunny day and I will get some ice cream.');

    end,

    function : TAsynEmbeddings

    begin

      Result.Sender := HFTutorial;

      Result.OnSuccess := Display;

      Result.OnError := Display;

    end);

```
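Once two embeddings are available, the sentence-similarity use case listed above usually comes down to a cosine similarity between the two vectors. The sketch below is independent of the wrapper: `CosineSimilarity` is a hypothetical helper, and the short vectors are toy values standing in for real embeddings, which have many more dimensions.

```Pascal
program CosineSimilarityDemo;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

// Cosine similarity of two equal-length vectors; 0 if either has zero norm.
function CosineSimilarity(const A, B: TArray<Double>): Double;
var
  I: Integer;
  Dot, NormA, NormB: Double;
begin
  if Length(A) <> Length(B) then
    raise EArgumentException.Create('Vectors must have the same length');
  Dot := 0;
  NormA := 0;
  NormB := 0;
  for I := 0 to High(A) do
  begin
    Dot := Dot + A[I] * B[I];
    NormA := NormA + A[I] * A[I];
    NormB := NormB + B[I] * B[I];
  end;
  if (NormA = 0) or (NormB = 0) then
    Exit(0);
  Result := Dot / (Sqrt(NormA) * Sqrt(NormB));
end;

var
  First, Second: TArray<Double>;
begin
  // Toy 3-dimensional vectors standing in for real embeddings.
  First := [0.9, 0.1, 0.3];
  Second := [0.8, 0.2, 0.4];
  Writeln(Format('Similarity: %.4f', [CosineSimilarity(First, Second)]));
end.
```

Values close to 1 indicate vectors pointing in nearly the same direction. Reranking documents against a query amounts to sorting them by this score.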




## Chat

Generate responses in a conversational context using a list of messages as input. This capability supports both conversational Large Language Models (LLMs) and Vision-Language Models (VLMs), bridging text-based and `image-to-text` functionalities. It is a specialized subtask within [`text generation`](https://huggingface.co/docs/api-inference/tasks/text-generation) and [`image-text-to-text`](https://huggingface.co/docs/api-inference/tasks/image-text-to-text) processing.

Recommended models:

Conversational Large Language Models (LLMs)

- [google/gemma-2-2b-it](https://huggingface.co/google/gemma-2-2b-it): A robust text-generation model optimized for instruction following.

- [meta-llama/Meta-Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct): A highly capable model for generating text and adhering to instructions.

- [microsoft/Phi-3-mini-4k-instruct](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct): A compact yet efficient text-generation model.

- [Qwen/Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct): A reliable model for text generation and instruction compliance.

Conversational Vision-Language Models (VLMs)

- [meta-llama/Llama-3.2-11B-Vision-Instruct](https://huggingface.co/meta-llama/Llama-3.2-11B-Vision-Instruct): A powerful vision-language model with excellent capabilities in visual comprehension and reasoning.

- [Qwen/Qwen2-VL-7B-Instruct](https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct): A strong model designed for image-text-to-text tasks.




### Multi Turn Conversation

Generate text based on a prompt. For more details about the `text-generation` task, check out its [dedicated page](https://huggingface.co/tasks/text-generation)! You will find examples and related materials.

>[!NOTE]

> In the field of `text-generation` over 163,600 pre-trained models are available. 

>

**Synchronous code example**

```Pascal

// uses HuggingFace, HuggingFace.Types, HuggingFace.Aggregator, FMX.HuggingFace.Tutorial; 

  HuggingFace.UseCache := False;

  var Chat := HuggingFace.Chat.Completion(

    procedure (Params: TChatPayload)

    begin

      Params.Model('microsoft/Phi-3-mini-4k-instruct');

      Params.Messages([

         TPayload.User('Hello'),

         TPayload.Assistant('Great to meet you. What would you like to know?'),

         TPayload.User('I have two dogs in my house. How many paws are in my house?')

      ]);

      Params.MaxTokens(1024);

    end);

  try

    Display(Memo1, Chat);

  finally

    Chat.Free;

  end;

```




**Asynchronous code example**

```Pascal

// uses HuggingFace, HuggingFace.Types, HuggingFace.Aggregator, FMX.HuggingFace.Tutorial; 

  HuggingFace.UseCache := False;

  HuggingFace.Chat.Completion(

    procedure (Params: TChatPayload)

    begin

      Params.Model('microsoft/Phi-3-mini-4k-instruct');

      Params.Messages([

         TPayload.User('Hello'),

         TPayload.Assistant('Great to meet you. What would you like to know?'),

         TPayload.User('I have two dogs in my house. How many paws are in my house?')

      ]);

      Params.MaxTokens(1024);

    end,

    function : TAsynChat

    begin

      Result.Sender := HFTutorial;

      Result.OnSuccess := Display;

      Result.OnError := Display;

    end);

```
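For orientation, the message list used in the examples above corresponds to a JSON request body in the chat-completion format, with `model`, `messages` (each carrying a `role` and `content`), and `max_tokens`. The sketch below builds such a body with `System.JSON`. It is an illustration only: `NewMessage` and `BuildChatBody` are hypothetical helpers, and whether the wrapper produces exactly this JSON internally is an assumption.

```Pascal
program ChatBodyDemo;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.JSON;

// Hypothetical helper: one chat message as a JSON object.
function NewMessage(const Role, Content: string): TJSONObject;
begin
  Result := TJSONObject.Create;
  Result.AddPair('role', Role);
  Result.AddPair('content', Content);
end;

// Hypothetical helper: assembles the multi-turn request body as a JSON string.
function BuildChatBody: string;
var
  Body: TJSONObject;
  Messages: TJSONArray;
begin
  Body := TJSONObject.Create;
  try
    Messages := TJSONArray.Create;
    Body.AddPair('model', 'microsoft/Phi-3-mini-4k-instruct');
    Body.AddPair('messages', Messages); // Body now owns Messages
    Messages.AddElement(NewMessage('user', 'Hello'));
    Messages.AddElement(NewMessage('assistant',
      'Great to meet you. What would you like to know?'));
    Messages.AddElement(NewMessage('user',
      'I have two dogs in my house. How many paws are in my house?'));
    Body.AddPair('max_tokens', TJSONNumber.Create(1024));
    Result := Body.ToJSON;
  finally
    Body.Free; // also frees the nested array and messages
  end;
end;

begin
  Writeln(BuildChatBody);
end.
```

To continue the conversation, append the assistant's reply and the next user message to the same list before sending the next request.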




### Streamed Multi Turn Conversation

**Synchronous code example**

```Pascal

// uses HuggingFace, HuggingFace.Types, HuggingFace.Aggregator, FMX.HuggingFace.Tutorial; 

  

  HuggingFace.UseCache := False;

  HuggingFace.Chat.CompletionStream(

    procedure (Params: TChatPayload)

    begin

      Params.Model('microsoft/Phi-3.5-mini-instruct');

      Params.Messages([

         TPayload.User('Hello'),

         TPayload.Assistant('Great to meet you. What would you like to know?'),

         TPayload.User('I have two dogs in my house. How many paws are in my house?')

      ]);

      Params.Stream(True);

      Params.MaxTokens(1024);

    end,

    procedure (var Chat: TChat; IsDone: Boolean; var Cancel: Boolean)

    begin

      if Assigned(Chat) and not IsDone then

        begin

          DisplayStream(HFTutorial, Chat);

          Application.ProcessMessages;

        end;

    end);

```




**Asynchronous code example**

```Pascal

// uses HuggingFace, HuggingFace.Types, HuggingFace.Aggregator, FMX.HuggingFace.Tutorial; 

  HuggingFace.UseCache := False;

  HuggingFace.Chat.CompletionStream(

    procedure (Params: TChatPayload)

    begin

      Params.Model('microsoft/Phi-3.5-mini-instruct');

      Params.Messages([

         TPayload.User('Hello'),

         TPayload.Assistant('Great to meet you. What would you like to know?'),

         TPayload.User('I have two dogs in my house. How many paws are in my house?')

      ]);

      Params.Stream(True);

      Params.MaxTokens(1024);

    end,

    function : TAsynChatStream

    begin

      Result.Sender := HFTutorial;

      Result.OnProgress := DisplayStream;

      Result.OnError := DisplayStream;

    end);  

```




### Vision

Models that combine image and text inputs, often referred to as `vision-language` models (VLMs), generate text outputs based on both an image and a text prompt. Unlike traditional `image-to-text` models, which are primarily designed for specific tasks like image captioning, VLMs incorporate an additional layer of versatility by accepting text prompts. Some of these models are even trained to process entire conversations as input, enabling a broader range of applications.

For more details about the `image-text-to-text` task, check out its [dedicated page](https://huggingface.co/tasks/image-text-to-text)! You will find examples and related materials.

>[!NOTE]

> In the field of `image-text-to-text` over 5,750 pre-trained models are available. 

>




**Synchronous streamed code example**

```Pascal

// uses HuggingFace, HuggingFace.Types, HuggingFace.Aggregator, FMX.HuggingFace.Tutorial; 

  HuggingFace.UseCache := False;

  var ImageFilePath := 'https://tripfixers.com/wp-content/uploads/2019/11/eiffel-tower-with-snow.jpeg';

  HuggingFace.Chat.CompletionStream(

    procedure (Params: TChatPayload)

    begin

      Params.Model('meta-llama/Llama-3.2-11B-Vision-Instruct');

      Params.Messages([TPayload.User('Describe the image ?', [ImageFilePath])]);

      Params.Stream(True);

      Params.MaxTokens(1024);

    end,

    procedure (var Chat: TChat; IsDone: Boolean; var Cancel: Boolean)

    begin

      if Assigned(Chat) and not IsDone then

        begin

          DisplayStream(HFTutorial, Chat);

          Application.ProcessMessages;

        end;

    end);

```




**Asynchronous streamed code example**

```Pascal

// uses HuggingFace, HuggingFace.Types, HuggingFace.Aggregator, FMX.HuggingFace.Tutorial; 

  HuggingFace.UseCache := False;

  var ImageFilePath := 'https://tripfixers.com/wp-content/uploads/2019/11/eiffel-tower-with-snow.jpeg';

  

  HuggingFace.Chat.CompletionStream(

    procedure (Params: TChatPayload)

    begin

      Params.Model('meta-llama/Llama-3.2-11B-Vision-Instruct');

      Params.Messages([TPayload.User('Describe the image ?', [ImageFilePath])]);

      Params.Stream(True);

      Params.MaxTokens(1024);

    end,

    function : TAsynChatStream

    begin

      Result.Sender := HFTutorial;

      Result.OnProgress := DisplayStream;

      Result.OnError := DisplayStream;

    end);

```




### Use tools

Example prompt: What is the weather in Paris?

The tool schema used:

```Json

  {
    "type": "object",
    "properties": {
      "location": {
        "type": "string",
        "description": "The city and department, e.g. Marseille, 13"
      },
      "unit": {
        "type": "string",
        "enum": ["celsius", "fahrenheit"]
      }
    },
    "required": ["location"]
  }

```
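When a model decides to call the tool, it supplies its arguments as a JSON object that should conform to this schema. As a hedged illustration, the sketch below checks such an argument string with `System.JSON`: `location` must be present, and `unit`, when present, must be one of the two enum values. `TryReadWeatherArgs` is a hypothetical helper, not part of the wrapper or of `TWeatherReportFunction`.

```Pascal
program ToolArgsDemo;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.JSON;

// Hypothetical helper: validates tool-call arguments against the schema above.
function TryReadWeatherArgs(const Json: string;
  out Location, UnitName: string): Boolean;
var
  Value: TJSONValue;
begin
  Result := False;
  Location := '';
  UnitName := '';
  Value := TJSONObject.ParseJSONValue(Json);
  if not Assigned(Value) then
    Exit; // not valid JSON
  try
    if not (Value is TJSONObject) then
      Exit;
    // "location" is the only required property.
    if not Value.TryGetValue<string>('location', Location) then
      Exit;
    // "unit" is optional; when present it must match the enum.
    if Value.TryGetValue<string>('unit', UnitName) then
      if (UnitName <> 'celsius') and (UnitName <> 'fahrenheit') then
        Exit;
    Result := True;
  finally
    Value.Free;
  end;
end;

var
  Location, UnitName: string;
begin
  if TryReadWeatherArgs('{"location": "Paris, 75", "unit": "celsius"}',
    Location, UnitName) then
    Writeln(Location, ' / ', UnitName)
  else
    Writeln('Arguments do not match the schema');
end.
```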




1. We will use the `TWeatherReportFunction` plugin defined in the `HuggingFace.Functions.Example` unit.

```Delphi

  var Weather: IFunctionCore := TWeatherReportFunction.Create;

```

2. We then define a method to display the result of the query using the `Weather` tool.

```Delphi

procedure TMyForm.FuncExecuteStream(Sender: TObject; Text: string);

begin

  HuggingFace.WaitForModel := True;

  HuggingFace.UseCache := False;

  HuggingFace.Chat.CompletionStream(

    procedure (Params: TChatPayload)

    begin

      Params.Model('mistralai/Mixtral-8x7B-Instruct-v0.1');

      Params.Messages([

        TPayload.System('You are a fun and entertaining weather presenter.'),

        TPayload.User(Text)]);

      Params.Stream(True);

      Params.MaxTokens(1024);

    end,

    function : TAsynChatStream

    begin

      Result.Sender := HFTutorial;

      Result.OnProgress := DisplayStream;

      Result.OnError := DisplayStream;

    end);

end;

```

3. Build the query using the `Weather` tool.

**Synchronous code example**

```Pascal

// uses HuggingFace, HuggingFace.Types, HuggingFace.Aggregator, FMX.HuggingFace.Tutorial, HuggingFace.Functions.Example; 

  HuggingFace.WaitForModel := True;

  var Weather: IFunctionCore := TWeatherReportFunction.Create;

  HFTutorial.Func := Weather;

  HFTutorial.FuncProc := FuncExecuteStream;

  var Chat := HuggingFace.Chat.Completion(

    procedure (Params: TChatPayload)

    begin

      Params.Model('mistralai/Mixtral-8x7B-Instruct-v0.1');

      Params.Messages([TPayload.User('What is the weather in Paris ?')]);

      Params.Tools([Weather]);

      Params.MaxTokens(1024);

    end);

  try

    Display(Memo1, Chat);

  finally

    Chat.Free;

  end;

```




**Asynchronous code example**

```Pascal

// uses HuggingFace, HuggingFace.Types, HuggingFace.Aggregator, FMX.HuggingFace.Tutorial, HuggingFace.Functions.Example; 

  HuggingFace.WaitForModel := True;

  var Weather: IFunctionCore := TWeatherReportFunction.Create;

  HFTutorial.Func := Weather;

  HFTutorial.FuncProc := FuncExecuteStream;

  HuggingFace.Chat.Completion(

    procedure (Params: TChatPayload)

    begin

      Params.Model('mistralai/Mixtral-8x7B-Instruct-v0.1');

      Params.Messages([TPayload.User('What is the weather in Paris ?')]);

      Params.Tools([Weather]);

      Params.MaxTokens(1024);

    end,

    function : TAsynChat

    begin

      Result.Sender := HFTutorial;

      Result.OnSuccess := Display;

      Result.OnError := Display;

    end);

```




## Text Generation

Generate text based on a prompt.

If you are interested in a Chat Completion task, which generates a response based on a list of messages, check out the [`chat-completion`](#Chat) task.

For more details about the `text-generation` task, check out its [dedicated page](https://huggingface.co/tasks/text-generation)! You will find examples and related materials.




**Synchronous code example**

```Pascal

// uses HuggingFace, HuggingFace.Types, HuggingFace.Aggregator, FMX.HuggingFace.Tutorial;

  HuggingFace.WaitForModel := True;

  HuggingFace.UseCache := False;

  var Generation := HuggingFace.Text.Generation(

    procedure (Params: TTextGenerationParam)

    begin

      Params.Model('google/gemma-2-2b-it');

      Params.Inputs('Can you please let us know more details about your');

      Params.Parameters(

        procedure (var Params: TTextGenerationParameters)

        begin

          Params.MaxNewTokens(1024);

          Params.DoSample(True);

          Params.DecoderInputDetails(True);

        end);

    end);

  try

    Display(HFTutorial, Generation);

  finally

    Generation.Free;

  end;

```




**Asynchronous code example**

```Pascal

// uses HuggingFace, HuggingFace.Types, HuggingFace.Aggregator, FMX.HuggingFace.Tutorial;

  HuggingFace.WaitForModel := True;

  HuggingFace.UseCache := False;

  HuggingFace.Text.Generation(

    procedure (Params: TTextGenerationParam)

    begin

      Params.Model('google/gemma-2-2b-it');

      Params.Inputs('Can you please let us know more details about your');

      Params.Parameters(

        procedure (var Params: TTextGenerationParameters)

        begin

          Params.MaxNewTokens(1024);

          Params.DoSample(True);

          Params.DecoderInputDetails(True);

        end);

    end,

    function : TAsynTextGeneration

    begin

      Result.Sender := HFTutorial;

      Result.OnSuccess := Display;

      Result.OnError := Display;

    end);

```




**Asynchronous streamed code example**

```Pascal

// uses HuggingFace, HuggingFace.Types, HuggingFace.Aggregator, FMX.HuggingFace.Tutorial;

  HuggingFace.WaitForModel := True;

  HuggingFace.UseCache := False;

  HuggingFace.Text.GenerationStream(

    procedure (Params: TTextGenerationParam)

    begin

      Params.Model('google/gemma-2-2b-it');

      Params.Inputs('Can you please let us know more details about your');

      Params.Stream(True);

    end,

    function : TAsynTextGenerationStream

    begin

      Result.Sender := HFTutorial;

      Result.OnProgress := DisplayStream;

      Result.OnError := Display;

    end);

```




## Translation

Translation is the task of converting text from one language to another.

For more details about the `translation` task, check out its [dedicated page](https://huggingface.co/tasks/translation)! You will find examples and related materials.

>[!NOTE]

> In the field of `translation` over 5,079 pre-trained models are available. 

>




**Asynchronous code example**

```Pascal

// uses HuggingFace, HuggingFace.Types, HuggingFace.Aggregator, FMX.HuggingFace.Tutorial;

  HuggingFace.WaitForModel := True;

  // French-to-English translation

  HuggingFace.Text.Translation(

    procedure (Params: TTranslationParam)

    begin

      Params.Model('Helsinki-NLP/opus-mt-fr-en');

      Params.Inputs('Je n''aurais pas dû abuser du chocolat, je crois que je vais le regretter.');

      Params.Parameters(

        procedure (var Params: TTranslationParameters)

        begin

          Params.SrcLang('french');

          Params.TgtLang('english');

        end);

    end,

    function : TAsynTranslation

    begin

      Result.Sender := HFTutorial;

      Result.OnSuccess := Display;

      Result.OnError := Display;

    end);

```




## Image Generation

Generate an image based on a given text prompt.

For more details about the `text-to-image` task, check out its [dedicated page](https://huggingface.co/tasks/text-to-image)! You will find examples and related materials.

>[!NOTE]

> In the field of `text-to-image` over 50,539 pre-trained models are available. 

>




**Asynchronous code example**

```Pascal

// uses HuggingFace, HuggingFace.Types, HuggingFace.Aggregator, FMX.HuggingFace.Tutorial;

  HuggingFace.WaitForModel := True;

  HuggingFace.API.UseCache := False;

  HFTutorial.FileName := 'Quarter.png';

  HuggingFace.Text.TextToImage(

    procedure (Params: TTextToImageParam)

    begin

      Params.Model('stabilityai/stable-diffusion-3-medium-diffusers');

      Params.Inputs('A quarter dollar coin placed on a wooden floor in a close-up view');

    end,

    function : TAsynTextToImage

    begin

      Result.Sender := HFTutorial;

      Result.OnStart := Start;

      Result.OnSuccess := Display;

      Result.OnError := Display;

    end);

```




## Text-to-Speech

Convert text into spoken audio.

>[!NOTE]

> In the field of `text-to-speech` over 2,273 pre-trained models are available. 

>




**Asynchronous code example**

```Pascal

// uses HuggingFace, HuggingFace.Types, HuggingFace.Aggregator, FMX.HuggingFace.Tutorial;

  HFTutorial.FileName := 'temp.mp3';

  HuggingFace.WaitForModel := True;

  HuggingFace.Text.TextToSpeech(

    procedure (Params: TTextToSpeechParam)

    begin

      Params.Model('facebook/mms-tts-eng');

      Params.Inputs('Hello and welcome. It''s nice to meet you.');

    end,

    function : TAsynTextToSpeech

    begin

      Result.Sender := HFTutorial;

      Result.OnStart := Start;

      Result.OnSuccess := Display;

      Result.OnError := Display;

    end);

```




## Automatic Speech Recognition

Automatic Speech Recognition (ASR), often referred to as Speech to Text (STT), involves converting spoken audio into written text.

Use Cases:

- Converting a podcast into text format

- Creating a voice assistant system

- Producing subtitles for video content

For more details about the `automatic-speech-recognition` task, check out its [dedicated page](https://huggingface.co/tasks/automatic-speech-recognition)! You will find examples and related materials.

>[!NOTE]

> In the field of `speech-to-text` over 21,386 pre-trained models are available. 

>

Suggested Models:

- [openai/whisper-large-v3](https://huggingface.co/openai/whisper-large-v3): An advanced ASR model developed by OpenAI.

- [nvidia/canary-1b](https://huggingface.co/nvidia/canary-1b): A robust model supporting multilingual ASR and speech translation, designed by Nvidia.

- [pyannote/speaker-diarization-3.1](https://huggingface.co/pyannote/speaker-diarization-3.1): A highly effective model for distinguishing and labeling different speakers in audio recordings.




**Asynchronous code example**

```Pascal

// uses HuggingFace, HuggingFace.Types, HuggingFace.Aggregator, FMX.HuggingFace.Tutorial;

  HuggingFace.API.WaitForModel := True;

  HuggingFace.Audio.AudioToText(

    procedure (Params: TAudioToTextParam)

    begin

      Params.Model('openai/whisper-large-v3-turbo');

      Params.Inputs('SpeechRecorded.wav');

      Params.GenerationParameters(

        procedure (var Params: TGenerationParameters)

        begin

          Params.MaxLength(10);

        end);

    end,

    function : TAsynAudioToText

    begin

      Result.Sender := HFTutorial;

      Result.OnSuccess := Display;

      Result.OnError := Display;

    end);

```

Remark: To run this example, you must first record some speech into a file named `SpeechRecorded.wav`.

# Contributing

Pull requests are welcome. If you're planning to make a major change, please open an issue first to discuss your proposed changes.




# License

This project is licensed under the [MIT](https://choosealicense.com/licenses/mit/) License.