{"id":50244764,"url":"https://github.com/IBM/fastfit","last_synced_at":"2026-06-12T14:01:08.980Z","repository":{"id":217838735,"uuid":"744911029","full_name":"IBM/fastfit","owner":"IBM","description":"FastFit ⚡ When LLMs are Unfit Use FastFit ⚡ Fast and Effective Text Classification with Many Classes","archived":false,"fork":false,"pushed_at":"2025-05-07T18:33:16.000Z","size":101,"stargazers_count":208,"open_issues_count":11,"forks_count":20,"subscribers_count":4,"default_branch":"main","last_synced_at":"2025-07-11T16:01:31.349Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/IBM.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2024-01-18T08:59:56.000Z","updated_at":"2025-06-23T03:42:46.000Z","dependencies_parsed_at":"2024-01-18T13:11:21.533Z","dependency_job_id":"e9768325-b9f1-4980-b18f-229e3bc5a2a4","html_url":"https://github.com/IBM/fastfit","commit_stats":null,"previous_names":["ibm/fastfit"],"tags_count":8,"template":false,"template_full_name":null,"purl":"pkg:github/IBM/fastfit","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/IBM%2Ffastfit","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/IBM%2Ffastfit/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/IBM%2Ffastfit/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/IBM%2Ffastfit/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/IBM"
,"download_url":"https://codeload.github.com/IBM/fastfit/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/IBM%2Ffastfit/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34247461,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-12T02:00:06.859Z","response_time":109,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2026-05-26T23:00:19.765Z","updated_at":"2026-06-12T14:01:08.974Z","avatar_url":"https://github.com/IBM.png","language":"Python","funding_links":[],"categories":["Tasks and Methods"],"sub_categories":["Text Classification and Sentiment Analysis"],"readme":"![fastfit_banner_white](https://github.com/IBM/fastfit/assets/23455264/a4de0a5e-b43a-462b-b1f2-9509ec873e76)\n\nFastFit is a method and a Python package designed to provide fast and accurate few-shot classification, especially for scenarios with many semantically similar classes. FastFit uses a novel approach that integrates batch contrastive learning with a token-level similarity score. Compared to existing few-shot learning packages, such as SetFit and Transformers, or few-shot prompting of large language models via API calls, FastFit significantly improves multi-class classification performance in both speed and accuracy across FewMany, our newly curated English benchmark, and multilingual datasets. 
FastFit demonstrates a 3-20x improvement in training speed, completing training in just a few seconds.\n\n## Running the Training Script\n\nOur package provides a convenient command-line tool `train_fastfit` to train text classification models. This tool comes with a variety of configurable parameters to customize your training process.\n\n### Prerequisites\n\nBefore running the training script, ensure you have Python installed along with our package and its dependencies. If you haven't already installed our package, you can do so using pip:\n\n```bash\npip install fast-fit\n```\n\n### Usage\n\nTo run the training script with custom configurations, use the `train_fastfit` command followed by the necessary arguments. These are similar to the Hugging Face training arguments, with a few additions specific to FastFit.\n\n### Example Command\n\nHere's an example of how to use the `train_fastfit` command with specific settings:\n\n```bash\ntrain_fastfit \\\n    --model_name_or_path \"sentence-transformers/paraphrase-mpnet-base-v2\" \\\n    --train_file $TRAIN_FILE \\\n    --validation_file $VALIDATION_FILE \\\n    --output_dir ./tmp/try \\\n    --overwrite_output_dir \\\n    --report_to none \\\n    --label_column_name label \\\n    --text_column_name text \\\n    --num_train_epochs 40 \\\n    --per_device_train_batch_size 32 \\\n    --per_device_eval_batch_size 64 \\\n    --evaluation_strategy steps \\\n    --max_text_length 128 \\\n    --logging_steps 100 \\\n    --dataloader_drop_last=False \\\n    --num_repeats 4 \\\n    --save_strategy no \\\n    --optim adafactor \\\n    --clf_loss_factor 0.1 \\\n    --do_train \\\n    --fp16 \\\n    --proj_dim 128\n```\n\n### Output\n\nUpon execution, `train_fastfit` will start the training process based on your parameters and output the results, including logs and model checkpoints, to the designated directory.\n\n## Training with Python\nYou can also run training directly from Python:\n\n```python\nfrom datasets import 
load_dataset\nfrom fastfit import FastFitTrainer, sample_dataset\n\n# Load a dataset from the Hugging Face Hub\ndataset = load_dataset(\"FastFit/banking_77\")\ndataset[\"validation\"] = dataset[\"test\"]\n\n# Downsample the training data for 5-shot training\ndataset[\"train\"] = sample_dataset(dataset[\"train\"], label_column=\"label\", num_samples_per_label=5)\n\ntrainer = FastFitTrainer(\n    model_name_or_path=\"sentence-transformers/paraphrase-mpnet-base-v2\",\n    label_column_name=\"label\",\n    text_column_name=\"text\",\n    num_train_epochs=40,\n    per_device_train_batch_size=32,\n    per_device_eval_batch_size=64,\n    max_text_length=128,\n    dataloader_drop_last=False,\n    num_repeats=4,\n    optim=\"adafactor\",\n    clf_loss_factor=0.1,\n    fp16=True,\n    dataset=dataset,\n)\n\nmodel = trainer.train()\nresults = trainer.evaluate()\n\nprint(\"Accuracy: {:.1f}\".format(results[\"eval_accuracy\"] * 100))\n```\n\nThen the model can be saved:\n```python\nmodel.save_pretrained(\"fast-fit\")\n```\nThen you can use the model for inference:\n```python\nfrom fastfit import FastFit\nfrom transformers import AutoTokenizer, pipeline\n\nmodel = FastFit.from_pretrained(\"fast-fit\")\ntokenizer = AutoTokenizer.from_pretrained(\"sentence-transformers/paraphrase-mpnet-base-v2\")\nclassifier = pipeline(\"text-classification\", model=model, tokenizer=tokenizer)\n\nprint(classifier(\"I love this package!\"))\n```\n\n## All available parameters:\n**Optional Arguments:**\n\n- `-h, --help`: Show this help message and exit.\n- `--num_repeats NUM_REPEATS`: The number of times to repeat the queries and docs in every batch. (default: 1)\n- `--proj_dim PROJ_DIM`: The dimension of the projection layer. (default: 128)\n- `--clf_loss_factor CLF_LOSS_FACTOR`: The factor to scale the classification loss. (default: 0.1)\n- `--pretrain_mode [PRETRAIN_MODE]`: Whether to do pre-training. (default: False)\n- `--inference_type INFERENCE_TYPE`: The inference type to be used. 
(default: sim)\n- `--rep_tokens REP_TOKENS`: The tokens to use for representation when calculating the similarity in training and inference. (default: all)\n- `--length_norm [LENGTH_NORM]`: Whether to normalize by length while considering pad (default: False)\n- `--mlm_factor MLM_FACTOR`: The factor to scale the MLM loss. (default: 0.0)\n- `--mask_prob MASK_PROB`: The probability of masking a token. (default: 0.0)\n- `--model_name_or_path MODEL_NAME_OR_PATH`: Path to pretrained model or model identifier from huggingface.co/models (default: None)\n- `--config_name CONFIG_NAME`: Pretrained config name or path if not the same as model_name (default: None)\n- `--tokenizer_name TOKENIZER_NAME`: Pretrained tokenizer name or path if not the same as model_name (default: None)\n- `--cache_dir CACHE_DIR`: Where do you want to store the pretrained models downloaded from huggingface.co (default: None)\n- `--use_fast_tokenizer [USE_FAST_TOKENIZER]`: Whether to use one of the fast tokenizer (backed by the tokenizers library) or not. (default: True)\n- `--no_use_fast_tokenizer`: Whether to use one of the fast tokenizer (backed by the tokenizers library) or not. (default: False)\n- `--model_revision MODEL_REVISION`: The specific model version to use (can be a branch name, tag name, or commit id). (default: main)\n- `--use_auth_token [USE_AUTH_TOKEN]`: Will use the token generated when running `transformers-cli login` (necessary to use this script with private models). (default: False)\n- `--ignore_mismatched_sizes [IGNORE_MISMATCHED_SIZES]`: Will enable to load a pretrained model whose head dimensions are different. (default: False)\n- `--load_from_FastFit [LOAD_FROM_FASTFIT]`: Will load the model from the trained model directory. 
(default: False)\n- `--task_name TASK_NAME`: The name of the task to train on: custom (default: None)\n- `--metric_name METRIC_NAME`: The name of the metric to use for evaluation. (default: accuracy)\n- `--dataset_name DATASET_NAME`: The name of the dataset to use (via the datasets library). (default: None)\n- `--dataset_config_name DATASET_CONFIG_NAME`: The configuration name of the dataset to use (via the datasets library). (default: None)\n- `--max_seq_length MAX_SEQ_LENGTH`: The maximum total input sequence length after tokenization. Sequences longer than this will be truncated, sequences shorter will be padded. (default: 128)\n- `--overwrite_cache [OVERWRITE_CACHE]`: Overwrite the cached preprocessed datasets or not. (default: False)\n- `--pad_to_max_length [PAD_TO_MAX_LENGTH]`: Whether to pad all samples to `max_seq_length`. If False, will pad the samples dynamically when batching to the maximum length in the batch. (default: True)\n- `--no_pad_to_max_length`: Whether to pad all samples to `max_seq_length`. If False, will pad the samples dynamically when batching to the maximum length in the batch. (default: False)\n- `--max_train_samples MAX_TRAIN_SAMPLES`: For debugging purposes or quicker training, truncate the number of training examples to this value if set. (default: None)\n- `--max_eval_samples MAX_EVAL_SAMPLES`: For debugging purposes or quicker training, truncate the number of evaluation examples to this value if set. (default: None)\n- `--max_predict_samples MAX_PREDICT_SAMPLES`: For debugging purposes or quicker training, truncate the number of prediction examples to this value if set. (default: None)\n- `--train_file TRAIN_FILE`: A csv or a json file containing the training data. (default: None)\n- `--validation_file VALIDATION_FILE`: A csv or a json file containing the validation data. (default: None)\n- `--test_file TEST_FILE`: A csv or a json file containing the test data. 
(default: None)\n- `--custom_goal_acc CUSTOM_GOAL_ACC`: If set, save the model every this number of steps. (default: None)\n- `--text_column_name TEXT_COLUMN_NAME`: The name of the column in the datasets containing the full texts (for summarization). (default: None)\n- `--label_column_name LABEL_COLUMN_NAME`: The name of the column in the datasets containing the labels. (default: None)\n- `--max_text_length MAX_TEXT_LENGTH`: The maximum total input sequence length after tokenization for text. (default: 32)\n- `--max_label_length MAX_LABEL_LENGTH`: The maximum total input sequence length after tokenization for label. (default: 32)\n- `--pre_train [PRE_TRAIN]`: The path to the pretrained model. (default: False)\n- `--added_tokens_per_label ADDED_TOKENS_PER_LABEL`: The number of added tokens to add to every class. (default: None)\n- `--added_tokens_mask_factor ADDED_TOKENS_MASK_FACTOR`: How much of the added tokens should be consisted of mask tokens embedding. (default: 0.0)\n- `--added_tokens_tfidf_factor ADDED_TOKENS_TFIDF_FACTOR`: How much of the added tokens should be consisted of tfidf tokens embedding. (default: 0.0)\n- `--pad_query_with_mask [PAD_QUERY_WITH_MASK]`: Whether to pad the query with the mask token. (default: False)\n- `--pad_doc_with_mask [PAD_DOC_WITH_MASK]`: Whether to pad the docs with the mask token. (default: False)\n- `--doc_mapper DOC_MAPPER`: The source for mapping docs to augmented docs (default: None)\n- `--doc_mapper_type DOC_MAPPER_TYPE`: The type of doc mapper (default: file)\n- `--output_dir OUTPUT_DIR`: The output directory where the model predictions and checkpoints will be written. (default: None)\n- `--overwrite_output_dir [OVERWRITE_OUTPUT_DIR]`: Overwrite the content of the output directory. Use this to continue training if output_dir points to a checkpoint directory. (default: False)\n- `--do_train [DO_TRAIN]`: Whether to run training. (default: False)\n- `--do_eval [DO_EVAL]`: Whether to run eval on the dev set. 
(default: False)\n- `--do_predict [DO_PREDICT]`: Whether to run predictions on the test set. (default: False)\n- `--evaluation_strategy {no,steps,epoch}`: The evaluation strategy to use. (default: no)\n- `--prediction_loss_only [PREDICTION_LOSS_ONLY]`: When performing evaluation and predictions, only returns the loss. (default: False)\n- `--per_device_train_batch_size PER_DEVICE_TRAIN_BATCH_SIZE`: Batch size per GPU/TPU core/CPU for training. (default: 8)\n- `--per_device_eval_batch_size PER_DEVICE_EVAL_BATCH_SIZE`: Batch size per GPU/TPU core/CPU for evaluation. (default: 8)\n- `--per_gpu_train_batch_size PER_GPU_TRAIN_BATCH_SIZE`: Deprecated, the use of `--per_device_train_batch_size` is preferred. Batch size per GPU/TPU core/CPU for training. (default: None)\n- `--per_gpu_eval_batch_size PER_GPU_EVAL_BATCH_SIZE`: Deprecated, the use of `--per_device_eval_batch_size` is preferred. Batch size per GPU/TPU core/CPU for evaluation. (default: None)\n- `--gradient_accumulation_steps GRADIENT_ACCUMULATION_STEPS`: Number of updates steps to accumulate before performing a backward/update pass. (default: 1)\n- `--eval_accumulation_steps EVAL_ACCUMULATION_STEPS`: Number of predictions steps to accumulate before moving the tensors to the CPU. (default: None)\n- `--eval_delay EVAL_DELAY`: Number of epochs or steps to wait for before the first evaluation can be performed, depending on the evaluation_strategy. (default: 0)\n- `--learning_rate LEARNING_RATE`: The initial learning rate for AdamW. (default: 5e-05)\n- `--weight_decay WEIGHT_DECAY`: Weight decay for AdamW if we apply some. (default: 0.0)\n- `--adam_beta1 ADAM_BETA1`: Beta1 for AdamW optimizer (default: 0.9)\n- `--adam_beta2 ADAM_BETA2`: Beta2 for AdamW optimizer (default: 0.999)\n- `--adam_epsilon ADAM_EPSILON`: Epsilon for AdamW optimizer. (default: 1e-08)\n- `--max_grad_norm MAX_GRAD_NORM`: Max gradient norm. (default: 1.0)\n- `--num_train_epochs NUM_TRAIN_EPOCHS`: Total number of training epochs to perform. 
(default: 3.0)\n- `--max_steps MAX_STEPS`: If \u003e 0: set the total number of training steps to perform. Override num_train_epochs. (default: -1)\n- `--lr_scheduler_type {linear,cosine,cosine_with_restarts,polynomial,constant,constant_with_warmup}`: The scheduler type to use. (default: linear)\n- `--warmup_ratio WARMUP_RATIO`: Linear warmup over warmup_ratio fraction of total steps. (default: 0.0)\n- `--warmup_steps WARMUP_STEPS`: Linear warmup over warmup_steps. (default: 0)\n- `--log_level {debug,info,warning,error,critical,passive}`: Logger log level to use on the main node. Possible choices are the log levels as strings: 'debug', 'info', 'warning', 'error', and 'critical', plus a 'passive' level which doesn't set anything and lets the application set the level. Defaults to 'passive'. (default: passive)\n- `--log_level_replica {debug,info,warning,error,critical,passive}`: Logger log level to use on replica nodes. Same choices and defaults as `log_level` (default: passive)\n- `--log_on_each_node [LOG_ON_EACH_NODE]`: When doing a multinode distributed training, whether to log once per node or just once on the main node. (default: True)\n- `--no_log_on_each_node`: When doing a multinode distributed training, whether to log once per node or just once on the main node. (default: False)\n- `--logging_dir LOGGING_DIR`: Tensorboard log dir. (default: None)\n- `--logging_strategy {no,steps,epoch}`: The logging strategy to use. (default: steps)\n- `--logging_first_step [LOGGING_FIRST_STEP]`: Log the first global_step (default: False)\n- `--logging_steps LOGGING_STEPS`: Log every X updates steps. (default: 500)\n- `--logging_nan_inf_filter [LOGGING_NAN_INF_FILTER]`: Filter nan and inf losses for logging. (default: True)\n- `--no_logging_nan_inf_filter`: Filter nan and inf losses for logging. (default: False)\n- `--save_strategy {no,steps,epoch}`: The checkpoint save strategy to use. (default: steps)\n- `--save_steps SAVE_STEPS`: Save checkpoint every X updates steps. 
(default: 500)\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FIBM%2Ffastfit","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FIBM%2Ffastfit","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FIBM%2Ffastfit/lists"}