{"id":23390261,"url":"https://github.com/diixo/fasttextcc","last_synced_at":"2025-07-25T01:17:29.751Z","repository":{"id":64014694,"uuid":"572163126","full_name":"diixo/fasttextCC","owner":"diixo","description":"fastText v0.9.3 (C++ port)","archived":false,"fork":false,"pushed_at":"2023-06-09T15:17:49.000Z","size":214,"stargazers_count":4,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-04-08T14:19:41.787Z","etag":null,"topics":["embeddings","fasttext","machine-learning","nlp","text-classification"],"latest_commit_sha":null,"homepage":"https://viix.co","language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/diixo.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-11-29T17:27:05.000Z","updated_at":"2025-02-17T16:16:03.000Z","dependencies_parsed_at":"2024-12-22T03:29:27.720Z","dependency_job_id":"ab55d0a1-210e-4f77-962d-4f6f4a6cd5dc","html_url":"https://github.com/diixo/fasttextCC","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/diixo/fasttextCC","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/diixo%2FfasttextCC","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/diixo%2FfasttextCC/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/diixo%2FfasttextCC/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/diixo%2FfasttextCC/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owne
rs/diixo","download_url":"https://codeload.github.com/diixo/fasttextCC/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/diixo%2FfasttextCC/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":266936770,"owners_count":24009424,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-07-24T02:00:09.469Z","response_time":99,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["embeddings","fasttext","machine-learning","nlp","text-classification"],"created_at":"2024-12-22T03:29:23.129Z","updated_at":"2025-07-25T01:17:29.655Z","avatar_url":"https://github.com/diixo.png","language":"C++","funding_links":[],"categories":[],"sub_categories":[],"readme":"# fastTextСС\n**fastTextСС** is an extended version of [fastText](https://fasttext.cc/).\n\nFastText is a library for efficient learning of word representations and sentence classification.\n\n## Added features vs original **fastText**\n\n* UTF-8 input file support\n* Filtering with a stopwords file (added **-stopwords** option); when stopwords are used, symbols are converted to lower case by default.\n* Fixed handling of special symbols (soft hyphen, no-break space, etc.)\n\n\n## Table of contents\n\n* [Usage](#usage)\n* [Requirements](#requirements)\n* [Building fastText](#building-fasttext)\n   * [Building fastText using make (preferred)](#building-fasttext-using-make-preferred)\n   * [Building and 
installing fastText using cmake](#building-and-installing-fasttext-using-cmake)\n   * [Building and installing fastText for Python](#building-and-installing-fasttext-for-python)\n* [Example use cases](#example-use-cases)\n* [Loss functions](#loss-functions)\n* [Unsupervised learning](#unsupervised-learning)\n   * [Word representation learning](#word-representation-learning)\n   * [CBOW](#cbow)\n   * [Prediction](#prediction)\n   * [Printing word vectors](#printing-word-vectors)\n   * [Nearest neighbor queries](#nearest-neighbor-queries)\n   * [Advanced reader: measure of similarity](#advanced-reader-measure-of-similarity)\n   * [Word analogies](#word-analogies)\n   * [Importance of character n-grams](#importance-of-character-n-grams)\n   * [Obtaining word vectors for out-of-vocabulary words](#obtaining-word-vectors-for-out-of-vocabulary-words)\n* [Supervised learning](#supervised-learning)\n   * [Text classification](#text-classification)\n* [Full documentation](#full-documentation)\n* [References](#references)\n   * [Enriching Word Vectors with Subword Information](#enriching-word-vectors-with-subword-information)\n   * [Bag of Tricks for Efficient Text Classification](#bag-of-tricks-for-efficient-text-classification)\n   * [FastText.zip: Compressing text classification models](#fasttextzip-compressing-text-classification-models)\n* [License](#license)\n\n## Usage\nRunning the binary without any argument will print the high level documentation, showing the different use cases supported by fastText:\n\n```\n\u003e\u003e ./fasttext\nusage: fasttext \u003ccommand\u003e \u003cargs\u003e\n\nThe commands supported by fasttext are:\n\n  supervised              train a supervised classifier\n  quantize                quantize a model to reduce the memory usage\n  test                    evaluate a supervised classifier\n  test-label              print labels with precision and recall scores\n  predict                 predict most likely labels\n  predict-prob            
predict most likely labels with probabilities\n  predict-next            predict most likely next word of words sequence with probabilities\n  skipgram                train a skipgram model\n  cbow                    train a cbow model\n  print-word-vectors      print word vectors given a trained model\n  print-sentence-vectors  print sentence vectors given a trained model\n  print-ngrams            print ngrams given a trained model and word\n  nn                      query for nearest neighbors\n  analogies               query for analogies\n  similarity              query similarity of word vs another word\n  dump                    dump arguments, dictionary, input/output vectors\n```\nCurrent version was extended by two additional commands for CBOW-mode: **predict-next**, **similarity**.\n\n## Requirements\n\nGenerally, **fastText** builds on modern Mac OS and Linux distributions.\nSince it uses some C++11 features, it requires a compiler with good C++11 support.\nThese include :\n\n* (g++-4.7.2 or newer) or (clang-3.3 or newer)\n\nCompilation is carried out using a Makefile, so you will need to have a working **make**.\nIf you want to use **cmake** you need at least version 2.8.9.\n\nFor the word-similarity evaluation script you will need:\n\n* Python 2.6 or newer\n* NumPy \u0026 SciPy\n\nFor the python bindings (see the subdirectory python) you will need:\n\n* Python version 2.7 or \u003e=3.4\n* NumPy \u0026 SciPy\n* [pybind11](https://github.com/pybind/pybind11)\n\n## Building fastText\n\nWe discuss building the latest stable version of fastText.\n\n### Building fastText using make (preferred)\n\n```\n$ wget https://github.com/diixo/fastText/archive/v0.9.2.zip\n$ unzip v0.9.2.zip\n$ cd fastText-0.9.2\n$ make\n```\n\nThis will produce object files for all the classes as well as the main binary `fasttext`.\nIf you do not plan on using the default system-wide compiler, update the two macros defined at the beginning of the Makefile (CC and INCLUDES).\n\n### 
Building and installing fastText using cmake\n\nFor now this is not part of a release, so you will need to clone the master branch.\n\n```\n$ git clone https://github.com/diixo/fastText.git\n$ cd fastText\n$ mkdir build \u0026\u0026 cd build \u0026\u0026 cmake ..\n$ make \u0026\u0026 make install\n```\n\nThis will create the fasttext binary and also all relevant libraries (shared, static, PIC).\n\n### Building and installing fastText for Python\n\nFor now this is not part of a release, so you will need to clone the master branch.\n\n```\n$ git clone https://github.com/diixo/fastText.git\n$ cd fastText\n$ make\n$ pip install .\n```\n\nFor further information and an introduction, see python/README.md.\n\n## Example use cases\n\nThis library has two main use cases: word representation learning and text classification.\nThese were described in the two papers [1](#enriching-word-vectors-with-subword-information) and [2](#bag-of-tricks-for-efficient-text-classification).\n\n## Loss functions\n\n* Negative-sampling loss = `-loss ns`, default for unsupervised.\n* Softmax loss = `-loss softmax`, default for supervised.\n* One-vs-all loss = `-loss ova`.\n* Hierarchical softmax loss = `-loss hs`.\n\n## Unsupervised learning\n### Word representation learning\n\nThe word-representation modes **skipgram** and **cbow** use a default **-minCount** = 5, where **minCount** is the minimal number of word occurrences.\n\nIn order to learn word vectors, as described in [1](#enriching-word-vectors-with-subword-information), do:\n\n```\n$ ./fasttext skipgram -input data.txt -output model\n```\n\nwhere `data.txt` is a training file containing `UTF-8` encoded text.\nBy default the word vectors will take into account character n-grams from 3 to 6 characters.\nAt the end of optimization the program will save two files: `model.bin` and `model.vec`.\n`model.vec` is a text file containing the word vectors, one per line.\n`model.bin` is a binary file containing the parameters of the model along with the 
dictionary and all hyperparameters.\nThe binary file can be used later to compute word vectors or to restart the optimization.\n\n### CBOW\n\nThe CBOW model predicts the target word according to its context. The context is represented as a bag of the words contained in a fixed-size window around the target word.\n\nLet us illustrate the difference between the two models with an example: given the sentence 'Poets have been mysteriously silent on the subject of cheese' and the target word 'silent', a skipgram model tries to predict the target using a random close-by word, like 'subject' or 'mysteriously'. The cbow model takes all the words in a surrounding window, like {been, mysteriously, on, the}, and uses the sum of their vectors to predict the target.\n\n```\n$ ./fasttext cbow -input train-data.txt -output train-data -minCount 1 -stopwords stopwords.txt -epoch 100 -loss softmax -maxn 0\n```\n### Prediction\n\nPredict the next word for a sequence of words (CBOW mode only):\n\n```\n./fasttext predict-next model.bin\n```\nNow let's have a look at our predictions. We want as many predictions as possible (argument `-1`) and only those with a probability higher than or equal to 0.5:\n\n```\n./fasttext predict-next model.bin - -1 0.5\n```\nand then type the sentence:\n\n*data structures*\n\nThe original sentence is *data structures entity*, and we get the result:\n\n```\nentity\n```\nThe argument `k` is optional, and is equal to `1` by default.\nThe argument `threshold` is optional, and is equal to `0` by default.\n```\n./fasttext predict-next model.bin text.txt\n```\nor:\n```\n./fasttext predict-next model.bin text.txt 10\n```\n\nThe similarity of two words is a probability-based value in the range 0..1. It is computed with the command:\n\n```\n./fasttext similarity model.bin word1 word2\n```\n\n### Printing word vectors\n\nSearching and printing word vectors directly from the `fil9.vec` file is cumbersome. 
Fortunately, there is a `print-word-vectors` functionality in fastText.\n\nFor example, we can print the word vectors of the words asparagus, pidgey and yellow with the following command:\n\n```\n$ echo \"asparagus pidgey yellow\" | ./fasttext print-word-vectors result/fil9.bin\nasparagus 0.46826 -0.20187 -0.29122 -0.17918 0.31289 -0.31679 0.17828 -0.04418 ...\npidgey -0.16065 -0.45867 0.10565 0.036952 -0.11482 0.030053 0.12115 0.39725 ...\nyellow -0.39965 -0.41068 0.067086 -0.034611 0.15246 -0.12208 -0.040719 -0.30155 ...\n```\n\nUsing Python:\n\n```\n\u003e\u003e\u003e [model.get_word_vector(x) for x in [\"asparagus\", \"pidgey\", \"yellow\"]]\n[array([-0.25751096, -0.18716481,  0.06921121,  0.06455903,  0.29168844,\n        0.15426874, -0.33448914, -0.427215  ,  0.7813013 , -0.10600132,\n        ...\n        0.37090245,  0.39266172, -0.4555302 ,  0.27452755,  0.00467369],\n      dtype=float32),\n array([-0.20613593, -0.25325796, -0.2422259 , -0.21067499,  0.32879013,\n        0.7269511 ,  0.3782259 ,  0.11274897,  0.246764  , -0.6423613 ,\n        ...\n        0.46302193,  0.2530962 , -0.35795924,  0.5755718 ,  0.09843876],\n      dtype=float32),\n array([-0.304823  ,  0.2543754 , -0.2198013 , -0.25421786,  0.11219151,\n        0.38286993, -0.22636674, -0.54023844,  0.41095474, -0.3505803 ,\n        ...\n        0.54788435,  0.36740595, -0.5678512 ,  0.07523401, -0.08701935],\n      dtype=float32)]\n```\n\nA nice feature is that you can also query for words that did not appear in your data! Indeed, words are represented by the sum of their substrings. As long as the unknown word is made of known substrings, there is a representation of it!\n\nAs an example let's try with a misspelled word:\n\n```\n$ echo \"enviroment\" | ./fasttext print-word-vectors result/fil9.bin\n```\n\nYou still get a word vector for it! But how good is it? 
Let's find out in the next sections!\n\n### Nearest neighbor queries\n\nA simple way to check the quality of a word vector is to look at its `nearest neighbors`. This gives an intuition of the type of semantic information the vectors are able to capture.\n\nThis can be achieved with the `nearest neighbor` (nn) functionality. For example, we can query the 10 nearest neighbors of a word by running the following command:\n\n```\n$ ./fasttext nn result/fil9.bin\nPre-computing word vectors... done.\n```\nThen we are prompted to type our query word. Let us try asparagus:\n\n```\nQuery word? asparagus\nbeetroot 0.812384\ntomato 0.806688\nhorseradish 0.805928\nspinach 0.801483\nlicorice 0.791697\nlingonberries 0.781507\nasparagales 0.780756\nlingonberry 0.778534\ncelery 0.774529\nbeets 0.773984\n```\n\nNice! It seems that vegetable vectors are similar. Note that the nearest neighbor is the word asparagus itself; this means that this word appeared in the dataset. What about pokemons?\n\n```\nQuery word? pidgey\npidgeot 0.891801\npidgeotto 0.885109\npidge 0.884739\npidgeon 0.787351\npok 0.781068\npikachu 0.758688\ncharizard 0.749403\nsquirtle 0.742582\nbeedrill 0.741579\ncharmeleon 0.733625\n```\n\nDifferent evolutions of the same Pokemon have close-by vectors! But what about our misspelled word, is its vector close to anything reasonable? Let's find out:\n\n```\nQuery word? enviroment\nenviromental 0.907951\nenviron 0.87146\nenviro 0.855381\nenvirons 0.803349\nenvironnement 0.772682\nenviromission 0.761168\nrealclimate 0.716746\nenvironment 0.702706\nacclimatation 0.697196\necotourism 0.697081\n```\n\nThanks to the information contained within the word, the vector of our misspelled word matches reasonable words! It is not perfect but the main information has been captured.\n\n### Advanced reader: measure of similarity\n\nIn order to find nearest neighbors, we need to compute a similarity score between words. 
Our words are represented by continuous word vectors and we can thus apply simple similarities to them. In particular we use the cosine of the angle between two vectors. This similarity is computed for all words in the vocabulary, and the 10 most similar words are shown. Of course, if the word appears in the vocabulary, it will appear on top, with a similarity of 1.\n\n### Word analogies\n\nIn a similar spirit, one can play around with word analogies. For example, we can see if our model can guess what is to France what Berlin is to Germany.\n\nThis can be done with the analogies functionality. It takes a word triplet (like Germany Berlin France) and outputs the analogy:\n\n```\n$ ./fasttext analogies result/fil9.bin\nPre-computing word vectors... done.\nQuery triplet (A - B + C)? berlin germany france\nparis 0.896462\nbourges 0.768954\nlouveciennes 0.765569\ntoulouse 0.761916\nvalenciennes 0.760251\nmontpellier 0.752747\nstrasbourg 0.744487\nmeudon 0.74143\nbordeaux 0.740635\npigneaux 0.736122\n```\n\nThe answer provided by our model is Paris, which is correct. Let's have a look at a less obvious example:\n\n```\nQuery triplet (A - B + C)? psx sony nintendo\ngamecube 0.803352\nnintendogs 0.792646\nplaystation 0.77344\nsega 0.772165\ngameboy 0.767959\narcade 0.754774\nplaystationjapan 0.753473\ngba 0.752909\ndreamcast 0.74907\nfamicom 0.745298\n```\n\nOur model considers that the nintendo analogy of a psx is the gamecube, which seems reasonable. Of course the quality of the analogies depends on the dataset used to train the model, and one can only hope to cover fields that are present in the dataset.\n\n### Importance of character n-grams\n\nUsing subword-level information is particularly interesting to build vectors for unknown words. For example, the word gearshift does not exist on Wikipedia but we can still query its closest existing words:\n\n```\nQuery word? 
gearshift\ngearing 0.790762\nflywheels 0.779804\nflywheel 0.777859\ngears 0.776133\ndriveshafts 0.756345\ndriveshaft 0.755679\ndaisywheel 0.749998\nwheelsets 0.748578\nepicycles 0.744268\ngearboxes 0.73986\n```\n\nMost of the retrieved words share substantial substrings but a few are actually quite different, like cogwheel. You can try other words like sunbathe or grandnieces.\n\nNow that we have seen the interest of subword information for unknown words, let's check how it compares to a model that does not use subword information. To train a model without subwords, just run the following command:\n\n```\n$ ./fasttext skipgram -input data/fil9 -output result/fil9-none -maxn 0\n```\n\nThe results are saved in result/fil9-none.vec and result/fil9-none.bin.\n\nTo illustrate the difference, let us take an uncommon word in Wikipedia, like accomodation, which is a misspelling of accommodation. Here are the nearest neighbors obtained without subwords:\n\n```\n$ ./fasttext nn result/fil9-none.bin\nQuery word? accomodation\nsunnhordland 0.775057\naccomodations 0.769206\nadministrational 0.753011\nlaponian 0.752274\nammenities 0.750805\ndachas 0.75026\nvuosaari 0.74172\nhostelling 0.739995\ngreenbelts 0.733975\nasserbo 0.732465\n```\n\nThe result does not make much sense; most of these words are unrelated. On the other hand, using subword information gives the following list of nearest neighbors:\n\n```\nQuery word? accomodation\naccomodations 0.96342\naccommodation 0.942124\naccommodations 0.915427\naccommodative 0.847751\naccommodating 0.794353\naccomodated 0.740381\namenities 0.729746\ncatering 0.725975\naccomodate 0.703177\nhospitality 0.701426\n```\n\nThe nearest neighbors capture different variations around the word accommodation. 
We also get semantically related words such as amenities or catering.\n\n\n### Obtaining word vectors for out-of-vocabulary words\n\nThe previously trained model can be used to compute word vectors for out-of-vocabulary words.\nProvided you have a text file `queries.txt` containing words for which you want to compute vectors, use the following command:\n\n```\n$ ./fasttext print-word-vectors model.bin \u003c queries.txt\n```\n\nThis will output word vectors to the standard output, one vector per line.\nThis can also be used with pipes:\n\n```\n$ cat queries.txt | ./fasttext print-word-vectors model.bin\n```\n\nSee the provided scripts for an example. For instance, running:\n\n```\n$ ./word-vector-example.sh\n```\n\nwill compile the code, download data, compute word vectors and evaluate them on the rare words similarity dataset RW [Thang et al. 2013].\n\n## Supervised learning\n### Text classification\n\nThis library can also be used to train supervised text classifiers, for instance for sentiment analysis.\nIn order to train a text classifier using the method described in [2](#bag-of-tricks-for-efficient-text-classification), use:\n\n```\n$ ./fasttext supervised -input train.txt -output model\n```\n\nwhere `train.txt` is a text file containing a training sentence per line along with the labels.\nBy default, we assume that labels are words that are prefixed by the string `__label__`.\nThis will output two files: `model.bin` and `model.vec`.\nOnce the model was trained, you can evaluate it by computing the precision and recall at k (P@k and R@k) on a test set using:\n\n```\n$ ./fasttext test model.bin test.txt k\n```\n\nThe argument `k` is optional, and is equal to `1` by default.\n\nIn order to obtain the k most likely labels for a piece of text, use:\n\n```\n$ ./fasttext predict model.bin test.txt k\n```\n\nor use `predict-prob` to also get the probability for each label\n\n```\n$ ./fasttext predict-prob model.bin test.txt k\n```\n\nwhere `test.txt` contains a 
piece of text to classify per line.\nDoing so will print to the standard output the k most likely labels for each line.\nThe argument `k` is optional, and equal to `1` by default.\nSee `classification-example.sh` for an example use case.\nIn order to reproduce results from the paper [2](#bag-of-tricks-for-efficient-text-classification), run `classification-results.sh`, this will download all the datasets and reproduce the results from Table 1.\n\nIf you want to compute vector representations of sentences or paragraphs, please use `print-sentence-vectors`:\n\n```\n$ ./fasttext print-sentence-vectors model.bin \u003c text.txt\n```\n\nThis assumes that the `text.txt` file contains the paragraphs that you want to get vectors for.\nThe program will output one vector representation per line in the file.\n\nYou can also quantize a supervised model to reduce its memory usage with the following command:\n\n```\n$ ./fasttext quantize -output model\n```\nThis will create a `.ftz` file with a smaller memory footprint. All the standard functionality, like `test` or `predict` work the same way on the quantized models:\n```\n$ ./fasttext test model.ftz test.txt\n```\nThe quantization procedure follows the steps described in [3](#fasttextzip-compressing-text-classification-models). 
You can\nrun the script `quantization-example.sh` for an example.\n\n\n## Full documentation\n\nInvoke a command without arguments to list available arguments and their default values:\n\n```\n$ ./fasttext supervised\nEmpty input or output path.\n\nThe following arguments are mandatory:\n  -input              training file path\n  -output             output file path\n\nThe following arguments are optional:\n  -verbose            verbosity level [2]\n\nThe following arguments for the dictionary are optional:\n  -minCount           minimal number of word occurrences [1]\n  -minCountLabel      minimal number of label occurrences [0]\n  -wordNgrams         max length of word ngram [1]\n  -bucket             number of buckets [2000000]\n  -minn               min length of char ngram [0]\n  -maxn               max length of char ngram [0]\n  -t                  sampling threshold [0.0001]\n  -label              labels prefix [__label__]\n\nThe following arguments for training are optional:\n  -lr                 learning rate [0.1]\n  -lrUpdateRate       change the rate of updates for the learning rate [100]\n  -dim                size of word vectors [100]\n  -ws                 size of the context window [5]\n  -epoch              number of epochs [5]\n  -neg                number of negatives sampled [5]\n  -loss               loss function {ns, hs, softmax} [softmax]\n  -thread             number of threads [1]\n  -pretrainedVectors  pretrained word vectors for supervised learning []\n  -saveOutput         whether output params should be saved [0]\n\nThe following arguments for quantization are optional:\n  -cutoff             number of words and ngrams to retain [0]\n  -retrain            finetune embeddings if a cutoff is applied [0]\n  -qnorm              quantizing the norm separately [0]\n  -qout               quantizing the classifier [0]\n  -dsub               size of each sub-vector [2]\n```\n\n## References\n\nPlease cite 
[1](#enriching-word-vectors-with-subword-information) if using this code for learning word representations or [2](#bag-of-tricks-for-efficient-text-classification) if using for text classification.\n\n### Enriching Word Vectors with Subword Information\n\n[1] P. Bojanowski\\*, E. Grave\\*, A. Joulin, T. Mikolov, [*Enriching Word Vectors with Subword Information*](https://arxiv.org/abs/1607.04606)\n\n```\n@article{bojanowski2017enriching,\n  title={Enriching Word Vectors with Subword Information},\n  author={Bojanowski, Piotr and Grave, Edouard and Joulin, Armand and Mikolov, Tomas},\n  journal={Transactions of the Association for Computational Linguistics},\n  volume={5},\n  year={2017},\n  issn={2307-387X},\n  pages={135--146}\n}\n```\n\n### Bag of Tricks for Efficient Text Classification\n\n[2] A. Joulin, E. Grave, P. Bojanowski, T. Mikolov, [*Bag of Tricks for Efficient Text Classification*](https://arxiv.org/abs/1607.01759)\n\n```\n@InProceedings{joulin2017bag,\n  title={Bag of Tricks for Efficient Text Classification},\n  author={Joulin, Armand and Grave, Edouard and Bojanowski, Piotr and Mikolov, Tomas},\n  booktitle={Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers},\n  month={April},\n  year={2017},\n  publisher={Association for Computational Linguistics},\n  pages={427--431},\n}\n```\n\n### FastText.zip: Compressing text classification models\n\n[3] A. Joulin, E. Grave, P. Bojanowski, M. Douze, H. Jégou, T. 
Mikolov, [*FastText.zip: Compressing text classification models*](https://arxiv.org/abs/1612.03651)\n\n```\n@article{joulin2016fasttext,\n  title={FastText.zip: Compressing text classification models},\n  author={Joulin, Armand and Grave, Edouard and Bojanowski, Piotr and Douze, Matthijs and J{\\'e}gou, H{\\'e}rve and Mikolov, Tomas},\n  journal={arXiv preprint arXiv:1612.03651},\n  year={2016}\n}\n```\n\n(\\* These authors contributed equally.)\n\n\n## License\n\nfastText is MIT-licensed.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdiixo%2Ffasttextcc","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdiixo%2Ffasttextcc","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdiixo%2Ffasttextcc/lists"}