{"id":13427544,"url":"https://github.com/tech-srl/code2vec","last_synced_at":"2025-05-16T02:09:31.720Z","repository":{"id":38975796,"uuid":"142100489","full_name":"tech-srl/code2vec","owner":"tech-srl","description":"TensorFlow code for the neural network presented in the paper: \"code2vec: Learning Distributed Representations of Code\"","archived":false,"fork":false,"pushed_at":"2023-09-20T14:34:38.000Z","size":5265,"stargazers_count":1128,"open_issues_count":29,"forks_count":287,"subscribers_count":23,"default_branch":"master","last_synced_at":"2025-04-12T05:31:55.580Z","etag":null,"topics":["code","code2vec","distributed","learning","of","representations","technion"],"latest_commit_sha":null,"homepage":"https://code2vec.org","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/tech-srl.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":"CITATION.cff","codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2018-07-24T03:40:20.000Z","updated_at":"2025-04-04T16:52:11.000Z","dependencies_parsed_at":"2024-09-20T20:01:21.284Z","dependency_job_id":"47f14845-f019-40a6-87f2-5d36f48edb35","html_url":"https://github.com/tech-srl/code2vec","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tech-srl%2Fcode2vec","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tech-srl%2Fcode2vec/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tech-srl%2Fcode2vec/releases","manifests_url":"https://repos.ecosyste.ms/ap
i/v1/hosts/GitHub/repositories/tech-srl%2Fcode2vec/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/tech-srl","download_url":"https://codeload.github.com/tech-srl/code2vec/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254453667,"owners_count":22073618,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["code","code2vec","distributed","learning","of","representations","technion"],"created_at":"2024-07-31T01:00:31.210Z","updated_at":"2025-05-16T02:09:31.680Z","avatar_url":"https://github.com/tech-srl.png","language":"Python","funding_links":[],"categories":["Paper List","Python"],"sub_categories":["RNN/LSTM-based"],"readme":"# Code2vec\nA neural network for learning distributed representations of code.\nThis is an official implementation of the model described in:\n\n[Uri Alon](http://urialon.cswp.cs.technion.ac.il), [Meital Zilberstein](http://www.cs.technion.ac.il/~mbs/), [Omer Levy](https://levyomer.wordpress.com) and [Eran Yahav](http://www.cs.technion.ac.il/~yahave/),\n\"code2vec: Learning Distributed Representations of Code\", POPL'2019 [[PDF]](https://urialon.cswp.cs.technion.ac.il/wp-content/uploads/sites/83/2018/12/code2vec-popl19.pdf)\n\n_**October 2018** - The paper was accepted to [POPL'2019](https://popl19.sigplan.org)_!\n\n_**April 2019** - The talk video is available [here](https://www.youtube.com/watch?v=EJ8okcxL2Iw)_.\n\n_**July 2019** - Add `tf.keras` model implementation (see [here](#choosing-implementation-to-use))._\n\nAn **online demo** is available at 
[https://code2vec.org/](https://code2vec.org/).\n\n## See also:\n  * **code2seq** (ICLR'2019) is our newer model. It uses LSTMs to encode paths node-by-node (rather than monolithic path embeddings as in code2vec), and an LSTM to decode a target sequence (rather than predicting a single label at a time as in code2vec). See [PDF](https://openreview.net/pdf?id=H1gKYo09tX), demo at [http://www.code2seq.org](http://www.code2seq.org) and [code](https://github.com/tech-srl/code2seq/).\n  * **Structural Language Models of Code** is a new paper that learns to generate the missing code within a larger code snippet. This is similar to code completion, but is able to predict complex expressions rather than a single token at a time. See [PDF](https://arxiv.org/pdf/1910.00577.pdf), demo at [http://AnyCodeGen.org](http://AnyCodeGen.org).\n  * **Adversarial Examples for Models of Code** is a new paper that shows how to slightly mutate the input code snippet of code2vec and GNN models (thus introducing adversarial examples), such that the model (code2vec or GNN) will output a prediction of our choice. See [PDF](https://arxiv.org/pdf/1910.07517.pdf) (code: soon).\n  * **Neural Reverse Engineering of Stripped Binaries** is a new paper that learns to predict procedure names in stripped binaries, thus using neural networks for reverse engineering. See [PDF](https://arxiv.org/pdf/1902.09122) (code: soon).\n\nThis is a TensorFlow implementation, designed to be easy and useful in research, \nand for experimenting with new ideas in machine learning for code tasks.\nBy default, it learns from Java source code and predicts Java method names, but it can be easily extended to other languages, \nsince the TensorFlow network is agnostic to the input programming language (see [Extending to other languages](#extending-to-other-languages)).\nContributions are welcome.\nThis repo actually contains two model implementations. 
The 1st uses pure TensorFlow and the 2nd uses TensorFlow's Keras ([more details](#choosing-implementation-to-use)). \n\n\u003ccenter style=\"padding: 40px\"\u003e\u003cimg width=\"70%\" src=\"https://github.com/tech-srl/code2vec/raw/master/images/network.png\" /\u003e\u003c/center\u003e\n\nTable of Contents\n=================\n  * [Requirements](#requirements)\n  * [Quickstart](#quickstart)\n  * [Configuration](#configuration)\n  * [Features](#features)\n  * [Extending to other languages](#extending-to-other-languages)\n  * [Additional datasets](#additional-datasets)\n  * [Citation](#citation)\n\n## Requirements\nOn Ubuntu:\n  * [Python3](https://www.linuxbabe.com/ubuntu/install-python-3-6-ubuntu-16-04-16-10-17-04) (\u003e=3.6). To check the version:\n\u003e python3 --version\n  * TensorFlow - version 2.0.0 ([install](https://www.tensorflow.org/install/install_linux)).\n  To check TensorFlow version:\n\u003e python3 -c 'import tensorflow as tf; print(tf.\\_\\_version\\_\\_)'\n  * If you are using a GPU, you will need CUDA 10.0\n  ([download](https://developer.nvidia.com/cuda-10.0-download-archive-base)) \n  as this is the version that is currently supported by TensorFlow. 
To check CUDA version:\n\u003e nvcc --version\n  * For GPU: cuDNN (\u003e=7.5) ([download](http://developer.nvidia.com/cudnn)). To check cuDNN version:\n\u003e cat /usr/include/cudnn.h | grep CUDNN_MAJOR -A 2\n  * For [creating a new dataset](#creating-and-preprocessing-a-new-java-dataset)\n  or [manually examining a trained model](#step-4-manual-examination-of-a-trained-model)\n  (any operation that requires parsing of a new code example) - [Java JDK](https://openjdk.java.net/install/)\n\n## Quickstart\n### Step 0: Cloning this repository\n```\ngit clone https://github.com/tech-srl/code2vec\ncd code2vec\n```\n\n### Step 1: Creating a new dataset from java sources\nIn order to have a preprocessed dataset to train a network on, you can either download our\npreprocessed dataset, or create a new dataset of your own.\n\n#### Download our preprocessed dataset of ~14M examples (compressed: 6.3GB, extracted 32GB)\n```\nwget https://s3.amazonaws.com/code2vec/data/java14m_data.tar.gz\ntar -xvzf java14m_data.tar.gz\n```\nThis will create a data/java14m/ sub-directory, containing the files that hold the training, test and validation sets,\nand a vocabulary file for various dataset properties.\n\n#### Creating and preprocessing a new Java dataset\nIn order to create and preprocess a new dataset (for example, to compare code2vec to another model on another dataset):\n  * Edit the file [preprocess.sh](preprocess.sh) using the instructions there, pointing it to the correct training, validation and test directories.\n  * Run the preprocess.sh file:\n\u003e source preprocess.sh\n\n### Step 2: Training a model\nYou can either download an already-trained model, or train a new model using a preprocessed dataset.\n\n#### Downloading a trained model (1.4 GB)\nWe already trained a model for 8 epochs on the data that was preprocessed in the previous step.\nThe number of epochs was chosen using [early stopping](https://en.wikipedia.org/wiki/Early_stopping), as the version that maximized 
the F1 score on the validation set. This model can be downloaded [here](https://s3.amazonaws.com/code2vec/model/java14m_model.tar.gz) or using:\n```\nwget https://s3.amazonaws.com/code2vec/model/java14m_model.tar.gz\ntar -xvzf java14m_model.tar.gz\n```\n\n##### Note:\nThis trained model is in a \"released\" state, which means that we stripped its training parameters; it can thus be used for inference, but cannot be further trained. If you use this trained model in the next steps, use 'saved_model_iter8.release' instead of 'saved_model_iter8' in every command line example that loads the model such as: '--load models/java14_model/saved_model_iter8'. To read how to release a model, see [Releasing the model](#releasing-the-model).\n\n#### Downloading a trained model (3.5 GB) _which can be further trained_\n\nA non-stripped trained model can be obtained [here](https://s3.amazonaws.com/code2vec/model/java14m_model_trainable.tar.gz) or using:\n\n```\nwget https://s3.amazonaws.com/code2vec/model/java14m_model_trainable.tar.gz\ntar -xvzf java14m_model_trainable.tar.gz\n```  \n\nThis model weighs more than twice as much as the stripped version, and it is recommended only if you wish to continue training a model which is already trained. To continue training this trained model, use the `--load` flag to load the trained model; the `--data` flag to point to the new dataset to train on; and the `--save` flag to provide a new save path.\n\n#### A model that was trained on the Java-large dataset\nWe provide an additional code2vec model that was trained on the \"Java-large\" dataset (this dataset was introduced in the code2seq paper). See [Java-large](#java-large-compressed-72gb-extracted-37gb).\n\n#### Training a model from scratch\nTo train a model from scratch:\n  * Edit the file [train.sh](train.sh) to point it to the right preprocessed data. 
By default, \n  it points to our \"java14m\" dataset that was preprocessed in the previous step.\n  * Before training, you can edit the configuration hyper-parameters in the file [config.py](config.py),\n  as explained in [Configuration](#configuration).\n  * Run the [train.sh](train.sh) script:\n```\nsource train.sh\n```\n\n##### Notes:\n  1. By default, the network is evaluated on the validation set after every training epoch.\n  2. The newest 10 versions are kept (older are deleted automatically). This can be changed, but will be more space consuming.\n  3. By default, the network is training for 20 epochs.\nThese settings can be changed by simply editing the file [config.py](config.py).\nTraining on a Tesla v100 GPU takes about 50 minutes per epoch. \nTraining on Tesla K80 takes about 4 hours per epoch.\n\n### Step 3: Evaluating a trained model\nOnce the score on the validation set stops improving over time, you can stop the training process (by killing it)\nand pick the iteration that performed the best on the validation set.\nSuppose that iteration #8 is our chosen model, run:\n```\npython3 code2vec.py --load models/java14_model/saved_model_iter8.release --test data/java14m/java14m.test.c2v\n```\nWhile evaluating, a file named \"log.txt\" is written with each test example name and the model's prediction.\n\n### Step 4: Manual examination of a trained model\nTo manually examine a trained model, run:\n```\npython3 code2vec.py --load models/java14_model/saved_model_iter8.release --predict\n```\nAfter the model loads, follow the instructions and edit the file [Input.java](Input.java) and enter a Java \nmethod or code snippet, and examine the model's predictions and attention scores.\n\n## Configuration\nChanging hyper-parameters is possible by editing the file\n[config.py](config.py).\n\nHere are some of the parameters and their description:\n#### config.NUM_TRAIN_EPOCHS = 20\nThe max number of epochs to train the model. 
Stopping earlier must be done manually (kill).\n#### config.SAVE_EVERY_EPOCHS = 1\nThe number of training epochs between two consecutive model saves.\n#### config.TRAIN_BATCH_SIZE = 1024 \nBatch size in training.\n#### config.TEST_BATCH_SIZE = config.TRAIN_BATCH_SIZE\nBatch size in evaluating. Affects only the evaluation speed and memory consumption, does not affect the results.\n#### config.TOP_K_WORDS_CONSIDERED_DURING_PREDICTION = 10\nNumber of words with highest scores in $ y_hat $ to consider during prediction and evaluation.\n#### config.NUM_BATCHES_TO_LOG_PROGRESS = 100\nNumber of batches (during training / evaluating) to complete between two progress-logging records.\n#### config.NUM_TRAIN_BATCHES_TO_EVALUATE = 100\nNumber of training batches to complete between model evaluations on the test set.\n#### config.READER_NUM_PARALLEL_BATCHES = 4\nThe number of threads enqueuing examples to the reader queue.\n#### config.SHUFFLE_BUFFER_SIZE = 10000\nSize of the reader's buffer within which examples are shuffled during training.\nA bigger buffer allows better randomness, but requires more memory and may harm training throughput.\n#### config.CSV_BUFFER_SIZE = 100 * 1024 * 1024  # 100 MB\nThe buffer size (in bytes) of the CSV dataset reader.\n\n#### config.MAX_CONTEXTS = 200\nThe number of contexts to use in each example.\n#### config.MAX_TOKEN_VOCAB_SIZE = 1301136\nThe max size of the token vocabulary.\n#### config.MAX_TARGET_VOCAB_SIZE = 261245\nThe max size of the target words vocabulary.\n#### config.MAX_PATH_VOCAB_SIZE = 911417\nThe max size of the path vocabulary.\n#### config.DEFAULT_EMBEDDINGS_SIZE = 128\nDefault embedding size to be used for token and path if not specified otherwise.\n#### config.TOKEN_EMBEDDINGS_SIZE = config.DEFAULT_EMBEDDINGS_SIZE\nEmbedding size for tokens.\n#### config.PATH_EMBEDDINGS_SIZE = config.DEFAULT_EMBEDDINGS_SIZE\nEmbedding size for paths.\n#### config.CODE_VECTOR_SIZE = config.PATH_EMBEDDINGS_SIZE + 2 * config.TOKEN_EMBEDDINGS_SIZE\nSize of code 
vectors.\n#### config.TARGET_EMBEDDINGS_SIZE = config.CODE_VECTOR_SIZE\nEmbedding size for target words.\n#### config.MAX_TO_KEEP = 10\nKeep this number of newest trained versions during training.\n#### config.DROPOUT_KEEP_RATE = 0.75\nDropout keep rate used during training (the probability of keeping a unit, as the parameter name indicates).\n#### config.SEPARATE_OOV_AND_PAD = False\nWhether to treat `\u003cOOV\u003e` and `\u003cPAD\u003e` as two different special tokens whenever possible.\n\n## Features\nCode2vec supports the following features: \n\n### Choosing implementation to use\nThis repo comes with two model implementations:\n(i) uses pure TensorFlow (written in [tensorflow_model.py](tensorflow_model.py));\n(ii) uses TensorFlow's Keras (written in [keras_model.py](keras_model.py)).\nThe default implementation used by `code2vec.py` is the pure TensorFlow one.\nTo explicitly choose the desired implementation to use, specify `--framework tensorflow` or `--framework keras`\nas an additional argument when executing the script `code2vec.py`.\nIn particular, this argument can be added to each one of the usage examples (of `code2vec.py`) detailed in this file.\nNote that in order to load a trained model (from file), one should use the same implementation used during its training.\n\n### Releasing the model\nIf you wish to keep a trained model for inference only (without the ability to continue training it) you can\nrelease the model using:\n```\npython3 code2vec.py --load models/java14_model/saved_model_iter8 --release\n```\nThis will save a copy of the trained model with the '.release' suffix.\nA \"released\" model usually takes 3x less disk space.\n\n### Exporting the trained token vectors and target vectors\nToken and target embeddings are available to download: \n\n[[Token vectors]](https://s3.amazonaws.com/code2vec/model/token_vecs.tar.gz) [[Method name vectors]](https://s3.amazonaws.com/code2vec/model/target_vecs.tar.gz)\n\nThese embeddings are saved without subtoken-delimiters (\"*toLower*\" is saved as \"*tolower*\").\n\nIn 
order to export embeddings from a trained model, use the \"--save_w2v\" and \"--save_t2v\" flags:\n\nExporting the trained *token* embeddings:\n```\npython3 code2vec.py --load models/java14_model/saved_model_iter8.release --save_w2v models/java14_model/tokens.txt\n```\nExporting the trained *target* (method name) embeddings:\n```\npython3 code2vec.py --load models/java14_model/saved_model_iter8.release --save_t2v models/java14_model/targets.txt\n```\nThis saves the tokens/targets embedding matrices in word2vec format to the specified text file, in which:\nthe first line is: \\\u003cvocab_size\\\u003e \\\u003cdimension\\\u003e\nand each of the following lines contains: \\\u003cword\\\u003e \\\u003cfloat_1\\\u003e \\\u003cfloat_2\\\u003e ... \\\u003cfloat_dimension\\\u003e\n\nThese word2vec files can be manually parsed or easily loaded and inspected using the [gensim](https://radimrehurek.com/gensim/models/word2vec.html) python package:\n```python\npython3\n\u003e\u003e\u003e from gensim.models import KeyedVectors as word2vec\n\u003e\u003e\u003e vectors_text_path = 'models/java14_model/targets.txt' # or: `models/java14_model/tokens.txt'\n\u003e\u003e\u003e model = word2vec.load_word2vec_format(vectors_text_path, binary=False)\n\u003e\u003e\u003e model.most_similar(positive=['equals', 'to|lower']) # or: 'tolower', if using the downloaded embeddings\n\u003e\u003e\u003e model.most_similar(positive=['download', 'send'], negative=['receive'])\n```\nThe above python commands will result in the closest name to both \"equals\" and \"to|lower\", which is \"equals|ignore|case\".\nNote: In embeddings that were exported manually using the \"--save_w2v\" or \"--save_t2v\" flags, the input token and target words are saved using the symbol \"|\" as a subtokens delimiter (\"*toLower*\" is saved as: \"*to|lower*\"). 
In the embeddings that are available to download (which are the same as in the paper), the \"|\" symbol is not used, thus \"*toLower*\" is saved as \"*tolower*\".\n\n### Exporting the code vectors for the given code examples\nThe flag `--export_code_vectors` exports the code vectors for the given examples. \n\nIf used with the `--test \u003cTEST_FILE\u003e` flag,\na file named `\u003cTEST_FILE\u003e.vectors` will be saved in the same directory as `\u003cTEST_FILE\u003e`. \nEach row in the saved file is the code vector of the code snippet in the corresponding row in `\u003cTEST_FILE\u003e`.\n \nIf used with the `--predict` flag, the code vector will be printed to the console.\n\n\n## Extending to other languages  \n\nThis project currently supports Java and C\\# as the input languages.\n\n_**April 2020** - an extension for code2vec that addresses obfuscated Java code was developed by [@basedrhys](https://github.com/basedrhys), and is available here:\n[https://github.com/basedrhys/obfuscated-code2vec](https://github.com/basedrhys/obfuscated-code2vec)._\n\n\n_**January 2020** - an extractor for predicting TypeScript type annotations for JavaScript input using code2vec was developed by [@izosak](https://github.com/izosak) and Noa Cohen, and is available here:\n[https://github.com/tech-srl/id2vec](https://github.com/tech-srl/id2vec)._\n\n~~_**June 2019** - an extractor for **C** that is compatible with our model was developed by [CMU SEI team](https://github.com/cmu-sei/code2vec-c)._~~ - removed by CMU SEI team.\n\n_**June 2019** - an extractor for **Python, Java, C, C++** by JetBrains Research is available here: [PathMiner](https://github.com/JetBrains-Research/astminer)._\n\nIn order to extend code2vec to work with other languages, a new extractor (similar to the [JavaExtractor](JavaExtractor))\nshould be implemented, and be called by [preprocess.sh](preprocess.sh).\nBasically, an extractor should be able to output for each directory containing source files:\n  * 
A single text file, where each row is an example.\n  * Each example is a space-delimited list of fields, where:\n  1. The first \"word\" is the target label, internally delimited by the \"|\" character.\n  2. Each of the following words is a context, and each context has three components separated by commas (\",\"). These components cannot include spaces or commas.\n  We refer to these three components as a token, a path, and another token, but in general other types of ternary contexts can be considered.  \n\nFor example, a possible novel Java context extraction for the following code example:\n```java\nvoid fooBar() {\n\tSystem.out.println(\"Hello World\");\n}\n```\nmight be (in a new context extraction algorithm, which is different from ours since it doesn't use paths in the AST):\n\u003e foo|Bar System,FIELD_ACCESS,out System.out,FIELD_ACCESS,println THE_METHOD,returns,void THE_METHOD,prints,\"hello_world\" \n\nConsider the first example context \"System,FIELD_ACCESS,out\". \nIn the current implementation, the 1st (\"System\") and 3rd (\"out\") components of a context are taken from the same \"tokens\" vocabulary, \nand the 2nd component (\"FIELD_ACCESS\") is taken from a separate \"paths\" vocabulary. \n\n## Additional datasets\nWe preprocessed three additional datasets used by the [code2seq](https://arxiv.org/pdf/1808.01400) paper, using the code2vec preprocessing.\nThese datasets are available in raw format (i.e., .java files) at [https://github.com/tech-srl/code2seq/blob/master/README.md#datasets](https://github.com/tech-srl/code2seq/blob/master/README.md#datasets),\nand are also available to download in a preprocessed format (i.e., ready to train a code2vec model on) here:\n\n### Java-small (compressed: 366MB, extracted 1.9GB)\n```\nwget https://s3.amazonaws.com/code2vec/data/java-small_data.tar.gz\n```\nThis dataset is based on the dataset of [Allamanis et al. 
(ICML'2016)](http://groups.inf.ed.ac.uk/cup/codeattention/), with the difference that training/validation/test are split by-project rather than by-file.\nThis dataset contains 9 Java projects for training, 1 for validation and 1 for testing. Overall, it contains about 700K examples.\n\n### Java-med (compressed: 1.8GB, extracted 9.3GB)\n```\nwget https://s3.amazonaws.com/code2vec/data/java-med_data.tar.gz\n```\nA dataset of the 1000 top-starred Java projects from GitHub. It contains\n800 projects for training, 100 for validation and 100 for testing. Overall, it contains about 4M examples.\n\n### Java-large (compressed: 7.2GB, extracted 37GB)\n```\nwget https://s3.amazonaws.com/code2vec/data/java-large_data.tar.gz\n```\nA dataset of the 9500 top-starred Java projects from GitHub that were created\nsince January 2007. It contains 9000 projects for training, 200 for validation and 300 for\ntesting. Overall, it contains about 16M examples.\n\nAdditionally, we provide a code2vec model that was trained on the Java-large dataset (this model was not part of the original code2vec paper, but was later used as a baseline in the code2seq paper, which introduced this dataset).\nTrainable model (3.5 GB):\n```\nwget https://code2vec.s3.amazonaws.com/model/java-large-model.tar.gz\n```\n\n\"Released model\" (1.4 GB, cannot be further trained):\n```\nwget https://code2vec.s3.amazonaws.com/model/java-large-released-model.tar.gz\n```\n\n## Citation\n\n[code2vec: Learning Distributed Representations of Code](https://urialon.cswp.cs.technion.ac.il/wp-content/uploads/sites/83/2018/12/code2vec-popl19.pdf)\n\n```\n@article{alon2019code2vec,\n author = {Alon, Uri and Zilberstein, Meital and Levy, Omer and Yahav, Eran},\n title = {Code2Vec: Learning Distributed Representations of Code},\n journal = {Proc. ACM Program. 
Lang.},\n issue_date = {January 2019},\n volume = {3},\n number = {POPL},\n month = jan,\n year = {2019},\n issn = {2475-1421},\n pages = {40:1--40:29},\n articleno = {40},\n numpages = {29},\n url = {http://doi.acm.org/10.1145/3290353},\n doi = {10.1145/3290353},\n acmid = {3290353},\n publisher = {ACM},\n address = {New York, NY, USA},\n keywords = {Big Code, Distributed Representations, Machine Learning},\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftech-srl%2Fcode2vec","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftech-srl%2Fcode2vec","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftech-srl%2Fcode2vec/lists"}