{"id":19852561,"url":"https://github.com/rosette-api-community/visualize-embeddings","last_synced_at":"2025-02-28T21:17:42.503Z","repository":{"id":84699322,"uuid":"85236393","full_name":"rosette-api-community/visualize-embeddings","owner":"rosette-api-community","description":"A simple Python script for transforming a corpus of documents into text vectors suitable for visualization ","archived":false,"fork":false,"pushed_at":"2017-03-16T22:42:25.000Z","size":7,"stargazers_count":1,"open_issues_count":0,"forks_count":1,"subscribers_count":6,"default_branch":"master","last_synced_at":"2025-01-11T13:27:52.829Z","etag":null,"topics":["machine-learning","natural-language-processing","nlp","python","text-embedding","text-vectorization","tsv","visualization"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/rosette-api-community.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2017-03-16T19:58:20.000Z","updated_at":"2020-09-18T03:06:53.000Z","dependencies_parsed_at":null,"dependency_job_id":"ba6a237f-c6a6-49fb-8d28-ba550a34250d","html_url":"https://github.com/rosette-api-community/visualize-embeddings","commit_stats":{"total_commits":16,"total_committers":2,"mean_commits":8.0,"dds":0.375,"last_synced_commit":"63dc8d053b5a1c8bd0778df470f44be3c7b8dd6c"},"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rosette-api-community%2Fvisualize-embeddings","tags_url":"https://repos.ecosyste.ms/a
pi/v1/hosts/GitHub/repositories/rosette-api-community%2Fvisualize-embeddings/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rosette-api-community%2Fvisualize-embeddings/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rosette-api-community%2Fvisualize-embeddings/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/rosette-api-community","download_url":"https://codeload.github.com/rosette-api-community/visualize-embeddings/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":241241425,"owners_count":19932742,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["machine-learning","natural-language-processing","nlp","python","text-embedding","text-vectorization","tsv","visualization"],"created_at":"2024-11-12T14:03:31.236Z","updated_at":"2025-02-28T21:17:42.495Z","avatar_url":"https://github.com/rosette-api-community.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Rosette API Text Embeddings Visualization Sample Code\nA simple Python script for transforming a corpus of documents into text vectors suitable for visualization in .tsv format. It uses the [Rosette API](https://developer.rosette.com/)'s `/text-embedding` endpoint and the [BBC News Corpus](http://mlg.ucd.ie/datasets/bbc.html). Note that the corpus is only free for research purposes.\n\n## Getting started\n1. Clone the repo and open the files in your favorite text editor/python IDE.\n2. 
Download the [raw text files zip](http://mlg.ucd.ie/files/datasets/bbc-fulltext.zip) (`bbc-fulltext.zip`) from http://mlg.ucd.ie/datasets/bbc.html and extract it into the project root folder. You should get a folder called \"bbc\".\n3. Run `visualize-embeddings.py` via your python IDE or command line (replace `ROSAPI_KEY` with your [Rosette API key](https://developer.rosette.com/admin/applications)):\n\n        $ python visualize-embeddings.py --key ROSAPI_KEY\n\nYou'll see that the script parses the raw text files of the corpus into a list of documents. Each document consists of 3 fields:\n  * category\n  * headline\n  * content\n  \nThe script then creates two files:\n  * embeddings.tsv: a TSV file where each line contains the text vector for a document's content field.\n  * metadata.tsv: a TSV file where each line contains a document's metadata (i.e. category and headline).\n\nTo visualize the embeddings, load them into Google TensorFlow's [Embedding Projector](http://projector.tensorflow.org/). Turn on color coding by category to really see the vectors in action. You can see our projection [at this link](http://projector.tensorflow.org/?config=https://gist.githubusercontent.com/hillelt/bd4fad5280eefba4d2d8875e87f0eabb/raw/0672efa576a6fd5c14ec93ed86a2b9326a35c3bf/projector_config.json).\n\n## Customize for your data\nTry replacing the BBC News corpus with your own data. And if you find anything interesting, we'd love to hear about it! Find us at community@rosette.com.\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frosette-api-community%2Fvisualize-embeddings","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Frosette-api-community%2Fvisualize-embeddings","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frosette-api-community%2Fvisualize-embeddings/lists"}