{"id":21371368,"url":"https://github.com/gcondeh/tokens","last_synced_at":"2026-05-21T07:44:17.215Z","repository":{"id":262849614,"uuid":"888533966","full_name":"gcondeh/Tokens","owner":"gcondeh","description":"Small utilities for counting tokens and splitting text strings","archived":false,"fork":false,"pushed_at":"2024-11-15T08:35:59.000Z","size":22,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-03-16T08:41:46.575Z","etag":null,"topics":["langchain","python","spacy-nlp","spanish","tiktoken"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"cc0-1.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/gcondeh.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-11-14T15:05:32.000Z","updated_at":"2024-11-15T09:01:51.000Z","dependencies_parsed_at":"2024-11-14T16:42:55.757Z","dependency_job_id":"877728df-90ac-4953-8901-68f93afe38be","html_url":"https://github.com/gcondeh/Tokens","commit_stats":null,"previous_names":["gcondeh/tokens"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gcondeh%2FTokens","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gcondeh%2FTokens/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gcondeh%2FTokens/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gcondeh%2FTokens/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/gcondeh",
"download_url":"https://codeload.github.com/gcondeh/Tokens/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":243846982,"owners_count":20357297,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["langchain","python","spacy-nlp","spanish","tiktoken"],"created_at":"2024-11-22T08:12:45.746Z","updated_at":"2026-05-21T07:44:17.171Z","avatar_url":"https://github.com/gcondeh.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Tokens\nSmall utilities for counting tokens and splitting text strings.\n\n## contar_tokens_tiktoken\nLoads data from a CSV file into df_texto and drops the rows that have no data in the \"subtitulo\" column.\nIt defines the following functions:\n* mayor_long: finds the index of the row with the longest subtitle in the \"subtitulo\" column.\n* num_tokens: uses tiktoken to get the number of tokens in each text in textos.\n\nUsage examples:\n* Get the row with the longest subtitle and its length.\n* Set a limit n (20 in this case) and count how many subtitles have more than 20 tokens.\n\n## contar_tokens_spacy\nDoes essentially the same as \"contar_tokens_tiktoken\", but with spaCy.\n\n## separar_textos_en_parrafos\nLoads data from a CSV file into df_texto and splits the \"subtitulo\" field into fragments of a given length, which increases the number of rows in the dataframe. The result is then saved to a file.\nIt implements two methods for splitting the fragments: 
by tokens, using tiktoken, and by text length.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgcondeh%2Ftokens","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fgcondeh%2Ftokens","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgcondeh%2Ftokens/lists"}