# Parallel Data Preprocessing System

### Universidad del Magdalena - Systems Engineering

**Course:** Operating Systems.

**Topic:** Synchronization and Parallel Processing with pthread.

**Authors:** Jiménez Rossimar and Rincones Carlos.

---

## Objective

Design and implement a multithreaded solution that performs **text data preprocessing** tasks in parallel. It applies common NLP (Natural Language Processing) techniques and uses the following synchronization methods:

- Barriers.
- Condition variables.
- Busy waiting.

---

## Preprocessing Techniques Used

Each thread applies the following techniques to its own rows of the matrix, which is loaded into RAM beforehand:

1. Lowercasing.
2. Punctuation removal.
3. Number removal.
4. Stopword removal.

---

## General System Architecture

The system is based on a block-processing model:

- The data is represented as an `N x M` matrix in which each row holds one comment.
- Each thread processes one or more rows of this matrix per round.
- Three versions of the solution are implemented so they can be compared with one another. Each version uses a different synchronization method:

  - **Barrier.**
  - **Condition variable.**
  - **Busy waiting.**

- Creating threads has a significant computational cost. For that reason, the same threads are kept alive for all rounds instead of being created anew each round.