{"id":13719266,"url":"https://github.com/pierrepita/atyimo","last_synced_at":"2025-05-07T11:31:26.682Z","repository":{"id":80584540,"uuid":"181895964","full_name":"pierrepita/atyimo","owner":"pierrepita","description":null,"archived":false,"fork":false,"pushed_at":"2019-04-20T01:02:26.000Z","size":137,"stargazers_count":13,"open_issues_count":0,"forks_count":4,"subscribers_count":5,"default_branch":"master","last_synced_at":"2024-11-14T08:35:52.052Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/pierrepita.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2019-04-17T13:20:12.000Z","updated_at":"2023-10-07T20:02:04.000Z","dependencies_parsed_at":null,"dependency_job_id":"e894ade0-18d9-4f92-9325-27c7009ff801","html_url":"https://github.com/pierrepita/atyimo","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pierrepita%2Fatyimo","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pierrepita%2Fatyimo/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pierrepita%2Fatyimo/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pierrepita%2Fatyimo/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/pierrepita","download_url":"https://codeload.github.com/pierrepita/atyimo/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositorie
s_count":252868824,"owners_count":21816923,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-03T01:00:45.342Z","updated_at":"2025-05-07T11:31:26.381Z","avatar_url":"https://github.com/pierrepita.png","language":"Python","funding_links":[],"categories":[":hammer: Frameworks","Software"],"sub_categories":["Clustering /"],"readme":"# Atyimo - Record Linkage Application for Heterogeneous Platforms\n\n# Description\nAtyImo implements a mixture of deterministic and probabilistic routines for data linkage. It was initially developed in 2013 as a linkage tool for a joint Brazil–U.K. project that aims to build a large population-based cohort with data from more than 100 million participants and to produce disease-specific data for diverse epidemiological research studies.\n\n# Requirements: \nJava 8+, Spark 2.1.x, gcc\n\n# Setup steps:\n1. Edit the config.py file\n2. Edit the line \"export PATH=$PATH:/path/to/spark/bin\" so that it points to your Apache Spark installation\n3. Run the 'run_all.sh' script to execute AtyImo.\n\n\nAuthors\n==============\nRobespierre Pita, Clicia Pinto, Marcos Barreto, and Spiros Denaxas\n1. FEDERAL UNIVERSITY OF BAHIA (UFBA), ATYIMOLAB (www.atyimolab.ufba.br)\n2. University College London, Denaxas Lab (www.denaxaslab.org)\n\nMore information\n=================\n* PITA, Robespierre et al. [On the accuracy and scalability of probabilistic data linkage over the Brazilian 114 million cohort](https://ieeexplore.ieee.org/document/8293793). IEEE Journal of Biomedical and Health Informatics, v. 22, n. 2, p. 
346-353, 2018.\n\n* Pita R., Mendonça E., Reis S., Barreto M., Denaxas S. (2017) [A Machine Learning Trainable Model to Assess the Accuracy of Probabilistic Record Linkage.](https://link.springer.com/chapter/10.1007/978-3-319-64283-3_16) In: Bellatreche L., Chakravarthy S. (eds) Big Data Analytics and Knowledge Discovery. DaWaK 2017. Lecture Notes in Computer Science, vol 10440. Springer.\n\n* PITA, Robespierre; PINTO, Clicia; MELO, Pedro; Silva, Malu; BARRETO, Marcos; RASELLA, Davide. (2015) [A Spark-based workflow for probabilistic record linkage of healthcare data. Workshop on Algorithms and Systems for MapReduce and Beyond](http://ceur-ws.org/Vol-1330/EDBTICDT-WS2015-complete.pdf) (BeyondMR - EDBT/ICDT 2015), Brussels.\n\n\n* PINTO, Clicia; PITA, Robespierre; BARBOSA, George; ARAÚJO, Bruno; BERTOLDO, Juracy; SENA, Samila; REIS, Sandra; FIACCONE, Rosemeire; AMORIM, Leila; ICHIHARA, Maria Yuri; BARRETO, Mauricio; BARRETO, Marcos; DENAXAS, Spiros. [Probabilistic integration of large Brazilian socioeconomic and clinical databases.](http://dx.doi.org/10.1109/CBMS.2017.64) 30th IEEE International Symposium on Computer-Based Medical Systems (CBMS 2017), Thessaloniki, 2017. \n\n* PINTO, Clicia; BORATTO, Murilo; ALONSO, Pedro; BARRETO, Marcos. Scaling probabilistic record linkage on multicore and multi-GPU system. 17th International Conference on Computational and Mathematical Methods in Science and Engineering (CMMSE 2017), Cadiz, 2017\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpierrepita%2Fatyimo","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fpierrepita%2Fatyimo","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpierrepita%2Fatyimo/lists"}