{"id":19856413,"url":"https://github.com/ipeagit/geocode_benchmark","last_synced_at":"2025-07-14T06:04:35.530Z","repository":{"id":232692146,"uuid":"784896224","full_name":"ipeaGIT/geocode_benchmark","owner":"ipeaGIT","description":null,"archived":false,"fork":false,"pushed_at":"2024-05-10T15:51:08.000Z","size":160,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-02-28T23:46:38.359Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"R","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ipeaGIT.png","metadata":{"files":{"readme":"README.Rmd","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-04-10T19:20:43.000Z","updated_at":"2024-05-10T15:51:12.000Z","dependencies_parsed_at":"2024-05-10T16:52:18.647Z","dependency_job_id":"3fd29474-683b-4703-8f36-1f7b1244d876","html_url":"https://github.com/ipeaGIT/geocode_benchmark","commit_stats":null,"previous_names":["ipeagit/geocode_benchmark"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/ipeaGIT/geocode_benchmark","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ipeaGIT%2Fgeocode_benchmark","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ipeaGIT%2Fgeocode_benchmark/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ipeaGIT%2Fgeocode_benchmark/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ipeaGIT%2Fgeocode_benchmark/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/i
peaGIT","download_url":"https://codeload.github.com/ipeaGIT/geocode_benchmark/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ipeaGIT%2Fgeocode_benchmark/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":265249497,"owners_count":23734466,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-12T14:15:27.931Z","updated_at":"2025-07-14T06:04:34.757Z","avatar_url":"https://github.com/ipeaGIT.png","language":"R","funding_links":[],"categories":[],"sub_categories":[],"readme":"---\noutput: github_document\n---\n  \n```{r, include = FALSE}\nknitr::opts_chunk$set(\n  collapse = TRUE,\n  comment = \"#\u003e\",\n  fig.path = \"figures/README-\",\n  out.width = \"100%\"\n)\n```\n\n# Geocode benchmark\n\nThis benchmark measures how the number of threads affects geocoding processing time with `{geocodepro}`. We geocoded 100,000 random addresses drawn from CNES. Each thread count was run five times (i.e. five runs with 10 threads, five with 15, etc.), and the time reported in the figures below is the mean of those five runs. Four scenarios were analyzed:\n\n- Scenario 1: Dell 940 hardware purchased in 2017. Repository stored on the network. Up to 60 cores could be used, but that caused hyperthreading. Number of threads capped at 28.\n- Scenario 2: Dell 940 hardware purchased in 2017. Repository stored in a local folder. The server configuration was changed, allowing up to 40 cores. 
Number of threads capped at 30.\n- Scenario 3: Dell 940 hardware purchased in 2017. Repository stored on the network. The server configuration was changed again, allowing up to 30 cores. Number of threads capped at 28.\n- Scenario 4: Lenovo VX630V3 HCI hardware, 2023 model. Repository stored in a local folder. Number of threads capped at 28.\n\n```{r, echo = FALSE, warning = FALSE}\nlibrary(ggplot2)\nlibrary(targets)\n\nold_server_timings \u003c- readRDS(tar_read(old_server_timings_path))\nold_server_timings[, server := \"old\"]\n\nnew_server_timings \u003c- readRDS(tar_read(new_server_timings_path))\nnew_server_timings[, server := \"new\"]\n\nthird_server_timings \u003c- readRDS(tar_read(third_server_timings_path))\nthird_server_timings[, server := \"third\"]\n\nfourth_server_timings \u003c- readRDS(tar_read(fourth_server_timings_path))\nfourth_server_timings[, server := \"fourth\"]\n\ntimings \u003c- rbind(\n  old_server_timings,\n  new_server_timings,\n  third_server_timings,\n  fourth_server_timings\n)\n\ntimings \u003c- timings[n_threads \u003c= 30]\ntimings[, server := factor(server, levels = c(\"old\", \"new\", \"third\", \"fourth\"))]\ntimings \u003c- timings[\n  ,\n  .(avg_time = mean(time)),\n  by = .(n_threads, n_rows, server)\n]\ntimings[, expected_speedup := n_threads / n_threads[1], by = .(n_rows, server)]\ntimings[, actual_speedup := avg_time[1] / avg_time, by = .(n_rows, server)]\n\nggplot(timings) +\n  geom_line(aes(x = n_threads, y = avg_time / 60, color = server, group = server)) +\n  geom_point(aes(x = n_threads, y = avg_time / 60, color = server)) +\n  scale_y_continuous(\n    \"Processing time (minutes)\",\n    labels = scales::label_number(),\n    limits = c(0, 5)\n  ) +\n  scale_x_continuous(\n    \"Number of threads\",\n    labels = scales::label_number(),\n    limits = c(10, 30)\n  ) +\n  scale_color_discrete(\"Scenario\", labels = paste0(\"Scenario \", 1:4)) +\n  ggtitle(\"Processing time by number of 
threads\")\n```\n\nIn the plot below, the expected speed-up was computed as the number of threads divided by 10 (the smallest thread count tested). The actual speed-up was computed as the processing time with 10 threads divided by the processing time at each of the other thread counts tested.\n\n```{r, echo = FALSE}\nmelted_timings \u003c- data.table::melt(\n  timings,\n  id.vars = c(\"n_threads\", \"n_rows\", \"server\"),\n  measure.vars = c(\"expected_speedup\", \"actual_speedup\"),\n  variable.name = \"type\",\n  value.name = \"speedup\"\n)\n\nggplot(melted_timings) +\n  geom_line(\n    aes(\n      x = n_threads,\n      y = speedup,\n      linetype = type,\n      group = type\n    )\n  ) +\n  geom_point(aes(x = n_threads, y = speedup)) +\n  facet_wrap(\n    ~ server,\n    nrow = 1,\n    labeller = as_labeller(\n      c(\n        \"old\" = \"Scenario 1\",\n        \"new\" = \"Scenario 2\",\n        \"third\" = \"Scenario 3\",\n        \"fourth\" = \"Scenario 4\"\n      )\n    )\n  ) +\n  scale_y_continuous(\n    \"Speed-up (baseline: processing time with 10 threads)\",\n    labels = scales::label_number(suffix = \"x\"),\n    limits = c(1, 3)\n  ) +\n  scale_x_continuous(\n    \"Number of threads\",\n    labels = scales::label_number(),\n    limits = c(10, 40)\n  ) +\n  scale_color_discrete(\"Server\", labels = c(\"New\", \"Old\")) +\n  scale_linetype_discrete(\"Speed-up\", labels = c(\"Expected\", \"Actual\")) +\n  ggtitle(\"Speed-up: expected vs actual\")\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fipeagit%2Fgeocode_benchmark","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fipeagit%2Fgeocode_benchmark","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fipeagit%2Fgeocode_benchmark/lists"}