{"id":22502587,"url":"https://github.com/rberenguel/identity-graphs","last_synced_at":"2025-07-19T08:35:17.241Z","repository":{"id":49255977,"uuid":"358297166","full_name":"rberenguel/identity-graphs","owner":"rberenguel","description":"Presentation about Graphframes and how we handle graphs with more than 2 billion nodes at Hybrid Theory","archived":false,"fork":false,"pushed_at":"2021-06-21T18:17:43.000Z","size":30867,"stargazers_count":4,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-06-06T17:08:25.248Z","etag":null,"topics":["graphframes","spark"],"latest_commit_sha":null,"homepage":"","language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/rberenguel.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2021-04-15T14:55:45.000Z","updated_at":"2022-10-14T11:51:54.000Z","dependencies_parsed_at":"2022-09-05T07:00:32.223Z","dependency_job_id":null,"html_url":"https://github.com/rberenguel/identity-graphs","commit_stats":null,"previous_names":[],"tags_count":4,"template":false,"template_full_name":null,"purl":"pkg:github/rberenguel/identity-graphs","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rberenguel%2Fidentity-graphs","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rberenguel%2Fidentity-graphs/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rberenguel%2Fidentity-graphs/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rberenguel%2Fidentity-graphs/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/rberenguel","downloa
d_url":"https://codeload.github.com/rberenguel/identity-graphs/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rberenguel%2Fidentity-graphs/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":265905116,"owners_count":23846696,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["graphframes","spark"],"created_at":"2024-12-06T23:19:46.747Z","updated_at":"2025-07-19T08:35:17.188Z","avatar_url":"https://github.com/rberenguel.png","language":null,"funding_links":["https://www.buymeacoffee.com/rberenguel"],"categories":[],"sub_categories":[],"readme":"# Keeping identity graphs in sync with Apache Spark\n\nPresentation I ([@berenguel](https://twitter.com/berenguel)) gave at the [Data Love Conference](https://datalove.konfy.care) in April 2021\nand at the [Data+AI Summit](https://databricks.com/session_na21/keeping-identity-graphs-in-sync-with-apache-spark) in May 2021 to explain how we manage a 2 billion node graph at [Hybrid Theory](https://www.hybridtheory.com). You can find the slides\n[here](https://github.com/rberenguel/identity-graphs/releases/download/0.2.0/identity-graphs.pdf)\n(some images might look slightly blurry). I recommend you check the version with\n[presenter\nnotes](https://github.com/rberenguel/identity-graphs/releases/download/0.2.0/identity-graphs-with-notes.pdf)\nwhich is only available here. 
You can also head over to the _releases_ tab in case I have a more recent version and forgot to update this README.\n\nIf you want additional information about Spark in general, I gave an\n`introduction to Spark` talk with [Carlos Peña](http://twitter.com/crafty_coder)\nthat you can find [here](https://github.com/rberenguel/WelcomeToApacheSpark).\n\n---\n\nThe video from Data Love is available [here](https://www.youtube.com/watch?v=xL8uFgXLEQY\u0026list=PLBqWQH1MiwBS8f0PhhDeQuBVCjxC_i0X5\u0026index=23). Don't miss the whole [playlist](https://www.youtube.com/playlist?list=PLBqWQH1MiwBS8f0PhhDeQuBVCjxC_i0X5) of videos of the conference.\n\nYou can watch the recording from Data+AI Summit by registering for it and selecting \"On Demand\" [here](https://databricks.com/session_na21/keeping-identity-graphs-in-sync-with-apache-spark).\n\n---\n\nThis presentation is formatted in Markdown and prepared to be used with\n[Deckset](https://www.decksetapp.com/). The drawings were done on an iPad Mini 5\nusing [Procreate](https://procreate.art).\n\n---\n\u003cdiv align=\"center\"\u003e\n\n### Live at [Data+AI Summit](https://databricks.com/session_na21/keeping-identity-graphs-in-sync-with-apache-spark) (2021, May)\n\n\n  \u003cimg src=\"images/data-ai-summit.jpg\" width=\"600\"\u003e\u003c/img\u003e\n\n### Live at [Data Love ❤️](https://datalove.konfy.care) (2021, April)\n\n\n  \u003cimg src=\"images/datalove.png\" width=\"600\"\u003e\u003c/img\u003e\n\n---\n\n\u003ca href=\"https://www.buymeacoffee.com/rberenguel\" target=\"_blank\"\u003e\u003cimg src=\"https://cdn.buymeacoffee.com/buttons/default-orange.png\" alt=\"Buy Me A Coffee\" height=\"51\" 
width=\"217\"\u003e\u003c/a\u003e\n\n---\n\n\u003c/div\u003e\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frberenguel%2Fidentity-graphs","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Frberenguel%2Fidentity-graphs","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frberenguel%2Fidentity-graphs/lists"}