{"id":28724111,"url":"https://github.com/processone/dpk","last_synced_at":"2025-06-15T10:09:17.336Z","repository":{"id":57589498,"uuid":"163275365","full_name":"processone/dpk","owner":"processone","description":"Analyse \u0026 convert data from online services for backup, indexing or migration purpose","archived":false,"fork":false,"pushed_at":"2023-06-07T12:52:01.000Z","size":796,"stargazers_count":9,"open_issues_count":7,"forks_count":3,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-06-15T09:14:28.739Z","etag":null,"topics":["data-portability","semantic-web","social-network"],"latest_commit_sha":null,"homepage":"","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/processone.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2018-12-27T09:32:16.000Z","updated_at":"2022-09-27T11:26:42.000Z","dependencies_parsed_at":"2022-08-29T23:21:12.310Z","dependency_job_id":"214038f0-63ad-4364-b800-75a3de2b5255","html_url":"https://github.com/processone/dpk","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/processone/dpk","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/processone%2Fdpk","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/processone%2Fdpk/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/processone%2Fdpk/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/processone%2Fdpk/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/proces
sone","download_url":"https://codeload.github.com/processone/dpk/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/processone%2Fdpk/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":259957281,"owners_count":22937549,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data-portability","semantic-web","social-network"],"created_at":"2025-06-15T10:09:15.227Z","updated_at":"2025-06-15T10:09:17.287Z","avatar_url":"https://github.com/processone.png","language":"Go","readme":"# Data Portability Kit\n\n[![Codeship](https://app.codeship.com/projects/0dbc4220-fb96-0136-3604-5aa5b52ee74f/status?branch=master)](https://app.codeship.com/projects/322207) [![GoDoc](https://godoc.org/github.com/processone/dpk?status.svg)](https://godoc.org/github.com/processone/dpk)\n\nDPK is a Data Portability Kit. This is the Swiss Army Knife that lets you take back control of your online data.\n\nThanks to GDPR, online providers now have to offer takeout features for your data. This is a great opportunity\nto get back your data. 
Now is thus a good time to manage and possibly publish it using open tools and platforms.\n\nThat said, each provider gives you access to your data in a different format.\n\nThe goal of DPK is to provide a single tool that extracts your data into a unified, ready-to-use format.\n\nThe project creates a directly usable directory structure, with your posts in Markdown format and metadata\nin a single, consistent JSON format.\n\nWith DPK, you get the chance to retrieve your data in a format that can really be reused, without any hidden dependencies.\n\n## Use cases: From backup to migration\n\nThe main goal for this tool is to help you take back control of your data:\n\n1. You can use it to back up and index data you have uploaded to various cloud service providers.\n2. You can index your local data as you wish to make it more easily searchable.\n3. Or, you can use it to migrate to a new service that you will either fully manage yourself or host with another provider.\n\nYou can take back control of your data incrementally. That said, we encourage you to aim for full control and host your data on your own domain. For more details, you should read and follow the principles of the IndieWeb: [Publish (on your) Own Site, Syndicate Elsewhere](https://indieweb.org/POSSE).\n\n## General principles\n\nThe goal of the tool is to produce a data set that is self-contained and directly usable. As such, we do not want to rely on third-party\nservices that can disappear at any time. That's why, as much as we can, we try to resolve short URLs to their final\ntarget.\n\nWe also do not want to promote trackers. When using Twitter oEmbed, for example, we sanitize the provided HTML and\nthus do not include the `widget.js` JavaScript tags. If we embed third-party content as a convenience (for playing\nvideos inline, for example), it will be loaded asynchronously, on user demand. It will thus be compliant with the browser Do Not\nTrack policy. 
We do not even check whether Do Not Track is enabled; we assume it is (as it should be).\n\n## Data conversion\n\n### Shortlinks\n\nURL shorteners became popular when tweet size limits made it hard to share long links on Twitter. Now, they\nare mostly used to track clicks on shared links. Short URLs also hide the real link, and if the short URL service disappears or\ndecides to redirect to another target, the link to the original content will be lost.\n\nThat's why the toolkit provides methods to resolve short URLs and replace each short link with its longer form. It\nhelps preserve the linking feature of the web by removing middlemen.\n\n### Twitter\n\nYou can request your archive from Twitter here: [Your Twitter Data](https://twitter.com/settings/your_twitter_data).  \nYou will receive a link to download your archive when it is ready.\n\nOnce you have it, unzip the file and convert your tweets to a Markdown directory structure with the command:\n\n```bash\ngo run cmd/twitter-to-md/twitter-to-md.go ~/Downloads/twitter-2018-12-27-abcd121212 posts\n```\n\nIt will create a directory with your data in a format you can reuse with your blogging tool or platform.\n\nIn the process, it will also embed a local representation of quoted tweets and replace shortened links with their\noriginal value.\n\n## Tooling\n\n### `mget`\n\n`mget` is a command-line tool to download web page metadata and format it as JSON.\n\nIt is able to extract metadata using various standards and specifications such as:\n\n- HTML 5 + RDFa (Linked Data)\n- Dublin Core\n- Open Graph\n- Twitter cards\n\nThis is a handy tool to explore the semantic web.\n\nYou can install it with the `go` command-line tool:\n\n```bash\n$ go get -u github.com/processone/dpk/cmd/mget\n```\n\nAssuming you have `~/go/bin` in your `PATH`, you can then run it with:\n\n```\n$ mget https://www.process-one.net\n{\n\t\"properties\": {\n\t\t\"description\": \"ProcessOne delivers rich Messaging, IoT and Push services that will help your business 
grow.\",\n\t\t\"og:description\": \"ProcessOne delivers rich Messaging, IoT and Push services that will help your business grow.\",\n\t\t\"og:image\": \"https://static.process-one.net/bootstrap/img/art/p1.jpg\",\n\t\t\"og:title\": \"Build Awesome Realtime Software with ProcessOne\",\n\t\t\"og:type\": \"product\",\n\t\t\"og:url\": \"https://www.process-one.net/en/\",\n\t\t\"title\": \"Build Awesome Realtime Software with ProcessOne\",\n\t\t\"twitter:card\": \"summary_large_image\",\n\t\t\"twitter:description\": \"ProcessOne delivers rich Messaging, IoT and Push services that will help your business grow.\",\n\t\t\"twitter:image\": \"https://static.process-one.net/bootstrap/img/art/p1.jpg\",\n\t\t\"twitter:site\": \"@processone\",\n\t\t\"twitter:title\": \"Build Awesome Realtime Software with ProcessOne\"\n\t}\n}\n```  \n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fprocessone%2Fdpk","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fprocessone%2Fdpk","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fprocessone%2Fdpk/lists"}