{"id":18377244,"url":"https://github.com/bbc/dropbox-paper-to-json","last_synced_at":"2025-04-06T21:31:30.977Z","repository":{"id":40751460,"uuid":"138772315","full_name":"bbc/dropbox-paper-to-json","owner":"bbc","description":"A node module to convert a dropbox paper document to json","archived":false,"fork":false,"pushed_at":"2023-07-11T13:07:30.000Z","size":559,"stargazers_count":10,"open_issues_count":9,"forks_count":3,"subscribers_count":22,"default_branch":"master","last_synced_at":"2024-04-08T21:02:36.473Z","etag":null,"topics":["bbc-news-labs-news-mixer","dropbox","dropbox-paper","dropbox-sdk","json"],"latest_commit_sha":null,"homepage":null,"language":"JavaScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/bbc.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2018-06-26T17:40:38.000Z","updated_at":"2021-04-25T21:37:59.000Z","dependencies_parsed_at":"2023-02-02T20:40:12.458Z","dependency_job_id":null,"html_url":"https://github.com/bbc/dropbox-paper-to-json","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bbc%2Fdropbox-paper-to-json","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bbc%2Fdropbox-paper-to-json/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bbc%2Fdropbox-paper-to-json/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bbc%2Fdropbox-paper-to-json/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/bbc","download_url":"https://codeload.github.com/bbc/dropbox-paper-to-
json/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":223264059,"owners_count":17116041,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bbc-news-labs-news-mixer","dropbox","dropbox-paper","dropbox-sdk","json"],"created_at":"2024-11-06T00:27:26.815Z","updated_at":"2024-11-06T00:27:27.355Z","avatar_url":"https://github.com/bbc.png","language":"JavaScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"## Dropbox Paper to Markdown\n\n\u003c!-- _One liner + link to confluence page_ --\u003e\n\nA Node module to import data from a dropbox paper document and convert it into a json data structure.\n\n\n## Setup\n\n\u003c!-- _stack - optional_ --\u003e\n\n\n\u003c!-- _How to build and run the code/app_ --\u003e\n\n### 1. get dropbox access token\n\n\n#### Create a dropbox App\n\n- `create app` [from developer console](https://www.dropbox.com/developers/apps)\n  - choose `Dropbox API`\n  - choose `Full Dropbox`\n  - give it a Name: e.g. `News Mixer`\n    - click on the newly created app\n      - and [Generate an access token for your own account](https://blogs.dropbox.com/developers/2014/05/generate-an-access-token-for-your-own-account/)\n\n\n### 2. get dropbox paper `document id`\n\ne.g. if the url of your dropbox paper is something like\n\n```\nhttps://paper.dropbox.com/doc/Main-Title-vJdrjMJAHdgfHz0rl83Z\n```\n\nThen the string after the last `-` is your document id.\n\nIn this fictitious example it would be: `vJdrjMJAHdgfHz0rl83Z`.\n\n### 3. 
add `DROPBOX_ACCESS_TOKEN` to `.env`\n\nThe project uses [dotenv](https://www.npmjs.com/package/dotenv) to deal with credentials and environment variables.\n\nIn the root of the repo, create a `.env` file. It is excluded from the github repo by `.gitignore` to avoid leaking credentials.\n\nHere's an example format of a `.env` file, _with some fictitious credentials_\n\n```\n# Dropbox credentials\nDROPBOX_ACCESS_TOKEN=vJdrjMJAHdgfHz0rl83ZvJdrjMJAHdgfHz0rl83Z\nDROPBOX_DOC_ID=vJdrjMJAHdgfHz0rl83Z\n```\n\n\n## Usage\n\n### In development\n\nclone this repo\n```\ngit clone git@github.com:bbc/dropbox-paper-to-json.git\n```\n\ncd into folder\n\n```\ncd dropbox-paper-to-json\n```\n\n`npm install`\n\n`npm start`\n\nThis will save a `data.json` file in the root of the project.\n\n### In production\n\nnpm install\n\n```\nnpm install dropbox-paper-to-json@git+ssh://git@github.com/bbc/dropbox-paper-to-json.git#master --save\n```\n\nAdd to your code base\n\n```js\n// if using dotenv for environment variable credentials for dropbox paper\nrequire('dotenv').config();\n// optional if you want to write the resulting json\nconst fs = require('fs');\n// require module\nconst dbpMdToJson = require('dropbox-paper-to-json');\n\ndbpMdToJson({\n    accessToken: process.env.DROPBOX_ACCESS_TOKEN,\n    dbp_doc_id: process.env.DROPBOX_DOC_ID,\n    // optional, nested defaults to true\n    nested: true\n}).then((data) =\u003e {\n    console.log(`done Dropbox Paper to JSON conversion`);\n    // optional: now do something with the data\n    fs.writeFileSync('./data.json', JSON.stringify(data, null, 2));\n});\n\n```\n\n\n## System Architecture\n\n_High level overview of system architecture_\n\n### Downloading a Dropbox paper\nThe module uses the [`dpb-download-md`](./dpb-download-md/index.js) node module to get a dropbox paper as markdown, given a dropbox paper id and an access token.\n\nIt does this because the official SDK didn't seem to have a straightforward way to get the content of a dropbox paper document.\n\n### 
Converting markdown dropbox paper to \"linear\" json\n\nThe submodule [`md-to-json/linear.js`](./md-to-json/linear.js) takes the content of a markdown file as a string and converts it into an array of objects, representing markdown elements.\n\nIt's a flat data structure with no nesting, which is why it is sometimes referred to as linear.\n\n#### Example \"linear json\"\n\n```json\n[\n    {\n      \"text\": \"Chapter 1\",\n      \"type\": \"h1\"\n    },\n    {\n      \"text\": \"Text\",\n      \"type\": \"h2\"\n    },\n    {\n      \"text\": \"vitae elementum velit urna id mi. Sed sodales arcu mi, eu condimentum tellus ornare non. Aliquam non mauris purus. Cras a dignissim tellus. Cras pharetra, felis et convallis tristique, sapien augue interdum ipsum, aliquet rhoncus enim diam vitae eros. Cras ullamcorper, lectus id commodo volutpat, odio urna venenatis tellus, vitae vehicula sapien velit eu purus. Pellentesque a feugiat ex. Proin volutpat congue libero vitae malesuada.\",\n      \"type\": \"p\"\n    },\n    {\n      \"text\": \"Video \",\n      \"type\": \"h2\"\n    },\n...\n]\n```\n\n### Converting linear markdown json to nested json\n\nFor some use cases it might be helpful to nest all the elements between one h1 tag and the next h1 tag as children (`elements`) of that tag.\n\nE.g. an h1 tag could contain h2 tags, p tags, links, etc.\n\nLikewise an h2 tag could contain all other elements up to the next h2 or h1 tag.\n\n_NOTE_ the dropbox paper flavour of markdown only properly represents `H1` and `H2` tags, which is why the nesting stops at two levels for this use case. 
But it could be nested further should there be a use case for it.\n\nThis is done in [`md-to-json/index.js`](./md-to-json/index.js)\n\n#### Example \"nested json\"\n\n```json\n{\n  \"title\": \"TEST CMS\",\n  \"elements\": [\n    {\n      \"text\": \"Chapter 1\",\n      \"type\": \"h1\",\n      \"elements\": [\n        {\n          \"text\": \"some text element between h1 and h2 tags\",\n          \"type\": \"p\"\n        },\n        {\n          \"text\": \"text\",\n          \"type\": \"h2\",\n          \"elements\": [\n            {\n              \"text\": \"vitae elementum velit urna id mi. Sed sodales arcu mi, eu condimentum tell.\",\n              \"type\": \"p\"\n            }\n          ]\n        },\n       ...\n}\n```\n\nFor a full example see [`md-to-json/examples/example_output.json`](./md-to-json/examples/example_output.json).\n\n## Development env\n\n _How to run the development environment_\n\n_Coding style convention ref optional, eg which linter to use_\n\n_Linting, github pre-push hook - optional_\n\n- node\n- npm\n- eslint see [`.eslintrc.json`](./.eslintrc.json)\n\n\n## Build\n\n_How to run build_\n\nNA `?`\n\n\u003c!-- might need to add Babel? to make it more widely compatible? --\u003e\n\n## Tests\n\n_How to carry out tests_\n\nThere is minimal test coverage, using [`jest`](https://facebook.github.io/jest/). To run the tests:\n\n```\nnpm test\n```\n\n## Deployment\n\n_How to deploy the code/app into test/staging/production_\n\nNA, it's a node module.\n\n\n## Contributing\n\n- Pull requests are welcome.\n- For questions, bugs, ideas feel free to raise a github issue.\n\n---\n\n## Notes on Dropbox \"flavoured\" markdown\n\nUnfortunately, Dropbox paper has its own flavour of markdown. 
Some of the most relevant and notable differences are:\n\n- The title of the doc and the first `heading 1` element are both marked as `h1` / `#`.\n- `Heading 3` is represented as bold `**` instead of `h3`/`###`.\n- There's no Heading 4, 5 or 6.\n\n### Example of dropbox flavour markdown\n\nsee [`md-to-json/examples/test.md`](./md-to-json/examples/test.md) as an example of a dropbox flavour markdown file.\n\n### Markdown elements not included in module\n\n- [ ] `H3` tag, since dropbox paper markdown represents it as bold `**`\n- [ ] Parsing markdown github flavour tags `h3` to `h6`, as these are not generated by dropbox paper markdown.\n\n\n### Markdown elements that could be included in module\n- [X] Parsing markdown github flavour tags for images eg `![alt text](link url)`. These appear on their own line.\n   - _NOTE_ luckily, even when displayed on the same line in dropbox paper, the images are still represented on individual lines when exported as markdown. This makes them easier to identify as separate from other elements and to parse.\n\n- [ ] Parsing markdown github flavour tags for links eg `[text](link url)`. These generally appear as part of a paragraph, but could also appear on their own line, or as part of a heading, etc.\n\n\u003c!--\nSome research link\n\n- [Dropbox for JavaScript Developers](https://www.dropbox.com/developers/documentation/javascript#overview)\n- [Dropbox JavaScript SDK](http://dropbox.github.io/dropbox-sdk-js/index.html)\n- [Dropbox access token](https://blogs.dropbox.com/developers/2014/05/generate-an-access-token-for-your-own-account/)\n\n- [File request](https://www.dropbox.com/help/files-folders/received-file-request)\n\n- [dropbox-paper](https://www.npmjs.com/package/dropbox-paper#download-doc)\n\n- [Markdown to JSON converter - python](https://github.com/njvack/markdown-to-json)\n- [`json2md`](https://github.com/IonicaBizau/json2md)\n\n- [`md-2-json`](https://www.npmjs.com/package/md-2-json)\n\n- [`marked`](https://www.npmjs.com/package/marked)\n\n 
--\u003e\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbbc%2Fdropbox-paper-to-json","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbbc%2Fdropbox-paper-to-json","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbbc%2Fdropbox-paper-to-json/lists"}