https://github.com/behdad/gdocs-me-up
High-fidelity Google Docs exporter to plain HTML+CSS
- Host: GitHub
- URL: https://github.com/behdad/gdocs-me-up
- Owner: behdad
- Created: 2025-02-09T21:21:09.000Z (3 months ago)
- Default Branch: main
- Last Pushed: 2025-02-12T13:46:57.000Z (3 months ago)
- Last Synced: 2025-02-12T14:44:44.603Z (3 months ago)
- Language: JavaScript
- Size: 62.5 KB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
Note: The code in this repository, as well as this README, was generated iteratively with ChatGPT o1 and o1pro for my own needs. I have started making manual changes now, and PRs are welcome. Sharing because caring. -behdad
# GDocs-Me-Up: A Google Docs High-Fidelity Exporter
A Node.js script that exports Google Docs to HTML+CSS with high fidelity, preserving essential formatting like headings, line spacing, alignment, bullet lists, images, and more. Perfect for creating an **offline** or **self-hosted** version of your docs that closely matches the original.
---
## Overview
**Why this script?** Because simpler exports often lose alignment, line spacing, or right-to-left details. This script pulls detailed styling info (like heading levels, inline font sizes, doc-based image sizes, and bullet indentation) directly from the Google Docs API. It then **merges** that styling into HTML and **inserts** matching CSS that replicates Google Docs formatting while also **neutralizing** default browser quirks.
### What It Does
1. **Named Styles**: Detects **Title**, **Subtitle**, **HEADING_1..6**, and applies them to HTML headings (H1..H6) or custom classes.
2. **Line Spacing & Margins**: Honors `paragraphStyle.lineSpacing`, `spaceAbove`, `spaceBelow`, indentation, alignment.
3. **Right-to-Left**: If the doc says a paragraph is RTL, we add `dir="rtl"` and flip alignment (START → right).
4. **Tables**: GDocs tables become `<table>` with `<tr>` and `<td>`, keeping paragraph formatting in each cell.
5. **Images**: Exports each embedded image at the doc’s reported width/height (in pt → px), respecting scaling. Saves images in an `images/` folder.
6. **TOC**: If your doc has a table of contents, we export it in its own container, indenting each line by its heading level.
7. **Bullet/Numbered Lists**: Detects GDocs bullet styles, outputting `<ul>` / `<ol>`. If the doc is RTL, we do `<ul dir="rtl">` so bullets align on the right.
8. **Google Fonts**: Gathers unique fonts used in the doc. Inserts a `<link>` to [fonts.googleapis.com](https://fonts.googleapis.com/) so text families match.
9. **Neutralized Headings**: Browsers normally inflate heading font sizes. We override heading tags (`h1..h6 { font-size: 1em }`) so Google Docs’ inline style alone sets the final size.
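
As a rough illustration of items 1 and 9, the sketch below maps a Docs API `namedStyleType` to an HTML tag and keeps the heading reset rule as a string. The names `tagForNamedStyle`, `NAMED_STYLE_TAGS`, and `HEADING_RESET_CSS`, and the choice of `<p class="title">` for titles, are assumptions made for this example and are not taken from `gdocs-me-up.js`.

```javascript
// Hypothetical mapping; the tag/class choices for TITLE and SUBTITLE are assumptions.
const NAMED_STYLE_TAGS = {
  TITLE: { tag: 'p', className: 'title' },
  SUBTITLE: { tag: 'p', className: 'subtitle' },
  NORMAL_TEXT: { tag: 'p', className: null },
};

// Returns the tag (and optional class) to use for a Docs namedStyleType.
function tagForNamedStyle(namedStyleType) {
  const match = /^HEADING_([1-6])$/.exec(namedStyleType || '');
  if (match) {
    return { tag: `h${match[1]}`, className: null }; // HEADING_3 -> h3
  }
  // Unknown or missing styles fall back to a plain paragraph.
  return NAMED_STYLE_TAGS[namedStyleType] || NAMED_STYLE_TAGS.NORMAL_TEXT;
}

// Reset rule so the inline font-size from the doc decides the final size.
const HEADING_RESET_CSS =
  'h1, h2, h3, h4, h5, h6 { font-size: 1em; font-weight: normal; }';
```

With this mapping, a `HEADING_3` paragraph carrying an inline `font-size: 14pt` would render at 14pt rather than at the browser's default `<h3>` size.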
---
## Installation
1. **Prerequisites**:
   - **Node.js** (v14 or later).
   - `npm install googleapis`.
   - A **Google Cloud** service account JSON file with read permissions on the doc.
2. **Get the Script**:
   - Download or clone this repository.
   - Ensure `gdocs-me-up.js` and your `service_account.json` are in the same folder (or update the path in the script).
3. **Authenticate**:
   - Go to [Google Cloud Console](https://console.cloud.google.com/), enable **Docs API** + **Drive API**.
   - Create a service account with suitable permissions.
   - Download the JSON key file as `service_account.json`.
   - Make sure your doc is accessible by that service account (share it if needed).

---
## Usage
```bash
node gdocs-me-up.js <DOC_ID> <OUTPUT_DIR>
```

- **`<DOC_ID>`**: The unique ID from your doc’s URL. For example:
```
https://docs.google.com/document/d/1AbCdE-FgHiJKlMnOpQRs7TuVMue/edit
                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
```
- **`<OUTPUT_DIR>`**: The folder where the script will write `index.html` and an `images/` subfolder.

**Example**:
```bash
node gdocs-me-up.js 1AbCdE-FgHiJK docs_export
```

On completion:
- **`docs_export/index.html`**: Your doc in near-pixel HTML+CSS fidelity.
- **`docs_export/images/`**: Downloaded images.

Open `docs_export/index.html` in your browser. You’ll see headings, bullet-lists, alignment, images, and more, closely mirroring the original doc.
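
If you would rather paste a full document URL than pick the ID out by hand, a small helper along these lines could extract it. This is only a sketch: `extractDocId` is a hypothetical name, and the script itself may expect a bare ID.

```javascript
// Hypothetical helper: pulls the ID out of a Google Docs URL.
// If the input does not look like a Docs URL, it is returned unchanged,
// on the assumption that it is already a bare document ID.
function extractDocId(input) {
  const match = /\/document\/d\/([A-Za-z0-9_-]+)/.exec(input);
  return match ? match[1] : input;
}

console.log(
  extractDocId('https://docs.google.com/document/d/1AbCdE-FgHiJKlMnOpQRs7TuVMue/edit')
);
```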
---
## Key Details
1. **Line Spacing**: The script reads `paragraphStyle.lineSpacing` (e.g., 100 = 1.0, 115 = 1.15, 200 = 2.0) and sets `line-height`. It also applies `spaceAbove` + `spaceBelow` as `margin-top` + `margin-bottom`.
2. **Right-to-Left Paragraphs**: If `paragraphStyle.direction = RIGHT_TO_LEFT`, we add `dir="rtl"`. If alignment=START, it becomes `right`; alignment=END => `left`. Lists also carry `dir="rtl"` so bullets go on the right side.
3. **Images**: We read `imageProperties` to get `width.magnitude` + `height.magnitude` (in points), multiply by ~1.333 to convert to px, and set them on the `<img>` element. If the doc scaled an image, we read `transform.scaleX/scaleY`.
4. **TOC Indentation**: For each line in the doc’s table of contents, the script checks the heading level of the link target. It then gives that line a level-specific class with a matching `margin-left` rule in the CSS.
5. **Merging Identical Runs**: Google Docs often splits text into multiple runs. If two consecutive runs share the same style (bold, color, font-size, etc.), we merge them to keep the final HTML lean.
6. **Heading Size**: We override heading tags in CSS to `font-size: 1em; font-weight: normal;`. The doc sets an inline `font-size: 18pt;` (for example), so you get exactly 18pt, not 18pt multiplied by the browser’s default heading scale.
7. **Fonts**: If your doc uses “Roboto” and “Lato,” we add a single `<link>` to `https://fonts.googleapis.com/css2?family=Roboto&family=Lato&display=swap`, letting the final HTML use those fonts.
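
Several of the conversions above are simple enough to sketch in a few lines. The helpers below are illustrative only: the function names, the `{ text, style }` run shape, and the returned object layout are assumptions for this example, not the actual code in `gdocs-me-up.js`. The field names (`lineSpacing`, `spaceAbove`, `spaceBelow`, `direction`, `alignment`) follow the Docs API `paragraphStyle`.

```javascript
// Points to CSS pixels: 96 px per inch / 72 pt per inch (~1.333).
function ptToPx(pt) {
  return Math.round((pt * 96 / 72) * 100) / 100;
}

// Turns a Docs paragraphStyle into CSS declarations plus an optional dir value.
function paragraphCss(paragraphStyle = {}) {
  const css = {};
  const rtl = paragraphStyle.direction === 'RIGHT_TO_LEFT';
  if (paragraphStyle.lineSpacing) {
    css['line-height'] = String(paragraphStyle.lineSpacing / 100); // 115 -> 1.15
  }
  if (paragraphStyle.spaceAbove && paragraphStyle.spaceAbove.magnitude) {
    css['margin-top'] = `${paragraphStyle.spaceAbove.magnitude}pt`;
  }
  if (paragraphStyle.spaceBelow && paragraphStyle.spaceBelow.magnitude) {
    css['margin-bottom'] = `${paragraphStyle.spaceBelow.magnitude}pt`;
  }
  // START/END are relative to the text direction, so they flip for RTL.
  const alignments = {
    START: rtl ? 'right' : 'left',
    END: rtl ? 'left' : 'right',
    CENTER: 'center',
    JUSTIFIED: 'justify',
  };
  const align = alignments[paragraphStyle.alignment];
  if (align) css['text-align'] = align;
  return { dir: rtl ? 'rtl' : null, css };
}

// Merges consecutive runs whose style objects serialize identically.
// Note: JSON.stringify is key-order sensitive, which is fine for a sketch.
function mergeRuns(runs) {
  const merged = [];
  for (const run of runs) {
    const last = merged[merged.length - 1];
    if (last && JSON.stringify(last.style) === JSON.stringify(run.style)) {
      last.text += run.text;
    } else {
      merged.push({ ...run }); // copy so the input array is not mutated
    }
  }
  return merged;
}

// Builds one Google Fonts CSS2 URL for the unique families in use.
function googleFontsHref(families) {
  const unique = [...new Set(families)];
  const params = unique.map(
    (family) => `family=${encodeURIComponent(family).replace(/%20/g, '+')}`
  );
  return `https://fonts.googleapis.com/css2?${params.join('&')}&display=swap`;
}
```

For example, `paragraphCss({ lineSpacing: 115, direction: 'RIGHT_TO_LEFT', alignment: 'START' })` yields `dir: 'rtl'` with `line-height: 1.15` and `text-align: right`, and `googleFontsHref(['Roboto', 'Lato', 'Roboto'])` produces the URL shown in item 7.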
---
## Customizing
- **Force a Different Column Width**: Edit `computeDocContainerWidth()` to remove the `+ 50`, or set a fixed width.
- **Line Spacing**: If you want a global `line-height:1.2`, remove or comment out the lines in `renderParagraph` referencing `paragraphStyle.lineSpacing`.
- **Images as Base64**: Set `EMBED_IMAGES_AS_BASE64 = true;`, so images are embedded inline instead of written to `images/`.
- **Heading Tags**: If you’d rather not use `<h1>`..`<h6>`, replace them with another element in the code. Then style them in CSS as you like.
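
For the Base64 option, the general idea is to turn the downloaded image bytes into a `data:` URI and use that as the image source. The sketch below shows one way to do that in Node.js; `toDataUri` is a hypothetical helper, and how the script behaves when `EMBED_IMAGES_AS_BASE64` is set may differ.

```javascript
// Hypothetical helper: encodes raw image bytes as a data URI.
// `bytes` can be a Buffer, a Uint8Array, or an array of byte values.
function toDataUri(bytes, mimeType = 'image/png') {
  const base64 = Buffer.from(bytes).toString('base64');
  return `data:${mimeType};base64,${base64}`;
}

// The result can be used directly as an <img> src value.
const src = toDataUri(Buffer.from('hi'), 'image/png');
console.log(`<img src="${src}">`);
```

Embedding keeps the export to a single file, at the cost of a larger `index.html`, since Base64 adds roughly a third to the size of each image.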
---
## Troubleshooting
1. **Invalid Grant / 401**: Check your service account JSON, or ensure the doc is shared with your service account email.
2. **Images All Full-Width**: Possibly the doc’s stored size is as wide as the page. Shrink them in GDocs or scale them down.
3. **TOC Not Indented**: Make sure your doc has headings labeled `HEADING_1..6`. If your doc uses custom styles, the script may not see them as headings.
4. **H3 Still Big**: Confirm the heading CSS override (`h1..h6 { font-size: 1em }`) is present, or remove any conflicting styles from your own stylesheet.
5. **Using a Different Auth**: If you want user-based OAuth, adapt `getAuthClient()` to your flow.

---
## Example
**Doc**: “My Example Document” with:
- Heading 3 at 14pt
- Right-to-left paragraphs
- A table of contents
- Several images scaled to 50%

**Command**:
```bash
node gdocs-me-up.js 1XYZabc docs_export
```

**Result**:
- `docs_export/index.html`: Headings, bullet-lists, alignment, images at half-size, lines spaced as in doc, etc.
- `docs_export/images/`: The images as `png`.
- The TOC lines are indented by heading level.

Open the HTML in your browser or upload it to a simple web server. The result should be extremely close to the Google Doc’s layout, including RTL paragraphs and scaled images.
---
## Contributing
1. **Fork** or clone this repository.
2. Modify the script (e.g., add footnote support or custom style merges).
3. **Submit a Pull Request** describing your changes, or open an issue with suggestions.

We welcome improvements and bug fixes. This script is licensed under **MIT**, so feel free to adapt or include it in your projects; attribution is appreciated.
---
**Thanks** for checking out **GDocs-Me-Up**! We hope it helps you create accurate offline or self-hosted versions of your docs. If you have suggestions, issues, or ideas, please open an issue or PR. Happy exporting!