{"id":20612770,"url":"https://github.com/phlummox/pptx-to-md","last_synced_at":"2025-06-24T00:31:31.241Z","repository":{"id":78656251,"uuid":"239506126","full_name":"phlummox/pptx-to-md","owner":"phlummox","description":"Convert PowerPoint or LibreOffice Impress files to Beamer-friendly, Pandoc-style markdown","archived":false,"fork":false,"pushed_at":"2020-02-10T13:34:07.000Z","size":14,"stargazers_count":11,"open_issues_count":0,"forks_count":1,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-04-15T07:09:20.207Z","etag":null,"topics":["beamer","document-conversion","latex","libreoffice","markdown","openoffice","pdf","powerpoint","presentation-slides"],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"unlicense","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/phlummox.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2020-02-10T12:26:38.000Z","updated_at":"2025-04-07T02:56:43.000Z","dependencies_parsed_at":"2023-04-15T04:14:14.207Z","dependency_job_id":null,"html_url":"https://github.com/phlummox/pptx-to-md","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/phlummox/pptx-to-md","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/phlummox%2Fpptx-to-md","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/phlummox%2Fpptx-to-md/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/phlummox%2Fpptx-to-md/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hos
ts/GitHub/repositories/phlummox%2Fpptx-to-md/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/phlummox","download_url":"https://codeload.github.com/phlummox/pptx-to-md/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/phlummox%2Fpptx-to-md/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":261582629,"owners_count":23180632,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["beamer","document-conversion","latex","libreoffice","markdown","openoffice","pdf","powerpoint","presentation-slides"],"created_at":"2024-11-16T11:07:51.378Z","updated_at":"2025-06-24T00:31:31.234Z","avatar_url":"https://github.com/phlummox.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\n# pptx-to-md\n\nConverts presentation files (PowerPoint/Impress,\nor anything else that LibreOffice can read as a\npresentation) to [Beamer](https://ctan.org/pkg/beamer)-friendly,\nPandoc-style markdown.\n\nIt's a two-part conversion: one script (`pptx-to-yaml.py`)\nconverts from pptx (or ppt, odp, etc.) into an intermediate\nYAML format, then another (`yaml-to-md.py`) converts the\nYAML to Pandoc-style markdown.\n\nAs a convenience, a Bash wrapper script is provided (`convert.sh`)\nwhich calls both of these, and handles a few other graphics-conversion\ntasks (such as converting SVG files, which LaTeX can't natively read,\nto encapsulated PostScript files, which it can).\n\n## pptx-to-yaml usage\n\n```bash\n$ ./pptx-to-yaml.py [--use-server HOST:PORT] INPUT_FILE 
OUTPUT_FILE IMAGE_DIR\n```\n\n`INPUT_FILE` is the path to some ppt, pptx or Impress file.\n\n`OUTPUT_FILE` is the name of the YAML file to be written.\n\n`IMAGE_DIR` is the directory that images will be extracted to.\nIt will be created if it doesn't exist.\n\nSee [soffice-server](#soffice-server) below for details of\nwhat `--use-server` is for.\n\n## PowerPoint features supported\n\nTitle text, \"outline\" text (i.e. bullets) and embedded\ngraphics like JPEGs or PNGs are handled reasonably well.\nFormatting such as italics, bold or colouring of text\nis not preserved. Nor are numbered lists - they're\nconverted into bulleted lists.\n\nEmbedded \"metafiles\" (EMF or WMF vector graphics)\nshould get converted to SVG. (And thence to EPS, if you\nuse `convert.sh`.)\n\nIf it finds any tables or drawing shapes (arrows, boxes, etc.),\n`pptx-to-yaml.py` tries to collect them all together\nand export them as an SVG.\n\n## soffice server\n\nBy default, `pptx-to-yaml.py` attempts to start an `soffice` process\nand communicate with it over port 2002 on the local host;\nit's the `soffice` process that knows how to read\nand manipulate PowerPoint files.\n\nHowever, a `HOST:PORT` argument can be supplied with `--use-server` if you prefer\nto run your own instance of `soffice` as a separate process.\nYou might want to, since:\n\na.  If you have a lot of files to convert, you can just\n    keep one `soffice` process running, and re-use it,\n    avoiding the time taken to start a new process for\n    each document.\n\nb.  Sometimes `pptx-to-yaml.py` just doesn't seem to\n    start the `soffice` process up correctly - I have no idea why.\n\nSo you could start the server process using something\nlike the following:\n\n```bash\n$ xterm -e 'soffice --accept=\"socket,host=localhost,port=2002;urp;\" \\\n    --norestore --nologo --nodefault --headless' \u0026\n```\n\n... 
which will open an `soffice` instance running in its own terminal\nwindow; and then pass the matching `HOST:PORT` to `pptx-to-yaml.py`\nvia `--use-server`.\n\n\n## yaml-to-md usage\n\n```bash\n$ ./yaml-to-md.py INPUT_FILE OUTPUT_FILE\n```\n\n`INPUT_FILE` is the YAML file written by `pptx-to-yaml.py`;\n`OUTPUT_FILE` is the name of the markdown file to be written.\n\n## utility script - convert.sh\n\nUsage:\n\n```bash\n$ ./convert.sh [INPUT_FILE..]\n```\n\nConvenience wrapper around pptx-to-yaml and yaml-to-md. Also converts\nSVG files to encapsulated PostScript (EPS) for use by LaTeX,\nand attempts to use Pandoc to create LaTeX and PDF files.\n(If Pandoc fails, the PDF file just isn't produced; that usually\nmeans the .md file needs some tidying.)\n\n\n## prerequisites\n\nFor `pptx-to-yaml.py` and `yaml-to-md.py`:\n\n-   Python 3.5 or greater\n-   LibreOffice 5.1.6. On Ubuntu 16.04 (xenial), this\n    can be installed with `sudo apt-get install libreoffice`.\n-   `python3-uno`. On Ubuntu, this can be installed with\n    `sudo apt-get install python3-uno`.\n-   pyyaml. Most easily installed with something like\n    `pip3 install --user pyyaml`.\n\nFor `convert.sh`:\n\n-   bash, sed, [Inkscape](https://inkscape.org)\n    (for converting SVG to EPS) and [Pandoc](https://pandoc.org/)\n    (for converting .md to .tex or .pdf).\n\n## idiosyncrasies\n\nExported graphics files are all referred to by absolute pathname,\nso if you want to move your generated files around,\nyou'll have to edit any references to them in the\nYAML/markdown, as appropriate.\n\n## Portability\n\nNot at all portable: not tested on any platform\nother than Ubuntu 16.04, nor with any version of\nLibreOffice other than 5.1.6.\n\n## Reporting bugs\n\nYou can if you want, but there's no guarantee I'll fix them.\nThe scripts are really just offered as a starting point for\nanyone else who wants to improve them.\n\n## (Un)license\n\nThis software is in the public domain. Do with it what you will.\nIf you manage to improve it, it would be nice to hear from\nyou. 
Try contacting me on Twitter, handle\n[`@phlummox`](https://twitter.com/phlummox).\n\n## troubleshooting\n\n### can't connect\n\nIf you get an error saying `pptx-to-yaml.py` couldn't connect to the\nserver, kill any stray `soffice` processes and try again.\n\nIf it still fails, try increasing the `time.sleep` delay\nin the script, or just run your own server process.\n\n### can't open wmf/emf files in inkscape\n\nTry opening them in `lodraw` (LibreOffice Draw) instead.\n\n### Pandoc/LaTeX fails to compile .pdf\n\nLots of things could have gone wrong. By default, Pandoc\nuses pdflatex, which will choke on many Unicode symbols.\nGraphics might not have converted, and so on.\n\nThe only thing to do is take a look at the original\nPowerPoint file, and the generated markdown, and see if you\ncan fix whatever went wrong.\n\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fphlummox%2Fpptx-to-md","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fphlummox%2Fpptx-to-md","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fphlummox%2Fpptx-to-md/lists"}