{"id":23394561,"url":"https://github.com/interjc/doc-to-md","last_synced_at":"2025-07-09T18:14:44.679Z","repository":{"id":268193251,"uuid":"903596219","full_name":"interjc/doc-to-md","owner":"interjc","description":null,"archived":false,"fork":false,"pushed_at":"2025-01-28T02:03:56.000Z","size":9,"stargazers_count":4,"open_issues_count":0,"forks_count":1,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-01-28T03:19:49.783Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Shell","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/interjc.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-12-15T02:42:30.000Z","updated_at":"2025-01-28T02:03:59.000Z","dependencies_parsed_at":null,"dependency_job_id":"5e34b535-be4c-4266-85b9-c167238cfe8c","html_url":"https://github.com/interjc/doc-to-md","commit_stats":null,"previous_names":["interjc/doc-to-md"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/interjc%2Fdoc-to-md","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/interjc%2Fdoc-to-md/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/interjc%2Fdoc-to-md/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/interjc%2Fdoc-to-md/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/interjc","download_url":"https://codeload.github.com/interjc/doc-to-md/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"
https://github.com","kind":"github","repositories_count":238845323,"owners_count":19540326,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-12-22T06:16:11.008Z","updated_at":"2025-02-14T12:48:05.064Z","avatar_url":"https://github.com/interjc.png","language":"Shell","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Document to Markdown Converter\n\n## Introduction\n\nThis project is based on the [markitdown](https://github.com/microsoft/markitdown/) library, implementing functionality to convert Office documents (such as .docx) to Markdown files.\nThe project provides a set of Python scripts and corresponding Shell scripts for convenient local execution of the conversion process.\n\n## Directory Structure\n\n```\nproject_root/\n├─ README.md\n├─ requirements.txt\n├─ install_env.sh\n├─ python/\n│  ├─ converter.py\n│  └─ __init__.py\n└─ scripts/\n   ├─ run_convert.sh\n   └─ activate_env.sh\n```\n\n- `install_env.sh`: Script to create and install required environment and dependencies through conda.\n- `requirements.txt`: Python dependency list, including third-party libraries like markitdown.\n- `python/converter.py`: Core conversion logic script, using markitdown to convert Office documents to Markdown.\n- `scripts/run_convert.sh`: Shell script for activating the environment and calling `converter.py`.\n- `scripts/activate_env.sh`: Helper script for activating the conda virtual environment.\n\n## Environment Setup\n\n1. Ensure [conda](https://docs.conda.io/en/latest/) is installed.\n2. 
Run the following in the project root directory:\n   ```bash\n   chmod +x install_env.sh\n   ./install_env.sh\n   ```\n   The script will create and activate the conda environment, then install dependencies from `requirements.txt`.\n\n## Usage\n\n1. Prepare an input file (e.g., `input.docx`).\n2. Run the script:\n   ```bash\n   ./scripts/run_convert.sh input.docx output.md\n   ```\n   - The script will automatically activate the corresponding conda environment and call `converter.py` to complete the conversion.\n   - After the script finishes, `output.md` contains the converted Markdown.\n\n## Flow Chart\n\n```mermaid\ngraph TD\n    A(\"User runs run_convert.sh\") --\u003e B(\"run_convert.sh script starts\")\n    B --\u003e C(\"Activate conda environment\")\n    C --\u003e D(\"Parse user input parameters (input file, output file)\")\n    D --\u003e E(\"Call converter.py with parameters\")\n    E --\u003e F(\"converter.py uses markitdown to convert document\")\n    F --\u003e G(\"Output Markdown file\")\n    G --\u003e H(\"Complete and return result\")\n```\n\n## Common Issues\n\n- **Q:** What if conda is not installed?\n- **A:** Follow the [official conda documentation](https://docs.conda.io/en/latest/) to install conda before running `install_env.sh`.\n\n- **Q:** How do I add new dependencies?\n- **A:** Modify `requirements.txt` and re-run `install_env.sh` to update the environment.\n\n## References\n\n- [markitdown project repository](https://github.com/microsoft/markitdown/)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Finterjc%2Fdoc-to-md","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Finterjc%2Fdoc-to-md","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Finterjc%2Fdoc-to-md/lists"}