{"id":24064532,"url":"https://github.com/sourasishbasu/file-wizard","last_synced_at":"2026-04-14T03:31:00.347Z","repository":{"id":228789327,"uuid":"745807008","full_name":"SourasishBasu/File-Wizard","owner":"SourasishBasu","description":"File converter application using AWS, JS and Python","archived":false,"fork":false,"pushed_at":"2024-06-28T08:51:41.000Z","size":10899,"stargazers_count":2,"open_issues_count":0,"forks_count":1,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-11-21T03:08:36.553Z","etag":null,"topics":["apigateway","aws","azure","docker","lambda","python","s3"],"latest_commit_sha":null,"homepage":"https://filewizard.vercel.app","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/SourasishBasu.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-01-20T07:40:08.000Z","updated_at":"2025-05-31T15:25:16.000Z","dependencies_parsed_at":"2024-04-20T16:48:33.845Z","dependency_job_id":"20e73f59-fb1e-4aec-ba09-ea913530fa7e","html_url":"https://github.com/SourasishBasu/File-Wizard","commit_stats":null,"previous_names":["sourasishbasu/file-wizard"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/SourasishBasu/File-Wizard","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SourasishBasu%2FFile-Wizard","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SourasishBasu%2FFile-Wizard/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SourasishBasu%2FFile-Wizard/releases","manifests_url":"https://repos
.ecosyste.ms/api/v1/hosts/GitHub/repositories/SourasishBasu%2FFile-Wizard/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/SourasishBasu","download_url":"https://codeload.github.com/SourasishBasu/File-Wizard/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SourasishBasu%2FFile-Wizard/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31781292,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-14T02:24:21.117Z","status":"ssl_error","status_checked_at":"2026-04-14T02:24:20.627Z","response_time":153,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["apigateway","aws","azure","docker","lambda","python","s3"],"created_at":"2025-01-09T10:37:58.582Z","updated_at":"2026-04-14T03:31:00.328Z","avatar_url":"https://github.com/SourasishBasu.png","language":"Python","readme":"![banner](https://github.com/SourasishBasu/File-Wizard/assets/89185962/bff3880e-d6d9-46c6-aa20-a95c2e5952fd)\n\u003ch1 align=\"center\"\u003eFile Wizard\u003c/h1\u003e\n\n\u003cp align=\"center\"\u003e\n  An open-source file conversion webapp built with NextJs, Python\u003cbr\u003e\n  and AWS for the HTTP API, Lambda functions and S3 object storage.\u003cbr\u003eConverts .docx files to .pdf\n\u003c/p\u003e\n\n\u003cp align=\"center\"\u003e\n  \u003ca 
href=\"#features\"\u003e\u003cstrong\u003eFeatures\u003c/strong\u003e\u003c/a\u003e ·\n  \u003ca href=\"#running-locally\"\u003e\u003cstrong\u003eRunning locally\u003c/strong\u003e\u003c/a\u003e ·\n  \u003ca href=\"#overview\"\u003e\u003cstrong\u003eOverview\u003c/strong\u003e\u003c/a\u003e ·\n  \u003ca href=\"#api-routing\"\u003e\u003cstrong\u003eAPI Routing\u003c/strong\u003e\u003c/a\u003e ·\n  \u003ca href=\"#authors\"\u003e\u003cstrong\u003eAuthors\u003c/strong\u003e\u003c/a\u003e\n\u003c/p\u003e\n\n\n## Features\n\n- **Website**\n  - [NextJs](https://nextjs.org) App Router\n  - [Amazon Web Services](https://docs.aws.amazon.com/) for backend functionality\n  - Support for `HTTP API`, `S3` File Storage, and `Lambda` functions\n  - Edge runtime-ready\n  \n- **AWS Infrastructure**\n  - [Amazon S3](https://aws.amazon.com/s3) Allows for object storage and static site hosting\n  - [API Gateway](https://aws.amazon.com/eventbridge) hosts the HTTP API \n  - [AWS Lambda](https://aws.amazon.com/lambda) for processing JSON and filtering required data\n  - [Amazon EC2](https://aws.amazon.com/sns) for provisioning VM instances \n\n### Tech Stack\n![NextJs](https://img.shields.io/badge/Nextjs-black?style=for-the-badge\u0026logo=nextdotjs\u0026logoColor=white)\n![Python](https://img.shields.io/badge/Python-blue?style=for-the-badge\u0026logo=python\u0026logoColor=white)\n![EC2](https://img.shields.io/badge/ec2-orange?style=for-the-badge\u0026logo=amazon-ec2\u0026logoColor=white)\n![API Gateway](https://img.shields.io/badge/API%20Gateway-8A2BE2?style=for-the-badge\u0026logo=amazon-api-gateway\u0026logoColor=white)\n![S3](https://img.shields.io/badge/S3-2d2dba?style=for-the-badge\u0026logo=amazon-s3\u0026logoColor=white)\n![Lambda](https://img.shields.io/badge/Lambda-FF9900?style=for-the-badge\u0026logo=aws-lambda\u0026logoColor=white)\n\n\n## Overview\n\u003cimg alt=\"AWS Architecture\" src=\"./assets/architecture-fw.png\"\u003e\n\n- A static site is hosted on `S3` with a 
document upload form. We use `API Gateway` to create an API that makes a `GET` request to a `Lambda` function after the user clicks \u003ckbd\u003eUpload File\u003c/kbd\u003e on the form.\n\n- The API returns a `presigned bucket URL` for the `uploads-bucket`. The site then automatically makes a `PUT` request to the same bucket with the `.docx` file data.\n\n- Another `Lambda` function is configured to listen for `PUT Object events` in the S3 `uploads-bucket`. It parses the event record for the file name and sends a `POST` request to the Python `Flask App` performing the document conversion.\n\n- An `EC2` instance is deployed with an Ubuntu OS image. A Python script is set up to run as a background process.\n\n- The Python microservice converts documents using the `pandoc` package and is exposed as an API using `Flask`, listening for `POST` requests on a specified port.\n\n- It downloads and saves the specified file with its ID, converts it, and uploads the converted file to the `output-bucket` on `S3`. The static site returns the download link for the converted file from the `output-bucket`.\n\n# Configuring application on AWS\n\n## S3 Configuration (Only for using the static site as frontend)\nThe frontend of the app is hosted as a static site in a separate S3 bucket.\n\n\u003e [!NOTE]\n\u003e To learn more about the `S3` static site and how to deploy it, visit the [`frontend/README.md`](./frontend/README.md)\n\n## API Routing\n\nThe `HTTP API` is hosted on AWS using API Gateway and a Lambda function which deploys a `getPresignedURL.js app`. The source code for the Lambda function is in [`lambda/presignedURL.js`](./lambda/presignedURL.js)\n\n\u003e [!NOTE]\n\u003e To learn more about the `getPresignedURL.js app` and how to deploy it, visit the [`lambda/README.md`](./lambda/README.md) \n\n## Setup Flask Microservice in EC2 for PDF conversion\n\n1. Create an `EC2 t2.micro` instance with an `Ubuntu Linux AMI` and note the VM's public IPv4 address.\n\n2. 
Assign an IAM role to the EC2 instance with the `AmazonS3FullAccess` policy attached.\n\n3. Run the Flask development server within the VM:\n\n### Installation\n\nBefore installing, check that the Python version is correct via `python -V`\n\n```bash\nsudo apt update \u0026\u0026 sudo apt upgrade\nsudo apt install pandoc texlive python3.10-venv\n```\n\n### Setup Python venv and script\n\n```bash\npython3 -m venv venv\nsource venv/bin/activate\npip install pypandoc boto3 flask\nmkdir inputs outputs\ntouch app.py\n```\n\nCopy the contents of `app.py` into the new file by opening it with any code editor (nano, vim, etc.).\n\n```bash\nsudo su\nnohup python3 app.py \u003e log.txt 2\u003e\u00261 \u0026\n```\n\n- The Flask app should now be able to handle requests 24/7. It runs as a background process using the `nohup` command, so it stays up as long as the VM is running, even if we exit the remote shell.\n- Logs, stdout and stderr are saved to `log.txt` in the same directory.\n- The trailing `\u0026` runs the command in the background, and the shell prints the process ID of the Python process, which may be recorded to perform `kill \u003cPID\u003e` in case the process is to be stopped.\n\nThe Flask app should now be running on:\n[http://{ec2-instance-public-ipv4-address}:5000](http://{ec2-instance-public-ipv4-address}:5000/)\n\nReplace this address in the API endpoint URL within the [trigger_converter.py](./lambda/trigger_converter.py) Lambda function to send the S3 `.docx` files to the Flask microservice to be converted.\n\n\u003e [!WARNING]\n\u003e This command only starts the webapp. 
You will need to configure the instance Security Group to allow TCP connections to port 5000 of the EC2 instance from any external IPv4 address [0.0.0.0/0] on AWS to get the full functionality.\n\n\u003e [!NOTE]\n\u003e Follow the above steps for the `PNG` and `CSV` converter microservices in similar fashion in separate directories and expose them on different ports.\n\n## Usage\n\n\u003e [!Tip]\n\u003e In case webapp demo videos aren't loading below in the README, please visit [Youtube](https://www.youtube.com/watch?v=7NJh7KChyYo).\n\n\u003cp align=\"center\"\u003e \n  \u003cvideo src= \"https://github.com/SourasishBasu/File-Wizard/assets/89185962/405d58e0-a0a2-4aaf-9629-1145efe463bf\" width=\"300\"/\u003e\n\u003c/p\u003e\n\u003cp align=\"center\"\u003e\u003cb\u003e DOCX to PDF Conversion \u003c/b\u003e\u003c/p\u003e\n\u003cbr\u003e\n\u003cp align=\"center\"\u003e \n  \u003cvideo src= \"https://github.com/SourasishBasu/File-Wizard/assets/89185962/a6b96fc3-22e8-4201-9425-932af09d0936\" width=\"300\"/\u003e\n\u003c/p\u003e\n\u003cp align=\"center\"\u003e\u003cb\u003e PNG to PDF Conversion \u003c/b\u003e\u003c/p\u003e\n\u003cbr\u003e\n\u003cp align=\"center\"\u003e \n  \u003cimg src=\"https://github.com/SourasishBasu/File-Wizard/blob/4dff9e2de97c4b4e5aeb06a40a2c829e3ced37b7/assets/inputs.png\" /\u003e\n   \u003cbr\u003e\u003cb\u003eS3 uploads-bucket for .docx files\u003c/b\u003e\n\u003c/p\u003e\n\u003cbr\u003e\n\u003cp align=\"center\"\u003e \n  \u003cimg src=\"https://github.com/SourasishBasu/File-Wizard/blob/4dff9e2de97c4b4e5aeb06a40a2c829e3ced37b7/assets/outputs.png\" /\u003e\n   \u003cbr\u003e\u003cb\u003eS3 output-bucket for .pdf files\u003c/b\u003e\n\u003c/p\u003e\n\u003cbr\u003e\n\u003cp align=\"center\"\u003e \n  \u003cimg src=\"https://github.com/SourasishBasu/File-Wizard/blob/4dff9e2de97c4b4e5aeb06a40a2c829e3ced37b7/assets/pid.png\" /\u003e\n   \u003cbr\u003e\u003cb\u003eFlask App process running in EC2\u003c/b\u003e\n\u003c/p\u003e\n\n## Authors\n\nThis 
project is created by [MLSA KIIT](https://mlsakiit.com) for Cloud Computing Domain's Project Wing:\n\n- Sourasish Basu ([@SourasishBasu](https://github.com/SourasishBasu)) - [MLSA KIIT](https://mlsakiit.com)\n\n## Version\n| Version | Date          \t\t| Comments        |\n| ------- | ------------------- | --------------- |\n| 1.0     | Jan 24th, 2024   | Initial release |\n\n## Future Roadmap\n**Website/API**\n- [X] File Validation and Sanitization on server side\n- [ ] Better PDF conversion engine to retain original formatting in higher quality\n- [X] Better Error Handling\n  \n**AWS Infrastructure**\n- [X] Actual implementation in production\n- [X] Conversion feature between multiple file types\n- [ ] Implementing image compression using methods such as Huffman Encoding\n\n----\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsourasishbasu%2Ffile-wizard","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsourasishbasu%2Ffile-wizard","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsourasishbasu%2Ffile-wizard/lists"}