{"id":22439021,"url":"https://github.com/aalkiyumi/project-3-docker-container-for-data-processing-script","last_synced_at":"2026-05-17T15:39:41.309Z","repository":{"id":261514559,"uuid":"874982113","full_name":"AAlkiyumi/Project-3-Docker-Container-for-Data-Processing-Script","owner":"AAlkiyumi","description":"This Dockerized Python application analyzes two text files (IF.txt and AlwaysRememberUsThisWay.txt). It counts total words, identifies the largest file, and finds the top three most frequent words in each. Results are saved to an output file and printed to the console.","archived":false,"fork":false,"pushed_at":"2024-11-12T21:44:37.000Z","size":1966,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-03-27T09:24:34.125Z","etag":null,"topics":["cs5165","data-analysis","data-engineering","data-science","docker","introduction-to-cloud-computing","statistical-analysis","text-processing","uc","uc2026","university-of-cincinnati"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/AAlkiyumi.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-10-18T20:39:33.000Z","updated_at":"2024-11-12T21:46:39.000Z","dependencies_parsed_at":"2024-11-07T00:37:00.338Z","dependency_job_id":"c5fea8c9-e153-4f5d-8c3e-3677829ba8d6","html_url":"https://github.com/AAlkiyumi/Project-3-Docker-Container-for-Data-Processing-Script","commit_stats":null,"previous_names":["aalkiyumi/project_3_docker"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/AAlkiyumi/Project-3-Docker-Container-for-Data-Processing-Script","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AAlkiyumi%2FProject-3-Docker-Container-for-Data-Processing-Script","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AAlkiyumi%2FProject-3-Docker-Container-for-Data-Processing-Script/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AAlkiyumi%2FProject-3-Docker-Container-for-Data-Processing-Script/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AAlkiyumi%2FProject-3-Docker-Container-for-Data-Processing-Script/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/AAlkiyumi","download_url":"https://codeload.github.com/AAlkiyumi/Project-3-Docker-Container-for-Data-Processing-Script/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AAlkiyumi%2FProject-3-Docker-Container-for-Data-Processing-Script/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":278512191,"owners_count":25999255,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-05T02:00:06.059Z","response_time":54,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cs5165","data-analysis","data-engineering","data-science","docker","introduction-to-cloud-computing","statistical-analysis","text-processing","uc","uc2026","university-of-cincinnati"],"created_at":"2024-12-06T01:12:28.945Z","updated_at":"2025-10-05T20:24:17.759Z","avatar_url":"https://github.com/AAlkiyumi.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Project 3: Docker Container for Data Processing Script\n\n## Project Overview\nThis project involves building and deploying a Docker container that automates text file processing through a Python script. The container will count words, handle contractions, find frequent words, and display results along with the machine’s IP address. Screenshots of Docker Desktop and a tar file of the final container image are required for submission.\n\n## Requirements\n\n### Part 1: Docker Installation\n1. Install Docker Desktop on your personal computer (Windows, macOS, or Linux).\n2. Submit a screenshot of Docker Desktop showing your containers running.\n\n### Part 2: Dockerfile Setup\n1. Create a `Dockerfile` using a lightweight base image (e.g., `ubuntu`, `alpine`, or `python:3.9-slim`).\n2. Submit the `Dockerfile` as a text file or share it on GitHub.\n\n### Part 3: Script Development\n1. Write a Python script (`scripts.py`) to read and process two text files, `IF.txt` and `AlwaysRememberUsThisWay.txt`, from `/home/data` inside the container.\n2. Submit the script as a text file or on GitHub.\n\n### Part 4: Script Objectives\nThe Python script should accomplish the following:\n\n- **Word Count in Each File**: Count the total number of words in each file.\n- **Grand Total Word Count**: Sum the word counts from both files.\n- **Top 3 Frequent Words in IF.txt**: Find and display the three most frequent words in `IF.txt` with counts.\n- **Top 3 Frequent Words in AlwaysRememberUsThisWay.txt**: Handle contractions (e.g., \"I'm\") by splitting, then display the three most frequent words with counts.\n- **IP Address Retrieval**: Display the IP address of the machine running the container.\n- **Output Results**: Write all results to `/home/data/output/result.txt` and print the contents to the console upon container execution.\n\n### Part 5: Optimize Docker Image\n- Minimize the Docker image size (target size: less than 200MB).\n\n### Part 6: Submit Final Image\n1. Create a tar file of your final Docker image, named with your email username (e.g., `yourusername.tar`).\n2. Submit the tar file for evaluation.\n\n---\n\n## Extra Credit\n- **Container Orchestration with Kubernetes or Docker Swarm**:\n  - Deploy and manage at least two replicas of your container using Kubernetes or Docker Swarm.\n  - Submit your Kubernetes manifest (YAML file) or Docker Swarm configuration.\n  - Provide the output of `kubectl get pods \u003e kube_output.txt; cat kube_output.txt` or an equivalent command for Docker Swarm.\n  \n## Key Points to Remember\n- Use a lightweight base image in the Dockerfile to minimize the final image size.\n- Ensure the script can handle contractions and edge cases.\n- The container should be fully automated, executing, generating output, and exiting without manual interaction.\n- Test the container to ensure it runs correctly on any machine.\n- Confirm that all outputs (word counts, IP address, etc.) are written to `result.txt` and printed to the console when the container runs.\n\n---\n\n## Application Setup Screenshots\n\nBelow are the key screenshots documenting each step of the application setup on AWS EC2.\n\n![Image 1](Images/image1.png)\n![Image 2](Images/image2.png)\n![Image 3](Images/image3.png)\n![Image 4](Images/image4.png)\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Faalkiyumi%2Fproject-3-docker-container-for-data-processing-script","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Faalkiyumi%2Fproject-3-docker-container-for-data-processing-script","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Faalkiyumi%2Fproject-3-docker-container-for-data-processing-script/lists"}