{"id":31110646,"url":"https://github.com/statusneo/observability-as-code","last_synced_at":"2026-04-11T03:03:48.653Z","repository":{"id":56147584,"uuid":"259029129","full_name":"StatusNeo/Observability-As-Code","owner":"StatusNeo","description":"Real Time Twitter Mining for StatusNeo Official Account","archived":false,"fork":false,"pushed_at":"2020-11-24T09:06:18.000Z","size":20717,"stargazers_count":3,"open_issues_count":0,"forks_count":1,"subscribers_count":0,"default_branch":"master","last_synced_at":"2025-09-17T07:52:35.422Z","etag":null,"topics":["ansible","automation","datadog","devops","docker","github","github-packages","githubactions","hadoop","hashicorp","java","maven","packer","python","python3","terraform","twitter"],"latest_commit_sha":null,"homepage":"https://twitter.com/StatusNeo2","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/StatusNeo.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2020-04-26T12:49:26.000Z","updated_at":"2024-10-04T20:45:00.000Z","dependencies_parsed_at":"2022-08-15T13:31:36.325Z","dependency_job_id":null,"html_url":"https://github.com/StatusNeo/Observability-As-Code","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/StatusNeo/Observability-As-Code","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/StatusNeo%2FObservability-As-Code","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/StatusNeo%2FObservability-As-Code/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/StatusNeo%2FObservability-As-Code/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/StatusNeo%2FObservability-As-Code/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/StatusNeo","download_url":"https://codeload.github.com/StatusNeo/Observability-As-Code/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/StatusNeo%2FObservability-As-Code/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31667034,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-10T17:19:37.612Z","status":"online","status_checked_at":"2026-04-11T02:00:05.776Z","response_time":54,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ansible","automation","datadog","devops","docker","github","github-packages","githubactions","hadoop","hashicorp","java","maven","packer","python","python3","terraform","twitter"],"created_at":"2025-09-17T07:52:26.611Z","updated_at":"2026-04-11T03:03:48.630Z","avatar_url":"https://github.com/StatusNeo.png","language":"Python","readme":"# DevOps pipeline for Real-Time Social/Web Mining\n\n## Workflow\n\n![Workflow](img/Workflow.png)\n\n## Technology Stack\n\n* **Git:** Version Control\n\n* **GitHub:** Distributed Development and SCM\n\n* **Python:** Tweepy and Pandas libraries for Data Mining using the Twitter API, and the Matplotlib library for Data Visualization\n\n* **Java:** Big Data cleaning and stripping workflow using MapReduce\n\n* **Apache Maven:** Build Automation Tool for Java\n\n* **GitHub Actions:** Continuous Integration tool that runs the Apache Maven build whenever Java source code is pushed.\n\n* **Hadoop:** HDFS cluster setup for Big Data Analytics.\n\n* **Likert Scaling:** Data Classification into a 5-class model.\n\n* **Python:** Sentiment Analysis programming\n\n* **Docker:** Cross-platform package image pushed to DockerHub.\n\n* **DataDog:** Monitoring tool for our Docker package.\n\n* **Docker-Compose:** Integrating the Docker image of StatusNeo Twitter Mining with the DataDog Agent\n\n* **HashiCorp Packer:** Creating cross-platform deployable images\n\n* **HashiCorp Terraform:** Infrastructure as Code\n\n* **Ansible:** Configuration Management and Automated Provisioning\n\n## Important Source Files and Dependencies\n\n1. [pom.xml](pom.xml) - Apache Maven setup\n\n2. [HelloWorld.java](src/main/java/pkg/HelloWorld.java) - Basic Java project setup\n\n3. [maven.yml](.github/workflows/maven.yml) - GitHub Actions setup\n\n4. [Crawl.py](src/crawler/Crawl.py) - Web crawler in Python that extracts Twitter data for specific hashtags.\n\n5. [info.csv](src/crawler/info.csv) - Data file produced by the crawler, to be sent to the HDFS core for processing\n\n6. MapReduce functionality in Java\n\n* [Map Function](src/main/java/pkg/Map.java)\n\n* [Reduce Function](src/main/java/pkg/Reduce.java)\n\n* [Main Java Code](src/main/java/pkg/WordCountDriver.java)\n\n7. [Sentiment Analysis in Python](src/sentimental_analysis)\n\n* Convolutional Neural Networks\n* Decision Tree\n* SVM\n* Pre-Processing\n* Random Forests\n* Naive Bayes\n* XGBoost\n\n8. [matplotlib.py](src/visualization/matplotlib.py) - Data Visualization using Matplotlib in Python\n\n9. Hadoop Setup\n\n* [Hadoop Core Setup](hdfs_setup/core-site.xml)\n* [HDFS Setup](hdfs_setup/hdfs-site.xml)\n* [MapReduce in Task Tracker](hdfs_setup/TaskTracker_MapReduce.xml)\n* [MapReduce in Job Tracker](hdfs_setup/JobTracker_MapReduce.xml)\n\n10. [Dockerfile](Dockerfile)\n\n* [install.sh](install.sh) to provision the Docker image locally before pushing it to [DockerHub](https://hub.docker.com/r/shreyasingh18/statusneo).\n\n11. [Automation.sh](Automation.sh) - Run locally on a Linux-based machine.\n\n12. [docker-compose.yml](docker-compose.yml) for DataDog x Docker integration.\n\n13. [Ansible Playbook](ansible/playbook.yml)\n\n14. [Packer Image Builder](packer/template.json)\n\n15. [Infrastructure as Code using Terraform](terraform)\n\n## Backlog\n\n- [x] Setting up Apache Maven for the Java project - User Interface and MapReduce functions\n\n- [x] Setting up the GitHub repository workflow\n\n- [x] Setting up GitHub Actions for automation\n\n- [x] Creating a web crawler in Python using the Tweepy library to fetch data based on specified parameters.\n\n- [ ] Create a User Interface\n\n- [x] Create an HDFS cluster for MapReduce functionality and program Hadoop MapReduce in Java\n\n- [x] Set up Hadoop Core and create a Job Tracker and Task Trackers for the project\n\n- [x] Implement MapReduce in HDFS using Java to count the frequency of significant words from the data dictionary in tweet text\n\n- [x] Configure Apache Maven with the MapReduce code and install the Apache Hadoop JAR dependency\n\n- [x] Configure the MapReduce code in GitHub Actions for automation\n\n- [x] Automate the Big Data pipeline up to MapReduce using GitHub Actions\n\n- [ ] Use data ingestion tools like Flume to send data from the crawler to HDFS in real time\n\n- [x] Write a program in Java that runs MapReduce on the JSON file extracted by the crawler to find the frequency of significant words - Textual Analysis\n\n- [ ] Data Classification - create a multi-class data dictionary for sentiment analysis - currently for words (in future, we might extend it to phrases and sentences for improved accuracy)\n\n- [x] Data Prediction - Using the KNN algorithm in Python to find the relation between tweets and their sentiments.\n\n- [x] Data Visualization - Using the Python **matplotlib** library to implement visualization.\n\n## How to Contribute\n\nThis is an open-source project, open to everyone.\n\nFollow these contribution [guidelines](CONTRIBUTING.md).\n\n## License\n\nMIT [License](LICENSE), copyright StatusNeo, forked from Storms in Brewing (2019-2020)\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fstatusneo%2Fobservability-as-code","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fstatusneo%2Fobservability-as-code","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fstatusneo%2Fobservability-as-code/lists"}