https://github.com/lucashomuniz/project-13
Validating a Machine Learning Model for Cryptocurrency Price Forecasting with PySpark
- Host: GitHub
- URL: https://github.com/lucashomuniz/project-13
- Owner: lucashomuniz
- License: apache-2.0
- Created: 2023-07-12T14:06:44.000Z
- Default Branch: main
- Last Pushed: 2025-01-28T14:04:33.000Z
- Last Synced: 2025-01-28T15:21:59.917Z
- Topics: analytics, apache-spark, apache-spark-framework, bitcoin-price, criptocurrency, data-analysis, machine-learning-algorithms, pyspark, pyspark-python, python-language, realtime-database
- Language: Python
- Homepage:
- Size: 69.3 KB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# ✅ PROJECT-13
In this project, **developed for Green Flow**, a liquid fertilizer manufacturer, the objective is to analyze financial sales data from **2023 and 2024** alongside specific **product characteristics** to deliver strategic insights for commercial decision-making. The datasets include details such as **fertilizer identification**, **chemical types**, **usage recommendations**, **average prices per gallon**, **monthly revenue**, **sustainability**, and **sales status**.
The project focuses on assessing data consistency and creating a **visualization dashboard** to showcase historical sales performance (volume and revenue) and the correlation between product characteristics and commercial results for both **active** and **discontinued products**. Built using **Microsoft Power BI**, the dashboard adheres to **best practices in data visualization**, highlighting key insights.
Data transformations are executed directly at the source using **SQL in BigQuery** within the **Google Cloud Platform (GCP)**. This approach enables a **real-time data pipeline**, directly connecting Power BI to GCP, eliminating the need for manual data loads. The final deliverable is a robust analytical solution aligned with Green Flow's commercial goals, emphasizing the dashboard's functionality and the primary business insights derived from the analysis.
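
To make the idea of transforming data at the source more concrete, here is a minimal sketch that uses Python's built-in `sqlite3` module as a local stand-in for BigQuery. The table `monthly_sales`, the view `yearly_revenue`, their columns, and the sample rows are all hypothetical, since this README does not show the project's actual schema or queries, and BigQuery's SQL dialect differs from SQLite's in places. The point is only the pattern: the aggregation lives in a view, so a BI tool connected to the database reads data that is already shaped.

```python
import sqlite3

# Hypothetical rows: (fertilizer_id, year, month, revenue).
# The real BigQuery tables are not described in this README.
ROWS = [
    ("F001", 2023, 1, 1200.0),
    ("F001", 2023, 2, 1350.0),
    ("F001", 2024, 1, 1500.0),
    ("F002", 2023, 1, 800.0),
    ("F002", 2024, 1, 950.0),
    ("F002", 2024, 2, 1000.0),
]


def yearly_revenue(rows):
    """Aggregate monthly revenue to one row per product and year, in SQL."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE monthly_sales ("
            "fertilizer_id TEXT, year INTEGER, month INTEGER, revenue REAL)"
        )
        conn.executemany("INSERT INTO monthly_sales VALUES (?, ?, ?, ?)", rows)
        # The view keeps the transformation inside the database, so any
        # client that queries it receives already-aggregated data.
        conn.execute(
            "CREATE VIEW yearly_revenue AS "
            "SELECT fertilizer_id, year, SUM(revenue) AS total_revenue "
            "FROM monthly_sales GROUP BY fertilizer_id, year"
        )
        cursor = conn.execute(
            "SELECT fertilizer_id, year, total_revenue "
            "FROM yearly_revenue ORDER BY fertilizer_id, year"
        )
        return cursor.fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in yearly_revenue(ROWS):
        print(row)
```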
**Keywords**: PowerBI, PowerQuery, DAX, Google Cloud Platform, Business Analytics, BigQuery, Data Visualization, Data Analysis.
# ✅ PROCESS
The analysis of the **ID** variable revealed significant variation in product performance, with fertilizers at **lower average prices** emerging as top performers in both **sales volume** and **revenue**. These findings emphasize the value of **promotional campaigns** targeting low-cost products to attract consumers with **lower purchasing power**. The comparison of **2024 Average Price** and **Revenue** suggests an opportunity to further boost revenue by focusing on this segment. The strong consumer preference for **affordable products** supports a **mass-market strategy**, which should be expanded. Sales volume data reinforce this trend: economical products dominate, which opens opportunities to **optimize margins** and drive additional revenue growth.
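
One way to check a price-versus-revenue relationship like this numerically is a Pearson correlation coefficient. The sketch below is an illustration only: the function `pearson` and the two sample lists are assumptions, the figures are made up, and nothing here reproduces Green Flow's data or the measures actually built in the dashboard.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Made-up figures for illustration only (not Green Flow's data):
# one entry per product, average price per gallon and yearly revenue.
avg_price_2024 = [10.0, 20.0, 30.0, 40.0]
revenue_2024 = [400.0, 300.0, 200.0, 100.0]

if __name__ == "__main__":
    r = pearson(avg_price_2024, revenue_2024)
    print(f"price vs. revenue correlation: {r:.2f}")
```

A coefficient close to -1 would be consistent with cheaper products earning more revenue, while a value near 0 would suggest that price alone does not explain the pattern.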

When segmenting the analysis by **TYPE**, **organic fertilizers** led in **sales volume**, while **mineral fertilizers** were the most **profitable**, indicating higher **margins** and greater **perceived market value**. In contrast, **synthetic fertilizers** showed lower market participation, aligning with the company's focus on **preservation** and **sustainability**. This pattern reflects a strategic positioning favoring fertilizers linked to **responsible agricultural practices**. The combination of high **sales volume** in organics and higher **revenues** in minerals suggests an opportunity to explore **hybrid strategies** that cater to diverse market segments, enabling the company to expand its reach while maintaining its **sustainable approach**.
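
The segmentation described above amounts to a group-by over fertilizer type. The sketch below shows one way to compute it in plain Python; the `SALES` records and the function `summarize_by_type` are hypothetical. Note also that the datasets listed in this README do not include cost, so the sketch uses revenue per unit as a rough proxy and does not compute margin.

```python
from collections import defaultdict

# Hypothetical records: (fertilizer type, units sold, revenue).
SALES = [
    ("organic", 500, 5000.0),
    ("organic", 300, 3000.0),
    ("mineral", 200, 6000.0),
    ("synthetic", 100, 1500.0),
]


def summarize_by_type(records):
    """Return {type: (total units, total revenue, revenue per unit)}."""
    totals = defaultdict(lambda: [0, 0.0])
    for kind, units, revenue in records:
        totals[kind][0] += units
        totals[kind][1] += revenue
    return {
        kind: (units, revenue, revenue / units if units else 0.0)
        for kind, (units, revenue) in totals.items()
    }


if __name__ == "__main__":
    summary = summarize_by_type(SALES)
    top_volume = max(summary, key=lambda k: summary[k][0])
    top_unit_revenue = max(summary, key=lambda k: summary[k][2])
    print("highest volume:", top_volume)
    print("highest revenue per unit:", top_unit_revenue)
```

Keeping volume and revenue per unit side by side makes it easy to see when the type that sells the most is not the one that earns the most per unit sold.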

The analysis of the **USAGE INDICATION** variable revealed that fertilizers designed for **fruit** and **grain crops** dominate the portfolio in both **sales volume** and **revenue**, establishing them as core pillars of the business. Fertilizers for **vegetables**, while secondary, offer strategic growth potential in specific niches. The **mineral** and **organic types**, which consistently lead in revenue and sales volume, are also the ones most closely tied to the company's commitment to **environmental sustainability**. The underperformance of **synthetic fertilizers** further underscores Green Flow's focus on **sustainable agricultural practices**. These findings suggest prioritizing campaigns tailored to the demands of **fruit** and **grain crops**, while strategically exploring the **vegetable market** to expand the consumer base and capture new opportunities.

The analysis of **SALES STATUS** revealed that **active products** contribute **86% of total revenue**, while **discontinued products**, representing only **8% of the portfolio**, maintain financial significance with **higher average prices**, reflecting a **premium positioning**. These discontinued fertilizers rank among the top three in average price, suggesting their removal aligns with a strategy to prioritize **low-cost products** for broader market reach. This reinforces the company's shift toward **affordable, high-demand fertilizers** to cater to a wider audience. Implementing regular **product lifecycle analysis** could help prevent the premature discontinuation of items with untapped market potential.
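
A revenue share like the one quoted above is each status's revenue divided by the total. The sketch below shows that arithmetic; the function `revenue_share` and the totals in `REVENUE` are hypothetical placeholders, chosen for easy checking, and are not the figures behind the percentages reported here.

```python
def revenue_share(revenue_by_status):
    """Return each status's share of total revenue, as a percentage."""
    total = sum(revenue_by_status.values())
    if total <= 0:
        raise ValueError("total revenue must be positive")
    return {
        status: 100.0 * revenue / total
        for status, revenue in revenue_by_status.items()
    }


# Made-up totals for illustration only (not Green Flow's data).
REVENUE = {"active": 750.0, "discontinued": 250.0}

if __name__ == "__main__":
    for status, share in revenue_share(REVENUE).items():
        print(f"{status}: {share:.1f}% of revenue")
```

Comparing this share with each status's share of the product count shows whether a small group of products carries a disproportionate part of revenue.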

The **SUSTAINABILITY** variable revealed that fertilizers classified as **"High Sustainability"** led in **sales volume**, while **"Medium Sustainability"** products generated the highest **revenue**, balancing **competitive pricing** with strong market acceptance. These results highlight not only a business strategy but also an internal culture committed to **environmentally responsible practices**. This analysis aligns with other variables, such as **type** and **usage**, confirming that **sustainability** is a core value for Green Flow. The company is well-positioned to attract **environmentally conscious consumers**, leveraging the **environmental benefits** of its products as a competitive differentiator.

# ✅ CONCLUSION
The analysis revealed that fertilizers with **lower average prices** lead in both **sales volume** and **revenue**, confirming the effectiveness of a **low-cost product strategy**. This trend was complemented by insights into **chemical types**, where **organic fertilizers** led in sales volume, while **mineral fertilizers** proved more profitable, emphasizing the need for **segmented strategies** to address diverse markets. The analyses of **Usage Indication** and **Sales Status** showed that products for **fruits** and **grains** dominate the portfolio, while **discontinued fertilizers**, despite their limited presence, retain financial relevance. These findings highlight market opportunities in specific niches, such as **vegetables**, and underscore the importance of revisiting **product lifecycles** to optimize the portfolio and unlock growth potential.
The **Sustainability** analysis reaffirmed the company's commitment to **environmentally responsible practices**, showing that **high-sustainability fertilizers** lead in sales volume, while **medium-sustainability products** balance **revenue** and **market acceptance**. These results consolidate Green Flow's position as a leader in **sustainable agricultural solutions**, demonstrating that the synergy between **accessibility**, **sustainability**, and **innovation** can continue to drive success. This project highlighted strategic patterns and opportunities, providing a solid foundation for **future commercial decisions** aligned with Green Flow's goals of **growth** and **sustainable innovation**.