https://github.com/yeisonmontoya1815/special-topics-in-data-analytics
In my PDD Data Analytics studies at Douglas College, the Special Topics course stands out as a crucial component. This specialized module delves into advanced aspects of data analysis beyond the core curriculum, offering a deep exploration of intricate domains. Through this focused study, I aim to enhance my proficiency in handling complex datasets
analytics data-science jupyter-notebook python structured-data unstructured-data
- Host: GitHub
- URL: https://github.com/yeisonmontoya1815/special-topics-in-data-analytics
- Owner: yeisonmontoya1815
- License: cc0-1.0
- Created: 2024-01-16T04:11:33.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2024-08-08T07:14:43.000Z (6 months ago)
- Last Synced: 2024-12-06T01:12:34.851Z (2 months ago)
- Topics: analytics, data-science, jupyter-notebook, python, structured-data, unstructured-data
- Language: Jupyter Notebook
- Homepage:
- Size: 15.2 MB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Special Topics in Data Analytics
This space shares examples from the seminars given in the Special Topics in Data Analytics course.
This repository is licensed under the MIT License. See the [LICENSE](./LICENSE) file for more information.
📌 **Find me!**
* [LinkedIn](https://www.linkedin.com/in/yeisonmontoya/)
* [GitHub](https://github.com/yeisonmontoya1815)
## Seminar 1
**Structured Data Mining:**
- Tracking Patterns
- Classification
- Association
- Outlier Detection
- Clustering
- Regression
- Prediction
- Decision Trees
- Neural Networks
- Time Series Analysis

**Random Forest Algorithm:** Random Forest is an ensemble learning algorithm that combines the strength of multiple decision trees to enhance the overall predictive accuracy and robustness of the model. It falls under the category of supervised machine learning and is utilized for both classification and regression tasks.
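
The ensemble idea described above can be sketched in plain Python. This is a minimal illustration, not the seminar's notebook code: it assumes numeric features, Gini impurity as the split criterion, bootstrap sampling, and a random feature subset at each split. The function names (`fit_forest`, `predict_forest`) and the toy dataset are hypothetical, chosen only for this example.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels, features):
    """Return the (feature, threshold) with the lowest weighted Gini, or None."""
    best, best_score = None, gini(labels)
    for f in features:
        values = sorted({r[f] for r in rows})
        # Candidate thresholds are midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if score < best_score:
                best, best_score = (f, t), score
    return best

def build_tree(rows, labels, rng, max_depth, n_features):
    """Grow one tree; leaves are class labels, inner nodes are 4-tuples."""
    majority = Counter(labels).most_common(1)[0][0]
    if max_depth == 0 or len(set(labels)) == 1:
        return majority
    # Each split only looks at a random subset of the features.
    features = rng.sample(range(len(rows[0])), n_features)
    split = best_split(rows, labels, features)
    if split is None:
        return majority
    f, t = split
    left = [i for i, r in enumerate(rows) if r[f] <= t]
    right = [i for i, r in enumerate(rows) if r[f] > t]
    return (
        f,
        t,
        build_tree([rows[i] for i in left], [labels[i] for i in left], rng, max_depth - 1, n_features),
        build_tree([rows[i] for i in right], [labels[i] for i in right], rng, max_depth - 1, n_features),
    )

def predict_tree(node, row):
    """Walk from the root to a leaf and return the leaf's label."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node

def fit_forest(rows, labels, n_trees=25, max_depth=3, seed=0):
    """Train n_trees trees, each on its own bootstrap sample of the data."""
    rng = random.Random(seed)
    n_features = max(1, int(len(rows[0]) ** 0.5))
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]  # sample with replacement
        forest.append(
            build_tree([rows[i] for i in idx], [labels[i] for i in idx], rng, max_depth, n_features)
        )
    return forest

def predict_forest(forest, row):
    """Classify a row by majority vote over all trees."""
    votes = Counter(predict_tree(tree, row) for tree in forest)
    return votes.most_common(1)[0][0]

# Hypothetical toy data: two well-separated groups of points.
X = [[1.0, 1.2], [0.8, 0.9], [1.3, 0.7], [0.9, 1.1],
     [3.0, 3.2], [3.3, 2.9], [2.8, 3.1], [3.1, 3.4]]
y = [0, 0, 0, 0, 1, 1, 1, 1]

forest = fit_forest(X, y)
print(predict_forest(forest, [1.0, 1.0]), predict_forest(forest, [3.0, 3.0]))
```

For regression, the same structure would average the trees' numeric predictions instead of taking a majority vote. In practice a library implementation is usually the better choice; the sketch only shows how the pieces fit together.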
## Seminar 2
**Unstructured Data Mining (web mining) - Hyperlink-Induced Topic Search (HITS)**, also known as the Hubs and Authorities algorithm, is a link analysis algorithm that assigns two scores to each page: Hub score and Authority score. It evaluates the importance of web pages based on the structure of the hyperlink graph.
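
The hub and authority scores described above can be computed with a simple iterative update. The following is a minimal sketch, assuming the link graph fits in memory as a dictionary and using a fixed number of iterations with L2 normalization. The function name `hits` and the four-page graph are hypothetical and serve only as an illustration.

```python
from math import sqrt

def hits(graph, iterations=50):
    """Return (hub, authority) score dicts for a directed link graph.

    graph maps each page to the list of pages it links to.
    """
    nodes = set(graph) | {dst for targets in graph.values() for dst in targets}
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority update: sum the hub scores of all pages linking to a page.
        auth = {n: 0.0 for n in nodes}
        for src, targets in graph.items():
            for dst in targets:
                auth[dst] += hub[src]
        norm = sqrt(sum(v * v for v in auth.values())) or 1.0
        auth = {n: v / norm for n, v in auth.items()}
        # Hub update: sum the authority scores of all pages a page links to.
        hub = {n: sum(auth[dst] for dst in graph.get(n, [])) for n in nodes}
        norm = sqrt(sum(v * v for v in hub.values())) or 1.0
        hub = {n: v / norm for n, v in hub.items()}
    return hub, auth

# Hypothetical link graph: A and B both link to C and D, and C links to D.
links = {"A": ["C", "D"], "B": ["C", "D"], "C": ["D"], "D": []}
hub_scores, auth_scores = hits(links)

for page in sorted(links):
    print(page, round(hub_scores[page], 3), round(auth_scores[page], 3))
```

In this graph, page D receives the most incoming links and ends up with the highest authority score, while A and B, which link to both well-referenced pages, share the highest hub score.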
## Seminar 3
Text mining, a dynamic field within natural language processing, involves nuanced analysis of textual data. Major areas include:
- **Social Media Evaluation:** Unraveling sentiments and trends.
- **Semantic Analysis:** Deciphering word meanings for context.
- **Chatbots:** Enhancing responsiveness through text mining.
- **Deep Learning for Text Analysis:** Empowering pattern discernment.
- **Machine Translation:** Facilitating language translation.
- **Text Generation:** Contributing to content creation.
- **Information Extraction/Sentiment Analysis:** Discerning emotions and extracting insights.

In these domains, text mining is a potent tool for extracting valuable insights from vast textual information.
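
As one concrete instance of the sentiment analysis area listed above, here is a minimal lexicon-based sketch. It assumes English text and a tiny hand-made word list. The lexicon, the function names, and the sample posts are all hypothetical; real projects typically rely on curated lexicons or trained models.

```python
import re
from collections import Counter

# Hypothetical mini-lexicon, used only for illustration.
POSITIVE = {"love", "great", "good", "happy", "excellent"}
NEGATIVE = {"hate", "bad", "terrible", "sad", "awful"}

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def sentiment(text):
    """Label text by counting positive words minus negative words."""
    tokens = tokenize(text)
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

# Made-up sample posts.
posts = [
    "I love this course, the seminars are great!",
    "The traffic today was terrible and I hate waiting.",
    "The meeting starts at noon.",
]

labels = [sentiment(post) for post in posts]
summary = Counter(labels)  # how many posts fall under each label
print(labels)
print(summary)
```

Counting words ignores negation and context ("not good" still counts as positive), which is one reason the deep learning approaches listed above are often preferred for production text analysis.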
## Seminar 4
**Big Data: Challenges and Strategies in Data Generation, Storage, and Retrieval:**
Big Data introduces a host of intricate challenges in terms of data generation, storage, and retrieval, as emphasized by Piety (2019). Addressing these challenges necessitates strategic approaches to unlock the potential inherent in extensive and varied datasets. The complexities involve efficiently handling diverse data types, particularly with the surge of information from sources like IoT and social media (Laney 2001).
Furthermore, the substantial volume of data requires scalable storage solutions that often surpass the capabilities of traditional databases. Quick and seamless access to relevant information is paramount for optimal functionality, necessitating systems tailored to retrieve data from repositories without compromising performance. In light of these challenges, this document aims to delineate key issues and propose strategies for data generation, storage, and retrieval.
It also advocates for the adoption of DynamoDB, a managed NoSQL database service from AWS, for managing large datasets at an affordable cost.
## Seminar 5
**BIG DATA VELOCITY & FOUNDATIONS AND TRENDS**
Big data is an emerging technology that can benefit businesses. However, it is important to address the challenges associated with its adoption (Katal et al., 2013). Because data velocity keeps accelerating, new analytical tools are being taught in business analytics (BA).
## Seminar 6
**Big Data: Variety and veracity applications**
Facebook users can freely express their feelings on topics ranging from politics to the environment. This makes Facebook a suitable platform for sentiment analysis using machine learning and social network algorithms implemented in Python. Performing sentiment analysis on Facebook posts helps identify trends among users and manage large amounts of information. For high volumes of data, batch processing is a sensible implementation choice.
## Seminar 7
**Data Security**
The rise of big data has brought about numerous possibilities and obstacles in managing information, particularly in terms of safeguarding security and privacy. This study explores the intricacies of securing vast amounts of data and tackling significant challenges such as breaches, exposure, and compliance matters. It highlights the technologies and regulations that can help reduce risks and focuses on recent incidents, underscoring the pressing need for strong security measures. The report's analysis contributes to better understanding and resolving security and privacy concerns in the age of big data.
## Seminar 8
**Ethics of Big Data**
The proliferation of big data in today's digital landscape has revolutionized industries, driving innovation and transforming the way businesses operate. However, this rapid expansion of data collection and analysis has also raised significant ethical concerns and challenges. It is therefore imperative to examine the ethical implications of big data practices and explore solutions for maintaining ethical integrity.
This journey delves into the ethical issues faced by the big data industry, highlights recent challenges encountered by companies handling vast datasets, and evaluates measures taken to address these challenges. Using Facebook as a case study, we analyze a prominent ethical dilemma and assess the effectiveness of the company's responses in mitigating ethical violations. By examining these issues and solutions, we gain insights into the complex interplay between technology, ethics, and data governance in the era of big data.