# CS673-Scalable-Databases
Repository for storing code for my MS in Data Science course CS673 Scalable Databases at Pace University.

**Course description:** After reviewing relational databases and SQL, students will learn the fundamentals of alternative data storage schemas to deal with large amounts of data (structured and unstructured). The course covers big data and the development of the Hadoop file system, the MapReduce programming paradigm, and database management systems such as Cassandra, HBase, and Neo4j. Students will learn about NoSQL, distributed databases, and graph databases. The course emphasizes the differences between traditional database management systems and alternatives with respect to accessibility, cost, transaction speed, and structure. Part of the course is dedicated to accessing, handling, and processing data from different sources and of different types using Python. The course provides hands-on practice.

## Project 1

In this [project](https://github.com/awesomecosmos/CS673-Scalable-Databases/tree/main/Project%201), I analyzed a dataset of my choosing using [SQL](https://github.com/awesomecosmos/CS673-Scalable-Databases/blob/main/Project%201/table_and_query_creation.sql). Specifically, I created a local database and tables for the [Data Scientist Salaries 2023](https://www.kaggle.com/datasets/arnabchaki/data-science-salaries-2023) dataset and wrote queries to explore it.
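
As a rough illustration of the create-table-then-query workflow described above, here is a minimal sketch using Python's built-in `sqlite3` module. It is not the project's actual SQL: the table name `salaries`, the column names, and the sample rows are all assumptions made up for this example.

```python
import sqlite3

# In-memory database so the sketch leaves nothing on disk.
conn = sqlite3.connect(":memory:")

# Hypothetical table; the column names are assumptions, not the project's schema.
conn.execute(
    """
    CREATE TABLE salaries (
        job_title        TEXT    NOT NULL,
        experience_level TEXT    NOT NULL,
        salary_in_usd    INTEGER NOT NULL
    )
    """
)

# Invented placeholder rows, not values from the Kaggle dataset.
rows = [
    ("Data Scientist", "SE", 150000),
    ("Data Scientist", "MI", 110000),
    ("Data Engineer", "SE", 140000),
    ("Data Analyst", "EN", 70000),
]
conn.executemany("INSERT INTO salaries VALUES (?, ?, ?)", rows)

# Exploratory query: average salary per job title, highest first.
query = """
    SELECT job_title, AVG(salary_in_usd) AS avg_salary
    FROM salaries
    GROUP BY job_title
    ORDER BY avg_salary DESC
"""
result = conn.execute(query).fetchall()
for job_title, avg_salary in result:
    print(f"{job_title}: {avg_salary:.0f}")

conn.close()
```

The same `CREATE TABLE` and `SELECT ... GROUP BY` statements should carry over to most SQL engines with little change; SQLite is used here only because it ships with Python.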

## Project 2

In this [project](https://github.com/awesomecosmos/CS673-Scalable-Databases/blob/main/Project%202/VermaAPythonAssignment1.ipynb), I completed an assignment on basic Python commands.

## Project 3

In this [project](https://github.com/awesomecosmos/CS673-Scalable-Databases/blob/main/Project%203/CS673%20Assignment%203.scala), I completed a set of tasks using Spark SQL. My results are also available on [Databricks Community](https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/7431188062807507/2622594062738597/3254322822581140/latest.html).

## Project 4

In this [project](https://github.com/awesomecosmos/CS673-Scalable-Databases/blob/main/Project%204/queries.txt), I completed some tasks and wrote queries using HBase/Cassandra.

## Midterm Project
In this [project](https://github.com/awesomecosmos/CS673-Scalable-Databases/blob/main/Midterm%20Project/cs673_midterm_project.ipynb), my [partner](https://github.com/woodskd24) and I analyzed the [Data Scientist Salaries 2023](https://www.kaggle.com/datasets/arnabchaki/data-science-salaries-2023) dataset. We performed exploratory data analysis (EDA), data cleaning, and data wrangling to answer targeted questions and extract insights from the dataset. Our recorded presentation on this project is available at https://youtu.be/z1-39Pkm-2E
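
To give a flavor of the clean-then-summarize steps mentioned above, here is a small sketch that uses only the Python standard library. It is not taken from the notebook, which may well use other tools; the column names and the sample rows are assumptions invented for this example.

```python
import csv
import io
from collections import defaultdict
from statistics import median

# Invented placeholder rows in CSV form; a real run would open the
# downloaded dataset file instead. Column names are assumptions.
raw = io.StringIO(
    "experience_level,salary_in_usd\n"
    "SE,150000\n"
    "SE,\n"
    "MI,110000\n"
    "MI,90000\n"
    "EN,70000\n"
)

# Cleaning step: skip rows with a missing salary and cast the rest to int.
salaries_by_level = defaultdict(list)
skipped = 0
for row in csv.DictReader(raw):
    if not row["salary_in_usd"].strip():
        skipped += 1
        continue
    salaries_by_level[row["experience_level"]].append(int(row["salary_in_usd"]))

# Summary step: median salary per experience level.
medians = {level: median(values) for level, values in salaries_by_level.items()}
print(medians, "skipped:", skipped)
```

The median is used instead of the mean because salary data is often skewed by a few very high values; that choice is a design assumption of this sketch.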

## Final Project
In this [project](https://github.com/awesomecosmos/CS673-Scalable-Databases/tree/main/Final%20Project), I wrote Neo4j queries to create a graph network and analyze its connections. A recorded tutorial is available at https://youtu.be/_NO4wwGkpRo, and the project has its own GitHub repo [here](https://github.com/awesomecosmos/Exploring-Our-Connections).
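
For readers new to graph analysis, the idea of "analyzing connections" can be sketched in plain Python. This is an in-memory analogue, not Neo4j and not the project's Cypher queries: the names, the edge list, and the `shortest_path` helper are all hypothetical and exist only for this example.

```python
from collections import defaultdict, deque

# Invented placeholder relationships; the names are hypothetical.
edges = [("Ana", "Ben"), ("Ben", "Cy"), ("Cy", "Dee"), ("Ana", "Eli")]

# Build an undirected adjacency map: node -> set of neighbors.
graph = defaultdict(set)
for a, b in edges:
    graph[a].add(b)
    graph[b].add(a)


def shortest_path(graph, start, goal):
    """Breadth-first search; returns the list of nodes on a shortest path, or None."""
    if start == goal:
        return [start]
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        # sorted() only makes the traversal order deterministic.
        for neighbor in sorted(graph.get(path[-1], ())):
            if neighbor in seen:
                continue
            if neighbor == goal:
                return path + [neighbor]
            seen.add(neighbor)
            queue.append(path + [neighbor])
    return None


# Number of direct connections per node.
degree = {node: len(neighbors) for node, neighbors in graph.items()}
path = shortest_path(graph, "Ana", "Dee")
print(degree)
print(" -> ".join(path))
```

A graph database answers the same kinds of questions (who is connected to whom, and through how many hops) with declarative queries instead of hand-written traversal code.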