
https://github.com/lucacanali/miscellaneous

Includes notes on using Apache Spark in general and for physics, how to run TPC-DS on PySpark, how to create histograms with Spark, tools for performance testing CPUs, Jupyter notebook examples for Spark, and examples for Oracle and other DB systems.
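A histogram can be computed as a plain SQL aggregation: assign each value a bucket index, then count rows per bucket. The sketch below uses Python's built-in `sqlite3` as a stand-in for Spark SQL so it runs without a cluster. The helper `sql_histogram`, the table `measurements`, and the column `x` are hypothetical. The equal-width bucketing formula is a generic scheme and is not necessarily the one used in the repository's notes.

```python
import sqlite3


def sql_histogram(conn, table, column, min_val, max_val, num_bins):
    """Equal-width histogram computed in SQL: bucket index per row, then GROUP BY.

    Hypothetical helper; sqlite3 stands in for Spark SQL here.
    Values outside [min_val, max_val] are dropped; x == max_val goes in the last bin.
    Table and column names are interpolated directly, so they must be trusted input.
    """
    step = (max_val - min_val) / num_bins
    query = f"""
        SELECT MIN(CAST(({column} - ?) / ? AS INTEGER), ? - 1) AS bucket,
               COUNT(*) AS cnt
        FROM {table}
        WHERE {column} BETWEEN ? AND ?
        GROUP BY bucket
        ORDER BY bucket
    """
    counts = [0] * num_bins  # empty buckets produce no rows, so pre-fill with 0
    params = (min_val, step, num_bins, min_val, max_val)
    for bucket, cnt in conn.execute(query, params):
        counts[bucket] = cnt
    edges = [min_val + i * step for i in range(num_bins + 1)]
    return edges, counts


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE measurements (x REAL)")
    conn.executemany(
        "INSERT INTO measurements VALUES (?)",
        [(v,) for v in [0.5, 1.5, 1.7, 2.2, 3.9, 4.0]],
    )
    edges, counts = sql_histogram(conn, "measurements", "x", 0.0, 4.0, 4)
    print(counts)  # prints [1, 2, 1, 2]
```

The aggregation runs where the data lives, so only `num_bins` rows come back to the client. This is the main reason to compute histograms in SQL on large datasets. Recent Spark versions also provide a built-in `width_bucket` SQL function that computes the bucket index directly.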

# Miscellaneous projects and scripts.
Author and contact: [email protected]

## Spark and Performance Engineering

| Folder | Description
|------------------------------------------------------------------| -------------------------------------------------------------------------------------
| [**Spark Dashboard**](Spark_Dashboard) | A tool for Apache Spark monitoring, used to build a performance dashboard and troubleshoot Spark jobs.
| [**Spark Notes**](Spark_Notes) | Miscellaneous tips and code snippets about Apache Spark.
| [**Spark for Physics**](Spark_Physics) | Examples, with code and data of how Apache Spark can be used in the domain of High Energy Physics data analysis.
| [**Performance Testing**](Performance_Testing) | Code and examples, including:<br>- A tool to run TPC-DS at scale with PySpark and collect execution metrics<br>- Tools for load-testing CPUs, written in Python and Rust<br>- Notes on how to use tooling for performance measurements
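A CPU load test of the kind listed above can be sketched in a few steps. Start N worker processes, have each one run the same fixed CPU-bound loop, and time the whole run. Comparing elapsed times as N grows shows how throughput scales with the available cores. This is a minimal sketch and not the repository's tool. The names `burn_cpu` and `run_load_test`, and the default iteration count, are hypothetical.

```python
import time
from multiprocessing import Pool


def burn_cpu(num_iterations):
    """CPU-bound work unit: a tight arithmetic loop with no I/O."""
    acc = 0
    for i in range(num_iterations):
        acc += i * i % 7
    return acc


def run_load_test(num_workers, num_iterations=2_000_000):
    """Run one work unit per worker process in parallel.

    Returns (elapsed wall-clock seconds, list of per-worker results).
    """
    start = time.perf_counter()
    with Pool(processes=num_workers) as pool:
        results = pool.map(burn_cpu, [num_iterations] * num_workers)
    elapsed = time.perf_counter() - start
    return elapsed, results


if __name__ == "__main__":
    for workers in (1, 2, 4):
        elapsed, _ = run_load_test(workers)
        print(f"{workers} worker(s): {elapsed:.2f} s")
```

The sketch uses processes rather than threads because CPython's global interpreter lock stops threads from executing Python bytecode in parallel. If elapsed time stays roughly flat as workers are added, the cores are scaling well. A rising time suggests contention or too few physical cores.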

## Data Engineering and Data Science

| Folder | Description
|------------------------------------------------------------------| -------------------------------------------------------------------------------------
| [**Deep Learning Notes**](DeepLearning_Notes) | Notes and examples on Deep Learning tools and related data pipelines.
| [**Pyspark_SQL_Magic_Jupyter**](Pyspark_SQL_Magic_Jupyter) | How to write Jupyter SQL magic functions for PySpark and Spark SQL.
| [**Trino and Presto on Jupyter**](Trino_Presto_Jupyter) | Example of using Trino or Presto on a Jupyter notebook.
| [**PostgreSQL and YugabyteDB on Jupyter**](Trino_Presto_Jupyter) | Example of using PostgreSQL or YugabyteDB on a Jupyter notebook.
| [**Oracle_Jupyter**](Oracle_Jupyter) | Examples of how to query Oracle using Jupyter/IPython notebooks.
| [**Impala_SQL_Jupyter**](Impala_SQL_Jupyter) | Examples of how to run SQL on Apache Impala using Jupyter/IPython notebooks.
| [**SQL_color_Mandelbrot**](SQL_color_Mandelbrot) | How to use SQL to compute and display the Mandelbrot set with colors. Examples for Oracle and PostgreSQL.
| [**PLSQL_Neural_Network**](PLSQL_Neural_Network) | An example of how to deploy a deep learning (DL) serving engine for Oracle using PL/SQL.
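The idea behind computing the Mandelbrot set in SQL can be sketched with recursive common table expressions (CTEs). First, generate a grid of points c in the complex plane. Next, iterate z ← z² + c for each point inside the CTE. Finally, record how many iterations each point survives before |z| exceeds 2. This sketch uses SQLite through Python's `sqlite3`, not Oracle or PostgreSQL as in the repository, and renders plain ASCII instead of colors. The grid bounds, grid size, iteration limit, and the function name `mandelbrot_sql` are all hypothetical choices.

```python
import sqlite3

# Recursive CTEs build the grid and run z <- z^2 + c for every point.
# Grid bounds (-2.0..1.0 real, -1.2..1.2 imaginary) are illustrative choices.
MANDELBROT_SQL = """
WITH RECURSIVE
  xs(ix) AS (SELECT 0 UNION ALL SELECT ix + 1 FROM xs WHERE ix + 1 < :width),
  ys(iy) AS (SELECT 0 UNION ALL SELECT iy + 1 FROM ys WHERE iy + 1 < :height),
  grid(ix, iy, cx, cy) AS (
    SELECT ix, iy,
           -2.0 + ix * 3.0 / (:width - 1),
           -1.2 + iy * 2.4 / (:height - 1)
    FROM xs, ys
  ),
  m(ix, iy, cx, cy, zx, zy, it) AS (
    SELECT ix, iy, cx, cy, 0.0, 0.0, 0 FROM grid
    UNION ALL
    SELECT ix, iy, cx, cy,
           zx * zx - zy * zy + cx,  -- real part of z^2 + c
           2.0 * zx * zy + cy,      -- imaginary part of z^2 + c
           it + 1
    FROM m
    WHERE it < :max_iter AND zx * zx + zy * zy <= 4.0
  )
SELECT ix, iy, MAX(it) AS iterations FROM m GROUP BY ix, iy
"""


def mandelbrot_sql(width=61, height=25, max_iter=50):
    """Return the Mandelbrot set as ASCII lines; '#' marks points that never escaped."""
    conn = sqlite3.connect(":memory:")
    try:
        params = {"width": width, "height": height, "max_iter": max_iter}
        rows = conn.execute(MANDELBROT_SQL, params).fetchall()
    finally:
        conn.close()
    grid = [[" "] * width for _ in range(height)]
    for ix, iy, iterations in rows:
        if iterations >= max_iter:  # survived every iteration: treat as inside the set
            grid[iy][ix] = "#"
    return ["".join(row) for row in grid]


if __name__ == "__main__":
    print("\n".join(mandelbrot_sql()))
```

All the numeric work happens inside the database engine, and Python only lays out the characters. A colored version could map the `iterations` value of escaping points to a palette instead of a single blank.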