https://github.com/lucacanali/miscellaneous
Includes notes on using Apache Spark, with a drill-down on Spark for Physics, how to run TPCDS on PySpark, and how to create histograms with Spark. Also includes tools for stress testing and measuring CPU performance, and Jupyter notebook examples for using various database systems.
apache-spark database jupyter-notebooks performance-analysis performance-monitoring performance-testing
- Host: GitHub
- URL: https://github.com/lucacanali/miscellaneous
- Owner: LucaCanali
- License: apache-2.0
- Created: 2015-08-28T20:09:55.000Z (over 10 years ago)
- Default Branch: master
- Last Pushed: 2025-05-09T21:12:05.000Z (8 months ago)
- Last Synced: 2025-05-09T22:20:59.058Z (8 months ago)
- Topics: apache-spark, database, jupyter-notebooks, performance-analysis, performance-monitoring, performance-testing
- Language: Jupyter Notebook
- Homepage:
- Size: 33.7 MB
- Stars: 446
- Watchers: 24
- Forks: 152
- Open Issues: 2
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Miscellaneous Projects, Tools, and Scripts
DOI: [10.5281/zenodo.15313041](https://doi.org/10.5281/zenodo.15313041)
Contact: Luca.Canali@cern.ch
## Performance Engineering and Apache Spark
| Folder | Description
|------------------------------------------------------------------| -------------------------------------------------------------------------------------
| [**Spark Dashboard**](Spark_Dashboard) | A tool for Apache Spark monitoring, used to build a performance dashboard and troubleshoot Spark jobs.
| [**Spark Notes**](Spark_Notes) | Miscellaneous tips and code snippets about Apache Spark.
| [**Spark for Physics**](Spark_Physics) | Examples, with code and data, of using Apache Spark for High Energy Physics data analysis.
| [**Performance Testing**](Performance_Testing) | Includes:<br>- TPCDS-PySpark: run the TPCDS benchmark at scale with PySpark and collect execution metrics<br>- Load testing tools for CPU benchmarking, in Python and Rust<br>- Notes on how to use various tools for performance investigations
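
As an illustration of the kind of measurement the CPU load testing tools are meant for, here is a minimal single-process sketch in Python. It is not the code from the `Performance_Testing` folder: the names `cpu_kernel` and `time_kernel`, the choice of workload, and the default parameters are all assumptions made for this example. It runs a CPU-bound loop several times and summarizes the wall-clock timings.

```python
import statistics
import time


def cpu_kernel(iterations: int) -> int:
    """CPU-bound integer loop (hypothetical workload); returns a checksum."""
    acc = 0
    for i in range(iterations):
        acc = (acc + i * i) % 1_000_003
    return acc


def time_kernel(iterations: int = 200_000, repeats: int = 5) -> dict:
    """Run the kernel `repeats` times and summarize the wall-clock timings."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        cpu_kernel(iterations)
        timings.append(time.perf_counter() - start)
    return {
        "repeats": repeats,
        "min_s": min(timings),
        "median_s": statistics.median(timings),
        "max_s": max(timings),
    }


if __name__ == "__main__":
    print(time_kernel())
```

Reporting the minimum and median rather than a single run reduces the effect of scheduler noise. A real load test would typically also run several workers concurrently to exercise all cores, which this sketch deliberately leaves out.
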
## Data Engineering and Data Science
| Folder | Description
|-----------------------------------------------------------------------| -------------------------------------------------------------------------------------
| [**Kepler Analysis**](Data_Analyses/Kepler) | A curated collection of interactive notebooks reproducing Kepler's orbital analysis of Mars.
| [**Deep Learning Notes**](DeepLearning_Notes) | Notes and examples on Deep Learning tools and related data pipelines.
| [**Pyspark_SQL_Magic_Jupyter**](Pyspark_SQL_Magic_Jupyter) | How to write Jupyter SQL magic functions for PySpark and Spark SQL.
| [**Trino and Presto on Jupyter**](Trino_Presto_Jupyter) | Example of using Trino or Presto in a Jupyter notebook.
| [**PostgreSQL and YugabyteDB on Jupyter**](Trino_Presto_Jupyter) | Example of using PostgreSQL or YugabyteDB in a Jupyter notebook.
| [**Oracle_Jupyter**](Oracle_Jupyter) | Examples of how to query Oracle using Jupyter/IPython notebooks.
| [**Impala_SQL_Jupyter**](Impala_SQL_Jupyter) | Examples of how to run SQL on Apache Impala using Jupyter/IPython notebooks.
| [**SQL_color_Mandelbrot**](SQL_color_Mandelbrot) | How to use SQL to compute and display the Mandelbrot set with colors. Examples for Oracle and PostgreSQL.
| [**PLSQL_Neural_Network**](PLSQL_Neural_Network) | An example of neural network inference using Oracle RDBMS and PL/SQL.
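
To give a flavor of the SQL-based Mandelbrot computation listed above, here is a minimal sketch that uses a recursive common table expression. It is not the code from the `SQL_color_Mandelbrot` folder, which targets Oracle and PostgreSQL: this version swaps in SQLite through Python's standard `sqlite3` module so that it runs without a database server, and it renders ASCII shades instead of colors. The function name `mandelbrot_ascii`, the grid size, the palette, and the iteration limit are all assumptions chosen for the example.

```python
import sqlite3

# Recursive CTEs build the pixel grid, then iterate z -> z^2 + c per pixel
# until |z|^2 >= 4 or the iteration limit is reached.
MANDELBROT_SQL = """
WITH RECURSIVE
  xaxis(ix) AS (SELECT 0 UNION ALL SELECT ix + 1 FROM xaxis WHERE ix < :width - 1),
  yaxis(iy) AS (SELECT 0 UNION ALL SELECT iy + 1 FROM yaxis WHERE iy < :height - 1),
  grid(ix, iy, cx, cy) AS (
    SELECT ix, iy,
           -2.0 + 3.0 * ix / (:width - 1),
           -1.0 + 2.0 * iy / (:height - 1)
    FROM xaxis, yaxis
  ),
  m(iter, ix, iy, cx, cy, x, y) AS (
    SELECT 0, ix, iy, cx, cy, 0.0, 0.0 FROM grid
    UNION ALL
    SELECT iter + 1, ix, iy, cx, cy, x*x - y*y + cx, 2.0*x*y + cy
    FROM m
    WHERE x*x + y*y < 4.0 AND iter < :max_iter
  )
SELECT ix, iy, max(iter) AS iterations
FROM m
GROUP BY ix, iy
ORDER BY iy, ix
"""

PALETTE = " .+*#"  # hypothetical shading: escapes quickly -> ' ', stays bounded -> '#'


def mandelbrot_ascii(width: int = 61, height: int = 21, max_iter: int = 28) -> list:
    """Compute the set in SQLite and return one shaded text line per grid row."""
    conn = sqlite3.connect(":memory:")
    try:
        params = {"width": width, "height": height, "max_iter": max_iter}
        rows = conn.execute(MANDELBROT_SQL, params).fetchall()
    finally:
        conn.close()
    lines = []
    for iy in range(height):
        row = rows[iy * width:(iy + 1) * width]
        # Map the iteration count (0..max_iter) onto a palette index.
        lines.append("".join(
            PALETTE[it * len(PALETTE) // (max_iter + 1)] for _, _, it in row
        ))
    return lines


if __name__ == "__main__":
    print("\n".join(mandelbrot_ascii()))
```

The same recursive structure carries over to other engines that support `WITH RECURSIVE`, although parameter binding and string aggregation differ between databases.
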