https://github.com/malexandersalazar/tools-python-mssql-statistics-descriptor
A lightweight tool based on sweetviz that generates high-density visualizations to kickstart Exploratory Data Analysis within Microsoft Azure SQL Database using ODBC with just one line of code
- Host: GitHub
- URL: https://github.com/malexandersalazar/tools-python-mssql-statistics-descriptor
- Owner: malexandersalazar
- License: mit
- Created: 2023-05-12T23:16:17.000Z (about 3 years ago)
- Default Branch: main
- Last Pushed: 2023-05-16T02:34:24.000Z (about 3 years ago)
- Last Synced: 2025-05-15T06:11:31.325Z (about 1 year ago)
- Topics: azure-sql-database, data-analysis, data-visualization, eda, python
- Language: Jupyter Notebook
- Homepage:
- Size: 426 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# MSSQL Statistics Descriptor

A lightweight tool based on sweetviz that generates high-density visualizations to kickstart Exploratory Data Analysis within Microsoft Azure SQL Database using ODBC with just one line of code.
## Installation
Copy the `main.py` script and install the requirements located in the dist folder.
```
pip install -r requirements.txt
```
We also need to download and install the ODBC Driver for SQL Server. This repo uses ODBC Driver 18 for SQL Server.
[Download ODBC Driver for SQL Server](https://learn.microsoft.com/en-us/sql/connect/odbc/download-odbc-driver-for-sql-server)
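
To confirm that a suitable driver is installed before running the script, a small check like the one below can help. This is a minimal sketch, not part of `main.py`: the helper name `pick_sql_server_driver` is hypothetical, and it assumes driver names follow the `ODBC Driver NN for SQL Server` pattern.

```python
import re


def pick_sql_server_driver(installed):
    """Return the newest 'ODBC Driver NN for SQL Server' name, or None."""
    pattern = re.compile(r"^ODBC Driver (\d+) for SQL Server$")
    versions = []
    for name in installed:
        match = pattern.match(name)
        if match:
            # Keep the major version as an int so 18 sorts above 17.
            versions.append((int(match.group(1)), name))
    return max(versions)[1] if versions else None


if __name__ == "__main__":
    sample = ["SQL Server", "ODBC Driver 17 for SQL Server",
              "ODBC Driver 18 for SQL Server"]
    print(pick_sql_server_driver(sample))
```

With pyodbc installed, the list of installed driver names can be obtained from `pyodbc.drivers()` and passed to this helper.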
## Getting Started
| Positional argument | Example |
| --- | --- |
| server | tcp:my-sqldbs-dev.database.windows.net |
| database | sqldb-adventureworkslt-dev |
| schema | MySchema |

| Option | Example/Description |
| --- | --- |
| -h, --help | show this help message and exit |
| -u, --user | database-user, server-admin@contoso.com |
| -p, --password | specifies the user password, required only at non-interactive login |
| -r, --rows | specifies the number of rows to sample from the table (default: 500000) |
| -l, --level | specifies the database object level in which the analysis should be executed, "s" for schema and "t" for table (default: "s") |
| -t, --table | specifies the database table name |
| --associations | indicates that a correlation graph should be generated |
| --open-browser | indicates that a web browser tab should be opened while datasets are analyzed |
| --interactive | indicates that program should authenticate with an Azure Active Directory identity using interactive authentication, requires Azure Active Directory admin enabled on Azure SQL server resource |
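
The arguments in the tables above could be declared with `argparse` roughly as follows. This is an illustrative reconstruction from the tables only; the actual parser in `main.py` may differ, and `build_parser` is a hypothetical name.

```python
import argparse


def build_parser():
    """Declare the positional arguments and options listed in the tables."""
    parser = argparse.ArgumentParser(prog="main.py")
    parser.add_argument("server")
    parser.add_argument("database")
    parser.add_argument("schema")
    parser.add_argument("-u", "--user")
    parser.add_argument("-p", "--password")
    parser.add_argument("-r", "--rows", type=int, default=500000)
    # "s" analyzes every table in the schema, "t" a single table.
    parser.add_argument("-l", "--level", choices=["s", "t"], default="s")
    parser.add_argument("-t", "--table")
    parser.add_argument("--associations", action="store_true")
    parser.add_argument("--open-browser", action="store_true")
    parser.add_argument("--interactive", action="store_true")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(
        ["tcp:my-sqldbs-dev.database.windows.net",
         "sqldb-adventureworkslt-dev", "SalesLT",
         "-u=database-user", "-r=10000"]
    )
    print(args.rows, args.level)
```

`argparse` accepts the `-r=10000` form used in the examples in this README, as well as `-r 10000`.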
By default, the script loads and analyzes the specified number of rows from each table in the selected database schema.
```
python main.py tcp:my-sqldbs-dev.database.windows.net sqldb-adventureworkslt-dev SalesLT -u=database-user -p=S3cUr3P@S$w0rD -r=10000
```
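As a rough illustration of what loading a limited number of rows per table can look like in T-SQL, the sketch below builds the queries as strings. It assumes a plain `SELECT TOP (n)`, which returns the first rows rather than a random sample. How `main.py` actually selects rows is not documented here, and the helper names are hypothetical.

```python
def quote_identifier(name):
    """Bracket-quote a SQL Server identifier, escaping any closing bracket."""
    return "[" + name.replace("]", "]]") + "]"


def build_sample_query(schema, table, rows):
    """Build a query returning at most `rows` rows from schema.table."""
    rows = int(rows)
    if rows <= 0:
        raise ValueError("rows must be a positive integer")
    return (
        f"SELECT TOP ({rows}) * "
        f"FROM {quote_identifier(schema)}.{quote_identifier(table)}"
    )


# Parameterized query listing the base tables of one schema (assumed approach).
LIST_TABLES_SQL = (
    "SELECT TABLE_NAME FROM INFORMATION_SCHEMA.TABLES "
    "WHERE TABLE_SCHEMA = ? AND TABLE_TYPE = 'BASE TABLE'"
)

if __name__ == "__main__":
    print(build_sample_query("SalesLT", "Product", 10000))
```

The resulting string can be passed to `pandas.read_sql` together with an open pyodbc connection.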
The program builds high-density HTML visualizations and saves them locally. It also generates an Excel summary with table name, table rows, data size, table index size and parsed record count in a new folder called **obj**.
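
For orientation, producing one HTML report with sweetviz might look like the sketch below. The file naming scheme (`obj/<schema>.<table>.html`) and both function names are assumptions for illustration only. The sweetviz calls used are `analyze` and `show_html`.

```python
from pathlib import Path


def report_path(schema, table, out_dir="obj"):
    """Hypothetical naming scheme for the per-table HTML report."""
    return Path(out_dir) / f"{schema}.{table}.html"


def write_report(df, schema, table, associations=False, open_browser=False):
    """Analyze a DataFrame with sweetviz and save the HTML report."""
    # Imported lazily so report_path() works without sweetviz installed.
    import sweetviz as sv

    target = report_path(schema, table)
    target.parent.mkdir(parents=True, exist_ok=True)
    # pairwise_analysis controls whether the associations graph is computed.
    report = sv.analyze(df, pairwise_analysis="on" if associations else "off")
    report.show_html(filepath=str(target), open_browser=open_browser)
    return target


if __name__ == "__main__":
    print(report_path("SalesLT", "Product").as_posix())
```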

If we need a correlation graph to be generated for the columns of each table, we must include the `--associations` flag.
```
python main.py tcp:my-sqldbs-dev.database.windows.net sqldb-adventureworkslt-dev SalesLT -u=database-user -p=S3cUr3P@S$w0rD -r=10000 --associations
```
Keep in mind that computing correlations and other associations may take **quadratic time (n^2)** in the number of columns.
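
To get a feel for that growth, the sketch below counts the column pairs an associations graph has to consider, assuming one comparison per unordered pair of columns. The function name is hypothetical.

```python
def pairwise_count(columns):
    """Number of unordered column pairs: n * (n - 1) / 2."""
    return columns * (columns - 1) // 2


if __name__ == "__main__":
    for n in (10, 50, 200):
        # Prints 45, 1225 and 19900 pairs respectively.
        print(n, pairwise_count(n))
```

Going from 10 to 200 columns multiplies the number of pairs by more than 400, so wide tables are where `--associations` becomes expensive.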

If we only need the analysis for a single table, we must pass "**t**" as the `-l` or `--level` argument value and the corresponding **table name** in the `-t` or `--table` argument.
```
python main.py tcp:my-sqldbs-dev.database.windows.net sqldb-adventureworkslt-dev SalesLT -u=database-user -p=S3cUr3P@S$w0rD -r=500000 -l=t -t=Product
```
If we need Azure Active Directory authentication, we have to set the `--interactive` flag and enable an Azure Active Directory admin on the Azure SQL server resource.
```
python main.py tcp:my-sqldbs-dev.database.windows.net sqldb-adventureworkslt-dev SalesLT -l=t -t=Product -r=10000 --open-browser --interactive
```
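The two login modes map to different ODBC connection strings. The sketch below is an assumption about how such strings can be assembled for ODBC Driver 18, not the code in `main.py`, and `build_connection_string` is a hypothetical helper. `Authentication=ActiveDirectoryInteractive` is the driver keyword for interactive Azure Active Directory login.

```python
def _brace(value):
    """Wrap a value in braces, doubling any closing brace (ODBC escaping)."""
    return "{" + value.replace("}", "}}") + "}"


def build_connection_string(server, database, user=None, password=None,
                            interactive=False,
                            driver="ODBC Driver 18 for SQL Server"):
    """Assemble a connection string for SQL or interactive AAD login."""
    parts = [
        f"Driver={_brace(driver)}",
        f"Server={server}",
        f"Database={database}",
        "Encrypt=yes",
    ]
    if interactive:
        # The driver prompts for credentials; no password is sent here.
        parts.append("Authentication=ActiveDirectoryInteractive")
        if user:
            parts.append(f"UID={user}")
    else:
        if not user or password is None:
            raise ValueError("user and password are required for SQL login")
        parts.append(f"UID={user}")
        parts.append(f"PWD={_brace(password)}")
    return ";".join(parts) + ";"


if __name__ == "__main__":
    print(build_connection_string(
        "tcp:my-sqldbs-dev.database.windows.net",
        "sqldb-adventureworkslt-dev",
        interactive=True,
    ))
```

The resulting string can be passed to `pyodbc.connect`. Bracing the password keeps characters such as `;` from being read as separators.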

## Prerequisites
MSSQL Statistics Descriptor was tested with:
* Python: 3.7.16
* Packages:
  * pyodbc: 4.0.39
  * pandas: 1.3.5
  * sweetviz: 2.1.4
  * XlsxWriter: 3.1.0
* Anaconda: 2.4.0
## License
This project is licensed under the [MIT License][1].

[1]: https://opensource.org/licenses/mit-license.html "The MIT License | Open Source Initiative"