An open API service indexing awesome lists of open source software.

https://github.com/srinibas-masanta/streamlit-dataprofile

Data Profiler is a Streamlit app designed to provide insightful data analysis and visualization. Users can upload their datasets in '.csv' or '.xlsx' format, and the app generates a comprehensive profiling report using the YData Profiling library.
https://github.com/srinibas-masanta/streamlit-dataprofile

streamlit ydata-profiling

Last synced: about 2 months ago
JSON representation

Data Profiler is a Streamlit app designed to provide insightful data analysis and visualization. Users can upload their datasets in '.csv' or '.xlsx' format, and the app generates a comprehensive profiling report using the YData Profiling library.

Awesome Lists containing this project

README

        

# **Data Profiler**

**Data Profiler** is a powerful and user-friendly web application built with Streamlit that allows you to analyze and visualize your datasets with ease. Simply upload your data in `.csv` or `.xlsx` format, and generate comprehensive profiling reports that help you detect anomalies, patterns, and trends within your data.

## **Features**

- **Automated Data Analysis:** Quickly generate detailed profiling reports by uploading your dataset.
- **Customizable Reports:** Choose between different display modes, including `Primary`, `Dark`, and `Orange`.
- **Support for Multiple Formats:** Upload `.csv` or `.xlsx` files (up to 10 MB) for analysis.
- **Interactive UI:** Easy-to-use interface with options to select specific sheets for `.xlsx` files.
- **Downloadable Reports:** Save the profiling report as an HTML file for offline analysis.

## **Installation**

To run the Data Profiler application on your local machine, follow the steps below:

### **1. Clone the Repository**

```bash
git clone https://github.com/srinibas-masanta/data-profiler.git
cd data-profiler
```

### **2. Set Up a Virtual Environment**

Create and activate a virtual environment to manage dependencies.

```bash
python -m venv dataprofile
.\dataprofile\Scripts\activate # On Windows
source dataprofile/bin/activate # On macOS/Linux
```

### **3. Install Dependencies**

Install the required Python packages listed in the `requirements.txt` file.

```bash
pip install -r requirements.txt
```

Alternatively, manually install the necessary packages:

```bash
pip install numpy pandas scipy matplotlib streamlit ydata-profiling streamlit-pandas-profiling openpyxl xlrd
```

### **4. Run the Application**

Start the Streamlit application by running the following command:

```bash
streamlit run app.py
```

## **Usage**

Once the application is running, follow these steps:

1. **Upload Your Data:** Use the sidebar to upload a `.csv` or `.xlsx` file (up to 10 MB).
2. **Select Options:** Choose the report mode (`Primary`, `Dark`, `Orange`), and decide if you want a minimal report or a full report.
3. **Generate Report:** Click to generate the report, which will be displayed within the app.
4. **Download Report (Optional):** If desired, save the report as an HTML file using the download button.

## **Project Structure**

- **app.py:** Main script containing the Streamlit application logic.
- **media/DP Logo.jpg:** Logo used in the welcome page of the application.
- **requirements.txt:** List of all the Python dependencies required to run the application.

## **License**

This project is licensed under the MIT License - see the [LICENSE.txt](LICENSE.txt) file for details.