Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/ireddragonicy/cb24153-closestpairofpoints
An advanced and comprehensive implementation of the Closest Pair of Points problem using the Divide and Conquer algorithm in computational geometry. This project provides an in-depth exploration of efficient algorithms for solving proximity problems in a two-dimensional plane, focusing on optimizing performance for large-scale datasets.
https://github.com/ireddragonicy/cb24153-closestpairofpoints
algorithm analysis closest-pair-of-points divide-and-conquer dnd optimize pair-programming programming python tutorial umpsa
Last synced: 6 days ago
JSON representation
An advanced and comprehensive implementation of the Closest Pair of Points problem using the Divide and Conquer algorithm in computational geometry. This project provides an in-depth exploration of efficient algorithms for solving proximity problems in a two-dimensional plane, focusing on optimizing performance for large-scale datasets.
- Host: GitHub
- URL: https://github.com/ireddragonicy/cb24153-closestpairofpoints
- Owner: IRedDragonICY
- License: mit
- Created: 2024-12-07T16:21:44.000Z (2 months ago)
- Default Branch: main
- Last Pushed: 2024-12-22T15:44:01.000Z (about 2 months ago)
- Last Synced: 2024-12-22T16:31:18.132Z (about 2 months ago)
- Topics: algorithm, analysis, closest-pair-of-points, divide-and-conquer, dnd, optimize, pair-programming, programming, python, tutorial, umpsa
- Language: Python
- Homepage:
- Size: 774 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# Closest Pair of Points Algorithm in Computational Geometry
![]()
![]()
![]()
> **Note:**
> Example problem and solution based on Dr. Mohm Azwan Bin Mohammad Hamza's assignment "Algorithm & Complexity"An advanced and comprehensive implementation of the **Closest Pair of Points** problem using the **Divide and Conquer** algorithm in computational geometry. This project provides an in-depth exploration of efficient algorithms for solving proximity problems in a two-dimensional plane, focusing on optimizing performance for large-scale datasets.
## Table of Contents
- [Introduction](#introduction)
- [Problem Statement](#problem-statement)
- [Case Study Questions](#case-study-questions)
- [Features](#features)
- [Dataset](#dataset)
- [Algorithm Implementation](#algorithm-implementation)
- [Step-by-Step Solution](#step-by-step-solution)
- [Time Complexity Analysis](#time-complexity-analysis)
- [Usage](#usage)
- [Performance Evaluation](#performance-evaluation)
- [Real-World Applications](#real-world-applications)
- [Critical Evaluation](#critical-evaluation)
- [Conclusion](#conclusion)
- [License](#license)## Introduction
Computational geometry is a foundational field in computer science that deals with algorithms and data structures for solving geometric problems. One of the classic challenges in this domain is determining the **Closest Pair of Points** in a two-dimensional plane. Efficiently solving this problem is crucial for applications such as **GPS navigation systems**, **machine learning clustering algorithms**, **image processing**, **wireless network optimization**, and **robotics**.
This project implements the **Divide and Conquer** algorithm to address the Closest Pair of Points problem effectively, reducing the time complexity from $O(n^2)$ (as seen in the brute-force method) to $O(n \log n)$. This optimization is significant when handling large datasets where performance is a critical factor.
## Problem Statement
### Closest Pair of Points Problem
The **Closest Pair of Points** problem involves finding the pair of points with the minimum Euclidean distance between them within a set of points on a 2D plane.
#### Euclidean Distance Calculation
For any two points $P_1 (x_1, y_1)$ and $P_2 (x_2, y_2)$, the Euclidean distance $d$ is calculated as:
$$
d = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}
$$This formula measures the "straight line" distance between two points in Euclidean space.
### Inefficiency of Brute-Force Approach
The brute-force method compares every possible pair of points to find the minimum distance, resulting in a time complexity of $O(n^2)$. This quadratic growth becomes computationally expensive and impractical for large datasets due to the substantial number of comparisons required.
## Case Study Questions
*Detailed answers to the following questions are available in the [`/doc`](./doc/) folder.*
1. **Problem Understanding**
- Definition of the problem.
- Euclidean distance calculation.
- Inefficiency of the brute-force approach.2. **Divide and Conquer Approach**
- Process explanation.
- The three main steps: Divide, Conquer, Combine.
- Implementation of the 'strip' optimization.3. **Solve the Problem**
- Step-by-step solution using the provided dataset.4. **Time Complexity Analysis**
- Comparison between $O(n \log n)$ and $O(n^2)$ time complexities.5. **Algorithm Implementation**
- Code construction in Python.
- Execution on datasets containing 1,000, 10,000, and 100,000 points.6. **Performance Comparison**
- Evaluation of the Divide and Conquer approach versus the brute-force method.7. **Real-World Applications**
- Five practical applications of the Closest Pair of Points algorithm.8. **Critical Evaluation**
- Strengths, limitations, and potential trade-offs of the algorithm.## Features
- **Efficient Algorithm Implementation**: Leverages the Divide and Conquer strategy for optimal performance.
- **Scalability**: Effectively handles large datasets; tested with up to 100,000 points.
- **Modular and Clean Code**: Written in Python with emphasis on readability and maintainability.
- **Comprehensive Documentation**: Provides detailed explanations and analyses.
- **Performance Analysis Tools**: Includes scripts to compare and visualize algorithm performance.## Dataset
The algorithm operates on a set of 2D points with associated labels. Below is the sample dataset:
```python
points = [
("green", 2, 3),
("green", 5, 1),
("green", 6, 2),
("green", 7, 7),
("green", 20, 24),
("black", 3, 5),
("black", 13, 14),
("black", 27, 25),
("purple", 9, 6),
("purple", 12, 10),
("purple", 17, 21),
("purple", 18, 15),
("blue", 8, 15),
("blue", 19, 20),
("blue", 31, 33),
("blue", 40, 50),
("red", 12, 30),
("red", 22, 29),
("red", 25, 18),
("red", 35, 40),
]
```A visual representation of this dataset helps understand the spatial distribution of points across the plane.
![]()
## Algorithm Implementation
The project is implemented in **Python**, utilizing its robust libraries and simplicity for algorithm development.
### Divide and Conquer Algorithm Steps
1. **Divide**
- **Sorting**: Sort the points based on their x-coordinates.
- **Partitioning**: Divide the sorted set into two equal halves.2. **Conquer**
- **Recursive Processing**: Recursively apply the algorithm to both halves to find the minimum distances $d_L$ and $d_R$.
- **Base Case**: When subsets are small (typically 2 or 3 points), use the brute-force method.3. **Combine**
- **Find Minimal Distance**: Determine the minimal distance $d = \min(d_L, d_R)$.
- **Strip Creation**: Build a strip of points within distance $d$ from the central dividing line.
- **Strip Optimization**: Sort the strip points based on y-coordinates and check for closer pairs.#### Strip Optimization
- **Purpose**: Efficiently find the closest pair across the partition.
- **Implementation**:
- **Create Strip**: Collect points that lie within distance $d$ of the dividing line.
- **Sort Strip**: Sort these points based on their y-coordinates.
- **Compare Points**: For each point in the strip, compare it with the next seven points to find the closest pair.## Step-by-Step Solution
A detailed step-by-step solution for the provided dataset is available in the [`/doc`](./doc). It includes:
- **Sorting Points**: Demonstrates the initial sorting by x and y coordinates.
- **Dividing Dataset**: Explains how the dataset is recursively partitioned.
- **Calculating Closest Pair**: Shows how the closest pairs are found within subsets.
- **Combining Results**: Describes how the minimal distances are compared across the partitions.## Time Complexity Analysis
### Divide and Conquer Algorithm ($O(n \log n)$)
- **Initial Sorting**: Takes $O(n \log n)$ time.
- **Recursive Division**: Creates $\log n$ levels of recursion.
- **Combine Step**: At each level, the strip processing takes $O(n)$ time.
- **Overall Complexity**: The combination of sorting and recursive steps results in $O(n \log n)$ time complexity.### Brute-Force Method ($O(n^2)$)
- **Pairwise Comparison**: Every pair of points is compared.
- **Total Comparisons**: $\frac{n(n - 1)}{2}$ comparisons, leading to quadratic time growth.### Comparative Analysis
The Divide and Conquer method substantially outperforms the brute-force approach for large $n$, as it reduces the number of necessary comparisons and scales logarithmically.
## Usage
### Installation
Ensure that **Python 3.x** is installed on your system.
1. **Clone the Repository**
```bash
git clone https://github.com/IRedDragonICY/cb24153-closestpairofpoints.git
```2. **Navigate to the Project Directory**
```bash
cd cb24153-closestpairofpoints
```3. **Create a Virtual Environment (Optional)**
```bash
python3 -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
```4. **Install Dependencies**
```bash
pip install -r requirements.txt
```### Running the Algorithm
To run the algorithm, you need to execute three Python scripts: `case-bigdataset.py`, `case-plot.py`, and `case-plot-step.py`. Follow the steps below to ensure everything runs smoothly:
1. **Ensure Dependencies are Installed**
```bash
pip install -r requirements.txt
```2. **Run the Scripts**
```bash
python src/case-bigdataset.py
python src/case-plot.py
python src/case-plot-step.py
```Each script has a specific role:
- `case-bigdataset.py`: Evaluates the performance of the brute force and divide & conquer methods on larger datasets.
- `case-plot.py`: Plots the performance results of the algorithms.
- `case-plot-step.py`: Provides a step-by-step visualization of the algorithm's execution.By following these steps, you will be able to run the algorithm and visualize the results effectively.
## Performance Evaluation
### Result Summary
| Number of Points | Brute Force Distance | Brute Force Time (s) | Divide & Conquer Distance | Divide & Conquer Time (s) |
|------------------|----------------------|----------------------|---------------------------|---------------------------|
| 1,000 | 952.135495 | 0.178031 | 952.135495 | 0.003254 |
| 10,000 | 30.413813 | 14.760878 | 30.413813 | 0.031042 |
| 100,000 | 11.401754 | 1465.536086 | 11.401754 | 0.475594 |### Observations
- **Efficiency Gains**: The Divide and Conquer algorithm dramatically reduces computation time.
- **Scalability**: Maintains reasonable execution times even as the dataset size increases significantly.
- **Practically Feasible**: The brute-force method becomes impractical for datasets larger than 10,000 points.### Graphical Representation
![]()
*The graph illustrates the execution time vs. number of points for both algorithms.*
## Real-World Applications
1. **GPS Navigation Systems**
- **Nearest Neighbor Search**: Finding the closest service location.
- **Route Optimization**: Calculating the shortest path by evaluating proximity between waypoints.2. **Machine Learning Clustering**
- **Data Analysis**: Grouping similar data points for pattern recognition.
- **Anomaly Detection**: Identifying outliers based on distance metrics.3. **Image Processing**
- **Feature Matching**: Aligning images by matching key points.
- **Object Recognition**: Detecting and classifying objects within images.4. **Wireless Network Optimization**
- **Network Design**: Placing transmitters for optimal coverage.
- **Signal Strength Mapping**: Analyzing proximity to access points.5. **Robotics**
- **Obstacle Avoidance**: Real-time detection of nearby obstacles.
- **Path Planning**: Determining efficient paths to targets.## Critical Evaluation
### Strengths
- **High Efficiency**: Significantly reduces computation time for large datasets.
- **Wide Applicability**: Useful in various domains requiring proximity calculations.
- **Foundational Knowledge**: Enhances understanding of computational geometry techniques.### Limitations
- **Algorithm Complexity**: More complex to implement compared to the brute-force method.
- **Dimensionality**: Efficiency gains diminish in higher-dimensional spaces.### Potential Trade-Offs
- **Development Time**: More time needed for coding and debugging.
- **Memory Usage**: Increased memory usage due to recursion and additional data structures.## Conclusion
The **Divide and Conquer** algorithm offers a powerful and efficient solution to the Closest Pair of Points problem, especially for large datasets. Its implementation in this project demonstrates significant performance improvements over the brute-force method, making it highly suitable for real-world applications that demand quick and reliable proximity calculations.
## License
This project is licensed under the **MIT License**. See the [LICENSE](LICENSE) file for detailed information.
---
*For comprehensive answers to the case study questions, theoretical explanations, and additional resources, please refer to the [`/doc`](./doc/) folder.*
---
Empowering efficient computational geometry solutions for complex proximity problems.