https://github.com/mrinalxdev/bidirect-graph

An implementation of LinkedIn's optimized way of solving the set inter-set problem

Host: GitHub
URL: https://github.com/mrinalxdev/bidirect-graph
Owner: mrinalxdev
Created: 2024-12-22T09:13:54.000Z (about 1 year ago)
Default Branch: main
Last Pushed: 2024-12-23T07:39:42.000Z (about 1 year ago)
Last Synced: 2025-04-11T20:08:43.928Z (10 months ago)
Language: Go
Size: 13.7 KB
Stars: 10
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

# BiDirect - Social Graph Service

BiDirect is a minimalist implementation of a distributed social graph service. It demonstrates the core concepts behind scalable social networking systems such as the social graph infrastructure at LinkedIn or Facebook.

## Overview

This implementation is a scaled-down model of how large-scale social networks manage connection data and calculate relationship distances. Production systems handle billions of connections across thousands of nodes; this one runs at a much smaller scale to demonstrate the key architectural concepts.

### Key Features
- Distributed graph storage using Redis
- Partitioned data architecture
- Connection degree calculation (1st, 2nd, 3rd degree connections)
- Shared connection discovery
- Network distance computation

## Architecture

### Scaled-Down Components

1. **Partitioning Strategy**
   - Current: Simple modulo-based partitioning across 3 Redis nodes
   - Production: Would use consistent hashing or range-based sharding across thousands of nodes

2. **Caching Layer**
   - Current: Single Redis instance for second-degree connections
   - Production: Multi-tiered caching with in-memory, near-memory, and disk-based caches

3. **Node Management**
   - Current: Static node configuration
   - Production: Dynamic node discovery and automated rebalancing

### API Endpoints

```
GET /api/connections/{memberID} # Get member's direct connections
GET /api/shared-connections/{id1}/{id2} # Find shared connections
POST /api/distances # Calculate network distances
```
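As a rough illustration of calling the two `GET` endpoints, the sketch below builds the request URLs and prints whatever the service returns. It assumes numeric member IDs and the `http://localhost:8080` address given under Getting Started. The helper names (`connectionsURL`, `sharedConnectionsURL`, `get`) are hypothetical, and the response format is not documented here, so the body is printed raw. The request body for `POST /api/distances` is not specified in this README and is left out.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
)

// baseURL is the address the README gives for the running service.
const baseURL = "http://localhost:8080"

// connectionsURL builds the URL for a member's direct connections.
// Numeric member IDs are an assumption of this sketch.
func connectionsURL(base string, memberID int) string {
	return fmt.Sprintf("%s/api/connections/%d", base, memberID)
}

// sharedConnectionsURL builds the URL for connections shared by two members.
func sharedConnectionsURL(base string, id1, id2 int) string {
	return fmt.Sprintf("%s/api/shared-connections/%d/%d", base, id1, id2)
}

// get fetches a URL and returns the raw response body as a string.
func get(url string) (string, error) {
	resp, err := http.Get(url)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", err
	}
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("GET %s: unexpected status %s", url, resp.Status)
	}
	return string(body), nil
}

func main() {
	urls := []string{
		connectionsURL(baseURL, 1),
		sharedConnectionsURL(baseURL, 1, 2),
	}
	for _, u := range urls {
		body, err := get(u)
		if err != nil {
			fmt.Println("request failed:", err)
			continue
		}
		fmt.Println(body)
	}
}
```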

## Technical Design Decisions

### Data Distribution
- Uses partition-based distribution to demonstrate how social graphs can be split across multiple nodes
- Each node manages multiple partitions to show how load can be distributed
- Simplified partitioning function for demonstration purposes
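A modulo-based partitioning function of the kind described above might look like the following sketch. The constants use the demo scale stated elsewhere in this README (3 nodes, 10 partitions per node). Numeric member IDs, the function names, and the assignment of contiguous partition ranges to each node are assumptions for illustration, not necessarily what the repository does.

```go
package main

import "fmt"

// Demo scale from the README: 3 Redis nodes with 10 partitions each.
const (
	numNodes          = 3
	partitionsPerNode = 10
	numPartitions     = numNodes * partitionsPerNode
)

// partitionFor maps a member ID to a partition with a plain modulo.
func partitionFor(memberID uint64) int {
	return int(memberID % numPartitions)
}

// nodeFor maps a partition to the node that owns it. Giving each node a
// contiguous block of partitions is an assumption of this sketch.
func nodeFor(partition int) int {
	return partition / partitionsPerNode
}

func main() {
	for _, id := range []uint64{7, 42, 1001} {
		p := partitionFor(id)
		fmt.Printf("member %d -> partition %d -> node %d\n", id, p, nodeFor(p))
	}
}
```

Keeping the member-to-partition and partition-to-node mappings separate means partitions can be moved between nodes without rehashing every member, which is one reason production systems tend to favor this two-step layout.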

### Connection Traversal
- Implements efficient 2nd and 3rd-degree connection discovery
- Uses Redis sorted sets for quick connection lookups
- Demonstrates caching of frequently accessed paths
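The traversal ideas above can be sketched without Redis. The example below uses an in-memory adjacency map as a stand-in for the Redis-backed store and computes connection degree with a breadth-first search capped at a maximum depth, plus shared connections as a set intersection. The `Graph` type and its methods are hypothetical names for illustration, and no caching is shown.

```go
package main

import (
	"fmt"
	"sort"
)

// Graph is an in-memory stand-in for the Redis-backed adjacency data.
type Graph map[int][]int

// addEdge records a mutual (undirected) connection between two members.
func (g Graph) addEdge(a, b int) {
	g[a] = append(g[a], b)
	g[b] = append(g[b], a)
}

// distance returns the connection degree between src and dst using
// breadth-first search, or -1 if dst is not reachable within maxDegree hops.
func (g Graph) distance(src, dst, maxDegree int) int {
	if src == dst {
		return 0
	}
	visited := map[int]bool{src: true}
	frontier := []int{src}
	for depth := 1; depth <= maxDegree; depth++ {
		var next []int
		for _, m := range frontier {
			for _, n := range g[m] {
				if visited[n] {
					continue
				}
				if n == dst {
					return depth
				}
				visited[n] = true
				next = append(next, n)
			}
		}
		frontier = next
	}
	return -1
}

// shared returns the sorted intersection of two members' direct connections.
func (g Graph) shared(a, b int) []int {
	seen := make(map[int]bool, len(g[a]))
	for _, n := range g[a] {
		seen[n] = true
	}
	out := []int{}
	for _, n := range g[b] {
		if seen[n] {
			out = append(out, n)
			seen[n] = false // report each shared member only once
		}
	}
	sort.Ints(out)
	return out
}

func main() {
	g := Graph{}
	g.addEdge(1, 2)
	g.addEdge(2, 3)
	g.addEdge(3, 4)
	g.addEdge(1, 5)
	g.addEdge(5, 3)

	fmt.Println(g.distance(1, 3, 3)) // 2
	fmt.Println(g.distance(1, 4, 3)) // 3
	fmt.Println(g.shared(1, 3))      // [2 5]
}
```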

### Set Cover Algorithm
- Implements a greedy approach for finding a near-minimal set of nodes to query (greedy set cover is an approximation and is not guaranteed to be optimal)
- Shows how to optimize multi-node queries in a distributed system
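A greedy set cover over nodes can be sketched as follows. The setup is an assumption for illustration: each node can serve some partitions (possibly overlapping with other nodes), a query needs a given list of partitions, and the goal is to contact as few nodes as possible. The function name `greedyCover` and the node names are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// greedyCover repeatedly picks the node that serves the most still-uncovered
// partitions. It returns the chosen nodes in pick order, and false if some
// required partition is served by no node. Each node is assumed to list a
// partition at most once.
func greedyCover(required []int, nodeParts map[string][]int) ([]string, bool) {
	uncovered := make(map[int]bool, len(required))
	for _, p := range required {
		uncovered[p] = true
	}

	// Sort node names so that ties are broken deterministically.
	names := make([]string, 0, len(nodeParts))
	for n := range nodeParts {
		names = append(names, n)
	}
	sort.Strings(names)

	var chosen []string
	for len(uncovered) > 0 {
		best, bestGain := "", 0
		for _, n := range names {
			gain := 0
			for _, p := range nodeParts[n] {
				if uncovered[p] {
					gain++
				}
			}
			if gain > bestGain {
				best, bestGain = n, gain
			}
		}
		if bestGain == 0 {
			return chosen, false // remaining partitions cannot be covered
		}
		for _, p := range nodeParts[best] {
			delete(uncovered, p)
		}
		chosen = append(chosen, best)
	}
	return chosen, true
}

func main() {
	nodes := map[string][]int{
		"node-a": {1, 2, 3},
		"node-b": {3, 4},
		"node-c": {4, 5},
		"node-d": {1, 5},
	}
	chosen, ok := greedyCover([]int{1, 2, 3, 4, 5}, nodes)
	fmt.Println(chosen, ok) // [node-a node-c] true
}
```

The greedy choice does not always yield the smallest possible cover, but it is simple, fast, and has a well-known logarithmic approximation bound.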

## Getting Started

### Prerequisites
- Go 1.21+
- Docker and Docker Compose
- Python (to run the sample data loader `imp.py`)

### Running the Service

1. Start the infrastructure:
```bash
docker-compose up -d
```

2. The service will be available at `http://localhost:8080`

3. Load sample data:
```bash
python imp.py
```

## Production Considerations

This implementation is intentionally simplified. In a production environment, you would need to consider:

1. **Scalability**
   - Current: 3 Redis nodes, 10 partitions per node
   - Production: Thousands of nodes, dynamic partition allocation

2. **Reliability**
   - Current: Basic error handling
   - Production: Circuit breakers, fallbacks, redundancy

3. **Performance**
   - Current: Simple caching strategy
   - Production: Multi-level caching, pre-computation of common paths

4. **Monitoring**
   - Current: Basic logging
   - Production: Comprehensive metrics, tracing, alerting

5. **Security**
   - Current: No authentication
   - Production: OAuth, rate limiting, encryption

## Design Choices

### Why Redis?
- Demonstrates in-memory graph storage principles
- Sorted sets provide efficient connection lookups
- Easy to understand and set up for demonstration

### Why Partition-Based Distribution?
- Shows basic concepts of data sharding
- Demonstrates how to handle cross-partition queries
- Simplified version of production-grade distribution strategies
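One common way to handle a query that spans partitions is to group the requested member IDs by owning node, so each node is contacted once with a batch. The sketch below shows only that grouping step. It assumes numeric member IDs, modulo partitioning, and contiguous partition ranges per node; `groupByNode` is a hypothetical helper, not a function from the repository.

```go
package main

import "fmt"

// groupByNode buckets member IDs by the node that owns their partition.
// Partition = id mod numPartitions; node = partition / partitionsPerNode.
func groupByNode(memberIDs []uint64, numPartitions, partitionsPerNode int) map[int][]uint64 {
	groups := make(map[int][]uint64)
	for _, id := range memberIDs {
		partition := int(id % uint64(numPartitions))
		node := partition / partitionsPerNode
		groups[node] = append(groups[node], id)
	}
	return groups
}

func main() {
	// Demo scale from the README: 3 nodes x 10 partitions = 30 partitions.
	groups := groupByNode([]uint64{3, 12, 25, 33, 41}, 30, 10)
	for node := 0; node < 3; node++ {
		fmt.Println(node, groups[node])
	}
	// Prints:
	// 0 [3 33]
	// 1 [12 41]
	// 2 [25]
}
```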

## Contributing

This is an educational project designed to demonstrate distributed systems concepts. Contributions that help clarify these concepts or add new educational examples are welcome.

## License

MIT License

## Acknowledgments

This implementation draws inspiration from real-world social graph systems while maintaining a focus on educational value and clarity over production-grade features.
