https://github.com/ostad-ai/graph-analysis
This repository is devoted to the field of Graph Analysis and related topics.
https://github.com/ostad-ai/graph-analysis
a-star-algorithm algorithm betweenness-centrality breadth-first-search clique cliques closeness-centrality depth-first-search dijkstra-algorithm directed-graph graph greedy-best-first-search kruskal-algorithm label-propagation minimum-spanning-tree prim-algorithm python topological-sorting undirected-graph
Last synced: 7 days ago
JSON representation
This repository is devoted to the field of Graph Analysis and related topics.
- Host: GitHub
- URL: https://github.com/ostad-ai/graph-analysis
- Owner: ostad-ai
- License: mit
- Created: 2025-08-14T12:27:51.000Z (8 months ago)
- Default Branch: main
- Last Pushed: 2025-11-12T17:41:47.000Z (5 months ago)
- Last Synced: 2025-11-12T19:21:41.253Z (5 months ago)
- Topics: a-star-algorithm, algorithm, betweenness-centrality, breadth-first-search, clique, cliques, closeness-centrality, depth-first-search, dijkstra-algorithm, directed-graph, graph, greedy-best-first-search, kruskal-algorithm, label-propagation, minimum-spanning-tree, prim-algorithm, python, topological-sorting, undirected-graph
- Language: Jupyter Notebook
- Homepage:
- Size: 1.76 MB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# Graph-Analysis
Programs related to the field of **Graph Analysis** are expessed here. More on Graph Analysis is at the following link:
https://www.pinterest.com/HamedShahHosseini/graph-analysis/
## History:
**Graph theory** is said to be originated by the work of **Leonard Euler** when dealing by the **Konigsberg bridges problem**. He abstracted the lands (four) by nodes and the bridges (seven) by edges.
- However, Islamic Scholars (9th–14th centuries) used tree diagrams for lineage and knowledge organization.
The **Konigsberg bridges problem** asks to start at any land areas of the city, walk over each of the seven bridges exactly once, and return to the starting point. Euler proved that there is no solution for this problem.
## Topics:
1) **What is a graph?** A graph is a mathematical structure to represent *relationships* between *objects*. Objects are **nodes** (vertices), and relationships are **edges** (links).
Here, we review the graph definition, and discuss **undirected** graphs versus **directed** graphs with examples in Python.
2) **Graph: Adjacency matrix** For a graph with *n* nodes, the **adjacency matrix** is of size *n*-by-*n*. For **simple graphs**, the entries of the adjacency matrix is composed of zeros and ones. For undirected graphs, this matrix is *symmetric*.
Here, we give the Python code for defining the adjacency matrix for different types of graphs.
3) **Graph: Node degree** The **degree** of a node in graph is a simple concept which is quite useful in analyzing and understanding the graph. The degree of a node is the number of edges connected to the node. The definition of node degree for directed graphs includes in-degrees and out-degrees.
We have some interesting formulae regarding the node degrees. For example, the **degree sum formula** (handshaking lemma) states that the sum of degrees of nodes in a graph is twice the number of the edges of the graph.
Here, we review the concept of degrees in graphs by examples in Python.
4) **Adjacency list:** The adjacency list is another way to represent graphs. It is a data structure holding a list of nodes of a graph such that each node in the list stores its neighbors. Adjacency lists are particularly efficient for **sparse graphs** such as social networks, web graphs, and biological networks.
Here, we implement a Python class for a graph which holds the adjacency list of the graph in a *defaultdict*. We build two graphs by our class and show their adjacency lists. As a bonus, a Python function has also been defined to convert our graph to that of *NetworkX*.
5) **Walk, trail, path, cycle (circuit), and Eulerian trail/circuit:** The concepts that define the sequences of nodes and edges within a graph are reviewed here. The conditions for a graph to have an Eulerian trail or circuit (cycle) are also mentioned.
The **Python** code of this NoteBook contains a small **graph analyzer** implemented from scratch for walks, trails, paths, cycles, and Eulerian trail or circuit.
6) **Subgraph:** A **subgraph** is a graph formed from a subset of the nodes and edges of a larger graph. A subgraph preserves the connections from the original graph, but may contain fewer nodes and edges.
Here, we review the concept of subgraph and some of its types. Also, we provide a *Python* function from scratch to check if a graph is a subgraph of another graph. We also use some functions available by *NetworkX* too, for subgraphs.
7) **Clique:** A **clique** is subset of nodes in a graph, which for its every two distinct nodes, there is an edge in the graph. In other words, a clique forms a **complete** subgraph.
In this Notebook, we review and implement the **Bron-Kerbosch** algorithm from scratch to find all maximal cliques in a given undirected graph. Moreover, we use the function **find_cliques** in *NetworkX* to compare with our results.
8) **Trees:** A **tree** is an **undirected** graph which is **connected** and **acyclic**. Or you may say a tree is an undirected connected graph with n nodes which has n-1 edges.
Here, we review trees in graph theory and express different kinds of them. Then, we bring some Python code to build trees and do some operations on them. As a bonus, we also create a simple decision tree for medical diagnosis.
9) **Depth-first search:** A **depth-first search** (**DFS**) is a search algorithm that is used to traverse graphs or trees. A DFS can be implemented in **recursive** way or **iterative** way.
Here, we implement both the *recursive* and *iterative* versions, and test them by a given graph. We also implement a DFS that finds a path between two nodes, and extend it to find all paths between two nodes.
10) **Topological sorting (ordering)** When we have a **DAG** (directed acyclic graph), we can form a **linear order** of nodes such that for every directed edge (u → v), node u comes before node v in the ordering. A DAG is a **directed** graph that has **no cycles**. Topological ordering is only possible for DAGs.
Here, we give two methods to create such orderings for DAGs: a **depth-first search** (DFS) method, and **Kahn's algorithm**. We bring both methods in Python codes. One feature of Kahn's algorthm is that it can report cycles in the given directed graph if such cycles exist.
As a bonus, we have also expressed a DFS-based method for checking if a directed graph is a DAG or not.
11) **Breadth-first search:** Breadth-first search (BFS) is another **uninformed** search algorithm, which is used for traversing graphs (or trees). BFS explores all nodes at the current depth level before moving on to nodes at the next depth level. It employs a **queue** to keep the order of exploration.
Here, we implement a BFS in Python code and use it for exploring nodes of a graph. We also implement the BFS for finding the **shortest path** from a start node to an end node. As a bonus, we use a BFS to compute the **degrees-of-separation** between two persons on a small *social network*.
12) **Dijkstra's algorithm:** This algorithm finds the shortest paths from a single **source** node to all other nodes in a **weighted** graph with **non-negative** edge weights. It uses a **greedy** approach, repeatedly visiting the closest unvisited node and updating distances to its neighbors. The algorithm guarantees **optimal** shortest paths and completes in O((|V|+|E|) log|V|) time with a **priority queue** (for graph G=(V,E)).
Here, we implement Dijkstra's algorithm from scratch in Python with the use of a priority queue. The priority queue is created by a min-heap. A simple graph is given to the algorithm to test it. Also, as a bonus, we use the Dijkstra's algorithm to find the strongest friendship path between two users in a social network.
13) **A\* algorithm:** A\* algorithm is another **informed** search algorithm that finds the shortest path from a `start` node to a `goal` node. It may be viewed as an extention to **Dijkstra's** algorithm. A\* algorithm uses the **evaluation** function `f(n)=g(n)+h(n)`, where `g(n)` is the actual distance from the start node to current node `n`, and `h(n)` is the **heuristic** which estimates the distance from the current node `n` to the `goal` node. If the heuristic function `h(.)` never overestimates the actual distance to goal, then the A\* algorithm will find the **optimal** path. Such a heuristic is called **admissible**.
Here, we implement A\* algorithm from scratch in Python, and test it with a simple grid world.
14) **Greedy best-first search:** The **GBFS** is another **informed** search algorithm that relies solely on a **heuristic** function, say `h(.)`, to explore nodes. Thus, the **evaluation** function will be `f(n)=h(n)`.
The GBFS is a fast algorithm that may not find the optimal path. The GBFS is implemented here from scratch in **Python**, and then is tested by a grid graph having some obstacles.
15) **Prim's algorithm for MST**: Prim's algorithm is used to find a **minimum spanning tree** (**MST**) of a weighted undirected graph. A graph may have more than one MST. Prim's algorithm is a node-based MST algorithm. It means that Prim's algorithm starts from a random node and grows it as a tree by always adding cheapest edge that connects a node inside the tree to a node outside the tree.
Here, we implement Prim's algorithm from scratch in Python using a binary min-heap as a priority queue. We then apply the algorithm to a simple graph. The original graph and its MST are shown by **NetworkX**.
16) **Kruskal's algorithm for MST:** Kruskal's algorithm is an edge-based algorithm for obtaining the **minimum spanning tree** (**MST**) of a weighted undirected graph. In this algorithm, we first sort edges based on their weight. Then, we add edges one by one until we get `n-1` edges for a graph with `n` nodes such that none of the edges creates cycle.
Here, we give the Python code of Kruskal's algorithm from scratch using **union-find** data structure. Then, we use two example graphs to test the algorithm.
17) **Label propagation:** Label propagation is a **community detection** algorithm. Label propagation discovers clusters (communities) in a graph or network by having nodes adopt the most frequent label among their neighbors, repeating this process until labels stabilize. There is no need to specifiy the number of communities. This algorithm is fast and can work well on large-scale graphs (or networks).
We implement label propagation from scratch in Python. Then, we use it with a simple manual graph, and with the Zachary's karate club network.
It should be reminded that label propagation might report different communities in different runs; because it is based on randomness, communities sometimes overlap, and networks may have ambiguous nodes that could belong to more than one community.
18) **What is modularity?** **Modularity** measures how well a network divides into communities by comparing actual connections to random expectations. Its range is from: -0.5 to 1.0. High values indicates strong community structure while *zero* means it is no better than random.
In this notebook, we implement the function to compute modularity from **adjacency matrix**. We also give a function to compute modularity from **adjacency list**. Some graphs along different community assignments are given to the function to see how modularity reflects community partitions in numerical values.
19) **Degree centrality:** This the the simplest and straightforward measure for **node's importance**. Degree centrality counts the number of connections of a node in an *unweighted* graph (which is the node degree). In a *weighted* graph, degree centrality of a node is the sum of the edge weights connected to the node. It is reminded that we talked about node degree in an earleir post.
Here, we bring the Python code of degree centrality for both unweighted and weighted graphs. We also mention the **normalization** of degree centrallity. A few examples accompany the Notebook file.
20) **Betweenness centrality:** Betweenness centrality measures how often a node lies on the shortest paths between other nodes in a network. A node with high betweenness means that it acts as bridges or bottlenecks that control information flow. Here, we implement functions to compute betweenness centrality for **unweighted** graphs and **weighted** graphs. We test these two functions by some examples.
It is reminded that the function to compute betweeness for unweighted graphs uses a **breadth-first search** (BFS), while for weighted graphs, it uses **Dijkstra's algorithm**.
21) **Closeness centrality:** Another metric for centrality is the **closeness centrality** in which we measure how close a node is to all other nodes in a network or graph. It calculates the reciprocal of the sum of shortest path distances from a node to all others. Nodes with high closeness can quickly interact with or reach the rest of the network, making them centrally positioned for efficient communication or influence spread.
Here, we implement the functions to compute closeness centrality for both **unweighted** and **weighted** graphs. A few examples are also included. It is noted that for **disconnected** graphs, the **harmonic** closeness is a better choice than the standard closeness.
22) **Eigenvector centrality:** Eigenvector centrality measures a node's *influence* (*importance*) by assigning higher scores to nodes connected to other high-scoring nodes—capturing the idea that *importance comes from being connected to important neighbors*.
Here, we implement **power iteration** algorithm for estimating the eigenvector centrality both for **unweighted** and **weighted** graphs in Python. We test the algorithm by a few examples of graphs.
23) **PageRank:** PageRank is a *link analysis* algorithm. it was primarily proposed to score web pages in the *World Wide Web* (or any *webgraph*). Here, every outgoing link is like a vote that has a value proportional to the rank of the node divided by the number of its outgoing links. Then, a node is important if important nodes link to it.
We express the PageRank algorithm from scratch in Python. Also, we test the algorithm with a few **directed** graphs.