https://github.com/fareedkhan-dev/github-social-network-analysis

This project is done using Python library NetworkX, which helps us to analyze the GitHub Social Network Data.
https://github.com/fareedkhan-dev/github-social-network-analysis
gephi github graph network networkx social-network social-network-analysis
Last synced: 7 days ago
JSON representation
This project is done using Python library NetworkX, which helps us to analyze the GitHub Social Network Data.
Host: GitHub
URL: https://github.com/fareedkhan-dev/github-social-network-analysis
Owner: FareedKhan-dev
Created: 2022-10-02T03:48:54.000Z (over 3 years ago)
Default Branch: main
Last Pushed: 2022-10-04T14:49:14.000Z (over 3 years ago)
Last Synced: 2025-06-21T09:05:27.314Z (8 months ago)
Topics: gephi, github, graph, network, networkx, social-network, social-network-analysis
Language: Jupyter Notebook
Homepage:
Size: 4.62 MB
Stars: 3
Watchers: 2
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: readme.md
Awesome Lists containing this project

README

          # SNA Group Assignment

Importing Libraries

```python

import networkx as nx

import pandas as pd

import matplotlib.pyplot as plt

```

Importing dataset

```python

users = pd.read_csv('musae_git_target.csv')

```

Reading Graphs

```python

Data = open('musae_git_edges.csv', "r")

next(Data, None)  # skip the first line in the input file

followers = nx.parse_edgelist(Data, delimiter=',', create_using=nx.Graph(), nodetype=int)

```

Finding Local Cluster (each nodes clustering coefficient)

```python

local_cluster = nx.clustering(followers)

sorted_local_cluster = {k: v for k, v in sorted(local_cluster.items(), key=lambda item: item[1])}

sorted_local_cluster

```

    {0: 0,

     34526: 0,

     34035: 0,

     6067: 0,

     ...

     33455: 0,

     13958: 0,

     7764: 0,

     ...}

Global Clustering with count zeroes (default)

```python

global_cluster = nx.average_clustering(followers, count_zeros=True)

global_cluster

```

    0.16753704480107323

eccentricity (shortest path with maximum number of nodes)

```python

# import pickle

# with open('eccentricity.pkl', 'rb') as f:

#     eccentricity = pickle.load(f)

eccentricity = nx.eccentricity(followers)

sorted_eccentricity = {k: v for k, v in sorted(eccentricity.items(), key=lambda item: item[1])}

sorted_eccentricity = list(dict(sorted(eccentricity.items(), key=lambda item: item[1], reverse=True)).items())

# get index (node number) and value (node eccentricity value) top 10 after sorting

sorted_eccentricity_indexes = [x[0] for x in sorted_eccentricity]

sorted_eccentricity_values = [x[1] for x in sorted_eccentricity]

# Creating dataframe

top_sorted_eccentricity = pd.DataFrame({'Name':users.iloc[sorted_eccentricity_indexes].name.tolist(), 

                                      'Eccentricity': sorted_eccentricity_values, 

                                      'ml_target':users.iloc[sorted_eccentricity_indexes].ml_target.tolist()})

top_sorted_eccentricity.groupby('Eccentricity').count()['ml_target'].sort_values(ascending=False)

# top_sorted_eccentricity

```

    Eccentricity

    7     27399

    8      8677

    6      1113

    9       480

    10       27

    11        4

    Name: ml_target, dtype: int64

Radius of Followers Graph

```python

radius_of_graph = nx.radius(followers)

radius_of_graph

```

    6

    

Diameter of Followers Graph

```python

diameter_of_graph = nx.diameter(followers)

diameter_of_graph

```

    11

    

For verification diameter=maximum eccentricity

Density of follower

```python

density_of_graph = nx.density(followers)

density_of_graph

```

    0.0004066878203117068

Degree Distribution upto degree value 50 

```python

degree_distrubution = nx.degree_histogram(followers)

fig = plt.figure(figsize=(15, 10))

x = [i for i in range(0, 50)]

plt.bar(x, degree_distrubution[0:50])

plt.show()

```

    

![png](GitHub%20Social%20Network%20Code%20File_21_0.png)

    

Connected Components

```python

print(nx.is_connected(followers))

print(nx.number_connected_components(followers))

```

    True

    1

    

Average Path Length

```python

average_short_path_length = nx.average_shortest_path_length(followers)

average_short_path_length

```

    3.2464090056353823

_______________

```python

# Creating dataframe

single_value_calc = pd.DataFrame({'Radius': [radius_of_graph], 'Diameter': [diameter_of_graph], 'Density':[density_of_graph], 

                                  'Connected Component': [nx.number_connected_components(followers)],

                                  'Average Path Length': [average_short_path_length]})

# single_value_calc = pd.DataFrame({'Radius': [6], 'Diameter': [11],  'Density': [0.0004066878203117068], 

#                                   'Connected Component': [1], 'Average Path Length': [3.2464090056353823]})

single_value_calc.rename(index={0:'Overall Graph Measures'})

```



    .dataframe tbody tr th:only-of-type {

        vertical-align: middle;

    }

    .dataframe tbody tr th {

        vertical-align: top;

    }

    .dataframe thead th {

        text-align: right;

    }

  

    

      

      Radius

      Diameter

      Density

      Connected Component

      Average Path Length

    

  

  

    

      Overall Graph Measures

      6

      11

      0.000407

      1

      3.246409

    

  



```python

local_clustering_sort = list((k,v) for k, v in sorted(local_cluster.items(), key=lambda item: item[0], reverse=True))

eccentricity_sort = list((k,v)for k, v in sorted(eccentricity.items(), key=lambda item: item[0], reverse=True))

# get index (node number) and value (node centrality value) top 10 after sorting

sorted_indexes = [x[0] for x in local_clustering_sort]

local_values_sort = [x[1] for x in local_clustering_sort]

eccentricity_values_sort = [x[1] for x in eccentricity_sort]

# Creating dataframe

eccentricity_and_local_cluster = pd.DataFrame({'Name':users.iloc[sorted_indexes].name.tolist(), 

                                      'Eccentricity': eccentricity_values_sort, 

                                      'Local Clustering': local_values_sort})

eccentricity_and_local_cluster

```



    .dataframe tbody tr th:only-of-type {

        vertical-align: middle;

    }

    .dataframe tbody tr th {

        vertical-align: top;

    }

    .dataframe thead th {

        text-align: right;

    }

  

    

      

      Name

      Eccentricity

      Local Clustering

    

  

  

    

      0

      caseycavanagh

      7

      0.333333

    

    

      1

      Injabie3

      8

      0.000000

    

    

      2

      qpautrat

      7

      0.000000

    

    

      3

      kris-ipeh

      7

      0.000000

    

    

      4

      shawnwanderson

      8

      0.000000

    

    

      ...

      ...

      ...

      ...

    

    

      37695

      sunilangadi2

      8

      0.000000

    

    

      37696

      SuhwanCha

      7

      0.000000

    

    

      37697

      JpMCarrilho

      8

      0.000000

    

    

      37698

      shawflying

      8

      0.178571

    

    

      37699

      Eiryyy

      8

      0.000000

    

  

37700 rows × 3 columns



___________

### Centrality Measures

Degree Centrality

```python

# Applying degree centrality in NetworX

deg_centrality = nx.degree_centrality(followers)

# Applying degree centrality in NetworX

deg_centrality = nx.degree_centrality(followers)

# Sorting degree centrality and getting top 10

sorted_deg_centrality = list(dict(sorted(deg_centrality.items(), key=lambda item: item[1], reverse=True)).items())

# get index (node number) and value (node centrality value) top 10 after sorting

sorted_deg_centrality_indexes = [x[0] for x in sorted_deg_centrality]

sorted_deg_centrality_values = [x[1] for x in sorted_deg_centrality]

# Creating dataframe

top_degree_centrality = pd.DataFrame({'Name':users.iloc[sorted_deg_centrality_indexes].name.tolist(), 

                                      'Degree Centrality': sorted_deg_centrality_values, 

                                      'ml_target':users.iloc[sorted_deg_centrality_indexes].ml_target.tolist()})

top_degree_centrality

```



    .dataframe tbody tr th:only-of-type {

        vertical-align: middle;

    }

    .dataframe tbody tr th {

        vertical-align: top;

    }

    .dataframe thead th {

        text-align: right;

    }

  

    

      

      Name

      Degree Centrality

      ml_target

    

  

  

    

      0

      dalinhuang99

      0.250882

      0

    

    

      1

      nfultz

      0.187936

      0

    

    

      2

      addyosmani

      0.088172

      0

    

    

      3

      Bunlong

      0.078464

      0

    

    

      4

      gabrielpconceicao

      0.065466

      0

    

    

      ...

      ...

      ...

      ...

    

    

      37695

      chrisryancarter

      0.000027

      0

    

    

      37696

      kcakdemir

      0.000027

      1

    

    

      37697

      chadmazilly

      0.000027

      0

    

    

      37698

      orionblastar

      0.000027

      1

    

    

      37699

      jovanidash21

      0.000027

      0

    

  

37700 rows × 3 columns



Closeness Centrality

```python

# Loading save json of closeness centrality output

# import pickle

# with open('closness_centrality.pkl', 'rb') as f:

#     closeness_centrality = pickle.load(f)

# Applying closeness centrality in NetworX

closeness_centrality = nx.closeness_centrality(followers)

# Sorting degree centrality and getting top 10

sorted_closness_centrality = list(dict(sorted(closeness_centrality.items(), key=lambda item: item[1], reverse=True)).items())

# get index (node number) and value (node centrality value) top 10 after sorting

sorted_closeness_centrality_indexes = [x[0] for x in sorted_closness_centrality]

sorted_closeness_centrality_values = [x[1] for x in sorted_closness_centrality]

# Creating dataframe

top_closeness_centrality = pd.DataFrame({'Name':users.iloc[sorted_closeness_centrality_indexes].name.tolist(), 

                                      'Closeness Centrality': sorted_closeness_centrality_values, 

                                      'ml_target':users.iloc[sorted_closeness_centrality_indexes].ml_target.tolist()})

top_closeness_centrality

```



    .dataframe tbody tr th:only-of-type {

        vertical-align: middle;

    }

    .dataframe tbody tr th {

        vertical-align: top;

    }

    .dataframe thead th {

        text-align: right;

    }

  

    

      

      Name

      Closeness Centrality

      ml_target

    

  

  

    

      0

      nfultz

      0.523081

      0

    

    

      1

      dalinhuang99

      0.517787

      0

    

    

      2

      Bunlong

      0.466324

      0

    

    

      3

      addyosmani

      0.450342

      0

    

    

      4

      gabrielpconceicao

      0.447461

      0

    

    

      ...

      ...

      ...

      ...

    

    

      37695

      haochenli

      0.155371

      0

    

    

      37696

      abhishekpopli

      0.153934

      1

    

    

      37697

      scandeiro

      0.147439

      0

    

    

      37698

      jazzchipc

      0.145151

      0

    

    

      37699

      SOUMAJYOTI

      0.141389

      1

    

  

37700 rows × 3 columns



Betweenness Centrality

```python

# Loading save json of closeness centrality output

# import pickle

# with open('betweeness_centrality.pkl', 'rb') as f:

#     betweeness_centrality = pickle.load(f)

# Applying betweeness centrality in NetworX

betweeness_centrality = nx.betweenness_centrality(followers)

# Sorting degree centrality and getting top 10

sorted_between_centrality = list(dict(sorted(betweeness_centrality.items(), key=lambda item: item[1], reverse=True)).items())

# get index (node number) and value (node centrality value) top 10 after sorting

sorted_between_centrality_indexes = [x[0] for x in sorted_between_centrality]

sorted_between_centrality_values = [x[1] for x in sorted_between_centrality]

# Creating dataframe

top_between_centrality = pd.DataFrame({'Name':users.iloc[sorted_between_centrality_indexes].name.tolist(), 

                                      'Betweeness Centrality': sorted_between_centrality_values, 

                                      'ml_target':users.iloc[sorted_between_centrality_indexes].ml_target.tolist()})

top_between_centrality

```



    .dataframe tbody tr th:only-of-type {

        vertical-align: middle;

    }

    .dataframe tbody tr th {

        vertical-align: top;

    }

    .dataframe thead th {

        text-align: right;

    }

  

    

      

      Name

      Betweeness Centrality

      ml_target

    

  

  

    

      0

      dalinhuang99

      0.269599

      0

    

    

      1

      nfultz

      0.240541

      0

    

    

      2

      Bunlong

      0.055323

      0

    

    

      3

      addyosmani

      0.043408

      0

    

    

      4

      gabrielpconceicao

      0.035337

      0

    

    

      ...

      ...

      ...

      ...

    

    

      37695

      chrisryancarter

      0.000000

      0

    

    

      37696

      kcakdemir

      0.000000

      1

    

    

      37697

      chadmazilly

      0.000000

      0

    

    

      37698

      orionblastar

      0.000000

      1

    

    

      37699

      jovanidash21

      0.000000

      0

    

  

37700 rows × 3 columns



```python

plt.subplot(1,3,1)

plt.xlabel('Centrality Values')

plt.title('Degree Centrality')

top_degree_centrality['Degree Centrality'].plot.hist(figsize=(30, 5), ylim=(0,200))

plt.subplot(1,3,2)

plt.xlabel('Centrality Values')

plt.title('Closeness Centrality')

top_closeness_centrality['Closeness Centrality'].plot.hist(figsize=(30, 5))

plt.subplot(1,3,3)

plt.xlabel('Centrality Values')

plt.title('Betweeness Centrality')

top_between_centrality['Betweeness Centrality'].plot.hist(figsize=(30, 5))

plt.show()

```

    

![png](GitHub%20Social%20Network%20Code%20File_37_0.png)

    

Visualizing Highest Degree Centrality Node in Graph

```python

followers_network = pd.read_csv('musae_git_edges.csv')

users = pd.read_csv('musae_git_target.csv')

nodes_with_deg_gret_100 = dict((k, v) for k, v in dict(followers.degree).items() if v > 50)

# Creating a function to check if id_1 or id_2 is ML or Web developer

def check_if_exist(id):

    if id in list(nodes_with_deg_gret_100.keys()):

        return 1

    else:

        return 0

followers_network['first_id'] = followers_network['id_1'].apply(check_if_exist)

# followers_network['second_id'] = followers_network['id_2'].apply(check_if_exist)

followers_test = followers_network[(followers_network['first_id'] == 1) & (followers_network['id_2'] == 31890)]

G = nx.from_pandas_edgelist(followers_test, 'id_1', 'id_2')

# Visualizing the highest degree centrality node regin in Graph

fig = plt.figure(figsize=(10, 10))

plt.margins(0,0)

nodes_sizes = []

nodes_color = []

for each in list(G.nodes):

    if each == sorted_deg_centrality[0][0]:

        nodes_sizes.append(4000)

        nodes_color.append('#f02b79')

    elif each == sorted_deg_centrality[1][0]:

        nodes_sizes.append(1000)

        nodes_color.append('#f02b79')

    else:

        nodes_sizes.append(20)

        nodes_color.append('#02A4DE')

print('Highest Degree Centrality Node Lies in Red Shaded Region')

nx.draw(G, node_size=nodes_sizes, node_color=nodes_color, edge_color='#dce8e0')

```

    Highest Degree Centrality Node Lies in Red Shaded Region

    

    

![png](GitHub%20Social%20Network%20Code%20File_39_1.png)

    

Visualizing Two Highest Closeness Centrality Node in Graph

```python

followers_network = pd.read_csv('musae_git_edges.csv')

users = pd.read_csv('musae_git_target.csv')

nodes_with_deg_gret_100 = dict((k, v) for k, v in dict(followers.degree).items() if v > 50)

# Creating a function to check if id_1 or id_2 is ML or Web developer

def check_if_exist(id):

    if id in list(nodes_with_deg_gret_100.keys()):

        return 1

    else:

        return 0

followers_network['first_id'] = followers_network['id_1'].apply(check_if_exist)

# followers_network['second_id'] = followers_network['id_2'].apply(check_if_exist)

followers_test = followers_network[(followers_network['first_id'] == 1) & (followers_network['id_2'] == 31890)]

G = nx.from_pandas_edgelist(followers_test, 'id_1', 'id_2')

# Visualizing the highest degree centrality node regin in Graph

fig = plt.figure(figsize=(10, 10))

plt.margins(0,0)

nodes_sizes = []

nodes_color = []

for each in list(G.nodes):

    if each == sorted_closness_centrality[0][0]:

        nodes_sizes.append(4000)

        nodes_color.append('#f02b79')

    elif each == sorted_closness_centrality[1][0]:

        nodes_sizes.append(1000)

        nodes_color.append('#f02b79')

    else:

        nodes_sizes.append(20)

        nodes_color.append('#02A4DE')

print('Highest Closeness Centrality Node Lies in Red Shaded Region')

nx.draw(G, node_size=nodes_sizes, node_color=nodes_color, edge_color='#dce8e0')

```

    Highest Closeness Centrality Node Lies in Red Shaded Region

    

    

![png](GitHub%20Social%20Network%20Code%20File_41_1.png)

    

Visualizing Two Betwenness Closeness Centrality Node in Graph

```python

followers_network = pd.read_csv('musae_git_edges.csv')

users = pd.read_csv('musae_git_target.csv')

nodes_with_deg_gret_100 = dict((k, v) for k, v in dict(followers.degree).items() if v > 50)

# Creating a function to check if id_1 or id_2 is ML or Web developer

def check_if_exist(id):

    if id in list(nodes_with_deg_gret_100.keys()):

        return 1

    else:

        return 0

followers_network['first_id'] = followers_network['id_1'].apply(check_if_exist)

# followers_network['second_id'] = followers_network['id_2'].apply(check_if_exist)

followers_test = followers_network[(followers_network['first_id'] == 1) & (followers_network['id_2'] == 31890)]

G = nx.from_pandas_edgelist(followers_test, 'id_1', 'id_2')

# Visualizing the highest degree centrality node regin in Graph

fig = plt.figure(figsize=(10, 10))

plt.margins(0,0)

nodes_sizes = []

nodes_color = []

for each in list(G.nodes):

    if each == sorted_between_centrality[0][0]:

        nodes_sizes.append(4000)

        nodes_color.append('#f02b79')

    elif each == sorted_between_centrality[1][0]:

        nodes_sizes.append(1000)

        nodes_color.append('#f02b79')

    else:

        nodes_sizes.append(20)

        nodes_color.append('#02A4DE')

print('Highest Betweenness Centrality Node Lies in Red Shaded Region')

nx.draw(G, node_size=nodes_sizes, node_color=nodes_color, edge_color='#dce8e0')

```

    Highest Betweenness Centrality Node Lies in Red Shaded Region

    

    

![png](GitHub%20Social%20Network%20Code%20File_43_1.png)
ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/fareedkhan-dev/github-social-network-analysis

Awesome Lists containing this project

README