https://github.com/ggeop/recommendation-systems-evaluation-metrics

Basic Recommendation & Evaluation System.
Host: GitHub
URL: https://github.com/ggeop/recommendation-systems-evaluation-metrics
Owner: ggeop
License: agpl-3.0
Created: 2018-05-21T21:54:15.000Z (over 6 years ago)
Default Branch: master
Last Pushed: 2019-02-02T12:04:52.000Z (almost 6 years ago)
Last Synced: 2024-12-06T19:01:55.681Z (27 days ago)
Topics: centrality-metrics, python3, recommender-system, social-network-analysis
Language: Jupyter Notebook
Homepage:
Size: 604 KB
Stars: 3
Watchers: 0
Forks: 1
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

![alt text](https://github.com/ggeop/Recommendation-System/blob/master/imgs/photo_cover-01.png)

## Datasets

### Original Dataset

The dataset we used consists of "circles" (or "friends lists") from Facebook. The Facebook data was collected from survey participants using a Facebook app. The dataset includes node features (profiles), circles, and ego networks. It consists of 4039 nodes (users) and 88234 edges (friendships between users).

The dataset was downloaded from https://snap.stanford.edu/data/egonets-Facebook.html. 

#### Transform the graph to undirected

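```{py}
import pandas as pd

# Load the edge list (one "node1 node2" pair per line)
data = pd.read_csv("facebook_combined.txt", sep=" ", header=None)

# Add column names
data.columns = ["node1", "node2"]

# Build the reversed edges by swapping the two columns
data2 = pd.concat([data.node2, data.node1], axis=1)

# Rename the columns so both frames share the same labels
data2.columns = ["node1", "node2"]

# Stack the original and the reversed edges row-wise
# (DataFrame.append was removed in pandas 2.0; pd.concat replaces it)
data = pd.concat([data, data2])

# Reset the index
data = data.reset_index(drop=True)
```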

In the original dataset the graph is connected, but each friendship is stored only once, as a single (node1, node2) row, so the edge list reads like a directed graph, that is, a graph whose edges have an orientation. Since every friendship counts for both nodes, we had to transform the edge list into an undirected graph, in which edges have no orientation, before performing the remaining tasks.

To achieve this, we only had to add the missing reverse edges: for every edge from node1 to node2 we added an edge from node2 to node1. We did this with the pandas library, using concat() to build a second frame with the two columns swapped.

This gives a new dataset in which the positions of the two existing columns (node1, node2) are exchanged. We then renamed its columns back to node1 and node2 before stacking the two frames by rows. The rename is necessary because the row-wise merge aligns on column labels: without it, the swapped values would simply be placed back under their original columns.

### Test Dataset

```
# Create a sample graph dataset
test_data = pd.DataFrame([[5, 2],
                          [9, 3],
                          [9, 11],
                          [3, 6],
                          [4, 6],
                          [5, 7],
                          [1, 11],
                          [6, 2],
                          [7, 9],
                          [8, 9],
                          [5, 11],
                          [6, 7],
                          [6, 11],
                          [7, 6],
                          [2, 11],
                          [11, 2],
                          [2, 5],
                          [2, 7],
                          [7, 2]],
                         columns=["node1", "node2"])
```

Next we created a dictionary with the id of each node as the key and the list of that node's friends as the value.

```
friendships = {}

# Create the friendships dict
for node in [1, 2, 3, 4, 5, 6, 7, 8, 9, 11]:
    # Create a list with the friends of the node
    ls = test_data[test_data.node1 == node]['node2'].tolist()

    # Store the list under the node id
    friendships[node] = ls

print(friendships)
```

The relationships are the following: 1: [11], 2: [5, 6, 7, 11], 3: [6, 9], 4: [6], 5: [2, 7], 6: [2, 3, 4, 7], 7: [2, 5, 6], 8: [9], 9: [3, 8], 11: [1, 2]
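Filtering the whole table once per node can become slow on the full graph. As a rough sketch of an alternative, the mapping can be built in a single pass over the edge rows. The helper name `build_friendships` is hypothetical and not part of the project, and this version stores sets instead of lists and adds both directions itself.

```python
from collections import defaultdict

def build_friendships(edges):
    """Map every node to the set of its friends in one pass (hypothetical helper)."""
    friends = defaultdict(set)
    for a, b in edges:
        # Record the friendship in both directions
        friends[a].add(b)
        friends[b].add(a)
    return dict(friends)

# A few rows of the sample edge list; with pandas, the rows could come from
# test_data.itertuples(index=False)
sample_edges = [(5, 2), (9, 3), (3, 6)]
friendships = build_friendships(sample_edges)
print(friendships)
```

Because both directions are added inside the loop, this variant does not need the edge list to be symmetrized first.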

After that we created a graphical representation of the above relationships.

![alt text](https://github.com/ggeop/Recommendation-Systems-Evaluation-Metrics/blob/master/imgs/SampleDatasetgGraphs.png)

## Local structure based methods for link prediction

### Recommending friends using Common neighbors (friend-of-friend (FoF) method)

The Common Neighbors method is one of the simplest techniques used for link prediction: two nodes are likely to form a link if they have many common neighbors.

This method of link prediction is also called friend-of-friend link prediction.

We created a function that takes as input the users, the dataset, and the target user for whom we want to make recommendations. First, we found all the friends of the target user.

Then, for every other user, we took the intersection between the target user's friends and that user's friends and recorded its size. Finally, we created a sorted list of the 10 recommended friends for the target user according to the Common Neighbors method, breaking ties by taking the smallest user id.
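The steps above can be sketched on the sample relationships listed earlier. This is a minimal illustration rather than the project's function: the name `common_neighbor_ranking` is hypothetical, and ties are broken explicitly by sorting on the pair (negative score, user id).

```python
# Friend lists copied from the sample relationships listed earlier
friendships = {
    1: [11], 2: [5, 6, 7, 11], 3: [6, 9], 4: [6], 5: [2, 7],
    6: [2, 3, 4, 7], 7: [2, 5, 6], 8: [9], 9: [3, 8], 11: [1, 2],
}

def common_neighbor_ranking(friendships, target, top_n=10):
    """Rank non-friends of `target` by number of shared friends (hypothetical helper)."""
    target_friends = set(friendships[target])
    scores = {}
    for other, friends in friendships.items():
        # Skip the target itself and users who are already friends with it
        if other == target or other in target_friends:
            continue
        scores[other] = len(target_friends & set(friends))
    # Highest score first; ties go to the smallest user id
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]

ranking = common_neighbor_ranking(friendships, target=5)
print(ranking)
```

For user 5, whose friends are 2 and 7, user 6 comes first with two shared friends, followed by user 11 with one; the remaining candidates share no friends and are ordered by id.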

```
def friendOfFriendScore(users, dataset, target):
    # Build the friendships dict: node -> list of its friends
    friendships = {}
    for node in users:
        friendships[node] = dataset[dataset.node1 == node]['node2'].tolist()

    # Number of common friends between the target and every other user
    inter = {}
    for j in friendships:
        # Skip the target itself and users who are already its friends
        if (target != j) and (target not in friendships[j]):
            intersection = len(set(friendships.get(target)).intersection(set(friendships.get(j))))
            inter[j] = intersection

    # Sort by score, descending. The sort is stable, so ties keep the order
    # of `users` (smallest ID first when `users` is ascending)
    lis = sorted(inter.items(), key=lambda value: value[1], reverse=True)

    # Final result: the top 10 (user, score) pairs
    return lis[0:10]
```

### Recommending friends using Jaccard coefficient

In 1901 Jaccard proposed a statistic to compare the similarity and diversity of sample sets.

It is the ratio of the number of common neighbors of nodes x and y to the number of all neighbors of x and y.

As a result, the Jaccard index prevents high-degree nodes from obtaining a high similarity score with other nodes merely because they have many neighbors.

Based on that, we created a function that again takes as input the users, the dataset, and the target user. First, we found the target user's friends as before. Then we computed the Jaccard coefficient by dividing the size of the intersection of the target user's friends and each other user's friends by the size of the corresponding union.

At the end we again had a sorted list of the 10 recommended friends for the target user, this time ranked by the Jaccard similarity coefficient, with ties broken by taking the smallest user id.
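To make the effect of the normalization concrete, here is a small sketch on the sample relationships listed earlier. The helper name `jaccard` is hypothetical. Users 2, 3 and 7 each share exactly one friend (user 6) with user 4, so Common Neighbors cannot separate them, while the Jaccard coefficient favors the candidate with the fewest friends.

```python
# Friend lists copied from the sample relationships listed earlier
friendships = {
    1: [11], 2: [5, 6, 7, 11], 3: [6, 9], 4: [6], 5: [2, 7],
    6: [2, 3, 4, 7], 7: [2, 5, 6], 8: [9], 9: [3, 8], 11: [1, 2],
}

def jaccard(friendships, x, y):
    """Jaccard coefficient of the friend sets of x and y (hypothetical helper)."""
    fx, fy = set(friendships[x]), set(friendships[y])
    union = fx | fy
    # Guard against two users with no friends at all
    return len(fx & fy) / len(union) if union else 0.0

# Candidates 2, 3 and 7 have 4, 2 and 3 friends respectively
scores = {y: jaccard(friendships, 4, y) for y in (2, 3, 7)}
print(scores)
```

The scores come out as 1/4 for user 2, 1/2 for user 3 and 1/3 for user 7, so user 3 is ranked first.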

```
def JaccardCoefficientScore(users, dataset, target):
    # Build the friendships dict: node -> list of its friends
    friendships = {}
    for node in users:
        friendships[node] = dataset[dataset.node1 == node]['node2'].tolist()

    # Jaccard coefficient between the target and every other user
    inter = {}
    for j in friendships:
        # Skip the target itself and users who are already its friends
        if (target != j) and (target not in friendships[j]):
            # Size of the union of the two friend sets
            union = len(set(friendships.get(target)).union(set(friendships.get(j))))

            # Avoid a zero denominator
            if union != 0:
                inter[j] = len(set(friendships.get(target)).intersection(set(friendships.get(j)))) / union

    # Sort by score, descending. The sort is stable, so ties keep the order
    # of `users` (smallest ID first when `users` is ascending)
    lis = sorted(inter.items(), key=lambda value: value[1], reverse=True)

    # Final result: the top 10 (user, score) pairs
    return lis[0:10]
```

### Recommending friends using Adamic/Adar function

The Adamic/Adar index, proposed by Adamic and Adar, scores a pair of nodes A and B by summing weights over the nodes that are connected to both A and B.

Again we created a function that takes as input the users, the dataset, and the target user. We found all the friends of the target user and, for every other user, took the intersection between the target user's friends and that user's friends. We then computed the Adamic/Adar measure by adding, for each common friend, the inverse of the logarithm of that friend's number of friends, so that common friends with few connections weigh more than highly connected ones.

At the end we again created a sorted list of the 10 recommended friends for the target user according to the Adamic/Adar measure, with ties broken by taking the smallest user id.
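Written as a formula, the measure reads as follows. The notation is introduced here for illustration only: Γ(x) denotes the set of friends of user x, and the logarithm is the natural logarithm, as computed by `np.log`.

```latex
\mathrm{AA}(x, y) = \sum_{z \,\in\, \Gamma(x) \cap \Gamma(y)} \frac{1}{\log \lvert \Gamma(z) \rvert}
```

For users 5 and 6 in the sample relationships listed earlier, the common friends are user 2 (four friends) and user 7 (three friends), so the score is 1/ln 4 + 1/ln 3 ≈ 1.63. A common friend with a single friend would give log 1 = 0 in the denominator, which is why such nodes have to be skipped.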

```
import numpy as np

def AdamicAdarFunctionScore(users, dataset, target):
    # Build the friendships dict: node -> list of its friends
    friendships = {}
    for node in users:
        friendships[node] = dataset[dataset.node1 == node]['node2'].tolist()

    # Adamic/Adar score between the target and every other user
    inter = {}
    for j in friendships:
        # Skip the target itself and users who are already its friends
        if (target != j) and (target not in friendships[j]):
            intersection = set(friendships.get(target)).intersection(set(friendships.get(j)))

            # Sum 1/log(degree) over the common friends
            score = 0
            for k in intersection:
                # Skip unknown nodes and degrees 0 or 1 (log(1) = 0)
                if (k in friendships) and len(friendships[k]) > 1:
                    score = score + 1 / np.log(len(friendships[k]))
            inter[j] = score

    # Sort by score, descending. The sort is stable, so ties keep the order
    # of `users` (smallest ID first when `users` is ascending)
    lis = sorted(inter.items(), key=lambda value: value[1], reverse=True)

    # Final result: the top 10 (user, score) pairs
    return lis[0:10]
```

### Recommending Friends with Leicht-Holme-Newman Index (Extra method)

Leicht, Holme, and Newman proposed a similarity measure based on local structure. It is the ratio of the number of common neighbors of nodes a and b to the product of the degrees of a and b.

We created a function that takes as input the users, the dataset, and the target user. As in the previous methods, we found the target user's friends and, for every other user, computed the size of the intersection between the target user's friends and that user's friends. We then calculated the degrees of the two compared users and divided the size of the intersection by their product. The result is again a sorted list of the 10 recommended friends for the target user according to the Leicht-Holme-Newman method, with ties broken by taking the smallest user id.
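As a formula, using notation introduced here for illustration (Γ(a) is the set of friends of user a and k_a its size, that is, the degree of a):

```latex
\mathrm{LHN}(a, b) = \frac{\lvert \Gamma(a) \cap \Gamma(b) \rvert}{k_a \, k_b},
\qquad k_a = \lvert \Gamma(a) \rvert
```

In the sample relationships listed earlier, user 4 (degree 1) shares one friend with user 3 (degree 2) and one with user 2 (degree 4), giving scores of 1/2 and 1/4 respectively.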

```
def LeichtHolmeNewmanScore(users, dataset, target):
    # Build the friendships dict: node -> list of its friends
    friendships = {}
    for node in users:
        friendships[node] = dataset[dataset.node1 == node]['node2'].tolist()

    # Leicht-Holme-Newman score between the target and every other user
    inter = {}
    for j in friendships:
        # Skip the target itself and users who are already its friends
        if (target != j) and (target not in friendships[j]):
            intersection = len(set(friendships.get(target)).intersection(set(friendships.get(j))))

            # Degrees (number of friends) of j and of the target
            k1 = len(friendships.get(j))
            k2 = len(friendships.get(target))

            # Avoid a zero denominator
            if k1 != 0 and k2 != 0:
                inter[j] = intersection / (k1 * k2)

    # Sort by score, descending. The sort is stable, so ties keep the order
    # of `users` (smallest ID first when `users` is ascending)
    lis = sorted(inter.items(), key=lambda value: value[1], reverse=True)

    # Final result: the top 10 (user, score) pairs
    return lis[0:10]
```

## Evaluation of the recommendation system

### Compute the average similarity

```
import numpy as np

# Create users list (node IDs 0..4038, 4039 nodes in total)
users = list(range(0, 4039))

# Initialization
s1 = []
s2 = []
s3 = []

for i in range(100, 4100, 100):
    # Run the functions and keep only the recommended user IDs,
    # since the scores of different methods are not comparable
    fofList = [u for u, _ in friendOfFriendScore(users, data, i)]
    JaccardList = [u for u, _ in JaccardCoefficientScore(users, data, i)]
    AdamicAdarList = [u for u, _ in AdamicAdarFunctionScore(users, data, i)]

    # Similarity percentage of FoF and Jaccard (lists of 10, so count * 10)
    s1.append(len(set(fofList).intersection(set(JaccardList))) * 10)

    # Similarity percentage of FoF and Adamic/Adar
    s2.append(len(set(fofList).intersection(set(AdamicAdarList))) * 10)

    # Similarity percentage of Jaccard and Adamic/Adar
    s3.append(len(set(AdamicAdarList).intersection(set(JaccardList))) * 10)

# Average similarity (%)
print("The average similarity of FoF & Jaccard is:", np.mean(s1), "%")
print("The average similarity of FoF & Adamic Adar is:", np.mean(s2), "%")
print("The average similarity of Adamic Adar & Jaccard is:", np.mean(s3), "%")
```

After a few minutes of computation, the results are:

- The average similarity of FoF & Jaccard is: 55.5 %
- The average similarity of FoF & Adamic/Adar is: 90.75 %
- The average similarity of Adamic/Adar & Jaccard is: 57.0 %
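The similarity figure is the share of user ids that two recommendation lists have in common. A minimal sketch of that measure follows. The helper name `overlap_percent` and the two input lists are hypothetical, made up only to show the arithmetic, and unlike the code above this version normalizes by the list length instead of assuming exactly 10 entries.

```python
def overlap_percent(list_a, list_b):
    """Percentage of user ids shared by two lists of (user, score) pairs."""
    ids_a = {user for user, _score in list_a}
    ids_b = {user for user, _score in list_b}
    # Normalize by the longer list so that short lists are handled too
    size = max(len(list_a), len(list_b))
    return 100 * len(ids_a & ids_b) / size if size else 0.0

# Hypothetical top-10 lists: ids 0..9 and ids 3..12 share the seven ids 3..9
list_one = [(i, 10 - i) for i in range(10)]
list_two = [(i, 1.0 / (i + 1)) for i in range(3, 13)]
print(overlap_percent(list_one, list_two))
```

Comparing ids rather than whole (user, score) pairs matters here, because two methods can recommend the same user with different scores.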

### Forecast Recommendations

#### Evaluation Function

At this stage we estimate the quality of the recommendation methods. We created a function, evaluationFunction(), that measures how well each method recovers a known friendship. In more detail, we pass in two users who are already friends in our network (F1 and F2), and the function removes their connection from the dataset. After the connection is dropped, the function checks, for every method, whether F2 appears in the recommendation list of F1 and whether F1 appears in the recommendation list of F2. If either user is missing from the other's recommendation list, we exclude this relationship from the evaluation.

#### Score Calculation

The score for each method is derived from the position of the removed friend in the recommendation list: the first position scores 10, the second 9, and so on. We then take the average of the two scores obtained for F1 and F2. The higher the score, the higher the quality of the method.
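The position-based score can be sketched in isolation. The helper names `position_score` and `pair_score` and the two recommendation lists are hypothetical and only illustrate the arithmetic; the lists hold plain user ids, best recommendation first.

```python
def position_score(recommendations, friend, list_size=10):
    """Return list_size minus the 0-based position of `friend`, or None if absent."""
    if friend not in recommendations:
        return None
    return list_size - recommendations.index(friend)

def pair_score(recs_for_f1, recs_for_f2, f1, f2):
    """Average the two position scores; None if either friend was not recommended."""
    s1 = position_score(recs_for_f1, f2)
    s2 = position_score(recs_for_f2, f1)
    if s1 is None or s2 is None:
        return None
    return (s1 + s2) / 2

# Hypothetical lists: user 42 is 2nd for user 1, user 1 is 4th for user 42
recs_for_1 = [7, 42, 3]
recs_for_42 = [5, 8, 9, 1]
print(pair_score(recs_for_1, recs_for_42, f1=1, f2=42))
```

Here the two position scores are 9 and 7, so the pair scores 8.0; a pair in which one user is missing from the other's list yields None and is left out of the average.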

```
# friendOfFriend, JaccardCoefficient and AdamicAdarFunction are not shown in
# this README; the code assumes they return plain lists of recommended user IDs
def evaluationFunction(dataset, users, F1, F2):
    #### Remove the relationship

    # First we find the connection F1-F2
    l1 = dataset[dataset.node2 == F1].index
    l2 = dataset[dataset.node1 == F2].index
    rm1 = set(l1).intersection(set(l2))

    # Then we find the connection F2-F1
    l1 = dataset[dataset.node2 == F2].index
    l2 = dataset[dataset.node1 == F1].index
    rm2 = set(l1).intersection(set(l2))

    # We create the union
    rm = rm1.union(rm2)

    # Remove the rows in the set rm
    for i in rm:
        dataset = dataset.drop(i)

    ### FoF (friend-of-friend)
    if (F1 in friendOfFriend(users, dataset, F2)) and (F2 in friendOfFriend(users, dataset, F1)):
        # Score of F2 in the recommendations for F1
        Friend1 = 10 - friendOfFriend(users, dataset, F1).index(F2)
        # Score of F1 in the recommendations for F2
        Friend2 = 10 - friendOfFriend(users, dataset, F2).index(F1)
        # Compute the score
        scoreFoF = (Friend1 + Friend2) / 2
    else:
        return None

    ### Jaccard
    if (F1 in JaccardCoefficient(users, dataset, F2)) and (F2 in JaccardCoefficient(users, dataset, F1)):
        # Score of F2 in the recommendations for F1
        Friend1 = 10 - JaccardCoefficient(users, dataset, F1).index(F2)
        # Score of F1 in the recommendations for F2
        Friend2 = 10 - JaccardCoefficient(users, dataset, F2).index(F1)
        # Compute the score
        scoreJaccard = (Friend1 + Friend2) / 2
    else:
        return None

    ### AdamicAdar
    if (F1 in AdamicAdarFunction(users, dataset, F2)) and (F2 in AdamicAdarFunction(users, dataset, F1)):
        # Score of F2 in the recommendations for F1
        Friend1 = 10 - AdamicAdarFunction(users, dataset, F1).index(F2)
        # Score of F1 in the recommendations for F2
        Friend2 = 10 - AdamicAdarFunction(users, dataset, F2).index(F1)
        # Compute the score
        scoreAdamicAdar = (Friend1 + Friend2) / 2
    else:
        return None

    return (scoreFoF, scoreJaccard, scoreAdamicAdar)
```

#### Iteration Function

To get more reliable results, we should run the evaluation more than once. So we created the function finalScore(), which calls the evaluation function many times. Specifically, it takes a random index from the original dataset and calls the evaluation function to produce a score for that relationship; the outputs of the evaluation function are stored in a list. Finally, after all the repetitions, we calculate the average score for each method.

```

#Function for iterations

def finalScore(dataset,users, n):

    eval_scores=[]

    #We want to have 100 successful iterations in all methods

    while i