Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/ggeop/recommendation-systems-evaluation-metrics

Basic Recommendation & Evaluation System.
https://github.com/ggeop/recommendation-systems-evaluation-metrics

centrality-metrics python3 recommender-system social-network-analysis

Last synced: 18 days ago
JSON representation

Basic Recommendation & Evaluation System.

Awesome Lists containing this project

README

        

![alt text](https://github.com/ggeop/Recommendation-System/blob/master/imgs/photo_cover-01.png)

## Datasets

### Original Dataset

The dataset we used consists of 'circles' (or “friends” lists') from Facebook. Facebook data was collected from survey participants using this Facebook app. The dataset includes node features (profiles), circles, and ego networks. This dataset consists of 4039 nodes (users) and 88234 edges (friendships between users).
The dataset was downloaded from https://snap.stanford.edu/data/egonets-Facebook.html.

#### Transform the graph to undirected

```{py}
#Load the dataset
data=pd.read_csv("facebook_combined.txt",sep=" ", header=None)

#Add column names
data.columns = ["node1", "node2"]

#Transform the graph to undirected
data2=pd.concat([data.node2,data.node1], axis=1)

#Rename the columns in order to merge the columns
data2.columns= ["node1", "node2"]
data=data.append(data2)

#Reset indexes
data = data.reset_index(drop=True)

```

In the original dataset the graph was connected and directed (each edge counts as friendship for both nodes). A directed graph is a graph in which edges have orientations. So for this project to work, we had to transform the graph into undirected graph in order to perform all the necessary tasks. An undirected graph is a graph in which edges have no orientation.

In order to achieve this the only thing we had to do was to add the missing edges. For instance, if we have an edge from node1 to node2 we added an edge from node2 to node1 in the file. We did that by using pandas library, and more specific we used the command concat() in order to concatenate the missing edges.
In that way we have created a new dataset in which we have changed the position of the two columns we already have (node1, node2). Then we renamed the columns again in order to use the append() command to merge the two columns by rows. Since we wanted to use the append() command we had to rename the columns because this command performs the merging based on the indexes of the columns.

### Test Dataset

```
#Create a sample graph dataset
test_data = pd.DataFrame([[5, 2],
[9, 3],
[9, 11],
[3, 6],
[4, 6],
[5, 7],
[1, 11],
[6, 2],
[7, 9],
[8, 9],
[5, 11],
[6, 7],
[6, 11],
[7, 6],
[2, 11],
[11,2],
[2, 5],
[2, 7],
[7, 2]],
columns=["node1", "node2"])

```
Now we have created dictionaries with the id of the node as the key and a set with the friends of the node as the value.

```
friendships={}
#Create friendships dict
for node in [1,2,3,4,5,6,7,8,9,11]:
#Create a list with the friends of node
ls=dataset[test_data.node1 == node]['node2'].tolist()

#Create a dictionary with key the node and value the list
friendships[node]=ls

print(friendships)

```
The relationships are the following: 1: [11], 2: [5, 6, 7, 11], 3: [6, 9], 4: [6], 5: [2, 7], 6: [2, 3, 4, 7], 7: [2, 5, 6], 8: [9], 9: [3, 8], 11: [1, 2]

After that we created a graphical representation of the above relationships.

![alt text](https://github.com/ggeop/Recommendation-Systems-Evaluation-Metrics/blob/master/imgs/SampleDatasetgGraphs.png)

## Local structure based methods for link prediction

### Recommending friends using Common neighbors (friend-of-friend (FoF) method)

Common Neighbor method is one of the simplest techniques used for link prediction. Two nodes are likely to form a link if they have many common neighbors.
This method of link prediction is also called friend-of-friend link prediction.

So, what we did here was to create a function that takes as input the users, the dataset we want and the target user that we want to make the recommendation. At the beginning, for the target user we found all of its friends.

Then we used the intersection between the friends of the target user and the rest of the users and we created a set. And finally we created the sorted list, with the ability in ties to take the smallest number of user id, with the 10 recommended friends for the target user using the Common Neighbors method.

```
def friendOfFriendScore(users, dataset, target):
#Initialize
l=list()
friendships={}

#Create friendships dict
for node in users:
#Create a list with the friends of node
ls=dataset[dataset.node1 == node]['node2'].tolist()

#Create a dictionary with key the node and value the list
friendships[node]=ls

# Initialize a dictionary with the intersections
inter={}

#Intersection between users
for j in friendships:
if (target != j) and (target not in friendships[j]) :
intersection=(len(set(friendships.get(target)).intersection(set(friendships.get(j)))))

#Keep intersection into a list
inter[j]=intersection

#Create a sorted list, in ties we take the smallest ID
lis=sorted(inter.items(), key=lambda value: value[1], reverse=True)

#Final Result
return(lis[0:10]);

```

### Recommending friends using Jaccard coefficient

Jaccard in 1901 proposed a statistic to compare similarity and diversity of sample sets.
It is the ratio of common neighbors of nodes x and y to the all neighbors nodes of x and y.
As a result value of Jaccard index prevents higher degree nodes to have high similarity index with other nodes.
Based on that we created a function that takes as an input again the users, the dataset and the target user. At first, we found the target user’s friends as previous and then we applied the Jaccard coefficient, by dividing the intersection of the target user’s friends with the rest of the users by the equivalent union.

At the end we had again a sorted list, with the ability in ties to take the smallest number of user id, of the 10 recommended friends for the target user using the Jaccard similarity coefficient.

```

def JaccardCoefficientScore(users, dataset, target):
#Initialize
l=list()
friendships={}

#Create friendships dict
for node in users:
#Create a list with the friends of node
ls=dataset[dataset.node1 == node]['node2'].tolist()

#Create a dictionary with key the node and value the list
friendships[node]=ls

# Initialize a dictionary with the intersections
inter={}

#Intersection between users
for j in friendships:
if (target != j) and (target not in friendships[j]) :

# Create union
union=len(set(friendships.get(target)).union(set(friendships.get(j))))

# Check for No zero denominator
if (union != 0) :
inter[j]=len(set(friendships.get(target)).intersection(set(friendships.get(j))))/union

#Create a sorted list, in ties we take the smallest ID
lis=sorted(inter.items(), key=lambda value: value[1], reverse=True)

#Final Result
return(lis[0:10]);

```

### Recommending friends using Adamic/Adar function

Adamic-Adar index proposed by Adamic and Adar is calculated by adding weights to the nodes which are connected to both nodes A and B.
Again we created a function which takes as inputs the users, the dataset and the target user. Then we found all the friends of the target user and put them in a list. Then, we had to apply the Adamic/Adar so we performed the intersection between the target user’s friends and the rest of the users. Then we applied the Adamic/Adar measure by summing the number of neighbors any two users have in common and divide it by the log frequency of their occurrence, in order to weight items that are unique to a few users more than commonly occurring items.

At the end we created again a sorted list, with the ability in ties to take the smallest number of user id, of the 10 recommended friends for the target user according to the Adamic/Adar measure.

```

def AdamicAdarFunctionScore(users, dataset, target):
#Initialize
l=list()
friendships={}

#Create friendships dict
for node in users:
#Create a list with the friends of node
ls=dataset[dataset.node1 == node]['node2'].tolist()

#Create a dictionary with key the node and value the list
friendships[node]=ls

# Initialize a dictionary with the intersections
inter={}

#Intersection between users
for j in friendships:
if (target != j) and (target not in friendships[j]) :
intersection = set(friendships.get(target)).intersection(set(friendships.get(j)))

# Adamic and Adar score calculation
sum = 0
for k in intersection :
if (k in friendships.keys()) and (friendships[k] != []) and len(friendships[k]) != 1:
sum = sum+1/np.log(len(friendships[k]))

inter[j]=sum

#Create a sorted list, in ties we take the smallest ID
lis=sorted(inter.items(), key=lambda value: value[1], reverse=True)

#Final Result
return(lis[0:10]);

```

### Recommending Friends with Leicht-Holme-Newman Index (Extra method)

Leicht proposed a measure to define local structure based similarity measure. It is the ratio of common neighbors of nodes a and b to the product of degrees of nodes a and b.
So we created a function with inputs the users, the dataset and the target user. As in previous methods we found the target user’s friends and stored them in a list. Then we performed the intersection between the target user’s friends and the rest of the users and stored them in a set. And finally we calculated the degrees of the two compared users each time and we divided the intersection with that. So now we have our final sorted list, with the ability in ties to take the smallest number of user id, of the 10 recommended friends for the target user according to the Leicht-Holme-Newman method.

```

def LeichtHolmeNewmanScore(users, dataset, target):
#Initialize
l=list()
friendships={}

#Create friendships dict
for node in users:
#Create a list with the friends of node
ls=dataset[dataset.node1 == node]['node2'].tolist()

#Create a dictionary with key the node and value the list
friendships[node]=ls

# Initialize a dictionary with the intersections
inter={}

#Intersection between users
for j in friendships:
if (target != j) and (target not in friendships[j]) :
intersection=(len(set(friendships.get(target)).intersection(set(friendships.get(j)))))

#Calculate the k for j and target
k1=len(friendships.get(j))
k2=len(friendships.get(target))

if (k1 !=0 and k2 !=0):
#Store the intersection in the list inter[]
inter[j]=intersection/(k1*k2)

#Create a sorted list, in ties we take the smallest ID
lis=sorted(inter.items(), key=lambda value: value[1], reverse=True)

#Final Result
return(lis[0:10]);

```

## Evaluation of the recommendation system

### Compute the average similarity

```

# Create users list
users=list(range(0,4038))

#Initialization
s1=[]
s2=[]
s3=[]

for i in list(range(100,4100,100)):

#Run the functions
fofList=friendOfFriendScore(users, data,i)
JaccardList=JaccardCoefficientScore (users, data,i)
AdamicAdarList=AdamicAdarFunctionScore (users, data,i)

#Similarity Percentage of FoF and Jaccard
s1.append(len(set(fofList).intersection(set(JaccardList)))*10)

#Similarity Percentage of FoF and Adamic and Adar
s2.append(len(set(fofList).intersection(set(AdamicAdarList)))*10)

#Similarity Percentage of Jaccard and Adamic and Adar
s3.append(len(set(AdamicAdarList).intersection(set(JaccardList)))*10)

#Average Similarity (%)
print("The average similarity of FoF & Jaccard is:",np.mean(s1),"%")
print("The average similarity of FoF & Adamic Adar is:",np.mean(s2),"%")
print("The average similarity of Adamic Adar & Jaccard is:",np.mean(s3),"%")

```

After a few minutes of computations finally the results are:

o The average similarity of FoF & Jaccard is: 55.5 %
o The average similarity of FoF & Adamic/Adar is: 90.75 %
o The average similarity of Adamic/Adar & Jaccard is: 57.0 %

### Forecast Recommendations

#### Evaluation Function
In this stage we have to estimate the quality of the recommendation methods. We create a function (evaluationFunction()) which computes the strength of the connection between two nodes. In more details, we insert two already friends of our network and the function removes this connection from the dataset. After the connection is dropped, the algorithm searches for every method if one of the two nodes (ex. F1) exists in the list of the second node (ex. F2). We do the same process in both F1 and F2. Also, we would like to mention that if a node does not exist in the recommendation list of the other node we exclude this relationship.

#### Score Calculation
The score for each algorithm is calculated according to the position of the list. Also, we take the average value of the position for both F1 and F2. The higher the score is, the higher the quality of the algorithm.

```

def evaluationFunction(dataset,users,F1,F2):

####Remove the relationship

#First we find the connection F1-F2
l1=dataset[dataset.node2 == F1].index
l2=dataset[dataset.node1 == F2 ].index
rm1=set(l1).intersection(set(l2))

#Then we find the connection F2-F1
l1=dataset[dataset.node2 == F2 ].index
l2=dataset[dataset.node1 == F1].index
rm2=set(l1).intersection(set(l2))

#We create the union
rm=rm1.union(rm2)


#Remove the elements of the set rm
for i in rm:
dataset=dataset.drop(i)

###FoF (friend-of-friend)
if ((F1 in friendOfFriend(users, dataset,F2)) and (F2 in friendOfFriend(users, dataset,F1))):

#Compute the recommendations for F1
Friend1=10 - friendOfFriend(users, dataset,F1).index(F2)

#Compute the resommentdations for F2
Friend2=10 - friendOfFriend(users, dataset,F2).index(F1)

####Compute the score
scoreFoF=(Friend1+Friend2)/2

else:
return(None);

###Jaccard
if ((F1 in JaccardCoefficient(users, dataset,F2)) and (F2 in JaccardCoefficient(users, dataset,F1))):

#Compute the recommendations for F1
Friend1=10 - JaccardCoefficient(users, dataset,F1).index(F2)

#Compute the resommentdations for F2
Friend2=10 - JaccardCoefficient(users, dataset,F2).index(F1)

####Check if either of these does not exist

####Compute the score
scoreJaccard=(Friend1+Friend2)/2

else:
return(None);

###AdamicAdar
if ((F1 in AdamicAdarFunction(users, dataset,F2)) and (F2 in AdamicAdarFunction(users, dataset,F1))):

#Compute the recommendations for F1
Friend1=10 - AdamicAdarFunction(users, dataset,F1).index(F2)

#Compute the resommentdations for F2
Friend2=10 - AdamicAdarFunction(users, dataset,F2).index(F1)

####Check if either of these does not exist

####Compute the score
scoreAdamicAdar=(Friend1+Friend2)/2

else:
return(None);


return(scoreFoF,scoreJaccard,scoreAdamicAdar);

```

#### Iteration Function
In order to have more accurate results we should run the algorithm more than once. So, we created the algorithm (finalscore()) in order to recall the evaluation function many times. Specifically, this function takes a random index from the original dataset. Then we call the evaluation function in order to create a score for each relationship; the outputs of the evaluation function are stored in a list. Finally, after all that repetitions, we calculate the average score for each method.

```

#Function for iterations
def finalScore(dataset,users, n):
eval_scores=[]

#We want to have 100 succesful iterations in all methods
while i