
https://github.com/wyattjoh/berkley


Host: GitHub
URL: https://github.com/wyattjoh/berkley
Owner: wyattjoh
License: mit
Created: 2012-11-22T17:50:35.000Z (almost 12 years ago)
Default Branch: master
Last Pushed: 2015-03-26T19:14:47.000Z (over 9 years ago)
Last Synced: 2024-10-11T18:32:50.093Z (about 1 month ago)
Language: C++
Size: 938 KB
Stars: 0
Watchers: 3
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE


# Assignment \#4 CMPUT 291

```
Name: Wyatt Johnson
Unix: wyatt
Date: Nov 23, 2012
Language: C/C++
```

## Program Description

---

NOTE: The applications automatically delete their database files when they complete.

### *indexedsearch.o* program

---

#### Sorting

This uses the quicksort algorithm to sort an array of the following associative struct:

```
#include <cstdint>

struct indexed
{
	uint32_t index;  // index of the song this value belongs to
	double value;    // distance to the query song
};
```

Using a custom comparison function, defined as:

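```
// Comparator for qsort: orders indexed entries by ascending value.
// Must be a static member (or a free function) to be passed to qsort.
int Index::compare(const void *a, const void *b)
{
	const indexed *struct_a = static_cast<const indexed *>(a);
	const indexed *struct_b = static_cast<const indexed *>(b);

	if (struct_a->value < struct_b->value)
		return -1;
	else if (struct_a->value == struct_b->value)
		return 0;
	else
		return 1;
}
```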
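As a minimal, self-contained sketch of the sort step, the comparator can be written as a free function and handed to `std::qsort`. The names `compareIndexed` and `sortByDistance` are hypothetical and not taken from the original program; a free function is used because a non-static member function cannot be passed to `qsort`.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>

struct indexed
{
	uint32_t index;
	double value;
};

// Free-function comparator: orders entries by ascending value.
int compareIndexed(const void *a, const void *b)
{
	const indexed *x = static_cast<const indexed *>(a);
	const indexed *y = static_cast<const indexed *>(b);
	if (x->value < y->value)
		return -1;
	if (x->value > y->value)
		return 1;
	return 0;
}

// Sorts the entries in place; afterwards entries[0..2] hold the three
// smallest values, each still paired with its original index.
void sortByDistance(indexed *entries, std::size_t count)
{
	std::qsort(entries, count, sizeof(indexed), compareIndexed);
}
```

For example, sorting the entries `{1, 2.5}, {2, 0.75}, {3, 4.0}, {4, 1.25}, {5, 0.5}` leaves indices 5, 2 and 4 in the first three slots, which are the three closest songs.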

This sorts on the value alone while preserving the indices, since each index is carried along with its value. After the quicksort, the first three entries hold the smallest distances, so the top 3 matches can be printed directly. The distance itself is calculated as:

```
#include <cmath>
#include <cstdint>
#include <vector>

double Index::compare(song *SongA, song *SongB)
{
	// Returned unchanged when the two songs share no raters
	double result = 42;

	// Ratings from users who rated both songs, stored pairwise
	// (A[k] and B[k] come from the same user)
	std::vector<int> A;
	std::vector<int> B;

	for (int i = 0; i < SongA->rCount; i++)
	{
		for (int f = 0; f < SongB->rCount; f++)
		{
			if (SongA->ratings[i].User.compare(SongB->ratings[f].User) == 0)
			{
				A.push_back(SongA->ratings[i].rating);
				B.push_back(SongB->ratings[f].rating);
			}
		}
	}

	int matchingEntries = static_cast<int>(A.size());

	if (matchingEntries > 0)
	{
		int64_t sum = 0;

		for (int i = 0; i < matchingEntries; i++)
		{
			// Signed difference: ratings can differ in either direction
			int64_t temp = A[i] - B[i];
			sum += temp * temp;
		}

		result = std::sqrt(static_cast<double>(sum));
		result /= matchingEntries;
	}

	return result;
}
```

This function calculates the distance between two songs, each given as a `song` struct defined as:

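```
#include <cstdint>
#include <string>

struct rating {
	std::string User;
	uint8_t rating;
};

struct song {
	int id;
	std::string Title;
	std::string Artists;
	rating ratings[255];  // fixed capacity: at most 255 ratings per song
	int rCount;           // number of valid entries in ratings
};
```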

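The distance uses the suggested formula, where RkA and RkB are the ratings the k-th common user gave to songs A and B, and N is the number of users who rated both songs: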

$$
D(A, B) = \frac{\sqrt{(R_{1A} - R_{1B})^2 + (R_{2A} - R_{2B})^2 + \cdots + (R_{NA} - R_{NB})^2}}{N}
$$
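As a worked example, take songs 4 and 5 from the indexing example below: only Michael and Mason rated both, so N = 2 and D(4, 5) = sqrt((6 - 4)^2 + (2 - 5)^2) / 2 = sqrt(13) / 2 ≈ 1.80. The sketch below computes the same formula. The `Ratings` alias and `songDistance` name are hypothetical, and it swaps the original nested loop for a hash-map lookup of song B's ratings, assuming each user rates a given song at most once.

```cpp
#include <cmath>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical flat representation of a song's ratings: (user, rating) pairs.
using Ratings = std::vector<std::pair<std::string, int>>;

// D(A, B) = sqrt(sum over shared users of (R_kA - R_kB)^2) / N.
// Returns 42, the same sentinel as Index::compare, when no user rated both songs.
double songDistance(const Ratings &a, const Ratings &b)
{
	// Index B's ratings by user so each lookup is O(1) instead of a nested scan
	std::unordered_map<std::string, int> byUser(b.begin(), b.end());

	double sumSquares = 0.0;
	int n = 0;
	for (const auto &entry : a)
	{
		auto it = byUser.find(entry.first);
		if (it == byUser.end())
			continue;  // this user did not rate song B
		double diff = entry.second - it->second;
		sumSquares += diff * diff;
		n++;
	}
	return n > 0 ? std::sqrt(sumSquares) / n : 42.0;
}
```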

### Indexing/Serializing Data

Each entry of the input data file is first stored in the main database file as:

```
Key: SONG ID
Data: SONG STRING

Ex.
For entry: {[5], [When I approach], [Travie McCoy Feat, Livin, Joe Budden], [(Ethan, 6), (Michael, 4), (Mason, 5)] }
Key: 5
Data: {[5], [When I approach], [Travie McCoy Feat, Livin, Joe Budden], [(Ethan, 6), (Michael, 4), (Mason, 5)] }
```
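Because each entry is stored as a single string, the ratings have to be parsed out of it before a distance can be computed. The following is a minimal sketch, assuming the ratings are always the last bracketed field and that user names contain no commas or parentheses. The name `parseRatings` is hypothetical, and the use of `std::regex` is an illustrative choice, not necessarily how the original program parses the string.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Extracts (user, rating) pairs from a data string shaped like
// {[5], [Title], [Artists], [(Ethan, 6), (Michael, 4), (Mason, 5)] }
std::vector<std::pair<std::string, int>> parseRatings(const std::string &data)
{
	std::vector<std::pair<std::string, int>> ratings;

	// Only scan the final [...] field so titles or artists cannot match
	std::size_t start = data.rfind('[');
	if (start == std::string::npos)
		return ratings;
	std::string field = data.substr(start);

	// Matches "(Name, 6)": group 1 is the user, group 2 the rating
	static const std::regex pairPattern(R"(\(\s*([^,()]+?)\s*,\s*(\d+)\s*\))");
	for (std::sregex_iterator it(field.begin(), field.end(), pairPattern), end; it != end; ++it)
		ratings.emplace_back((*it)[1].str(), std::stoi((*it)[2].str()));

	return ratings;
}
```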

The data string is parsed only when it is needed. The index database, built afterwards, is stored as:

```
Key: USER NAME
Data: SONG ID CSV

Ex.
For entries:
{[4], [I Wanna Be A Billionaire], [Travie McCoy Feat, Bruno Mars], [(Michael, 6), (Mason, 2), (Sophia, 1)] }
{[5], [When I approach], [Travie McCoy Feat, Livin, Joe Budden], [(Ethan, 6), (Michael, 4), (Mason, 5)] }

We would store:
Key: Michael
Data: 4, 5,

Key: Mason
Data: 4, 5,

Key: Sophia
Data: 4

Key: Ethan
Data: 5
```

With this inverted index, the songs rated by each user who rated a given song can be pulled directly from the index database instead of linearly scanning the main database.
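A minimal sketch of building and querying such an inverted index, using an in-memory `std::map` as a stand-in for the index database. The `UserIndex` type and the `indexSong` and `candidateSongs` functions are hypothetical. This version joins IDs with `", "`, so it does not reproduce the trailing comma shown in the example above.

```cpp
#include <map>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// In-memory stand-in for the index database: user name -> song ID CSV.
using UserIndex = std::map<std::string, std::string>;

// Appends songId to the CSV of every user who rated the song.
void indexSong(UserIndex &index, int songId, const std::vector<std::string> &raters)
{
	for (const std::string &user : raters)
	{
		std::string &csv = index[user];
		if (!csv.empty())
			csv += ", ";
		csv += std::to_string(songId);
	}
}

// Collects every other song rated by someone who rated songId, using one
// index lookup per rater instead of a linear scan of the main database.
std::set<int> candidateSongs(const UserIndex &index, int songId, const std::vector<std::string> &raters)
{
	std::set<int> candidates;
	for (const std::string &user : raters)
	{
		auto it = index.find(user);
		if (it == index.end())
			continue;
		std::istringstream csv(it->second);
		std::string field;
		while (std::getline(csv, field, ','))
		{
			int id = std::stoi(field);  // stoi skips the leading space
			if (id != songId)
				candidates.insert(id);
		}
	}
	return candidates;
}
```

Indexing songs 4 and 5 from the example maps Michael to `"4, 5"` and Sophia to `"4"`, and querying the raters of song 5 returns song 4 as the only candidate to compare against.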