https://github.com/majajuri/text-classification-using-string-kernels
Project for the course Introduction to Data Science
- Host: GitHub
- URL: https://github.com/majajuri/text-classification-using-string-kernels
- Owner: MajaJuri
- Created: 2024-05-19T08:18:58.000Z (7 months ago)
- Default Branch: main
- Last Pushed: 2024-05-19T09:00:46.000Z (7 months ago)
- Last Synced: 2024-05-20T09:29:50.432Z (7 months ago)
- Topics: data-analysis, string-kernel
- Language: Jupyter Notebook
- Homepage:
- Size: 420 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Text Classification using String Kernels
For my project in the "Introduction to Data Science" class, I had to replicate and improve on the results from the paper "Text Classification Using String Kernels". The project was divided into three main phases.

## Phase 1: Data Analysis
In the first phase, I analyzed the data used in the original study. This involved preprocessing the text data, exploring its characteristics, and examining the class distribution, among other things. This gave me a solid foundation for the replication and improvement steps that followed.

## Phase 2: Replication of the Results from the Paper
Next, I focused on replicating the results presented in the paper. Using the methods it describes, I implemented the string kernel techniques for text classification. This involved carefully tuning the parameters and making sure the experimental setup matched the paper's as closely as possible. My goal was to achieve results comparable to those reported by the authors and so validate the effectiveness of their approach.

## Phase 3: Trying to Improve on the Results from the Paper
In the final phase, I aimed to improve on the original results. This involved experimenting with different variations of string kernels, trying different parameter values, and adding further preprocessing steps.
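
To make the string kernel idea from the replication and improvement phases concrete, here is a minimal sketch of a string subsequence kernel in Python. It is not the code from the notebooks in this repository: the function names (`ssk_features`, `ssk`, `normalized_ssk`) and the default values `n=2` and `decay=0.5` are assumptions chosen for illustration. It also uses naive enumeration of all index combinations rather than an efficient dynamic-programming formulation, so it is only practical for short strings.

```python
from collections import defaultdict
from itertools import combinations
import math


def ssk_features(text, n, decay):
    """Map each length-n subsequence of `text` to its decay-weighted weight.

    Illustrative brute-force version (assumes n >= 1): every choice of n
    increasing positions contributes decay ** span, where span is the length
    of the window that the subsequence occupies in `text`.
    """
    features = defaultdict(float)
    for idx in combinations(range(len(text)), n):
        subsequence = "".join(text[i] for i in idx)
        span = idx[-1] - idx[0] + 1
        features[subsequence] += decay ** span
    return features


def ssk(s, t, n=2, decay=0.5):
    """Unnormalized kernel: inner product of the two feature maps."""
    fs = ssk_features(s, n, decay)
    ft = ssk_features(t, n, decay)
    return sum(weight * ft[u] for u, weight in fs.items() if u in ft)


def normalized_ssk(s, t, n=2, decay=0.5):
    """Kernel scaled so that a non-empty string compared with itself gives 1."""
    denom = math.sqrt(ssk(s, s, n, decay) * ssk(t, t, n, decay))
    return ssk(s, t, n, decay) / denom if denom else 0.0


if __name__ == "__main__":
    # "cat" and "car" share only the subsequence "ca" for n=2.
    print(ssk("cat", "car"))
    print(normalized_ssk("cat", "car"))
```

With `n=2`, "cat" and "car" share only the subsequence "ca", which spans two characters in each word, so the unnormalized value is `decay ** 4`. The subsequence length and the decay factor are the kind of parameters that can be varied when tuning. A matrix of such kernel values between documents could then be passed to a classifier that accepts precomputed kernels, for example scikit-learn's `SVC(kernel="precomputed")`.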