Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/aayushker/devfoolio
Uncover originality, empower authenticity
django-rest-framework keybert nextjs nltk plagiarism-detection scikit-learn
- Host: GitHub
- URL: https://github.com/aayushker/devfoolio
- Owner: aayushker
- License: gpl-3.0
- Created: 2024-11-09T18:13:37.000Z (about 2 months ago)
- Default Branch: main
- Last Pushed: 2024-11-14T18:10:17.000Z (about 2 months ago)
- Last Synced: 2024-11-23T07:07:30.470Z (about 1 month ago)
- Topics: django-rest-framework, keybert, nextjs, nltk, plagiarism-detection, scikit-learn
- Language: TypeScript
- Homepage: https://devfoolio.vercel.app
- Size: 327 KB
- Stars: 2
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
![HackCBS](https://socialify.git.ci/aayushker/HackCBS/image?description=1&name=1&owner=1&theme=Auto)
[![Vercel Deploy](https://deploy-badge.vercel.app/vercel/hackcbs7)](https://hackcbs7.vercel.app)
![Render](https://img.shields.io/badge/Render-Deployed-green?logo=render)
Devfoolio is an automated plagiarism detection tool designed to maintain the integrity of hackathons by analyzing project submissions on Devfolio. It scans each new submission, compares it with existing projects, and identifies potential similarities or copied content to ensure originality and fair competition.

## Table of Contents
- [**Features**](#features)
- [**Technologies Used**](#technologies-used)
- [**Getting Started**](#getting-started)
- [**Challenges We Ran Into**](#challenges-we-ran-into)
- [**Future Enhancements**](#future-enhancements)
- [**License**](#license)
- [**Acknowledgments**](#acknowledgments)
## **Features**
- **Automated Plagiarism Detection**: Compares new submissions with past projects to detect similarities.
- **Efficient Scalability**: Handles large volumes of data, enabling quick comparisons across thousands of projects.
- **Tech-Stack Flexibility**: Supports a variety of technologies including Python, NLP, and web scraping tools.
- **Fair Competition**: Ensures hackathons remain focused on fostering genuine innovation and creativity.
- **Real-Time Scanning**: Scans projects upon submission, providing instant feedback to organizers and participants.
- **Customizable Similarity Threshold**: Allows organizers to adjust sensitivity to false positives and negatives.

## **Technologies Used**
- **Backend**: Django REST Framework (Python), CSV
- **Frontend**: Next.js, TypeScript, NextUI, ShadCN, Tailwind CSS
- **Framework & Library**: Hugging Face Transformers, Selenium, Pandas, Scikit-learn, KeyBERT, NLTK
- **API Communication**: Axios
- **Deployment**: Vercel, Docker, Render
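
As a rough illustration of the comparison step described under Features, the sketch below scores a new project description against existing ones using TF-IDF vectors and cosine similarity, then keeps only the matches at or above a threshold. It is not the project's actual code: it uses only the Python standard library rather than the Scikit-learn and KeyBERT stack listed above, and the function names, the sample descriptions, and the default threshold of `0.5` are all assumptions made for the example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(documents):
    """Build one sparse TF-IDF vector (a dict of term -> weight) per document."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: the number of documents each term appears in.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid dividing by zero for empty text
        vectors.append({
            # Term frequency times a smoothed inverse document frequency.
            term: (count / total) * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def cosine_similarity(a, b):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def find_similar(new_description, existing, threshold=0.5):
    """Return (index, score) pairs for existing descriptions that score at or
    above the threshold (a hypothetical default), highest score first."""
    vectors = tfidf_vectors(existing + [new_description])
    new_vec = vectors[-1]
    scored = [(i, cosine_similarity(new_vec, vec)) for i, vec in enumerate(vectors[:-1])]
    flagged = [(i, score) for i, score in scored if score >= threshold]
    return sorted(flagged, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Made-up descriptions, used only to exercise the functions above.
    past_projects = [
        "A plagiarism checker for hackathon project submissions",
        "A recipe app that suggests meals from pantry items",
    ]
    submission = "A plagiarism checker for hackathon project submissions"
    for index, score in find_similar(submission, past_projects):
        print(f"project {index} flagged with similarity {score:.2f}")
```

Raising the threshold flags fewer projects and lowers the risk of false positives, while lowering it catches looser paraphrases at the cost of more manual review.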
## **Getting Started**
### Prerequisites
Ensure you have the following installed:
- **Python**
- **Pipenv**
- **Node.js**
- **NPM**

## Installation
### Backend Setup
1. **Clone the repository:**
   ```bash
   git clone https://github.com/aayushker/HackCBS.git
   cd HackCBS/backend
   ```
2. **Create a virtual environment:**
   ```bash
   python -m venv venv
   source venv/bin/activate
   ```
3. **Install dependencies:**
   ```bash
   pip install -r requirements.txt
   ```
4. **Run the development server:**
   ```bash
   python manage.py runserver
   ```
### Frontend Setup
1. **Navigate to the frontend directory:**
   ```bash
   cd ../frontend
   ```
2. **Install dependencies:**
   ```bash
   npm install
   ```
3. **Run the development server:**
   ```bash
   npm run dev
   ```
## Challenges We Ran Into
### 1. Handling Large-Scale Data
- **Hurdle**: With over 180,000 projects on Devfolio, managing and comparing such a large dataset posed performance challenges.
- **Solution**: We optimized data storage and comparison processes, using efficient indexing and caching mechanisms to improve performance.

### 2. Dynamic Content Parsing and Scraping
- **Hurdle**: Devfolio project pages are dynamically rendered, which made scraping the project data tricky.
- **Solution**: We used tools like Selenium and BeautifulSoup with smart fallback mechanisms to adapt to changing page structures and dynamically load content.

### 3. Textual Crux Extraction and Vectorization
- **Hurdle**: Summarizing project descriptions into a concise "crux" for comparison was challenging due to varying lengths and formats.
- **Solution**: We implemented NLP models to extract key phrases and summarize descriptions, enabling accurate vectorization and faster comparisons.

### 4. Ensuring Accuracy in Similarity Detection
- **Hurdle**: Finding the right balance between false positives and negatives was difficult.
- **Solution**: We refined similarity thresholds using feedback from multiple test runs, optimizing the detection algorithms to strike a balance between speed and accuracy.

## **Future Enhancements**
### 1. Enhanced Similarity Detection
- **Goal**: Improve the accuracy and efficiency of plagiarism detection using advanced Natural Language Processing (NLP) models and machine learning algorithms.
- **Plan**: Implement techniques like **semantic similarity** (e.g., BERT, GPT models) to better understand the context of project descriptions and reduce false positives.
- **Impact**: Higher accuracy in detecting truly similar projects, even those with paraphrased descriptions or code snippets.

### 2. Real-Time Plagiarism Detection
- **Goal**: Enable real-time project comparisons as new submissions are added to Devfolio.
- **Plan**: Integrate real-time scraping and similarity checking via a webhook system or automated API calls as new projects are uploaded.
- **Impact**: Instant feedback for hackathon participants, reducing manual verification time and increasing event efficiency.

### 3. Visual Similarity Detection (Code Comparison)
- **Goal**: Expand plagiarism detection to include visual code structure and not just textual content.
- **Plan**: Utilize AI-driven code analysis tools like **CodeBERT** or other similar models to compare submitted code for structural similarity, variable names, and algorithms.
- **Impact**: A more robust system that can detect plagiarism in both code and documentation, ensuring the integrity of the projects on a deeper level.

### 4. User Feedback and Iteration
- **Goal**: Continuously improve the tool based on user feedback from hackathon participants and organizers.
- **Plan**: Implement feedback loops where users can report inaccuracies or issues, which will help improve the system's performance.
- **Impact**: A more user-centric tool, continuously improving its accuracy and usability.

### 5. Cross-Platform Integration
- **Goal**: Expand the plagiarism detection tool to other hackathon platforms like **Devpost**, **HackerEarth**, and **Kaggle**.
- **Plan**: Build integrations with other project-hosting platforms and extend the database to include submissions from these sites.
- **Impact**: Wider usage across different communities, enhancing the tool's value and adoption.

### 6. Reporting and Analytics Dashboard
- **Goal**: Provide detailed analytics and reporting for hackathon organizers and participants.
- **Plan**: Create a dashboard where organizers can see trends in project originality, track similarity scores, and generate detailed reports.
- **Impact**: Better insights for hackathon organizers to maintain fairness, and an additional layer of transparency for participants.

### 7. Open Source Collaboration
- **Goal**: Make the project open-source, allowing other developers to contribute and improve the tool.
- **Plan**: Publish the source code on platforms like GitHub and encourage collaboration through issues, pull requests, and discussions.
- **Impact**: Leverage the open-source community for rapid innovation and feature development.

### 8. Gamification of the Originality Process
- **Goal**: Encourage developers to submit unique and original projects through gamification.
- **Plan**: Introduce rewards, leaderboards, or badges for participants with high originality scores based on the plagiarism detection results.
- **Impact**: Increased motivation for participants to submit more creative and authentic projects.

## Conclusion
By evolving Devfoolio with these enhancements, we aim to create a comprehensive and indispensable tool for hackathon organizers and participants, ensuring the integrity and fairness of every project submission.
## **License**
This project is licensed under the GNU General Public License v3.0. See the `LICENSE` file for more details.
## **Acknowledgments**
This project was built during the HackCBS 7.0 hackathon. Thanks to the HackCBS team for organizing the event and providing a platform for innovation and collaboration. We also appreciate the support from MLH for their guidance and resources throughout the event.
We would like to thank the organizers for providing us with the opportunity to work on this project and showcase our skills. We also appreciate the support from the mentors and judges who helped us improve our project and provided valuable feedback.
Team Hacker X Coders
[Aayushker](https://github.com/aayushker) [Chaitanya](https://github.com/hiCXK) [Sarvagya](https://github.com/Sarvagyapradhan)