Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.


https://github.com/silasener/web-scraping-academia

This project aims to facilitate access to information from the Springer academic website using web scraping.

java mongodb mongodb-database mongorepository nosql spring-boot web webscraping


# Web Scraping Academia

## Introduction
This project aims to make information from the Springer academic website easier to access through web scraping. Scraped data is stored in a MongoDB database and queried using Elasticsearch query structures. The application is built with the Java Spring framework, Spring Data MongoDB repositories (`MongoRepository`), and Jsoup for web scraping. MongoDB Compass and NoSQL queries are used for database management.

## Technologies
- Java Spring framework
- MongoDB database
- Jsoup for web scraping
- MongoDB Compass for database management
- NoSQL queries for database operations

## Project Screens

1. **Main Screen:**

*(Screenshot: 2024-03-08 11.36.55)*

2. **Detail Screen:**

*(Screenshot: 2024-03-08 11.51.33)*

## Project Overview
The project consists of three main components:

1. **Web Scraping:**
- Retrieves information from the [Springer](https://link.springer.com/) academic website based on user-entered keywords.
- Displays details of at least the top 10 academic publications on a custom-built web page.
- Uses HTTP requests and HTML parsing (Jsoup) to extract the required data from the Springer website.
- Downloads PDF files for each publication.
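
Keyword-based retrieval starts from a search URL built from the user's input. The following is a minimal sketch, assuming Springer Link search pages are addressed as `https://link.springer.com/search?query=<keyword>`; the class `SpringerSearchUrl` is hypothetical and not taken from the project.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical helper that turns a user-entered keyword into a Springer Link
 * search URL. The "/search?query=" pattern is an assumption about how
 * link.springer.com search pages are addressed, not code from this project.
 */
final class SpringerSearchUrl {
    private static final String BASE = "https://link.springer.com/search";

    private SpringerSearchUrl() {}

    /** Builds the search URL; the trimmed keyword is URL-encoded (spaces become '+'). */
    static String forKeyword(String keyword) {
        if (keyword == null || keyword.isBlank()) {
            throw new IllegalArgumentException("keyword must not be blank");
        }
        String encoded = URLEncoder.encode(keyword.trim(), StandardCharsets.UTF_8);
        return BASE + "?query=" + encoded;
    }
}
```

The resulting URL could then be fetched and parsed with Jsoup, for example via `Jsoup.connect(url).get()`, before selecting the first 10 result entries. The CSS selectors for those entries depend on Springer's current markup and are deliberately not shown here.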

2. **Database:**
- Stores the scraped data using MongoDB.
- Required publication information includes:
  - Publication ID
  - Publication title
  - Author names
  - Publication type (research paper, review, conference paper, book, etc.)
  - Publication date
  - Publisher name
  - Search keywords (as entered on the academic search engine)
  - Article keywords (assigned to the publication itself)
  - Abstract
  - References
  - Citation count
  - DOI (if available)
  - URL
- MongoDB Compass and NoSQL queries are used for database management.
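
The field list above maps naturally onto one document per publication. The plain-Java sketch below is illustrative only: field names and types are assumptions, and the project itself presumably uses a Spring Data `@Document` class stored through a `MongoRepository`, which is omitted here because it needs Spring on the classpath.

```java
import java.time.LocalDate;
import java.util.List;

/**
 * Plain-Java sketch of one scraped publication, mirroring the required fields.
 * Names and types are assumptions, not the project's actual model.
 */
record Publication(
        String id,
        String title,
        List<String> authors,
        String type,                   // e.g. "research paper", "review", "book"
        LocalDate publicationDate,
        String publisher,
        List<String> searchKeywords,   // keywords entered on the search engine
        List<String> articleKeywords,  // keywords attached to the article
        String abstractText,           // "abstract" is a reserved word in Java
        List<String> references,
        int citationCount,
        String doi,                    // null when no DOI is available
        String url) {

    Publication {
        // Unmodifiable copies; List.copyOf rejects null lists and null elements.
        authors = List.copyOf(authors);
        searchKeywords = List.copyOf(searchKeywords);
        articleKeywords = List.copyOf(articleKeywords);
        references = List.copyOf(references);
        if (citationCount < 0) {
            throw new IllegalArgumentException("citationCount must be >= 0");
        }
    }

    /** True when a DOI was scraped for this publication. */
    boolean hasDoi() {
        return doi != null && !doi.isBlank();
    }
}
```

Keeping the two keyword lists separate preserves the distinction between what the user searched for and what the article declares about itself, which the filtering options can use independently.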

3. **Web Interface:**
- Creates a web page to display the retrieved publication information.
- Provides a text area for users to enter keywords for searching publications.
- Initially displays all records from the database upon page load.
- Enables dynamic searching with automatic spelling correction suggestions.
- Includes dynamic filtering options based on various attributes of publications.
- Allows sorting by publication date and citation count.
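
The README does not say how spelling suggestions are produced. One common technique, offered here only as an illustration, is to suggest the known keyword with the smallest Levenshtein edit distance to the user's input. `SpellingSuggester` and its `maxDistance` threshold are hypothetical.

```java
import java.util.Collection;
import java.util.Optional;

/**
 * Hypothetical "did you mean?" helper based on Levenshtein edit distance.
 * One common technique, not necessarily the one this project uses.
 */
final class SpellingSuggester {
    private SpellingSuggester() {}

    /** Dynamic-programming Levenshtein distance (insert/delete/substitute cost 1). */
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; // reuse arrays: prev now holds row i
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /**
     * Returns the closest known term within maxDistance edits (case-insensitive),
     * or empty if the input already matches a term exactly or nothing is close enough.
     */
    static Optional<String> suggest(String input, Collection<String> knownTerms, int maxDistance) {
        String query = input.toLowerCase();
        String best = null;
        int bestDistance = Integer.MAX_VALUE;
        for (String term : knownTerms) {
            int d = distance(query, term.toLowerCase());
            if (d == 0) {
                return Optional.empty(); // exact match: no correction needed
            }
            if (d < bestDistance) {
                best = term;
                bestDistance = d;
            }
        }
        return bestDistance <= maxDistance ? Optional.ofNullable(best) : Optional.empty();
    }
}
```

For example, `suggest("machne learning", List.of("machine learning", "deep learning"), 2)` returns `Optional[machine learning]`. The known terms could come from keywords already stored in the database.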

## Usage
1. **Installation:**
- Clone the repository:
```
git clone https://github.com/silasener/web-scraping-academia.git
```
- Navigate to the project directory:
```
cd web-scraping-academia
```
- Install dependencies:
```
// Add commands to install any dependencies if needed
```

2. **Running the Application:**
- Start the application:
```
// Add commands to start the application
```

3. **Accessing the Web Interface:**
- Once the application is running, open `http://localhost:<port>` in your web browser, where `<port>` is the port the application is configured to use (Spring Boot defaults to 8080 if `server.port` is not set).

4. **Using the Web Interface:**
- Enter keywords in the provided text area to search for publications.
- Browse through the displayed publications and click on a publication title for detailed information.
- Use the dynamic filtering options to refine the displayed publications.
- Sort the publications by publication date or citation count.
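
Filtering and sorting as described above can be expressed as a single stream pipeline. This sketch assumes a simplified `Entry` record, a publication-type filter, and descending order for both sort keys; none of these names come from the project.

```java
import java.time.LocalDate;
import java.util.Comparator;
import java.util.List;

/**
 * Sketch of the filter-then-sort step behind the result list. The Entry record
 * and the filter/sort options are hypothetical simplifications.
 */
final class PublicationListing {
    private PublicationListing() {}

    /** Minimal view of a publication: only the fields used here (dates assumed non-null). */
    record Entry(String title, String type, LocalDate publicationDate, int citationCount) {}

    enum SortKey { DATE, CITATIONS }

    /**
     * Keeps entries of the given type (null means "any type") and sorts them
     * newest-first or most-cited-first.
     */
    static List<Entry> filterAndSort(List<Entry> entries, String type, SortKey key) {
        Comparator<Entry> order = switch (key) {
            case DATE -> Comparator.comparing(Entry::publicationDate);
            case CITATIONS -> Comparator.comparingInt(Entry::citationCount);
        };
        return entries.stream()
                .filter(e -> type == null || type.equalsIgnoreCase(e.type()))
                .sorted(order.reversed()) // descending: newest or most-cited first
                .toList();
    }
}
```

Descending order is an assumption that puts the most relevant results first; dropping `.reversed()` would give oldest-first or least-cited-first instead.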

## MongoDB Compass and NoSQL Queries
Include instructions or examples of using MongoDB Compass and NoSQL queries for managing the database.

## Contributors
- [Emre Terzi](https://github.com/emretterzi)