https://github.com/alejoduarte23/researchaggregator

Fetch academic content from multiple sources (MDPI, SpringerOpen, arXiv, Elsevier) and generate an HTML report using Jinja2 templates. The script sanitizes query strings and fetches metadata, abstracts, and conclusions.

aiohttp asyncio bs4 pydantic webscrapping


Host: GitHub
URL: https://github.com/alejoduarte23/researchaggregator
Owner: AlejoDuarte23
Created: 2024-07-25T15:43:02.000Z (11 months ago)
Default Branch: main
Last Pushed: 2024-07-29T01:58:58.000Z (11 months ago)
Last Synced: 2025-01-14T23:33:12.246Z (6 months ago)
Topics: aiohttp, asyncio, bs4, pydantic, webscrapping
Language: Python
Homepage:
Size: 57.6 KB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: readme.md

# Research Aggregator

A tool for querying and processing research data from several sources: arXiv, MDPI, SpringerOpen, and Elsevier. The main functionality is encapsulated in the `QueryExecutor` class, which handles asynchronous fetching and processing of the data.

## Sources 

- `Arvix`: fetch and process functions for arXiv.

- `Mdpi`: fetch and process functions for MDPI, restricted to a specified journal.

- `Springeropen`: fetch and process functions for SpringerOpen.

- `Elsevier`: fetch and process functions for Elsevier (requires API access).

Each source is defined by a fetch function and a process function, as follows:

```python
from typing import Callable, Literal, Union

from pydantic import BaseModel

from .springer_open_functions import fetch_springeropen_content, process_data_springeropen
from .mdp_functions import fetch_mdpi_content, process_data_mdpi
from .arvix_functions import fetch_arvix_content, process_data_arvix
from .elsevier_functions_async import fetch_elsevier_content, process_data_elsevier


class Arvix(BaseModel):
    fetch_function: Callable = fetch_arvix_content
    process_function: Callable = process_data_arvix


class Mdpi(BaseModel):
    fetch_function: Callable = fetch_mdpi_content
    process_function: Callable = process_data_mdpi
    # Only these MDPI journals are accepted
    journal: Literal['buildings', 'sensors', 'acoustics', 'algorithms', 'applmech', 'computation']


class Springeropen(BaseModel):
    fetch_function: Callable = fetch_springeropen_content
    process_function: Callable = process_data_springeropen


class Elsevier(BaseModel):
    fetch_function: Callable = fetch_elsevier_content
    process_function: Callable = process_data_elsevier


# Any of the supported source models
ResearchSource = Union[Arvix, Mdpi, Springeropen, Elsevier]
```

See `classes.py` for more details.
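To illustrate how a fetch function and a process function work together, here is a minimal, self-contained sketch. It is not code from this repository: `fetch_example_content`, `process_data_example`, `ExampleSource`, and `run_source` are hypothetical names. The fetch step returns canned text instead of calling a real site, and a standard-library dataclass stands in for the pydantic models so the sketch runs without extra dependencies.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable


async def fetch_example_content(query: str) -> str:
    """Hypothetical fetch step: returns canned text instead of an HTTP response."""
    await asyncio.sleep(0)  # yield to the event loop, as a real request would
    return f"Title: A study of {query}\nAbstract: Placeholder abstract."


def process_data_example(raw: str) -> dict[str, str]:
    """Hypothetical process step: turn 'Key: value' lines into a dict."""
    fields = {}
    for line in raw.splitlines():
        key, _, value = line.partition(": ")
        fields[key.lower()] = value
    return fields


@dataclass
class ExampleSource:
    # Same shape as the source classes above: one fetch and one process callable
    fetch_function: Callable[[str], Awaitable[str]] = fetch_example_content
    process_function: Callable[[str], dict[str, str]] = process_data_example


async def run_source(source: ExampleSource, query: str) -> dict[str, str]:
    """Fetch raw content for the query, then hand it to the process step."""
    raw = await source.fetch_function(query)
    return source.process_function(raw)


record = asyncio.run(run_source(ExampleSource(), "Modal Analysis"))
print(record)
```

Keeping the two steps as separate callables means the parsing logic can be tested on saved text without any network access.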

## Usage

```python
from research_aggregator import Arvix, Mdpi, Springeropen, Elsevier, QueryExecutor

if __name__ == "__main__":
    query = "Modal Analysis"

    # Sources to query; the same source type can appear with different journals
    sources = [
        Mdpi(journal="buildings"),
        Arvix(),
        Mdpi(journal="sensors"),
    ]

    executor = QueryExecutor(query=query, sources=sources)
    result = executor.search()  # Queryresults
    print(result)
```
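The README states that `QueryExecutor` fetches asynchronously but does not show how. The sketch below shows one common way to query several sources concurrently with `asyncio.gather`. It is an assumption for illustration, not the repository's implementation: `fake_fetch`, `search_all`, the source labels, and the delays are all hypothetical, and `asyncio.sleep` stands in for network I/O.

```python
import asyncio


async def fake_fetch(source_name: str, query: str, delay: float) -> dict:
    """Hypothetical per-source fetch; sleeps instead of doing network I/O."""
    await asyncio.sleep(delay)
    return {"source": source_name, "query": query}


async def search_all(query: str) -> list:
    # Hypothetical source labels and delays, for illustration only
    tasks = [
        fake_fetch("mdpi/buildings", query, 0.02),
        fake_fetch("arxiv", query, 0.01),
        fake_fetch("mdpi/sensors", query, 0.03),
    ]
    # gather runs the coroutines concurrently and returns results in input order
    return await asyncio.gather(*tasks)


results = asyncio.run(search_all("Modal Analysis"))
for item in results:
    print(item["source"])
```

Because the fetches overlap, the total wait is close to the slowest source rather than the sum of all of them, and the results still come back in the order the sources were listed.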
