https://github.com/salvatoreamaddio/pipelinewebsite
This console application is an ad hoc solution for a client who needed a way to extract data from their own website and write it to a spreadsheet.
- Host: GitHub
- URL: https://github.com/salvatoreamaddio/pipelinewebsite
- Owner: SalvatoreAmaddio
- Created: 2024-07-20T23:43:33.000Z (5 months ago)
- Default Branch: main
- Last Pushed: 2024-07-21T00:40:59.000Z (5 months ago)
- Last Synced: 2024-10-13T14:11:02.887Z (2 months ago)
- Topics: csharp, csharp-app, csharp-code, elt, elt-pipeline, excel-export, web
- Language: C#
- Homepage: https://salvatoreamaddio.co.uk/
- Size: 263 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# PipelineWebsite
This console application is an ad hoc solution for a client of mine who needed to extract data from their website and write it to a spreadsheet.
For privacy reasons, some information has been omitted, such as:
- Login details
- Website links
- Details of the records to retrieve

Therefore, if you try to run or debug this code, it won't work.
Yet you can see how I have approached the problem.
# Approach to the Bottleneck
Loading a single page takes approximately 2 minutes and 30 seconds. With a total of 258 pages, a full fetch would take around 9 hours (at a full 2 minutes and 30 seconds per page, closer to 10 hours and 45 minutes). To avoid repeating this lengthy task each time the application runs, I created an SQLite database to store all records. The whole dataset was fetched during the production stage. The release compares the number of records in the database with the number of records displayed on the webpage, so the application fetches only the new records and inserts them into the database. As a result, execution time is reduced by 95%, and the client always has access to the complete dataset, including the latest entries.
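
The count comparison described above could look roughly like the sketch below. It is not the code from this repository: `SiteRecord`, `IWebsite`, `IRecordStore`, `IncrementalSync`, and the two in-memory classes are hypothetical names, the in-memory store only stands in for the SQLite database, and the sketch assumes the website lists the newest records first with a fixed number of records per page.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical record shape; the real fields are omitted from this README.
public sealed record SiteRecord(int Id, string Data);

// Hypothetical view of the website. Assumption: newest records come first.
public interface IWebsite
{
    int PageSize { get; }
    int TotalRecords();                            // cheap: the count shown on the page
    IReadOnlyList<SiteRecord> FetchPage(int page); // slow: the expensive page load
}

// Hypothetical view of the local database (SQLite in the real application).
public interface IRecordStore
{
    int Count();
    void InsertMany(IEnumerable<SiteRecord> records);
}

public static class IncrementalSync
{
    // Loads only as many pages as needed to cover the missing records
    // and returns the number of pages that were loaded.
    public static int Run(IWebsite site, IRecordStore store)
    {
        int missing = site.TotalRecords() - store.Count();
        if (missing <= 0) return 0; // database is already up to date

        int pagesNeeded = (missing + site.PageSize - 1) / site.PageSize; // ceiling division
        var fresh = new List<SiteRecord>();
        for (int page = 1; page <= pagesNeeded; page++)
            fresh.AddRange(site.FetchPage(page));

        // The last loaded page may overlap records that are already stored.
        store.InsertMany(fresh.Take(missing));
        return pagesNeeded;
    }
}

// In-memory stand-ins so the sketch runs without a website or a database.
public sealed class FakeWebsite : IWebsite
{
    private readonly List<SiteRecord> _newestFirst;

    public FakeWebsite(int total, int pageSize)
    {
        PageSize = pageSize;
        _newestFirst = Enumerable.Range(1, total)
            .Reverse()
            .Select(i => new SiteRecord(i, $"row {i}"))
            .ToList();
    }

    public int PageSize { get; }
    public int TotalRecords() => _newestFirst.Count;

    public IReadOnlyList<SiteRecord> FetchPage(int page) =>
        _newestFirst.Skip((page - 1) * PageSize).Take(PageSize).ToList();
}

public sealed class InMemoryStore : IRecordStore
{
    private readonly Dictionary<int, SiteRecord> _rows = new();

    public int Count() => _rows.Count;

    public void InsertMany(IEnumerable<SiteRecord> records)
    {
        foreach (var record in records) _rows[record.Id] = record;
    }
}
```

With these stand-ins, a site holding 1000 records at 10 per page and a store already holding 985 of them needs only 2 page loads to catch up, and a second run loads nothing. Comparing counts is cheap, but it only detects newly added records; edits or deletions on the website would need a different check.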
# Flow Chart
![Flow chart](https://github.com/SalvatoreAmaddio/PipelineWebsite/blob/main/static/Flow-Chart.png)