https://github.com/myselfabk5/commodity_price_extraction
Extracting Commodity Prices
- Host: GitHub
- URL: https://github.com/myselfabk5/commodity_price_extraction
- Owner: myselfabk5
- Created: 2025-08-06T04:50:10.000Z (11 months ago)
- Default Branch: main
- Last Pushed: 2025-08-13T20:27:59.000Z (10 months ago)
- Last Synced: 2025-09-04T23:15:20.828Z (10 months ago)
- Topics: commodityprices, selenium, selenium-python, selenium-webdriver, webscraping
- Language: Jupyter Notebook
- Homepage:
- Size: 9.77 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
## Description
**Purpose** – The function web_price_data_scrapping() scrapes daily commodity price data for multiple dates from the official FCA Info Web portal.
**Inputs** – Accepts a list of dates (list_of_dates) in the format DD/MM/YYYY for which price reports need to be extracted.
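
The DD/MM/YYYY requirement can be checked before any browser work starts. The sketch below is illustrative only: `validate_dates` is a hypothetical helper that does not appear in the repository, and it assumes a malformed date should raise an error rather than be skipped.

```python
from datetime import datetime


def validate_dates(list_of_dates):
    """Return the dates unchanged if every entry parses as DD/MM/YYYY.

    Hypothetical helper, not taken from the repository.
    """
    for date in list_of_dates:
        # strptime raises ValueError for a wrong layout or an impossible date.
        datetime.strptime(date, "%d/%m/%Y")
    return list(list_of_dates)
```

Calling `validate_dates(["06/08/2025", "13/08/2025"])` returns the same list, while an ISO-style string such as `"2025-08-06"` raises `ValueError`.
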
**Selenium Setup** – Uses selenium with Chrome WebDriver in headless mode to automate browser actions without opening a visible browser window.
**Website Navigation** – Programmatically selects the “Price Report” option, chooses the “Daily Prices” report type, enters the given date, and clicks the Get Data button.
**Dynamic Content Handling** – Uses WebDriverWait and expected_conditions to ensure that page elements (radio buttons, dropdowns, input fields, tables) are fully loaded before the script interacts with them.
**Table Extraction** – Locates the HTML table (id="gv0") containing price data, retrieves its HTML, and parses it with BeautifulSoup.
**Data Cleaning** – Extracts table headers (`<th>`) and row data (`<tr>` + `<td>`), creates a Pandas DataFrame, and keeps only the relevant commodity columns.
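
The parsing and cleaning steps can be sketched as below, assuming the table has a single header row of `<th>` cells followed by data rows of `<td>` cells. The function names, the sample HTML, and the column names in it are invented for illustration and are not taken from the portal or the notebook.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Made-up sample standing in for the HTML of the real "gv0" table.
SAMPLE_HTML = """
<table id="gv0">
  <tr><th>States/UTs</th><th>Rice</th><th>Wheat</th><th>Remarks</th></tr>
  <tr><td>State A</td><td>40</td><td>30</td><td>-</td></tr>
  <tr><td>State B</td><td>42</td><td>31</td><td>-</td></tr>
</table>
"""


def table_html_to_frame(table_html):
    """Turn one HTML table into a DataFrame of strings."""
    soup = BeautifulSoup(table_html, "html.parser")
    headers = [th.get_text(strip=True) for th in soup.find_all("th")]
    rows = []
    for tr in soup.find_all("tr"):
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        if cells:  # the header row has only <th> cells, so it is skipped
            rows.append(cells)
    return pd.DataFrame(rows, columns=headers)


def keep_columns(frame, wanted):
    """Keep only the wanted columns that are actually present."""
    return frame[[name for name in wanted if name in frame.columns]]


frame = keep_columns(table_html_to_frame(SAMPLE_HTML),
                     ["States/UTs", "Rice", "Wheat"])
```

All values come back as strings, so a numeric conversion (for example with `pd.to_numeric`) would still be needed before doing arithmetic on prices.
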
**Date Annotation** – Adds a Date column to tag each row with the corresponding report date for easier analysis later.
**Multiple Dates** – Loops through all given dates, scraping and appending each day’s data into a single output DataFrame.
**Return Value** – Returns the combined DataFrame containing state/UT-wise daily prices for commodities such as Rice, Wheat, Pulses, Edible Oils, Sugar, Vegetables, etc.
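
The date loop and the combined result can be illustrated without a browser. In this sketch `scrape_dates` and `fetch_one_day` are hypothetical names: `fetch_one_day` stands in for whatever returns one day's table as a DataFrame, and the stub data is invented.

```python
import pandas as pd


def scrape_dates(list_of_dates, fetch_one_day):
    """Collect one DataFrame per date and stack them into a single frame.

    fetch_one_day is a hypothetical callable: date string -> DataFrame.
    """
    frames = []
    for date in list_of_dates:
        day_frame = fetch_one_day(date).copy()
        day_frame["Date"] = date  # tag every row with its report date
        frames.append(day_frame)
    if not frames:
        # pd.concat raises on an empty list, so return an empty frame instead.
        return pd.DataFrame()
    return pd.concat(frames, ignore_index=True)


def fake_fetch(date):
    """Stub with invented values, used here in place of the real scraper."""
    return pd.DataFrame({"States/UTs": ["State A", "State B"],
                         "Rice": ["40", "42"]})


combined = scrape_dates(["06/08/2025", "07/08/2025"], fake_fetch)
```

Passing the fetch step in as a function keeps the looping logic testable on its own and lets the Selenium part be swapped out or mocked.
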