Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
Automate Excel and Word using Python
https://github.com/slevin48/automate
- Host: GitHub
- URL: https://github.com/slevin48/automate
- Owner: slevin48
- License: mit
- Created: 2021-04-24T18:22:31.000Z (almost 4 years ago)
- Default Branch: main
- Last Pushed: 2022-12-05T23:34:11.000Z (about 2 years ago)
- Last Synced: 2024-11-10T04:37:18.050Z (3 months ago)
- Topics: beautifulsoup, excel, python, streamlit, word
- Language: Jupyter Notebook
- Homepage:
- Size: 2.48 MB
- Stars: 5
- Watchers: 1
- Forks: 1
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Automate Excel, Word and the Web using Python
## Excel Sheet Splitter [![Open in Streamlit](https://static.streamlit.io/badges/streamlit_badge_black_white.svg)](https://share.streamlit.io/slevin48/automate/main/app.py)
Streamlit app to split sheets of Excel files: https://excel-splitter-48.herokuapp.com/
https://user-images.githubusercontent.com/12418115/142772669-d9f2b3bc-2587-4308-a5a6-fd38699ef159.mp4
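
A minimal sketch of the idea, splitting each sheet of an uploaded workbook into its own downloadable file, assuming pandas and openpyxl are installed (this is not the repo's actual app.py):

```python
import io

import pandas as pd
import streamlit as st

st.title("Excel Sheet Splitter")
uploaded = st.file_uploader("Upload an Excel file", type=["xlsx"])

if uploaded is not None:
    # Read every sheet of the uploaded workbook into a dict of DataFrames
    sheets = pd.read_excel(uploaded, sheet_name=None)
    for name, df in sheets.items():
        # Offer each sheet as its own single-sheet workbook
        buffer = io.BytesIO()
        df.to_excel(buffer, index=False)
        st.download_button(f"Download {name}.xlsx", buffer.getvalue(), file_name=f"{name}.xlsx")
```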
## Excel automation
https://openpyxl.readthedocs.io/en/stable/

We get the price of real estate in Paris 14 from the following gist: https://gist.github.com/slevin48/05c0d4f348f0f10870a0fa721cfcb1b1
We then manually add a second sheet containing only the surface and price columns:
```python
import openpyxl as xl

workbook = xl.load_workbook('dvf14_chart.xlsx')
sheet_2 = workbook['Sheet2']   # the manually added sheet
```
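
If you would rather build Sheet2 programmatically instead of by hand, a minimal openpyxl sketch could look like this (the surface and price column positions below are assumptions about the original sheet's layout):

```python
# Hypothetical: copy the surface and price columns from the first sheet into Sheet2
sheet_1 = workbook.worksheets[0]
sheet_2 = workbook.create_sheet('Sheet2')
sheet_2.append(['Surface', 'Price'])
for row in sheet_1.iter_rows(min_row=2, values_only=True):
    surface, price = row[0], row[1]   # assumed column positions
    sheet_2.append([surface, price])
workbook.save('dvf14_chart.xlsx')
```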
![immo_chart](dvf14_chart.png)

```python
from openpyxl.chart import ScatterChart, Reference, Series

# Build a scatter chart of price against surface
chart = ScatterChart()
chart.title = "Scatter Chart"
chart.style = 13
chart.y_axis.title = 'Price'
chart.x_axis.title = 'Surface'

max_row = sheet_2.max_row
xvalues = Reference(sheet_2, min_col=1, min_row=2, max_row=max_row)
values = Reference(sheet_2, min_col=2, min_row=1, max_row=max_row)
series = Series(values, xvalues, title_from_data=True)
series.marker.symbol = "diamond"
series.marker.graphicalProperties.solidFill = "0000FF"       # Marker filling
series.marker.graphicalProperties.line.solidFill = "0000FF"  # Marker outline
series.graphicalProperties.line.noFill = True                # Hide connecting lines

chart.series.append(series)
sheet_2.add_chart(chart, "D2")
workbook.save('dvf14_chart.xlsx')
```

## Extracting chart
Access Excel through COM with pywin32 (Windows only):
```
pip install pywin32
```
```python
import win32com.client

input_file = "C:/Users/.../Book1.xlsx"    # workbook containing the chart
output_image = "C:/Users/.../chart.png"   # where the chart image will be saved
operation = win32com.client.Dispatch("Excel.Application")
operation.Visible = 0
operation.DisplayAlerts = 0
workbook_bis = operation.Workbooks.Open(input_file)
sheet_bis = operation.Sheets(1)
```

And use Pillow to grab the image from the clipboard:
https://pillow.readthedocs.io/en/stable/index.html
```
pip install pillow
```
Iterate over all of the chart objects in the spreadsheet (if there is more than one) and save them to the specified location:

```python
from PIL import ImageGrab

# Copy each chart to the clipboard and save it as a PNG
for x, chart in enumerate(sheet_bis.Shapes):
    chart.Copy()
    image = ImageGrab.grabclipboard()
    image.save(output_image, 'png')

workbook_bis.Close(True)
operation.Quit()
```

![chart](immo_chart.png)
## Create Word report
https://python-docx.readthedocs.io/en/latest/

```python
from docx import Document

document = Document()
document.add_heading('Report on Excel and Word automation', 0)
# ...
document.save('dvf14_report.docx')
```

![report](report.png)
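
For instance, the chart image extracted above and a short summary can be dropped into the report like this (a sketch of the python-docx API, not the exact report generated in the notebook):

```python
from docx import Document
from docx.shared import Inches

document = Document()
document.add_heading('Report on Excel and Word automation', 0)
document.add_paragraph('Price of real estate in Paris 14, plotted against surface.')
# Reuse the chart image extracted from Excel above (width is an arbitrary choice)
document.add_picture('immo_chart.png', width=Inches(5))
document.save('dvf14_report.docx')
```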
## Scraping web pages with Beautiful Soup
[Beautiful Soup Documentation](https://www.crummy.com/software/BeautifulSoup/bs4/doc/)
Example: [web_automate.ipynb](web_automate.ipynb)
```python
import requests as rq
from bs4 import BeautifulSoup

URL = 'https://realpython.github.io/fake-jobs/'
page = rq.get(URL)
soup = BeautifulSoup(page.content, "html.parser")

# Grab every element carrying the "location" class and save the first one
res = soup.find_all(class_="location")
with open("location1.txt", "w") as f:
    f.write(res[0].text)
```

## Automate browser interaction with Selenium
### Installation
| Browser | Webdriver |
|---------|-----------------------------------------------|
| Chrome: | https://sites.google.com/chromium.org/driver/ |
| Edge: | https://developer.microsoft.com/en-us/microsoft-edge/tools/webdriver/ |
| Firefox: | https://github.com/mozilla/geckodriver/releases |

### Simple usage
https://selenium-python.readthedocs.io/getting-started.html#simple-usage
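
A minimal session, sketched here assuming Selenium 4 and a chromedriver available on the PATH:

```python
from selenium import webdriver

driver = webdriver.Chrome()                            # start a Chrome session via chromedriver
driver.get("https://realpython.github.io/fake-jobs/")  # load the page to automate
print(driver.title)
driver.quit()
```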
### Locating elements

https://selenium-python.readthedocs.io/locating-elements.html

Example usage:
```python
from selenium.webdriver.common.by import By

driver.find_element(By.XPATH, '//button[text()="Some text"]')
driver.find_elements(By.XPATH, '//button')
```

These are the attributes available in the `By` class:
```python
ID = "id"
XPATH = "xpath"
LINK_TEXT = "link text"
PARTIAL_LINK_TEXT = "partial link text"
NAME = "name"
TAG_NAME = "tag name"
CLASS_NAME = "class name"
CSS_SELECTOR = "css selector"
```

```python
from selenium import webdriver
from selenium.webdriver.common.by import By
driver = webdriver.Chrome()
url = "https://realpython.github.io/fake-jobs/"
driver.get(url)
title = driver.find_element(by=By.CLASS_NAME, value="title")
print(title.text)
res = driver.find_elements(by=By.TAG_NAME, value="img")
src = res[0].get_property('src')
item = driver.find_elements(by=By.CLASS_NAME, value="card-footer-item")
# Get apply link
apply = [r for r in item[1::2]] # every other element of the list (starting at the second element)
apply[0].click()
# Or simply get location of the link
href = apply[0].get_attribute('href')
driver.get(href)
```

## Resources
- [Working with Excel Spreadsheet - Automate the boring Stuff](https://automatetheboringstuff.com/2e/chapter13/)
- [Web Scraping - Automate the boring Stuff](https://automatetheboringstuff.com/2e/chapter12/)
- [Video Selenium - Technology for Noobs](https://www.youtube.com/watch?v=id-HGghty6c) - [Sources](https://github.com/sharmasw/Data-Science-with-python/tree/master/selenium)
- https://realpython.com/beautiful-soup-web-scraper-python/
- https://xkcd.com/1205/

![is_it_worth_the_time](is_it_worth_the_time.png)