{"id":19350705,"url":"https://github.com/slevin48/automate","last_synced_at":"2025-04-13T03:14:09.747Z","repository":{"id":112494686,"uuid":"361236144","full_name":"slevin48/automate","owner":"slevin48","description":"Automate Excel and Word using Python","archived":false,"fork":false,"pushed_at":"2022-12-05T23:34:11.000Z","size":2603,"stargazers_count":6,"open_issues_count":1,"forks_count":1,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-04-13T03:13:47.955Z","etag":null,"topics":["beautifulsoup","excel","python","streamlit","word"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/slevin48.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-04-24T18:22:31.000Z","updated_at":"2025-01-21T01:39:15.000Z","dependencies_parsed_at":"2023-05-15T08:45:42.253Z","dependency_job_id":null,"html_url":"https://github.com/slevin48/automate","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/slevin48%2Fautomate","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/slevin48%2Fautomate/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/slevin48%2Fautomate/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/slevin48%2Fautomate/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/slevin48","download_url":"https://codeload.github.com/slev
in48/automate/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248657922,"owners_count":21140846,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["beautifulsoup","excel","python","streamlit","word"],"created_at":"2024-11-10T04:33:36.369Z","updated_at":"2025-04-13T03:14:09.723Z","avatar_url":"https://github.com/slevin48.png","language":"Jupyter Notebook","readme":"# Automate Excel, Word and the Web using Python\n\n## Excel Sheet Splitter [![Open in Streamlit](https://static.streamlit.io/badges/streamlit_badge_black_white.svg)](https://share.streamlit.io/slevin48/automate/main/app.py)\n\nStreamlit app to split sheets of Excel files: https://excel-splitter-48.herokuapp.com/\n\nhttps://user-images.githubusercontent.com/12418115/142772669-d9f2b3bc-2587-4308-a5a6-fd38699ef159.mp4\n\n## Excel automation\nhttps://openpyxl.readthedocs.io/en/stable/\n\nWe get real estate prices for Paris 14 from the following gist: https://gist.github.com/slevin48/05c0d4f348f0f10870a0fa721cfcb1b1\n\nManually add a second sheet containing only the surface and price columns, then load it:\n\n```python\nimport openpyxl as xl\n\nworkbook = xl.load_workbook('dvf14_chart.xlsx')\nsheet_2 = workbook['Sheet2']\n```\n![immo_chart](dvf14_chart.png)\n\n```python\nfrom openpyxl.chart import ScatterChart, Reference, Series\n\nchart = ScatterChart()\nchart.title = \"Scatter Chart\"\nchart.style = 13\nchart.y_axis.title = 'Price'\nchart.x_axis.title = 'Surface'\n\n# x values: column 1 (surface), skipping the header row\nxvalues = Reference(sheet_2, min_col=1, min_row=2, max_row=sheet_2.max_row)\n# y values: column 2 (price), including the header row used as the series title\nvalues = Reference(sheet_2, min_col=2, min_row=1, max_row=sheet_2.max_row)\nseries = Series(values, 
xvalues, title_from_data=True)\nseries.marker.symbol = \"diamond\"\nseries.marker.graphicalProperties.solidFill = \"0000FF\"  # Marker filling\nseries.marker.graphicalProperties.line.solidFill = \"0000FF\"  # Marker outline\nseries.graphicalProperties.line.noFill = True  # Hide the connecting lines\nchart.series.append(series)\n\nsheet_2.add_chart(chart, \"D2\")\nworkbook.save('dvf14_chart.xlsx')\n```\n\n## Extracting chart\n\nAccess Excel through COM (Windows only):\n\n```\npip install pywin32\n```\n```python\nimport win32com.client\n\ninput_file = \"C:/Users/.../Book1.xlsx\"\noutput_image = \"C:/Users/.../chart.png\"\noperation = win32com.client.Dispatch(\"Excel.Application\")\noperation.Visible = 0  # run Excel in the background\noperation.DisplayAlerts = 0  # suppress confirmation dialogs\nworkbook_bis = operation.Workbooks.Open(input_file)\nsheet_bis = operation.Sheets(1)\n```\n\nThen use Pillow to grab the image from the clipboard:\nhttps://pillow.readthedocs.io/en/stable/index.html\n```\npip install pillow\n```\nIterate over the chart objects in the sheet (there may be more than one) and save each one to the specified location:\n\n```python\nfrom PIL import ImageGrab\n\n# Every shape is written to output_image, so with several charts only the last one is kept\nfor chart in sheet_bis.Shapes:\n    chart.Copy()  # copy the shape to the clipboard\n    image = ImageGrab.grabclipboard()\n    image.save(output_image, 'png')\nworkbook_bis.Close(True)  # True saves any changes on close\noperation.Quit()\n```\n\n![chart](immo_chart.png)\n\n## Create Word report\nhttps://python-docx.readthedocs.io/en/latest/\n\n```python\nfrom docx import Document\n\ndocument = Document()\ndocument.add_heading('Report on Excel and Word automation', 0)\n\n...\n\ndocument.save('dvf14_report.docx')\n```\n\n![report](report.png)\n\n## Scraping web pages with Beautiful Soup\n\n[Beautiful Soup Documentation](https://www.crummy.com/software/BeautifulSoup/bs4/doc/)\n\nExample: [web_automate.ipynb](web_automate.ipynb)\n```python\nimport requests as rq\nfrom bs4 import BeautifulSoup\n\nURL = 'https://realpython.github.io/fake-jobs/'\npage = rq.get(URL)\nsoup = BeautifulSoup(page.content, \"html.parser\")\nres = soup.find_all(class_ = 
\"location\")\n# Write the text of the first match to a file, closing it afterwards\nwith open(\"location1.txt\", \"w\") as f:\n    f.write(res[0].text)\n```\n\n## Automate the browser interaction with Selenium\n\n### Installation\n\n| Browser | Webdriver |\n|---------|-----------------------------------------------|\n| Chrome | https://sites.google.com/chromium.org/driver/ |\n| Edge | https://developer.microsoft.com/en-us/microsoft-edge/tools/webdriver/ |\n| Firefox | https://github.com/mozilla/geckodriver/releases |\n\n### Simple usage\nhttps://selenium-python.readthedocs.io/getting-started.html#simple-usage\n\n### Locating elements\nhttps://selenium-python.readthedocs.io/locating-elements.html\n\nExample usage:\n```python\nfrom selenium.webdriver.common.by import By\n\ndriver.find_element(By.XPATH, '//button[text()=\"Some text\"]')\ndriver.find_elements(By.XPATH, '//button')\n```\n\nThese are the attributes available on the `By` class:\n```python\nID = \"id\"\nXPATH = \"xpath\"\nLINK_TEXT = \"link text\"\nPARTIAL_LINK_TEXT = \"partial link text\"\nNAME = \"name\"\nTAG_NAME = \"tag name\"\nCLASS_NAME = \"class name\"\nCSS_SELECTOR = \"css selector\"\n```\n\n```python\nfrom selenium import webdriver\nfrom selenium.webdriver.common.by import By\n\ndriver = webdriver.Chrome()\nurl = \"https://realpython.github.io/fake-jobs/\"\ndriver.get(url)  # load the page before locating elements\ntitle = driver.find_element(by=By.CLASS_NAME, value=\"title\")\nprint(title.text)\nres = driver.find_elements(by=By.TAG_NAME, value=\"img\")\nsrc = res[0].get_property('src')\nitem = driver.find_elements(by=By.CLASS_NAME, value=\"card-footer-item\")\n# Get the apply links: every other element of the list (starting at the second element)\napply = item[1::2]\n# Read the link target first, since clicking may navigate away from the page\nhref = apply[0].get_attribute('href')\napply[0].click()\n# Or simply open the location of the link directly\ndriver.get(href)\n```\n\n## Resources\n\n- [Working with Excel Spreadsheets - Automate the Boring Stuff](https://automatetheboringstuff.com/2e/chapter13/)\n- [Web Scraping - Automate the Boring Stuff](https://automatetheboringstuff.com/2e/chapter12/)\n- [Video Selenium - 
Technology for Noobs](https://www.youtube.com/watch?v=id-HGghty6c) - [Sources](https://github.com/sharmasw/Data-Science-with-python/tree/master/selenium)\n- https://realpython.com/beautiful-soup-web-scraper-python/\n- https://xkcd.com/1205/\n\n![is_it_worth_the_time](is_it_worth_the_time.png)\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fslevin48%2Fautomate","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fslevin48%2Fautomate","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fslevin48%2Fautomate/lists"}