{"id":17844085,"url":"https://github.com/ginglis13/magic-formula-scraper","last_synced_at":"2025-09-05T08:32:14.353Z","repository":{"id":43357453,"uuid":"165159461","full_name":"ginglis13/magic-formula-scraper","owner":"ginglis13","description":"scrapes names and tickers from magicformulainvesting.com every quarter, adds info to a google sheet which includes stock prices and a link to the first result of a google search of the company name","archived":false,"fork":false,"pushed_at":"2023-09-15T13:00:17.000Z","size":12050,"stargazers_count":30,"open_issues_count":3,"forks_count":12,"subscribers_count":4,"default_branch":"master","last_synced_at":"2024-12-29T16:45:57.612Z","etag":null,"topics":["automation","automation-selenium","chrome-browser","google-sheets","google-worksheet","gspread","investing","python3","scraper","selenium"],"latest_commit_sha":null,"homepage":"https://ginglis.me/magic-formula/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ginglis13.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2019-01-11T01:41:01.000Z","updated_at":"2024-05-12T20:37:22.000Z","dependencies_parsed_at":"2024-10-27T22:21:38.621Z","dependency_job_id":"36614c89-5d98-40a8-b6f7-e30a6c32a5ae","html_url":"https://github.com/ginglis13/magic-formula-scraper","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ginglis13%2Fmagic-formula-scraper","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/reposit
ories/ginglis13%2Fmagic-formula-scraper/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ginglis13%2Fmagic-formula-scraper/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ginglis13%2Fmagic-formula-scraper/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ginglis13","download_url":"https://codeload.github.com/ginglis13/magic-formula-scraper/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":232032088,"owners_count":18462969,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["automation","automation-selenium","chrome-browser","google-sheets","google-worksheet","gspread","investing","python3","scraper","selenium"],"created_at":"2024-10-27T21:27:59.362Z","updated_at":"2024-12-31T21:49:24.851Z","avatar_url":"https://github.com/ginglis13.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# magic-formula-scraper\n\nPython script for scraping [magicformulainvesting.com](https://www.magicformulainvesting.com/) and appending data to a Google Sheet using [selenium](https://www.seleniumhq.org/), [Google Sheets API](https://developers.google.com/sheets/api/), and [gspread](https://gspread.readthedocs.io/en/latest/).\n\nMy brother and I make investments by following Joel Greenblatt's Magic Formula.\nThe site above uses this formula and outputs the top X companies that fit within\nthe criteria of the formula. 
However, the site does not allow a user to copy the information of\nthese companies from the webpage directly. Manually typing out the names of 30+ companies and their information\nis a time sink, so I created this script to scrape this information instead.\n\nExample GIF\n------\nHere is the script running in a headless version of the Google Chrome browser, i.e. one without a GUI. It also runs with my credentials hardcoded, so no interaction between the program and the user is needed.\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"scrape.gif\" /\u003e\n\u003c/p\u003e\n\nFeatures\n------\n+ opens a Chrome browser to the Magic Formula login page, then uses Selenium's Keys and the getpass library to enter login information\n+ once logged in, selects the number of stocks to view and clicks the corresponding button to display them\n+ scrapes the name and ticker of each listed company and writes them to a CSV file named 'companies.csv'\n+ appends the data to a Google spreadsheet using the Google Sheets API and gspread\n+ Optional: can be run as a cron job (instructions below)\n\n### Main Loop\nThis is where each company is both written to the CSV file and appended to the Google worksheet:\n```python\nfrom selenium.webdriver.common.by import By\n\n# find all rows of the screening table (driver, writer and worksheet are set up earlier in scraper.py)\ntrs = driver.find_elements(By.XPATH, '//table[@class=\"divheight screeningdata\"]/tbody/tr')\n\nfor tr in trs:\n    td = tr.find_elements(By.XPATH, \".//td\")\n    # the first two cells hold the company name and ticker; get_attribute returns str in Python 3, so no encoding is needed\n    company_name = td[0].get_attribute(\"innerHTML\")\n    company_tikr = td[1].get_attribute(\"innerHTML\")\n    # write to csv file\n    writer.writerow([company_name, company_tikr])\n    # append row to worksheet\n    # value_input_option=\"USER_ENTERED\" makes Sheets evaluate the GOOGLEFINANCE formula so the price is filled in\n    worksheet.append_row([company_name, company_tikr, '=GOOGLEFINANCE(\"' + company_tikr + '\",\"price\")'], value_input_option=\"USER_ENTERED\")\n\ndriver.quit()\n```\n\nUsage\n------\n1. 
[Create a Google Developer Account](https://console.developers.google.com/). This gives access to Google's Drive and Sheets APIs, as well as many other services. Signing up gives the user $300 in credit!\n\n2. [Read the gspread docs on how to generate credentials](https://gspread.readthedocs.io/en/latest/oauth2.html). These credentials link your worksheet to the script. Make sure you put the path to the JSON file on line 74!\n\n3. Some parts of scraper.py have to be personalized by the user. These sections are listed below.\n\n#### Add OAuth Credentials\n```python\nfrom oauth2client.service_account import ServiceAccountCredentials\n\n# path to the service account's JSON key file; scope is the list of OAuth scope URLs defined elsewhere in scraper.py\ncredentials = ServiceAccountCredentials.from_json_keyfile_name('/path/to/your/credentials', scope)\n```\n\n#### Add URL to Your Spreadsheet\n```python\n# access sheet by url; get_worksheet takes a zero-based index, so 1 is the second tab\nworksheet = gc.open_by_url('URL_TO_YOUR_SPREADSHEET').get_worksheet(1)\n```\n\n### Cron Job\n\nI have set up my script to run as a cron job at 1 a.m. on the first day of every third month (see the crontab entry below).\n\nEdit lines 31-35 of scraper.py if you wish to hardcode your login credentials:\n\n```python\n# enter email and password; getpass keeps the password from being echoed to the terminal\nyour_email=input(\"Please enter your email for magicformulainvesting.com: \")\nyour_password=getpass.getpass(\"Please enter your password for magicformulainvesting.com: \")\nusername.send_keys(your_email)\npassword.send_keys(your_password)\n```\n\nTo run Selenium from a cron job, the browser must be headless. I am using Chrome and passing it the headless option in my personal script. ChromeDriver must also be installed, e.g. with Homebrew:\n\n```sh\nbrew install --cask chromedriver\n```\n\nAdd these lines to scraper.py in place of the current 'driver = ...' 
line:\n\n```python\nfrom selenium import webdriver\nfrom selenium.webdriver.chrome.service import Service\n\noptions = webdriver.ChromeOptions()\noptions.add_argument('--headless')\n\n# declare driver as a headless Chrome instance (Selenium 4 passes the chromedriver path via Service)\ndriver = webdriver.Chrome(service=Service(\"path/to/chromedriver\"), options=options)\n```\n\nBelow is my crontab, which you can edit on Mac or Linux by running 'crontab -e' in a terminal. On my Mac, I first had to give the iTerm and Terminal apps permission to read from and write to my SSD.\n\n```bash\nSHELL=/bin/bash\nPATH=/usr/local/bin/:/usr/bin:/usr/sbin\n# minute hour day-of-month month day-of-week: 01:00 on the 1st of Jan, Apr, Jul and Oct\n0 1 1 */3 * export DISPLAY=:0 \u0026\u0026 cd /path/to/scraper \u0026\u0026 /usr/bin/python scraper.py\n```\n\nFrom what I have read online, a cron job cannot read standard input, so the login prompts fail with an end-of-file error. So for the cron job, I have hardcoded my username and password, which is bad practice. However, since this site doesn't contain especially sensitive information, I'm okay with that. The script provided in this repository still uses getpass to read the user's password securely.\n\nFeatures to Implement\n------\n+ keep a file of companies already researched or invested in, and check it before writing to the CSV file or updating the Google worksheet\n+ add a blank row before appending each new batch of company info to the Google worksheet\n+ possibly scrape company descriptions and add them to the spreadsheet\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fginglis13%2Fmagic-formula-scraper","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fginglis13%2Fmagic-formula-scraper","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fginglis13%2Fmagic-formula-scraper/lists"}