https://github.com/gamehunterkaan/companyenum
OSINT sweep on a company name — Flask dashboard that scrapes Craft.co, Trustpilot, CareerBliss, WHOIS, and web-tech scanners in parallel.
cybersecurity cybersecurity-tools osint osint-python osint-tool python python3
Last synced: 3 months ago
- Host: GitHub
- URL: https://github.com/gamehunterkaan/companyenum
- Owner: GamehunterKaan
- Created: 2023-06-27T19:47:00.000Z (about 3 years ago)
- Default Branch: main
- Last Pushed: 2026-04-11T17:09:06.000Z (3 months ago)
- Last Synced: 2026-04-11T19:11:17.296Z (3 months ago)
- Topics: cybersecurity, cybersecurity-tools, osint, osint-python, osint-tool, python, python3
- Language: Python
- Homepage: https://kaangultekin.net/projects/company-enum/
- Size: 30.9 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# CompanyEnum
A Flask web app that runs a one-shot OSINT sweep on a company name and presents the findings in a live dashboard. You type a company, watch each scraper report progress in real time, and get a tabbed report covering the company profile, financials, people, web-tech posture, and customer/employee reviews.

## What it collects
| Tab | Sources | Data |
| ----------- | ------------------------------------ | ------------------------------------------------------------------------------------- |
| Summary | Craft.co, Google (website fallback) | Company name, website, HQ, founded date, description, sectors, competitors |
| Financials | Craft.co | Stock price, market cap, revenue |
| People | Craft.co | Executives with roles |
| Technology | securityheaders.com, SSL Shopper, Sucuri SiteCheck, whois.com | HTTP security-header grade, raw headers, TLS cert + SANs, Sucuri ratings and recommendations, WHOIS records |
| Ratings | Trustpilot, CareerBliss | Aggregate scores and recent reviews with star widgets |
## How it works
Submissions start a background thread that runs each scraper sequentially and writes per-step state into an in-memory job store. The browser gets redirected to a loading page that polls `/status/` every 500 ms and renders each step as pending → running → done / skipped / error. When the job finishes, the page auto-navigates to `/result/`.
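The flow described above can be sketched with the standard library alone. This is a minimal illustration, not the code in `main.py`: the names `JOBS`, `start_job`, `run_job`, and `job_status` are hypothetical, and so is the convention that a scraper returning `None` counts as "skipped".

```python
import threading
import uuid

# Hypothetical in-memory job store: job_id -> job state.
JOBS = {}
JOBS_LOCK = threading.Lock()


def run_job(job_id, company, steps):
    """Run each (name, scraper) pair in order, recording per-step state."""
    results = {}
    for name, scraper in steps:
        with JOBS_LOCK:
            JOBS[job_id]["steps"][name] = "running"
        try:
            value = scraper(company, results)
            # Assumption of this sketch: None means the step was skipped.
            state = "skipped" if value is None else "done"
            results[name] = value
        except Exception:
            # One failing source should not abort the rest of the sweep.
            state = "error"
        with JOBS_LOCK:
            JOBS[job_id]["steps"][name] = state
    with JOBS_LOCK:
        JOBS[job_id]["results"] = results
        JOBS[job_id]["finished"] = True


def start_job(company, steps):
    """Register a job, start its worker thread, return (job_id, thread)."""
    job_id = uuid.uuid4().hex
    with JOBS_LOCK:
        JOBS[job_id] = {
            "steps": {name: "pending" for name, _ in steps},
            "results": None,
            "finished": False,
        }
    worker = threading.Thread(
        target=run_job, args=(job_id, company, steps), daemon=True
    )
    worker.start()
    return job_id, worker


def job_status(job_id):
    """Snapshot that a status route could serialise for the polling page."""
    with JOBS_LOCK:
        job = JOBS[job_id]
        return {"steps": dict(job["steps"]), "finished": job["finished"]}
```

In a layout like this, the status route would only need to return `job_status(job_id)` as JSON, and the loading page would redirect once `finished` is true. Holding the lock only around state writes keeps each poll cheap while a slow scraper is still running.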
Scrapers in [submodules/](submodules/) each use one of three strategies, depending on how aggressively the target site blocks bots:
- **requests / BeautifulSoup** for sites that don't block (whois.com, SSL Shopper, Sucuri API).
- **cloudscraper** for Cloudflare-protected sites that still accept a good browser fingerprint (Craft.co, securityheaders.com).
- **Playwright + playwright-stealth** (headless Chromium) for sites whose anti-bot won't budge without a real browser (Trustpilot, CareerBliss).
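One way to picture the three tiers is a small routing table from host to fetch function. The sketch below is an assumption-laden illustration rather than the project's code: `STRATEGIES`, `pick_fetcher`, the `fetch_*` helpers, and the exact host names are hypothetical. The third-party imports are deferred into the functions so the routing logic can be read and run without them installed. The playwright-stealth step is left out.

```python
from urllib.parse import urlparse


def fetch_plain(url):
    """Plain HTTP GET for sources that do not block scripted clients."""
    import requests  # imported lazily so the sketch loads without it

    return requests.get(url, timeout=15).text


def fetch_cloudflare(url):
    """cloudscraper session for Cloudflare-protected sources."""
    import cloudscraper

    return cloudscraper.create_scraper().get(url, timeout=15).text


def fetch_browser(url):
    """Headless Chromium for sources that need a real browser.

    Stealth patching is omitted here; it would be applied to the page
    before navigation.
    """
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        try:
            page = browser.new_page()
            page.goto(url, timeout=30_000)
            return page.content()
        finally:
            browser.close()


# Hypothetical routing table, grouped the way the list above groups sources.
STRATEGIES = {
    "whois.com": fetch_plain,
    "sslshopper.com": fetch_plain,
    "craft.co": fetch_cloudflare,
    "securityheaders.com": fetch_cloudflare,
    "trustpilot.com": fetch_browser,
    "careerbliss.com": fetch_browser,
}


def pick_fetcher(url):
    """Return the fetch function for a URL's host, defaulting to plain HTTP."""
    host = (urlparse(url).hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    return STRATEGIES.get(host, fetch_plain)
```

Defaulting to the cheapest strategy means a new source only needs a table entry once it starts blocking plain requests.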

## Project layout
```
main.py                  Flask app, routes, background job runner
requirements.txt         Python dependencies
submodules/
  craftco.py             Craft.co profile + executives scraper
  trustpilot.py          Trustpilot reviews (Playwright)
  careerbliss.py         CareerBliss reviews (Playwright)
  findwebsite.py         Website resolver (Craft.co field + Google fallback)
  securityheaders.py     securityheaders.com scanner
  sslhopper.py           SSL Shopper cert checker
  sucuri.py              Sucuri SiteCheck API client
  whoisquery.py          whois.com scraper
  compiledata.py         HTML rendering for each output tab
static/
  style.css              Input page (waves, gradient, search bar)
  loading.css            Loading page (progress bar + step list)
  output-style.css       Report page (cards, tags, stars, grade badge)
  script.js              Output page tab switching + scroll progress
templates/
  input.html             Search form with example chips
  loading.html           Live progress UI
  output.html            Tabbed report
## Installation
Requires Python 3.9+.
```bash
pip install -r requirements.txt
playwright install chromium
```
The second command downloads the headless Chromium binary Playwright needs for Trustpilot and CareerBliss.
## Running
```bash
python main.py
```
Then open `http://127.0.0.1:5000/` and enter a company name. Use the example chips on the input page for a quick test.
## Notes
- The in-memory job store is capped at 32 entries and evicts oldest-first.
- A single scan takes roughly 20-40 seconds depending on network and which scrapers stall.
- If Craft.co can't find the company, the website resolver falls back to a Google search, which can itself fail silently if Google serves a CAPTCHA. In that case the Technology tab is filled with "Company website not found" placeholders, and every other tab still works.
- The scrapers target real HTML and API shapes that change without warning. Expect occasional breakage when a source restructures its page.
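The first note describes a store capped at 32 entries with oldest-first eviction. A minimal sketch of that behaviour, assuming an `OrderedDict` guarded by a lock (the class name `JobStore` and its methods are hypothetical, not taken from `main.py`):

```python
import threading
from collections import OrderedDict

MAX_JOBS = 32  # cap stated in the notes above


class JobStore:
    """In-memory job store that evicts the oldest entry once full."""

    def __init__(self, max_jobs=MAX_JOBS):
        self._jobs = OrderedDict()
        self._max_jobs = max_jobs
        self._lock = threading.Lock()

    def add(self, job_id, job):
        with self._lock:
            self._jobs[job_id] = job
            # popitem(last=False) removes the earliest-inserted entry.
            while len(self._jobs) > self._max_jobs:
                self._jobs.popitem(last=False)

    def get(self, job_id):
        with self._lock:
            return self._jobs.get(job_id)

    def __len__(self):
        with self._lock:
            return len(self._jobs)
```

With a cap like this, a report link for an evicted job would no longer resolve, so the result route needs to handle a missing job rather than assume it exists.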