https://github.com/skykery/python-scripts
HtmlToText
- Host: GitHub
- URL: https://github.com/skykery/python-scripts
- Owner: skykery
- Created: 2016-01-22T12:47:42.000Z (over 10 years ago)
- Default Branch: master
- Last Pushed: 2016-01-22T13:16:46.000Z (over 10 years ago)
- Last Synced: 2024-12-30T00:34:57.175Z (over 1 year ago)
- Language: Python
- Size: 0 Bytes
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Python-Scripts
# Strip HTML code from a string.
import re

def stripHTML(source):
    flags = re.S | re.I
    # remove style blocks, including their CSS content
    source = re.sub(r'<style[^>]*?>.*?</style>', '', source, flags=flags)
    # remove script blocks; .*? (rather than [^<]*?) also matches scripts that contain '<'
    source = re.sub(r'<script[^>]*?>.*?</script>', '', source, flags=flags)
    # remove comments
    source = re.sub(r'<!--.*?-->', '', source, flags=flags)
    # remove all remaining HTML tags
    source = re.sub(r'<[^<]+?>', '', source, flags=flags)
    # collapse runs of whitespace into a single space
    source = re.sub(r'\s+', ' ', source)
    return source
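
# getSource is not defined in this README; it is assumed to fetch a URL and return its HTML as a string.
html = getSource("https://thewebminer.com")
print(stripHTML(html))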
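
The usage lines above call `getSource`, which the README never defines. A minimal sketch of such a helper, using only the standard library, might look like the following. The body, the `timeout` parameter, and the UTF-8 fallback are assumptions made for illustration, not the author's code.

```python
import urllib.request


def getSource(url, timeout=10):
    """Fetch `url` and return the response body decoded to a string.

    Hypothetical helper: the README calls getSource() but does not define it.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        # Use the charset declared in the Content-Type header, falling back to UTF-8.
        charset = response.headers.get_content_charset() or "utf-8"
        # Replace undecodable bytes instead of raising, since the text is stripped anyway.
        return response.read().decode(charset, errors="replace")
```

With a helper of this shape in scope, `stripHTML(getSource("https://thewebminer.com"))` would return the page text as a single whitespace-normalized string. Some sites reject requests without a browser-like `User-Agent` header, so a real script may need to pass a `urllib.request.Request` with custom headers instead of a bare URL.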