https://github.com/neulhan/piro-webtoon
- Host: GitHub
- URL: https://github.com/neulhan/piro-webtoon
- Owner: Neulhan
- Created: 2020-01-27T01:52:23.000Z (over 6 years ago)
- Default Branch: master
- Last Pushed: 2020-01-28T08:05:15.000Z (over 6 years ago)
- Last Synced: 2025-04-05T02:33:36.301Z (about 1 year ago)
- Language: Python
- Size: 5.8 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# piro_crawling
```python
print('This is the Pirogramming 12th cohort crawling lecture page.')
```
## Environment
- jupyter notebook (.ipynb)
- google colaboratory (.ipynb)
## request
Send an HTTP request to a web page from Python code.
### urllib
```python
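import urllib.request  # the request submodule must be imported explicitly
# url is assumed to already hold the address of the target page
urllib_case = urllib.request.urlopen(url)
# read() returns bytes, so decode them into a str
html_text = urllib_case.read().decode("utf-8")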
```
[About Python binary files](https://wikidocs.net/15101)
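
The `decode("utf-8")` step matters because a raw HTTP response body is `bytes`, not `str`. A minimal sketch of that conversion follows; the `raw_bytes` value is a hypothetical stand-in for what `read()` would return, so nothing is fetched over the network.

```python
# Hypothetical stand-in for the bytes returned by urlopen(url).read()
raw_bytes = "<title>웹툰</title>".encode("utf-8")

# Before decoding, the data is a bytes object
print(type(raw_bytes))

# decode() turns the UTF-8 bytes into a regular str
html_text = raw_bytes.decode("utf-8")
print(type(html_text))
print(html_text)
```

Non-ASCII characters take more than one byte in UTF-8, which is why the byte length and the character length of the same document can differ.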
### requests
```python
import requests
html_text = requests.get(url).text
# html_text holds the HTML document as a str
```
[Blog post comparing urllib and requests](https://brownbears.tistory.com/299)
## bs4.BeautifulSoup
[Blog post with a clear explanation of what BeautifulSoup is](https://velog.io/@neulhan/%EC%B4%88%EB%B3%B4%EB%8F%84-%ED%95%A0-%EC%88%98-%EC%9E%88%EB%8A%94-python%EC%9C%BC%EB%A1%9C-%EB%84%A4%EC%9D%B4%EB%B2%84%EC%97%90%EC%84%9C-%EC%8B%A4%EC%8B%9C%EA%B0%84-%EA%B2%80%EC%83%89%EC%96%B4-%EC%A0%95%EB%B3%B4-%EA%B0%80%EC%A0%B8%EC%98%A4%EA%B8%B0-2-BeautifulSoup-1uk4asqet0)
```python
from bs4 import BeautifulSoup as bs
# Create the BeautifulSoup object
soup = bs(html_text, 'html.parser')
# Fetch specific tags with a CSS selector
selected_elements = soup.select('selector')
# Ways to use the fetched tags
# 1. .text: extract the text content
# 2. .attrs: dict of the tag's attributes
# 3. .get: read a single attribute value
```
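
As a rough illustration of the three access patterns listed above, the sketch below parses a small inline HTML string instead of a downloaded page. The markup, the selector, and the variable names are all made up for this example and do not come from the lecture material.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for a downloaded page
sample_html = """
<ul id="webtoons">
  <li class="item"><a href="/webtoon/1" title="First">First Toon</a></li>
  <li class="item"><a href="/webtoon/2">Second Toon</a></li>
</ul>
"""

soup = BeautifulSoup(sample_html, "html.parser")

# select() returns a list of every tag matching the CSS selector
links = soup.select("ul#webtoons li.item a")

# 1. .text gives the text inside each tag
titles = [a.text for a in links]

# 2. .attrs gives all attributes of a tag as a dict
first_attrs = links[0].attrs

# 3. .get() reads one attribute and returns None when it is absent
hrefs = [a.get("href") for a in links]
missing_title = links[1].get("title")

print(titles)
print(first_attrs)
print(hrefs)
print(missing_title)
```

Using `.get("title")` rather than `tag["title"]` avoids a `KeyError` on tags that lack the attribute, which is common when scraping inconsistent markup.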
## Using pandas