https://github.com/neulhan/piro-webtoon



# 🗺 piro_crawling

```python
print('This is the crawling lecture page for Pirogramming cohort 12.')
```

## Environment
- jupyter notebook (.ipynb)
- google colaboratory (.ipynb)

## request
Send an HTTP request to a web page from Python code.

### urllib
```python
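import urllib.request  # a plain `import urllib` does not load the request submodule

# url is assumed to hold the address of the page to fetch
urllib_case = urllib.request.urlopen(url)
# read() returns bytes, so decode them into a str
html_text = urllib_case.read().decode("utf-8")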
```
[About binary files in Python](https://wikidocs.net/15101)
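
`read()` returns raw bytes rather than text, which is why the snippet above calls `.decode("utf-8")`. A minimal sketch of that conversion, using a hard-coded byte string (an assumption for illustration, not a real response):

```python
# Bytes as they might arrive from urlopen(...).read(); hard-coded here for illustration.
raw = "<p>café</p>".encode("utf-8")
print(type(raw), len(raw))  # <class 'bytes'> 12

# Decoding with the matching encoding turns the bytes into a str.
html_text = raw.decode("utf-8")
print(type(html_text), len(html_text))  # <class 'str'> 11
```

The lengths differ because `é` takes two bytes in UTF-8 but counts as one character in a `str`.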

### requests
```python
import requests

html_text = requests.get(url).text

# html_text now holds the HTML document as a str
```

[A blog post comparing urllib and requests](https://brownbears.tistory.com/299)
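
For real pages it usually helps to set a timeout and to fail loudly on HTTP errors. A minimal sketch with `requests`; the helper name `fetch_html` and the 10-second default are assumptions for illustration, not part of the lecture code:

```python
import requests


def fetch_html(url, timeout=10):
    """Return the body of `url` as a str, raising on HTTP errors."""
    response = requests.get(url, timeout=timeout)
    # raise_for_status() raises requests.HTTPError for 4xx/5xx responses.
    response.raise_for_status()
    return response.text
```

Calling `fetch_html("https://example.com")` (a placeholder URL) would return the page's HTML as a `str`, ready to hand to a parser.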

## bs4.BeautifulSoup

[A blog post that explains what BeautifulSoup is](https://velog.io/@neulhan/%EC%B4%88%EB%B3%B4%EB%8F%84-%ED%95%A0-%EC%88%98-%EC%9E%88%EB%8A%94-python%EC%9C%BC%EB%A1%9C-%EB%84%A4%EC%9D%B4%EB%B2%84%EC%97%90%EC%84%9C-%EC%8B%A4%EC%8B%9C%EA%B0%84-%EA%B2%80%EC%83%89%EC%96%B4-%EC%A0%95%EB%B3%B4-%EA%B0%80%EC%A0%B8%EC%98%A4%EA%B8%B0-2-BeautifulSoup-1uk4asqet0)
```python
from bs4 import BeautifulSoup as bs

# Create a BeautifulSoup object
soup = bs(html_text, 'html.parser')

# Fetch specific tags from the HTML with a CSS selector
# ('selector' is a placeholder; select() returns a list of tags)
selected_elements = soup.select('selector')

# Using the fetched tags
# 1. .text extracts the text content
# 2. .attrs gives all attributes as a dict
# 3. .get reads a single attribute
```
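
The three ways of using a selected tag listed above can be sketched on a small inline document. The markup, the `a.title` selector, and the variable names below are hypothetical and chosen only for illustration:

```python
from bs4 import BeautifulSoup

# Hypothetical markup used only for this example.
html_text = """
<ul>
  <li><a class="title" href="/webtoon/1">First</a></li>
  <li><a class="title" href="/webtoon/2">Second</a></li>
</ul>
"""

soup = BeautifulSoup(html_text, "html.parser")
links = soup.select("a.title")  # list of matching Tag objects

titles = [a.text for a in links]        # 1. .text: the text inside the tag
first_attrs = links[0].attrs            # 2. .attrs: dict of all attributes
hrefs = [a.get("href") for a in links]  # 3. .get: one attribute, None if missing

print(titles)       # ['First', 'Second']
print(first_attrs)  # {'class': ['title'], 'href': '/webtoon/1'}
print(hrefs)        # ['/webtoon/1', '/webtoon/2']
```

Note that `class` comes back as a list because HTML allows several class names on one tag.
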
## Using pandas