{"id":15980416,"url":"https://github.com/thinh-vu/vnnews","last_synced_at":"2025-11-11T04:30:15.423Z","repository":{"id":255441298,"uuid":"567924172","full_name":"thinh-vu/vnnews","owner":"thinh-vu","description":"A Python package that helps capture news updates from top Vietnamese news sites","archived":false,"fork":false,"pushed_at":"2023-04-22T15:08:53.000Z","size":93,"stargazers_count":1,"open_issues_count":1,"forks_count":1,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-01-29T05:18:16.717Z","etag":null,"topics":["investment","news-crawler","sentiment-analysis","vietnamese-language"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/thinh-vu.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-11-18T23:02:16.000Z","updated_at":"2024-11-16T15:19:30.000Z","dependencies_parsed_at":null,"dependency_job_id":"20328ab1-ce6a-49f4-8e39-0917fea411cc","html_url":"https://github.com/thinh-vu/vnnews","commit_stats":{"total_commits":15,"total_committers":1,"mean_commits":15.0,"dds":0.0,"last_synced_commit":"8597f893820e91b96cb9766b63559e6da727b0fd"},"previous_names":["thinh-vu/vnnews"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/thinh-vu%2Fvnnews","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/thinh-vu%2Fvnnews/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/thinh-vu%2Fvnnews/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/thinh-vu%2Fvnnews/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/thinh-vu","download_url":"https://codeload.github.com/thinh-vu/vnnews/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":239587243,"owners_count":19663892,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["investment","news-crawler","sentiment-analysis","vietnamese-language"],"created_at":"2024-10-08T00:05:01.920Z","updated_at":"2025-11-11T04:30:13.303Z","avatar_url":"https://github.com/thinh-vu.png","language":"Jupyter Notebook","funding_links":["https://patreon.com/thinhvu?utm_medium=clipboard_copy\u0026utm_source=copyLink\u0026utm_campaign=creatorshare_creator"],"categories":[],"sub_categories":[],"readme":"# vnnews crawler\nA Python package that helps crawl updates from top Vietnamese news providers.\n\n\u003cdiv id=\"badges\" align=\"center\"\u003e\n\u003cimg src=\"https://img.shields.io/pypi/pyversions/vnnews?logoColor=brown\u0026style=plastic\" alt= \"Version\"/\u003e\n\u003cimg src=\"https://img.shields.io/pypi/dm/vnnews\" alt=\"Download Badge\"/\u003e\n\u003cimg src=\"https://img.shields.io/github/last-commit/thinh-vu/vnnews\" alt=\"Commit Badge\"/\u003e\n\u003cimg src=\"https://img.shields.io/github/license/thinh-vu/vnnews?color=red\" alt=\"License Badge\"/\u003e\n\u003c/div\u003e\n\n# II. REFERENCES\n## 2.1. How to use this package?\n- You can install the latest `vnnews` crawler version from source with the following command:\n`pip install git+https://github.com/thinh-vu/vnnews.git@main`\n- Install the stable version: `pip install vnnews`\n_(*) You might need to insert a `!` before your command when running terminal commands on Google Colab._\n\n- To start using functions, you need to import them: `from vnnews import *`\n\n## 2.2. List of Popular Online news for investors\n1. [VN Express](https://vnexpress.net/)\n2. [Tuổi trẻ Online](https://tuoitre.vn/)\n3. [CafeF](https://cafef.vn/)\n4. [Cafebiz](https://cafebiz.vn/)\n5. [Kinh tế Sài Gòn Online](https://thesaigontimes.vn/)\n6. [VN Economy](https://vneconomy.vn/)\n7. [Pháp Luật Tp.HCM](https://plo.vn/)\n8. [Đầu tư Online](https://baodautu.vn/)\n9. [Nhịp cầu đầu tư](https://nhipcaudautu.vn/)\n10. [Diễn đàn doanh nghiệp](https://diendandoanhnghiep.vn/)\n\n\n\u003cdetails\u003e\u003csummary\u003eSee more\u003c/summary\u003e\n\u003cp\u003e\n\n11. [Diễn đàn kinh tế Việt Nam - Vietnamnet](https://vef.vn/)\n12. [Forbes Việt Nam](https://forbes.vn/)\n13. [Vietstock](https://vietstock.vn/)\n14. [Tin nhanh chứng khoán](https://www.tinnhanhchungkhoan.vn/)\n15. [Cafe Land](https://cafeland.vn/)\n16. [Kenh14](https://kenh14.vn/)\n17. [Dân trí](https://dantri.com.vn/)\n18. [Thanh niên](https://thanhnien.vn/)\n19. [Vietnamnet](http://vietnamnet.vn/)\n20. [Nhân dân điện tử](http://www.nhandan.com.vn/)\n21. [Lao động](http://laodong.com.vn/)\n22. [Đời sống \u0026 pháp luật](http://www.doisongphapluat.com/)\n\n\u003c/p\u003e\n\u003c/details\u003e\n\n\n\n## 2.3. Function references\n\n- `url_extract (url, key, tag_class='', type='link', bs_on=True, user_agent='Mozilla/5.0 (Windows NT 10.0; WOW64; rv:11.0) Gecko/20100101')`\n    - Purpose: Extract article info from a news source using BeautifulSoup to pull data from HTML/XML web page.\n    - Arguments:\n      - url (:obj:`str`, required): url of the target news source. Eg. 'https://cafef.vn/'\n      - key (:obj:`str`, required): HTML tag which contains the information that you want to extract. Eg. 'h3', 'article', 'div'\n      - tag_class (:obj:`str`, required): The HTML class attribute specifies one or more class names for an element. Eg. 'pdate' in the tag \u003cspan class=\"pdate\"\u003e19-11-2022 - 15:32 PM \u003c/span\u003e on CafeF.\n      - type (:obj:`str`, optional): 'link' as default to extract only the article link from a news homepage. Use blank value `''` when extracting article detail on the article page.\n      - bs_on (:obj:`str`, optional): `True` as default. Input blank `''` when the issue is raised.\n      - user_agent (:obj:`str`, optional): The default value for Desktop has been provided. You can find more user agent value here: https://developers.whatismybrowser.com/useragents/explore/operating_system_name/\n\n- `fix_url(host, url)`\n  - Purpose: Extract article info from a news source using BeautifulSoup to pull data from HTML/XML web page.\n  - Arguments:\n    - host (:obj:`str`, required): the host name of the news source. Eg. 'https://vneconomy.vn\n    - url (:obj:`str`, required): the url string of the target news source. This might not contain the host at the beginning. Eg. '/de-viet-nam-thanh-digital-hub-cua-khu-vuc-vao-nam-2030-e290.htm'\n\n## 2.4. Let's get our hands dirty\n1. [VN Express](https://vnexpress.net/)\n   - Get the list of article urls: `url_extract('https://vnexpress.net/kinh-doanh', key='h3')`\n   - Extract article details: `url_extract('https://vnexpress.net/thuong-mai-va-dau-tu-ben-vung-se-giup-apec-ung-pho-nguy-co-suy-thoai-4538015.html', key='span', tag_class='date', type='')`\n2. [Tuổi trẻ Online](https://tuoitre.vn/)\n   - Get the list of article urls: `url_extract('https://tuoitre.vn/phap-luat.htm', key='h3')`\n   - Extract article details: `url_extract('https://tuoitre.vn/gap-thu-tuong-xuc-dong-chuyen-co-giao-mam-non-miet-mai-lam-thien-nguyen-cho-vung-xa-20221119175021292.htm', key='div', tag_class='date-time', type='')`\n3. [CafeF](https://cafef.vn/)\n   - Get the list of article urls:  `url_extract('https://cafef.vn/bat-dong-san.chn', key='h3', type='link')`\n   - Extract article details: `url_extract('https://cafef.vn/dau-se-la-phan-khuc-bds-giu-duoc-nhiet-trong-thoi-gian-toi-2022111913083069.chn', key='span', tag_class='pdate', type='')`\n4. [Cafebiz](https://cafebiz.vn/)\n   - Get the list of article urls:  `url_extract('https://cafebiz.vn/vi-mo.chn', key='h3', type='link', bs_on='')`\n   - Extract article details: `url_extract('https://cafebiz.vn/tai-sao-nha-o-my-la-tai-san-con-o-nhat-ban-thi-lai-chang-khac-gi-hang-tieu-dung-176221119095831295.chn', key='span', tag_class='time', type='')`\n\n\u003cdetails\u003e\u003csummary\u003eSee more\u003c/summary\u003e\n\u003cp\u003e\n\n5. [Kinh tế Sài Gòn Online](https://thesaigontimes.vn/)\n   - Get the list of article urls:  `url_extract('https://thesaigontimes.vn/', key='h3', type='link', bs_on='')`\n   - Extract article details: `url_extract('https://thesaigontimes.vn/kinh-te-tuan-hoan-mo-ra-nhung-mo-hinh-kinh-doanh-moi/', key='time', tag_class='', type='')`\n6. [VN Economy](https://vneconomy.vn/)\n   - Get the list of article urls:  `url_extract('https://vneconomy.vn/', key='h3', type='link', bs_on=False)`\n   - Extract article details: `url_extract('https://vneconomy.vn/xuat-khau-det-may-van-tu-tin-voi-muc-tieu-42-ty-usd.htm', key='div', tag_class='detail__meta', type='')`\n7. [Pháp Luật Tp.HCM](https://plo.vn/)\n   - Get the list of article urls:  `url_extract('https://m.plo.vn/phap-luat/', key='h3', type='link')[0][1]`\n   - Extract article details: `test = url_extract('https://plo.vn/dieu-tra-trung-tam-dang-kiem-cap-so-song-sinh-cho-xe-tai-post705918.html', key='time', tag_class='', type='')`\n8. [Đầu tư Online](https://baodautu.vn/)\n   - Get the list of article urls:  `url_extract('https://baodautu.vn/', key='article', type='link', bs_on='')`\n   - Extract article details: `url_extract('https://baodautu.vn/nguoi-dan-rong-ra-cau-cuu-khi-nao-co-so-do-tu-du-an-cua-cong-ty-bach-dat-an-d177946.html', key='span', tag_class='post-time', type='')`\n9.  [Nhịp cầu đầu tư](https://nhipcaudautu.vn/)\n   - Get the list of article urls:  `url_extract('https://m.nhipcaudautu.vn/kinh-doanh/', key='article', type='link', bs_on='', user_agent='Mozilla/5.0 (iPhone; CPU iPhone OS 15_5 like Mac OS X)')`\n   - Extract article details: `url_extract('https://m.nhipcaudautu.vn/ti-le-don-bay-tai-chinh-toan-thi-truong-giam-dan-tu-quy-i-3348999/', key='span', tag_class='date-post', type='')`\n10.  [Diễn đàn doanh nghiệp](https://diendandoanhnghiep.vn/)\n     - Get the list of article urls:  `url_extract('https://diendandoanhnghiep.vn/', key='h3', type='link', bs_on='')`\n     - Extract article details: `url_extract('https://diendandoanhnghiep.vn/https-diendandoanhnghiep-vn-dien-mat-troi-mai-nha-can-hoan-thien-co-che-ho-tro-doanh-nghiep-phat-trien-225626-html-e313.html', key='span', tag_class='created_time', type='')`\n11.  [Diễn đàn kinh tế Việt Nam - Vietnamnet](https://vef.vn/)\n     - Get the list of article urls:  `url_extract('https://vef.vn/diem-nong/', key='article', type='link', bs_on='')`\n     - Extract article details: ``\n12. [Forbes Việt Nam](https://forbes.vn/)\n    - Get the list of article urls:  `url_extract('https://forbes.vn', key='h3', type='link', bs_on='')`\n    - Extract article details: `url_extract('https://forbes.vn/m-village-cua-nguyen-hai-ninh-xay-lang-trong-pho/', key='div', tag_class='forbes-single__heading-time', type='')`\n13.  [Vietstock](https://vietstock.vn/)\n     - Get the list of article urls:  `url_extract('https://vietstock.vn/', key='h4', type='link', bs_on='')`\n     - Extract article details: `url_extract('https://vietstock.vn/2022/11/thieu-hut-iphone-14-nguoi-dung-viet-lua-chon-iphone-doi-cu-4264-1017483.htm', key='span', tag_class='date', type='')`\n14. [Tin nhanh chứng khoán](https://www.tinnhanhchungkhoan.vn/)\n    - Get the list of article urls: Doesn't work `url_extract('https://m.tinnhanhchungkhoan.vn/', key='h2', type='link', bs_on='')`\n    - Extract article details: `url_extract('https://www.tinnhanhchungkhoan.vn/big-trends-sau-con-mua-troi-lai-sang-post310328.html', key='time', tag_class='', type='')`\n15. [Cafe Land](https://cafeland.vn/)\n    - Get the list of article urls:  `url_extract('https://cafeland.vn/', key='h3', type='link', bs_on='')`\n    - Extract article details: `url_extract('https://cafeland.vn/phan-tich/bien-doi-khi-hau-dang-leo-thang-nhung-doanh-nghiep-chu-yeu-doi-pho-114941.html', key='div', tag_class='info-date right', type='')`\n16. [Kenh14](https://kenh14.vn/)\n    - Get the list of article urls:  `url_extract('https://m.kenh14.vn/doi-song.chn', key='h3', type='link')`\n    - Extract article details: `url_extract('https://m.kenh14.vn/phia-sau-nhung-gen-z-okela-co-luc-that-bai-co-luc-khong-on-lam-nhung-chua-bao-gio-ngung-no-luc-20221119153833146.chn', key='span', tag_class='kbwcm-time', type='')`\n17. [Dân trí](https://dantri.com.vn/)\n    - Get the list of article urls:  `url_extract('https://dantri.com.vn/', key='h3', type='link', bs_on='')`\n    - Extract article details: `url_extract('https://dantri.com.vn/the-gioi/moscow-cao-buoc-ukraine-kich-dong-xung-dot-quan-su-nga-nato-20221119145209276.htm', key='time', tag_class='author-time', type='')`\n18. [Thanh niên](https://thanhnien.vn/)\n    - Get the list of article urls:  ``\n    - Extract article details: ``\n19. [Vietnamnet](http://vietnamnet.vn/)\n    - Get the list of article urls:  ``\n    - Extract article details: ``\n20. [Nhân dân điện tử](http://www.nhandan.com.vn/)\n    - Get the list of article urls:  ``\n    - Extract article details: ``\n21. [Lao động](http://laodong.com.vn/)\n    - Get the list of article urls:  ``\n    - Extract article details: ``\n22. [Đời sống \u0026 pháp luật](http://www.doisongphapluat.com/)\n    - Get the list of article urls:  ``\n    - Extract article details: ``\n\n\u003c/p\u003e\n\u003c/details\u003e\n\n# III. APENDICES\n- [Demo video](https://youtu.be/S_Jx_TgSTTw): How to select the key \n- Explore User Agents by Operating System: [here](https://developers.whatismybrowser.com/useragents/explore/operating_system_name/)\n\n# IV. 🙋‍♂️ CONTACT INFORMATION\nYou can contact me at one of my social network profiles:\n\n\u003cdiv id=\"badges\" align=\"center\"\u003e\n  \u003ca href=\"https://www.linkedin.com/in/thinh-vu\"\u003e\n    \u003cimg src=\"https://img.shields.io/badge/LinkedIn-blue?style=for-the-badge\u0026logo=linkedin\u0026logoColor=white\" alt=\"LinkedIn Badge\"/\u003e\n  \u003c/a\u003e\n  \u003ca href=\"https://www.messenger.com/t/mr.thinh.ueh\"\u003e\n    \u003cimg src=\"https://img.shields.io/badge/Messenger-00B2FF?style=for-the-badge\u0026logo=messenger\u0026logoColor=white\" alt=\"Messenger Badge\"/\u003e\n  \u003ca href=\"https://www.youtube.com/channel/UCYgG-bmk92OhYsP20TS0MbQ\"\u003e\n    \u003cimg src=\"https://img.shields.io/badge/YouTube-red?style=for-the-badge\u0026logo=youtube\u0026logoColor=white\" alt=\"Youtube Badge\"/\u003e\n  \u003c/a\u003e\n  \u003c/a\u003e\n    \u003ca href=\"https://github.com/thinh-vu\"\u003e\n    \u003cimg src=\"https://img.shields.io/badge/GitHub-100000?style=for-the-badge\u0026logo=github\u0026logoColor=white\" alt=\"Github Badge\"/\u003e\n  \u003c/a\u003e\n\u003c/div\u003e\n\n---\n\nIf you want to support my open-source projects, you can \"buy me a coffee\" via [Patreon](https://patreon.com/thinhvu?utm_medium=clipboard_copy\u0026utm_source=copyLink\u0026utm_campaign=creatorshare_creator) or Momo e-wallet (VN). Your support will help to maintain my blog hosting fee \u0026 to develop high-quality content.\n\n![momo-qr](https://github.com/thinh-vu/vnstock/blob/main/src/momo-qr-thinhvu.jpeg?raw=true)","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fthinh-vu%2Fvnnews","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fthinh-vu%2Fvnnews","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fthinh-vu%2Fvnnews/lists"}