{"id":18513833,"url":"https://github.com/timjjting/test-for-gilacloud","last_synced_at":"2025-05-14T12:35:19.114Z","repository":{"id":257951841,"uuid":"166158377","full_name":"TimJJTing/test-for-gilacloud","owner":"TimJJTing","description":"GliaCloud 考題 for Backend Engineer  ","archived":false,"fork":false,"pushed_at":"2019-01-17T04:31:25.000Z","size":9,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-02-17T03:34:26.914Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/TimJJTing.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2019-01-17T04:09:08.000Z","updated_at":"2019-01-17T04:31:26.000Z","dependencies_parsed_at":"2024-10-17T04:40:05.050Z","dependency_job_id":null,"html_url":"https://github.com/TimJJTing/test-for-gilacloud","commit_stats":null,"previous_names":["timjjting/test-for-gilacloud"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TimJJTing%2Ftest-for-gilacloud","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TimJJTing%2Ftest-for-gilacloud/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TimJJTing%2Ftest-for-gilacloud/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TimJJTing%2Ftest-for-gilacloud/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/TimJJTing","down
load_url":"https://codeload.github.com/TimJJTing/test-for-gilacloud/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254142739,"owners_count":22021590,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-06T15:40:50.914Z","updated_at":"2025-05-14T12:35:19.057Z","avatar_url":"https://github.com/TimJJTing.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# GliaCloud Exam Questions for Backend Engineer  \n## Part 1\n### 1) counting \\(counting.py\\)\nDone.  \nI used regex and the built-in function `sorted` to solve this problem and tried to handle all possible inputs.\n### 2) integration \\(integration.py\\)\nDone.  \nanonymous(intercept) * step = y * x = area of each small interval,  \nso the sum of these small areas approximates the integral of the anonymous function.\n### 3) multiples of 3 or 5 \\(multiples.py\\)\nDone.  \nI solved this intuitively given the limited time. I think there are better approaches.\n## Part 2\na) Write a crawler in Python that can crawl any board on ptt / reddit (https://www.ptt.cc); any Python package may be used.\nDone.  
\nPTT crawler implemented with Scrapy  \nPTT articles can have complex patterns; I've tried my best to handle them.\n### Installation\n\n    pip install scrapy\n    cd pttcrawler\n\n### Example Usage\n\nCommand pattern:  \n\n    scrapy crawl ptt \u003c-a argument=value\u003e \u003c-o outputfile.json\u003e  \n\nExample 1: Crawl 5 articles from PTT Gossiping and dump the data into output.json\n\n    scrapy crawl ptt -a max_articles=5 -a board='Gossiping' -o output.json\n\nExample 2: Crawl 5 articles whose titles contain 丹丹 from PTT Gossiping and dump the data into output.json  \n\n    scrapy crawl ptt -a max_articles=5 -a board='Gossiping' -a keyword=丹丹 -o output.json\n\nExample 3: Crawl a single article by URL (https://www.ptt.cc/bbs/WomenTalk/M.1494689998.A.2AA.html) and dump the data into output.json  \n\n    scrapy crawl ptt -a test_url=https://www.ptt.cc/bbs/WomenTalk/M.1494689998.A.2AA.html -o output.json\n\n### Available Arguments\n**`max_articles`**: Maximum number of articles to crawl. *default=5*  \n**`max_retry`**: Maximum number of retries during the process. *default=5*  \n**`board`**: PTT board to crawl. *default='HatePolitics'*  \n**`keyword`**: If specified, the spider will only retrieve articles that have the given keyword in their titles. *optional argument*    \n**`test_url`**: If set, only the article at the given URL will be crawled and all arguments above will be ignored. This argument is especially helpful when debugging. *optional argument*  \n**`get_content`**: If set to *False*, the content of articles will not be retrieved. This helps reduce the size of the dataset if you are not interested in it. *default=True*  \n**`get_comments`**: If set to *False*, the comments of articles will not be retrieved and article scores will not be calculated. This helps reduce the size of the dataset if you are not interested in them. 
*default=True*  ","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftimjjting%2Ftest-for-gilacloud","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftimjjting%2Ftest-for-gilacloud","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftimjjting%2Ftest-for-gilacloud/lists"}