{"id":13305910,"url":"https://github.com/Soumya117/finnparser","last_synced_at":"2025-03-10T14:32:21.896Z","repository":{"id":42366365,"uuid":"197197492","full_name":"Soumya117/finnparser","owner":"Soumya117","description":null,"archived":false,"fork":false,"pushed_at":"2023-05-01T20:36:50.000Z","size":4455,"stargazers_count":0,"open_issues_count":3,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2024-07-29T18:36:50.018Z","etag":null,"topics":["azure","azure-blob","dockefile","docker-compose","elasticsearch","filebeats","finn-no","gcloud","google-maps-api","grafana","grafana-prometheus","kibana","nginx","prometheus","prometheus-metrics","python-web-scraper","python3","realstate-ads","scan-websites","uwsgi-nginx"],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Soumya117.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2019-07-16T13:16:47.000Z","updated_at":"2022-04-08T17:33:06.000Z","dependencies_parsed_at":"2024-10-23T13:01:53.991Z","dependency_job_id":null,"html_url":"https://github.com/Soumya117/finnparser","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Soumya117%2Ffinnparser","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Soumya117%2Ffinnparser/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Soumya117%2Ffinnparser/releases","manifests_url":"https://repos
.ecosyste.ms/api/v1/hosts/GitHub/repositories/Soumya117%2Ffinnparser/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Soumya117","download_url":"https://codeload.github.com/Soumya117/finnparser/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":242868518,"owners_count":20198495,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["azure","azure-blob","dockefile","docker-compose","elasticsearch","filebeats","finn-no","gcloud","google-maps-api","grafana","grafana-prometheus","kibana","nginx","prometheus","prometheus-metrics","python-web-scraper","python3","realstate-ads","scan-websites","uwsgi-nginx"],"created_at":"2024-07-29T17:55:20.119Z","updated_at":"2025-03-10T14:32:21.510Z","avatar_url":"https://github.com/Soumya117.png","language":"Python","readme":"# finnazureflaskapp\n\nThis Python project parses finn.no and presents the results.\nThe parsing includes:\n1. Scanning for new real estate ads.\n2. Scanning each real estate ad and parsing the price.\n3. Scanning for sold houses.\n4. Listing all the visnings (viewings).\n\nThe ads are stored in links.json along with a timestamp, so it is possible to find out when a link was first seen on finn. The prices are stored in pris.json with the links and timestamps, so if one ad shows two different prices at two different timestamps, it is easy to see the change in price.\n\nIt also tells you the SOLD status. Finn.no doesn't display sold houses in the search, but if the link is saved somewhere it is easy to browse to it and see the sold houses. 
The webapp scans links and records the status changes.\nAdditionally, it collects all the visnings.\n\nThe app is triggered by an Azure timer function, so it collects data in the background and stores the JSON in Azure Blob Storage.\n\nDEPLOYMENT:\u003cbr /\u003e\nLOCAL\n\n1. Clone the project.\n2. Enter the project directory.\n3. Enter the following commands: \u003cbr /\u003e\n   python3 -m venv venv \u003cbr /\u003e\n   source venv/bin/activate \u003cbr /\u003e\n   pip install -r requirements.txt \u003cbr /\u003e\n4. Run the project by entering the following command:   \n   FLASK_APP=application.py flask run \u003cbr /\u003e\n   This will start Flask, and you can browse your app on localhost.\n\nMICROSOFT AZURE\n\n1. Get a free Azure account (or a paid one if you can).\n2. Enter the cloud shell.\n3. Clone the project.\n4. Enter the project directory and run the following command: \n   az webapp up -n \"app-name\"\n5. Using the cloud shell, you can make changes to the code and then deploy it again with the same command.\n6. Because the /price request takes a long time to load, the app timed out. So I changed the startup configuration using the command: az webapp config set --resource-group \"resource-group\" --name \"app-name\" --startup-file \"gunicorn --bind=0.0.0.0 --timeout 2000 application:app\"\n\nGOOGLE CLOUD \n\n1. For Google Cloud, I need to create app.yaml and add an entrypoint to main.py or application.py.\n2. The app is deployed to the cloud from the local machine using:\n   gcloud app deploy\n3. The Flask app is now deployed on Google App Engine, and the Azure timer function sends the scan request every 3 hours.\n\n****************************************************************************************************************************\n\nUPDATE: I am no longer running this app as an app service on GCP. I am running it as a daemon inside a GCP VM instance. The reason for this change is that the app service was not handling the huge requests well. The requests were timing out. 
\nTo run it inside a VM instance as a daemon:\n1. Create a VM instance.\n2. Git clone this project there.\n3. Change the required keys (gmaps and Azure blob).\n4. Create a conf file inside /etc/supervisor/conf.d/ named finnazureflaskapp.conf. \n5. Write the following to the conf file:\u003cbr /\u003e\n   [program:finn-flask-app]\u003cbr /\u003e\n   directory=/home/soumya/python/finnazureflaskapp\u003cbr /\u003e\n   command=python main.py\u003cbr /\u003e\n   autostart=true\u003cbr /\u003e\n   autorestart=true\u003cbr /\u003e\n   stopsignal=INT\u003cbr /\u003e\n   stopasgroup=true\u003cbr /\u003e\n   killasgroup=true\u003cbr /\u003e\n6. Run sudo /usr/bin/supervisord. Make sure no other supervisor processes are running (sudo ps -ax | grep 'supervisor').\n7. Enter sudo supervisorctl and check your app.\n\nAfter this architectural change, a timer daemon running inside the VM schedules the scans. \n\n****************************************************************************************************************************\n\nUPDATE Again: I changed the architecture a little bit. I wanted to run Prometheus, Grafana, and the ELK services along with my Python web app, and found that a good solution was to dockerize it. So everything is now running as Docker containers.\nChanges: \n1. docker-compose.yml contains the configuration of the webapp.\n2. docker-compose-infra.yml contains the configuration of Prometheus, Grafana, Elasticsearch, and Kibana.\n3. To start the containers, navigate to the project folder and run sudo bash start.sh. This script performs a docker build and starts the required containers.\n4. After all the containers are running, run install_filebeat.sh to install and start the filebeat service. 
Change the configuration in /etc/filebeat/filebeat.yml and restart filebeat.\n\n****************************************************************************************************************************\n\nServices and ports\u003cbr /\u003e\u003cbr /\u003e\nPrometheus: 9090\u003cbr /\u003e\nGrafana: 3000\u003cbr /\u003e\nElasticsearch: 9200\u003cbr /\u003e\nKibana: 5601\u003cbr /\u003e\n\nSince the containers are running inside the VM, I have to open some ports to access Prometheus and Grafana. That can be done by creating firewall rules. For example: \u003cbr /\u003e\ngcloud compute firewall-rules create allow-http-5000 \\\n    --allow tcp:5000 \\\n    --source-ranges 0.0.0.0/0 \\\n    --target-tags http-server \\\n    --description \"Allow port 5000 access to http-server\" \u003cbr /\u003e\nTo tail docker-compose logs: sudo docker-compose logs --tail=\"all\" -f\n\nARCHITECTURE\n\n![alt text](https://github.com/Soumya117/finnazureflaskapp/blob/master/app/Selection_152.png) \u003cbr /\u003e\u003cbr /\u003e\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FSoumya117%2Ffinnparser","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FSoumya117%2Ffinnparser","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FSoumya117%2Ffinnparser/lists"}