{"id":15715612,"url":"https://github.com/francois-lenne/monitor_device_data","last_synced_at":"2026-05-06T13:12:50.974Z","repository":{"id":253143454,"uuid":"842547881","full_name":"Francois-lenne/monitor_device_data","owner":"Francois-lenne","description":"The project goal is to retrieve the activity the time activiy of my mac per appllication and load it in R2 the data lake of Cloudflaire using python","archived":false,"fork":false,"pushed_at":"2024-09-08T16:04:49.000Z","size":89,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-03-30T20:18:08.440Z","etag":null,"topics":["cloudflare","data-engineering","pandas","python","r2","sqlite"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Francois-lenne.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-08-14T15:16:15.000Z","updated_at":"2024-09-08T16:04:53.000Z","dependencies_parsed_at":"2024-08-14T19:04:43.521Z","dependency_job_id":"3d81020d-11ad-4015-9b6d-9db18c897fa4","html_url":"https://github.com/Francois-lenne/monitor_device_data","commit_stats":null,"previous_names":["francois-lenne/monitor_device_data"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Francois-lenne%2Fmonitor_device_data","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Francois-lenne%2Fmonitor_device_data/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub
/repositories/Francois-lenne%2Fmonitor_device_data/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Francois-lenne%2Fmonitor_device_data/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Francois-lenne","download_url":"https://codeload.github.com/Francois-lenne/monitor_device_data/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":246372743,"owners_count":20766635,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cloudflare","data-engineering","pandas","python","r2","sqlite"],"created_at":"2024-10-03T21:42:09.825Z","updated_at":"2026-05-06T13:12:50.917Z","avatar_url":"https://github.com/Francois-lenne.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\n# Project Goal \n\n\nThe goal of this project is to retrieve the per-application activity time from my Mac and load it into Cloudflare R2, used here as a data lake, using Python. \n\n\n# Stack \n\n\n\u003cp align=\"center\"\u003e\n  \u003ca href=\"https://go-skill-icons.vercel.app/\"\u003e\n    \u003cimg src=\"https://go-skill-icons.vercel.app/api/icons?i=py,pandas,sqlite,cloudflare,bash,git,github\" /\u003e\n  \u003c/a\u003e\n\u003c/p\u003e\n\n\n\n\n\n# Architecture Schema \n\n\n![Copie de Schema play-gcp-bq](https://github.com/user-attachments/assets/90f9256f-7ecc-4efc-9764-dc0e74c8247e)\n\n\n\n# Technical explanation \n\n\nThe program is contained in main.py and consists of three functions, described below. \n\n\n\n## test_before_retrieve\n\n\n```py\ndef test_before_retrieve():\n    system_os = platform.system()\n\n    if system_os == 'Darwin':\n        print(\" It's the good operating system\")\n    else:\n        raise ValueError(\"The operating system need to be a mac os\")\n\n\n    user_dir = os.path.expanduser('~')\n\n\n    pattern = f\"{user_dir}/Library/Application Support/Knowledge/knowledgeC.db\"\n    matches = glob.glob(pattern)\n\n    if len(matches) != 1:\n        raise ValueError(\"The database file was not found\")\n    else:\n        print(\"The database file was found\")\n\n    return matches[0]\n\n```\n\n\nThis function has two main goals: verify that the computer is a Mac (mandatory for this project), then check that the database file is accessible. If either check fails, the program stops and raises an error; otherwise it prints a confirmation message, which ends up in the log file kept in my local repo. The function also builds the path to the database (using `os.path.expanduser('~')` to get the home directory, for example `/Users/username`) and returns the path to the SQLite database.\n\n\n\n\n## retrieve_data\n\n\n```py\n\ndef retrieve_data():\n\n\n    db_path = test_before_retrieve()\n\n    print(f\"Trying to connect to database at: {db_path}\")\n\n\n\n    try:\n        conn = sqlite3.connect(db_path)\n        print(\"Connected to database\")\n\n        query = \"SELECT * FROM ZOBJECT\"\n\n        df = pd.read_sql_query(query, conn)\n\n        # Get the current date\n        current_date = datetime.datetime.now().strftime(\"%d_%m_%Y\")\n\n        # Create the file name pattern\n        file_name_pattern = f\"data_mac_{current_date}\"\n\n        # Create the file path\n        csv_file_path = os.path.join('files', f\"{file_name_pattern}.csv\")\n\n\n        df.to_csv(csv_file_path, index=False)\n\n        conn.close()\n    \n    except sqlite3.OperationalError as e:\n        print(f\"Error connecting to database: {e}\")\n        print(f\"Database path: {os.path.abspath(db_path)}\")\n        print(f\"Database path: {db_path}\")\n        print(f\"Current user: {os.getuid()}\")\n        raise ValueError(f\"Please check the database connection: {e}. Check the privilege for the IDE and for the terminal.\")\n\n    return df\n\n\n\n```\n\n\nThe function **retrieve_data** first retrieves all rows and columns from the `ZOBJECT` table of knowledgeC.db, loads them into a pandas DataFrame, and finally saves them as a CSV file in the local *files* directory, named with the pattern *data_mac_DD_MM_YYYY*.\n\n\n\n## load_data_to_r2\n\n\n```py\nr2_endpoint = os.getenv('R2_ENDPOINT')\naccess_key = os.getenv('R2_ACCESS_KEY')\nsecret_key = os.getenv('R2_SECRET_KEY')\n\nprint(f\"R2 endpoint: {r2_endpoint}\")\n\ndef load_data_to_r2(r2_endpoint, access_key, secret_key):\n    # Create an S3 client compatible with R2\n    s3 = boto3.client('s3',\n                      endpoint_url=r2_endpoint,\n                      aws_access_key_id=access_key,\n                      aws_secret_access_key=secret_key,\n                      config=Config(signature_version='s3v4'))\n\n    # Upload the CSV file to an R2 bucket\n\n    current_date = datetime.datetime.now().strftime(\"%d_%m_%Y\")\n\n\n    file_name_pattern = f\"data_mac_{current_date}\"\n    \n    csv_file_path = os.path.join('files', f\"{file_name_pattern}.csv\")\n    \n    bucket_name = 'usertime'\n    object_key = f\"{file_name_pattern}.csv\"\n\n    s3.upload_file(csv_file_path, bucket_name, object_key)\n    print(f\"DataFrame uploaded to R2 bucket {bucket_name} with key {object_key}\")\n\nload_data_to_r2(r2_endpoint, access_key, secret_key)\n\n```\n\n\nThe last function takes the current day's file from the local *files* directory and uploads it to the R2 bucket.\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffrancois-lenne%2Fmonitor_device_data","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ffrancois-lenne%2Fmonitor_device_data","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffrancois-lenne%2Fmonitor_device_data/lists"}