{"id":28447186,"url":"https://github.com/stonecharioteer/canonical-interview-test","last_synced_at":"2026-04-28T17:32:02.484Z","repository":{"id":187507028,"uuid":"282482143","full_name":"stonecharioteer/canonical-interview-test","owner":"stonecharioteer","description":"This is my solution to the standard problem that canonical gives interview candidates.","archived":false,"fork":false,"pushed_at":"2020-07-26T05:59:14.000Z","size":39,"stargazers_count":1,"open_issues_count":0,"forks_count":1,"subscribers_count":0,"default_branch":"master","last_synced_at":"2025-06-06T11:09:38.337Z","etag":null,"topics":["canonical","debian","interview","python","ubuntu"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/stonecharioteer.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2020-07-25T16:28:11.000Z","updated_at":"2025-02-17T12:38:24.000Z","dependencies_parsed_at":null,"dependency_job_id":"03cb4a61-9263-4dde-8fbe-c753ba6b89b3","html_url":"https://github.com/stonecharioteer/canonical-interview-test","commit_stats":null,"previous_names":["stonecharioteer/canonical-interview-test"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/stonecharioteer/canonical-interview-test","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/stonecharioteer%2Fcanonical-interview-test","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/stonecharioteer%2Fcanonical-interview-test/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/stonecharioteer%2Fcanonical-interview-test/releases","manif
ests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/stonecharioteer%2Fcanonical-interview-test/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/stonecharioteer","download_url":"https://codeload.github.com/stonecharioteer/canonical-interview-test/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/stonecharioteer%2Fcanonical-interview-test/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32392291,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-28T14:34:11.604Z","status":"ssl_error","status_checked_at":"2026-04-28T14:32:37.009Z","response_time":56,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["canonical","debian","interview","python","ubuntu"],"created_at":"2025-06-06T11:09:40.628Z","updated_at":"2026-04-28T17:32:02.478Z","avatar_url":"https://github.com/stonecharioteer.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Canonical Interview Question: Debian Package Statistics\n\n## Instructions\n\nDebian uses *deb packages to deploy and upgrade software. The packages\nare stored in repositories and each repository contains the so called \"Contents\nindex\". 
The format of that file is well described here\nhttps://wiki.debian.org/RepositoryFormat#A.22Contents.22_indices\n\nYour task is to develop a python command line tool that takes the\narchitecture (amd64, arm64, mips etc.) as an argument and downloads the\ncompressed Contents file associated with it from a Debian mirror. The\nprogram should parse the file and output the statistics of the top 10\npackages that have the most files associated with them.\nAn example output could be:\n\n./package_statistics.py amd64\n\n1. \u003cpackage name 1\u003e         \u003cnumber of files\u003e\n2. \u003cpackage name 2\u003e         \u003cnumber of files\u003e\n......\n10. \u003cpackage name 10\u003e         \u003cnumber of files\u003e\n\nYou can use the following Debian mirror\nhttp://ftp.uk.debian.org/debian/dists/stable/main/. Please do try to\nfollow Python's best practices in your solution. Hint: there are tools\nthat can help you verify your code is compliant. In-line comments are\nappreciated.\n\nIt will be good if the code is accompanied by a 1-page report of the\nwork that you have done including the time you actually spent working on it.\n\nOnce started, please return your work in approximately 24 hours.\n\nNote: the focus is not to write the perfect Python code, but to see how\nyou'll approach the problem and how you organize your work.\n\n\n## Assumptions\n\nUbuntu currently ships Python 3, so I will use it. Python 2 support is possible, but out of scope for this solution.\n\nI am not using any third-party packages for the core application. 
The performance could be improved using other packages, or other implementations of Python (perhaps PyPy).\n\n\n### Regarding `udeb` content indices\n\nI am treating the `udeb` files as additional sources, which can be included in the\nreport for an architecture by means of an `--include-udeb` flag.\nCheck the help output for more information.\n\n## Application Installation\n\nIf you would like to install this application into your python3 environment, run the following:\n\n```bash\npython3 setup.py install\n```\n\n*Note that the installation is not necessary to run this application*.\n\n## Usage\n\n### With Installation\n\nOnce `packstats` is installed, you can run it in one of two ways.\n\n```packstats --help```\n\nOr:\n\n```python -m packstats --help```\n\nThe second version is preferred in places where you would want to *ensure* the right version of python is being used, perhaps with a virtual environment.\n\n### Without Installation\n\nIf you want to run `packstats` without installation, use the helper file instead.\n\nEither modify permissions to make it executable and use a version of python3 to run it:\n\n```bash\nchmod +x package_statistics.py\n./package_statistics.py --help\n```\n\nOr, you can run it with the python command directly.\n\n```bash\npython3 package_statistics.py\n```\n\n## `packstats` CLI\n\nThe command line interface has a help command that shows what you can do with the tool.\n\n```\n$ python package_statistics.py --help\nusage: package_statistics.py [-h] [-m MIRROR_URL] [-u] [-c COUNT] [-i]\n                             [-o OUTPUT_DIR] [-r]\n                             arch\n\nA tool to get the package statistics by parsing a Contents Index (defined here\n- https://wiki.debian.org/RepositoryFormat#A.22Contents.22_indices)from a\ndebian mirror, given a system architecture.\n\npositional arguments:\n  arch                  the architecture of the Contents index you would like\n                        to parse.\n\noptional arguments:\n  -h, 
--help            show this help message and exit\n  -m MIRROR_URL, --mirror_url MIRROR_URL\n                        Mirror URL from which to fetch the contents file.\n                        DEFAULT\n                        http://ftp.uk.debian.org/debian/dists/stable/main/\n  -u, --include-udeb    include udeb file for the given architecture. DEFAULT\n                        False\n  -c COUNT, --count COUNT\n                        number of packages to list. Use -1 to list all.\n                        DEFAULT 10\n  -i, --sort-increasing\n                        Sort package stats list by increasing number of files.\n                        DEFAULT False\n  -o OUTPUT_DIR, --output-dir OUTPUT_DIR\n                        a directory in which to store the downloaded contents\n                        indices. DEFAULT \u003ccurrent-directory\u003e\n  -r, --reuse-if-exists\n                        Reuses a content file if it has been downloaded\n                        previously and exists in the output directory.\n```\n\n\n## Examples\n\n### Getting `armel` Statistics\n\n```\n$ packstats armel\n\nNo.\tPackage Name                                      \tFile Count\n1.\tfonts/fonts-cns11643-pixmaps                      \t110999\n2.\tx11/papirus-icon-theme                            \t69475\n3.\tfonts/texlive-fonts-extra                         \t65577\n4.\tgames/flightgear-data-base                        \t62463\n5.\tdevel/piglit                                      \t49913\n6.\tdoc/trilinos-doc                                  \t49591\n7.\tx11/obsidian-icon-theme                           \t48829\n8.\tgames/widelands-data                              \t34984\n9.\tdoc/libreoffice-dev-doc                           \t33667\n10.\tmisc/moka-icon-theme                              \t33326\n```\n\n### Getting the top 25 packages\n\n```bash\n$ packstats -c 25 armel\nNo.\tPackage Name                                      \tFile Count\n1.\tfonts/fonts-cns11643-pixmaps        
              \t110999\n2.\tx11/papirus-icon-theme                            \t69475\n3.\tfonts/texlive-fonts-extra                         \t65577\n4.\tgames/flightgear-data-base                        \t62463\n5.\tdevel/piglit                                      \t49913\n6.\tdoc/trilinos-doc                                  \t49591\n7.\tx11/obsidian-icon-theme                           \t48829\n8.\tgames/widelands-data                              \t34984\n9.\tdoc/libreoffice-dev-doc                           \t33667\n10.\tmisc/moka-icon-theme                              \t33326\n11.\tx11/numix-icon-theme                              \t31098\n12.\tgnome/faenza-icon-theme                           \t29400\n13.\tdoc/vtk6-doc                                      \t29370\n14.\tdoc/vtk7-doc                                      \t28640\n15.\tscience/esys-particle                             \t27143\n16.\tx11/mate-icon-theme-faenza                        \t26494\n17.\tscience/gromacs-data                              \t24491\n18.\tgnome/ukui-themes                                 \t22123\n19.\tpython/python3-azure                              \t20785\n20.\tnet/oca-core                                      \t20780\n21.\tdoc/lazarus-doc-2.0                               \t20484\n22.\tfonts/jsmath-fonts                                \t20129\n23.\tdevel/rust-src                                    \t19464\n24.\tdoc/pike8.0-doc                                   \t18487\n25.\tlisp/racket-common                                \t18197\n```\n\n### Getting `amd64` packages, with `udeb` files included\n```\n$ packstats --include-udeb amd64\n\nNo.\tPackage Name                                      \tFile Count\n1.\tfonts/fonts-cns11643-pixmaps                      \t110999\n2.\tx11/papirus-icon-theme                            \t69475\n3.\tfonts/texlive-fonts-extra                         \t65577\n4.\tgames/flightgear-data-base                        \t62463\n5.\tdevel/piglit    
                                  \t49913\n6.\tdoc/trilinos-doc                                  \t49591\n7.\tx11/obsidian-icon-theme                           \t48829\n8.\tgames/widelands-data                              \t34984\n9.\tdoc/libreoffice-dev-doc                           \t33667\n10.\tmisc/moka-icon-theme                              \t33326\n```\n\n### Redirecting Downloaded Files to `/tmp`\n\n```\n$ packstats -o /tmp armel\nNo.\tPackage Name                                      \tFile Count\n1.\tfonts/fonts-cns11643-pixmaps                      \t110999\n2.\tx11/papirus-icon-theme                            \t69475\n3.\tfonts/texlive-fonts-extra                         \t65577\n4.\tgames/flightgear-data-base                        \t62463\n5.\tdevel/piglit                                      \t49913\n6.\tdoc/trilinos-doc                                  \t49591\n7.\tx11/obsidian-icon-theme                           \t48829\n8.\tgames/widelands-data                              \t34984\n9.\tdoc/libreoffice-dev-doc                           \t33667\n10.\tmisc/moka-icon-theme                              \t33326\n```\nThis downloads all files into the given directory. 
In this case: `/tmp`.\n\n### Reusing Downloaded Files\n\nIf you want to preserve your bandwidth while testing this tool, like I did while developing it,\ntry the `-r` flag, which reuses the Contents Index files for the architecture, if it has already been downloaded into the directory.\n\n```\n$ packstats -r amd64\n```\n\n### Attempting to Get Stats for an Incorrect Arch\nIf a user attempts to get the statistics for an architecture that does not exist,\nthey will see the following error.\n\n```\n$ packstats intel\nTraceback (most recent call last):\n  File \"/home/stonecharioteer/code/tools/anaconda3/envs/py36/lib/python3.6/runpy.py\", line 193, in _run_module_as_main\n    \"__main__\", mod_spec)\n  File \"/home/stonecharioteer/code/tools/anaconda3/envs/py36/lib/python3.6/runpy.py\", line 85, in _run_code\n    exec(code, run_globals)\n  File \"/home/stonecharioteer/code/interview/canonical/packstats/__main__.py\", line 5, in \u003cmodule\u003e\n    cli_main()\n  File \"/home/stonecharioteer/code/interview/canonical/packstats/packstats.py\", line 218, in cli_main\n    reuse_if_exists=args.reuse_if_exists,\n  File \"/home/stonecharioteer/code/interview/canonical/packstats/packstats.py\", line 146, in main\n    f\"{arch} was not found in the given mirror. Available architectures are: {found_architectures}\")\npackstats.exceptions.ContentIndexForArchitectureNotFound: intel was not found in the given mirror. Available architectures are: amd64, arm64, armel, armhf, i386, mips, mips64el, mipsel, ppc64el, s390x, source\n```\n\n\n## Development and Testing\n\nTo understand how `packstats` is implemented, I recommend beginning with `packstats.packstats.cli_main`,\nwhich builds an `argparse` command line interface.\n\nFrom there, the arguments are sent to `packstats.packstats.main`. 
This builds a workflow with the internal\nfunctions which are self-explanatory if you look at the docstrings.\n\nFor the formatting, I always auto-run autopep8 on my code, and then check the score with pylint which\nlints as I type in VS Code.\n\n### Testing\n\nThe core functions of `packstats` have tests.\n\n```\n$ python setup.py test\nrunning test\nrunning egg_info\nwriting canonical_vinay_packstats.egg-info/PKG-INFO\nwriting dependency_links to canonical_vinay_packstats.egg-info/dependency_links.txt\nwriting top-level names to canonical_vinay_packstats.egg-info/top_level.txt\nreading manifest file 'canonical_vinay_packstats.egg-info/SOURCES.txt'\nwriting manifest file 'canonical_vinay_packstats.egg-info/SOURCES.txt'\nrunning build_ext\ntest_downloads_content_file (tests.test_packstats.TestPackStats)\nthe application can download the right contents file given an architecture ... ok\ntest_get_content_index_file_url (tests.test_packstats.TestPackStats)\nthe application can get the content index file url given the architecture and whether or not udeb files are needed ... ok\ntest_get_content_index_file_url_with_udeb (tests.test_packstats.TestPackStats)\nthe application can get the content index file url for both the base arch ... ok\ntest_gets_package_data_from_contents (tests.test_packstats.TestPackStats)\nthe application can get the package data from a contents file ... ok\ntest_lists_architectures (tests.test_packstats.TestPackStats)\nthe application can list the architectures in the provided debian mirror ... ok\n\n----------------------------------------------------------------------\nRan 5 tests in 35.787s\n\nOK\n```\n\nThe tests can also be run with `pytest`, should you choose to install it.\n\n\n## Profiling with `py-spy`\n\n`py-spy` offers a great report for the profiling. 
Note that `py-spy` needs to be installed separately using `pip`.\n\n```\n$ py-spy top -- python package_statistics.py amd64\n\nCollecting samples from 'python package_statistics.py amd64' (python v3.6.10)\nTotal Samples 2600\nGIL: 100.00%, Active: 100.00%, Threads: 1\n\n  %Own   %Total  OwnTime  TotalTime  Function (filename:line)\n 53.00%  53.00%    5.30s     5.30s   parse_contents_index (packstats/packstats.py:111)\n  0.00%   0.00%    2.10s     2.10s   decode (codecs.py:321)\n  0.00%   0.00%    1.89s     3.99s   parse_contents_index (packstats/packstats.py:101)\n 14.00%  14.00%    1.80s     1.80s   parse_contents_index (packstats/packstats.py:114)\n 13.00%  13.00%    1.49s     1.49s   parse_contents_index (packstats/packstats.py:112)\n  0.00%   0.00%    1.45s     1.45s   read (gzip.py:471)\n  0.00%   0.00%    1.17s     1.17s   parse_contents_index (packstats/packstats.py:103)\n  7.00%   7.00%   0.680s    0.680s   parse_contents_index (packstats/packstats.py:118)\n  5.00%   5.00%   0.630s    0.630s   parse_contents_index (packstats/packstats.py:106)\n  0.00%   0.00%   0.590s    0.590s   _add_read_data (gzip.py:490)\n  0.00%   0.00%   0.500s    0.500s   readinto (socket.py:586)\n  0.00%   0.00%   0.400s     2.71s   read (gzip.py:276)\n  4.00%   4.00%   0.320s    0.320s   parse_contents_index (packstats/packstats.py:113)\n  0.00%   0.00%   0.320s    0.320s   parse_contents_index (packstats/packstats.py:116)\n  3.00%   3.00%   0.260s    0.260s   parse_contents_index (packstats/packstats.py:110)\n  1.00%   1.00%   0.230s    0.230s   parse_contents_index (packstats/packstats.py:105)\n  0.00%   0.00%   0.100s    0.100s   read (gzip.py:91)\n  0.00%   0.00%   0.060s    0.060s   download_contents_file (packstats/packstats.py:93)\n  0.00%   0.00%   0.050s    0.060s   parse_contents_index (packstats/packstats.py:100)\n  0.00%   0.00%   0.040s     3.48s   main (packstats/packstats.py:153)\n  0.00%   0.00%   0.040s    0.040s   read (gzip.py:472)\n  0.00%   0.00%   0.030s    
0.030s   readinto (socket.py:580)\n  0.00%   0.00%   0.030s    0.030s   create_connection (socket.py:713)\n  0.00%   0.00%   0.030s    0.030s   read (gzip.py:83)\n  0.00%   0.00%   0.020s    0.020s   __setitem__ (enum.py:90)\n  0.00%   0.00%   0.020s    0.020s   download_contents_file (packstats/packstats.py:92)\n  0.00%   0.00%   0.020s    0.020s   read (gzip.py:90)\n  0.00%   0.00%   0.020s    0.030s   __instancecheck__ (abc.py:184)\n  0.00%   0.00%   0.020s    0.610s   read (gzip.py:485)\n  0.00%   0.00%   0.020s    0.020s   __instancecheck__ (abc.py:189)\n  0.00%   0.00%   0.010s    0.010s   prepend (gzip.py:94)\n  0.00%   0.00%   0.010s     2.72s   download_contents_file (packstats/packstats.py:91)\n  0.00%   0.00%   0.010s    0.530s   _safe_read (http/client.py:622)\n  0.00%   0.00%   0.010s    0.010s   download_contents_file (packstats/packstats.py:87)\n  0.00%   0.00%   0.010s    0.010s   read (gzip.py:486)\n  0.00%   0.00%   0.010s    0.060s   open (gzip.py:52)\n  0.00%   0.00%   0.010s    0.010s   read (gzip.py:88)\n  0.00%   0.00%   0.010s    0.010s   _compile (re.py:289)\n  0.00%   0.00%   0.010s    0.010s   prepend (gzip.py:95)\n  0.00%   0.00%   0.010s    0.010s   _is_descriptor (enum.py:25)\n  0.00%   0.00%   0.010s    0.010s   open (gzip.py:51)\n  0.00%   0.00%   0.010s    0.060s   _call_with_frames_removed (\u003cfrozen importlib._bootstrap\u003e:219)\n\nPress Control-C to quit, or ? 
for help.\n\nprocess 85674 ended\n```\n\nMost of the time was spent within the `parse_contents_index` function, which could likely be optimized by using third-party packages such as `numba` or a different Python implementation such as `PyPy`.\n\nThese performance metrics are from a desktop workstation with the following specs:\n\n* CPU: Intel i5-2310 (4) @ 3.200GHz\n* GPU: NVIDIA GeForce GTX 1060 3GB\n* RAM: 15965MiB\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fstonecharioteer%2Fcanonical-interview-test","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fstonecharioteer%2Fcanonical-interview-test","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fstonecharioteer%2Fcanonical-interview-test/lists"}