{"id":15913076,"url":"https://github.com/x-tabdeveloping/visual-analytics-assignment4","last_synced_at":"2025-10-28T16:33:41.677Z","repository":{"id":236061147,"uuid":"791827586","full_name":"x-tabdeveloping/visual-analytics-assignment4","owner":"x-tabdeveloping","description":"Fourth assignment for visual analytics.","archived":false,"fork":false,"pushed_at":"2024-04-25T16:16:36.000Z","size":570,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-02-08T17:14:38.283Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/x-tabdeveloping.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-04-25T12:58:53.000Z","updated_at":"2024-04-25T16:16:40.000Z","dependencies_parsed_at":null,"dependency_job_id":"2a421bb4-298f-40d2-93ab-ed12fc842a8f","html_url":"https://github.com/x-tabdeveloping/visual-analytics-assignment4","commit_stats":null,"previous_names":["x-tabdeveloping/visual-analytics-assignment4"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/x-tabdeveloping%2Fvisual-analytics-assignment4","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/x-tabdeveloping%2Fvisual-analytics-assignment4/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/x-tabdeveloping%2Fvisual-analytics-assignment4/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/host
s/GitHub/repositories/x-tabdeveloping%2Fvisual-analytics-assignment4/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/x-tabdeveloping","download_url":"https://codeload.github.com/x-tabdeveloping/visual-analytics-assignment4/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":246927844,"owners_count":20856198,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-10-06T16:22:58.430Z","updated_at":"2025-10-28T16:33:41.565Z","avatar_url":"https://github.com/x-tabdeveloping.png","language":"Python","readme":"# visual-analytics-assignment4\nFourth assignment for visual analytics in Cultural Data Science.\nThe assignment aims to determine how the number of faces in the newspapers \"Genève\", \"Gazette de Lausanne\" and \"Impartial\" changes over time.\n\n\n## Usage\n\nThe dataset is publicly available at [Zenodo](https://zenodo.org/records/3706863).\n\nThe `images.zip` file should be downloaded from the source and unzipped in a `dat/` directory.\n```bash\nmkdir dat\nmv images.zip dat\ncd dat\nunzip images.zip\n```\n\nThe folder structure should look like this:\n```\n- dat/\n    - images/\n        - GDL\n            - GDL-1997-07-05-a-p0020.jpg\n            ...\n        - IMP\n        - JDG\n```\n\n#### Install requirements:\n\n```bash\npip install -r requirements.txt\n```\n\n#### Processing \n\nTo find all faces on each page in the dataset, run:\n\n```bash\npython3 src/count_faces.py\n```\n\nThis writes the results to the `results/faces.jsonl` file.\nEach record in the 
file contains metadata about the image that was processed, along with bounding boxes for all faces, and the number of faces in the image. \u003cbr\u003e\nExample:\n```json\n{\"type\": \"success\", \"boxes\": [], \"n_faces\": 0, \"journal\": \"JDG\", \"page\": \"p0001\", \"year\": 1950, \"month\": 5, \"day\": 5, \"decade\": 1950, \"file\": \"dat/images/JDG/JDG-1950-05-05-a-p0001.jpg\"}\n```\n\nIf processing fails, an error record stating the reason is appended to the file.\u003cbr\u003e\nExample:\n```json\n{\"type\": \"error\", \"reason\": \"image file is truncated (174 bytes not processed)\", \"journal\": \"JDG\", \"page\": \"p0001\", \"year\": 1980, \"month\": 7, \"day\": 30, \"decade\": 1980, \"file\": \"dat/images/JDG/JDG-1980-07-30-a-p0001.jpg\"}\n```\n\n#### Aggregation \n\nTo produce aggregate results for interpretation, run the following:\n\n```bash\npython3 src/aggregate_results.py\n```\n\nThis writes all visualizations to the `figures/` directory, like so:\n```\n- figures/\n    - GDL.png\n    - IMP.png\n    - JDG.png\n```\n\nIt also writes CSV files containing aggregate information about each decade (number of faces, percentage of pages containing faces) to the `results/` directory:\n```\n- results/\n    - GDL.csv\n    - IMP.csv\n    - JDG.csv\n```\n\nExample file:\n```csv\njournal,decade,total_faces,percent_contains_face\nIMP,1880,1,2.941176470588235\nIMP,1890,17,23.52941176470588\nIMP,1900,25,25.0\nIMP,1910,27,32.69230769230769\nIMP,1920,46,45.3125\n...\n```\n\n## Results\n\n### Percentage of pages containing faces\n\nOn all plots we see an upward tendency, especially in the 1900s, in the percentage of pages that contain faces.\n\nSurprisingly, in _Gazette de Lausanne_ we can observe a downward tendency up until about 1830.\nA quick investigation revealed that this is likely due to:\n1. A lower number of articles in each decade. 
Up to and including the 1830s, the mean number of articles per decade was 46, while decades after the 1830s contain 243 on average. A large portion of these is from GDL.\n2. False positives occur, especially on noisier images, such as those made from older pages.\n\n\n| GDL | IMP | JDG |\n|----------|-----------|-----------|\n|![Bar Plot](figures/GDL.png)|![Line Plot](figures/IMP.png)|![Line Plot](figures/JDG.png)|\n\n\n### Total number of faces\n\nLooking at the total number of faces recognized on pages reveals similar patterns: the counts start to rise around 1900 and increase steadily afterwards.\nWe can also observe that prior to the 1890s no decade had more than ten faces recognized by the model, further indicating that the results we saw around 1800 in GDL are most probably a fluke.\n\n|   decade | GDL   | IMP   | JDG   |\n|----------|-------|-------|-------|\n|     1790 | 3     |       |       |\n|     1800 | 10    |       |       |\n|     1810 | 2     |       |       |\n|     1820 | 1     |       | 0     |\n|     1830 | 2     |       | 1     |\n|     1840 | 0     |       | 1     |\n|     1850 | 2     |       | 2     |\n|     1860 | 1     |       | 4     |\n|     1870 | 1     |       | 5     |\n|     1880 | 1     | 1     | 6     |\n|     1890 | 1     | 17    | 1     |\n|     1900 | 4     | 25    | 11    |\n|     1910 | 4     | 27    | 6     |\n|     1920 | 8     | 46    | 21    |\n|     1930 | 8     | 31    | 32    |\n|     1940 | 10    | 24    | 13    |\n|     1950 | 8     | 105   | 49    |\n|     1960 | 25    | 175   | 44    |\n|     1970 | 14    | 202   | 48    |\n|     1980 | 60    | 343   | 143   |\n|     1990 | 115   | 210   | 190   |\n|     2000 |       | 657   |       |\n|     2010 |       | 699   |       
|\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fx-tabdeveloping%2Fvisual-analytics-assignment4","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fx-tabdeveloping%2Fvisual-analytics-assignment4","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fx-tabdeveloping%2Fvisual-analytics-assignment4/lists"}