{"id":27915453,"url":"https://github.com/aabbtree77/imdb-sqlite-queries","last_synced_at":"2026-01-23T18:34:02.334Z","repository":{"id":289562467,"uuid":"971668113","full_name":"aabbtree77/imdb-sqlite-queries","owner":"aabbtree77","description":"Actor-director collab analysis based on imdb-sqlite.","archived":false,"fork":false,"pushed_at":"2025-04-24T10:07:28.000Z","size":265,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-05-06T16:05:29.631Z","etag":null,"topics":["analytics","chatgpt","counting","film-directors","h-index","imdb","imdb-sqlite","queries","relational-database","sql","sqlite3"],"latest_commit_sha":null,"homepage":"","language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/aabbtree77.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-04-23T21:57:51.000Z","updated_at":"2025-04-24T10:12:36.000Z","dependencies_parsed_at":"2025-04-23T23:26:00.534Z","dependency_job_id":"16d7589d-b51d-465c-b49f-b93b7442241b","html_url":"https://github.com/aabbtree77/imdb-sqlite-queries","commit_stats":null,"previous_names":["aabbtree77/imdb-sqlite-queries"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/aabbtree77/imdb-sqlite-queries","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aabbtree77%2Fimdb-sqlite-queries","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aabbtree77%2Fimdb-sqlite-queries/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/Git
Hub/repositories/aabbtree77%2Fimdb-sqlite-queries/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aabbtree77%2Fimdb-sqlite-queries/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/aabbtree77","download_url":"https://codeload.github.com/aabbtree77/imdb-sqlite-queries/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aabbtree77%2Fimdb-sqlite-queries/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28697428,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-23T17:25:48.045Z","status":"ssl_error","status_checked_at":"2026-01-23T17:25:47.153Z","response_time":59,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["analytics","chatgpt","counting","film-directors","h-index","imdb","imdb-sqlite","queries","relational-database","sql","sqlite3"],"created_at":"2025-05-06T15:54:32.981Z","updated_at":"2026-01-23T18:34:02.313Z","avatar_url":"https://github.com/aabbtree77.png","language":null,"funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003e \"Is that you, John Wayne? Is this me?\"\n\u003e\n\u003e — *[Full Metal Jacket (1987)](https://youtu.be/sUIzoiMCp-Q?t=89)*\n\n## Introduction\n\n[imdb-sqlite](https://github.com/jojje/imdb-sqlite) downloads the public IMDb data set and stores it as a single SQLite file. 
\n\nName the file \"imdb.db\" (19GB), install sqlite3, and extract the schema:\n\n```bash\nsudo apt install sqlite3\nsqlite3 --version\n# prints e.g. 3.37.2 2022-01-06\nsqlite3 imdb.db .schema \u003e imdb_schema.sql\n```\n\nOne can then give the schema to an AI to build advanced queries. \n\nSuch queries are not available on [imdb.com](https://www.imdb.com/). A single query may take more than 5 minutes to execute and may produce an 80MB file with a million rows. This is too costly for web APIs, but it works well locally.\n\nBelow I present two such queries (SQL scripts).\n\n## query_h.sql\n\n\"Show SQL query which ranks directors via \"h-index\" where instead of h publications and h citations of them we get the same h actors in h films of a given director. Show row numbers too. Restrict to solid films such as non-adult, no anime, no cartoons. Add the total film counter column used for each director.\"\n\n\"This does not feel right, I get James Bobin with h index 6 and 4 films. Remember, the h index is the maximal number of h films produced by the same director with the same actor set of size h.\"\n\nAfter a few more iterations, ChatGPT produces the desired result.\n\n```bash\nsqlite3 imdb.db \u003c query_h.sql \u003e top_h.txt\n```\n\nNeglecting the outliers at the top, these \"h-sets\" turn out to be unexpectedly tiny (columns: row number, director, h-index, total film count):\n\n```text\n1098  Woody Allen                   3                       51  \n1111  Jean-Luc Godard               3                       44 \n1142  Terence Young                 3                       36         \n1143  Steven Spielberg              3                       35\n1304  Éric Rohmer                   3                       24\n...\n```\n\nOut of the 44 films directed by Jean-Luc Godard that pass the filter (type=movies, duration\u003e60min), only 3 feature the same trio of actors. Film crews are not theatrical troupes at all. 
They have no permanence.\n\n## query_duo.sql\n\n\"Now do the same but instead of h-index show the maximal number of films with the same actor per director, also show that actor and total film count by director as in the script above. Add the second best entry if the director and actor are the same.\"\n\nIgnoring the outliers, the favorite actors also appear less often than one would expect (columns: row number, director, actor, films together, total film count):\n\n```text\n23     Woody Allen                           Woody Allen                      27              51  \n843    Jean-Luc Godard                       Jean-Luc Godard                  7               44\n1775   Jean-Luc Godard                       László Szabó                     5               44\n1779   Werner Herzog                         Klaus Kinski                     5               41  \n1792   Steven Spielberg                      Tom Hanks                        5               35\n2883   Éric Rohmer                           Pascal Greggory                  4               24  \n4759   Woody Allen                           John Doumanian                   3               51 \n...\n```\n\nMany famous tandems turn out to have only mild symbolic value, if they exist at all. The hyper-productive superstar directors largely work as lone wolves. Further analysis reveals that no actress has appeared in more than two of Éric Rohmer's films!\n\n## Notes\n\n* Comment out the `LIMIT` lines in both files to get the complete output.\n\n* AI can do rating trend analysis, but I do not find this interesting.\n\n* There is no data for per-country analysis.\n\n* A small total number of films with a relatively high h-index, e.g. 
Hal Hartley's 12 films and h-index of 4, might indicate unusual quality.\n\n* How can one reveal little-known outstanding films such as Vincent Gallo's Buffalo '66 (1998)?!\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Faabbtree77%2Fimdb-sqlite-queries","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Faabbtree77%2Fimdb-sqlite-queries","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Faabbtree77%2Fimdb-sqlite-queries/lists"}