{"id":43548517,"url":"https://github.com/amalan-constat/presidentialelection","last_synced_at":"2026-02-03T19:09:34.263Z","repository":{"id":143718490,"uuid":"200527520","full_name":"Amalan-ConStat/PresidentialElection","owner":"Amalan-ConStat","description":"Sri Lanka Presidential Election data","archived":false,"fork":false,"pushed_at":"2019-08-05T10:17:28.000Z","size":12328,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-09-09T04:59:53.122Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"R","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Amalan-ConStat.png","metadata":{"files":{"readme":"README.Rmd","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2019-08-04T18:24:55.000Z","updated_at":"2024-09-08T10:38:05.000Z","dependencies_parsed_at":null,"dependency_job_id":"e7e8bd66-64cd-4692-a863-54d0d2401fde","html_url":"https://github.com/Amalan-ConStat/PresidentialElection","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/Amalan-ConStat/PresidentialElection","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Amalan-ConStat%2FPresidentialElection","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Amalan-ConStat%2FPresidentialElection/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Amalan-ConStat%2FPresidentialElection/releases","manifest
s_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Amalan-ConStat%2FPresidentialElection/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Amalan-ConStat","download_url":"https://codeload.github.com/Amalan-ConStat/PresidentialElection/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Amalan-ConStat%2FPresidentialElection/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29054127,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-03T15:43:47.601Z","status":"ssl_error","status_checked_at":"2026-02-03T15:43:46.709Z","response_time":96,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2026-02-03T19:09:33.479Z","updated_at":"2026-02-03T19:09:34.254Z","avatar_url":"https://github.com/Amalan-ConStat.png","language":"R","funding_links":[],"categories":[],"sub_categories":[],"readme":"---\ntitle: \"Presidential Election Data\"\noutput: github_document\n---\n\n```{r setup, include=FALSE}\nknitr::opts_chunk$set(echo = TRUE)\n```\n\nMy motivation to spend time on this project is that no one \nhas done it yet and also it would be useful for the presidential \nelection which will happen before the end of year 2019. \n\n# Process of Harvesting Presidential Election data.\n\nThis is a six step process. 
Each step is necessary and is \nexplained in detail through directories and markdown files. \n\nThe steps are\n\n* Step 1 - Find Data.\n* Step 2 - Extract Data.\n* Step 3 - Validate Data.\n* Step 4 - Clean the Data.\n* Step 5 - Develop CSV files.\n* Step 6 - Develop final Data-frame and CSV file.\n\n## Step 1 : Find Data\n\nSri Lanka does have an open government data [website](http://www.data.gov.lk),\nbut it has no data on the topic of interest: \nhow people have voted in the presidential elections so far (from 1982 until mid 2019),\nseven presidential elections in all. Thankfully, the data that I need was on the Elections Commission\n[website](https://elections.gov.lk/web/en/elections/elections-results/presidential-elections-results/).\n\nUnfortunately, the data is not in csv, excel, text or any other directly usable format. There are \nseven pdf files which have the data I want. Because the data is official, I expected it to be accurate, \nyet there were some disappointments.\n\n## Step 2 : Extract Data\n\nExtracting data from these seven files is my only way of forming a meaningful data-frame.\nThis is called pdf scraping. I started this process from year 2015 and worked backwards, but for the earlier years it\nbecame very difficult. The main reason for this could be that back then we did not have advanced \ntechnological tools such as computers or spreadsheets. With some hustle and creative \nthinking I was able to do the data extraction, successfully I might add. \n\nThe folders Year1982, Year1988, Year1994, Year1999, Year2005,\nYear2010 and Year2015 include clear information regarding the data extraction process. 
\nEach folder has the pdf file from which the data needs to be extracted, the R script used for data \nextraction, a markdown file explaining the data extraction and validation, and finally,\na few figures showing anomalies that occur in the pdf file and need to be rectified.\n\nThe process is explained for each year separately in its folder, with precise \ninstructions and explanations. It should be noted that this is the only time the data is \nextracted. However, resolving misprints or miscalculations in the pdf file takes \na few coordinated attempts, which were successful.\n\n## Step 3 : Validate the Data\n\nThe first attempt occurs during data extraction. Sri Lanka has 160 electorates,\nwhich are grouped into 22 districts. The pdf files list the electorates of each district and then summarize the\nfinal district results. This gives us the \nopportunity to compare the district tallies (22 districts) with the electorate tallies (160 electorates)\nfor Registered Electors, Polled Votes, Valid Votes, Rejected Votes and Candidates' votes.\n\nOn some occasions the totals over all districts and the totals over all electorates \ndo not match, which means there might be a printing error or a miscalculation. This is \nrectified at the first level. \n\n## Step 4 : Clean the Data\n\nAfter the rectification it is necessary to clean the data. This process is explained \nin a markdown file in the ExtractedData directory. Work here includes setting variable\ntypes for columns and further rectification.\n\nThis includes keeping district and electorate \nnames, spellings and letter case consistent throughout the seven presidential elections.\nAlso, other information such as Registered Electors, Valid Votes, Polled Votes and Rejected\nVotes should have the same variable names throughout the seven presidential elections.\n\nFinally, check that percentages are within their domain, which is between 0 and 100. 
Further,\ncheck whether there are missing values and, if so, take the necessary action. After resolving this\nwe have a clean individual data-frame for each election. It should be noted that\nthere are three markdown files for the three types of table formats in the pdf files. \nThese table formats focus on the votes and percentage values given. \n\n## Step 5 : Develop CSV files\n\nAfter the tiresome work of data extraction and cleaning, we now have clean individual\ndata-frames. They can be written out as CSV files, one for each presidential election.\n\n## Step 6 : Develop final Data-frame and CSV file\n\nWhile we do have 7 files, it would be wiser to combine them and produce one final data-frame.\nThis can also be done with R functions and is done in the directory Final Data, with the instructions\nincluded in a markdown file. The final data-frame is saved as Final.csv in that directory.\n\n# Suggestions To Make Election Data collection Easier\n\n1. Summary results should be included in the pdf reports, as in the pre 2010 election files.\n2. Software should be developed which will make election data collection easier. \n3. This software should have tests to ensure that values and calculations are possible values\nin their respective domains.\n\nFor example, for votes : \n\n* Total Polled = Total Valid + Total Rejected\n\nand \n\n* Total Valid = Votes cast for Candidate A + Votes cast for Candidate ... \n\nAlso, percentages are generated from the vote values by the equations below :\n\n* Total Polled % = (Total Polled / Total Registered) * 100 \n* Total Valid % = (Total Valid / Total Polled) * 100\n* Total Rejected % = (Total Rejected / Total Polled) * 100\n* Candidate A % = (Candidate A / Total Polled) * 100\n\nBelow is an image of a sample extract table for an electorate and we can clearly\nsee the percentages are in order as mentioned above.\n\n![](Fig1.JPG)\n\n4. Besides pdf files, we should be able to provide usable data\nfiles such as csv, excel or text. 
\n\n# Future Plans for the Data\n\nThe coolest and most useful thing to do with this data-frame is to develop\nan R Shiny app which can \n\n1. Provide functions to download data for specific interests. \n  * Electorate data throughout all the elections.\n  * District data throughout all the elections. \n  * Complete data frame for all the seven election years.\n  * Data-frame for a specific year. \n\n2. Provide data visualizations under meaningful themes as 2 dimensional or \n3 dimensional plots. \n\n# Future Similar Projects based on pdf scraping\n\n1. Sri Lanka has held parliamentary elections since 1947; those results can also\nbe extracted. \n\n2. Provincial Council Election results from 1999 until now can be extracted. \n\n3. Local Authorities Election results from 2002 until now can be extracted.\n\nIt took me 3 weeks to complete this project in its entirety, and I feel very\nsuccessful and confident that it will be useful. \n\n**Note** \nI wrote an example [markdown file](https://github.com/Amalan-ConStat/PresidentialElection/blob/master/Final%20Data/Example.md) \nwith the full data frame and it has some interesting plots. Below are a few \nplots from this markdown file. \n\n![](Fig2.png)\n\n![](Fig3.gif)\n\nAt the beginning of this project I wrote a [blog post](https://amalan-con-stat.netlify.com/post/slelection/presidential-election/2015/2015/)\nabout extracting data from the pdf file of year 2015.\n\n*THANK YOU*","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Famalan-constat%2Fpresidentialelection","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Famalan-constat%2Fpresidentialelection","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Famalan-constat%2Fpresidentialelection/lists"}