https://github.com/somdipdey/convert_uk_tier2_tier5_sponsorpdf_to_xml
Convert UK Tier 2 & Tier 5 Work Sponsor list in PDF to XML structured file
pdf python xml xml-generation
- Host: GitHub
- URL: https://github.com/somdipdey/convert_uk_tier2_tier5_sponsorpdf_to_xml
- Owner: somdipdey
- Created: 2018-01-07T18:43:05.000Z (almost 8 years ago)
- Default Branch: master
- Last Pushed: 2018-01-07T19:05:58.000Z (almost 8 years ago)
- Last Synced: 2025-04-09T17:58:05.008Z (7 months ago)
- Topics: pdf, python, xml, xml-generation
- Language: Python
- Size: 8.79 KB
- Stars: 3
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Convert_UK_Tier2_Tier5_SponsorPDF_To_XML
Convert UK Tier 2 & Tier 5 Work Sponsor list in PDF to XML structured file
Download and run the file Create UK-Tier2-Tier5-SponsorList-PDF-To-XML.py to download, format, and convert the sponsor list from PDF to XML.
A few important dependencies must be installed on the system, or the program won't run properly.
# Dependencies Required
### PDFTOHTML
You need to install pdftohtml on the system.
It can be installed with the following command:
>> brew install pdftohtml
This adds pdftohtml to your path.
Website of PDFTOHTML: http://pdftohtml.sourceforge.net
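Once pdftohtml is on your path, a PDF-to-XML conversion can be driven from Python roughly as sketched below. This is an illustration only, not the code in this repository's script: the function names and any file names passed to them are hypothetical, and it assumes your pdftohtml build supports the `-xml` option for XML output.

```python
import shutil
import subprocess


def build_pdftohtml_command(pdf_path, output_base):
    """Return the argument list for a pdftohtml XML conversion.

    -xml asks pdftohtml to write XML instead of HTML; output_base is the
    output name from which pdftohtml derives the XML file name.
    """
    return ["pdftohtml", "-xml", str(pdf_path), str(output_base)]


def convert_pdf_to_xml(pdf_path, output_base):
    """Run pdftohtml, failing early if it is not on the PATH."""
    if shutil.which("pdftohtml") is None:
        raise FileNotFoundError("pdftohtml was not found on the PATH")
    # check=True raises CalledProcessError if pdftohtml exits with an error
    subprocess.run(build_pdftohtml_command(pdf_path, output_base), check=True)
```

For example, calling `convert_pdf_to_xml("sponsors.pdf", "sponsors")` with a hypothetical input file should leave the XML output next to it. Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.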
### PDFtk
You also need to install PDFtk Server to format the PDF file properly so that it can be converted to structured XML format.
It can be installed from the following web link:
https://www.pdflabs.com/tools/pdftk-server/
Choose the version for your operating system and install it. 'pdftk' will be added to your path automatically.
Website of PDFtk: https://www.pdflabs.com/tools/pdftk-the-pdf-toolkit/
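Before running the converter, it may help to confirm that both tools are actually reachable from your path. The following is a minimal sketch using only the Python standard library; the function name `missing_tools` is made up for this example and is not part of the project.

```python
import shutil


def missing_tools(tools=("pdftohtml", "pdftk")):
    """Return the names from `tools` that cannot be found on the PATH."""
    return [name for name in tools if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing dependencies: " + ", ".join(missing))
    else:
        print("All dependencies found.")
```

If the check reports a missing tool right after installation, opening a new terminal session is often enough for the updated path to take effect.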