https://github.com/somdipdey/convert_uk_tier2_tier5_sponsorpdf_to_xml

# Convert_UK_Tier2_Tier5_SponsorPDF_To_XML
Convert UK Tier 2 & Tier 5 Work Sponsor list in PDF to XML structured file

Download and run the file Create UK-Tier2-Tier5-SponsorList-PDF-To-XML.py to download, format, and convert the sponsor list from PDF to XML.

There are a few important dependencies that need to be installed on the system, or else the program won't execute properly.

# Dependencies Required

### PDFTOHTML
You need to install pdftohtml on the system.

With Homebrew, it can be installed with the following command:

>> brew install pdftohtml

This adds pdftohtml to your PATH.

Website of PDFTOHTML: http://pdftohtml.sourceforge.net
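
As a rough illustration of how a Python script might call pdftohtml once it is installed, here is a minimal sketch. It assumes the tool's `-xml` option is used to request XML output; the function names and the file names `sponsors.pdf` and `sponsors` are hypothetical, and the actual script in this repository may invoke the tool differently.

```python
import shutil
import subprocess


def build_pdftohtml_command(pdf_path, output_base):
    """Build the argument list for an XML conversion (hypothetical helper)."""
    # -xml asks pdftohtml to write XML instead of HTML.
    return ["pdftohtml", "-xml", pdf_path, output_base]


def convert_pdf_to_xml(pdf_path, output_base):
    """Run pdftohtml on pdf_path; raises if the tool is not on PATH."""
    if shutil.which("pdftohtml") is None:
        raise FileNotFoundError("pdftohtml was not found on PATH")
    # check=True raises CalledProcessError if pdftohtml exits with an error.
    subprocess.run(build_pdftohtml_command(pdf_path, output_base), check=True)
```

Calling `convert_pdf_to_xml("sponsors.pdf", "sponsors")` would then run the conversion, provided the PDF exists and pdftohtml is installed.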

### PDFtk
You also need to install PDFtk Server to format the PDF file properly so that it can be converted to structured XML format.

It can be installed from the following web link:

https://www.pdflabs.com/tools/pdftk-server/

Simply choose the version for your operating system and install it. The path for 'pdftk' will be added automatically.

Website of PDFtk: https://www.pdflabs.com/tools/pdftk-the-pdf-toolkit/
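
Before running the converter, it may help to confirm that both command-line tools are actually reachable. The following is a small sketch, not part of the original script, that uses Python's standard `shutil.which` to look for `pdftohtml` and `pdftk` on the PATH; the helper name `missing_tools` is hypothetical.

```python
import shutil

# The two external tools this README lists as dependencies.
REQUIRED_TOOLS = ("pdftohtml", "pdftk")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the names of the tools that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing dependencies: " + ", ".join(missing))
    else:
        print("All dependencies found.")
```

If a tool is reported as missing, reinstall it or add its install directory to your PATH and run the check again.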