Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/mconlon17/vivo-person-ingest
Ingest people to VIVO
- Host: GitHub
- URL: https://github.com/mconlon17/vivo-person-ingest
- Owner: mconlon17
- License: bsd-3-clause
- Created: 2014-11-11T22:28:04.000Z (about 10 years ago)
- Default Branch: master
- Last Pushed: 2014-11-11T22:49:38.000Z (about 10 years ago)
- Last Synced: 2023-08-05T04:38:04.013Z (over 1 year ago)
- Language: Python
- Size: 6.69 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.txt
- License: LICENSE
README
h1. Running Person Ingest
1) Using an SSH client, connect to vivostagingweb.vivo.ufl.edu
2) In your home directory (or a directory of your choosing), run the following:
git clone [email protected]:vivo-people-ingest.git
3) cd into the project directory
4) Start a screen session using
screen -S ingest
5) Screen will start and the session will be named ingest. This is important because you may lose your connection to the server while the ingest is running.
6) All the necessary files are in the repo; the next step is simply to run the ingest:
./run-me-ingest.sh
7) If any Python libraries are missing, install them using the instructions below; otherwise proceed to step 8:
sudo easy_install <library>
For example:
sudo easy_install tempita
sudo easy_install python-dateutil
The libraries should be installed in site-packages under _"/Library/Python/2.7/site-packages"_.
8) The script will ask you to delete the old output files from the previous run; hit 'y' for the 4/5 files
9) The ingest is now running
10) Detach from the screen using control-A then D
11) To reattach and periodically check the progress, type:
screen -r ingest
12) After the ingest has run, do a git status; there should be 4/5 files to upload:
* contact_data.pcl
* people_add.rdf
* people_sub.rdf
* person-ingest.log
* privacy_data.pcl
13) Upload the output files to the repo
14) Pull the files onto your local machine so that you can access them
15) Go to vivo.ufl.edu and login
16) Go to Site Admin and Add/Remove RDF data
17) Select Add instance data and make sure RDF/XML is selected from the drop-down menu
18) Click choose file and select the people_add.rdf and click submit
19) Wait for the browser to time out; this may or may not signal the end of the add. Open people_add.rdf and look for a change near the end of the file (see the sketch after this list), then check the site to see whether the change is reflected. If everything looks good you can continue with the sub.
20) Select the Remove Mixed RDF radio button, choose the people_sub.rdf file, and click submit. This should take substantially less time. Again check for a change near the end of the sub file and compare it to the site. If everything checks out, the ingest for the week is complete.
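
For the checks in steps 19 and 20, a short sketch like the one below can print the final statements of an output file for comparison with the site. The script name tail_rdf.py and the line count are illustrative only and are not part of the ingest scripts.

bc. # Print the last lines of an output RDF file so its final statements can be
# compared against the site (steps 19 and 20). Usage: python tail_rdf.py people_add.rdf
import sys
from collections import deque
def tail(path, n=20):
    # deque(maxlen=n) keeps only the last n lines without reading the whole file into memory
    with open(path) as f:
        return list(deque(f, maxlen=n))
if __name__ == "__main__":
    path = sys.argv[1] if len(sys.argv) > 1 else "people_add.rdf"
    for line in tail(path):
        print(line.rstrip())
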
h2. ADDENDUM
contact_data.csv
It is pipe ('|') delimited and contains headers, with about 2 million rows (a reading sketch follows the column lists below).
The following information is stored on the CTSI-SRVTASK11 server in the VIVO_DATA.dbo.VIVO_PERSON_INGEST table:
UFID - nvarchar
FIRST_NAME - nvarchar
LAST_NAME - nvarchar
MIDDLE_NAME - nvarchar
NAME_PREFIX - nvarchar
NAME_SUFFIX - nvarchar
DISPLAY_NAME - nvarchar
GATORLINK - nvarchar
JOBTITLE - nvarchar
LONGTITLE - nvarchar
UF_BUSINESS_EMAIL - nvarchar
UF_BUSINESS_PHONE - nvarchar
UF_BUSINESS_FAX - nvarchar
WORKINGTITLE - nvarchar

The contact_data.csv file contains the following:
UFID
FIRST_NAME
LAST_NAME
MIDDLE_NAME
NAME_PREFIX
NAME_SUFFIX
DISPLAY_NAME
GATORLINK
WORKINGTITLE
UF_BUSINESS_EMAIL
UF_BUSINESS_PHONE
UF_BUSINESS_FAX
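
Because all three data files are pipe-delimited with a header row, they can be read with the standard csv module. A minimal sketch for contact_data.csv; the fields printed are chosen only for illustration:

bc. # Read the pipe-delimited contact_data.csv and print a field or two per row.
# The same pattern applies to privacy_data.csv and position_data.csv.
import csv
with open("contact_data.csv") as f:
    for row in csv.DictReader(f, delimiter="|"):
        print(row["UFID"] + " " + row["DISPLAY_NAME"])
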
privacy_data.csv
It is pipe ('|') delimited and contains headers, with about 2 million rows (a loading sketch follows the column lists below).
The following information is stored on the CTSI-SRVTASK11 server in the VIVO_DATA.dbo.tbl_privacy_dump table:
UFID - varchar
UF_SECURITY_FLG - varchar
UF_PROTECT_FLG - varchar
UF_PUBLISH_FLG - varchar

The privacy_data.csv contains the following:
UFID
UF_SECURITY_FLG
UF_PROTECT_FLG
UF_PUBLISH_FLG
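
One way to work with privacy_data.csv is an in-memory lookup keyed by UFID. The sketch below only loads the flags; how the ingest itself interprets them is not assumed here:

bc. # Build a lookup of UFID -> (security, protect, publish) flags from privacy_data.csv.
import csv
privacy = {}
with open("privacy_data.csv") as f:
    for row in csv.DictReader(f, delimiter="|"):
        privacy[row["UFID"]] = (row["UF_SECURITY_FLG"], row["UF_PROTECT_FLG"], row["UF_PUBLISH_FLG"])
print("%d UFIDs loaded" % len(privacy))
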
position_data.csv
It is pipe ('|') delimited and contains headers, with about 41k rows (a parsing sketch follows the column lists below).

The following information is stored on the CTSI-SRVTASK11 server in the VIVO_DATA.dbo.tbl_hr_data table:
DEPTID - varchar
UFID - varchar
JOBCODE - varchar
START_DATE - date
END_DATE - date
JOBCODE_DESCRIPTION - varchar
SAL_ADMIN_PLAN - varchar

The position_data.csv contains the following:
DEPTID
UFID
JOBCODE - No leading 0's
START_DATE
END_DATE
JOBCODE_DESCRIPTION
SAL_ADMIN_PLAN
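
START_DATE and END_DATE are date columns; below is a minimal sketch of parsing them with the standard library. The date format and the handling of an empty END_DATE are assumptions and should be checked against the actual export:

bc. # Parse position rows; an empty END_DATE is treated here as a still-active position.
# The '%Y-%m-%d' date format is an assumption about the export, not confirmed by this README.
import csv
from datetime import datetime
with open("position_data.csv") as f:
    for row in csv.DictReader(f, delimiter="|"):
        start = datetime.strptime(row["START_DATE"], "%Y-%m-%d").date()
        end = datetime.strptime(row["END_DATE"], "%Y-%m-%d").date() if row["END_DATE"] else None
        print("%s %s %s %s" % (row["UFID"], row["JOBCODE"], start, end))
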
You need to delete the following files before each run of person-ingest.py (a cleanup sketch follows):

Delete all the .pcl files:
*.pcl

Delete all the people_ files:
people_add.rdf
people_exc.lst
people_sub.rdf
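
This cleanup can also be scripted; a minimal sketch, assuming it is run from the repo directory:

bc. # Remove the previous run's output files before starting person-ingest.py.
import glob, os
targets = glob.glob("*.pcl") + ["people_add.rdf", "people_exc.lst", "people_sub.rdf"]
for path in targets:
    if os.path.exists(path):
        os.remove(path)
        print("removed " + path)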