https://github.com/marco-gallegos/sqoopit
A python package that lets you sqoop into HDFS/Hive/HBase data from RDBMS using sqoop
- Host: GitHub
- URL: https://github.com/marco-gallegos/sqoopit
- Owner: marco-gallegos
- License: mit
- Fork: true (lucafon/pysqoop)
- Created: 2020-02-21T19:02:43.000Z (almost 5 years ago)
- Default Branch: master
- Last Pushed: 2020-04-18T08:16:41.000Z (almost 5 years ago)
- Last Synced: 2024-10-01T19:31:50.746Z (4 months ago)
- Topics: hadoop, hbase, hdfs, hive, py, python, python3, sqoop, sqoop-import
- Language: Python
- Size: 1.7 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# sqoop-it
A Python package that lets you import data from an RDBMS into HDFS/Hive/HBase using sqoop.

[![PyPI](https://img.shields.io/badge/pip-v.20.0.1-blue.svg)](https://github.com/marco-gallegos/sqoopit)
![Python](https://img.shields.io/badge/python-3.5+,2.7-green.svg)
[![MIT license](http://img.shields.io/badge/license-MIT-orange.svg)](http://opensource.org/licenses/MIT)

To install the package via pip, run
```
pip install sqoopit
```

You can then use the package as follows:
```python
from sqoopit.SqoopImport import Sqoop
sqoop = Sqoop(help=True)
code = sqoop.perform_import()
```

This will print the output of the command `sqoop --help` to your stdout, e.g.:
```
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/hdp/2.6.3.0-235/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/2.6.3.0-235/accumulo/lib/slf4j-log4j12.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
18/08/13 20:25:13 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6.2.6.3.0-235
usage: sqoop import [GENERIC-ARGS] [TOOL-ARGS]

Common arguments:
   --connect                  Specify JDBC
                              connect
                              string
   --connection-manager       Specify
                              connection
                              manager
                              class name
...
```

#### Useful Resources
* HBase Client for Python : [happybase](https://github.com/python-happybase/happybase/blob/master/doc/index.rst)
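
After an import into HBase, happybase can be used to spot-check that the expected rows arrived. The following is a minimal sketch, not part of sqoopit: the helper names `missing_row_keys` and `open_table`, the host name, and the row keys are all hypothetical, and it assumes an HBase Thrift server is reachable.

```python
def missing_row_keys(table, row_keys):
    """Return the keys from row_keys that have no cells in the table."""
    # happybase's Table.row() returns an empty dict when the row does not exist
    return [key for key in row_keys if not table.row(key)]


def open_table(host, table_name):
    """Open an HBase table through the Thrift gateway (requires happybase)."""
    import happybase  # imported lazily so missing_row_keys has no hard dependency

    connection = happybase.Connection(host)
    return connection.table(table_name)


# Example with a hypothetical host and hypothetical row keys:
# table = open_table("hbase-thrift-host", "MyTable")
# print(missing_row_keys(table, [b"1", b"2"]))
```

Because `missing_row_keys` only relies on a `row()` method, it can be exercised against a small fake table before pointing it at a real cluster.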
#### A more concrete example
The following code
```python
from sqoopit.SqoopImport import Sqoop

# Import an Oracle table into a target directory on a remote HDFS cluster
sqoop = Sqoop(fs='hdfs://remote-cluster:8020', hive_drop_import_delims=True, fields_terminated_by='\\;',
              enclosed_by='\'"\'', escaped_by='\\\\', null_string='\'\'', null_non_string='\'\'',
              table='sample_table', target_dir='hdfs://remote-cluster/user/hive/warehouse/db/sample_table',
              delete_target_dir=True, connect='jdbc:oracle:thin:@//your_ip:your_port/your_schema',
              username='user', password='pwd', num_mappers=2,
              bindir='/path/to/bindir/folder')

sqoop.perform_import()
```

will execute the following command:

```
sqoop import -fs hdfs://remote-cluster:8020 --hive-drop-import-delims --fields-terminated-by \; --enclosed-by \'\"\' --escaped-by \\\\ --null-string \'\' --null-non-string \'\' --table sample_table --target-dir hdfs://remote-cluster/user/hive/warehouse/db/sample_table --delete-target-dir --connect jdbc:oracle:thin:@//your_ip:your_port/your_schema --username user --password pwd --num-mappers 2 --bindir /path/to/bindir/folder
```

#### Conditional Building
Use the `set_param` and `unset_param` functions to build sqoop imports conditionally.
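
One way to picture conditional building is that the parameters sit in a mapping until the command is rendered, so they can still be added or dropped beforehand. The sketch below illustrates that idea only. `CommandSketch`, its snake_case to `--kebab-case` naming rule, and the boolean return values are assumptions made for illustration and are not sqoopit's implementation.

```python
import shlex


class CommandSketch:
    """Hypothetical stand-in that collects sqoop flags before rendering them."""

    def __init__(self, **kwargs):
        self._params = {}
        for name, value in kwargs.items():
            # Assumed convention: snake_case keyword -> --kebab-case flag
            self.set_param("--" + name.replace("_", "-"), value)

    def set_param(self, param, value=True):
        """Add a flag unless it is already present; report whether it was added."""
        if param in self._params:
            return False
        self._params[param] = value
        return True

    def unset_param(self, param):
        """Drop a flag if present; report whether something was removed."""
        if param not in self._params:
            return False
        del self._params[param]
        return True

    def command(self):
        """Render the collected flags as a single command line."""
        parts = ["sqoop", "import"]
        for param, value in self._params.items():
            if value is False:
                continue  # a disabled switch is left out entirely
            parts.append(param)
            if value is not True:  # boolean switches carry no value
                parts.append(shlex.quote(str(value)))
        return " ".join(parts)


target_is_hbase = True  # example condition
sketch = CommandSketch(table="MyTable", num_mappers=2)
sketch.set_param("--connect", "jdbc:a_valid_string")
if target_is_hbase:
    sketch.set_param("--hbase-table", "MyTable")
else:
    sketch.unset_param("--num-mappers")
print(sketch.command())
```

Keeping the flags in a dictionary until the last moment is what makes the conditional steps cheap: nothing is executed or validated until the command string is rendered.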
```python
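from sqoopit.SqoopImport import Sqoop

target_is_hbase = True  # example flag; set it according to your own logic

sqoop = Sqoop(table="MyTable")
sqoop.set_param(param="--connect", value="jdbc:a_valid_string")
if target_is_hbase:
    # The return values of set_param are used here as success flags
    added_table = sqoop.set_param(param="--hbase-table", value="MyTable")
    added_key = sqoop.set_param(param="--hbase-row-key", value="Id_MyTable")
    if added_table and added_key:
        print("all params added :D")

sqoop.perform_import()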
```

### Doing
* handle sqoop jobs
* more test coverage

### TODOs
* add missing parameters

Original idea by [Luca Fontanili](https://github.com/lucafon/pysqoop)