https://github.com/keanteng/wqd7007-lab-test
Lab Test With Hive
- Host: GitHub
- URL: https://github.com/keanteng/wqd7007-lab-test
- Owner: keanteng
- Created: 2025-06-13T03:38:28.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2025-06-14T04:42:14.000Z (about 1 year ago)
- Last Synced: 2025-06-14T05:29:43.596Z (about 1 year ago)
- Topics: bash, big-data, docker, hive
- Language: Shell
- Homepage:
- Size: 1000 Bytes
- Stars: 1
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# WQD7007 Lab Test



The lab test assesses your understanding of data cleaning and transformation using Hive. You will work with a dataset, perform various data cleaning operations on it, and generate insights from the cleaned data.
## Using This Repository
- Clone the repository to your local machine.
```bash
git clone https://github.com/keanteng/wqd7007-lab-test
```
- To produce the report, make sure you have pandoc installed. You can download a release from [Pandoc's GitHub](https://github.com/jgm/pandoc/releases/tag/3.7.0.2) (the link points to version 3.7.0.2).
- Also, make sure you have Docker installed on your machine. You can download it from [Docker's official website](https://www.docker.com/products/docker-desktop).
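Before moving on, it can help to confirm that the tools listed above are actually on your `PATH`. The following is a minimal sketch; `check_tools` is a helper name introduced here for illustration and is not part of the repository.

```bash
# Report which of the given commands are available on PATH.
# check_tools is a hypothetical helper, not part of this repository.
check_tools() {
  local tool
  local missing=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "found: $tool"
    else
      echo "missing: $tool"
      missing=1
    fi
  done
  # Non-zero exit status if at least one tool was not found
  return "$missing"
}

# Tools this README relies on
check_tools git docker pandoc || echo "Install the missing tools before continuing."
```

The function only checks that each command exists; it does not check versions or whether the Docker daemon is running.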
## Setup Hive on Docker
```bash
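# pull the Hive 4.0.1 image
docker pull apache/hive:4.0.1
# run HiveServer2 in the background and mount the current directory at /keanteng
# (in PowerShell, replace each trailing \ with a backtick)
docker run -d -p 10000:10000 -p 10002:10002 \
  --env SERVICE_NAME=hiveserver2 \
  --name hive-server \
  -v "${PWD}:/keanteng" \
  apache/hive:4.0.1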
```
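HiveServer2 may need some time after `docker run` returns before it accepts connections. One way to wait for it is to retry a trivial query until it succeeds. This is a sketch under assumptions: `retry` is a hypothetical helper, and the attempt count and delay in the example are arbitrary values, not recommendations from the repository.

```bash
# Re-run a command until it succeeds or the attempts run out.
# retry is a hypothetical helper, not part of this repository.
# Usage: retry <attempts> <delay-seconds> <command> [args...]
retry() {
  local attempts="$1"
  local delay="$2"
  shift 2
  local n=1
  while ! "$@"; do
    if [ "$n" -ge "$attempts" ]; then
      echo "gave up after ${attempts} attempts: $*" >&2
      return 1
    fi
    n=$((n + 1))
    sleep "$delay"
  done
}

# Example (needs the hive-server container started above):
# retry 30 2 docker exec hive-server beeline -u 'jdbc:hive2://localhost:10000/' -e 'SHOW DATABASES;'
```

The example line is left commented out so the block can be pasted safely; uncomment it once the container has been started.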
After the container is running, you can access the Hive CLI or Beeline to interact with Hive.
```bash
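# open a shell inside the container; you start in /opt/hive
docker exec -it hive-server bash
# go to the filesystem root, where the repository is mounted
cd /
# view the mounted directory
ls keanteng
# start the Hive CLI, which opens a Beeline prompt
hive
# at the Beeline prompt, connect to HiveServer2
!connect jdbc:hive2://localhost:10000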
```
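The interactive session above can also be replaced by a script that Beeline runs in one go, which is convenient for repeatable cleaning steps. The sketch below is illustrative only: the file names `sample.csv` and `clean_sample.hql`, the table `raw_sample`, and its columns are all made up for this example, since the actual lab dataset is not described here. It assumes you run it from the directory that was mounted at `/keanteng`.

```bash
# Hypothetical sample data; the real lab dataset and its columns are not shown in this README.
cat > sample.csv <<'EOF'
id,name,score
1, Alice ,90
2,Bob,
3,  Carol,75.5
EOF

# Hypothetical HiveQL script: load the CSV, trim names, and drop rows without a score.
cat > clean_sample.hql <<'EOF'
CREATE TABLE IF NOT EXISTS raw_sample (id INT, name STRING, score STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
TBLPROPERTIES ('skip.header.line.count'='1');

LOAD DATA LOCAL INPATH '/keanteng/sample.csv' OVERWRITE INTO TABLE raw_sample;

SELECT id, TRIM(name) AS name, CAST(score AS DOUBLE) AS score
FROM raw_sample
WHERE score IS NOT NULL AND TRIM(score) <> '';
EOF

# Run the script inside the container (needs the hive-server container to be running):
# docker exec hive-server beeline -u 'jdbc:hive2://localhost:10000/' -f /keanteng/clean_sample.hql
```

Because the current directory is mounted into the container, files written on the host appear under `/keanteng` without any copy step.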
## Docker Extra
Some useful Docker commands to manage your Hive container:
```bash
# start the container
docker start hive-server
# stop the container
docker stop hive-server
# open a shell inside the container
docker exec -it hive-server bash
# list all running containers
docker ps
```
## Further Usage
To clean up your Docker environment afterwards, use the following commands to remove containers, images, volumes, and build cache:
```bash
# remove the container (it must be stopped first)
docker rm hive-server
# prune all stopped containers
docker container prune
# remove all unused images
docker image prune -a
# remove unused volumes (recent Docker versions only prune anonymous volumes unless --all is given)
docker volume prune
# remove build cache
docker builder prune
```
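The prune commands above act on every unused Docker resource on the machine, not just the ones from this lab. If you only want to remove what this README created, a narrower cleanup could look like the sketch below. `cleanup_hive_lab` and the `DOCKER` override variable are hypothetical names introduced for illustration.

```bash
# Allow the docker binary to be overridden, e.g. DOCKER=echo for a dry run
DOCKER="${DOCKER:-docker}"

# Remove only this lab's container and image, leaving other Docker resources alone.
# cleanup_hive_lab is a hypothetical helper, not part of this repository.
cleanup_hive_lab() {
  local name="${1:-hive-server}"
  local image="${2:-apache/hive:4.0.1}"
  "$DOCKER" rm --force "$name"   # stops the container if it is running, then removes it
  "$DOCKER" image rm "$image"    # removes the pulled Hive image
}
```

Setting `DOCKER=echo` before calling `cleanup_hive_lab` prints the commands instead of running them, which is a cheap way to preview what would be removed.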