
Dockers – How to Get Started with Cloudera

This article presents information and scripts that can be used to get started with Cloudera using Docker. Please feel free to comment or make suggestions if I missed one or more important points.

Following are the key points described later in this article:

  • Docker machine configuration
  • Cloudera & Dockers
  • Test the Cloudera installation
  • Scripts to install & run Cloudera
Docker Machine Configuration

To run Cloudera in a Docker container, the Docker machine needs the following configuration. Open Oracle VM VirtualBox Manager, stop the default machine, and then change the settings as shown below.

  • Change the processor (core) setting to 2

    Increase core size to 2

  • Change the memory setting to 8192 (8 GB Ram)

    Increase Memory Size to 8GB

If this is not done, running “cloudera-manager --express” throws the following error:

Memory related error while starting Cloudera manager service
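
The same settings can also be applied from a terminal instead of the VirtualBox Manager GUI. The following is a minimal sketch, assuming the machine is named “default” and that docker-machine and VBoxManage are on the PATH. The run and resize_machine helpers are hypothetical names introduced for illustration. The script only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch: give the Docker machine 2 cores and 8192 MB of memory.
# Assumption: the VirtualBox VM behind Docker Machine is named "default".

# Dry-run by default so nothing is changed by accident; set DRY_RUN=0 to execute.
DRY_RUN="${DRY_RUN:-1}"

# run: print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# resize_machine MACHINE CPUS MEMORY_MB
# The VM has to be stopped before its settings can be modified.
resize_machine() {
  machine="$1"; cpus="$2"; memory_mb="$3"
  run docker-machine stop "$machine"
  run VBoxManage modifyvm "$machine" --cpus "$cpus" --memory "$memory_mb"
  run docker-machine start "$machine"
}

# Usage: resize_machine default 2 8192
```

Running it once in dry-run mode is a cheap way to confirm the machine name and values before applying them with DRY_RUN=0.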

 

Cloudera & Dockers
  • At the time of writing, the Cloudera Docker image is 4.4 GB in size, so expect the download or build to take some time.
  • To install the Cloudera image, you could adopt any of the following three methods:
    • Pull the image with a command such as the following:
      docker pull cloudera/quickstart:latest
      
    • Use a Dockerfile with the following content:
      FROM cloudera/quickstart:latest
      

      Save the file as cloudera.df and then use the following command to build the image:

      docker build -t cloudera -f cloudera.df .
      

      The image is tagged as cloudera.

    • Download the Cloudera QuickStart tar file, untar it, and import it into Docker. The following commands could be used:
      tar xzf cloudera-quickstart-vm-*-docker.tar.gz
      docker import - cloudera/quickstart:latest < cloudera-quickstart-vm-*-docker/*.tar
      
  • Once the Cloudera image is built, the following command could be used to run the container (here named “cdh”):
    docker run --privileged=true -ti -d -p 8888:8888 -p 80:80 -p 7180:7180 --name cdh --hostname=quickstart.cloudera -v /c/Users:/mnt/Users cloudera /usr/bin/docker-quickstart
    

    Note that the image is named/tagged as cloudera. You could also run “docker images” to find the tag of your Cloudera image and use it in place of “cloudera”. Also note that ports such as 7180 and 8888 are published from the container to the host.
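
Before running the container, it helps to confirm that the image is actually available locally. The following is a small sketch of one way to do that. The image_exists and ensure_image helpers are hypothetical names introduced for illustration, and pulling cloudera/quickstart:latest is just the first of the three methods above.

```shell
#!/bin/sh
# image_exists IMAGE: succeed if a local image matches the repository[:tag].
# "docker images -q" prints only image IDs, so empty output means no match.
image_exists() {
  [ -n "$(docker images -q "$1" 2>/dev/null)" ]
}

# ensure_image IMAGE: pull the image only when it is not present locally.
ensure_image() {
  if image_exists "$1"; then
    echo "Image $1 already present"
  else
    echo "Image $1 not found, pulling"
    docker pull "$1"
  fi
}

# Usage: ensure_image cloudera/quickstart:latest
```

Checking for an image ID avoids parsing the header line that “docker images” prints.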

 

Test the Cloudera Installation

Execute the following command to start the Cloudera service, assuming you started the container with the name “cdh”. The scripts at the end of this article can be used to start the “cdh” Cloudera container.

docker exec -ti cdh /home/cloudera/cloudera-manager --express

With the above command, Cloudera starts as shown in the following screenshot.

Cloudera starts in a docker container

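Open a browser and access the following URL: http://192.168.99.100:7180/. This address is the usual IP of the default Docker machine; “docker-machine ip default” prints the address of yours. The URL opens the login page for Cloudera Manager. Enter cloudera/cloudera as the login/password and you are all set!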
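
Cloudera Manager can take a while to come up, so the login page may not respond right away. The following is a minimal sketch of a polling loop, assuming curl is installed. The wait_for_url helper and its default attempt count and delay are hypothetical choices, not values taken from Cloudera.

```shell
#!/bin/sh
# wait_for_url URL [ATTEMPTS] [DELAY_SECONDS]
# Poll the URL until it answers without an HTTP error, or give up.
wait_for_url() {
  url="$1"; attempts="${2:-30}"; delay="${3:-10}"
  i=1
  while [ "$i" -le "$attempts" ]; do
    # -s: silent, -f: treat HTTP errors (>= 400) as failure, -o: discard the body
    if curl -sf -o /dev/null "$url"; then
      echo "Up after $i attempt(s): $url"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "Gave up after $attempts attempt(s): $url" >&2
  return 1
}

# Usage: wait_for_url "http://192.168.99.100:7180/" 30 10
```

If your Docker machine has a different address, replace the IP in the usage line with the output of “docker-machine ip default”.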

 

Scripts to install & run Cloudera

Following are the files which could be used to build the image and run the Cloudera container.

  • cloudera.df. This is the Dockerfile for building the Cloudera image.
    FROM cloudera/quickstart:latest
    
  • runCloudera.sh. This is a script to build the Cloudera image (if not present) and start the container.
    #!/bin/sh
    
    # The container name is the only (required) argument.
    if [ $# -eq 0 ]; then
      echo "This script expects a container name argument. Example: ./runCloudera.sh cdh"
      exit 100
    fi
    
    # Remove any previous container with the same name; errors are hidden if none exists.
    docker stop "$1" 2>/dev/null; docker rm "$1" 2>/dev/null
    
    # Build the Cloudera image if it does not exist
    #
    cd_image="cloudera"
    cd_df="cloudera.df"
    # "docker images" always prints a header line, so fewer than 2 lines means no image.
    if [ "$(docker images "$cd_image" | wc -l)" -lt 2 ]; then
      echo "Docker image $cd_image does not exist..."
      echo "Building docker image $cd_image"
      if [ -f "$cd_df" ]; then
        docker build -t "$cd_image" -f "$cd_df" .
      else
        echo "Can't find Dockerfile $cd_df in the current location"
        exit 200
      fi
    fi
    
    docker run --privileged=true -ti -d -p 8888:8888 -p 80:80 -p 7180:7180 --name "$1" --hostname=quickstart.cloudera -v /c/Users:/mnt/Users "$cd_image" /usr/bin/docker-quickstart
    

Open a Docker terminal, place both files in a folder, and execute a command such as “./runCloudera.sh cdh”. This builds the image and starts a container named “cdh”.
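
After the script finishes, a quick check can confirm that the container is up and which ports are published. The following is a small sketch. The container_running and report_container helpers are hypothetical names introduced for illustration, and “cdh” is the container name assumed throughout this article.

```shell
#!/bin/sh
# container_running NAME: succeed if the container exists and is running.
container_running() {
  [ "$(docker inspect -f '{{.State.Running}}' "$1" 2>/dev/null)" = "true" ]
}

# report_container NAME: print the published ports of a running container.
report_container() {
  if container_running "$1"; then
    echo "Container $1 is running; published ports:"
    docker port "$1"
  else
    echo "Container $1 is not running" >&2
    return 1
  fi
}

# Usage: report_container cdh
```

If the container is reported as not running, “docker logs cdh” is usually the first place to look.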

 

Ajitesh Kumar

