Learning the Ropes of OwlDQ AMI

Introduction

This tutorial is aimed at first-time users of the OwlDQ AMI.

The OwlDQ AMI is a straightforward, pre-configured environment that contains the latest distribution of OwlDQ. The AMI accelerates adoption of a Data Quality program within your enterprise.

Let's begin the OwlDQ journey.

Prerequisites

OwlDQ is not currently listed in the public AWS Marketplace, but the image can be shared privately with customers. Once you have access to the OwlDQ image, you can find it under private images in the N. Virginia (us-east-1) region.
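As a rough command-line equivalent of that console search, the sketch below lists AMIs in us-east-1 that have been shared with your account. It assumes the AWS CLI is installed and configured. The function name find_shared_images, the AWS_BIN override, and the optional name pattern are illustrative, since this guide does not state the AMI's name.

```shell
#!/bin/sh
# Path to the AWS CLI; overridable so the command can be inspected with AWS_BIN=echo.
AWS_BIN="${AWS_BIN:-aws}"

# List AMIs in us-east-1 for which this account has explicit launch permission.
# $1: optional name pattern (hypothetical; defaults to "*", which matches every name).
find_shared_images() {
  "$AWS_BIN" ec2 describe-images \
    --region us-east-1 \
    --executable-users self \
    --filters "Name=name,Values=${1:-*}" \
    --query 'Images[].[ImageId,Name,CreationDate]' \
    --output table
}
```

Calling find_shared_images with no argument lists every shared image; pass a pattern such as 'owl*' only if it matches the name of the image that was shared with you.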

The AMI is based on CentOS 7, so before launching, make sure to subscribe to the CentOS 7 (x86_64) image:

AWS Marketplace: CentOS 7 (x86_64) - with Updates HVM (aws.amazon.com)

Next, launch the AMI. Make sure you select the appropriate VPC, subnet, security groups, and key pair for your requirements.

Make sure the inbound rules allow access to all required ports on the box from within your network. We recommend restricting access to the required users based on their IP addresses.
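One way to apply IP-based inbound rules from the command line is sketched below. It assumes the AWS CLI is configured and that the ports to open are the ones this guide mentions later: 22 for SSH, 9000 for the web console, and 8080 for the Spark master. The function name allow_owl_ports, the security group ID, and the CIDR are placeholders.

```shell
#!/bin/sh
# Path to the AWS CLI; overridable so the commands can be inspected with AWS_BIN=echo.
AWS_BIN="${AWS_BIN:-aws}"

# Allow one CIDR block to reach the ports this guide mentions:
# 22 (ssh), 9000 (OwlDQ web console), 8080 (Spark master UI).
# $1: security group ID, $2: CIDR such as 203.0.113.10/32 (both placeholders).
allow_owl_ports() {
  for port in 22 9000 8080; do
    "$AWS_BIN" ec2 authorize-security-group-ingress \
      --group-id "$1" --protocol tcp --port "$port" --cidr "$2" || return 1
  done
}
```

Running it with AWS_BIN=echo first prints the three commands for review before anything is changed.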

OwlDQ Product Components

Hardware Sizing Guide

Azure Setup

You can set up OwlDQ on an Azure box using either of the following options.

  1. The OwlDQ image is available through Azure Shared Image Galleries. If you have experience with Shared Image Galleries, we can work with you to set up the OwlDQ image.

  2. Alternatively, you can spin up an Azure instance (check our Hardware Sizing Guide) and run the installation script, which installs OwlDQ.

The steps for the second option follow.

Provision Azure Instance

The example uses the following settings.

  1. CentOS 7.5 image

  2. E16s v3 instance size

  3. An SSH key available for logging in to the instance

  4. User ID "owl"

  5. Your existing network resources, or new ones

  6. Appropriate inbound network rules to access the application
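The settings above could be expressed with the Azure CLI roughly as follows. This is a sketch under assumptions: the image URN OpenLogic:CentOS:7.5:latest is a guess at the CentOS 7.5 image and may not be available in your subscription, Standard_E16s_v3 is the Azure size name for E16s v3, and the resource group, VM name, and key path are placeholders. The function name create_owl_vm is illustrative.

```shell
#!/bin/sh
# Path to the Azure CLI; overridable so the command can be inspected with AZ_BIN=echo.
AZ_BIN="${AZ_BIN:-az}"

# Create a VM matching the example settings above.
# $1: resource group, $2: VM name, $3: path to an SSH public key (all placeholders).
create_owl_vm() {
  "$AZ_BIN" vm create \
    --resource-group "$1" \
    --name "$2" \
    --image "OpenLogic:CentOS:7.5:latest" \
    --size Standard_E16s_v3 \
    --admin-username owl \
    --ssh-key-values "$3"
}
```

With no network options, az vm create provisions new network resources; to reuse existing ones you would add the corresponding network options for your environment.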

Install OwlDQ

  1. SSH to the newly created instance and download the installation script:

  2. wget https://owl-packages.s3.amazonaws.com/owlScript.tar.gz

  3. tar -xvf owlScript.tar.gz

  4. Run ./owl_install.sh

  5. Open http://<host>:9000 to confirm that the application is reachable.
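The final check can also be scripted. The sketch below polls the web URL until it answers, on the assumption that the services may need some time to come up after the install script finishes. It requires curl; the function name wait_for_url and the retry defaults are illustrative.

```shell
#!/bin/sh
# Path to curl; overridable for testing.
CURL_BIN="${CURL_BIN:-curl}"

# Poll a URL until it responds without an HTTP error, or the attempts run out.
# $1: URL, $2: max attempts (default 30), $3: seconds between attempts (default 10).
wait_for_url() {
  url=$1 tries=${2:-30} delay=${3:-10}
  i=1
  while [ "$i" -le "$tries" ]; do
    if "$CURL_BIN" --silent --fail --output /dev/null --max-time 5 "$url"; then
      echo "up: $url"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "no response from $url after $tries attempts" >&2
  return 1
}
```

For example, wait_for_url "http://<host>:9000" with <host> replaced by the instance address returns 0 as soon as the console responds.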

Install with Script

The following script installs all the required components (Web, Postgres, Spark) for OwlDQ. Alternatively, you can bring your own Postgres instance.

#!/bin/bash
# Environment used by setup.sh and Spark.
export JAVA_HOME=/usr
export SPARK_HOME=$HOME/owl/spark
export PATH=$PATH:$HOME/owl/spark/bin
# Generate a passphrase-less SSH key and authorize it for SSH to this host.
ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
## Install Java 1.8 (-y skips the interactive confirmation prompt)
sudo yum install -y java-1.8.0-openjdk
##
mkdir -p build
## Get the Owl packages, Spark, and Zeppelin.
cd build
wget https://owl-packages.s3.amazonaws.com/MP/owl-2.12.0-package-rhel7-base.tar.gz
tar -xvf owl-2.12.0-package-rhel7-base.tar.gz
wget https://<>/additional/spark-2.3.2-bin-hadoop2.6.tgz
wget https://<>/additional/zeppelin-0.8.0-bin-all.tgz
#sed -i 's/com.owl.org.postgresql.Driver/org.postgresql.Driver/g' $HOME/build/setup.sh
## Install the selected components under $HOME, with the web UI on port 9000.
./setup.sh -port=9000 -owlbase=$HOME -owlpackage=$HOME/build -options=postgres,spark,owlagent,owlweb,zeppelin -pgpassword=owl123 -pgserver=localhost
sleep 2
## Append the license key to the OwlDQ configuration.
echo 'key=<key>:65536' >> $HOME/owl/config/owl.properties
##
sleep 2
## Add the AWS SDK and hadoop-aws jars to Spark's jars directory.
cd $HOME/owl/spark/jars
wget https://owl-packages.s3.amazonaws.com/MP/aws-java-sdk-1.11.880.jar
wget https://owl-packages.s3.amazonaws.com/MP/hadoop-aws-2.6.5.jar
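Because the script appends the key line with >>, running it a second time would leave two key= lines in owl.properties. A minimal sketch of a repeatable alternative follows; set_property is a hypothetical helper, not part of OwlDQ, and it treats the key as a plain name without regex metacharacters.

```shell
#!/bin/sh
# Set key=value in a properties file, replacing any existing line for that key.
# $1: file, $2: key, $3: value.
set_property() {
  file=$1 key=$2 value=$3
  [ -f "$file" ] || touch "$file"
  tmp=$(mktemp)
  # Keep every line except an existing one for this key (grep exits 1 if nothing is kept).
  grep -v "^${key}=" "$file" > "$tmp" || true
  printf '%s=%s\n' "$key" "$value" >> "$tmp"
  # Write back through the original file so its permissions are preserved.
  cat "$tmp" > "$file"
  rm -f "$tmp"
}
```

In the script above, the echo line could then become set_property "$HOME/owl/config/owl.properties" key '<key>:65536'. The key line moves to the end of the file, which is harmless as long as the order of properties does not matter.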

Environment Details

  1. Once the EC2 instance is up and running, validate that OwlDQ is running by logging in to the web console at http://<public_ip>:9000 ({user}/{password}).

  2. Make sure the Spark master is running at http://<public_ip>:8080.

  3. SSH to the EC2 instance: ssh -i <ssh key> centos@<ip_address>

  4. Switch to the owl user: [centos@ip-10-0-0-17 ~]$ su owl -

    Password: <owl123>

    [owl@ip-10-0-0-17 centos]$

  5. Make note of the following software versions and directories.

Software Version                     Directory
OwlDQ 2.9                            /opt/owl
Java: openjdk version "1.8.0_252"    /usr
Apache Spark 2.3.4                   /opt/owl/spark
Postgres 11.4                        /opt/owl/owl-postgres
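A minimal sketch for confirming this layout on a running instance is shown below. The function names check_dirs and check_port are illustrative; check_port relies on bash's /dev/tcp feature and the coreutils timeout command, both assumed to be present on the CentOS 7 image.

```shell
#!/bin/sh
# Report whether each expected directory exists; returns 1 if any is missing.
check_dirs() {
  rc=0
  for d in "$@"; do
    if [ -d "$d" ]; then
      echo "ok       $d"
    else
      echo "missing  $d"
      rc=1
    fi
  done
  return $rc
}

# Succeed if a TCP connection to host $1, port $2 can be opened within 3 seconds.
check_port() {
  timeout 3 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}
```

On the instance, check_dirs /opt/owl /usr /opt/owl/spark /opt/owl/owl-postgres covers the directories listed above, and check_port localhost 9000 and check_port localhost 8080 cover the web console and the Spark master.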

Scripts:

  1. To start or stop all the services on the box, go to the /home/owl directory.

  2. To stop all services, run ./owl_stop.sh

  3. To start all services, run ./owl_start.sh
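The two scripts can be combined into a restart helper, sketched below. The function name owl_restart is hypothetical, and continuing after a failed stop is a design choice that assumes the stop script may fail simply because the services are already down.

```shell
#!/bin/sh
# Stop and then start all OwlDQ services using the scripts described above.
# $1: directory holding owl_stop.sh and owl_start.sh (defaults to /home/owl).
owl_restart() {
  (
    # Run in a subshell so the caller's working directory is left unchanged.
    cd "${1:-/home/owl}" || exit 1
    ./owl_stop.sh || echo "owl_stop.sh reported an error; starting anyway" >&2
    ./owl_start.sh
  )
}
```

Run it as the owl user with owl_restart; the exit status is that of owl_start.sh.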

Update Agent Properties

Log in to the Owl web console and go to Admin setting -> Remote Agent.

Edit the agent properties and update the Spark config to match your EC2 instance's private DNS entry.
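To look up the private DNS entry from the instance itself, the sketch below queries the EC2 instance metadata service (IMDSv2) and formats a Spark standalone master URL. Port 7077 is Spark's default master port; whether the agent's Spark config expects exactly this spark://host:port form is an assumption, so compare it with the value already shown in the console. The function names are illustrative.

```shell
#!/bin/sh
# Path to curl; overridable for testing.
CURL_BIN="${CURL_BIN:-curl}"
# Base address of the EC2 instance metadata service.
IMDS=http://169.254.169.254/latest

# Print this instance's private DNS name using an IMDSv2 session token.
private_dns() {
  token=$("$CURL_BIN" --silent --fail -X PUT "$IMDS/api/token" \
    -H "X-aws-ec2-metadata-token-ttl-seconds: 60") || return 1
  "$CURL_BIN" --silent --fail -H "X-aws-ec2-metadata-token: $token" \
    "$IMDS/meta-data/local-hostname"
}

# Build a Spark standalone master URL for a host.
# $1: host name, $2: port (defaults to 7077, Spark's default master port).
spark_master_url() {
  printf 'spark://%s:%s\n' "$1" "${2:-7077}"
}
```

On the instance, spark_master_url "$(private_dns)" prints a value to compare against the agent's Spark setting.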

Once all the services are running, you can start adding connections for your data sources.

You can follow the OwlDQ user guide from here.
