Learning the Ropes of OwlDQ AMI
Introduction
This tutorial is aimed at first-time users of the OwlDQ AMI.
The OwlDQ AMI is a straightforward, pre-configured environment that contains the latest distribution of OwlDQ. The AMI accelerates adoption of a Data Quality program within your enterprise.
Let's begin the OwlDQ journey.
Prerequisites
OwlDQ is currently not available in the public AWS Marketplace; the image is shared privately with customers. Once you have access to the OwlDQ image, search for it under Private images in the N. Virginia (us-east-1) region.
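If you prefer the AWS CLI to the console, a sketch like the following lists the images shared with your account in us-east-1. It assumes the AWS CLI is installed and configured. The function name find_owl_ami is hypothetical, and so is the default name fragment "Owl": EC2 name filters are case-sensitive, so pass the exact fragment of the image name you were given.

```shell
#!/usr/bin/env bash
# List AMIs in us-east-1 that you have explicit launch permission for
# (i.e. images privately shared with your account).
# ASSUMPTION: the OwlDQ image name contains "Owl"; override with $1.
find_owl_ami() {
  local pattern="${1:-Owl}"
  aws ec2 describe-images \
    --region us-east-1 \
    --executable-users self \
    --filters "Name=name,Values=*${pattern}*" \
    --query 'Images[].[ImageId,Name,CreationDate]' \
    --output text
}
```

Usage: find_owl_ami, or find_owl_ami OwlDQ to narrow the match.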
The AMI is based on CentOS 7, so before launching, make sure you subscribe to the CentOS 7 (x86_64) image:
AWS Marketplace: CentOS 7 (x86_64) - with Updates HVM (aws.amazon.com)
Next, launch the AMI. Select the appropriate VPC, subnet, security groups, and key pair for your requirements.
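The same launch can be scripted with the AWS CLI. This is a minimal sketch: the function name launch_owl_instance is hypothetical, and every argument (AMI ID, instance type, key pair, subnet, security group) is a placeholder you fill in; pick the instance type from the Hardware Sizing Guide. The subnet determines the VPC.

```shell
#!/usr/bin/env bash
# Launch one instance from the OwlDQ AMI with the network settings you chose.
# Usage: launch_owl_instance <ami-id> <instance-type> <key-name> <subnet-id> <sg-id>
launch_owl_instance() {
  if [ "$#" -ne 5 ]; then
    echo "usage: launch_owl_instance <ami-id> <instance-type> <key-name> <subnet-id> <sg-id>" >&2
    return 2
  fi
  aws ec2 run-instances \
    --region us-east-1 \
    --image-id "$1" \
    --instance-type "$2" \
    --key-name "$3" \
    --subnet-id "$4" \
    --security-group-ids "$5" \
    --count 1 \
    --query 'Instances[0].InstanceId' \
    --output text
}
```

The function prints the new instance ID, which you can use to look up the instance's public IP once it is running.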
Make sure the inbound rules allow access to all required ports on the instance from within your network. We recommend restricting access to the users who need it, based on their IP addresses.
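As a sketch of IP-restricted inbound rules, the function below opens only the ports this page uses (22 for SSH, 9000 for the OwlDQ web console, 8080 for the Spark master UI) to a single CIDR range. The function name allow_owl_ports is hypothetical; the security group ID and CIDR are placeholders, and you may need additional ports depending on your setup.

```shell
#!/usr/bin/env bash
# Open the ports used on this page, but only to one trusted CIDR range:
#   22   SSH
#   9000 OwlDQ web console
#   8080 Spark master web UI
# Usage: allow_owl_ports <security-group-id> <cidr>   (e.g. 203.0.113.0/24)
allow_owl_ports() {
  if [ "$#" -ne 2 ]; then
    echo "usage: allow_owl_ports <security-group-id> <cidr>" >&2
    return 2
  fi
  local sg="$1" cidr="$2" port
  for port in 22 9000 8080; do
    aws ec2 authorize-security-group-ingress \
      --group-id "$sg" \
      --protocol tcp \
      --port "$port" \
      --cidr "$cidr" || return 1
  done
}
```

Using a narrow CIDR (your office or VPN range) rather than 0.0.0.0/0 keeps the admin console off the public internet.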
OwlDQ Product Components
Hardware Sizing Guide
Azure Setup
You can set up OwlDQ on an Azure instance using either of the following options.
1. The OwlDQ image is available in Azure Shared Image Galleries. If you have experience with Azure Shared Image Galleries, we can work with you to set up the OwlDQ image.
2. Alternatively, you can spin up an Azure instance (see the Hardware Sizing Guide) and run the installation script, which installs OwlDQ.
The following steps cover the second option.
Provision Azure Instance
The example uses the following settings:
CentOS 7.5 image
E16s v3 instance size
An SSH key for logging in to the instance
User ID "owl"
Select your existing network resources or create new ones.
Make sure you have the appropriate inbound network rules to access the application.
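The settings above can also be applied from the Azure CLI. This is a sketch, not a tested recipe: the function name create_owl_vm is hypothetical, and the image URN "OpenLogic:CentOS:7.5:latest" is an assumption; confirm it is offered in your region with az vm image list --publisher OpenLogic --offer CentOS --all. Without network flags, az vm create picks default network resources; add --vnet-name, --subnet, and --nsg to reuse existing ones.

```shell
#!/usr/bin/env bash
# Create an Azure VM matching the example settings:
# CentOS 7.5, size E16s v3, admin user "owl", SSH-key login.
# ASSUMPTION: the image URN below is available in your region.
# Usage: create_owl_vm <resource-group> <vm-name> <ssh-public-key-file>
create_owl_vm() {
  if [ "$#" -ne 3 ]; then
    echo "usage: create_owl_vm <resource-group> <vm-name> <ssh-public-key-file>" >&2
    return 2
  fi
  az vm create \
    --resource-group "$1" \
    --name "$2" \
    --image "OpenLogic:CentOS:7.5:latest" \
    --size Standard_E16s_v3 \
    --admin-username owl \
    --ssh-key-values "$3"
}
```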
Install OwlDQ
ssh to the newly created instance and download the installation script:
wget https://owl-packages.s3.amazonaws.com/owlScript.tar.gz
Extract it:
tar -xvf owlScript.tar.gz
Run the installer:
./owl_install.sh
Then open http://<host>:9000 in your browser.
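The web console may take a few minutes to respond after the installer finishes. A small polling helper like the one below (a sketch; the name wait_for_url and the retry defaults are arbitrary choices) retries until the URL answers with an HTTP status in the 1xx to 4xx range, so a redirect to the login page counts as "up".

```shell
#!/usr/bin/env bash
# Poll a URL until it returns an HTTP status in 1xx-4xx, or give up.
# Usage: wait_for_url <url> [attempts=30] [delay-seconds=10]
wait_for_url() {
  local url="$1" attempts="${2:-30}" delay="${3:-10}" i code
  for ((i = 1; i <= attempts; i++)); do
    # -s: quiet, -o /dev/null: discard body, -w: print only the status code
    # (curl prints 000 when it cannot connect).
    code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 "$url")
    if [[ "$code" =~ ^[1-4][0-9]{2}$ ]]; then
      echo "$url is up (HTTP $code)"
      return 0
    fi
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
  done
  echo "$url did not respond after $attempts attempts" >&2
  return 1
}
```

Usage: wait_for_url http://<host>:9000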
Install with Script
The installation script installs all the required components (web, Postgres, Spark) for OwlDQ. Alternatively, you can bring your own Postgres instance.
Environment Details
Once the EC2 instance is up and running, validate that OwlDQ is running by logging in to the web console at http://<public_ip>:9000 (username admin, password admin123).
Make sure the Spark master is running at http://<public_ip>:8080.
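From a shell on the instance itself, you can also confirm that something is listening on the two ports mentioned above. This sketch parses ss -ltn (listening TCP sockets with numeric ports, from iproute2); the function names port_listening and check_owl_ports are hypothetical. It only shows that a port is open, not that the right service owns it.

```shell
#!/usr/bin/env bash
# Return success if some process is listening on the given TCP port.
port_listening() {
  local port="$1"
  # Column 4 of `ss -ltn` is Local Address:Port, e.g. "*:9000" or "[::]:8080".
  ss -ltn | awk 'NR > 1 { print $4 }' | grep -qE ":${port}\$"
}

# Check the OwlDQ web console (9000) and the Spark master UI (8080).
check_owl_ports() {
  local port status=0
  for port in 9000 8080; do
    if port_listening "$port"; then
      echo "port $port: listening"
    else
      echo "port $port: NOT listening"
      status=1
    fi
  done
  return "$status"
}
```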
ssh to the EC2 instance: ssh -i <ssh key> centos@<ip_address>
Switch to the owl user (password: owl123):
[centos@ip-10-0-0-17 ~]$ su - owl
Password:
[owl@ip-10-0-0-17 ~]$
Make a note of the following important software and directories.
Software | Version | Directory
OwlDQ | 2.9 | /opt/owl
Java (OpenJDK) | 1.8.0_252 | /usr
Apache Spark | 2.3.4 | /opt/owl/spark
Postgres | 11.4 | /opt/owl/owl-postgres
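A quick sanity check of the layout in the table might look like the sketch below. The function names are hypothetical; the optional root prefix in check_owl_dirs exists only so the check can be pointed at another directory tree. print_owl_versions assumes Spark's standard bin/spark-submit sits under /opt/owl/spark.

```shell
#!/usr/bin/env bash
# Directories from the table above.
OWL_DIRS=(/opt/owl /opt/owl/spark /opt/owl/owl-postgres)

# Report each directory as ok or MISSING; fail if any is missing.
# Optional $1 is a path prefix (defaults to "", the real filesystem).
check_owl_dirs() {
  local root="${1:-}" missing=0 d
  for d in "${OWL_DIRS[@]}"; do
    if [ -d "${root}${d}" ]; then
      echo "ok       ${d}"
    else
      echo "MISSING  ${d}"
      missing=1
    fi
  done
  return "$missing"
}

# Print the Java and Spark versions to compare against the table.
print_owl_versions() {
  java -version 2>&1 | head -n 1
  /opt/owl/spark/bin/spark-submit --version 2>&1
}
```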
Scripts:
The scripts that start and stop all the services on the instance are in the /home/owl directory.
To stop all services, run ./owl_stop.sh
To start all services, run ./owl_start.sh
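A restart is simply the stop script followed by the start script. In this sketch (owl_restart is a hypothetical name) a failing stop script only prints a warning, on the assumption that it may fail when services are already down; adjust that if you would rather abort. The commands run in a subshell so your current directory is unchanged.

```shell
#!/usr/bin/env bash
# Stop, then start, all OwlDQ services using the scripts in /home/owl.
# Optional $1 overrides the script directory.
owl_restart() {
  local dir="${1:-/home/owl}"
  (
    cd "$dir" || exit 1
    ./owl_stop.sh || echo "owl_stop.sh reported an error; starting anyway" >&2
    ./owl_start.sh
  )
}
```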
Update Agent Properties
Log in to the Owl web console and go to Admin Settings -> Remote Agent.
Edit the agent properties and update the Spark configuration to match your EC2 instance's private DNS name.
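To find the private DNS name, you can query the EC2 instance metadata service from the instance (IMDSv2: fetch a session token, then use it). The sketch also builds a standalone Spark master URL with Spark's default master port 7077; whether the agent property expects exactly this spark://host:port form is an assumption, so compare it with the value already in the agent settings. Both function names are hypothetical.

```shell
#!/usr/bin/env bash
# Print this EC2 instance's private DNS name via IMDSv2.
private_dns() {
  local token
  token=$(curl -s -X PUT "http://169.254.169.254/latest/api/token" \
    -H "X-aws-ec2-metadata-token-ttl-seconds: 21600") || return 1
  curl -s -H "X-aws-ec2-metadata-token: $token" \
    "http://169.254.169.254/latest/meta-data/local-hostname"
}

# Build a standalone Spark master URL; 7077 is Spark's default master port.
# ASSUMPTION: the agent's Spark setting takes this spark://host:port form.
spark_master_url() {
  local host="$1" port="${2:-7077}"
  printf 'spark://%s:%s\n' "$host" "$port"
}
```

Usage on the instance: spark_master_url "$(private_dns)"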
Once all the services are running, you can start adding connections for your data sources.
From here, follow the OwlDQ user guide.