Owl Analytics Admin Guide

Introduction to Owl Analytics

Purpose

Owl-Analytics software takes a machine-learning-first, rules-second approach to data integrity and data quality for datasets. Owl works alongside your current data science tools without disrupting them. Data quality is an essential prerequisite for a defensible governance program.

Who Should Use This Guide

This guide is intended for administrators of Owl-Analytics software.

Administrators: Administrators will learn how to install and configure the application to fit their business needs, integrate Owl with corporate security and infrastructure, and schedule Owlcheck jobs to run on a recurring basis.

Terminology

  • Owl-web - a Tomcat web server that users log into to view the results of all data quality scans that have been run.

  • Owl-core - the main JAR file that processes the data.

  • Owl-agent - an agent used to execute jobs remotely.

  • Owlcheck - the main execution point for jobs. Owlcheck is a shell script.

  • Hoot - the results of an Owlcheck.

  • Metastore - the Postgres repository that stores all information surfaced in Owl-web.

  • Dataset (DS) - the name given to a dataset at Owlcheck execution time.

Scope

This guide covers planning, setup, administration, security, and working with Owl-Analytics software. It covers the two main functional components: Owl-Core and Owl-Web.

In this depiction, the Owlcheck driver and job are pushed to the cluster (because the deploymode flag was set to cluster), so any node on the cluster may connect back to the metastore.
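Because any node on the cluster may connect back to the metastore in this mode, an administrator may want to confirm that each node can actually reach it before scheduling jobs. The following is a minimal sketch of a generic TCP reachability check in Python, not an Owl-provided tool. The host name is a hypothetical placeholder, and port 5432 is only the Postgres default; substitute your metastore's actual address and port.

```python
import socket

# Hypothetical metastore endpoint: replace with your own Postgres host.
# 5432 is the Postgres default port; your metastore may be configured differently.
METASTORE_HOST = "metastore.example.internal"
METASTORE_PORT = 5432


def can_reach(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # OSError covers refused connections, DNS resolution failures,
        # and timeouts (socket.timeout is a subclass of OSError).
        return False


if __name__ == "__main__":
    ok = can_reach(METASTORE_HOST, METASTORE_PORT)
    status = "reachable" if ok else "NOT reachable"
    print(f"{socket.gethostname()}: metastore {METASTORE_HOST}:{METASTORE_PORT} is {status}")
```

Running the script on every cluster node (for example, through your existing configuration-management tooling) reports which nodes cannot open a connection to the metastore. It only tests the network path; it does not verify Postgres credentials or Owl configuration.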
