Owl Analytics Admin Guide
Introduction to Owl Analytics
Purpose
Owl-Analytics software provides a machine-learning-first, rules-second approach to the data integrity and data quality of datasets. Owl is unobtrusive to your current data science tools. Data quality is an essential prerequisite for a defensible governance program.
Who Should Use This Guide
This guide is intended for administrators of Owl-Analytics software.
Administrators: Administrators will learn how to install and configure the application to fit business needs, integrate Owl with corporate security and infrastructure, and schedule Owlcheck jobs to run on a recurring basis.
Terminology
Owl-web - a Tomcat web server that users log into to see the results of all data quality scans that have been run.
Owl-core - the main JAR file that processes the data.
Owlcheck - the main execution point of jobs. Owlcheck is a shell script.
Hoot - the results of an Owlcheck.
Metastore - Postgres repository used to store all information surfaced in Owl-web.
Dataset - The name given to a Dataset (DS) at Owlcheck execution time.
Scope
This guide covers planning, setting up, administering, securing, and working with Owl-Analytics software. It covers the two main functional components, Owl-core and Owl-web.
In this depiction, the Owlcheck driver and job are pushed to the cluster (the deploymode = cluster flag was sent), so communication with the metastore can originate from any node on the cluster.
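Because any cluster node may need to connect back to the metastore in this mode, it can help to confirm network reachability from each node before scheduling jobs. The following is a minimal sketch of such a check, not part of Owl itself. The function name can_reach_metastore is hypothetical, and the default port 5432 is the standard Postgres port, which is an assumption about your metastore configuration.

```python
import socket


def can_reach_metastore(host: str, port: int = 5432, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    Hypothetical helper: this only tests network reachability. It does not
    authenticate against Postgres or verify that the Owl schema exists.
    5432 is the Postgres default port; the actual metastore port may differ.
    """
    try:
        # create_connection resolves the host and opens a TCP connection.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False
```

Run on each cluster node with the metastore host name substituted in, a False result points to a firewall, routing, or DNS problem to resolve before running Owlcheck jobs in cluster mode.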