Owl Analytics Admin Guide
Introduction to Owl Analytics
Purpose
Owl-Analytics software takes a Machine Learning first, Rules second approach to the data integrity and data quality of datasets. Owl is unobtrusive to your current Data Science tools. Data quality is an essential prerequisite for a defensible Governance Program.
Who Should Use This Guide
This guide is intended for administrators of Owl-Analytics software.
Administrators: Administrators will learn how to install and configure the application to fit business needs, integrate Owl with corporate security and infrastructure, and schedule Owlcheck jobs to run on a recurring basis.
Terminology
Owl-web - a Tomcat web server that users log into to view the results of all data quality scans that have been run.
Owl-core - the main jar file that processes the data.
Owl-agent - an agent that executes jobs remotely.
Owlcheck - the main execution point for jobs. Owlcheck is a shell script.
Hoot - the results of an Owlcheck.
Metastore - the Postgres repository that stores all information surfaced in Owl-web.
Dataset - the name given to a dataset (DS) at Owlcheck execution time.
Scope
This guide covers planning, setup, administration, security, and working with Owl-Analytics software. It covers two main functional components: Owl-Core and Owl-Web.
Owl Features
| Data Quality | Business Functions |
| --- | --- |
| Row count validation (e.g., today's record volume is only 50% of normal) | Downstream data impacts to the business |
| Column type validation (e.g., is the type still the same?) | Detection of the models most affected by DQ issues |
| Mixed column types, data shifting (e.g., does the column contain mixed types?) | Cataloging of data assets (which datasets have been Owlchecked) |
| Column outlier detection (e.g., is the value actually correct?) | Usage-ranked catalog |
| Records removed / added detection | Model performance |
| Column format consistency (e.g., is the column's shape consistent?) | Annotation of discovered DQ issues |
| Null / empty check | Distributed rule generation |
| Validation against source (compare column values with an authoritative source) | Suppression of false positives |
| Automatic schema evolution | Correlation matrix |
| Schema change detection | Scorecard |
| Incremental / micro-batch | Built-in alerting |
| Stream quality | |
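To make two of the simpler checks in the table concrete, the sketch below shows a naive row count validation (comparing today's volume with a historical average) and a null / empty ratio. This is an illustrative, rules-style sketch only, not Owl's Machine Learning based implementation; the function names and the 30% tolerance are hypothetical choices.

```python
from statistics import mean


def row_count_anomaly(history: list[int], today: int, tolerance: float = 0.3) -> bool:
    """Hypothetical rule: flag today's row count if it deviates from the
    historical mean by more than `tolerance` (0.3 = 30%)."""
    if not history:
        return False  # no baseline yet, so nothing to compare against
    baseline = mean(history)
    if baseline == 0:
        return today != 0  # any rows at all are a change from an empty baseline
    return abs(today - baseline) / baseline > tolerance


def null_or_empty_ratio(values: list[object]) -> float:
    """Fraction of values that are None or empty / whitespace-only strings."""
    if not values:
        return 0.0
    missing = sum(
        1 for v in values if v is None or (isinstance(v, str) and v.strip() == "")
    )
    return missing / len(values)
```

For example, with a history of 1000, 1020 and 980 rows, a day with 500 rows (50% of the baseline) is flagged, while 1010 rows is not.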
In this depiction, the Owlcheck driver and job are pushed to the cluster (because the deploymode = cluster flag was passed), so connections back to the metastore can originate from any node in the cluster.
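Because any cluster node may need to reach the metastore in this mode, an administrator may want to confirm that each node can open a TCP connection to it before scheduling jobs. The following is a minimal sketch, not an Owl-provided tool; it assumes the metastore is a PostgreSQL instance listening on the PostgreSQL default port 5432, and the hostname is hypothetical.

```python
import socket


def can_reach(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, timeouts and DNS failures
        return False


if __name__ == "__main__":
    # Hypothetical metastore host; 5432 is the PostgreSQL default port.
    METASTORE_HOST = "metastore.example.internal"
    METASTORE_PORT = 5432
    ok = can_reach(METASTORE_HOST, METASTORE_PORT)
    status = "reachable" if ok else "UNREACHABLE"
    print(f"{socket.gethostname()} -> {METASTORE_HOST}:{METASTORE_PORT}: {status}")
```

Running the script on every node (for example, through your existing configuration management tooling) gives a quick per-node view of network reachability. A successful TCP connection does not prove that Postgres authentication will succeed.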