Running owlcheck (Validate Job Execution)

Inside the <INSTALL_PATH>/bin/demos/ directory you will find a demo shell script. Execute the shell script as follows:

./demo.sh

Inside this script there are demonstrations of how to run an Owlcheck against a set of files; a sketch of the basic command pattern follows the list below. For a richer list of Owlcheck commands and options, please see the documentation entitled Owlcheck CLI Examples and Owlcheck Command Line Options. These demos use 5 different files, identified below (NOTE: the Appendix holds the file contents):

  • distribution_change.csv

  • null_change.csv

  • dupe.csv

  • shape.csv

  • row_count.csv
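
The demo.sh script drives each of these files through the owlcheck command. As a rough sketch of the pattern, a single run takes a dataset name, a run date, a file path, and a delimiter. The path and values below are illustrative and assume the demo directory from above; the exact flag names (-ds, -rd, -f, -d) should be confirmed against the Owlcheck Command Line Options document.

  # One Owlcheck run against a flat file (illustrative values):
  #   -ds  dataset name used to group runs day over day
  #   -rd  run date for this execution
  #   -f   input file
  #   -d   field delimiter
  ./owlcheck -ds "dist_example" -rd "2018-10-04" \
    -f "<INSTALL_PATH>/bin/demos/distribution_change.csv" -d ","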

The first file, distribution_change, shows that when we run 4 Owlchecks on consecutive days from 2018-10-04 through 2018-10-07, there is a change in distribution. At the end of each run you will see output that looks like JSON. This output is what we call a HOOT.

{
  "dataset": "dist_example",
  "runId": "2018-10-07",
  "score": 43,
  "rows": 3,
  "avgRows": 0,
  "cols": 4,
  "dqItems": {
    "load_time": {
      "key": "Load Time",
      "name": "Load Time",
      "stndDev": 0.0,
      "zscore": 3.0,
      "mean": 20000.0,
      "value": 1.40931E8,
      "score": 5,
      "perChange": 0.8,
      "verbose": "Late or off schedule data loading",
      "type": "TIME"
    },
    "age_card": {
      "name": "age",
      "stndDev": 0.0,
      "zscore": 1.0,
      "mean": 3.0,
      "value": 1.0,
      "score": 25,
      "perChange": -0.6666666666666666,
      "verbose": "Change in column values, look for col with repeating or static value",
      "type": "CARDINALITY"
    },
    "fname_card": {
      "name": "fname",
      "stndDev": 0.0,
      "zscore": 1.0,
      "mean": 3.0,
      "value": 1.0,
      "score": 25,
      "perChange": -0.6666666666666666,
      "verbose": "Change in column values, look for col with repeating or static value",
      "type": "CARDINALITY"
    }
  },
  "rules": [],
  "alerts": [],
  "prettyPrint": true
}
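
The distribution_change demo repeats a run like the one sketched above once per day, which is what allows Owl to baseline the column distributions and then flag the change on 2018-10-07. A minimal sketch of that loop, using the same illustrative flags and path as before:

  # run four consecutive daily Owlchecks against the same file
  for rd in 2018-10-04 2018-10-05 2018-10-06 2018-10-07; do
    ./owlcheck -ds "dist_example" -rd "$rd" \
      -f "<INSTALL_PATH>/bin/demos/distribution_change.csv" -d ","
  done

Each iteration prints its own HOOT, and the dqItems in the final run (shown above) are where the distribution change surfaces.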

The second file, null_change, shows that when we run 4 Owlchecks across consecutive days from 2018-10-04 through 2018-10-07, Owl detects more nulls than is typical for the dataset.

The third file, dupe, shows an example of records that are, or may be, duplicates.

The fourth file, shape, shows that when we run 4 Owlchecks on consecutive days from 2018-10-04 through 2018-10-07, Owl detects that the shape of the data in a column has changed.

The fifth and last example, row_count, shows how Owl is able to detect rows dropped from or added to the dataset, as well as provide insight that the record counts do not match day over day.
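
One simple way to see the day-over-day mismatch that the row_count example highlights is to capture each run's HOOT and compare the rows and avgRows fields, both of which appear in the HOOT sample above. This is a hedged sketch that assumes each HOOT has been saved to a file such as hoot_2018-10-06.json (the file names are illustrative) and that the jq utility is installed; jq is not part of Owl.

  # print the record-count fields from two saved HOOT files
  jq '{runId, rows, avgRows, score}' hoot_2018-10-06.json hoot_2018-10-07.json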

Please take the time to understand the data in each of these sample files, along with the Owlcheck commands that are executed within the demo.sh script. It is always good to practice running Owlcheck on flat files to get a handle on the powerful algorithms Owl provides out of the box. However, the true power comes into play when Owl is run against a live production dataset well into the millions or billions of records. The owl-web application will also show more information about each Owlcheck we ran.
