Data Validation module

From MIPAV
Revision as of 19:01, 23 July 2013

To ensure the quality of uploaded data and to make the data easy to query, data should be submitted in a specific format, and values should fall within the ranges and comply with the values defined in the data dictionary. All submitted research data must be validated against the data dictionary before submission. To facilitate this process, we provide the Data Validation module, which assists researchers with submitting their data.

Introduction

The Data Validation module accepts a researcher's data as CSV files and validates the file content against the values defined in the data dictionary. If validation succeeds, the module creates a submission ticket and a submission package, and the data are then ready for uploading.

If any validation errors or warnings are found, the module provides a detailed report of the data discrepancies, errors, and warnings.

Validation warnings do not prevent the creation of a submission package. However, if any validation errors are found, a submission package cannot be created. In that case, the researcher should first edit the data to fix all errors and then re-validate it.

System requirements

The most recent version of the Java Runtime Environment (JRE), 6 or 7, is required to run the Data Validation module.

CSV files

The structure of a CSV file should match the corresponding form structure, which can be queried with the query tool.

For more information about CSV files for data uploading, contact the data dictionary operations team - TBD.
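Because each CSV column should correspond to a data element of the form structure, a quick pre-check of the header row can catch structural mismatches before you load a file into the module. The Python sketch below is not part of the Validation Tool: it assumes that the first row of the CSV holds the column names and that each name should equal a data element name. The element names (GUID, AgeYrs, GenderTyp) are hypothetical.

```python
import csv
import io


def check_header(csv_text, expected_elements):
    """Compare a CSV header row with a form structure's data element names.

    Assumption: the first row holds the column names, and each column name
    is expected to equal a data element name of the form structure.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = [name.strip() for name in next(reader)]
    # Elements the form structure defines but the file lacks.
    missing = [e for e in expected_elements if e not in header]
    # Columns in the file that the form structure does not define.
    unexpected = [h for h in header if h not in expected_elements]
    return missing, unexpected


# Hypothetical form structure with three data elements.
form_elements = ["GUID", "AgeYrs", "GenderTyp"]
sample = "GUID,AgeYrs,Notes\nTBI_ABC123,34,none\n"

missing, unexpected = check_header(sample, form_elements)
print("missing:", missing)        # missing: ['GenderTyp']
print("unexpected:", unexpected)  # unexpected: ['Notes']
```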

Form structures

A form structure is a grouping of data elements used in the BRICS data dictionary. It is analogous to an electronic or paper case report form (CRF), in which data elements are linked together for collection and display.

A data element is a logical unit of data used in BRICS. It contains a single piece of information of one kind. A data element has a name, a precise definition, and a set of permissible values (codes), if applicable. A data element is not necessarily the smallest unit of data; it can be a unique combination of one or more smaller units. A data element occupies the space provided by one or more fields on a paper or electronic case report form (CRF), or by one or more fields in a database record. BRICS allows the use of two types of data elements:

  • Common data elements (CDEs), which are used across multiple studies and diseases/domains;
  • Unique data elements (UDEs), which are used to gather information for a particular study.

Both types of data elements are used in form structures and for data collection.
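The relationship between a form structure and its two kinds of data elements can be pictured as a small data model. This is a hypothetical Python sketch, not the BRICS schema: the class and field names are illustrative assumptions, and the example form structure and elements are made up.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class ElementKind(Enum):
    COMMON = "CDE"  # used across multiple studies and diseases/domains
    UNIQUE = "UDE"  # used to gather information for a particular study


@dataclass
class DataElement:
    name: str
    definition: str
    kind: ElementKind
    # Permissible values (codes), if applicable; None means no fixed list.
    permissible_values: Optional[List[str]] = None


@dataclass
class FormStructure:
    """A named grouping of data elements, analogous to a CRF."""
    name: str
    elements: List[DataElement] = field(default_factory=list)

    def element_names(self) -> List[str]:
        return [e.name for e in self.elements]


# Hypothetical form structure, for illustration only.
demo = FormStructure("DemoForm", [
    DataElement("GUID", "Global unique identifier", ElementKind.COMMON),
    DataElement("GenderTyp", "Gender of participant", ElementKind.COMMON,
                ["Male", "Female", "Unknown"]),
    DataElement("StudySiteCode", "Site code used by one study", ElementKind.UNIQUE),
])
print(demo.element_names())  # ['GUID', 'GenderTyp', 'StudySiteCode']
```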

Submission package

The submission package includes:

  • A submission ticket (XML); an example is shown below;
  • A data file (XML).

An example of a submission ticket:

<?xml version="1.0" encoding="UTF-8" standalone="true"?>
<submissionTicket environment="production" version="2.0.2.108">
  <submissionPackage types="CLINICAL" crcHash="55830a2aa77164ea834942e65e319a38" dataFileBytes="241686" bytes="19233" name="dataFile-1373248220203">
    <datasets>
      <dataset crcHash="82ff11c787fc3086ce5bbd9e7518e279" bytes="19233" name="WardMinus2DemoGUIDS.csv" type="CLINICAL" path="C:\Users\user1\Documents\TBI 2013\CSV\sampleCSV.csv"/>
    </datasets>
    <associatedFiles/>
  </submissionPackage>
</submissionTicket>
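Since the submission ticket is plain XML, it can be inspected with any XML parser, for example to confirm which CSV files a package contains. The Python sketch below uses the standard xml.etree.ElementTree module on the example ticket. The XML declaration is left out of the string because Python's parser rejects standalone="true" (the XML specification only allows "yes" or "no"). The helper name list_datasets is hypothetical.

```python
import xml.etree.ElementTree as ET

# The example ticket from above, without its XML declaration.
# A raw string keeps the Windows path backslashes intact.
TICKET = r"""<submissionTicket environment="production" version="2.0.2.108">
  <submissionPackage types="CLINICAL" crcHash="55830a2aa77164ea834942e65e319a38"
      dataFileBytes="241686" bytes="19233" name="dataFile-1373248220203">
    <datasets>
      <dataset crcHash="82ff11c787fc3086ce5bbd9e7518e279" bytes="19233"
          name="WardMinus2DemoGUIDS.csv" type="CLINICAL"
          path="C:\Users\user1\Documents\TBI 2013\CSV\sampleCSV.csv"/>
    </datasets>
    <associatedFiles/>
  </submissionPackage>
</submissionTicket>"""


def list_datasets(ticket_xml):
    """Return (name, type, bytes) for every dataset element in a ticket."""
    root = ET.fromstring(ticket_xml)
    return [
        (ds.get("name"), ds.get("type"), int(ds.get("bytes")))
        for ds in root.iter("dataset")
    ]


root = ET.fromstring(TICKET)
package = root.find("submissionPackage")
print(root.get("environment"), package.get("name"))
print(list_datasets(TICKET))
```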

Running the Data Validation module

Figure: Selecting the Working Directory and loading files
Figure: Including and excluding files from/to validation
Figure: Validation warnings (if any) appear in the Result Details table. Files with warnings can be included in a submission package.

The Data Validation module runs locally on your machine; however, to launch it, you need to navigate to the Data Repository > Validate data page and click the Launch Validation Tool link.

Note that the most recent version of the Java Runtime Environment (JRE), 6 or 7, is required to run the module. Make sure it is installed on your computer before launching the module.

  1. Click Launch Validation Tool. The Java Runtime Environment window appears first, followed by the module window.
  2. In the module dialog box, click Browse (under Working Directory) and navigate to the directory where the files for submission (CSVs) are located. This directory is your Working Directory.
  3. Select the directory and click Load files to load the CSVs into the dialog box. The Loading Files window appears and shows the progress.
  4. When loading is complete, the list of files from your Working Directory appears in the dialog box under Files.
  5. If your Working Directory contains only the CSV files designated for validation, select the files to be validated (click to highlight) and click Validate Files. If it contains other files as well, see Including and excluding files.
  6. The Validation module runs, and validation errors and warnings (if any) appear in the Result Details table. Note: Files with warnings can be included in a submission package. However, files with errors must be fixed and re-validated.
  7. If no errors are found in your CSV file(s), the following information appears in the Files table for each file that passes validation: 1) the form structure name in the Structure column, 2) the word PASSED in the Result column, and 3) only warnings, but no errors, in the Summary column. Note that a file that passed validation can still have many warnings; that is OK. For more information about validation errors and warnings, refer to Error log.
  8. Click Build Submission Package.
  9. For each validated CSV file, the submission package and submission ticket are saved in the Working Directory, alongside the original files submitted to the Validation Tool.

Including and excluding files

Ideally, your Working Directory contains only the CSV files to be validated, but it often contains other files as well (such as error logs and notes). In that case, you need to:

  1. Exclude from validation the files (and directories) that are not designated for validation. These usually appear with Type=UNKNOWN in the Files table;
  2. Include in validation the CSV data files that you want to validate. These files have Type=CSV in the Files table.

Refer to Excluding files from validation and Including files for validation.
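Outside the tool, a short script can show in advance which entries in a Working Directory you will probably need to exclude. This Python sketch assumes that the tool's Type=CSV corresponds to a .csv file extension and that everything else (other files and subdirectories) would show up as Type=UNKNOWN; the tool's actual file-type detection may differ. The function name and sample file names are hypothetical.

```python
import tempfile
from pathlib import Path


def classify_working_directory(directory):
    """Split directory entries into likely CSV data files and everything else.

    Assumption: an entry counts as CSV when it is a file whose extension is
    .csv (case-insensitive); all other entries, including subdirectories,
    are treated like the tool's Type=UNKNOWN entries.
    """
    include, exclude = [], []
    for entry in sorted(Path(directory).iterdir()):
        if entry.is_file() and entry.suffix.lower() == ".csv":
            include.append(entry.name)
        else:
            exclude.append(entry.name)
    return include, exclude


# Build a throwaway Working Directory with a mix of entries.
with tempfile.TemporaryDirectory() as work_dir:
    for name in ["MyData.csv", "notes.txt", "resultDetail.txt"]:
        (Path(work_dir) / name).write_text("")
    (Path(work_dir) / "old_logs").mkdir()
    include, exclude = classify_working_directory(work_dir)
    print("include:", include)  # include: ['MyData.csv']
    print("exclude:", exclude)  # exclude: ['notes.txt', 'old_logs', 'resultDetail.txt']
```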

Excluding files from validation

To exclude files from validation, select (click to highlight) the files of Type UNKNOWN and any other files not needed for the submission. Hold Ctrl while clicking to highlight multiple files. Then click Exclude Files.

Including files for validation

To include files for validation, select the CSV files you want to validate and click Include Files. Hold Ctrl while clicking to highlight multiple files.

Error log

Validation errors and warnings appear in the Result Details table. Files with warnings can still be included in a submission package. However, files with errors must be fixed and re-validated before submission.

Validation errors appear when a CSV file has entries that:

  • Are of a different type than defined in the data dictionary, e.g. numbers instead of alphanumeric values;
  • Are in a different format than the one defined in the data dictionary for this data element;
  • Are not listed among the permissible values for this particular data element;
  • Contain more than one permissible value separated by a semicolon (;);
  • Contain other errors.
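The first four error categories above can be illustrated with a simplified check of a single CSV cell. This Python sketch is hypothetical and is not the module's validation code: the ElementSpec fields, the example element names, and the YYYY-MM-DD date pattern are assumptions made for illustration.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ElementSpec:
    """Hypothetical, simplified data element definition."""
    name: str
    data_type: str = "Alphanumeric"          # "Numeric" or "Alphanumeric"
    pattern: Optional[str] = None            # regular expression for the format
    permissible_values: Optional[List[str]] = None
    multiple_allowed: bool = False           # may one cell hold "a;b"?


def value_errors(spec: ElementSpec, value: str) -> List[str]:
    """Return error messages for one CSV cell checked against spec."""
    if not value.strip():
        return []  # empty cells are not checked here
    errors = []
    parts = [p.strip() for p in value.split(";")]
    # Several values separated by ";" where only one is allowed.
    if len(parts) > 1 and not spec.multiple_allowed:
        errors.append(f"{spec.name}: more than one value separated by ';'")
    for part in parts:
        # Type check: Numeric elements must parse as numbers.
        if spec.data_type == "Numeric":
            try:
                float(part)
            except ValueError:
                errors.append(f"{spec.name}: '{part}' is not numeric")
                continue
        # Format check against the element's regular expression, if any.
        if spec.pattern and not re.fullmatch(spec.pattern, part):
            errors.append(f"{spec.name}: '{part}' does not match the expected format")
        # Permissible-value check, if the element has a value list.
        if spec.permissible_values is not None and part not in spec.permissible_values:
            errors.append(f"{spec.name}: '{part}' is not a permissible value")
    return errors


age = ElementSpec("AgeYrs", data_type="Numeric")
visit_date = ElementSpec("VisitDate", pattern=r"\d{4}-\d{2}-\d{2}")
gender = ElementSpec("GenderTyp", permissible_values=["Male", "Female", "Unknown"])

print(value_errors(age, "thirty"))
print(value_errors(visit_date, "07/23/2013"))
print(value_errors(gender, "Male;Female"))
```

Each check is kept separate so that one cell can report several problems at once, which is roughly how an exported error log lists them.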

Validation warnings mostly appear when a data entry that is defined as Required in the corresponding form structure is missing from the CSV file.

Don't be surprised if your validation error log contains a large number of warnings; these can usually be ignored.
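The Required-entry check behind most warnings can be sketched in the same spirit. This hypothetical Python example assumes that the first CSV row holds data element names, and it reports an empty or absent Required value as a warning rather than an error, mirroring the behaviour described above. The message wording, the row numbering (data rows counted from 1), and the element names are assumptions.

```python
import csv
import io


def required_warnings(csv_text, required_elements):
    """Warn about Required data elements that are empty or absent.

    Assumption: the first CSV row holds data element names; row numbers
    in the messages count data rows starting at 1.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    warnings = []
    for row_number, row in enumerate(reader, start=1):
        for element in required_elements:
            value = row.get(element)  # None if the column does not exist
            if value is None or not value.strip():
                warnings.append(f"row {row_number}: required element {element} is missing")
    return warnings


# Hypothetical data: GenderTyp is empty in the first row.
sample = "GUID,AgeYrs,GenderTyp\nTBI_ABC123,34,\nTBI_DEF456,,Female\n"
for message in required_warnings(sample, ["GUID", "GenderTyp"]):
    print("WARNING:", message)
```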

Fixing validation errors

Figure: Saving validation errors and warnings in a TXT file

Validation errors and warnings can be exported to a text file, which makes them much easier to work with and fix.

To export validation errors, warnings, or both:

  1. Click Export Result Details.
  2. In the Save dialog box that appears, a) select the directory where you want to save validation logs, and b) specify which types of log entries to export: both errors and warnings (recommended only for smaller log files), errors only (recommended), or warnings only.
  3. Type your own file name and click Save. The log file is saved in the selected directory under the chosen name.

Recommendations:

  • By default, an error log file is created and stored in the same directory as your working files. We recommend that you create a designated error log directory and save validation logs there.
  • By default, an error log is saved as "resultDetail.txt". We recommend that you choose your own file name for each error log, one that is related to the name of your data file. For example, for a data file named "MyData.csv", you could name the corresponding error log "MyDataErrorLog.txt".
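If you script any handling of exported logs, a small helper can apply this naming recommendation consistently. In this Python sketch, the function name, the error_logs directory name, and the <data file name>ErrorLog.txt pattern are assumptions based on the example above; the module itself does not use this helper, and the directory still has to be created before logs are saved there.

```python
from pathlib import Path


def error_log_path(data_file, log_dir="error_logs"):
    """Build a log path such as error_logs/MyDataErrorLog.txt for MyData.csv.

    "error_logs" is only an example name for a designated log directory.
    This function builds the path; it does not create the directory.
    """
    stem = Path(data_file).stem  # "MyData" for "MyData.csv"
    return Path(log_dir) / f"{stem}ErrorLog.txt"


print(error_log_path("MyData.csv").as_posix())  # error_logs/MyDataErrorLog.txt
```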