Data Validation module

From MIPAV
Revision as of 17:59, 23 July 2013 by Olga Vovk

In order to ensure the quality of uploaded data and also to make data easy to query, data should be submitted in a specific format and range values should comply with the values defined in the data dictionary. All submitted research data must be validated against the values defined in the data dictionary prior to submission. To facilitate this process, we provide the Data Validation module that assists researchers with the submission of their data.

Introduction

The Data Validation module accepts data as CSV files from a researcher and validates the file content against the values defined in the data dictionary. If validation finds no errors, the module creates a submission ticket and a submission package. After that, the data are ready for uploading.

If any validation errors or warnings are found, the module provides a detailed report of any data discrepancies, errors, and warnings received.

Validation warnings are advisory and do not prevent the creation of the submission package. However, if any validation errors are found, a submission package cannot be created. In that case, the researcher should first edit the data to fix all errors and then re-validate the data.
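The rule above (warnings are tolerated, errors block the package) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the module's actual code: the `Finding` class, its `severity` field, and the `can_build_package` function are hypothetical names introduced here.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    """One validation result (hypothetical structure, for illustration only)."""
    file: str
    severity: str  # assumed to be either "error" or "warning"
    message: str


def can_build_package(findings):
    """Return True when no finding is an error; warnings alone are allowed."""
    return not any(f.severity == "error" for f in findings)


findings = [Finding("demo.csv", "warning", "optional field is empty")]
print(can_build_package(findings))  # only warnings so far

findings.append(Finding("demo.csv", "error", "value outside permissible range"))
print(can_build_package(findings))  # now at least one error is present
```

The check is deliberately a single pass over all findings, so one error in any file is enough to block the package until the data are fixed and re-validated.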

System requirements

The most recent version of Java Runtime Environment (JRE) (6 or 7) is required in order to run the Data Validation module.
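To check which Java version is installed, you can run `java -version` and read the version string it prints. The following Python sketch shows one way to interpret that string; the function name `java_major_version` is hypothetical, and the sketch only parses text that you pass to it rather than running Java itself.

```python
import re


def java_major_version(version_output):
    """Extract the major Java version from the text printed by `java -version`."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    first, second = match.group(1), match.group(2)
    # Releases up to Java 8 report themselves as 1.x (1.7.0_25 is Java 7).
    if first == "1" and second is not None:
        return int(second)
    return int(first)


print(java_major_version('java version "1.7.0_25"'))
```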

CSV files

The structure of a CSV file should match a corresponding form structure queryable by the query tool.

For more information about CSV files for data uploading, contact the data dictionary operations team - TBD.
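As a rough illustration of what matching a CSV file against a form structure can involve, the Python sketch below checks column names and values against a small dictionary of permissible values. Everything in it is an assumption made for the example: the element names `Gender` and `AgeYrs`, the `PERMISSIBLE` table, the `check_csv` function, and the simplified one-header-row CSV layout. The real layout is defined by the corresponding form structure.

```python
import csv
import io

# Hypothetical dictionary: element name -> set of permissible values
# (None means that any value is accepted).
PERMISSIBLE = {
    "Gender": {"Male", "Female", "Unknown"},
    "AgeYrs": None,
}


def check_csv(text, permissible):
    """Return a list of (row_number, column, problem) tuples for CSV text."""
    reader = csv.DictReader(io.StringIO(text))
    problems = []
    # Report columns that the (hypothetical) form structure does not define.
    for column in reader.fieldnames or []:
        if column not in permissible:
            problems.append((1, column, "column not in form structure"))
    # Row 1 is the header, so data rows are numbered from 2.
    for row_number, row in enumerate(reader, start=2):
        for column, allowed in permissible.items():
            value = row.get(column)
            if allowed is not None and value not in allowed:
                problems.append((row_number, column, f"value {value!r} not permitted"))
    return problems


sample = "Gender,AgeYrs\nMale,34\nM,51\n"
for problem in check_csv(sample, PERMISSIBLE):
    print(problem)
```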

Form structures

A form structure represents a grouping/collection of data elements used in the BRICS data dictionary. A form structure is analogous to a case report form (CRF) (electronic or paper) where data elements are linked together for collection and display.

A data element is a logical unit of data used in BRICS. It contains a single piece of information of one kind. A data element has a name, a precise definition, and a set of permissible values (codes), if applicable. A data element is not necessarily the smallest unit of data; it can be a unique combination of one or more smaller units. A data element occupies the space provided by field(s) on a paper/electronic case report form (CRF) or field(s) in a database record. BRICS allows the use of two types of data elements:

  • Common data elements (CDEs), which are used across multiple studies and diseases/domains,
  • Unique data elements (UDEs), which are used to gather information for a particular study.

Both types of data elements are used in form structures to collect data.
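The relationship between form structures and data elements described above can be modeled roughly as follows. This Python sketch is illustrative only; the class and field names (`DataElement`, `FormStructure`, `kind`, `permissible_values`) and the sample values are assumptions and do not reflect the actual BRICS schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DataElement:
    """A single piece of information with a name and a definition."""
    name: str
    definition: str
    kind: str  # assumed to be "CDE" (common) or "UDE" (unique)
    permissible_values: Optional[List[str]] = None  # None when not applicable


@dataclass
class FormStructure:
    """A named grouping of data elements, analogous to a case report form."""
    name: str
    elements: List[DataElement] = field(default_factory=list)

    def element_names(self):
        return [element.name for element in self.elements]


# Hypothetical sample content, for illustration only.
form = FormStructure(
    name="Demographics",
    elements=[
        DataElement("Gender", "Self-reported gender", "CDE", ["Male", "Female", "Unknown"]),
        DataElement("SiteVisitCode", "Study-specific visit code", "UDE"),
    ],
)
print(form.element_names())
```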

Submission package

The submission package includes:

  • A submission ticket (XML), see an example below,
  • A data file (XML).

An example of a submission ticket:

<?xml version="1.0" encoding="UTF-8" standalone="true"?>
<submissionTicket environment="production" version="2.0.2.108">
  <submissionPackage types="CLINICAL" crcHash="55830a2aa77164ea834942e65e319a38" dataFileBytes="241686" bytes="19233" name="dataFile-1373248220203">
    <datasets>
      <dataset crcHash="82ff11c787fc3086ce5bbd9e7518e279" bytes="19233" name="WardMinus2DemoGUIDS.csv" type="CLINICAL" path="C:\Users\user1\Documents\TBI 2013\CSV\sampleCSV.csv"/>
    </datasets>
    <associatedFiles/>
  </submissionPackage>
</submissionTicket>
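If you want to inspect a submission ticket programmatically, the Python sketch below reads the element tree of the example ticket with the standard `xml.etree.ElementTree` module. The `summarize_ticket` function is a hypothetical helper. The XML declaration line is left out of the embedded text on purpose: the XML specification only allows `yes` or `no` for `standalone`, so a strict parser may reject the value `true` as it is displayed in the example.

```python
import xml.etree.ElementTree as ET

# Raw string so that the backslashes in the Windows path stay literal.
TICKET = r"""<submissionTicket environment="production" version="2.0.2.108">
  <submissionPackage types="CLINICAL" crcHash="55830a2aa77164ea834942e65e319a38" dataFileBytes="241686" bytes="19233" name="dataFile-1373248220203">
    <datasets>
      <dataset crcHash="82ff11c787fc3086ce5bbd9e7518e279" bytes="19233" name="WardMinus2DemoGUIDS.csv" type="CLINICAL" path="C:\Users\user1\Documents\TBI 2013\CSV\sampleCSV.csv"/>
    </datasets>
    <associatedFiles/>
  </submissionPackage>
</submissionTicket>"""


def summarize_ticket(xml_text):
    """Collect the environment, package name, and dataset attributes."""
    root = ET.fromstring(xml_text)
    package = root.find("submissionPackage")
    datasets = [
        {
            "name": dataset.get("name"),
            "bytes": int(dataset.get("bytes")),
            "crcHash": dataset.get("crcHash"),
        }
        for dataset in package.iter("dataset")
    ]
    return {
        "environment": root.get("environment"),
        "package": package.get("name"),
        "datasets": datasets,
    }


summary = summarize_ticket(TICKET)
print(summary["environment"], summary["package"])
for dataset in summary["datasets"]:
    print(dataset["name"], dataset["bytes"])
```

The sketch only reads the attributes; it makes no assumption about how the `crcHash` values are computed.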

Running the Data Validation module

The Data Validation module runs locally on your machine. To launch it, navigate to the Data Repository>Validate data page and click the Launch the Validation Tool link.

Note that the most recent version of Java Runtime Environment (JRE) (6 or 7) is required in order to run the module. Make sure your computer has it installed before launching the module.

  1. Click Launch Validation Tool. First, the Java Runtime Environment window appears, and then the module window appears.
  2. In the module dialog box, click Browse (under Working Directory) to navigate to the directory where the files for submission (CSVs) are located. We call it your Working Directory.
  3. Select the directory and click Load files to load CSVs into the dialog box. The Loading Files window appears showing the progress.
  4. When loading completes, the list of files from your Working Directory appears in the dialog box.
  5. If your Working Directory contains only the CSV files designated for validation, select the files (click to highlight) to be validated and click Validate Files. Otherwise, see Including and excluding files.
  6. The Validation module begins to run and validation errors/warnings (if any) appear in the Result Details table. Note: Files with warnings can be included in a submission package. However, files with errors must be fixed and re-validated.
  7. If there are no errors, click Build Submission Package.
  8. The submission package and submission ticket will be deposited in the same Working Directory as the original files submitted to the Validation Tool.

Including and excluding files

Ideally, your Working Directory contains only the CSV files for validation. Very often, however, it also contains other files (such as error logs and notes). In that case, you need to:

  1. Exclude from validation those files (and directories) that are not designated for validation;
  2. Include in validation the CSV data files that you would like to validate.

Refer to Excluding files from validation and Including files for validation.

Excluding files from validation

To exclude files from validation, select the files (click to highlight) that are of TYPE UNKNOWN, as well as any other files not needed for the submission. Hold Ctrl while clicking in order to highlight multiple files. Click Exclude Files.

Including files for validation

To include files for validation, select the CSV files you want to be validated and press Include Files. Hold Ctrl while clicking in order to highlight multiple files.
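Before loading a Working Directory, it can help to see which files are likely candidates for inclusion and which are not. The Python sketch below splits a list of file names by extension. The helper name `split_working_directory` and the sample file names are hypothetical, and the extension test is an assumption made for illustration; the module itself decides how each file is typed.

```python
from pathlib import Path


def split_working_directory(names):
    """Split file names into CSV candidates and everything else."""
    include, exclude = [], []
    for name in names:
        if Path(name).suffix.lower() == ".csv":
            include.append(name)
        else:
            exclude.append(name)
    return include, exclude


names = ["demographics.csv", "notes.txt", "errors.log", "Imaging.CSV"]
include, exclude = split_working_directory(names)
print("Include:", include)
print("Exclude:", exclude)
```

To apply the same split to a real directory, you could pass it the names returned by `os.listdir` for that directory.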

Error log

Files with warnings can be included in a submission package. However, files with errors must be fixed and re-validated first. Clicking Export Result Details creates a text file with the result details and stores it in the same directory as your working files.

TBD.