Quick Start [Draft]

Last updated: Mar 19th, 2017

Overview

This page gives a brief, practical introduction to tackling Data Quality (DQ) in light of the conceptual framework in the Biodiversity Informatics context, especially, but not exclusively, within the TDWG and GBIF communities. Because the conceptual framework is comprehensive, different stakeholders can interpret and use it in different ways. To show how those stakeholders can take advantage of the framework, we selected four of them and describe the role of each in the DQ context:

DQ Profilers
  • Who: experts on DQ and/or on a specific domain/context that uses or handles biodiversity-related data.
  • Interest: formalizing and sharing the way DQ is handled in a specific domain/context.
Developers
  • Who: experts on the development of technical solutions for DQ.
  • Interest: formalizing and sharing techniques and tools used for DQ, and generating standardized and comparable outputs.
Data Users
  • Who: experts on a specific domain that uses biodiversity-related data.
  • Interest: efficiently assessing the quality of data and ensuring their fitness for use.
Data Holders
  • Who: institutions or people that hold, manage and curate biodiversity-related data.
  • Interest: efficiently improving the quality of data and their fitness for use.

The Framework

The original version of the framework is described in [Veiga 2017]. It is highly comprehensive and formal, and is composed of 29 interrelated concepts.

On this page we present a simplified view of the framework from a practical perspective. The framework is approached through three main components: DQ Profile, DQ Solutions and DQ Report.

Framework overview

A DQ Profile defines a structure to describe what data fitness for use means in a given context: it describes the DQ requirements for a given context/scope. Implementing and applying such requirements to data requires a set of DQ Solutions, that is, the methods and mechanisms applied to meet the DQ Profile requirements.

DQ Profile

Definition of DQ and data fitness for use in a given context.

DQ Solutions define a structure to describe the methods (technical specifications) and mechanisms (tools that act on data) used to meet the DQ Profile requirements. DQ Solutions operate on Data Resources (both single records and multi-record sets) and generate DQ Assertions assigned to each Data Resource. A set of selected DQ Assertions makes up a DQ Report.

DQ Solutions

Set of methods (technical specifications) and mechanisms (tools that act on data) used to meet the DQ Profile requirements.

A DQ Report is a set of selected DQ Assertions assigned to a Data Resource according to the requirements of a DQ Profile. With a DQ Report assigned to a Data Resource, data users, holders, aggregators and custodians can assess and improve the quality of the Data Resource according to the related DQ Profile definition.

DQ Report

Set of selected DQ Assertions assigned to a Data Resource according to the requirements of a DQ Profile.
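
The flow between the three components can be illustrated with a minimal Python sketch. It is not part of the framework specification: the class names, the `coordinates_present` check, the `COMPLIANT`/`NOT_COMPLIANT` result labels and the sample records are all assumptions made for illustration (the field names `decimalLatitude` and `decimalLongitude` are borrowed from Darwin Core). Here the DQ Profile is reduced to the set of solution names it selects.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set

# Assumption: a Data Resource is a single record, modelled as field -> value.
Record = Dict[str, str]


@dataclass(frozen=True)
class Assertion:
    """A DQ Assertion: the result of one DQ Solution on one Data Resource."""
    resource_id: str
    solution: str
    result: str


@dataclass(frozen=True)
class Solution:
    """A DQ Solution: a method (specification) plus a mechanism (code that runs it)."""
    name: str
    specification: str
    mechanism: Callable[[Record], str]

    def apply(self, resource_id: str, record: Record) -> Assertion:
        return Assertion(resource_id, self.name, self.mechanism(record))


def coordinates_present(record: Record) -> str:
    # Hypothetical mechanism: both coordinate fields must be non-empty.
    has_both = bool(record.get("decimalLatitude")) and bool(record.get("decimalLongitude"))
    return "COMPLIANT" if has_both else "NOT_COMPLIANT"


def build_report(resources: Dict[str, Record],
                 solutions: List[Solution],
                 selected: Set[str]) -> List[Assertion]:
    """Apply every solution to every resource, then keep only the
    assertions produced by solutions that the profile selects."""
    assertions = [s.apply(rid, rec) for rid, rec in resources.items() for s in solutions]
    return [a for a in assertions if a.solution in selected]


# Invented sample data, for illustration only.
resources = {
    "occ-1": {"decimalLatitude": "-23.55", "decimalLongitude": "-46.63"},
    "occ-2": {"decimalLatitude": "", "decimalLongitude": "-46.63"},
}
solutions = [
    Solution("COORDINATES_PRESENT", "Both coordinate fields are filled in.", coordinates_present),
]
report = build_report(resources, solutions, selected={"COORDINATES_PRESENT"})
for a in report:
    print(a.resource_id, a.solution, a.result)
```

In this sketch the report for `occ-2` flags the missing latitude, which is the kind of information a data holder could use to improve the record and a data user could use to decide whether the record is fit for use.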

DQ Profiler

Overview

Because the concept of “quality” is idiosyncratic, it is essential to understand what “data fitness for use” means from the data user's/handler's perspective in order to enable DQ assessment and management.

In this context, defining “data fitness for use” involves defining three elements: use, data and fitness. Accordingly, a DQ Profile covers these elements through five main components:

Use:
  1. Use Case: represents the context/scope delimitation for a DQ Profile.
Data:
  1. Information Element (IE): an element in the data that represents an event (e.g. occurrence, taxon identification), an object (e.g. preserved specimen, living specimen, leaf, seed), an abstract data concept (e.g. GUID (Globally Unique Identifier), scientific name), or any real-world entity (e.g. person, institution, location), and that has some importance in a data use context.
Fitness:
  1. DQ Measurement Policy: soon...
  2. DQ Validation Policy: soon...
  3. DQ Improvement Policy: soon...

In this context we propose a method for defining a DQ Profile that is composed of five steps: (1) define a Use Case; (2) define the valuable IEs in the context of the Use Case; (3) define a DQ Measurement Policy in the Use Case context; (4) define a DQ Validation Policy in the Use Case context; and (5) define a DQ Improvement Policy in the Use Case context. Next, we present a brief description of each step.
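
As a rough illustration of the five steps above, the sketch below models a DQ Profile as a Python data structure with one field per step. This is an assumption-laden sketch, not the framework's formal definition: the three policies are not yet described on this page, so representing each of them as a list of free-text statements is purely hypothetical, as are the class name `DQProfile`, the helper `missing_steps` and the example values.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DQProfile:
    use_case: str                                                 # step 1
    information_elements: List[str]                               # step 2
    # Assumption: each policy is a list of free-text statements.
    measurement_policy: List[str] = field(default_factory=list)   # step 3
    validation_policy: List[str] = field(default_factory=list)    # step 4
    improvement_policy: List[str] = field(default_factory=list)   # step 5

    def missing_steps(self) -> List[str]:
        """Return the names of the components that are still empty."""
        components = {
            "Use Case": self.use_case,
            "Information Elements": self.information_elements,
            "DQ Measurement Policy": self.measurement_policy,
            "DQ Validation Policy": self.validation_policy,
            "DQ Improvement Policy": self.improvement_policy,
        }
        return [name for name, value in components.items() if not value]


# Hypothetical profile with only the first three steps filled in.
profile = DQProfile(
    use_case="Species distribution modelling",
    information_elements=["coordinates", "scientific name"],
    measurement_policy=["Measure coordinate completeness"],
)
print(profile.missing_steps())
```

A helper such as `missing_steps` gives a DQ Profiler a quick way to see which of the five steps still need to be defined before the profile is complete.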

Defining a basic DQ Profile

This is an interactive tool for building a DQ Profile following 4 main steps.

Examples

Soon.

Developer

Soon

Content under development. Subscribe on GitHub to be informed of updates.

Overview

Soon.

Describing DQ Solutions

Soon.

Examples

Soon.

Data User

Soon

Content under development. Subscribe on GitHub to be informed of updates.

Overview

Soon.

Assessing DQ

Soon.

Examples

Soon.

Data Holder

Soon

Content under development. Subscribe on GitHub to be informed of updates.

Overview

Soon.

Improving DQ

Soon.

Examples

Soon.

TDWG-GBIF Biodiversity Data Quality Interest Group - Task Group 1