Glossary and Controlled Vocabularies

Last updated: May 15th, 2017

Draft Glossary

DQ assessment

Definition: Action of judging if DQ status makes data fit for use.

DQ management

Definition: Action of improving DQ status to make data more fit for use.

DQ control

Definition: One of the approaches of DQ management, used for improving DQ when this is feasible.

DQ assurance

Definition: One of the approaches of DQ management, used for improving DQ in order to guarantee a satisfactory level of DQ, usually by filtering data sets.

DQ Needs (class)

Definition: Class of concepts related to the definition of DQ meaning.

DQ Solutions (class)

Definition: Class of concepts related to DQ methods and tools used to meet DQ needs.

DQ Report (class)

Definition: Class of concepts related to the description of DQ status of a data resource.

Use Case

Definition: The context of use of data - describes an activity or process that uses data for a specific purpose.

Examples: Species Distribution Modeling, Agrobiodiversity, Niche Modeling of Apis mellifera in Brazil, GBIF Publishing Requirements.

Information Element (IE)

Definition: An abstraction that represents relevant content in the Use Case context (e.g. coordinates).

Examples: Coordinates, Date, Time, Species, Specimen, Observation.

DQ Dimension

Definition: Measurable quality aspects.

Examples: Completeness, accuracy, consistency, conformance, granularity.

DQ Criterion

Definition: Statement that describes acceptable levels of DQ (e.g. data are consistent, value is in a controlled vocabulary).

DQ Enhancement

Definition: Statement that describes activities required to improve DQ.

Specification

Definition: A formal or informal description of a method for performing DQ measurements, validations or enhancements.

Mechanism

Definition: A piece of software or hardware, a technique, a tool or any other artifact that implements one or more Specifications.

Data Resource

Definition: An instance of data (a single record or multi records).

Data Resource Type

Definition: Type of Data Resource, which can be "single record" or "multi records".

DQ Assertion

Definition: A result of a measurement (of a DQ Dimension), a validation (according to a DQ Criterion) or an amendment (according to a DQ Enhancement), obtained by a Mechanism, operating on a specific Data Resource (single record or dataset), and using a specific method defined by a Specification.
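
The parts named in this definition can be pictured as the fields of one record. The following is only a minimal sketch in Python; the class name, the field names and the example values are hypothetical and are not part of the glossary.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class DQAssertion:
    """Hypothetical container for one DQ Assertion (names are illustrative)."""
    kind: str            # "measurement", "validation" or "amendment"
    basis: str           # the DQ Dimension, DQ Criterion or DQ Enhancement referred to
    data_resource: str   # identifier of the Data Resource that was assessed
    resource_type: str   # "single record" or "multi records"
    specification: str   # the Specification that defines the method
    mechanism: str       # the Mechanism that produced the result
    result: Any          # measured value, validation outcome or proposed amendment


# Example values below are invented for illustration only.
example = DQAssertion(
    kind="validation",
    basis="Coordinates are within valid ranges",
    data_resource="record-0001",
    resource_type="single record",
    specification="Latitude in [-90, 90] and longitude in [-180, 180]",
    mechanism="example-range-checker",
    result=True,
)
```

Keeping the Specification and the Mechanism as separate fields mirrors the glossary, where one Mechanism may implement several Specifications.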

DQ profile

Definition: Defines the components necessary for enabling DQ assessment and management in the context of a Use Case.

DQ Measurement Policy

Definition: The component of DQ profile that describes a set of relevant DQ Dimensions for performing DQ measurements.

DQ Validation Policy

Definition: The component of DQ profile that describes a set of relevant DQ Criteria for performing DQ validations.

DQ Enhancement Policy

Definition: The component of DQ profile that describes a set of relevant DQ Enhancements for performing DQ enhancements.

Terminology history: DQ Improvement Policy.

DQ Status

Definition: Defines the status of quality of a Data Resource according to a DQ profile.

DQ Measures

Definition: Part of DQ status that presents a set of measures of the DQ Dimensions defined in the DQ Measurement Policy of the related DQ profile.

DQ Validations

Definition: Part of DQ status that presents a set of validations based on the DQ Criteria defined in the DQ Validation Policy of the related DQ profile.

DQ Amendments

Definition: Part of DQ status that presents a set of amendments based on the DQ Enhancements defined in the DQ Enhancement Policy of the related DQ profile.

Terminology history: DQ Improvements.

Fitness For Use Backbone (FFUB)

Definition: Proposal of a computational platform for registering and reusing components based on the conceptual framework, such as DQ profile (Use Cases, IEs, DQ Dimensions, DQ Criteria, DQ Enhancements), DQ status reports (DQ Assertions), DQ Solutions (Mechanism, Specification).

Draft Controlled Vocabularies

DQ Dimensions

Completeness

Definition: Measures the extent to which all meaningful and necessary data are present and sufficient for use in a specific Use Case.

Guideline: Is all the requisite information available? Are relevant data values missing, or in an unusable state, such as "NULL" or "UNKNOWN" or any predefined placeholders (e.g. N/A, the value 0 or -1 for numbers)? For a dataset, completeness can be expressed as the number or proportion of records that have some specific information.

Examples:

Completeness of Coordinates (Coordinates - Single Record) - The presence of Decimal Latitude, Decimal Longitude, Geodetic Datum and Coordinates Uncertainty in Meters.

Completeness of Event Date (Event Date - Dataset) - The proportion of records in the dataset that have a value defined for the dwc:eventDate field that is different from the placeholder "0000-00-00".
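
The two Completeness examples above could be sketched as follows. This is an illustration under stated assumptions, not a prescribed method: records are assumed to be dictionaries keyed by Darwin Core term names, the placeholder list is an assumption, and the function names are hypothetical.

```python
from typing import Mapping, Optional, Sequence

# Assumed placeholder values; a real Use Case would define its own list.
PLACEHOLDERS = {"", "NULL", "UNKNOWN", "N/A", "0000-00-00"}

COORDINATE_FIELDS = (
    "decimalLatitude",
    "decimalLongitude",
    "geodeticDatum",
    "coordinateUncertaintyInMeters",
)


def is_present(value: Optional[str]) -> bool:
    """True when the value is neither missing nor a known placeholder."""
    return value is not None and str(value).strip().upper() not in PLACEHOLDERS


def coordinates_complete(record: Mapping[str, Optional[str]]) -> bool:
    """Single record: all four coordinate-related fields are present."""
    return all(is_present(record.get(field)) for field in COORDINATE_FIELDS)


def event_date_completeness(records: Sequence[Mapping[str, Optional[str]]]) -> float:
    """Dataset: proportion of records with a usable eventDate value."""
    if not records:
        return 0.0
    filled = sum(1 for record in records if is_present(record.get("eventDate")))
    return filled / len(records)
```

The numeric placeholders 0 and -1 mentioned in the guideline are deliberately left out of the list here, because 0 can be a legitimate coordinate value.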

Conformance

Definition: The degree to which data conform to a format, syntax, type, range, standard or to the inherent nature of the Information Element.

Guideline: Are there expectations that data values conform to a specified standard, format or controlled vocabulary? Should the data conform to rules related to the nature of the data, such as limits, ranges or any other prescribed rule inherent to the real world that the data are supposed to represent?

Examples:

Event Date Conformity with ISO 8601 (Event date - Single Record) - Measure the compliance of the Event Date with ISO 8601.

Coordinates Conformity (Coordinates - Single Record) - Decimal Latitude value is between [-90, 90] inclusive and Decimal Longitude value is between [-180, 180] inclusive.
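
As a rough illustration of the two Conformance examples above, the sketch below checks a date string and a coordinate pair. It is an assumption-laden simplification: ISO 8601 allows many representations, and only the complete calendar-date form YYYY-MM-DD is accepted here. The function names are hypothetical.

```python
import re
from datetime import date

# Only the YYYY-MM-DD form is handled; other ISO 8601 forms are out of scope.
_ISO_CALENDAR_DATE = re.compile(r"(\d{4})-(\d{2})-(\d{2})")


def event_date_conforms(value: str) -> bool:
    """True when value is a real calendar date written as YYYY-MM-DD."""
    match = _ISO_CALENDAR_DATE.fullmatch(value)
    if match is None:
        return False
    year, month, day = (int(part) for part in match.groups())
    try:
        date(year, month, day)  # rejects impossible dates such as 2017-02-30
    except ValueError:
        return False
    return True


def coordinates_conform(latitude: float, longitude: float) -> bool:
    """True when both values fall inside the inclusive valid ranges."""
    return -90.0 <= latitude <= 90.0 and -180.0 <= longitude <= 180.0
```

A value can conform without being complete or accurate; these checks say nothing about whether the date or the location is the right one.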

Consistency

Definition: Agreement among related information elements in the data.

Guideline: Do distinct data instances provide conflicting information about the same underlying data object? Are values consistent across data sets? Are interdependent attributes in agreement?

Examples:

Event Date Consistency (Event Date - Single Record) - The consistency of the dwc:eventDate value with the related atomic fields (dwc:year, dwc:month and dwc:day).

Georeference Consistency (Georeference - Single Record) - The coordinates are consistent with the locality description, including State, Country, etc.
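
The Event Date Consistency example above could be sketched as a comparison between the parsed date and the atomic fields. The sketch assumes that the event date is given as YYYY-MM-DD and that a missing atomic field does not count as a contradiction; both are assumptions, and the function name is hypothetical.

```python
from datetime import date
from typing import Optional


def event_date_consistent(
    event_date: str,
    year: Optional[int],
    month: Optional[int],
    day: Optional[int],
) -> bool:
    """True when every supplied atomic field agrees with event_date.

    Raises ValueError if event_date is not a valid YYYY-MM-DD date.
    """
    parsed = date.fromisoformat(event_date)
    pairs = ((parsed.year, year), (parsed.month, month), (parsed.day, day))
    # A field left as None is treated as "not stated", not as a conflict.
    return all(given is None or given == expected for expected, given in pairs)
```

Whether an empty atomic field should instead be reported as a Completeness issue is a decision for the DQ profile of the Use Case.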

Accuracy

Definition: Measured by how closely the data values agree with an identified source of truth. The degree to which data correctly describe the truth (object, event or any abstract or real "thing").

Guideline: The degree to which the data mirror the characteristics of the real-world object or objects they represent. Ideally the "real world" truth is established through primary research. However, as this is often not practical, it is common to use third-party reference data from sources which are deemed trustworthy and of the same chronology.

Source: https://github.com/eBay/griffin/blob/master/griffin-doc/userguide.md

Examples:

Scientific Name Accuracy (Scientific Name - Single Record) - Measured by the number of matches of the data value against a set of nomenclatural authority sources (source of "truth").

Resolution

Definition: Refers to whether the data have sufficiently detailed information. Measures the granularity of the data, i.e. the smallest measurable increment.

Guideline: Could refer to the spatial resolution of the grids used in collecting observational records (e.g. 100 m grids); the temporal resolution - whether the data is to the day, or just the month or year; or the taxonomic resolution - whether the data is identified to species level only, or to infraspecific level.

Examples:

Event Date Resolution (Event Date - Single Record) - The eventDate value represents a specific day, a month or a year.

Georeference Resolution (Georeference - Single Record) - The grid size of the observation data records.
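
The Event Date Resolution example above could be sketched by classifying the shape of the value. The sketch assumes that dates are written as YYYY, YYYY-MM or YYYY-MM-DD, and it only inspects the shape of the string, not whether the date exists. The function name is hypothetical.

```python
import re
from typing import Optional

# Patterns are tried from the finest resolution to the coarsest.
_PATTERNS = (
    ("day", re.compile(r"\d{4}-\d{2}-\d{2}")),
    ("month", re.compile(r"\d{4}-\d{2}")),
    ("year", re.compile(r"\d{4}")),
)


def event_date_resolution(value: str) -> Optional[str]:
    """Return "day", "month" or "year", or None for an unrecognised shape."""
    text = value.strip()
    for label, pattern in _PATTERNS:
        if pattern.fullmatch(text):
            return label
    return None
```

A Use Case that needs daily data would then treat "month" and "year" results as insufficient resolution.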

DQ Enhancement Types

Recommendation

Definition: A suggestion that (if accepted) may improve the quality of data in one or more DQ Dimensions.

Examples:

Recommend swapping latitude and longitude (Coordinates - Single Record) - Recommend swapping latitude and longitude if the resulting coordinates become more consistent with the related data (Country, Species, etc).
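
One way to picture this recommendation is to test the coordinates against a region implied by the related data. The sketch below reduces "more consistent with the related data" to a single bounding-box test, which is a strong simplification; the BoundingBox type, the function name and the box used in the usage note are hypothetical.

```python
from typing import NamedTuple, Optional, Tuple


class BoundingBox(NamedTuple):
    """Hypothetical rectangular region, e.g. a rough outline of a country."""
    min_lat: float
    max_lat: float
    min_lon: float
    max_lon: float

    def contains(self, latitude: float, longitude: float) -> bool:
        return (
            self.min_lat <= latitude <= self.max_lat
            and self.min_lon <= longitude <= self.max_lon
        )


def recommend_swap(
    latitude: float, longitude: float, region: BoundingBox
) -> Optional[Tuple[float, float]]:
    """Return the swapped (latitude, longitude) only when the original pair
    falls outside the region and the swapped pair falls inside it."""
    if region.contains(latitude, longitude):
        return None  # already consistent, nothing to recommend
    if region.contains(longitude, latitude):
        return (longitude, latitude)
    return None  # swapping would not help
```

For instance, with an illustrative box BoundingBox(-34.0, 6.0, -74.0, -34.0), the pair (-47.9, -15.8) yields the recommendation (-15.8, -47.9). The result is only a suggestion: the record itself is left unchanged until the recommendation is accepted.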

Correction

Definition: Any action that may improve the quality of data in one or more DQ Dimensions.

Examples:

Disregard duplicated records in dataset (Occurrence - Dataset) - For a particular Use Case: filter duplicated records. Create a subset of the original dataset that has only one of each duplicated record, in order to improve the Uniqueness of the dataset.
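
The filtering described in this example could be sketched as follows. What makes two records duplicates depends on the Use Case, so the key fields are passed in by the caller; the function name and the field names in the usage note are hypothetical.

```python
from typing import Hashable, List, Mapping, Sequence, Set, Tuple


def drop_duplicates(
    records: Sequence[Mapping[str, Hashable]],
    key_fields: Sequence[str],
) -> List[Mapping[str, Hashable]]:
    """Return a new list keeping the first record for each key combination.

    The input sequence is not modified, so the original dataset stays intact.
    """
    seen: Set[Tuple[Hashable, ...]] = set()
    unique: List[Mapping[str, Hashable]] = []
    for record in records:
        key = tuple(record.get(field) for field in key_fields)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique
```

A call such as drop_duplicates(records, ["scientificName", "eventDate", "decimalLatitude", "decimalLongitude"]) would treat records that share those four values as duplicates and keep only the first of each group.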

Prevention

Definition: Any action that may prevent errors and so avoid a decline in the quality of data in one or more DQ Dimensions.

Examples:

Visual inspection (Coordinates - Single Record) - Plot the coordinates on a map and check whether the location makes sense.

Data Resource Types

Single Record

Definition: A single record.

Examples:

An observation of a species at a location/time - An instance of a DwC-based record.

Multi Records

Definition: An entire dataset or a subset of one entire dataset.

Examples:

Occurrences of the species Apis mellifera in MCZbase - A subset of the occurrence records of Apis mellifera in the MCZbase dataset.