
Data integrity checks

Maintaining a high-quality data set of timestamped features quickly becomes very time-consuming. Unlike in a regular geospatial data set, each edit has the potential to conflict with a previous edit. To ensure data quality, Time Editor provides four types of so-called "Integrity Checks": Date Integrity, Geometric Integrity, Time Integrity and Spatial Integrity. The latter two are the most important.

Once you have completed the basic setup, you can perform integrity checks via Vector -> Time Editor -> Inspect Layer (or with Ctrl + I). You can then choose the check you want to perform from the dropdown at the top.

For every check, the results are reported at the bottom of the dialog in tabular format. You can export the results of each check to a CSV file (use the file chooser just above the buttons) for later inspection or reference.

tip

In many cases you would not want to check the entire dataset, but only a subset of the features. All integrity checks work with a selection of features. If you have an active selection, only those features will be checked.

Types of Checks

Date Integrity

The Date Integrity check ensures that all dates are either NULL or adhere to the date format expected by Time Editor. At the time of writing this is YYYY-MM-DD. It also confirms that the date actually exists, making sure that there is no 30th of February or similarly nonexistent date. To start the check, use Start.

If you use the sample data set, there should be no errors (the output will be All dates are valid). You can verify that the plugin works by changing any date to an invalid one. After changing the end date of the feature with fid = 80410 (Border of Bad Homberg up to 1946) to 1946-13-31, the plugin will report the incorrect date back to you, including the feature ID you defined in the settings (here fid) and the reason (in this case Invalid End Date).
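The two conditions of this check (correct format, date exists) can be illustrated with a few lines of plain Python. This is only a minimal sketch of the idea, not the plugin's actual code; the function name `date_problem` and the returned messages are made up for illustration.

```python
import re
from datetime import date

# Strict YYYY-MM-DD: four-digit year, two-digit month, two-digit day.
DATE_PATTERN = re.compile(r"([0-9]{4})-([0-9]{2})-([0-9]{2})")


def date_problem(value):
    """Return None if the value passes, otherwise a short reason (hypothetical wording)."""
    if value is None:
        return None  # NULL dates are allowed
    match = DATE_PATTERN.fullmatch(value)
    if match is None:
        return "wrong format"
    year, month, day = (int(part) for part in match.groups())
    try:
        date(year, month, day)  # raises ValueError for dates that do not exist
    except ValueError:
        return "date does not exist"
    return None


for candidate in (None, "1946-12-31", "1946-13-31", "1946-02-30", "31.12.1946"):
    print(candidate, "->", date_problem(candidate))
```

The altered date from the example above, 1946-13-31, matches the format but fails the second condition because there is no 13th month.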

Geometric Integrity

The Geometric Integrity check performs only a basic check on the geometries. Under the hood, it checks validity with the GEOS library (.isGeosValid()) and reports the output of the QGIS geometry validator.

Time Integrity

The Time Integrity check iterates over all distinct common identifiers as defined in the settings. For each unique value it will collect all the features, order them chronologically and check for missing time spans between them. Once you run the check, the plugin will report a lot of errors.
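The grouping, sorting and gap detection can be sketched in plain Python as follows. This is an illustration under stated assumptions rather than the plugin's implementation: features are simplified to `(fid, shared_id, start, end)` tuples with non-NULL dates, the sample values are invented, and a feature is assumed to follow its predecessor without a gap if it starts at most one day after the predecessor ends.

```python
from collections import defaultdict
from datetime import date, timedelta


def find_time_gaps(features, max_step=timedelta(days=1)):
    """Return (shared_id, earlier_fid, later_fid) for every gap found.

    features: iterable of (fid, shared_id, start, end) tuples (hypothetical layout).
    max_step: largest allowed distance between one end date and the next
              start date (an assumption, not taken from the plugin).
    """
    groups = defaultdict(list)
    for fid, shared_id, start, end in features:
        groups[shared_id].append((start, end, fid))

    gaps = []
    for shared_id, members in groups.items():
        members.sort()  # chronological order by start date
        for (_, prev_end, prev_fid), (next_start, _, next_fid) in zip(members, members[1:]):
            if next_start - prev_end > max_step:
                gaps.append((shared_id, prev_fid, next_fid))
    return gaps


# Invented sample data: unit "a" has a gap, unit "b" is continuous.
features = [
    (1, "a", date(1900, 1, 1), date(1928, 12, 31)),
    (2, "a", date(1935, 1, 1), date(1950, 12, 31)),
    (3, "b", date(1900, 1, 1), date(1945, 12, 31)),
    (4, "b", date(1946, 1, 1), date(1990, 12, 31)),
]
print(find_time_gaps(features))  # [('a', 1, 2)]
```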

Using Expressions

This is due to a design decision in our data set: we include additional geometries that are used for specific purposes within the data set. Hence we need a way to limit the check to a subset of the features. This can be achieved with a regular QGIS expression. In our case we add "hilfs_gmk" = 0 to the expression field. After running the check again, we receive only one error. The plugin reports the features involved by their feature ID, the common identifier (shared_id) and the reason.

Adding Exceptions

In this case we can see a gap between 1928 and 1935, a period during which the unit indeed did not exist. This happens quite frequently with historical data: states lose and regain their sovereignty, administrative units are dissolved and reestablished. What we need is an exception that tells Time Editor that it is not a problem if the features 80394 and 1717 do not align chronologically.

For each layer, you can provide a CSV file that specifies the IDs that are allowed to have temporal gaps between them. Start by opening the Time Editor Settings and adding the path to a CSV file. The file does not have to exist yet.

Next you will have to add the two reported feature IDs to the CSV file. The comment column is optional and serves only as a reference for you. The order of the features is irrelevant. After the edit, Time Editor will no longer report the remaining error and will instead report Date history is valid for all features.
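That the order of the two IDs does not matter can be modelled by storing each exception as an unordered pair. The following sketch assumes a CSV layout with the hypothetical column names `id_a`, `id_b` and `comment`; the actual column names expected by Time Editor may differ, so check the file the plugin creates.

```python
import csv
import io


def load_exceptions(csv_text):
    """Read exception pairs from CSV text into a set of unordered ID pairs."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # frozenset makes (80394, 1717) and (1717, 80394) the same entry.
    return {frozenset((int(row["id_a"]), int(row["id_b"]))) for row in reader}


def is_allowed_gap(fid_one, fid_two, exceptions):
    """True if a gap between the two features is listed as an exception."""
    return frozenset((fid_one, fid_two)) in exceptions


# Hypothetical file content with the two feature IDs reported above.
SAMPLE = "id_a,id_b,comment\n80394,1717,unit did not exist between 1928 and 1935\n"
exceptions = load_exceptions(SAMPLE)

print(is_allowed_gap(1717, 80394, exceptions))  # True
print(is_allowed_gap(1717, 80410, exceptions))  # False
```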

Spatial Integrity

The Spatial Integrity check ensures that features which, according to their temporal attributes, existed alongside each other do not overlap. It will iterate over all unique timestamps in the layer's features. For each of those timestamps it will filter the features and check for intersections within the subset. By default it will ignore small intersections under 0.1 square map units. Note that in a geographic CRS such as EPSG:4326, where the map units are degrees, this threshold corresponds to a very large area. The threshold can be changed via the input field. Use zero if you want to report all intersections regardless of their size.
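The procedure can be sketched in plain Python as follows. To stay self-contained, the sketch uses axis-aligned boxes `(xmin, ymin, xmax, ymax)` as a stand-in for real geometries, whereas the plugin works on the layer's actual geometries. The function names, the sample features and the assumption that a feature exists from its start date up to and including its end date are illustrative only.

```python
from itertools import combinations


def overlap_area(a, b):
    """Area of the intersection of two boxes given as (xmin, ymin, xmax, ymax)."""
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    return width * height if width > 0 and height > 0 else 0.0


def find_overlaps(features, threshold=0.1):
    """Return (timestamp, fid_a, fid_b, area) for overlaps larger than threshold.

    features: list of (fid, start, end, box) with ISO date strings, which
    sort chronologically when compared as text.
    """
    timestamps = sorted({t for _, start, end, _ in features for t in (start, end)})
    errors = []
    for t in timestamps:
        # Keep only the features that exist at this timestamp.
        alive = [f for f in features if f[1] <= t <= f[2]]
        for (fid_a, _, _, box_a), (fid_b, _, _, box_b) in combinations(alive, 2):
            area = overlap_area(box_a, box_b)
            if area > threshold:
                errors.append((t, fid_a, fid_b, area))
    return errors


# Invented sample: features 1 and 2 coexist from 1920 to 1950 and overlap.
features = [
    (1, "1900-01-01", "1950-12-31", (0, 0, 2, 2)),
    (2, "1920-01-01", "1960-12-31", (1, 1, 3, 3)),
    (3, "1951-01-01", "1960-12-31", (0, 0, 1, 1)),
]
for error in find_overlaps(features):
    print(error)
```

In this sample the pair (1, 2) is reported twice, once for 1920-01-01 and once for 1950-12-31, because both timestamps fall into the period in which the two features coexist.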

warning

At the time of writing, Time Editor only checks for overlaps. We plan to provide a function to check for "holes", meaning space between features that is not covered on a specific date.

tip

Often you will see the same fids with the same areas reported several times, as a single spatial error can affect multiple timestamps. It is also worth noting that some spatial errors are actually the result of differing time attributes.