Parquet Export

Apache Parquet is an open source, column-oriented data file format designed for efficient data storage and retrieval. It provides efficient data compression and encoding schemes with enhanced performance to handle complex data in bulk.

Creating parquet exports is currently an experimental feature, which means the feature is still subject to change.

To create a parquet export, make sure ‘Experimental features’ is enabled, then navigate to Configuration > Parquet export. The screen shows an overview of the existing export schedules and lets you create either a one-off export or an export schedule. Currently, the parquet export function only exports data that has a Good quality status.

Parquet export screen, no schedules

One-off export

To create a one-off export task, which runs only once, click the Parquet export button in the upper right corner. A modal will open where the export task can be configured.

Parquet export, one-off export configuration screen

  1. Select a time range for which to export data
  2. Choose a directory where the exported parquet file(s) will be saved
  3. Select the label(s) of the measurement(s) that need to be exported

Export on a schedule

To create a recurring export, press the Create Task scheduler button in the top right corner.

Create Task schedule screen

  1. Fill in the task details
    1. Choose a name and description for the scheduler
    2. Write an RRule to define when and how often the task should be scheduled.

      An RRULE is a way to define a recurrence set. The rule defines a pattern that generates a series of timestamps for events. The syntax is defined in section 3.8.5.3 of RFC 5545.
      Use the RRULE tool for help in creating RRules.

  2. Fill in the export details
    1. Fill in the start offset.
      The start offset determines the start of the interval for the exported data, relative to the RRule’s trigger event.
    2. Fill in either the period of data to export or the stop offset.
    3. Specify a directory where the parquet files should be saved.
    4. Select the labels of the measurements that should be exported.
  3. Click save
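
To make the RRule syntax from step 1 more concrete, the following Python sketch splits a rule string into its named parts. The `parse_rrule` helper is hypothetical and only handles simple `KEY=VALUE` pairs separated by semicolons; it is not part of the product and not a full RFC 5545 parser.

```python
def parse_rrule(rule: str) -> dict:
    """Split an RRULE string into a dict of its KEY=VALUE parts.

    Hypothetical helper for illustration only; it does no validation.
    """
    # Drop the optional "RRULE:" property name in front of the rule parts.
    if rule.upper().startswith("RRULE:"):
        rule = rule[len("RRULE:"):]

    parts = {}
    for part in rule.split(";"):
        if not part:
            continue  # tolerate a trailing semicolon
        key, _, value = part.partition("=")
        parts[key.upper()] = value
    return parts


# An hourly rule: FREQ sets the base frequency, INTERVAL the step,
# and BYMINUTE/BYSECOND pin each occurrence to the top of the hour.
rule_parts = parse_rrule("RRULE:FREQ=HOURLY;INTERVAL=1;BYMINUTE=0;BYSECOND=0")
print(rule_parts)
```

Reading the parts this way can help when checking a rule by hand before saving the schedule.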

Example

The following is an example configuration of an exporter.

The RRULE used is RRULE:FREQ=HOURLY;INTERVAL=1;BYMINUTE=0;BYSECOND=0, which triggers every hour, on the hour.
By setting the start offset to -1h10m and the period to 1h, the scheduler exports an hour’s worth of data on each run, starting from 10 minutes before the previous hour and ending 10 minutes before the current hour.
Setting the stop offset to -10m would be equivalent to setting the period to 1h in this example.

Trigger at    Export start time    Export stop time
12:00         10:50                11:50
13:00         11:50                12:50
14:00         12:50                13:50
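
The arithmetic behind the table above can be sketched in Python. This is a minimal illustration, not the scheduler's actual implementation: the `export_window` function is hypothetical, and it assumes (as the example suggests) that both offsets are relative to the trigger time and that the period is counted from the export start.

```python
from datetime import datetime, timedelta


def export_window(trigger, start_offset, period=None, stop_offset=None):
    """Return (start, stop) of the exported interval for one trigger time.

    Hypothetical helper; assumes offsets are relative to the trigger
    and the period is measured from the computed start.
    """
    if (period is None) == (stop_offset is None):
        raise ValueError("give exactly one of period or stop_offset")
    start = trigger + start_offset
    stop = start + period if period is not None else trigger + stop_offset
    return start, stop


start_offset = timedelta(hours=-1, minutes=-10)  # -1h10m
# Hourly triggers at 12:00, 13:00 and 14:00; the date itself is arbitrary.
triggers = [datetime(2024, 1, 1, hour) for hour in (12, 13, 14)]

windows = [export_window(t, start_offset, period=timedelta(hours=1)) for t in triggers]
for trigger, (start, stop) in zip(triggers, windows):
    print(f"{trigger:%H:%M}  {start:%H:%M}  {stop:%H:%M}")
```

Calling `export_window(triggers[0], start_offset, stop_offset=timedelta(minutes=-10))` gives the same interval as the 1h period, which matches the equivalence described in the example.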

Example of export scheduler

Export progress

Creating a parquet export can take some time, depending on the amount of data that needs to be exported. Task progress can be viewed in the Tasks screen.

Parquet export task entry