Parquet Export
Apache Parquet is an open source, column-oriented data file format designed for efficient data storage and retrieval. It provides efficient data compression and encoding schemes with enhanced performance to handle complex data in bulk.
To create a parquet export, make sure ‘Experimental features’ are enabled, then navigate to Configuration > Parquet export. The screen shows an overview of the existing export schedules and lets you create either a one-off export or an export schedule. Currently, the parquet export function only exports data that has a Good quality status.
One-off export
To create a one-off export task that runs only once, click the Parquet export button in the upper right corner. A modal opens in which the export task can be configured.
- Select a time range for which to export data
- Choose a directory where the exported parquet file(s) will be saved
- Select the label(s) of the measurement(s) that need to be exported
Export on a schedule
To create a recurring export, press the Create Task scheduler button in the top right corner.
- Fill in the task details
  - Choose a name and description for the scheduler
  - Write an RRule to define when and how often the task should be scheduled.
    An RRULE is a way to define a recurrence set. The rule defines a pattern that generates a series of timestamps for events. The syntax is defined in section 3.8.5.3 of RFC 5545.
    Use the RRULE tool for help in creating RRules
- Fill in the export details
  - Fill in the start offset.
    The start offset determines the start of the interval for the exported data, relative to the RRule’s trigger event.
  - Fill in the period for the data to export or fill in the stop offset
  - Specify a directory where the parquet files should be saved.
  - Select the labels of the measurements that should be exported.
- Click save
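To make the RRULE syntax more concrete, the following is a minimal Python sketch that splits a rule string into its parts and lists the first few trigger times of a simple hourly rule. It uses only the standard library and is not a full RFC 5545 implementation: it assumes FREQ=HOURLY with single-valued BYMINUTE and BYSECOND, treats missing BYMINUTE and BYSECOND as 0, and ignores DTSTART. The names `parse_rrule` and `hourly_triggers` are hypothetical and not part of the product.

```python
from __future__ import annotations

from datetime import datetime, timedelta


def parse_rrule(rule: str) -> dict[str, str]:
    """Split an RRULE string into a dict of its NAME=VALUE parts."""
    if rule.startswith("RRULE:"):
        rule = rule[len("RRULE:"):]
    return dict(part.split("=", 1) for part in rule.split(";") if part)


def hourly_triggers(rule: str, after: datetime, count: int) -> list[datetime]:
    """List the first `count` trigger times strictly after `after`.

    Assumption: only FREQ=HOURLY with single-valued BYMINUTE and
    BYSECOND is supported; DTSTART is ignored in this sketch.
    """
    parts = parse_rrule(rule)
    if parts.get("FREQ") != "HOURLY":
        raise ValueError("this sketch only handles FREQ=HOURLY")
    interval = int(parts.get("INTERVAL", "1"))
    minute = int(parts.get("BYMINUTE", "0"))
    second = int(parts.get("BYSECOND", "0"))

    # First candidate: the matching minute/second within the current hour.
    candidate = after.replace(minute=minute, second=second, microsecond=0)
    if candidate <= after:
        candidate += timedelta(hours=1)
    return [candidate + timedelta(hours=i * interval) for i in range(count)]
```

For real schedules, a complete RRULE implementation is a safer choice than this sketch, since rules can combine many more parts than the ones handled here.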
Example
The following is an example configuration of an exporter.
The RRULE used is RRULE:FREQ=HOURLY;INTERVAL=1;BYMINUTE=0;BYSECOND=0, which triggers every hour, on the hour.
By setting the start offset to -1h10m and the period to 1h, the scheduler will export an hour’s worth of data, starting from 10 minutes before the previous hour until 10 minutes before the current hour.
Setting the stop offset to -10m would be equivalent to setting the period to 1h in this example.
trigger at | export start time | export stop time |
---|---|---|
12:00 | 10:50 | 11:50 |
13:00 | 11:50 | 12:50 |
14:00 | 12:50 | 13:50 |
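The arithmetic behind this table can be sketched in Python as follows. This is an illustration only, not the product's implementation: the helper names `parse_offset` and `export_window` are hypothetical, and the offset format (a leading sign that applies to the whole value, followed by parts such as 1h and 10m) is an assumption inferred from the example above.

```python
import re
from datetime import datetime, timedelta

# Assumed unit suffixes; the product may accept a different set.
_UNITS = {"d": "days", "h": "hours", "m": "minutes", "s": "seconds"}


def parse_offset(text):
    """Parse an offset such as "-1h10m" or "1h" into a timedelta."""
    match = re.fullmatch(r"([+-]?)((?:\d+[dhms])+)", text.strip())
    if not match:
        raise ValueError(f"unrecognised offset: {text!r}")
    sign, body = match.groups()
    kwargs = {}
    for value, unit in re.findall(r"(\d+)([dhms])", body):
        kwargs[_UNITS[unit]] = kwargs.get(_UNITS[unit], 0) + int(value)
    delta = timedelta(**kwargs)
    return -delta if sign == "-" else delta


def export_window(trigger, start_offset, period=None, stop_offset=None):
    """Return (start, stop) of the exported interval for one trigger.

    The start is relative to the trigger. The stop is either
    start + period, or trigger + stop_offset.
    """
    if (period is None) == (stop_offset is None):
        raise ValueError("give exactly one of period or stop_offset")
    start = trigger + parse_offset(start_offset)
    if period is not None:
        stop = start + parse_offset(period)
    else:
        stop = trigger + parse_offset(stop_offset)
    return start, stop


if __name__ == "__main__":
    for hour in (12, 13, 14):
        trigger = datetime(2024, 1, 1, hour, 0)
        start, stop = export_window(trigger, "-1h10m", period="1h")
        print(f"{trigger:%H:%M} | {start:%H:%M} | {stop:%H:%M}")
```

Calling `export_window(trigger, "-1h10m", stop_offset="-10m")` yields the same interval, which mirrors the equivalence between the period and the stop offset described above.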
Export progress
Creating a parquet export can take some time, depending on the amount of data that needs to be exported. Task progress can be viewed in the Tasks screen.