datasqrl.profile.default

0.5.6·dev·Published 10/6/2024

Default Profile

The default profile for DataSQRL is selected automatically when no other profile is specified. It supports all DataSQRL versions after 0.5.3 and provides a Docker Compose setup for PostgreSQL, Flink, Kafka, and a Vert.x HTTP server.

Configuration Options

This profile exposes configuration options for several of the engines it uses.

Kafka

If your pipeline needs additional Kafka topics, list them under the create-topics value in your package.json. The profile creates these topics automatically.

{
  "values": {
    "create-topics": [
      "topic1",
      "topic2"
    ]
  }
}

The clickstream-ai-recommendation example shows how to declare additional topics.

Flink

Additional Flink configurations can be provided with the flink-config value. This allows you to pass standard Flink configurations. For instance, to configure checkpointing, you might need to specify execution.checkpointing.interval, state.backend, and state.checkpoints.dir. Here's how you can provide them using this profile:

{
  "values": {
    "flink-config": {
      "execution.checkpointing.interval": "10 min",
      "state.backend": "filesystem",
      "state.checkpoints.dir": "file://<checkpoints-dir>"
    }
  }
}

Pipelines often need additional file resources. The mountDir value mounts a local directory into the Flink JobManager and TaskManager containers. Its value follows Docker Compose volume syntax: the local path, a colon, then the path inside the container.

{
  "values": {
    "mountDir": "/mylocaldir:/myremotedir"
  }
}

The clickstream-ai-recommendation example shows how to specify a mount point.
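
The Kafka and Flink values described above can presumably be combined in a single values object. The following is a sketch, assuming the profile reads each key independently. The topic names, the checkpoint directory, and the mount paths are the same placeholders used in the individual snippets.

```json
{
  "values": {
    "create-topics": ["topic1", "topic2"],
    "flink-config": {
      "execution.checkpointing.interval": "10 min",
      "state.backend": "filesystem",
      "state.checkpoints.dir": "file://<checkpoints-dir>"
    },
    "mountDir": "/mylocaldir:/myremotedir"
  }
}
```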

Enabled Engines

This profile enables the vertx, postgres, kafka, and flink engines. However, it is possible to disable some of these engines based on your use case. To disable Vert.x (e.g., if you don't want to execute GraphQL queries), simply omit it from the enabled-engines list:

{
  "enabled-engines": ["postgres", "kafka", "flink"]
}

Example Usage

Basic Scenario

Because this profile is the default, no configuration is needed to use it. The logistics-shipping-geodata example, for instance, uses it automatically.

Advanced Scenario

It is good practice to use the default profile as a base that other profiles extend or override. For instance, you can replace Kafka with Redpanda while retaining the rest of the default profile's configuration.

To use the default profile as a base, add it explicitly to the profiles list in your project's package.json:

{
  "profiles": [
    "datasqrl.profile.default"
  ]
}

It is also recommended to pin the exact version in the dependencies section:

{
  "dependencies": [
    {
      "datasqrl.profile.default": {
        "name": "datasqrl.profile.default",
        "version": "0.5.6",
        "variant": "dev"
      }
    }
  ]
}
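
Put together, the profiles and dependencies fragments would sit side by side in one package.json. This is a sketch that assumes both are top-level keys, as the fragments suggest. Any other keys your project needs are omitted.

```json
{
  "profiles": [
    "datasqrl.profile.default"
  ],
  "dependencies": [
    {
      "datasqrl.profile.default": {
        "name": "datasqrl.profile.default",
        "version": "0.5.6",
        "variant": "dev"
      }
    }
  ]
}
```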

When swapping out an existing service, you need to respect the structure of the default profile. Other services refer to the log engine as kafka, so you must keep that name. To replace Kafka with Redpanda, modify the kafka folder and the kafka.compose.yml file. Their contents override the corresponding parts of the default profile.

For this use case, see the clickstream-ai-recommendation example.

Good to Know:

  • Profile order matters. Profiles later in the list override the corresponding parts of earlier ones.
  • SQRL doesn't support merging at the engine level. Therefore, the vertx, postgres, kafka, and flink folders will be fully overridden if specified in subsequent profiles.
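
As an illustration of these two points, a package.json that layers a custom profile on top of the default one might look like the following. The name my-redpanda-profile is hypothetical. It stands for a profile that supplies its own kafka folder and kafka.compose.yml, and how such a profile is referenced may differ in your setup.

```json
{
  "profiles": [
    "datasqrl.profile.default",
    "my-redpanda-profile"
  ]
}
```

Because the hypothetical profile is listed second, its kafka folder would replace the default one entirely, while the vertx, postgres, and flink folders would still come from the default profile.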

Install

Add this profile to the dependencies of your package.json, using the dependency entry shown under Advanced Scenario.

Description

A Docker Compose profile for DataSQRL.