Writing Queries

Writing Queries

The following sections will help with writing queries.

Records

All queries need to specify a record. A record is simply a set of named expressions inside squiggly brackets.

An example record:

%%fenl
{
    time: Purchase.purchase_time,
    entity: Purchase.customer_id,
    max_amount: Purchase.amount | max(),
    min_amount: Purchase.amount | min(),
}

This record has 4 fields, time, entity, max_amount and min_amount

Querying with Python

Using python directly is one way to write queries.

import pandas
from kaskada import compute

query = "{ time: Purchase.purchase_time, entity: Purchase.customer_id, max_amount: last(Purchase.amount) | max(), min_amount: Purchase.amount | min()}"

response_parquet = compute.query(query=query).parquet.path

pandas.read_parquet(response_parquet)

This returns a dataframe with the results of the query.

Optional Parameters (with Python)

When querying directly from python, the following optional parameters are available:

  • with_tables: A list of tables to use in the query, in addition to the tables stored in the system.
  • with_views: A list of views to use in the query, in addition to the views stored in the system.
  • result_behavior: Determines which results are returned. Either "all-results" (default), or "final-results" which returns only the final values for each entity.
  • response_as: Determines how the response is returned. Either "parquet" (default) or "redis-bulk".
    • If "redis-bulk", result_behavior is assumed to be "final-results".
  • data_token_id: Enables repeatable queries. Queries performed against the same data token always run on the same input data.
  • limits: Configures limits on the output set.

Querying with fenlmagic

Using the fenlmagic IPython extension makes iterating on queries easier.

️ Note:

The fenlmagic IPython extension is optional and isn't required to use Kaskada. Feel free to use whichever client interface fits your workflow.

You can make Fenl queries by prefixing a query block with %%fenl. The query results will be computed and returned as a Pandas dataframe. The query content starts on the next line and includes the rest of the code block's contents:

%%fenl
{
    time: Purchase.purchase_time,
    entity: Purchase.customer_id,
    max_amount: Purchase.amount | max(),
    min_amount: Purchase.amount | min(),
}

This returns a dataframe with the results of the query.

Optional Parameters (with fenlmagic)

When querying using fenlmagic, the following optional parameters are available:

  • --result-behavior: Determines which results are returned. Either all-results (default), or final-results which returns only the final values for each entity.
  • --output: Output format for the query results. One of df dataframe (default), json, parquet or redis-bulk.
    • If redis-bulk, --result-behavior is assumed to be final-results.
  • --data-token: Enables repeatable queries. Queries performed against the same data token always run on the same input data.
  • --preview-rows: Produces a preview of the data with at least this many rows.
  • --var: Assigns the body to a local variable with the given name.

Example use of some of these options can be found in the next section: Example Queries

Tables and Views

Most basic queries operate against tables. However, queries can also operate on views or a combination of tables and views.

Here's an example of using a view to filter the values produced by an expression using a table.

%%fenl
{
  time: Purchase.purchase_time,
  entity: Purchase.customer_id,
  total_purchases: Purchase.amount | sum(),
} | when(PurchaseStats.max_amount > 100)

Views may reference other views, so we could give this expression a name and create a view for it as well if we wanted to.

Views are useful any time you need to share or re-use expressions:

  • Cleaning operations
  • Common business logic
  • Final feature vectors

For more help with tables and views, see Working with Tables and Working with Views.


© Copyright 2021 Kaskada, Inc. All rights reserved. Privacy Policy

Kaskada products are protected by patents in the United States, and Kaskada is currently seeking protection internationally with pending applications.