Writing Queries

Writing Queries

The following sections will help with writing queries.

Records

All queries need to specify a record. A record is simply a set of named expressions inside squiggly brackets.

An example record:

%%fenl
{
    time: Purchase.purchase_time,
    entity: Purchase.customer_id,
    max_amount: Purchase.amount | max(),
    min_amount: Purchase.amount | min(),
}

This record has 4 fields, time, entity, max_amount and min_amount

Querying with Python

Using python directly is one way to write queries.

import pandas
from kaskada import compute

query = "{ time: Purchase.purchase_time, entity: Purchase.customer_id, max_amount: last(Purchase.amount) | max(), min_amount: Purchase.amount | min()}"

response_parquet = compute.query(query=query).parquet.path

pandas.read_parquet(response_parquet)

This returns a dataframe with the results of the query.

Optional Parameters (with Python)

When querying directly from python, the following optional parameters are available:

  • with_tables: A list of tables to use in the query, in addition to the tables stored in the system.
  • with_views: A list of views to use in the query, in addition to the views stored in the system.
  • result_behavior: Determines which results are returned. Either "all-results" (default), or "final-results" which returns only the final values for each entity.
  • response_as: Determines how the response is returned. Either "parquet" (default) or "redis-bulk".
    • If "redis-bulk", result_behavior is assumed to be "final-results".
  • data_token_id: Enables repeatable queries. Queries performed against the same data token always run on the same input data.
  • limits: Configures limits on the output set.

Querying with fenlmagic

Using the fenlmagic IPython extension makes iterating on queries easier.

ℹ️

Note:

The fenlmagic IPython extension is optional and isn't required to use Kaskada. Feel free to use whichever client interface fits your workflow.

You can make Fenl queries by prefixing a query block with %%fenl. The query results will be computed and returned as a Pandas dataframe. The query content starts on the next line and includes the rest of the code block's contents:

%%fenl
{
    time: Purchase.purchase_time,
    entity: Purchase.customer_id,
    max_amount: Purchase.amount | max(),
    min_amount: Purchase.amount | min(),
}

This returns a dataframe with the results of the query.

Optional Parameters (with fenlmagic)

When querying using fenlmagic, the following optional parameters are available:

  • --result-behavior: Determines which results are returned. Either all-results (default), or final-results which returns only the final values for each entity.
  • --output: Output format for the query results. One of df dataframe (default), json, parquet or redis-bulk.
    • If redis-bulk, --result-behavior is assumed to be final-results.
  • --data-token: Enables repeatable queries. Queries performed against the same data token always run on the same input data.
  • --preview-rows: Produces a preview of the data with at least this many rows.
  • --var: Assigns the body to a local variable with the given name.

Example use of some of these options can be found in the next section: Example Queries

Tables and Views

Most basic queries operate against tables. However, queries can also operate on views or a combination of tables and views.

Here's an example of using a view to filter the values produced by an expression using a table.

%%fenl
{
  time: Purchase.purchase_time,
  entity: Purchase.customer_id,
  total_purchases: Purchase.amount | sum(),
} | when(PurchaseStats.max_amount > 100)

Views may reference other views, so we could give this expression a name and create a view for it as well if we wanted to.

Views are useful any time you need to share or re-use expressions:

  • Cleaning operations
  • Common business logic
  • Final feature vectors

For more help with tables and views, see Working with Tables and Working with Views.


© Copyright 2021 Kaskada, Inc. All rights reserved. Privacy Policy

Kaskada products are protected by patents in the United States, and Kaskada is currently seeking protection internationally with pending applications.

Kaskada is a registered trademark of Kaskada Inc.