Querying Data

How to compute results from your data.


Setup Required

The following examples assume you've already completed Loading Data.

Writing Queries

You can write queries in a number of ways with Kaskada. Here we start with fenlmagic because these queries are not persistent. As you are iterating in Jupyter it can be helpful to build up your feature and time selection as you go, once you'd like to persist a query, check out our article on Sharing Queries.

You can make Fenl queries by prefixing a query block with %%fenl. The query results will be computed and returned as a Pandas dataframe. The query content starts on the next line and includes the rest of the code block's contents.

Let's start by looking at the Purchase table without any filters, this query will return all of the columns and rows contained in a table:


Note: this table is intentionally small so that you can get to know queries with Kaskada. When you upload your own data, you may want to filter down the results so that you can see just a relevant sample:

Purchase | when(Purchase.customer_id == "patrick")

As you begin to better understand your data you can start using aggregations over your data such as the max() function:

   max_purchase: Purchase.amount | max(),

These results may be surprising if you were expecting a single value, this is a feature, not a bug!

Computations in Fenl are temporal: they produce a time-series of values describing the full history of a computation's results. Temporal computation allows Fenl to capture what an expression's value would have been at arbitrary times in the past.

Fenl values can time-travel forward through time. Time travel allows combining the result of different computations at different points in time. Because values can only travel forward in time, Fenl prevents information about the future from "leaking" into the past.

Read more in the Fenl Language Guide

Now we can start building up our features. To reduce the set of columns output in your query, you can define a record with the curly braces { } and name the columns with a label shown on the left of the : in the below query. In order to debug your features, we recommend including the time and the entity with each query so that you can walk through the results in time:

    time: Purchase.purchase_time,
    entity: Purchase.customer_id,
    max_amount: Purchase.amount | max(),
    min_amount: Purchase.amount | min(),

Tip: the result of a previous cell in Jupyter is available to be saved by setting a variable to the result temporarily stored as _. You can then interact with these results at a typical dataframe such as displaying the columns or plotting a histogram:

df_explore = _

For more help writing queries, see Reference - Writing Queries

© Copyright 2021 Kaskada, Inc. All rights reserved. Privacy Policy

Kaskada products are protected by patents in the United States, and Kaskada is currently seeking protection internationally with pending applications.

Kaskada is a registered trademark of Kaskada Inc.