Writing Queries
Writing Queries
The following sections will help with writing queries.
Records
All queries need to specify a record. A record is simply a set of named expressions inside squiggly brackets.
An example record:
%%fenl
{
time: Purchase.purchase_time,
entity: Purchase.customer_id,
max_amount: Purchase.amount | max(),
min_amount: Purchase.amount | min(),
}
This record has 4 fields, time
, entity
, max_amount
and min_amount
Querying with Python
Using python directly is one way to write queries.
import pandas
from kaskada import compute
query = "{ time: Purchase.purchase_time, entity: Purchase.customer_id, max_amount: last(Purchase.amount) | max(), min_amount: Purchase.amount | min()}"
response_parquet = compute.query(query=query).parquet.path
pandas.read_parquet(response_parquet)
This returns a dataframe with the results of the query.
Optional Parameters (with Python)
When querying directly from python, the following optional parameters are available:
- with_tables: A list of tables to use in the query, in addition to the tables stored in the system.
- with_views: A list of views to use in the query, in addition to the views stored in the system.
- result_behavior: Determines which results are returned. Either
"all-results"
(default), or"final-results"
which returns only the final values for each entity. - response_as: Determines how the response is returned. Either
"parquet"
(default) or"redis-bulk"
.- If
"redis-bulk"
, result_behavior is assumed to be"final-results"
.
- If
- data_token_id: Enables repeatable queries. Queries performed against the same data token always run on the same input data.
- limits: Configures limits on the output set.
Querying with fenlmagic
Using the fenlmagic IPython extension makes iterating on queries easier.
Note:
The fenlmagic IPython extension is optional and isn't required to use Kaskada. Feel free to use whichever client interface fits your workflow.
You can make Fenl queries by prefixing a query block with %%fenl
. The query results will be computed and returned as a Pandas dataframe. The query content starts on the next line and includes the rest of the code block's contents:
%%fenl
{
time: Purchase.purchase_time,
entity: Purchase.customer_id,
max_amount: Purchase.amount | max(),
min_amount: Purchase.amount | min(),
}
This returns a dataframe with the results of the query.
Optional Parameters (with fenlmagic)
When querying using fenlmagic, the following optional parameters are available:
- --result-behavior: Determines which results are returned. Either
all-results
(default), orfinal-results
which returns only the final values for each entity. - --output: Output format for the query results. One of
df
dataframe (default),json
,parquet
orredis-bulk
.- If
redis-bulk
, --result-behavior is assumed to befinal-results
.
- If
- --data-token: Enables repeatable queries. Queries performed against the same data token always run on the same input data.
- --preview-rows: Produces a preview of the data with at least this many rows.
- --var: Assigns the body to a local variable with the given name.
Example use of some of these options can be found in the next section: Example Queries
Tables and Views
Most basic queries operate against tables. However, queries can also operate on views or a combination of tables and views.
Here's an example of using a view to filter the values produced by an expression using a table.
%%fenl
{
time: Purchase.purchase_time,
entity: Purchase.customer_id,
total_purchases: Purchase.amount | sum(),
} | when(PurchaseStats.max_amount > 100)
Views may reference other views, so we could give this expression a name and create a view for it as well if we wanted to.
Views are useful any time you need to share or re-use expressions:
- Cleaning operations
- Common business logic
- Final feature vectors
For more help with tables and views, see Working with Tables and Working with Views.
Updated about 1 year ago