Example Queries

Some example queries using fenlmagic

The following examples cover some of the common optional parameters used when writing queries.
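
These examples assume the fenlmagic IPython extension has already been loaded in the notebook, along these lines (assuming the kaskada Python package is installed in the notebook kernel):

# Load the fenlmagic extension so %%fenl cells are available.
%load_ext fenlmagic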

Prerequisite: Table Setup

First, we create two tables and load some data. The purchases table contains a small amount of data loaded from two files. The transactions table contains 100,000 rows of data loaded from a single file.

Table Setup Input
from kaskada import table as kt

# Helper to indent multi-line response objects for readable printing.
def indent(text, amount, ch=' '):
    padding = amount * ch
    return ''.join(padding + line for line in str(text).splitlines(True))

# Create the purchases table and load its data from two parquet files.
t_purchases = kt.create_table(
    table_name =             "purchases",
    time_column_name =       "purchase_time",
    entity_key_column_name = "customer_id",
    subsort_column_name =    "subsort_id",
)
file_purchases1 = kt.upload_file(table=t_purchases, file_path="purchases_part1.parquet")
file_purchases2 = kt.upload_file(table=t_purchases, file_path="purchases_part2.parquet")
print("SUCCESS: Created Purchases table and uploaded data.")
print("\t Table Response:")
print(indent(t_purchases, 2, "\t"))
print("\t Upload Responses:")
print(indent(file_purchases1, 2, "\t"))
print(indent(file_purchases2, 2, "\t"))

# Create the transactions table and load its data from a single parquet file.
t_transactions = kt.create_table(
    table_name =             "transactions",
    time_column_name =       "transaction_date",
    entity_key_column_name = "msno",
    subsort_column_name =    "idx",
)
file_transactions1 = kt.upload_file(table=t_transactions, file_path="transactions_part1.parquet")
print("SUCCESS: Created Transactions table and uploaded data.")
print("\t Table Response:")
print(indent(t_transactions, 2, "\t"))
print("\t Upload Response:")
print(indent(file_transactions1, 2, "\t"))
Table Setup Output
SUCCESS: Created Purchases table and uploaded data.
     Table Response:
        table {
          table_id: "56437b49-4749-4d47-ad5e-ef680d34c750"
          table_name: "purchases"
          time_column_name: "purchase_time"
          entity_key_column_name: "customer_id"
          subsort_column_name: "subsort_id"
          create_time {
            seconds: 1638907853
            nanos: 587268229
          }
          update_time {
            seconds: 1638907853
            nanos: 587268229
          }
        }
        request_details {
          request_id: "985a39f0851a50b1c23551660f0925f1"
        }

     Upload Responses:
        data_token_id: "bdc9e595-a8a0-448c-9a95-c2e3d886b633"
        request_details {
          request_id: "dfcfa0026e4f0a99344329cfd80737ac"
        }

        data_token_id: "24c83cac-8cf4-4a45-98f0-dac8d5b303a2"
        request_details {
          request_id: "f36489c8eb48c8e01fd152bb95da4991"
        }

SUCCESS: Created Transactions table and uploaded data.
     Table Response:
        table {
          table_id: "24fd3343-1c3d-4ad7-bc67-8b367019afd3"
          table_name: "transactions"
          time_column_name: "transaction_date"
          entity_key_column_name: "msno"
          subsort_column_name: "idx"
          create_time {
            seconds: 1638907856
            nanos: 680641826
          }
          update_time {
            seconds: 1638907856
            nanos: 680641826
          }
        }
        request_details {
          request_id: "62a07d27133bedf8779dea9380e09a07"
        }

     Upload Response:
        data_token_id: "7bd4e740-9e63-418e-ba9b-5582db010959"
        request_details {
          request_id: "872984a9097bd6c8dd53d88845615f07"
        }

Using --data-token

--data-token: Enables repeatable queries. Queries performed against the same data token always run on the same input data.
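
The token values used below are copied from the upload responses printed during table setup. In a live notebook you could read them off the response objects instead; a sketch, assuming the responses expose the data_token_id field shown in their printed form:

# Each upload response carries the data token produced by that load.
# Assumption: attribute access mirrors the printed `data_token_id` field.
token_first = file_purchases1.data_token_id
token_second = file_purchases2.data_token_id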

  • Use the data token ID returned after loading the first file; the results include only rows from the first file:
%%fenl --data-token bdc9e595-a8a0-448c-9a95-c2e3d886b633
purchases
Query results using the first data token
data_token_id: "bdc9e595-a8a0-448c-9a95-c2e3d886b633"
request_details {
  request_id: "3f737ff336666515a54dd29a9c5ace3a"
}
    id      purchase_time        customer_id  vendor_id    amount  subsort_id
0   cb_001  2020-01-01 00:00:00  karen        chum_bucket  9       0
1   kk_001  2020-01-01 00:00:00  patrick      krusty_krab  3       1
2   cb_002  2020-01-02 00:00:00  karen        chum_bucket  2       2
3   kk_002  2020-01-02 00:00:00  patrick      krusty_krab  5       3
4   cb_003  2020-01-03 00:00:00  karen        chum_bucket  4       4
5   kk_003  2020-01-03 00:00:00  patrick      krusty_krab  12      5
6   cb_004  2020-01-04 00:00:00  patrick      chum_bucket  5000    6
7   cb_005  2020-01-04 00:00:00  karen        chum_bucket  3       7
8   cb_006  2020-01-05 00:00:00  karen        chum_bucket  5       8
9   kk_004  2020-01-05 00:00:00  patrick      krusty_krab  9       9
  • Use the data token ID returned after loading the second file; the results include rows from both files:
%%fenl --data-token 24c83cac-8cf4-4a45-98f0-dac8d5b303a2
purchases
Query results using the second data token
data_token_id: "24c83cac-8cf4-4a45-98f0-dac8d5b303a2"
request_details {
  request_id: "3f737ff336666515a54dd29a9c5ace3a"
}
    id      purchase_time        customer_id  vendor_id    amount  subsort_id
0   cb_001  2020-01-01 00:00:00  karen        chum_bucket  9       0
1   kk_001  2020-01-01 00:00:00  patrick      krusty_krab  3       1
2   cb_002  2020-01-02 00:00:00  karen        chum_bucket  2       2
3   kk_002  2020-01-02 00:00:00  patrick      krusty_krab  5       3
4   cb_003  2020-01-03 00:00:00  karen        chum_bucket  4       4
5   kk_003  2020-01-03 00:00:00  patrick      krusty_krab  12      5
6   cb_004  2020-01-04 00:00:00  patrick      chum_bucket  5000    6
7   cb_005  2020-01-04 00:00:00  karen        chum_bucket  3       7
8   cb_006  2020-01-05 00:00:00  karen        chum_bucket  5       8
9   kk_004  2020-01-05 00:00:00  patrick      krusty_krab  9       9
10  kk_005  2020-01-06 00:00:00  patrick      krusty_krab  2       0
11  wh_001  2020-01-06 00:00:00  spongebob    weenie_hut   7       1
12  cb_007  2020-01-07 00:00:00  spongebob    chum_bucket  34      2
13  wh_002  2020-01-08 00:00:00  karen        weenie_hut   8       3
14  kk_006  2020-01-08 00:00:00  patrick      krusty_krab  9       4

Using --result-behavior

--result-behavior: Determines which results are returned.

  • Use all-results (the default) to return all results for each entity:
%%fenl --result-behavior all-results
purchases
`all-results` Output
data_token_id: "7bd4e740-9e63-418e-ba9b-5582db010959"
request_details {
  request_id: "1badb8b0e220e26cc15b93b234ac3c14"
}
    id      purchase_time        customer_id  vendor_id    amount  subsort_id
0   cb_001  2020-01-01 00:00:00  karen        chum_bucket  9       0
1   kk_001  2020-01-01 00:00:00  patrick      krusty_krab  3       1
2   cb_002  2020-01-02 00:00:00  karen        chum_bucket  2       2
3   kk_002  2020-01-02 00:00:00  patrick      krusty_krab  5       3
4   cb_003  2020-01-03 00:00:00  karen        chum_bucket  4       4
5   kk_003  2020-01-03 00:00:00  patrick      krusty_krab  12      5
6   cb_004  2020-01-04 00:00:00  patrick      chum_bucket  5000    6
7   cb_005  2020-01-04 00:00:00  karen        chum_bucket  3       7
8   cb_006  2020-01-05 00:00:00  karen        chum_bucket  5       8
9   kk_004  2020-01-05 00:00:00  patrick      krusty_krab  9       9
10  kk_005  2020-01-06 00:00:00  patrick      krusty_krab  2       0
11  wh_001  2020-01-06 00:00:00  spongebob    weenie_hut   7       1
12  cb_007  2020-01-07 00:00:00  spongebob    chum_bucket  34      2
13  wh_002  2020-01-08 00:00:00  karen        weenie_hut   8       3
14  kk_006  2020-01-08 00:00:00  patrick      krusty_krab  9       4
  • Use final-results to return only the most recent event for each entity:
%%fenl --result-behavior final-results
purchases
`final-results` Output
data_token_id: "7bd4e740-9e63-418e-ba9b-5582db010959"
request_details {
  request_id: "145bc51d9bac47f17fd202e5785e58b7"
}
    id      purchase_time        customer_id  vendor_id    amount  subsort_id
0   kk_006  2020-01-08 00:00:00  patrick      krusty_krab  9       4
1   wh_002  2020-01-08 00:00:00  karen        weenie_hut   8       3
2   cb_007  2020-01-07 00:00:00  spongebob    chum_bucket  34      2
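
Because final-results yields one row per entity reflecting everything observed so far, it pairs naturally with aggregations. A minimal sketch, assuming standard Fenl record and aggregation syntax:

%%fenl --result-behavior final-results
{
  customer: purchases.customer_id,
  total_spent: sum(purchases.amount)
}

This would return one row per customer containing the sum of all of that customer's purchase amounts.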

Using --preview-rows

--preview-rows: Produces a preview of the data with at least this many rows.

  • Setting a limit allows you to iterate quickly on features and verify your results before running them over your full dataset.
  • Set it to 50 on the transactions table to return a preview of at least 50 rows:
%%fenl --preview-rows 50
transactions

This returns a dataframe of 71,599 rows instead of the full dataset of 100,000 rows.

Note: It may seem odd that many thousands of rows were returned when only 50 were requested. This happens because the query engine operates on batches and returns the results of every batch it processed in order to reach the requested minimum number of rows. In this case, compute processed only a single batch, but that batch contained 71,599 rows.
Note: Using --preview-rows with --result-behavior final-results causes the full dataset to be processed, as all inputs must be processed to produce final results.
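
To check the row count programmatically, the preview can be captured into a variable. A sketch, assuming fenlmagic's --var option (which stores the query result in the named variable) and that the result object exposes the returned rows as a dataframe attribute:

%%fenl --preview-rows 50 --var preview
transactions

Then, in a following cell:

# Assumption: the captured result exposes the returned rows as `dataframe`.
print(len(preview.dataframe))  # 71599 in the run above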

