Quick Start

A quick walk-through of the main concepts behind Fenl.

Let's walk through an example of using Fenl to build features for a simple fraud detection model.

Data Model

Features will be built from two event tables: a Purchase table and a FraudReport table. The goal is to build a model that predicts whether a given purchase will result in a fraud report within the next 30 days.

A Purchase event occurs when a transaction is recorded. It describes the items that were purchased, the vendor selling the items, the customer buying the items and the total value of the transaction.

Purchase :: {time: timestamp_ns, id: string, vendor_id: string, customer_id: string, total: i64}
entity (id)  time  vendor_id    customer_id  total
cb_001       100   chum_bucket  karen        9
cb_002       101   chum_bucket  karen        2
cb_003       102   chum_bucket  karen        4
cb_004       103   chum_bucket  patrick      5000
cb_005       103   chum_bucket  karen        3
cb_006       104   chum_bucket  karen        5
kk_001       100   krusty_krab  patrick      3
kk_002       101   krusty_krab  patrick      5
kk_003       102   krusty_krab  patrick      12
kk_004       104   krusty_krab  patrick      9

A FraudReport event occurs when a transaction is reported as fraudulent. It identifies the purchase that was reported as fraudulent.

FraudReport :: {time: timestamp_ns, purchase_id: string}
entity (purchase_id)  time
cb_004                120

The values produced by a Fenl expression are associated with an entity key. An entity key identifies the thing each value relates to: for example, a purchase could be related to a specific user, and a fraud report could be related to a specific vendor. The Purchase table's entity key is the id field, while the FraudReport table's entity key is purchase_id. Any entity key will do; these specific keys are chosen because they're convenient for this exercise.

Simple Aggregation: Target Value

Before we can start building the inputs to our model, we need to describe the target value the model will predict. We would like to predict whether a given purchase will result in a fraud report, that is, whether the number of fraud reports for the purchase is greater than zero.

let Target = count(FraudReport) > 0
entity  time  Target
cb_004  120   true

Aggregations in Fenl are scoped to the entity key: the Target expression produces a bool value associated with each purchase (as identified by FraudReport.purchase_id).
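To make the scoping concrete, here is a minimal sketch in plain Python (not Fenl, and not how Kaskada computes it) that keeps one running count per entity key and emits a value after each FraudReport event. The function name target_rows and the tuple layout are hypothetical, chosen only for this illustration.

```python
from collections import defaultdict

# FraudReport events as (purchase_id, time) pairs, copied from the table above.
fraud_reports = [("cb_004", 120)]

def target_rows(reports):
    """Return (entity, time, target) rows, counting reports per entity key."""
    counts = defaultdict(int)  # one independent counter per purchase_id
    rows = []
    for purchase_id, time in sorted(reports, key=lambda r: r[1]):
        counts[purchase_id] += 1
        rows.append((purchase_id, time, counts[purchase_id] > 0))
    return rows

print(target_rows(fraud_reports))  # [('cb_004', 120, True)]
```

Because each purchase has its own counter, a report for one purchase never affects the value computed for another.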

First Feature: Purchase Total

We can describe some simple features based on attributes of a purchase event. For example, we can describe the purchase total by referencing the appropriate event field:

let PurchaseTotal = Purchase.total
entity  time  Purchase.total
cb_001  100   9
cb_002  101   2
cb_003  102   4
cb_004  103   5000
cb_005  103   3
cb_006  104   5
kk_001  100   3
kk_002  101   5
kk_003  102   12
kk_004  104   9

Fenl expressions are either continuous or discrete. Discrete expressions are defined at a finite set of times and their value is null at all other times. For example, PurchaseTotal is a discrete expression: it is defined at the times associated with each purchase event.

Continuous expressions are defined at all times, and are generally the result of an aggregation. For example, Target is a continuous expression because it uses the count() aggregation: at any point in time its value is true if there have been 1 or more FraudReport events before that time or false otherwise.
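The difference can be sketched in plain Python by asking each kind of expression for its value at an arbitrary time. This is only an illustration of the semantics described above: the event values are made up, the helper names are hypothetical, and we assume an event at exactly time t is already included in the aggregate at t.

```python
# Events for a single entity as (time, value) pairs; the values are hypothetical.
events = [(100, 9), (101, 2), (103, 3)]

def discrete_at(events, t):
    """Discrete: defined only at event times, None (null) at all other times."""
    for time, value in events:
        if time == t:
            return value
    return None

def continuous_count_at(events, t):
    """Continuous: the count of events seen so far has a value at every time."""
    return sum(1 for time, _ in events if time <= t)

print(discrete_at(events, 101), discrete_at(events, 102))  # 2 None
print(continuous_count_at(events, 102))                    # 2
```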

Changing Entity Key: Purchase Average by Customer part I

It could be useful to see how each individual purchase compares to the customer's other purchases. We can describe a given customer's purchases by transforming the Purchase table to use customer_id as the entity key rather than id. The resulting expression contains the same values, but aggregations will now be scoped to a customer ID rather than a purchase ID.

let PurchaseByCustomer = Purchase | with_entity_key($input.customer_id)
entity   time  vendor_id    customer_id  total
karen    100   chum_bucket  karen        9
karen    101   chum_bucket  karen        2
karen    102   chum_bucket  karen        4
karen    103   chum_bucket  karen        3
karen    104   chum_bucket  karen        5
patrick  100   krusty_krab  patrick      3
patrick  101   krusty_krab  patrick      5
patrick  102   krusty_krab  patrick      12
patrick  103   chum_bucket  patrick      5000
patrick  104   krusty_krab  patrick      9
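As a rough mental model, changing the entity key resembles regrouping the same rows under a different field. The plain-Python sketch below (not Fenl; rekey_by_customer is a hypothetical helper and only a few rows of the Purchase table are used) shows the idea.

```python
from collections import defaultdict

# A few Purchase rows: (id, time, vendor_id, customer_id, total).
purchases = [
    ("cb_001", 100, "chum_bucket", "karen", 9),
    ("kk_001", 100, "krusty_krab", "patrick", 3),
    ("cb_002", 101, "chum_bucket", "karen", 2),
    ("cb_004", 103, "chum_bucket", "patrick", 5000),
]

def rekey_by_customer(rows):
    """Group the same rows under customer_id instead of the purchase id."""
    grouped = defaultdict(list)
    for _id, time, vendor_id, customer_id, total in rows:
        grouped[customer_id].append((time, vendor_id, customer_id, total))
    return dict(grouped)

by_customer = rekey_by_customer(purchases)
print(sorted(by_customer))  # ['karen', 'patrick']
```

The values themselves are unchanged; only the grouping that later aggregations will see is different.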

📘

TIP

This expression uses "pipe syntax" which allows sequential operations to be chained.

Pipe syntax works by assigning the left-hand-side of the pipe to the name $input in the right-hand-side of the pipe. Within the right-hand-side of a pipe expression, required function arguments that are omitted from the function call default to $input.

An equivalent way to write PurchaseByCustomer is with_entity_key(Purchase.customer_id, Purchase)

This allows us to describe the average of each customer's purchases:

let AveragePurchaseByCustomer = PurchaseByCustomer.total | mean()
entity   time  ... | mean()
karen    100   9
karen    101   5.5
karen    102   5
karen    103   4.5
karen    104   4.6
patrick  100   3
patrick  101   4
patrick  102   6.666
patrick  103   1255
patrick  104   1005.8

Expressions in Fenl are temporal; they describe the result of a given computation at every point in time. In this case, AveragePurchaseByCustomer is an expression whose value changes over time as purchase events occur. The temporal nature of expressions allows Fenl to describe the values as they would have been computed at arbitrary times in the past.
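The values in the table above can be reproduced with a running mean that is updated per customer as each purchase arrives. The sketch below is plain Python written for illustration, not Fenl and not Kaskada's implementation; running_mean_by_customer is a hypothetical name.

```python
from collections import defaultdict

# Purchases as (time, customer_id, total), taken from the Purchase table.
purchases = [
    (100, "karen", 9), (101, "karen", 2), (102, "karen", 4),
    (103, "karen", 3), (104, "karen", 5),
    (100, "patrick", 3), (101, "patrick", 5), (102, "patrick", 12),
    (103, "patrick", 5000), (104, "patrick", 9),
]

def running_mean_by_customer(rows):
    """Return (customer, time, mean so far) after every purchase event."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    out = []
    for time, customer, total in sorted(rows, key=lambda r: r[0]):
        sums[customer] += total
        counts[customer] += 1
        out.append((customer, time, sums[customer] / counts[customer]))
    return out

for row in running_mean_by_customer(purchases):
    print(row)
```

Each output row is the mean as it stood at that moment, which is why patrick's value jumps to 1255 at time 103 when the 5000 purchase arrives.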

Joining Between Entities: Purchase Average By Customer part II

Our goal is to predict if a given purchase will be reported as fraudulent, but the entity key of AveragePurchaseByCustomer describes a customer. We can operate between entities by "looking up" the average purchase of a particular purchase's customer:

let CustomerAveragePurchase = AveragePurchaseByCustomer | lookup(Purchase.customer_id)
entity  time  customer_id  ... | lookup(...)
cb_001  100   karen        9
cb_002  101   karen        5.5
cb_003  102   karen        5
cb_004  103   patrick      1255
cb_005  103   karen        4.5
cb_006  104   karen        4.6
kk_001  100   patrick      3
kk_002  101   patrick      4
kk_003  102   patrick      6.666
kk_004  104   patrick      1005.8

In this case, each Purchase event produces the value of AveragePurchaseByCustomer computed for the purchase's customer_id at the time of the purchase. The value being looked up (in this case AveragePurchaseByCustomer) is referred to as the foreign value, while the value describing the foreign entity (in this case Purchase.customer_id) is referred to as the key value.

Lookups are similar to SQL left joins: a foreign value is produced for each key value.
In contrast to SQL joins, a lookup produces the foreign expression's value at the point in time associated with each key value.
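One way to picture this is as an "as of" join: for each key value, take the most recent foreign value at or before the key's time. The plain-Python sketch below illustrates that idea under this assumption. It is not Fenl, the helper names are hypothetical, and the averages are copied from the table above.

```python
from bisect import bisect_right

# Running averages per customer as time-ordered (time, mean) pairs.
average_by_customer = {
    "karen": [(100, 9.0), (101, 5.5), (102, 5.0), (103, 4.5), (104, 4.6)],
    "patrick": [(100, 3.0), (101, 4.0), (102, 20 / 3), (103, 1255.0), (104, 1005.8)],
}

# Key values: (purchase id, time, customer_id).
purchases = [("cb_004", 103, "patrick"), ("cb_005", 103, "karen"), ("kk_002", 101, "patrick")]

def lookup_as_of(history, time):
    """Return the latest value at or before `time`, or None if there is none."""
    times = [t for t, _ in history]
    index = bisect_right(times, time)
    return history[index - 1][1] if index else None

def lookup(purchases, foreign):
    """Like a left join: every key row is kept, with None when nothing matches."""
    return [
        (pid, time, customer, lookup_as_of(foreign.get(customer, []), time))
        for pid, time, customer in purchases
    ]

print(lookup(purchases, average_by_customer))
```

Every purchase yields exactly one output row, and the value it receives depends on when the purchase happened, not only on which customer made it.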

Time Travel: Shifting Features Forward in Time

We would like to predict whether a purchase will result in a fraud report within 30 days of the purchase. We began by describing our Target value, and then we described two features that could be useful for making such a prediction: PurchaseTotal and CustomerAveragePurchase.

For our model to make predictions about the future, it must be trained on features and target values computed at different points in time: we would like the target value to be computed 30 days after the feature values.

Fenl allows values to "time-travel" forward in time. This can be accomplished by shifting the feature expressions forward in time by 30 days:

let ShiftedPurchaseTotal           = PurchaseTotal | shift_by(days=30)
let ShiftedCustomerAveragePurchase = CustomerAveragePurchase | shift_by(days=30)
entity  time  ShiftedPurchaseTotal  ShiftedCustomerAveragePurchase
cb_001  130   9                     9
cb_002  131   2                     5.5
cb_003  132   4                     5
cb_004  133   5000                  1255
cb_005  133   3                     4.5
cb_006  134   5                     4.6
kk_001  130   3                     3
kk_002  131   5                     4
kk_003  132   12                    6.666
kk_004  134   9                     1005.8

The results of these shift operations contain the same values as PurchaseTotal and CustomerAveragePurchase, but the time associated with each value is 30 days later. We can now describe our training set by combining the shifted predictor values with the non-shifted target value:

let TrainingExample = record{
  p_total: ShiftedPurchaseTotal,
  avg_purchase: ShiftedCustomerAveragePurchase,
  target: Target,
}
entity  time  p_total  avg_purchase  target
cb_001  130   9        9             false
cb_002  131   2        5.5           false
cb_003  132   4        5             false
cb_004  133   5000     1255          true
cb_005  133   3        4.5           false
cb_006  134   5        4.6           false
kk_001  130   3        3             false
kk_002  131   5        4             false
kk_003  132   12       6.666         false
kk_004  134   9        1005.8        false
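The rows above can be approximated in plain Python by moving each feature row 30 time units forward and then reading the target as of the shifted time. This is a sketch of the idea rather than Fenl or Kaskada's implementation: the helper name is hypothetical, only two feature rows are shown, and we assume the integer times in the tables are measured in days.

```python
SHIFT_DAYS = 30  # assumption: the integer times in the tables are days

# Feature rows as (entity, time, p_total, avg_purchase).
features = [("cb_001", 100, 9, 9.0), ("cb_004", 103, 5000, 1255.0)]

# FraudReport times per purchase id.
fraud_times = {"cb_004": [120]}

def training_examples(features, fraud_times, shift=SHIFT_DAYS):
    """Pair each shifted feature row with the target observed at the shifted time."""
    rows = []
    for entity, time, p_total, avg_purchase in features:
        shifted = time + shift
        # The target is true if any report exists at or before the shifted time.
        target = any(t <= shifted for t in fraud_times.get(entity, []))
        rows.append((entity, shifted, p_total, avg_purchase, target))
    return rows

print(training_examples(features, fraud_times))
# [('cb_001', 130, 9, 9.0, False), ('cb_004', 133, 5000, 1255.0, True)]
```

The feature values are those known at purchase time, while the target reflects what was known 30 days later, which is the separation the model needs for training.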

📘

TIP

Values cannot travel backwards in time. This ensures features don't accidentally depend on violating the laws of physics.

Going to Production: Feature Vectors

Once a model has been trained, we'll need to compute feature vectors for making predictions. Feature vectors consist of the non-shifted predictor expressions but not the target value.

let FeatureVector = record{
  p_total: PurchaseTotal,
  avg_purchase: CustomerAveragePurchase,
}
entity  time  p_total  avg_purchase
cb_001  100   9        9
cb_002  101   2        5.5
cb_003  102   4        5
cb_004  103   5000     1255
cb_005  103   3        4.5
cb_006  104   5        4.6
kk_001  100   3        3
kk_002  101   5        4
kk_003  102   12       6.666
kk_004  104   9        1005.8

🛑

NOTE

PurchaseTotal is a discrete expression whose value depends on the purchase event. A feature store implementation would seem to require some way of providing the "current" event. Alternately, we may want to omit discrete values and tell users they have to provide this type of information to the model.


© Copyright 2021 Kaskada, Inc. All rights reserved.

Kaskada products are protected by patents in the United States, and Kaskada is currently seeking protection internationally with pending applications.