Working with Tables

Kaskada stores data in tables. Tables consist of multiple rows, and each row is a value of the same type.

All methods on this page use the table module. Be sure to import it before running any method:

from kaskada import table

Table Methods

Creating a Table

When creating a table, you must provide information about how each row should be interpreted. You must describe:

  • A field containing the time associated with each row. The time should refer to when the event occurred.
  • An initial entity key associated with each row. The entity should identify a thing in the world related to each event.

Optionally:

  • A subsort column associated with each row. This value is used to order rows associated with the same time value. If no subsort column is provided, Kaskada will generate one.

For more information about these fields, see: Expected File Format
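To illustrate how these fields interact, the plain-Python sketch below shows the ordering Kaskada applies to events: rows are ordered by their time value, and the subsort value breaks ties among rows that share a timestamp. The field names match the Purchase example below; this is only an illustration of the ordering, not a Kaskada API call.

```python
# Example rows, each carrying a time, an entity key, and a subsort value.
rows = [
    {"purchase_time": "2022-01-02", "customer_id": "c1", "subsort_id": 1},
    {"purchase_time": "2022-01-01", "customer_id": "c2", "subsort_id": 2},
    {"purchase_time": "2022-01-01", "customer_id": "c1", "subsort_id": 1},
]

# Order by time first; the subsort value disambiguates rows with
# the same timestamp.
ordered = sorted(rows, key=lambda r: (r["purchase_time"], r["subsort_id"]))
```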

Here is an example of creating a table:

table.create_table(
  table_name = "Purchase",
  time_column_name = "purchase_time",
  entity_key_column_name = "customer_id",
  subsort_column_name = "subsort_id",
)

This creates a table named Purchase. Any data loaded into this table must have a timestamp field named purchase_time, an entity key field named customer_id, and a subsort field named subsort_id.

Idiomatic Kaskada

We like to use CamelCase to name tables because it helps distinguish data sources from transformed values and function names.

List Tables

The list_tables method returns all tables defined for your user. An optional search string filters the results.

Here is an example of listing tables:

table.list_tables(search = "chase")

Get Table

You can get a table using its name:

table.get_table("Purchase")

Updating a Table

Tables are currently immutable. Updating a table requires deleting it and then re-creating it with the new configuration.

Deleting a Table

You can delete a table using its name:

table.delete_table("Purchase")

Warning

Note that deleting a table also deletes any files uploaded to it.

A failed-precondition error is returned if a view and/or materialization references the table. To proceed with the deletion, either delete the dependent resources first or supply the force flag to delete the table forcefully. Forcefully deleting a table without removing its dependent resources may leave those resources functioning incorrectly.

table.delete_table("Purchase", force=True)

Uploading Data

Going Deeper

Under the hood, file uploads are a multi-part process. The first step creates a staged file by requesting an upload URL from the API. The Python client library then uploads the file to that URL with an HTTP PUT request. Finally, an additional API call loads the staged file into a table, where further validation is performed.
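As a rough sketch, the three steps can be modeled with placeholder functions. None of these functions exist in the kaskada client library; they are purely hypothetical stand-ins that illustrate the sequence of interactions described above.

```python
# Hypothetical stand-ins for the three-step upload flow. The real
# work is done by the kaskada client and API; these only model it.

def request_upload_url(table_name):
    # Step 1: create a staged file and receive a signed upload URL.
    return f"https://upload.example.com/staged/{table_name}"

def http_put(url, payload):
    # Step 2: the client PUTs the file bytes to the signed URL.
    return {"url": url, "bytes": len(payload)}

def load_staged_file(table_name, staged):
    # Step 3: a final API call loads the staged file into the table,
    # where additional validation is performed.
    return {"table": table_name, "loaded_bytes": staged["bytes"]}

url = request_upload_url("Purchase")
staged = http_put(url, b"parquet-bytes")
result = load_staged_file("Purchase", staged)
```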

From a Remote file or Dataframe

Data can be loaded from a dataframe into a Kaskada table. Remote files can be read into a dataframe and then uploaded to Kaskada.

import pandas

# A sample Parquet file provided by Kaskada for testing
purchases_url = "https://drive.google.com/uc?export=download&id=1SLdIw9uc0RGHY-eKzS30UBhN0NJtslkk"

# Read the file into a Pandas Dataframe
purchases = pandas.read_parquet(purchases_url)

# Upload the dataframe's contents to the Purchase table
table.upload_dataframe("Purchase", purchases)

The contents of the dataframe are transferred to Kaskada and added to the Purchase table.

From Amazon S3

Data can be loaded directly from Amazon S3 into a Kaskada table by providing the object's bucket, key, and region, along with AWS credentials if the object is not publicly readable.

Security and Credentials

Kaskada does not store the provided credentials in any manner. The API only has access to the credentials throughout the load data call. If no access credentials are provided, the object must have public read permissions.

from kaskada import table

TABLE_NAME = 'Purchase'

EXTERNAL_AWS_ACCESS_KEY = '<AWS_ACCESS_KEY>'
EXTERNAL_AWS_SECRET_KEY = '<AWS_SECRET_KEY>'
S3_PATH = 'events/2022/purchases.parquet'
BUCKET = 'production.company'
REGION = 'us-west-2'

table.upload_from_s3(
    TABLE_NAME,
    access_key=EXTERNAL_AWS_ACCESS_KEY, 
    secret=EXTERNAL_AWS_SECRET_KEY, 
    bucket=BUCKET,
    key=S3_PATH, 
    region=REGION
)

The contents of the parquet object in S3 are transferred to Kaskada and added to the Purchase table.

From a Local File

Local files can be uploaded directly to Kaskada without converting them to a dataframe. However, the files must be in a specific format. See Expected File Format for details.

full_path_to_file = "/content/drive/place/thing/purchases.parquet"
table.upload_file("Purchase", full_path_to_file)

This uploads the contents of the file to the Purchase table.


© Copyright 2021 Kaskada, Inc. All rights reserved.

Kaskada products are protected by patents in the United States, and Kaskada is currently seeking protection internationally with pending applications.

Kaskada is a registered trademark of Kaskada Inc.