Working with Tables

Kaskada stores data in tables. Tables consist of multiple rows, and each row is a value of the same type.

All methods on this page are called on the table object. Be sure to import it before calling any of them:

from kaskada import table

Table Methods

Creating a Table

When creating a table, you must provide some information about how each row should be interpreted. You must describe:

  • A field containing the time associated with each row. The time should refer to when the event occurred.
  • An initial entity key associated with each row. The entity should identify a thing in the world that each event is associated with.
  • A subsort column associated with each row. This value is used to order rows associated with the same time value.

For more information about these fields, see: Expected File Format
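To illustrate the role of the subsort column, the sketch below orders a few hypothetical rows by time and then by subsort value. The rows and the explicit sort are assumptions made for illustration and are not Kaskada's internal implementation; the column names are borrowed from the Purchase example on this page.

```python
from datetime import datetime

# Hypothetical rows; two of the events share the same purchase_time.
rows = [
    {"purchase_time": datetime(2021, 1, 1, 12, 0), "customer_id": "c1", "subsort_id": 1},
    {"purchase_time": datetime(2021, 1, 1, 9, 0), "customer_id": "c2", "subsort_id": 0},
    {"purchase_time": datetime(2021, 1, 1, 12, 0), "customer_id": "c3", "subsort_id": 0},
]

# Order by time first; the subsort value breaks ties between equal times.
ordered = sorted(rows, key=lambda r: (r["purchase_time"], r["subsort_id"]))

print([r["customer_id"] for r in ordered])  # ['c2', 'c3', 'c1']
```

Without the subsort value, the relative order of the two 12:00 rows would be ambiguous.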

Here is an example of creating a table:

table.create_table(
  table_name = "Purchase",
  time_column_name = "purchase_time",
  entity_key_column_name = "customer_id",
  subsort_column_name = "subsort_id",
)

This creates a table named Purchase. Any data loaded into this table must have a timestamp field named purchase_time, a field named customer_id, and a field named subsort_id.
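Before loading data, it can help to confirm that the required fields are present. The following is a minimal sketch, not part of the Kaskada client: the helper name missing_columns and the sample column list are hypothetical, and the check looks only at column names, not at column types.

```python
# Column names taken from the create_table example on this page.
REQUIRED_COLUMNS = ("purchase_time", "customer_id", "subsort_id")

def missing_columns(columns, required=REQUIRED_COLUMNS):
    """Return the required column names that are absent from `columns`."""
    present = set(columns)
    return [name for name in required if name not in present]

# With a Pandas dataframe, pass `df.columns` as the first argument.
print(missing_columns(["purchase_time", "customer_id", "amount"]))  # ['subsort_id']
```

An empty list means every required column name was found.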

Idiomatic Kaskada

We like to use CamelCase to name tables because it helps distinguish data sources from transformed values and function names.

List Tables

The list tables method returns all tables defined for your user. An optional search string filters the results.

Here is an example of listing tables:

table.list_tables(search = "chase")

Get Table

You can get a table using its name:

table.get_table(table_name = "Purchase")

Updating a Table

Tables are currently immutable. To change a table, delete it and then re-create it with the new settings.
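The delete-then-create sequence can be wrapped in a small helper. This is a sketch under stated assumptions: recreate_table and its table_api parameter are hypothetical names, and the helper relies only on the delete_table and create_table calls with the keyword arguments shown on this page. Pass the imported table object as table_api.

```python
def recreate_table(table_api, table_name, time_column_name,
                   entity_key_column_name, subsort_column_name):
    """Delete `table_name`, then create it again with the given columns.

    `table_api` is assumed to be the `table` object imported from kaskada.
    """
    # Deleting the table also deletes any files uploaded to it.
    table_api.delete_table(table_name=table_name)
    return table_api.create_table(
        table_name=table_name,
        time_column_name=time_column_name,
        entity_key_column_name=entity_key_column_name,
        subsort_column_name=subsort_column_name,
    )
```

Because the delete step removes uploaded files, any data must be uploaded again after the table is re-created.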

Deleting a Table

You can delete a table using its name:

table.delete_table(table_name = "Purchase")

Warning

Deleting a table also deletes any files uploaded to it.

Uploading Data

At the moment, a table can hold at most one file.

Going Deeper

Under the hood, file uploads are a multi-part process. The first step is to request an upload URL using the v1alpha/uploadurl API endpoint. The second step is to make an HTTP PUT request to the returned URL including the file to load as the request body.
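The second step can be sketched with Python's standard library. This is an illustration only: build_upload_request is a hypothetical helper, the URL and file contents are placeholders, and any headers or authentication the service requires are not covered here. The first step is omitted because the request and response shape of the v1alpha/uploadurl endpoint is not described on this page.

```python
import urllib.request

def build_upload_request(upload_url, file_bytes):
    """Build (but do not send) an HTTP PUT request carrying the file bytes."""
    return urllib.request.Request(upload_url, data=file_bytes, method="PUT")

# Placeholder values; a real URL would come from the v1alpha/uploadurl endpoint.
request = build_upload_request("https://example.com/upload", b"file contents")
print(request.get_method())  # PUT

# Sending the request would look like this:
# with urllib.request.urlopen(request) as response:
#     print(response.status)
```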

From a Remote File or Dataframe

Data can be loaded from a dataframe into a Kaskada table. Remote files can be read into a dataframe and then uploaded to Kaskada.

import pandas

# A sample Parquet file provided by Kaskada for testing
purchases_url = "https://drive.google.com/uc?export=download&id=1SVLTEjmzjA5f2S_o6w1J15uLYeb0cDMx"

# Read the file into a Pandas DataFrame
purchases = pandas.read_parquet(purchases_url)

# Upload the dataframe's contents to the Purchase table
table.upload_dataframe("Purchase", purchases)

The contents of the dataframe are transferred to Kaskada and added to the Purchase table.

From a Local File

Local files can be uploaded directly to Kaskada without first converting them to a dataframe. However, the files must be in a specific format. See Expected File Format for details.

full_path_to_file = "/content/drive/place/thing/purchases.parquet"
table.upload_file("Purchase", full_path_to_file)

This uploads the contents of the file to the Purchase table.
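A quick check that the path points at an existing file can catch mistakes before the upload starts. The helper below is a hypothetical sketch and not part of the Kaskada client; it verifies only that the file exists, not that its contents match the expected format.

```python
from pathlib import Path

def check_upload_path(path):
    """Return the path as a `Path` if it names an existing file, else raise."""
    candidate = Path(path)
    if not candidate.is_file():
        raise FileNotFoundError(f"No file found at {candidate}")
    return candidate

# Example with a path that is unlikely to exist on most machines:
try:
    check_upload_path("/no/such/dir/purchases.parquet")
except FileNotFoundError as error:
    print(error)
```

When the check passes, convert the returned path to a string and hand it to table.upload_file.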


© Copyright 2021 Kaskada, Inc. All rights reserved.
