Loading Data

How to create a table and load some data into it.

Setup Required

The following examples assume you've already completed Client Setup.

Kaskada stores data in tables. A table consists of multiple rows, and every row in a table has the same type.

Creating a Table

When creating a table, you must provide some information about how each row should be interpreted. You must describe:

  • A field containing the time associated with each row (time_column_name). The time should refer to when the event occurred.
  • An initial entity key associated with each row (entity_key_column_name). The entity should identify a thing in the world that each event is associated with. Don't worry too much about picking the "right" value here - it's easy to change the entity key in Fenl.
  • A subsort column associated with each row (subsort_column_name). This value is used to order rows associated with the same time value.

For more information about these fields, see: Expected File Format
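To make the roles of these three columns concrete, here is a minimal sketch in plain Python. It only illustrates the idea of grouping events by entity and ordering them by time, with the subsort value breaking ties; it is not Kaskada's implementation. The sample rows, the amount field, and the order_by_entity helper are hypothetical.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical purchase events; two of alice's events share the same time.
rows = [
    {"purchase_time": datetime(2021, 1, 1, 12, 0), "customer_id": "alice", "subsort_id": 1, "amount": 7},
    {"purchase_time": datetime(2021, 1, 1, 9, 0), "customer_id": "bob", "subsort_id": 0, "amount": 3},
    {"purchase_time": datetime(2021, 1, 1, 12, 0), "customer_id": "alice", "subsort_id": 0, "amount": 5},
]


def order_by_entity(rows, time_col, entity_col, subsort_col):
    """Group rows by entity, then order each group by (time, subsort)."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row[entity_col]].append(row)
    for events in grouped.values():
        # The subsort value only matters when two rows have the same time.
        events.sort(key=lambda r: (r[time_col], r[subsort_col]))
    return dict(grouped)


ordered = order_by_entity(rows, "purchase_time", "customer_id", "subsort_id")
alice_amounts = [r["amount"] for r in ordered["alice"]]
print(alice_amounts)  # [5, 7]
```

Without the subsort column, the relative order of alice's two 12:00 events would be ambiguous.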

from kaskada import table

table.create_table(
  table_name = "Purchase",
  time_column_name = "purchase_time",
  entity_key_column_name = "customer_id",
  subsort_column_name = "subsort_id",
)

This creates a table named Purchase. Any data loaded into this table must have a timestamp field named purchase_time, a field named customer_id, and a field named subsort_id.
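Because these three columns are required, it can help to check for them before uploading anything. The following is a small sketch of such a check; missing_columns is a hypothetical helper, not part of the Kaskada client.

```python
# Column names required by the Purchase table defined above.
REQUIRED_COLUMNS = ("purchase_time", "customer_id", "subsort_id")


def missing_columns(columns, required=REQUIRED_COLUMNS):
    """Return the required column names that are absent from `columns`."""
    present = set(columns)
    return [name for name in required if name not in present]


print(missing_columns(["purchase_time", "customer_id", "amount"]))  # ['subsort_id']
print(missing_columns(["purchase_time", "customer_id", "subsort_id"]))  # []
```

With a Pandas dataframe df, the same check could be run as missing_columns(df.columns).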

Idiomatic Kaskada

We like to use CamelCase to name tables because it helps distinguish data sources from transformed values and function names.

The response from create_table is a table object with contents similar to:

table {
  table_id: "76b***2e5"
  table_name: "Purchase"
  time_column_name: "purchase_time"
  entity_key_column_name: "customer_id"
  subsort_column_name: "subsort_id"
  create_time {
    seconds: 1634250064
    nanos: 422017488
  }
  update_time {
    seconds: 1634250064
    nanos: 422017488
  }
}
request_details {
  request_id: "fe6bed41fa29cea6ca85fe20bea6ef4a"
}

Note that the response also includes a request_id. A request_id is returned from all requests, whether they succeed or fail. If you include the request_id when contacting support about an issue, the support engineer can look up additional details about your request and get to the root cause faster.
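The create_time and update_time fields in the response above are printed as a seconds/nanos pair. Assuming these count from the Unix epoch in UTC (the usual convention for this kind of timestamp, but an assumption here), they can be turned into a readable datetime roughly like this. The to_datetime helper is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def to_datetime(seconds, nanos):
    """Convert a seconds/nanos pair (assumed to count from the Unix epoch) to a UTC datetime."""
    # datetime has microsecond resolution, so the nanoseconds are truncated.
    base = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return base + timedelta(microseconds=nanos // 1000)


# Values copied from the create_time field in the sample response.
created = to_datetime(1634250064, 422017488)
print(created.isoformat())  # 2021-10-14T22:21:04.422017+00:00
```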

Loading Data

Now that we've created a table, we're ready to load some data into it.

Data can be loaded into a table in multiple ways. In this example we'll load the contents of a Pandas dataframe into the table. To learn about the different ways data can be loaded into a table, see the "Uploading Data" section of the "Working with Data" page.

import pandas

# A sample Parquet file provided by Kaskada for testing
purchases_url = "https://drive.google.com/uc?export=download&id=1SLdIw9uc0RGHY-eKzS30UBhN0NJtslkk"

# Read the file into a Pandas Dataframe
purchases = pandas.read_parquet(purchases_url)

# Upload the dataframe's contents to the Purchase table
table.upload_dataframe("Purchase", purchases)

Running upload_dataframe returns a data_token_id. The data token ID is a unique reference to the data currently stored in the system.

data_token_id: "aa2***a6b9"
request_details {
  request_id: "fe6bed41fa29cea6ca85fe20bea6ef4b"
}

The dataframe's contents are transferred to Kaskada and added to the table.

Inspecting the Table's Contents

To verify the file was loaded as expected, use the table list endpoint to see all the tables defined for your user and the files loaded into each:

table.list_tables()

list_tables returns a list of all the tables accessible to the user. The table created above is shown here:

tables {
  table_id: "76b***2e5"
  table_name: "Purchase"
  time_column_name: "purchase_time"
  entity_key_column_name: "customer_id"
  subsort_column_name: "subsort_id"
  create_time {
    seconds: 1634067588
    nanos: 312567086
  }
  update_time {
    seconds: 1634067603
    nanos: 70745776
  }
  version: 1
}
request_details {
  request_id: "fe6bed41fa29cea6ca85fe20bea6ef4c"
}

Running this call returns every table that has been defined.
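When many tables are defined, it may be convenient to pick one out of the response by name. The sketch below assumes the response exposes its entries under a tables attribute with the field names printed above; the real client object may differ, so a stand-in built with SimpleNamespace is used here. The find_table helper is hypothetical.

```python
from types import SimpleNamespace

# Stand-in for the list_tables response, mirroring the fields printed above.
# The attribute layout is an assumption, not the confirmed client API.
response = SimpleNamespace(
    tables=[
        SimpleNamespace(
            table_name="Purchase",
            time_column_name="purchase_time",
            entity_key_column_name="customer_id",
            subsort_column_name="subsort_id",
        )
    ]
)


def find_table(response, name):
    """Return the first table entry whose table_name matches, or None."""
    return next((t for t in response.tables if t.table_name == name), None)


purchase = find_table(response, "Purchase")
if purchase is None:
    print("Purchase table not found")
else:
    print(purchase.time_column_name, purchase.entity_key_column_name)
```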

For more help with tables and loading data, see Reference - Working with Tables


© Copyright 2021 Kaskada, Inc. All rights reserved.