Loading Data

How to create a table and load some data into it.

Setup Required

The following examples assume you've already completed Client Setup.

Kaskada stores data in tables. Each table consists of multiple rows, and every row in a table is a value of the same type.

Creating a Table

When creating a table, you must provide some information about how each row should be interpreted. You must describe:

  • A field containing the time associated with each row. The time should refer to when the event occurred.
  • An initial entity key associated with each row. The entity should identify a thing in the world that each event is associated with. Don't worry too much about picking the "right" value here - it's easy to change the entity key in Fenl.
  • A subsort column associated with each row. This value is used to order rows associated with the same time value.

For more information about these fields, see: Expected File Format
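To make the role of the subsort column concrete, here is a minimal pandas sketch. It is an illustration only: the sample rows and the `amount` column are made up, and the sort simply mimics the ordering described above rather than showing how Kaskada orders rows internally.

```python
import pandas as pd

# Hypothetical sample rows; the column names match the Purchase example in this guide.
events = pd.DataFrame(
    {
        "purchase_time": pd.to_datetime(
            ["2021-01-01 10:00:00", "2021-01-01 10:00:00", "2021-01-01 09:00:00"]
        ),
        "customer_id": ["alice", "alice", "bob"],
        "subsort_id": [1, 0, 0],
        "amount": [5, 7, 3],
    }
)

# Order by time first, then use the subsort value to break ties
# between rows that share the same timestamp.
ordered = events.sort_values(["purchase_time", "subsort_id"]).reset_index(drop=True)
print(ordered)
```

The two rows for `alice` share a timestamp, so only the subsort value decides which one comes first.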

from kaskada import table

table.create_table(
  table_name = "Purchase",
  time_column_name = "purchase_time",
  entity_key_column_name = "customer_id",
  subsort_column_name = "subsort_id",
)

This creates a table named Purchase. Any data loaded into this table must have a timestamp field named purchase_time, a field named customer_id, and a field named subsort_id.
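Before uploading, it can help to check that a dataframe actually carries these three fields. The sketch below is one way to do that with pandas alone; `missing_purchase_columns` is a hypothetical helper written for this guide, not part of the Kaskada client, and the sample data is made up.

```python
import pandas as pd
from pandas.api.types import is_datetime64_any_dtype

# Column names taken from the create_table call above.
REQUIRED_COLUMNS = ["purchase_time", "customer_id", "subsort_id"]


def missing_purchase_columns(df: pd.DataFrame) -> list:
    """Return the required column names that are absent from df (hypothetical helper)."""
    return [name for name in REQUIRED_COLUMNS if name not in df.columns]


# Hypothetical data shaped like a Purchase upload.
candidate = pd.DataFrame(
    {
        "purchase_time": pd.to_datetime(["2021-01-01 10:00:00", "2021-01-02 11:30:00"]),
        "customer_id": ["alice", "bob"],
        "subsort_id": [0, 0],
    }
)

missing = missing_purchase_columns(candidate)
has_timestamp = is_datetime64_any_dtype(candidate["purchase_time"])
print("missing columns:", missing, "| timestamp column:", has_timestamp)
```

An empty `missing` list and a true `has_timestamp` suggest the dataframe matches the table definition, although the service may apply further checks of its own.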

Idiomatic Kaskada

We like to use CamelCase to name tables because it helps distinguish data sources from transformed values and function names.

Now that we've created a table, we're ready to load some data into it.

Loading Data

Data can be loaded from a dataframe into a Kaskada table.

import pandas

# A sample Parquet file provided by Kaskada for testing
purchases_url = "https://drive.google.com/uc?export=download&id=1SVLTEjmzjA5f2S_o6w1J15uLYeb0cDMx"

# Read the file into a Pandas DataFrame
purchases = pandas.read_parquet(purchases_url)

# Upload the dataframe's contents to the Purchase table
table.upload_dataframe("Purchase", purchases)

The dataframe's contents are transferred to Kaskada and added to the Purchase table.

Inspecting the Table's Contents

To verify that the data was loaded as expected, use the table list endpoint to see all the tables defined for your user and the files loaded into each:

table.list_tables()

Executing this block returns every table that has been defined:

{
  "tables": [
    {
      "tableId": "31112aca11d0e9e6eb7db96f317dda49",
      "tableName": "Purchase",
      "timeColumnName": "purchase_time",
      "groupColumnName": "customer_id",
      "create_time": {
         "seconds": 1630686464,
         "nanos": 608298056
      },
      "update_time": {
         "seconds": 1630686476,
         "nanos": 821819136
      }
    }
  ]
}
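The `create_time` and `update_time` values are expressed as seconds and nanoseconds since the Unix epoch. The sketch below shows one way to turn them into readable timestamps. It assumes the response has already been converted to a plain Python dict shaped like the output above; the actual return type of `table.list_tables()` may differ, and `to_datetime` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone

# Assumption: a plain dict mirroring the sample output above.
response = {
    "tables": [
        {
            "tableId": "31112aca11d0e9e6eb7db96f317dda49",
            "tableName": "Purchase",
            "timeColumnName": "purchase_time",
            "groupColumnName": "customer_id",
            "create_time": {"seconds": 1630686464, "nanos": 608298056},
            "update_time": {"seconds": 1630686476, "nanos": 821819136},
        }
    ]
}


def to_datetime(ts: dict) -> datetime:
    """Convert a {"seconds", "nanos"} pair to a UTC datetime (microsecond precision)."""
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    return epoch + timedelta(seconds=ts["seconds"], microseconds=ts["nanos"] // 1000)


# Find the Purchase entry and decode its timestamps.
purchase = next(t for t in response["tables"] if t["tableName"] == "Purchase")
created = to_datetime(purchase["create_time"])
updated = to_datetime(purchase["update_time"])
print("created:", created.isoformat())
print("time between create and last update:", updated - created)
```

In this sample the table was updated roughly twelve seconds after it was created, which is consistent with the upload that followed table creation.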

For more help with tables and loading data, see Reference - Working with Tables


© Copyright 2021 Kaskada, Inc. All rights reserved.