Kaskada stores data in tables. Tables consist of multiple rows, and each row is a value of the same type.
All methods on this page are called on the table object. Be sure to import it before running any method:
from kaskada import table
When creating a table, you must provide some information about how each row should be interpreted. You must describe:
- A field containing the time associated with each row. The time should refer to when the event occurred.
- An initial entity key associated with each row. The entity should identify a thing in the world that each event is associated with.
- A subsort column associated with each row. This value is used to order rows associated with the same time value.
For more information about these fields, see: Expected File Format
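To make the role of the subsort value concrete, here is a minimal sketch in plain Python. It is not Kaskada's implementation: the sample rows, the `amount` field, and the helper `order_rows` are illustrative assumptions. It only shows how a subsort value could break ties between rows that share the same time value.

```python
from datetime import datetime

# Hypothetical rows: each carries a time, an entity key, and a subsort value.
rows = [
    {"purchase_time": datetime(2022, 1, 1, 12, 0), "customer_id": "karen", "subsort_id": 1, "amount": 9},
    {"purchase_time": datetime(2022, 1, 1, 9, 0), "customer_id": "patrick", "subsort_id": 0, "amount": 3},
    {"purchase_time": datetime(2022, 1, 1, 12, 0), "customer_id": "karen", "subsort_id": 0, "amount": 5},
]


def order_rows(rows, time_column_name, subsort_column_name):
    """Order rows by time, using the subsort value to break ties."""
    return sorted(rows, key=lambda r: (r[time_column_name], r[subsort_column_name]))


ordered = order_rows(rows, "purchase_time", "subsort_id")
amounts = [r["amount"] for r in ordered]
```

The two rows stamped 12:00 have the same time, so only the subsort value determines which one comes first.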
Here is an example of creating a table:
table.create_table(
    table_name = "Purchase",
    time_column_name = "purchase_time",
    entity_key_column_name = "customer_id",
    subsort_column_name = "subsort_id",
)
This creates a table named Purchase. Any data loaded into this table must have a timestamp field named purchase_time, a field named customer_id, and a field named subsort_id.
Idiomatic Kaskada
We like to use CamelCase to name tables because it helps distinguish data sources from transformed values and function names.
The list tables method returns all tables defined for your user. An optional search string filters the response set.
Here is an example of listing tables:
table.list_tables(search = "chase")
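This page does not say how the search string is matched. As a rough mental model, and assuming a plain case-sensitive substring match (an assumption, not confirmed behavior), the filter could be pictured as below. `filter_table_names` and the sample table names are hypothetical and are not part of the Kaskada client.

```python
def filter_table_names(table_names, search=None):
    """Return names containing `search`; return all names when no search is given."""
    if search is None:
        return list(table_names)
    # Assumption: plain, case-sensitive substring matching.
    return [name for name in table_names if search in name]


names = ["Purchase", "PageView", "Refund"]
matches = filter_table_names(names, search="chase")
```

Under that assumption, searching for "chase" keeps Purchase and drops the other two names.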
You can get a table using its name:
table.get_table(table_name = "Purchase")
Tables are currently immutable. Updating a table requires deleting that table and then re-creating it with a new expression.
You can delete a table using its name:
table.delete_table(table_name = "Purchase")
Note that deleting a table also deletes any files uploaded to it.
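Since tables cannot be edited in place, an update amounts to a delete followed by a create. The sketch below uses only the delete_table and create_table calls shown on this page; the wrapper `recreate_table` is a hypothetical helper, not part of the Kaskada client. The table object is passed in as an argument so the function can be tried without a Kaskada connection.

```python
def recreate_table(table_api, table_name, time_column_name,
                   entity_key_column_name, subsort_column_name):
    """Delete a table, then create it again with the given settings.

    `table_api` is expected to be the object imported with
    `from kaskada import table`.
    """
    # Deleting the table also deletes any files uploaded to it.
    table_api.delete_table(table_name=table_name)
    return table_api.create_table(
        table_name=table_name,
        time_column_name=time_column_name,
        entity_key_column_name=entity_key_column_name,
        subsort_column_name=subsort_column_name,
    )
```

Because the delete step removes uploaded files, any data would need to be uploaded again after the table is re-created.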
Note that at the moment, a table can have at most a single file.
Going Deeper
Under the hood, file uploads are a two-step process. The first step is to request an upload URL using the v1alpha/uploadurl API endpoint. The second step is to make an HTTP PUT request to the returned URL, with the file to load as the request body.
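The second step can be sketched with Python's standard library. This is an illustration only: the request and response shape of the upload URL endpoint is not described here, so the first step is left out, and the URL below is a placeholder standing in for whatever that endpoint returns. `build_put_request` and `put_file` are hypothetical helpers.

```python
import urllib.request


def build_put_request(upload_url, file_bytes):
    """Build (but do not send) an HTTP PUT request with the file as its body."""
    return urllib.request.Request(upload_url, data=file_bytes, method="PUT")


def put_file(upload_url, path):
    """Read a local file and PUT its contents to the given upload URL."""
    with open(path, "rb") as f:
        request = build_put_request(upload_url, f.read())
    with urllib.request.urlopen(request) as response:
        return response.status


# Placeholder URL and body; nothing is sent over the network here.
example_request = build_put_request("https://example.com/upload-target", b"parquet bytes")
```

In normal use the upload methods described below take care of both steps, so a sketch like this is only relevant when calling the API directly.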
Data can be loaded from a dataframe into a Kaskada table. Remote files can be read into a dataframe and then uploaded to Kaskada.
import pandas

# A sample Parquet file provided by Kaskada for testing
purchases_url = "https://drive.google.com/uc?export=download&id=1SVLTEjmzjA5f2S_o6w1J15uLYeb0cDMx"

# Read the file into a Pandas dataframe
purchases = pandas.read_parquet(purchases_url)

# Upload the dataframe's contents to the Purchase table
table.upload_dataframe("Purchase", purchases)
The contents of the dataframe are transferred to Kaskada and added to the Purchase table.
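Because data loaded into the Purchase table must contain the columns named when the table was created, it may help to check the column names before uploading. The helper below is a hypothetical sketch, not part of the Kaskada client, and it checks names only, not column types.

```python
def missing_columns(columns, required):
    """Return the required column names absent from `columns`, in the order given."""
    present = set(columns)
    return [name for name in required if name not in present]


required = ["purchase_time", "customer_id", "subsort_id"]

# With a Pandas dataframe, pass its `columns` attribute as the first argument.
gaps = missing_columns(["purchase_time", "customer_id", "amount"], required)
```

An empty result means every required column name is present; here `gaps` lists the one column the sample data lacks.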
Local files can be uploaded directly to Kaskada without first converting them to a dataframe. However, the files must be in a specific format. See Expected File Format for details.
full_path_to_file = "/content/drive/place/thing/purchases.parquet"
table.upload_file("Purchase", full_path_to_file)
This uploads the contents of the file to the Purchase table.