Once you are satisfied with the performance of your predictive query on historical data, you can generate batch predictions. This is done by navigating to New > Prediction and selecting a model to run.
To run batch predictions, follow the steps below. To view the details of an existing batch prediction job, open the “Predictions” page and click the job’s ID in the “Batch Prediction Job Id” column.
After training, you may want predictions for only a specific subset of entities. Kumo lets you filter target entities by refining the dataset used for batch predictions. Applying filters at batch prediction time lets you change which entities are scored without retraining the model.
Example 1: Changing Entity Filters
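The following is an illustrative sketch in Kumo’s predictive query language, assuming a hypothetical schema with customers and transactions tables. A query trained over all customers can be scoped to a subset of entities at prediction time.

Query as trained (all customers):

    PREDICT COUNT(transactions.*, 0, 30, days) > 0
    FOR EACH customers.customer_id

At batch prediction time, restricted to a hypothetical segment:

    PREDICT COUNT(transactions.*, 0, 30, days) > 0
    FOR EACH customers.customer_id WHERE customers.country = 'US'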
Example 2: Adjusting Target Conditions
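Similarly, and again as an illustrative sketch over the same hypothetical schema, the target condition can be adjusted at prediction time, here raised from “at least one transaction” to “more than five transactions” in the 30-day window:

    PREDICT COUNT(transactions.*, 0, 30, days) > 5
    FOR EACH customers.customer_id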
Specify the anchor time for your predictions, i.e., the timestamp from which predictions are made (e.g., 2024-02-27). If left blank, Kumo defaults to the latest timestamp in the fact table. You’ll also need to specify some additional settings depending on your prediction type. For example, a binary classification task requires a threshold (e.g., 0.5) that determines the point at which an entity is considered part of the target class.
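For example, if your query predicts activity over the next 30 days, an anchor time of 2024-02-27 would produce predictions covering roughly 2024-02-27 through 2024-03-28.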
Your editable options (e.g., Threshold for Binary Classification) will depend on the type of prediction task at hand.
Output Destination
Specify where predictions should be stored. Available destinations:
AWS S3 (CSV, Parquet, or partitioned Parquet format)
Snowflake (overwrites existing table rows)
Note: The user account that you used to create the Snowflake connector must have permissions to create tables in Snowflake.
BigQuery (appends predictions to an existing table)
Note: The user account that you used to create the BigQuery connector must have permissions to create tables in your BigQuery data warehouse.
Local Download (sample output up to 1GB)
Specify the number of parallel workers (up to 4) to speed up batch predictions for large datasets.
Choose the type of output:
Predictions - The predicted target values for the selected entities.
Embeddings - Numerical vectors of entities capturing their behavioral patterns.
You’ll need to either specify an output directory (for S3) or a table name (for Snowflake/BigQuery), depending on your output destination.
If you select “Local Download Only,” you can download a sample of the outputs (up to 1GB). You’ll also need to choose a file format for your prediction outputs: Parquet or CSV.
The resulting table will contain a column for the entity id, columns for predicted values or embeddings, and a timestamp column if relevant.
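For illustration only (hypothetical column names and values; the actual schema depends on your task and query), a binary classification output might look like:

    ENTITY    ANCHOR_TIMESTAMP    TARGET_PRED    SCORE
    1001      2024-02-27          true           0.83
    1002      2024-02-27          false          0.12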
You can later download a sample batch prediction output—even if you choose to write predictions to another data source.
You can choose to output both predictions and embeddings.
Once configured, click Start Predicting to launch the batch prediction job.
You will be redirected to the batch prediction job details page, where you can monitor progress and download output samples.
See Batch Prediction Outputs for more details on the output contents.