metrics
metrics: List[str]
(Optional)
Description
This field specifies the metrics used to evaluate your model. Available metrics vary by task type; specifying a metric that is incompatible with the task results in a validation error.
Supported Task Types
- All
Available Extra Metrics
| Task Type | Options |
| --- | --- |
| Binary Classification | `acc`, `auroc`, `auprc`, `ap`, `f1`, `ndcg`, `ndcg@k`, `precision`, `precision@k`, `recall`, `recall@k`; for k = 1, 10, and 100 |
| Multiclass Classification | `acc`, `f1`, `precision`, `recall` |
| Multilabel Classification | `acc`, `f1`, `precision`, `recall`; `auroc`, `auprc`, `ap` supported only with suffixes `_macro`, `_micro`, and `_per_label` |
| Multilabel Ranking | `f1@k`, `map@k`, `mrr@k`, `ndcg@k`, `precision@k`, `recall@k`, `hit_ratio@k`; for k = 1, 10, and 100 |
| Link Prediction | `f1@k`, `map@k`, `mrr@k`, `ndcg@k`, `precision@k`, `recall@k`, `hit_ratio@k`, `coverage@k`, `avg_popularity@k`, `personalization@k`, `diversity[col_name]@k`; for k = 1, 10, and 100 |
| Regression | `mae`, `mape`, `mse`, `rmse`, `smape` |
| Forecasting | `mae`, `mse`, `rmse`, `smape`, `mape`, `neg_binomial`, `normal`, `lognormal` |
Example
In the case of link prediction, the default metrics are `map@1`, `map@10`, and `map@100`, but you can override the `metrics` field to report a custom cutoff such as `map@12`, as in the snippet below.
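A minimal sketch of such an override, assuming the field is set in a YAML model plan (only the `metrics` field itself is documented on this page; the surrounding plan structure is up to your configuration):

```yaml
metrics: [map@12]
```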
Metric Descriptions
- `precision@k`, i.e. the proportion of recommendations within the top-k that are actually relevant. Higher precision indicates the model’s ability to surface relevant items early in the ranking. (A minimal computation sketch for several of these metrics follows the list.)
- `recall@k`, i.e. the proportion of relevant items that appear within the top-k. Higher recall indicates the model’s ability to retrieve a larger share of the relevant items.
- `map@k` (Mean Average Precision) takes the order of relevant items within the top-k into account, and can therefore provide a more comprehensive view of ranking quality than precision alone.
- `ndcg@k` (Normalized Discounted Cumulative Gain) accounts for the position of relevant items by considering relevance scores, giving higher weight to more relevant items appearing at the top.
- `mrr@k` (Mean Reciprocal Rank), i.e. the mean reciprocal rank of the first correct prediction (or zero if there is none).
- `hit_ratio@k`, i.e. the percentage of users for whom at least one relevant item is present within the top-k recommendations.
- `coverage@k`, i.e. the percentage of unique items recommended across all users within the top-k. Higher coverage indicates wider exploration of the item catalog.
- `avg_popularity@k` measures the model’s tendency to recommend popular items by averaging the training-set popularity scores of the items within the top-k recommendations.
- `personalization@k`, i.e. the dissimilarity of recommendations across different users. Higher personalization suggests that the model tailors recommendations to individual user preferences rather than producing generic results. Dissimilarity is defined as the average inverse cosine similarity between users’ lists of recommendations.
- `diversity[item_col_name]@k` computes the diversity of predictions according to an item category, i.e. the pair-wise inequality of recommendations across item categories. An item category is defined by a categorical column in the item table, e.g. `diversity[product_type]@10`.
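To make these definitions concrete, here is a minimal pure-Python sketch of a few of the metrics above (`precision@k`, `recall@k`, `map@k`, `mrr@k`, `hit_ratio@k`, and `personalization@k`). It illustrates the formulas as described, not the library’s actual implementation; the function names and the `recs`/`relevant` data layout are assumptions made for this example.

```python
from itertools import combinations

# Illustrative sketch only, not the library's implementation.
# `recs` maps each user to a ranked list of recommended item ids;
# `relevant` maps each user to the set of items that are truly relevant.


def precision_at_k(recs, relevant, k):
    """Share of the top-k recommendations that are relevant, averaged over users."""
    return sum(len(set(r[:k]) & relevant[u]) / k for u, r in recs.items()) / len(recs)


def recall_at_k(recs, relevant, k):
    """Share of each user's relevant items found in the top-k, averaged over users."""
    return sum(
        len(set(r[:k]) & relevant[u]) / len(relevant[u]) for u, r in recs.items()
    ) / len(recs)


def map_at_k(recs, relevant, k):
    """Mean Average Precision at k: precision@i averaged over the ranks i of the
    hits, normalized by min(|relevant|, k), then averaged over users."""
    def ap(items, rel):
        hits, score = 0, 0.0
        for rank, item in enumerate(items[:k], start=1):
            if item in rel:
                hits += 1
                score += hits / rank
        return score / min(len(rel), k)

    return sum(ap(r, relevant[u]) for u, r in recs.items()) / len(recs)


def mrr_at_k(recs, relevant, k):
    """Mean reciprocal rank of the first relevant item in the top-k (zero if none)."""
    def rr(items, rel):
        return next((1 / i for i, it in enumerate(items[:k], 1) if it in rel), 0.0)

    return sum(rr(r, relevant[u]) for u, r in recs.items()) / len(recs)


def hit_ratio_at_k(recs, relevant, k):
    """Fraction of users with at least one relevant item in their top-k."""
    return sum(bool(set(r[:k]) & relevant[u]) for u, r in recs.items()) / len(recs)


def personalization_at_k(recs, k):
    """Average dissimilarity between users' top-k lists, treating each list as a
    binary vector over items and interpreting "inverse cosine similarity" as
    1 - cosine similarity (an assumption made for this sketch)."""
    tops = [set(r[:k]) for r in recs.values()]
    pairs = list(combinations(tops, 2))
    return sum(1 - len(a & b) / (len(a) * len(b)) ** 0.5 for a, b in pairs) / len(pairs)


recs = {"u1": ["a", "b", "c"], "u2": ["b", "d", "a"]}
relevant = {"u1": {"a", "c"}, "u2": {"d"}}
print(precision_at_k(recs, relevant, 2))   # 0.5
print(recall_at_k(recs, relevant, 2))      # 0.75
print(map_at_k(recs, relevant, 2))         # 0.5
print(mrr_at_k(recs, relevant, 2))         # 0.75
print(hit_ratio_at_k(recs, relevant, 2))   # 1.0
print(personalization_at_k(recs, 2))       # 0.5
```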