TorchMetrics v0.11 — Multimodal and nominal

Skaftenicki
PyTorch Lightning Developer Blog

--

We are happy to announce that TorchMetrics v0.11 is now publicly available. In v0.11 we have primarily focused on cleaning up after the large classification refactor from v0.10 and on adding new metrics. With v0.11 we are crossing 90 metrics in TorchMetrics, nearing the milestone of 100 metrics.

Classification refactor

First, we would like to highlight that as of v0.11 of TorchMetrics, the classification refactor that was done in v0.10 is now fully implemented and has taken effect. You can read more about the refactor in this blog post. This means that if you have not updated your code since v0.10 (assuming it uses any classification metrics), it will most likely no longer work. If this is the case, we highly recommend you check out the documentation for the classification metrics, which contains updated examples showing how to migrate your code.
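
To give a rough idea of the kind of change involved, here is a minimal sketch assuming a multiclass accuracy use case: classification metrics are now instantiated either through task-specific classes such as MulticlassAccuracy or via the new task argument on the familiar class names.

```python
import torch
from torchmetrics import Accuracy
from torchmetrics.classification import MulticlassAccuracy

preds = torch.tensor([0, 2, 1, 2])
target = torch.tensor([0, 1, 1, 2])

# Pre-refactor style such as Accuracy(num_classes=3) no longer works.
# Either use the task-specific class directly...
acc = MulticlassAccuracy(num_classes=3, average="micro")
print(acc(preds, target))  # tensor(0.7500)

# ...or pass the new `task` argument to the familiar wrapper class.
acc = Accuracy(task="multiclass", num_classes=3)
print(acc(preds, target))  # tensor(0.7500)
```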

New domains

In TorchMetrics we are not only looking to expand already established metric domains such as classification or regression with new metrics, but also to open up entirely new domains. We are therefore happy to report that v0.11 includes two new domains: multimodal and nominal.

Multimodal

If there is one topic within machine learning that is hot right now, it is generative models, and in particular text-to-image generative models. Just recently, Stable Diffusion v2 was released, able to create even more photorealistic images from a single text prompt than ever before.

Tip: Have you tried the Lightning App for self-hosting Stable Diffusion v2?

Example of images generated by Stable Diffusion v2. Image credit: Stability AI

In TorchMetrics v0.11 we are adding a new domain called multimodal to support the evaluation of such models. For now, we are starting out with a single metric, the CLIPScore from this paper, which can be used to evaluate such text-to-image models. CLIPScore currently achieves the highest correlation with human judgment, so a high CLIPScore for an image-text pair means that it is highly plausible that the caption and the image are related to each other.

Example use of the new CLIPScore metric is shown below. We use the Hugging Face transformers framework as a backend for loading the underlying CLIP models; all OpenAI CLIP models are currently supported.
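
A minimal sketch following the documented interface; the random integer tensor simply stands in for a real RGB image with pixel values in the 0-255 range:

```python
import torch
from torchmetrics.multimodal import CLIPScore

# Any OpenAI CLIP checkpoint hosted on Hugging Face can be passed here.
metric = CLIPScore(model_name_or_path="openai/clip-vit-base-patch16")

# A random tensor stands in for a real image of shape (C, H, W).
image = torch.randint(255, (3, 224, 224), generator=torch.manual_seed(42))
score = metric(image, "a photo of a cat")
print(score)  # scaled cosine similarity between the image and text embeddings
```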

Nominal

If you have ever taken a course in statistics or an introduction to machine learning, you have hopefully heard that data attributes come in different types: nominal, ordinal, interval, and ratio. These types essentially describe how values can be compared. Nominal data, for example, can neither be ordered nor measured. The color of your car (blue, red, or green) is nominal data: it makes no sense to compare the different values. Ordinal data can be ordered, but the values have no relative meaning. The safety rating of a car (1, 2, or 3) is ordinal data: we can say that 3 is better than 1, but the numerical values themselves do not mean anything.

In v0.11 of TorchMetrics, we are adding support for classic metrics on nominal data. In fact, 4 new metrics have already been added to this domain:

  • CramersV
  • PearsonsContingencyCoefficient
  • TheilsU
  • TschuprowsT

All metrics are measures of association between two nominal variables, giving a value between 0 and 1, with 1 meaning that there is a perfect association between the variables.

Example use of the new CramersV metric.
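
A minimal sketch of that usage, with toy data chosen so the two nominal variables are strongly associated (the variable names are only illustrative):

```python
import torch
from torchmetrics.nominal import CramersV

_ = torch.manual_seed(42)

# Two nominal variables with 5 categories each; the labels are mostly
# identical, so the association (and hence the score) should be close to 1.
preds = torch.randint(0, 5, (100,))
target = preds.clone()
target[::4] = torch.randint(0, 5, (25,))  # corrupt every 4th label

cramers_v = CramersV(num_classes=5)
print(cramers_v(preds, target))  # a value between 0 and 1
```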

Small improvements

In addition to the metrics within the two new domains, v0.11 of TorchMetrics contains other smaller changes and fixes:

  • TotalVariation metric has been added to the image package; it measures the complexity of an image with respect to its spatial variation.
  • MulticlassExactMatch metric has been added to the classification package; it can, for example, be used to measure sentence-level accuracy, where all tokens need to match for a sentence to be counted as correct (see the sketch after this list).
  • KendallRankCorrCoef has been added to the regression package for measuring the rank correlation between two variables.
  • LogCoshError has been added to the regression package for measuring the residual error between two variables. It behaves like the mean squared error for residuals close to 0 but like the mean absolute error for residuals far from 0.
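
A minimal sketch of the exact-match use case, with hypothetical toy data where each row plays the role of a tokenized sentence:

```python
import torch
from torchmetrics.classification import MulticlassExactMatch

# Two "sentences" of six tokens each, drawn from three token classes.
target = torch.tensor([[0, 1, 2, 1, 0, 2],
                       [1, 1, 0, 2, 1, 2]])
preds = torch.tensor([[0, 1, 2, 1, 0, 2],   # all tokens match -> exact match
                      [1, 1, 0, 2, 2, 2]])  # one token differs -> no match

metric = MulticlassExactMatch(num_classes=3)
print(metric(preds, target))  # tensor(0.5): 1 of 2 sentences fully matches
```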

Finally, TorchMetrics now only supports PyTorch v1.8 and higher. Raising the minimum version from v1.3 was necessary because we kept running into compatibility issues with older versions of PyTorch. We strive to support as many versions of PyTorch as possible, but for the best experience we always recommend keeping both PyTorch and TorchMetrics up to date.

Stay tuned

As a little teaser for our next release, we are working on adding native support for plotting directly in Torchmetrics with a simple interface (metric.plot()), as some metrics are better visualized than described numerically. Feel free to give input on this issue or this PR.

Thank you!

Big thanks to all our community members for their contributions and feedback. If you have any recommendations for the next metric that we should try to tackle, please open an issue in the repo.

We are happy to see continued adoption of TorchMetrics in over 5,500 projects, and this release also marks the point where we crossed 1,200 GitHub stars.

Finally, if you would like to give open source a try, we have the #new_contributors and #metrics channels on the general PyTorch Lightning Slack, where you can ask questions and get guidance.
