Summary

Completed

Note

See the Text and images tab for more details!

Computer vision is built on the analysis and manipulation of numeric pixel values in images. Machine learning models are trained using a large volume of images to enable common computer vision scenarios, such as image classification, object detection, semantic segmentation, caption generation, and others.

The models used for computer vision tasks have evolved from statistics-based image classifiers through convolutional neural networks to today's transformer-based multimodal models. Cutting-edge models can not only interpret visual input, but also generate visual output.

Tip

For more information, see What is Computer Vision?.