Taskonomy

Transfer Learning API

Visit the live API to find a supervision-efficient transfer strategy for your specified arguments.

API Page

Live Demo of 20 Tasks

Upload an image and see live results for 20 distinct vision tasks in the Task Bank, all output from our networks.

Live Demo

Paper

The paper and supplementary material describing the methodology and evaluation.

PDF

Transfer Visualization

Examine any combination of source to target transfers via sample videos.

Transfer Visualization Page

Models

Download pretrained models of the method.

Pretrained Models

Data

Download the data: almost 4M multi-annotated images of indoor spaces.

Dataset

Abstract

Do visual tasks have a relationship, or are they unrelated? For instance, could having surface normals simplify estimating the depth of an image? Intuition answers these questions positively, implying the existence of a "structure" among visual tasks. Knowing this structure has notable value; it is the concept underlying transfer learning and provides a principled way of identifying redundancies across tasks, e.g., to seamlessly reuse supervision among related tasks or solve many tasks in one system without piling up complexity.
We propose a fully computational approach for modeling the structure of the space of visual tasks. This is done by finding (first- and higher-order) transfer learning dependencies across a dictionary of twenty-six 2D, 2.5D, 3D, and semantic tasks in a latent space. The product is a computational taxonomic map for task transfer learning. We study the consequences of this structure, e.g., nontrivial emergent relationships, and exploit them to reduce the demand for labeled data. For example, we show that the total number of labeled datapoints needed for solving a set of 10 tasks can be reduced by roughly 2/3 (compared to training independently) while keeping the performance nearly the same. We provide a set of tools for computing and probing this taxonomical structure, including a solver that users can employ to devise efficient supervision policies for their use cases.

Process overview. The steps involved in creating the taxonomy.

Transfer learning relationships across perceptual tasks, found fully computationally using Taskonomy.

Force Atlas plot of tasks, showing tasks moving from isolated random locations to positions determined by their computed relationships.

Transfer Learning API

The provided API uses our results to recommend a set of transfers for your specified arguments. By using these transfers, we can obtain results close to those of a fully supervised network using substantially less data.
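To give a feel for the kind of question the API answers, here is a minimal sketch; it is not the actual solver. It assumes a hypothetical table `transfer_score[source][target]` with values in [0, 1] (1 meaning the transfer matches full supervision) and a budget on how many tasks may be trained with full supervision, and it picks the source set by brute force. The task names, the scores, and the function `best_sources` are all illustrative assumptions; the paper formulates the real selection as a Boolean Integer Program that also covers higher-order transfers.

```python
from itertools import combinations

# Hypothetical scores: transfer_score[source][target] in [0, 1].
transfer_score = {
    "normals":   {"depth": 0.9, "reshading": 0.8, "edges": 0.6},
    "depth":     {"normals": 0.7, "reshading": 0.7, "edges": 0.5},
    "reshading": {"normals": 0.8, "depth": 0.8, "edges": 0.4},
    "edges":     {"normals": 0.3, "depth": 0.3, "reshading": 0.2},
}

def best_sources(scores, budget):
    """Return (sources, plan, total) maximizing the summed score.

    A task chosen as a source is fully supervised and scores 1.0;
    every other task is served by its best-scoring chosen source.
    """
    tasks = sorted(scores)
    best = None
    for sources in combinations(tasks, budget):
        plan, total = {}, 0.0
        for target in tasks:
            if target in sources:
                plan[target] = None  # trained with full supervision
                total += 1.0
            else:
                src = max(sources, key=lambda s: scores[s][target])
                plan[target] = src
                total += scores[src][target]
        if best is None or total > best[2]:
            best = (sources, plan, total)
    return best

sources, plan, total = best_sources(transfer_score, budget=1)
print(sources, plan, round(total, 2))
```

Brute force is only workable for a handful of tasks, since the number of candidate source sets grows combinatorially; it is used here purely to keep the sketch short.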

Example taxonomies. Generated from the API.

Transfer Visualization

To evaluate the quality of the learned transfer functions, we ran the transfer networks on a random YouTube video. Visit the Transfer Visualization page to analyze how well different sources transfer to a target, or how well a source transfers to different targets. You can compare the results to a fully supervised network as well as to baselines trained on ImageNet or not employing transfer learning at all.

Transfer Function. We use one or multiple source tasks to predict a target task's output.
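The idea of a transfer function can be illustrated with a toy sketch: keep one or more source representations frozen and train only a small readout for the target task. Everything below is an assumption made for illustration: the "encoders" are fixed scalar functions rather than pretrained networks, the readout is linear, and the target task is the made-up function y = 3x^2 + 1.

```python
# Hypothetical frozen "source encoders": fixed functions from an input
# to a feature vector. In practice these would be pretrained networks.
def encoder_a(x):
    return [x]

def encoder_b(x):
    return [x * x]

def represent(x, encoders):
    """Concatenate the frozen representations of one or more sources."""
    feats = [1.0]  # bias term
    for enc in encoders:
        feats.extend(enc(x))
    return feats

def train_readout(samples, encoders, lr=0.1, steps=2000):
    """Fit a linear readout on frozen features by gradient descent on MSE."""
    dim = len(represent(samples[0][0], encoders))
    w = [0.0] * dim
    n = len(samples)
    for _ in range(steps):
        grad = [0.0] * dim
        for x, y in samples:
            f = represent(x, encoders)
            err = sum(wi * fi for wi, fi in zip(w, f)) - y
            for i in range(dim):
                grad[i] += 2.0 * err * f[i] / n
        w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return w

def predict(x, w, encoders):
    return sum(wi * fi for wi, fi in zip(w, represent(x, encoders)))

# A small labeled set for the toy target task y = 3x^2 + 1.
train = [(i / 4.0, 3.0 * (i / 4.0) ** 2 + 1.0) for i in range(-4, 5)]

w_single = train_readout(train, [encoder_a])            # one source
w_multi = train_readout(train, [encoder_a, encoder_b])  # two sources
```

In this toy setup the single-source readout cannot represent the target, while the two-source readout can, which mirrors why a combination of sources may transfer better than any one of them.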

Task Bank: A Unified Bank of 25 Pretrained Visual Estimators

Click on each task to see sample results.
Try the live demo on your query image.
Download pretrained models in the bank.

Denoising Autoencoder

Uncorrupted version of corrupted image.
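As a rough illustration of how a training pair for this task could be formed, the sketch below corrupts a clean image and keeps the original as the target. The Gaussian noise model, the `sigma` value, and the function name `corrupt` are assumptions; the page does not say which corruption is used.

```python
import random

def corrupt(image, sigma=0.1, seed=0):
    """Add Gaussian noise to a grayscale image given as rows of floats in [0, 1]."""
    rng = random.Random(seed)
    return [[min(1.0, max(0.0, p + rng.gauss(0.0, sigma))) for p in row]
            for row in image]

clean = [[0.2, 0.4], [0.6, 0.8]]
noisy = corrupt(clean)
# (noisy, clean) is one training pair: the network sees `noisy`
# and is trained to output `clean`.
```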

Surface Normals

Pixel-wise surface normals.
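Surface normals and depth are closely related, which is part of the intuition behind task structure. The sketch below estimates normals from a depth map with central differences. It assumes an orthographic view, unit pixel spacing, and one particular sign convention, and it is not how the dataset's normals were produced (those come from the aligned meshes).

```python
import math

def normals_from_depth(depth):
    """Estimate a unit normal per interior pixel from a depth map z(x, y).

    Uses central differences and n proportional to (-dz/dx, -dz/dy, 1);
    border pixels are left as None.
    """
    h, w = len(depth), len(depth[0])
    normals = [[None] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            dzdx = (depth[y][x + 1] - depth[y][x - 1]) / 2.0
            dzdy = (depth[y + 1][x] - depth[y - 1][x]) / 2.0
            norm = math.sqrt(dzdx * dzdx + dzdy * dzdy + 1.0)
            normals[y][x] = (-dzdx / norm, -dzdy / norm, 1.0 / norm)
    return normals

# A tilted plane z = 2x: every interior normal should be the same.
plane = [[2.0 * x for x in range(4)] for _ in range(4)]
normals = normals_from_depth(plane)
```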

Z-buffer Depth

Depth estimation.

Colorization

Colorizing input grayscale images.
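A training pair for colorization can be built from any color image by discarding its color. The sketch below does this with the common Rec. 601 luma weights; the choice of weights and the function name `to_grayscale` are assumptions, since the page does not specify how the grayscale inputs are made.

```python
def to_grayscale(image):
    """Convert rows of (r, g, b) tuples in [0, 1] to luma values."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row]
            for row in image]

color = [[(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)],
         [(0.0, 0.0, 1.0), (1.0, 1.0, 1.0)]]
gray = to_grayscale(color)
# (gray, color) is one training pair: input `gray`, target `color`.
```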

Reshading

Reshading with new lighting placed at camera location.

Room Layout

Orientation and aspect ratio of cubic room layout.

Camera Pose (fixated)

Relative camera pose with matching optical centers.

Camera Pose (nonfix.)

Relative camera pose with distinct optical centers.

Vanishing Points

Three Manhattan-world vanishing points.

Curvatures

Magnitude of 3D principal curvatures.

Unsupervised 2D Segm.

Segmentation (graph cut approximation) on RGB.

Unsupervised 2.5D Segm.

Segmentation (graph cut approximation) on RGB-D-Normals-Curvature image.

3D Keypoints

3D Keypoint estimation from underlying scene 3D.

2D Keypoints

Keypoint estimation from RGB-only (texture features).

Occlusion Edges

Edges which occlude parts of the scene.

Texture Edges

Edges computed from RGB only (texture edges).

Inpainting

Filling in masked center of image.
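The input for this task is an image whose center has been hidden. A minimal sketch of that masking step follows; the box size (`frac`), the fill value, and the function name `mask_center` are assumptions chosen for illustration.

```python
def mask_center(image, frac=0.5, fill=0.0):
    """Return a copy of `image` with a centered box replaced by `fill`,
    plus a same-sized 0/1 mask marking the hidden pixels."""
    h, w = len(image), len(image[0])
    bh, bw = int(h * frac), int(w * frac)
    top, left = (h - bh) // 2, (w - bw) // 2
    masked = [row[:] for row in image]
    mask = [[0] * w for _ in range(h)]
    for r in range(top, top + bh):
        for c in range(left, left + bw):
            masked[r][c] = fill
            mask[r][c] = 1
    return masked, mask

image = [[float(r * 4 + c + 1) for c in range(4)] for r in range(4)]
masked, mask = mask_center(image)
# The network sees `masked` and is trained to reproduce `image`.
```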

Semantic Segmentation

Pixel-wise semantic labeling (via knowledge distillation from MS COCO).

Object Classification

1000-way object classification (via knowledge distillation from ImageNet).

Scene Classification

Scene Classification (via knowledge distillation from MIT Places).

Jigsaw Puzzle

Putting scrambled image pieces back together.
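The jigsaw setup can be sketched as cutting an image into tiles, shuffling them, and treating the permutation as the quantity to predict. The grid size, the row-major tile order, and the helper names below are assumptions for illustration only.

```python
import random

def split_tiles(image, grid):
    """Cut a square image (rows of pixels) into grid x grid tiles, row-major."""
    size = len(image) // grid
    return [[row[c * size:(c + 1) * size]
             for row in image[r * size:(r + 1) * size]]
            for r in range(grid) for c in range(grid)]

def scramble(tiles, seed=0):
    """Shuffle tiles; perm[i] is the original index of the tile now at slot i."""
    perm = list(range(len(tiles)))
    random.Random(seed).shuffle(perm)
    return [tiles[p] for p in perm], perm

def unscramble(shuffled, perm):
    """Invert the shuffle using the permutation (the prediction target)."""
    restored = [None] * len(shuffled)
    for slot, original in enumerate(perm):
        restored[original] = shuffled[slot]
    return restored

image = [[r * 4 + c for c in range(4)] for r in range(4)]
tiles = split_tiles(image, grid=2)
shuffled, perm = scramble(tiles)
```

Given the correct permutation, `unscramble(shuffled, perm)` recovers the original tile order, which is what a network solving the puzzle would have to infer from the pixels alone.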

Egomotion

Odometry (camera poses) given three input images.

Autoencoder

Image compression and decompression.

Point Matching

Classifying if centers of two images match or not.

Dataset

4.5 Mil. Scenes
600 Buildings
25 Tags per Image
1024 Resolution

We provide a large and high-quality dataset of varied indoor scenes.

Complete pixel-level geometric information via aligned meshes.

Semantic information via knowledge distillation from ImageNet, MS COCO, and MIT Places.

Globally consistent camera poses. Complete camera intrinsics.

High-definition images.

3x as big as ImageNet.

Paper

Taskonomy: Disentangling Task Transfer Learning.
Zamir, Sax*, Shen*, Guibas, Malik, Savarese.

CVPR 2018 [Best Paper Award]. IJCAI 2019 Invited [Sister Conference Best Papers Track].

Authors

Amir Zamir

Stanford, UC Berkeley

Alexander (Sasha) Sax

Stanford

William B. Shen

Stanford

Jitendra Malik

UC Berkeley

Leonidas Guibas

Stanford

Silvio Savarese

Stanford
