Quickstart
This quickstart is intended for developers who are ready to dive into the code and see an example of how to integrate timm
into their model training workflow.
First, you’ll need to install timm
. For more information on installation, see Installation.
pip install timm
Load a Pretrained Model
Pretrained models can be loaded using create_model().
Here, we load the pretrained mobilenetv3_large_100
model.
>>> import timm
>>> m = timm.create_model('mobilenetv3_large_100', pretrained=True)
>>> m.eval()
List Models with Pretrained Weights
To list models packaged with timm
, you can use list_models(). If you specify pretrained=True
, this function will only return model names that have associated pretrained weights available.
>>> import timm
>>> from pprint import pprint
>>> model_names = timm.list_models(pretrained=True)
>>> pprint(model_names)
[
'adv_inception_v3',
'cspdarknet53',
'cspresnext50',
'densenet121',
'densenet161',
'densenet169',
'densenet201',
'densenetblur121d',
'dla34',
'dla46_c',
]
You can also list models with a specific pattern in their name.
>>> import timm
>>> from pprint import pprint
>>> model_names = timm.list_models('*resne*t*')
>>> pprint(model_names)
[
'cspresnet50',
'cspresnet50d',
'cspresnet50w',
'cspresnext50',
...
]
Fine-Tune a Pretrained Model
You can finetune any of the pre-trained models just by changing the classifier (the last layer).
>>> model = timm.create_model('mobilenetv3_large_100', pretrained=True, num_classes=NUM_FINETUNE_CLASSES)
To fine-tune on your own dataset, you have to write a PyTorch training loop or adapt timm
’s training script to use your dataset.
Use a Pretrained Model for Feature Extraction
Without modifying the network, one can call model.forward_features(input) on any model instead of the usual model(input). This will bypass the head classifier and global pooling for networks.
For a more in depth guide to using timm
for feature extraction, see Feature Extraction.
>>> import timm
>>> import torch
>>> x = torch.randn(1, 3, 224, 224)
>>> model = timm.create_model('mobilenetv3_large_100', pretrained=True)
>>> features = model.forward_features(x)
>>> print(features.shape)
torch.Size([1, 960, 7, 7])
Image Augmentation
To transform images into valid inputs for a model, you can use timm.data.create_transform(), providing the desired input_size
that the model expects.
This will return a generic transform that uses reasonable defaults.
>>> timm.data.create_transform((3, 224, 224))
Compose(
Resize(size=256, interpolation=bilinear, max_size=None, antialias=None)
CenterCrop(size=(224, 224))
ToTensor()
Normalize(mean=tensor([0.4850, 0.4560, 0.4060]), std=tensor([0.2290, 0.2240, 0.2250]))
)
Pretrained models have specific transforms that were applied to images fed into them while training. If you use the wrong transform on your image, the model won’t understand what it’s seeing!
To figure out which transformations were used for a given pretrained model, we can start by taking a look at its pretrained_cfg
>>> model.pretrained_cfg
{'url': 'https://github.com/rwightman/pytorch-image-models/releases/download/v0.1-weights/mobilenetv3_large_100_ra-f55367f5.pth',
'num_classes': 1000,
'input_size': (3, 224, 224),
'pool_size': (7, 7),
'crop_pct': 0.875,
'interpolation': 'bicubic',
'mean': (0.485, 0.456, 0.406),
'std': (0.229, 0.224, 0.225),
'first_conv': 'conv_stem',
'classifier': 'classifier',
'architecture': 'mobilenetv3_large_100'}
We can then resolve only the data related configuration by using timm.data.resolve_data_config().
>>> timm.data.resolve_data_config(model.pretrained_cfg)
{'input_size': (3, 224, 224),
'interpolation': 'bicubic',
'mean': (0.485, 0.456, 0.406),
'std': (0.229, 0.224, 0.225),
'crop_pct': 0.875}
We can pass this data config to timm.data.create_transform() to initialize the model’s associated transform.
>>> data_cfg = timm.data.resolve_data_config(model.pretrained_cfg)
>>> transform = timm.data.create_transform(**data_cfg)
>>> transform
Compose(
Resize(size=256, interpolation=bicubic, max_size=None, antialias=None)
CenterCrop(size=(224, 224))
ToTensor()
Normalize(mean=tensor([0.4850, 0.4560, 0.4060]), std=tensor([0.2290, 0.2240, 0.2250]))
)
Using Pretrained Models for Inference
Here, we will put together the above sections and use a pretrained model for inference.
First we’ll need an image to do inference on. Here we load a picture of a leaf from the web:
>>> import requests
>>> from PIL import Image
>>> from io import BytesIO
>>> url = 'https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/timm/cat.jpg'
>>> image = Image.open(requests.get(url, stream=True).raw)
>>> image
Here’s the image we loaded:
Now, we’ll create our model and transforms again. This time, we make sure to set our model in evaluation mode.
>>> model = timm.create_model('mobilenetv3_large_100', pretrained=True).eval()
>>> transform = timm.data.create_transform(
**timm.data.resolve_data_config(model.pretrained_cfg)
)
We can prepare this image for the model by passing it to the transform.
>>> image_tensor = transform(image)
>>> image_tensor.shape
torch.Size([3, 224, 224])
Now we can pass that image to the model to get the predictions. We use unsqueeze(0)
in this case, as the model is expecting a batch dimension.
>>> output = model(image_tensor.unsqueeze(0))
>>> output.shape
torch.Size([1, 1000])
To get the predicted probabilities, we apply softmax to the output. This leaves us with a tensor of shape (num_classes,)
.
>>> probabilities = torch.nn.functional.softmax(output[0], dim=0)
>>> probabilities.shape
torch.Size([1000])
Now we’ll find the top 5 predicted class indexes and values using torch.topk
.
>>> values, indices = torch.topk(probabilities, 5)
>>> indices
tensor([281, 282, 285, 673, 670])
If we check the imagenet labels for the top index, we can see what the model predicted…
>>> IMAGENET_1k_URL = 'https://storage.googleapis.com/bit_models/ilsvrc2012_wordnet_lemmas.txt'
>>> IMAGENET_1k_LABELS = requests.get(IMAGENET_1k_URL).text.strip().split('\n')
>>> [{'label': IMAGENET_1k_LABELS[idx], 'value': val.item()} for val, idx in zip(values, indices)]
[{'label': 'tabby, tabby_cat', 'value': 0.5101025700569153},
{'label': 'tiger_cat', 'value': 0.22490699589252472},
{'label': 'Egyptian_cat', 'value': 0.1835290789604187},
{'label': 'mouse, computer_mouse', 'value': 0.006752475164830685},
{'label': 'motor_scooter, scooter', 'value': 0.004942195490002632}]