Model Definition
A Tungsten model is defined as a class with the `define_model` decorator in `tungsten_model.py`:
```python
from typing import List

import torch

from tungstenkit import BaseIO, Image, define_model


class Input(BaseIO):
    prompt: str


class Output(BaseIO):
    image: Image


@define_model(
    input=Input,
    output=Output,
    gpu=True,
    python_packages=["torch", "torchvision"],
    batch_size=4,
    gpu_mem_gb=16,
)
class TextToImageModel:
    def setup(self):
        weights = torch.load("./weights.pth")
        self.model = load_torch_model(weights)

    def predict(self, inputs: List[Input]) -> List[Output]:
        input_tensor = preprocess(inputs)
        output_tensor = self.model(input_tensor)
        outputs = postprocess(output_tensor)
        return outputs
```
Here you can find the elements needed to define a Tungsten model:
- Input/Output classes
- `setup` method
- `predict` method
- Runtime configuration (e.g. batch size & device type)
- Dependencies (e.g. Python packages)
Basic Usage
Declare input/output types
Input and output types are declared by passing them as arguments to the `define_model` decorator:
```python
from typing import List

import torch

from tungstenkit import BaseIO, Image, define_model


class Input(BaseIO):
    prompt: str


class Output(BaseIO):
    image: Image


@define_model(
    input=Input,
    output=Output,
    gpu=True,
    python_packages=["torch", "torchvision"],
    batch_size=4,
    gpu_mem_gb=16,
)
class TextToImageModel:
    def setup(self):
        weights = torch.load("./weights.pth")
        self.model = load_torch_model(weights)

    def predict(self, inputs: List[Input]) -> List[Output]:
        input_tensor = preprocess(inputs)
        output_tensor = self.model(input_tensor)
        outputs = postprocess(output_tensor)
        return outputs
```
See Building Your Model - Input/Output to learn how to define input/output.
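For illustration, an input class can combine several field types. The sketch below uses only the `BaseIO`, `Image`, and `Option` APIs that appear elsewhere in this guide; the field names are made up:

```python
from tungstenkit import BaseIO, Image, Option


class Input(BaseIO):
    # Required fields
    image: Image
    prompt: str
    # An optional field restricted to a fixed set of allowed values
    num_outputs: int = Option(choices=[1, 2, 4], default=1)
```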
Define how to load a model
You can define the `setup` method for loading a model:
```python
from typing import List

import torch

from tungstenkit import BaseIO, Image, define_model


class Input(BaseIO):
    prompt: str


class Output(BaseIO):
    image: Image


@define_model(
    input=Input,
    output=Output,
    gpu=True,
    python_packages=["torch", "torchvision"],
    batch_size=4,
    gpu_mem_gb=16,
)
class TextToImageModel:
    def setup(self):
        weights = torch.load("./weights.pth")
        self.model = load_torch_model(weights)

    def predict(self, inputs: List[Input]) -> List[Output]:
        input_tensor = preprocess(inputs)
        output_tensor = self.model(input_tensor)
        outputs = postprocess(output_tensor)
        return outputs
```
As you can see, the `weights.pth` file is required by the `setup` method.
Before containerizing, make sure that the file exists in the build directory.
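One way to make that dependency explicit is to list the weights file with the `include_files` argument (documented under "Declare dependencies and runtime configuration" below). A minimal sketch, reusing the `Input`/`Output` classes from the example above; the patterns are illustrative and should be adjusted to your project layout:

```python
from tungstenkit import define_model


@define_model(
    input=Input,
    output=Output,
    gpu=True,
    python_packages=["torch", "torchvision"],
    # Illustrative patterns -- adjust to your project layout.
    include_files=["tungsten_model.py", "weights.pth"],
)
class TextToImageModel:
    ...
```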
Warning
Avoid downloading model weights or building binaries within the `setup` method. Doing so slows down the model server launch and prevents the model from running in an isolated network environment. In fact, all models pushed to tungsten.run cannot access the internet while running, both for security and to mitigate the speed degradation caused by cold starts. Instead, define the `post_build` method, which is executed while containerizing.
Define how a prediction works
The `predict` method defines the computation performed at every prediction request.
It takes a non-empty list of `Input` objects as an argument, and should return a list of the same number of `Output` objects.
```python
from typing import List

import torch

from tungstenkit import BaseIO, Image, define_model


class Input(BaseIO):
    prompt: str


class Output(BaseIO):
    image: Image


@define_model(
    input=Input,
    output=Output,
    gpu=True,
    python_packages=["torch", "torchvision"],
    batch_size=4,
    gpu_mem_gb=16,
)
class TextToImageModel:
    def setup(self):
        weights = torch.load("./weights.pth")
        self.model = load_torch_model(weights)

    def predict(self, inputs: List[Input]) -> List[Output]:
        input_tensor = preprocess(inputs)
        output_tensor = self.model(input_tensor)
        outputs = postprocess(output_tensor)
        return outputs
```
At runtime, Tungstenkit automatically keeps all option values the same within a batch, so it is safe to read them from a single element of `inputs`.
You can therefore define the `predict` method of a text-to-image generation model like this:
```python
from typing import List

from tungstenkit import BaseIO, Image, define_model, Option


class Input(BaseIO):
    prompt: str
    width: int = Option(
        choices=[128, 256],
        default=256,
    )
    height: int = Option(
        choices=[128, 256],
        default=256,
    )


class Output(BaseIO):
    image: Image


@define_model(input=Input, output=Output)
class TextToImageModel:
    def predict(self, inputs: List[Input]) -> List[Output]:
        options = inputs[0]
        prompts = [inp.prompt for inp in inputs]
        pil_images = self.model(prompts, width=options.width, height=options.height)
        images = [Image.from_pil_image(pil_image) for pil_image in pil_images]
        outputs = [Output(image=image) for image in images]
        return outputs
```
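For illustration, a batch is simply a list of validated `Input` objects. Since `BaseIO` models are constructed with keyword arguments (as `Output(image=...)` is above), a hand-built batch for a quick local check might look like this; the prompts are made up:

```python
# All inputs in a batch share the same option values at runtime,
# so reading them from inputs[0] in predict() is safe.
batch = [
    Input(prompt="a red fox in the snow", width=128, height=128),
    Input(prompt="a lighthouse at dusk", width=128, height=128),
]
```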
Declare dependencies and runtime configuration
You can declare dependencies and runtime configuration via the `define_model` decorator:
```python
from typing import List

import torch

from tungstenkit import BaseIO, Image, define_model


class Input(BaseIO):
    prompt: str


class Output(BaseIO):
    image: Image


@define_model(
    input=Input,
    output=Output,
    gpu=True,
    python_packages=["torch", "torchvision"],
    batch_size=4,
    gpu_mem_gb=16,
)
class TextToImageModel:
    def setup(self):
        weights = torch.load("./weights.pth")
        self.model = load_torch_model(weights)

    def predict(self, inputs: List[Input]) -> List[Output]:
        input_tensor = preprocess(inputs)
        output_tensor = self.model(input_tensor)
        outputs = postprocess(output_tensor)
        return outputs
```
The `define_model` decorator takes the following keyword-only arguments:
- `input (BaseIO)`: The input class.
- `output (BaseIO)`: The output class.
- `demo_output (BaseIO | None)`: The demo output class. It is required only when the `predict_demo` method is defined (default: `None`).
- `batch_size (int)`: Max batch size for adaptive batching (default: `1`).
- `gpu (bool)`: Indicates whether the model requires GPUs (default: `False`).
- `cuda_version (str | None)`: CUDA version in `<major>[.<minor>[.<patch>]]` format. If `None` (default), the CUDA version is automatically determined to be compatible with `python_packages`. Otherwise, the CUDA version is fixed to `cuda_version`.
- `gpu_mem_gb (int)`: Minimum GPU memory size required to run the model (default: `16`). This argument is ignored if `gpu==False`.
- `python_packages (list[str] | None)`: A list of pip requirements in `<name>[==<version>]` format. If `None` (default), no extra Python packages are added.
- `python_version (str | None)`: Python version to use in `<major>[.<minor>]` format. If `None` (default), the Python version is automatically determined to be compatible with `python_packages` while preferring the current Python version. Otherwise, the Python version is fixed to `python_version`.
- `system_packages (list[str] | None)`: A list of system packages to be installed by the system package manager (e.g. `apt`). The default value is `None`. This argument is ignored when using a custom base image, because Tungstenkit cannot decide which package manager to use.
- `mem_gb (int)`: Minimum memory size required to run the model (default: `8`).
- `include_files (list[str] | None)`: A list of patterns, as in `.gitignore`, for matching which files to include. If `None` (default), all files in the working directory and its subdirectories are added, which is equivalent to `["*"]`.
- `exclude_files (list[str] | None)`: A list of patterns, as in `.gitignore`, for matching which files to exclude. If `None` (default), all hidden files and Python bytecode are excluded, which is equivalent to `[".*/", "__pycache__/", "*.pyc", "*.pyo", "*.pyd"]`.
- `dockerfile_commands (list[str] | None)`: A list of Dockerfile commands (default: `None`). The commands are executed before setting up Python packages.
- `base_image (str | None)`: Base Docker image in `<repository>[:<tag>]` format. If `None` (default), the base image is automatically selected with respect to the pip packages, the device type, and the CUDA version. Otherwise, it is used as the base image and `system_packages` is ignored.
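As a concrete sketch of how several of these arguments combine, the configuration below is illustrative only; the package names, versions, and file patterns are assumptions about a hypothetical project, not requirements of Tungstenkit:

```python
from tungstenkit import define_model


@define_model(
    input=Input,
    output=Output,
    gpu=True,
    gpu_mem_gb=24,
    batch_size=8,
    cuda_version="11.8",
    python_version="3.10",
    python_packages=["torch==2.0.1", "torchvision==0.15.2"],  # pinned pip requirements
    system_packages=["libgl1"],  # installed via the system package manager (e.g. apt)
    exclude_files=["tests/", "*.ipynb"],  # .gitignore-style patterns left out of the image
)
class TextToImageModel:
    ...
```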
Advanced Usage
Define post-build commands
If you want to download weights or build binaries while containerizing, you can define the `post_build` method:
```python
from typing import List

from diffusers import DiffusionPipeline
import torch

from tungstenkit import define_model


@define_model(...)
class TextToImageModel:
    @staticmethod
    def post_build():
        """Download Stable Diffusion weights"""
        DiffusionPipeline.download(
            "stabilityai/stable-diffusion-2-1", cache_dir="model_cache"
        )

    def setup(self):
        """Load Stable Diffusion weights"""
        self.pipe = DiffusionPipeline.from_pretrained(
            "stabilityai/stable-diffusion-2-1",
            cache_dir="model_cache",
            local_files_only=True,
        )

    def predict(self, inputs: List[Input]) -> List[Output]:
        ...
```
Define how a demo prediction works
You can define an object detection model like this:
```python
from typing import List

from tungstenkit import BaseIO, Image, define_model


class BoundingBox(BaseIO):
    xmin: int
    xmax: int
    ymin: int
    ymax: int


class Detection(BaseIO):
    label: str
    bbox: BoundingBox


class Input(BaseIO):
    image: Image


class Output(BaseIO):
    detections: List[Detection]


@define_model(input=Input, output=Output)
class ObjectDetectionModel:
    def setup(self):
        ...

    def predict(self, inputs: List[Input]) -> List[Output]:
        ...
```
But this definition doesn't include any visualization, so you'll only get raw JSON on the demo page.
You could then add the visualization result to the output:
```python
from typing import List

from tungstenkit import BaseIO, Image, define_model


class BoundingBox(BaseIO):
    xmin: int
    xmax: int
    ymin: int
    ymax: int


class Detection(BaseIO):
    label: str
    bbox: BoundingBox


class Input(BaseIO):
    image: Image


class Output(BaseIO):
    detections: List[Detection]
    visualized: Image


@define_model(input=Input, output=Output)
class ObjectDetectionModel:
    def setup(self):
        ...

    def predict(self, inputs: List[Input]) -> List[Output]:
        ...
```
This improves the demo page, but it adds visualization overhead to the API.
In such a case, you can define a separate method for demo predictions:
```python
from typing import List, Tuple, Dict

from tungstenkit import BaseIO, Image, define_model


class BoundingBox(BaseIO):
    xmin: int
    xmax: int
    ymin: int
    ymax: int


class Detection(BaseIO):
    label: str
    bbox: BoundingBox


class Input(BaseIO):
    image: Image


class Output(BaseIO):
    detections: List[Detection]


class Visualization(BaseIO):
    result: Image


@define_model(
    input=Input,
    output=Output,
    demo_output=Visualization,
)
class ObjectDetectionModel:
    def setup(self):
        ...

    def predict(self, inputs: List[Input]) -> List[Output]:
        ...

    def predict_demo(
        self, inputs: List[Input]
    ) -> Tuple[List[Output], List[Visualization]]:
        outputs = self.predict(inputs)
        pil_images = visualize(inputs, outputs)
        images = [Image.from_pil_image(pil_image) for pil_image in pil_images]
        visualizations = [Visualization(result=image) for image in images]
        return outputs, visualizations
```
The `predict` method is executed when a prediction is requested through the API, while the `predict_demo` method is called for demo requests.
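The `visualize` helper in the example above is not part of Tungstenkit. A minimal sketch with Pillow might look like the following; `load_pil_image` is a hypothetical accessor for turning a Tungstenkit `Image` field back into a PIL image and should be replaced with whatever your project uses:

```python
from typing import List

from PIL import Image as PILImage, ImageDraw


def visualize(inputs: List[Input], outputs: List[Output]) -> List[PILImage.Image]:
    """Draw the predicted bounding boxes and labels on each input image."""
    pil_images = []
    for inp, out in zip(inputs, outputs):
        # Hypothetical helper: obtain a PIL image from the Tungstenkit Image field.
        pil_image = load_pil_image(inp.image).convert("RGB")
        draw = ImageDraw.Draw(pil_image)
        for det in out.detections:
            box = det.bbox
            draw.rectangle((box.xmin, box.ymin, box.xmax, box.ymax), outline="red", width=2)
            draw.text((box.xmin, max(box.ymin - 12, 0)), det.label, fill="red")
        pil_images.append(pil_image)
    return pil_images
```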