Export & Edge Deployment¶
Export your trained PyTorch models to multiple formats and deploy them to edge devices.
Model Export¶
ONNX Export¶
ONNX (Open Neural Network Exchange) is an open format for representing machine learning models. It's widely supported across frameworks and deployment platforms.
```python
from torchloop.exporter import Exporter

# Create exporter with your model
exp = Exporter(model, input_shape=(1, 3, 224, 224))

# Export to ONNX
exp.to_onnx("model.onnx")
```
When to use ONNX:

- Deploying to ONNX Runtime
- Cross-framework compatibility needed
- Server-side inference
- When you need maximum portability
TFLite Export¶
TensorFlow Lite is optimized for mobile and embedded devices. TFLite models are typically much smaller and faster to run than their full-framework counterparts.
```python
from torchloop.exporter import Exporter

exp = Exporter(model, input_shape=(1, 3, 224, 224))

# Standard export
exp.to_tflite("model.tflite")

# With quantization (recommended for edge devices)
exp.to_tflite("model_quantized.tflite", quantize=True)
```
Quantization benefits:

- 4x smaller model size (float32 → int8)
- Faster inference on edge devices
- Lower power consumption
- Minimal accuracy loss (typically < 1%)
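The 4x size reduction comes from storing each weight in one int8 byte instead of four float32 bytes. The arithmetic behind this (affine quantization with a scale and zero point) can be sketched with NumPy; this is an illustration of the general technique, not torchloop's actual implementation:

```python
import numpy as np

# A toy float32 weight tensor.
w = np.random.default_rng(0).normal(size=(64, 64)).astype(np.float32)

# Affine quantization: map the float range onto the int8 range [-128, 127].
scale = (w.max() - w.min()) / 255.0
zero_point = int(np.round(-128 - w.min() / scale))
w_int8 = np.clip(np.round(w / scale) + zero_point, -128, 127).astype(np.int8)

# Dequantize to measure the error the mapping introduces.
w_restored = (w_int8.astype(np.float32) - zero_point) * scale

print(f"size float32: {w.nbytes} B, int8: {w_int8.nbytes} B")  # 4x smaller
print(f"max abs error: {np.abs(w - w_restored).max():.5f}")    # on the order of scale
```

The maximum reconstruction error is bounded by roughly half the scale, which is why accuracy loss is usually small when the weight distribution is well behaved.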
Edge Deployment¶
Resource Estimation¶
Before deploying to resource-constrained devices, estimate your model's requirements:
```python
from torchloop.edge import estimate_model

stats = estimate_model(
    model,
    input_shape=(1, 3, 224, 224),
    target_device="esp32"
)

print(f"Estimated RAM: {stats['estimated_ram_mb']:.2f} MB")
print(f"Estimated Latency: {stats['estimated_latency_ms']:.2f} ms")
print(f"Total Parameters: {stats['total_params']:,}")
print(f"Total FLOPs: {stats['total_flops']:,}")
print(f"Model Size: {stats['model_size_mb']:.2f} MB")
```
Supported target devices:
| Device | RAM | Typical Use Case |
|---|---|---|
| `esp32` | ~520 KB | Microcontrollers, IoT sensors |
| `raspberry_pi` | 1-8 GB | Edge computing, home automation |
| `mobile` | Varies | Smartphone apps (iOS/Android) |
| `jetson` | 4-32 GB | Edge AI, robotics, autonomous systems |
Deploy to Target Device¶
Deploy optimized models directly to your target device:
```python
from torchloop.edge import deploy_to_edge

deploy_to_edge(
    model,
    target="esp32",
    input_shape=(1, 3, 224, 224),
    output_path="model.tflite",
    quantize=True,
    quantize_type="int8",
)
```
Quantization types: the examples in this guide use `quantize_type="int8"`; see the API reference for the full list of supported values.
Complete Workflow Example¶
Here's a full workflow from training to edge deployment:
```python
import torch
import torch.nn as nn
from torchloop import Trainer
from torchloop.exporter import Exporter
from torchloop.edge import estimate_model, deploy_to_edge

# 1. Train your model
model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(16 * 112 * 112, 4)
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

# train_loader / val_loader: your training and validation DataLoaders
trainer = Trainer(model, optimizer, criterion, device="cuda")
trainer.fit(train_loader, val_loader, epochs=30)
trainer.save("best.pt")

# 2. Estimate resources for the target device
stats = estimate_model(model, (1, 3, 224, 224), target_device="esp32")
if stats['estimated_ram_mb'] > 0.5:
    print("⚠️ Model may be too large for ESP32")
else:
    print("✓ Model fits on ESP32")

# 3. Deploy to the edge device
deploy_to_edge(
    model,
    target="esp32",
    input_shape=(1, 3, 224, 224),
    output_path="esp32_model.tflite",
    quantize=True,
    quantize_type="int8",
)
print("✓ Model deployed successfully!")
```
Export Best Practices¶
Optimization Tips
- Always test quantized models - Verify accuracy on validation set after quantization
- Profile on target device - Actual performance may vary from estimates
- Simplify architecture - Simpler models deploy better to edge devices
- Use standard layers - Some exotic layers may not convert well to TFLite
- Batch size = 1 - Edge devices typically process one sample at a time
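In practice, "always test quantized models" means running the float and quantized variants on the same validation inputs and comparing their predictions. A minimal NumPy sketch of such a check; the logits here are simulated for illustration, whereas in a real check you would feed validation data through both models:

```python
import numpy as np

rng = np.random.default_rng(42)

# Simulated logits from the float model: 1000 validation samples, 4 classes.
float_logits = rng.normal(size=(1000, 4))
# Quantization perturbs outputs slightly; simulate that with small noise.
quant_logits = float_logits + rng.normal(scale=0.05, size=float_logits.shape)

# Top-1 agreement: how often both variants predict the same class.
agreement = (float_logits.argmax(axis=1) == quant_logits.argmax(axis=1)).mean()
print(f"top-1 agreement: {agreement:.1%}")
```

If agreement drops more than you can tolerate (the "typically < 1%" figure above), revisit the quantization settings before shipping.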
Common Issues
- Unsupported operations: Some PyTorch ops don't have TFLite equivalents
- Dynamic shapes: TFLite prefers fixed input shapes
- Custom layers: May need manual conversion or replacement
- Memory constraints: Always check device specs before deploying
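One way to catch unsupported operations early is a pre-flight scan of the model's layer types against an allow-list before attempting export. The allow-list below is hypothetical, as is the `preflight_check` helper; consult the TFLite operator compatibility documentation for the real set:

```python
# Hypothetical allow-list of layer types that convert cleanly to TFLite.
CONVERTIBLE = {"Conv2d", "ReLU", "MaxPool2d", "Flatten", "Linear", "BatchNorm2d"}

def preflight_check(layer_types):
    """Return the layer types with no known TFLite equivalent."""
    return [name for name in layer_types if name not in CONVERTIBLE]

# In practice, collect names via [type(m).__name__ for m in model.modules()].
problems = preflight_check(["Conv2d", "ReLU", "MaxPool2d", "Flatten", "Linear"])
print("unsupported:", problems)
```

An empty result does not guarantee conversion will succeed, but a non-empty one tells you up front which layers to replace.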
API Reference¶
For detailed API documentation, see: