Object detection (identifying and localising objects in images in real time) is a foundational capability for robot perception, and the choice of detection model determines whether a robot can reliably distinguish, grasp, and manipulate the objects in its environment. In 2026, YOLO (particularly YOLOv9/v10), DETR (Detection Transformer), and SAM 2 (Segment Anything Model 2) represent three approaches with different speed, accuracy, and versatility trade-offs. This guide covers the architectures, enterprise selection criteria, and ROS 2 integration patterns for production robot perception systems.
Model Architecture Comparison
| Model | Architecture | Speed | Accuracy | Best For |
|---|---|---|---|---|
| YOLOv10 | CNN; single-stage detection, NMS-free | Very fast: 2–5 ms on A100 | Good: 52–54% mAP COCO | Real-time control; edge deployment; production AMR |
| YOLOv9 | CNN; GELAN architecture with PGI | Fast: 5–10 ms on A100 | Better: 55.6% mAP COCO | High-accuracy real-time detection; bin picking |
| DETR (RT-DETR) | Transformer encoder + decoder, queries | Medium: 10–20 ms on A100 | Best: 57–60% mAP COCO | Complex scenes; high accuracy priority; server-side |
| SAM 2 | Mask-generating foundation model | Slower: 25–50 ms per frame | Best segmentation quality | Precise manipulation; novel objects; instance segmentation |
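To relate the latency column to a control-loop rate, the arithmetic can be sketched as below. This is a minimal illustration, not a benchmark: the latencies are the upper bounds from the table (A100 figures, so edge hardware will be slower), and the `headroom` fraction of 0.5, which reserves half of each frame period for capture, pre/post-processing, and planning, is an assumption chosen for the example.

```python
# Worst-case inference latencies in milliseconds, copied from the table above.
WORST_CASE_LATENCY_MS = {
    "YOLOv10": 5.0,
    "YOLOv9": 10.0,
    "RT-DETR": 20.0,
    "SAM 2": 50.0,
}


def frame_budget_ms(rate_hz: float) -> float:
    """Time available per frame at the given loop rate."""
    return 1000.0 / rate_hz


def fits_budget(latency_ms: float, rate_hz: float, headroom: float = 0.5) -> bool:
    """True if inference uses at most `headroom` of the frame period.

    `headroom` is a hypothetical tuning value, not a figure from the table.
    """
    return latency_ms <= headroom * frame_budget_ms(rate_hz)


if __name__ == "__main__":
    for name, latency in WORST_CASE_LATENCY_MS.items():
        verdict = "fits" if fits_budget(latency, rate_hz=30.0) else "does not fit"
        print(f"{name}: {latency:.0f} ms {verdict} a 30 Hz loop with 50% headroom")
```

At 30 Hz the frame period is about 33 ms, so under this assumption only the two YOLO rows stay within the budget, while a 50 ms model fits comfortably at 5 Hz.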
YOLO vs DETR: The Production Decision
Choose YOLO when:
- Inference must run at 30 Hz or faster for real-time robot control
- Deployment targets edge hardware (Jetson Orin, Xavier)
- Object classes are well defined and training data is available
- The use case is production AMR navigation and obstacle avoidance
Choose DETR when:
- Accuracy takes priority over raw speed
- Scenes are crowded or contain complex overlapping objects
- Inference runs server-side with a GPU available (not on the edge)
- Training data is limited (the transformer generalises better)
Choose SAM 2 when:
- Precise instance segmentation is needed (not just a bounding box)
- Object categories are novel and have no labelled training data
- Zero-shot detection is required for new objects that weren't in training
- Grasp point computation requires a precise object boundary
Combine YOLO and SAM 2 when:
- YOLO provides real-time detection and SAM 2 provides precise segmentation of the detected objects
- YOLO detects at 30 Hz; SAM 2 runs on the highest-priority detected object at 5 Hz
- The result is the best of both worlds: real-time awareness plus precise grasp planning
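The combined pattern (detection on every frame, segmentation on a subset of frames) can be sketched as a simple frame-rate divider. This is an illustrative sketch under stated assumptions: `detect` and `segment` are hypothetical callables standing in for the YOLO and SAM 2 models, "highest priority" is taken to mean highest confidence, and the segmenter runs synchronously here, whereas a production system would more likely run it in a separate thread or node.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max) in pixels


@dataclass
class Detection:
    label: str
    confidence: float
    box: Box


class DetectThenSegment:
    """Run a fast detector on every frame and a slow segmenter on every Nth frame."""

    def __init__(
        self,
        detect: Callable[[Any], List[Detection]],  # hypothetical YOLO wrapper
        segment: Callable[[Any, Box], Any],        # hypothetical SAM 2 wrapper
        detect_hz: float = 30.0,
        segment_hz: float = 5.0,
    ) -> None:
        self.detect = detect
        self.segment = segment
        # 30 Hz detection and 5 Hz segmentation means segmenting every 6th frame.
        self.every_n = max(1, round(detect_hz / segment_hz))
        self.frame_index = 0
        self.last_mask: Optional[Any] = None

    def step(self, frame: Any) -> Tuple[List[Detection], Optional[Any]]:
        detections = self.detect(frame)
        if detections and self.frame_index % self.every_n == 0:
            # Assumption: priority is approximated by detector confidence.
            target = max(detections, key=lambda d: d.confidence)
            self.last_mask = self.segment(frame, target.box)
        self.frame_index += 1
        # Between segmentation frames the most recent mask is reused.
        return detections, self.last_mask
```

Calling `step` once per camera frame returns fresh detections every time and a mask that is refreshed at the slower rate. If a scheduled frame has no detections, segmentation is simply skipped until the next scheduled frame.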
ROS 2 Integration
Use the community yolov10_ros package or wrap the Ultralytics Python API in a ROS 2 node: subscribe to sensor_msgs/Image, run inference, and publish vision_msgs/Detection2DArray. Exporting to TensorRT with model.export(format='engine') gives a 3–5× throughput improvement over PyTorch on NVIDIA hardware. Deploy the ROS 2 node on Jetson AGX Orin for edge inference. For obstacle avoidance, connect the detections to the Nav2 costmap via a detection-to-obstacle converter node; this integration is available with our ROS 2 development support.
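One step inside such a node is converting detector output into the layout that vision_msgs expects. Ultralytics results expose boxes as corner coordinates with a confidence and class index (`boxes.xyxy`, `boxes.conf`, `boxes.cls`), while `vision_msgs/BoundingBox2D` stores a centre plus `size_x` and `size_y`. The sketch below shows only that conversion in plain Python so it can run without ROS installed. The `Box2D` and `Hypothesis` dataclasses and the `min_score` threshold are hypothetical stand-ins for illustration, not the real message classes, and the exact field path of the centre differs between vision_msgs releases.

```python
from dataclasses import dataclass
from typing import Iterable, List, Sequence, Tuple

# One detector row: (x_min, y_min, x_max, y_max, score, class_index)
Row = Tuple[float, float, float, float, float, float]


@dataclass
class Box2D:
    """Hypothetical stand-in mirroring the centre/size layout of BoundingBox2D."""
    center_x: float
    center_y: float
    size_x: float
    size_y: float


@dataclass
class Hypothesis:
    """Hypothetical stand-in for one entry of a Detection2DArray."""
    class_id: str
    score: float
    bbox: Box2D


def xyxy_to_center_size(x_min: float, y_min: float, x_max: float, y_max: float) -> Box2D:
    """Convert corner coordinates to centre plus width and height."""
    return Box2D(
        center_x=(x_min + x_max) / 2.0,
        center_y=(y_min + y_max) / 2.0,
        size_x=x_max - x_min,
        size_y=y_max - y_min,
    )


def to_detections(
    rows: Iterable[Row],
    class_names: Sequence[str],
    min_score: float = 0.25,  # assumed threshold, tune per application
) -> List[Hypothesis]:
    """Drop low-confidence rows and convert the rest to centre/size form."""
    detections = []
    for x_min, y_min, x_max, y_max, score, class_index in rows:
        if score < min_score:
            continue
        detections.append(
            Hypothesis(
                class_id=class_names[int(class_index)],
                score=float(score),
                bbox=xyxy_to_center_size(x_min, y_min, x_max, y_max),
            )
        )
    return detections
```

In a real node, the image callback would copy each `Hypothesis` into a `vision_msgs/Detection2D` message and publish the array with the header of the source image, so downstream consumers can match detections to frames.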
Our ML development and software development teams design and deploy production robot perception systems using YOLO, DETR, and SAM 2 with ROS 2 integration. Book a free advisory session.