Confidential · Hive Autonomy · Product Vision

Building a world with unlimited access to skilled industrial labor.

01 / 22
What it is

A physical AI worker.

A real, physical agent powered by AI — deployed in the real world.

Sees through cameras, lidar, and other sensors.

Understands goals and instructions in natural language.

Acts through direct machine control — driving, lifting, navigating, adapting.

02 / 22
The premise
The brain is the model.
The body is the machine.
Not a robot we ship. Not a retrofit on a machine. A single intelligence layer that operates the machine — whatever shape that machine takes.
03 / 22
The category shift
Everyone else automates a task.
Hive runs a model.
Industrial autonomy has had two generations in sixty years. The third is a categorical shift: one model, every machine, every task.
Generation 01
1960s →
Task automation
Fixed programs
Conveyors, AGVs, scripted hydraulic cycles. One route, one job. Change the layout, re-engineer the system.
Generation 02
2010s →
Scripted autonomy
SLAM & rules
Mine haulers, AMRs, SLAM-based navigation. One task-specific script per operation. New task = new engineering cycle.
Generation 03 · Frontier
2024 →
Physical AI
One model, every machine
VLA foundation models. Vision in, language in, action out. Same intelligence across machines and tasks.
04 / 22
Where we are · today

Supervised physical AI, live in industrial operations.

Every live deployment is also a data source — every decision the operator reviews becomes training data for the next generation of the model.
Bulk handling
Yara · Herøya
Fertilizer loading, 18+ months in live industrial environment.
Hazard ops
Presis · Vikafjellet
Remote snow clearing in avalanche zones. Operator never on mountain.
Quarry load-out
Hamar Pukk · Nordic
Autonomous wheel loader feeding aggregate trucks continuously.
5G teleoperation
Veidekke · Telia
Europe's first industrial 5G pilot, alongside Boston Dynamics.
05 / 22
The operating model

Humans in the loop.
Machines at the edge.

One operations center. Trained remote operators. Every machine on Hive's platform has a human supervising — and every supervision is a labeled training signal.

When the model is confident, it runs. When it hesitates, the operator takes over. When the operator acts, the model learns.

There is no version of physical AI where the human disappears on day one. Our bet is that the human disappears slowly, by design — one task, one confidence threshold, one deployment at a time.
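The handoff rule on this slide can be sketched as a small gating function. This is a minimal illustration under stated assumptions, not Hive's implementation: the `CONFIDENCE_THRESHOLD` value, the `Step` record, and the `supervise` function are hypothetical names introduced here, and logging every step with a takeover flag is one possible reading of "every supervision is a labeled training signal".

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical threshold; the deck does not state an actual value.
CONFIDENCE_THRESHOLD = 0.9


@dataclass
class Step:
    """One supervised control step: what the model proposed and who acted."""
    state: str
    model_action: str
    confidence: float
    executed_action: str
    operator_took_over: bool


def supervise(
    state: str,
    model_action: str,
    confidence: float,
    ask_operator: Callable[[str], str],
    training_log: List[Step],
    threshold: float = CONFIDENCE_THRESHOLD,
) -> str:
    """Run the model's action if it is confident, otherwise hand over to the operator."""
    if confidence >= threshold:
        executed, took_over = model_action, False
    else:
        # The model hesitates: the operator decides what happens next.
        executed, took_over = ask_operator(state), True
    # Every supervised step is logged; takeovers are flagged as corrections.
    training_log.append(Step(state, model_action, confidence, executed, took_over))
    return executed


log: List[Step] = []
a1 = supervise("pallet ahead", "lift", 0.97, lambda s: "stop", log)
a2 = supervise("person nearby", "drive", 0.40, lambda s: "stop", log)
```

Raising or lowering `threshold` per task is one way to express the "one task, one confidence threshold" progression described above.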

06 / 22
Why this is hard

The sim-to-real gap closed in 2024.

Physical AI was a research agenda for twenty years. Four things changed in rapid succession.
01 · Architecture
Transformers learned to act. Vision-Language-Action (VLA) models — RT-2, OpenVLA, π0 — proved that the same architecture that writes text can also control a robot body. Language becomes the interface to action.
02 · Data
Pretraining transfers to robots. Vision and language models pretrained on web-scale data give robotic policies a world-model they could never learn from robot data alone. You no longer train from zero.
03 · Policy
Diffusion models took over action generation. Diffusion policies (Chi et al., Columbia University and Toyota Research Institute, 2023) produce smooth, multi-modal action trajectories, a step change over earlier behavior-cloning approaches.
04 · Hardware
Edge compute caught up. Sensors and inference hardware dropped ~10× in cost since 2020. A full perception + control stack runs on a machine-mounted compute cabinet.
07 / 22
The model class

VLA: Vision · Language · Action.

A single neural network that takes what the machine sees, what the operator said, and what the machine is doing — and produces the next action.
Input · Vision: camera streams, lidar point clouds, depth maps
+
Input · Language: "Pick the pallet and move it to the loading bay."
→
Output · Action: continuous control tokens; drive · lift · steer; end-effector poses

The first VLAs — RT-2 (Google DeepMind, 2023), OpenVLA (Stanford, 2024), π0 (Physical Intelligence, 2024) — showed the paradigm works on robot arms. Hive is adapting the paradigm to industrial machinery. Same class of model. Bigger bodies.

08 / 22
What the model sees

Four modalities, one shared embedding space.

Vision. Multi-camera RGB. Front, side, rear, and task-focused. Stereo for depth.

Lidar. 3D point clouds for spatial awareness in dust, fog, darkness.

Proprioception. Joint positions, hydraulic pressures, wheel velocity, load sensors — the machine's sense of its own body.

Language. Natural-language task goals from the operator. No command DSL. No scripting language.
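To make "one shared embedding space" concrete, here is a toy sketch in which each modality has its own encoder and every encoder emits a vector of the same width. Everything specific is an assumption for illustration: the feature sizes in `raw_dims`, the width `EMBED_DIM`, and the random linear maps, which stand in for learned encoders the deck does not describe.

```python
import random
from typing import Dict, List

EMBED_DIM = 8  # hypothetical shared width; real models use far larger values


def make_projection(in_dim: int, out_dim: int, rng: random.Random) -> List[List[float]]:
    """Random linear map standing in for a learned per-modality encoder."""
    return [[rng.gauss(0.0, 1.0) for _ in range(in_dim)] for _ in range(out_dim)]


def project(matrix: List[List[float]], features: List[float]) -> List[float]:
    """Matrix-vector product: maps raw features into the shared space."""
    return [sum(w * x for w, x in zip(row, features)) for row in matrix]


rng = random.Random(0)
# Hypothetical raw feature sizes per modality.
raw_dims = {"vision": 12, "lidar": 6, "proprioception": 4, "language": 10}
encoders = {name: make_projection(d, EMBED_DIM, rng) for name, d in raw_dims.items()}

# Fake sensor readings, one feature vector per modality.
observation = {name: [rng.random() for _ in range(d)] for name, d in raw_dims.items()}

# One token per modality, all with the same width, ready for a shared backbone.
tokens: Dict[str, List[float]] = {
    name: project(encoders[name], feats) for name, feats in observation.items()
}
```

Because every token has the same width, a single downstream network can attend over all four inputs without knowing which sensor produced which vector.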

09 / 22
Action generation

Continuous control is a diffusion problem.

Robot arms, wheel loaders, and forklifts don't take discrete actions. They operate on continuous, multi-modal trajectories.

Why diffusion: behavior cloning that regresses a single action collapses to the mean when multiple actions are valid. Diffusion policies sample from the full action distribution, so the model preserves the fact that swinging left is sometimes the correct behavior and swinging right is sometimes the correct behavior.

What it looks like in production: Muninn iteratively denoises an action trajectory conditioned on the current state and Huginn's understanding. The same family of models that generates images — now generating machine motion.
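The mean-collapse argument can be checked on a one-dimensional toy problem. In the sketch below, two demonstrated actions are equally valid (swing left at -1, swing right at +1). A squared-error regressor returns their mean, which is neither. A denoising sampler that starts from noise ends at one of the two valid actions. The sketch is unconditional and uses an analytic ideal denoiser in place of a trained network; `SIGMAS`, `ideal_denoiser`, and `sample_action` are hypothetical names, and the update is a deterministic DDIM-style step, not necessarily what Muninn uses.

```python
import math
import random

# Two equally valid demonstrated actions: swing left (-1) or swing right (+1).
MODES = (-1.0, 1.0)
# Hypothetical noise schedule, from heavy noise down to none.
SIGMAS = (2.0, 1.0, 0.5, 0.25, 0.1, 0.05, 0.02, 0.0)


def regression_action() -> float:
    """Squared-error regression converges to the mean of the demonstrations."""
    return sum(MODES) / len(MODES)


def ideal_denoiser(x: float, sigma: float) -> float:
    """Posterior mean of the clean action given noisy x, for modes at -1 and +1.

    In a real diffusion policy this function is a trained network.
    """
    return math.tanh(x / (sigma * sigma))


def sample_action(rng: random.Random) -> float:
    """Start from pure noise and denoise step by step."""
    x = rng.gauss(0.0, SIGMAS[0])
    for sigma, next_sigma in zip(SIGMAS, SIGMAS[1:]):
        clean = ideal_denoiser(x, sigma)
        # Move to the denoised estimate, keeping only the next step's noise level.
        x = clean + (next_sigma / sigma) * (x - clean)
    return x


rng = random.Random(0)
samples = [sample_action(rng) for _ in range(200)]
left = sum(1 for s in samples if s < 0)
right = len(samples) - left
```

Here `regression_action()` returns the midpoint between the two swings, while every entry in `samples` lands close to one of the two demonstrated actions, with both sides represented.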

Trajectory
Horizons of ~16–64 timesteps ahead (1–3 seconds): the machine plans its next few moves, executes, re-plans.
Multi-modal
Captures the fact that two skilled operators will execute the same task with different-but-valid trajectories. The model keeps that nuance.
Smoothness
Diffusion-generated control is continuous and physically plausible, without the jittery "policy oscillation" that behavior cloning suffers from.
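The plan, execute, re-plan cycle described under "Trajectory" is a receding-horizon loop. A minimal sketch, assuming a one-dimensional machine position and a stand-in planner: `plan` returns evenly spaced waypoints where the real system would sample a diffusion trajectory, and `EXECUTE_STEPS` is a hypothetical choice the deck does not specify.

```python
from typing import List

HORIZON = 16        # planned steps per trajectory (low end of the range quoted above)
EXECUTE_STEPS = 4   # hypothetical: steps executed before re-planning


def plan(position: float, target: float, horizon: int) -> List[float]:
    """Stand-in planner: evenly spaced waypoints from position to target."""
    return [position + (target - position) * (i + 1) / horizon for i in range(horizon)]


def run(position: float, target: float, cycles: int) -> List[float]:
    """Plan a full horizon, execute only the first few steps, then re-plan."""
    visited = [position]
    for _ in range(cycles):
        trajectory = plan(position, target, HORIZON)
        for waypoint in trajectory[:EXECUTE_STEPS]:
            position = waypoint
            visited.append(position)
    return visited


path = run(position=0.0, target=10.0, cycles=20)
```

Executing only part of each plan means the rest of the trajectory is discarded and regenerated from fresh observations, which is what lets the machine adapt when the site changes mid-task.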
10 / 22
Embodiment transfer

A robot arm shares geometry with an excavator.

Physical AI generalizes across bodies when the underlying control primitives transfer. Degrees of freedom are the unit of transfer.
Robot arm
6 DOF · Shoulder, elbow, wrist, yaw, pitch, roll. The canonical manipulation platform. Cheap, fast, reproducible — ideal for data collection.
Excavator
6 DOF · Swing, boom, stick, bucket, plus two base tracks. Geometric twin of the robot arm. The control problem is the same control problem.
Wheel loader
5 DOF · Drive, steering, boom, bucket tilt, articulation. Slightly simpler than an excavator. Easier policy transfer.
Forklift
5 DOF · Drive, steering, mast lift, tilt, fork spacing. Intralogistics body. Shares control primitives with wheel loaders.
Terminal tractor
3 DOF · Drive, steering, hitch. The "easy" body — navigation-heavy, minimal manipulation.
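One simple way to let a single policy address bodies with different degrees of freedom is to pad every action to the largest DOF count and carry a mask of the entries that are real. The sketch below uses the DOF lists from this slide. The padding-and-mask scheme, the `pad_action` helper, and the track names `left_track` and `right_track` are assumptions for illustration; the deck does not say how ODIN represents actions across bodies.

```python
from typing import Dict, List, Tuple

# DOF lists as given on the slide above (track names are illustrative).
EMBODIMENTS: Dict[str, List[str]] = {
    "robot_arm": ["shoulder", "elbow", "wrist", "yaw", "pitch", "roll"],
    "excavator": ["swing", "boom", "stick", "bucket", "left_track", "right_track"],
    "wheel_loader": ["drive", "steering", "boom", "bucket_tilt", "articulation"],
    "forklift": ["drive", "steering", "mast_lift", "tilt", "fork_spacing"],
    "terminal_tractor": ["drive", "steering", "hitch"],
}

MAX_DOF = max(len(dofs) for dofs in EMBODIMENTS.values())


def pad_action(body: str, action: List[float]) -> Tuple[List[float], List[bool]]:
    """Pad a body-specific action to MAX_DOF and return a mask of the real entries."""
    n = len(EMBODIMENTS[body])
    if len(action) != n:
        raise ValueError(f"{body} expects {n} values, got {len(action)}")
    padded = action + [0.0] * (MAX_DOF - n)
    mask = [True] * n + [False] * (MAX_DOF - n)
    return padded, mask


padded, mask = pad_action("terminal_tractor", [0.5, -0.1, 1.0])
```

With a fixed-width action vector, the same output head can be trained on arm data and loader data alike, and the mask tells the loss which entries to ignore for the simpler bodies.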
11 / 22
The data engine

Deployment is data.
Data is the next model.

Deploy machines → Supervise + label → Train ODIN → Better policy
01
Machines deployed under supervised physical AI, running live in real industrial environments.
02
Every operator intervention is a labeled trajectory. When the operator takes over, the system captures state, goal, and the correct action — free-of-charge training data.
03
ODIN retrains on the growing corpus. Each training run expands the distribution of tasks, environments, and machines the model handles.
04
The policy improves. Confidence thresholds rise. Operator interventions drop. Operator-to-machine ratio scales. More deployments follow. Loop.
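Step 02 says each takeover captures state, goal, and the correct action. A minimal sketch of what such a record could look like, assuming a JSON Lines corpus: the `InterventionSample` fields, the `machine_id` values, and the `to_jsonl` helper are hypothetical, chosen only to show the shape of the data.

```python
import json
from dataclasses import dataclass, asdict
from typing import List


@dataclass
class InterventionSample:
    """One labeled step captured while the operator is in control."""
    machine_id: str               # hypothetical identifier
    goal: str                     # natural-language task goal
    state: List[float]            # machine state at that moment
    operator_action: List[float]  # the action the operator actually took


def to_jsonl(samples: List[InterventionSample]) -> str:
    """Serialize samples as JSON Lines, one record per line."""
    return "\n".join(json.dumps(asdict(s)) for s in samples)


samples = [
    InterventionSample("loader-01", "Load the truck", [0.1, 0.4], [0.0, 0.2]),
    InterventionSample("loader-01", "Load the truck", [0.2, 0.5], [0.1, 0.2]),
]
corpus = to_jsonl(samples)
```

Appending records like these after every shift is what turns the deployment itself into the growing corpus that step 03 retrains on.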
12 / 22
Part Two
ODIN.
The foundation model for industrial physical AI.
13 / 22
ODIN · overview

One model. Every industrial machine.

ODIN is trained on real-world industrial operations. Generalizes across machine types. Deployed machine-by-machine as capability matures.
Input · Industry datasets: forklifts, wheel loaders, reach stackers, excavators, haulers, terminal tractors
→
Foundation model · ODIN: Hive foundation model for industrial physical AI
→
Output · Autonomous machines: one model, every machine; light tasks and heavy tasks; cross-task generalization
14 / 22
Huginn · perception + understanding
Huginn · thought
Sees the world. Understands the goal.
Vision · Language · Fusion

From raw pixels to a grounded goal.

Huginn is the sensory and reasoning half of ODIN. It takes multi-camera images, lidar, proprioception, and the operator's natural-language instruction — and produces a single grounded representation of what needs to happen next.

Built on a vision-language backbone pretrained on web-scale image-text data. Fine-tuned on industrial imagery: construction sites, quarries, warehouses, terminals.

Pretrained on web-scale VLM data → fine-tuned on industrial operations → distilled for edge inference. The same kind of model that describes an image can now describe a task in progress.
15 / 22
Muninn · action + memory

From grounded goal to machine control.

Muninn is the motor half of ODIN. It takes Huginn's representation of the task, plus the current machine state, and generates the next few seconds of continuous control — via diffusion.

Memory lives here too: short-term operational context (what the operator said twenty seconds ago, what just happened on the site) conditions every trajectory the policy samples.

Diffusion policy architecture. Samples continuous action trajectories conditioned on Huginn's embedding. Multi-modal by construction — preserves that many trajectories are valid for the same goal.
Muninn · memory
Acts. Remembers. Samples smooth trajectories.
Policy · Diffusion · Execution
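The short-term memory described on this slide can be pictured as a time-windowed event buffer whose contents are passed to the policy as extra conditioning. A minimal sketch under stated assumptions: the `ShortTermMemory` class, the `WINDOW_SECONDS` value, and the plain-string events are hypothetical; the deck does not describe how Muninn stores or encodes context.

```python
from collections import deque
from typing import Deque, List, Tuple

WINDOW_SECONDS = 30.0  # hypothetical retention window


class ShortTermMemory:
    """Keeps recent events and drops anything older than the window."""

    def __init__(self, window: float = WINDOW_SECONDS) -> None:
        self.window = window
        self.events: Deque[Tuple[float, str]] = deque()

    def add(self, timestamp: float, event: str) -> None:
        """Record an event with the time it happened (in seconds)."""
        self.events.append((timestamp, event))

    def context(self, now: float) -> List[str]:
        """Return events still inside the window, oldest first."""
        while self.events and now - self.events[0][0] > self.window:
            self.events.popleft()
        return [event for _, event in self.events]


memory = ShortTermMemory()
memory.add(0.0, "operator: stack pallets by the door")
memory.add(25.0, "site: truck arrived at bay 2")
memory.add(45.0, "operator: load the truck first")
recent = memory.context(now=50.0)
```

At `now=50.0` the first instruction has aged out, so only the truck arrival and the latest instruction would condition the next sampled trajectory.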
16 / 22
ODIN · composition

Two models. One worker.

Text input
"Please pick the pallet and move it to the loading bay."
Machine state
Camera feeds · lidar · joint positions · velocity
Huginn · thought
Perception & Understanding
Sees and understands
Vision + Language + Fusion
Muninn · memory
Action & Memory
Acts and remembers
Policy + Diffusion + Execution
Machine control
Continuous trajectories — drive, lift, navigate, adapt.
Named after Odin's two ravens in Norse mythology — Huginn (thought) and Muninn (memory) — who fly across the world each day and return with what must be remembered. Hive is a Norwegian company.
17 / 22
The lab · Kristiansand · 2024

We started with a robot arm.

Six degrees of freedom, hundreds of pick-and-place trajectories a day, operator-mounted teleop. Every trajectory — what the operator did, what the cameras saw, what the joints measured — was labeled training data for the first ODIN checkpoint.

18 / 22
In production · 2026

Same model class. Industrial body.

Toyota forklift running under ODIN in VLA mode. Camera in. Natural-language goal in. Machine control out. No per-task script. The transfer from robot arm to industrial machine is the thesis that made Hive possible.

19 / 22
Generalization

The model generalizes. The verticals open in sequence.

Once a foundation model handles one class of industrial body, the marginal cost of adding the next class collapses. Every vertical below is a body — not a new model.
Vertical 01 · Live
Heavy Machinery
Wheel loaders, excavators, haulers. Quarries, construction, fertilizer facilities, road maintenance.
Vertical 02 · Live pilots
Intralogistics
Forklifts, reach stackers, terminal tractors. Warehouses, ports, distribution centers.
Vertical 03 · Horizon
Clean industry
Food processing, pharma, nuclear decommissioning — environments where human presence is risky or prohibited.
Vertical 04 · Horizon
Defense & dual-use
Autonomous logistics in contested environments. Machines that operate where operators shouldn't be.
Vertical 05 · Horizon
Humanoid robotics
When the body catches up to the brain. The same model class, a different mechanical substrate.
Vertical 06 · +
Wherever operator cost dominates TCO
Physical AI becomes viable wherever the hourly cost of a human operator exceeds the amortized cost of supervised autonomy.
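The viability condition above is a simple inequality, and a worked example makes the moving parts visible. All numbers below are hypothetical and chosen only for round arithmetic. Treating "amortized cost of supervised autonomy" as system cost spread over operating hours plus a share of the remote operator's wage is an assumption of this sketch, not a formula from the deck.

```python
def supervised_cost_per_hour(
    system_cost: float,
    lifetime_hours: float,
    operator_hourly_cost: float,
    machines_per_operator: float,
) -> float:
    """Amortized autonomy system cost plus this machine's share of the remote operator."""
    amortized = system_cost / lifetime_hours
    operator_share = operator_hourly_cost / machines_per_operator
    return amortized + operator_share


def is_viable(onboard_operator_hourly_cost: float, supervised_hourly_cost: float) -> bool:
    """Viable when a human in the cab costs more per hour than supervised autonomy."""
    return onboard_operator_hourly_cost > supervised_hourly_cost


# Hypothetical inputs: 200,000 system cost, 20,000 operating hours, 60 per operator hour.
cost_at_3 = supervised_cost_per_hour(200_000.0, 20_000.0, 60.0, 3.0)
cost_at_1 = supervised_cost_per_hour(200_000.0, 20_000.0, 60.0, 1.0)
viable_at_3 = is_viable(60.0, cost_at_3)
viable_at_1 = is_viable(60.0, cost_at_1)
```

With these made-up inputs the case only closes once one operator supervises several machines, which is why the operator-to-machine ratio matters so much to the economics.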
20 / 22
The horizon

Fully autonomous industrial operations.
Supervised from one operations center.

Today, one operator supervises two to three machines.

As ODIN improves — each deployment feeding the next model — that ratio scales: one operator to four, to six, to dozens of machines across sites.
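The effect of the ratio on staffing is plain division, shown here for a hypothetical fleet. The fleet size of 24 and the helper `operators_needed` are assumptions for illustration; the ratios echo the progression on this slide, with 24 standing in for "dozens".

```python
import math


def operators_needed(machines: int, machines_per_operator: int) -> int:
    """Operators required to supervise a fleet at a given ratio, rounded up."""
    if machines_per_operator < 1:
        raise ValueError("machines_per_operator must be at least 1")
    return math.ceil(machines / machines_per_operator)


FLEET = 24  # hypothetical fleet size
staffing = {ratio: operators_needed(FLEET, ratio) for ratio in (2, 3, 4, 6, 24)}
```

For this fleet, moving from a ratio of three to a ratio of six halves the supervising headcount without changing the number of machines at work.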

Unlimited access to skilled industrial labor. Decoupled from the human hour.

21 / 22
The decade of physical AI
This is only the start.
Company: Hive Autonomy AS · Kristiansand, Norway
Web: hiveautonomy.no
22 / 22