Hive Autonomy · Product Vision

Building a world with unlimited access to skilled industrial labor.

01 / 22


What it is

A physical AI worker.

A real, physical agent powered by AI — deployed in the real world.

Sees through cameras, lidar, and sensors.

Understands goals and instructions in natural language.

Acts through direct machine control — driving, lifting, navigating, adapting.

02 / 22

The premise

The brain is the model.
The body is the machine.

Not a robot we ship. Not a retrofit on a machine. A single intelligence layer that operates the machine — whatever shape that machine takes.

03 / 22

The category shift

Everyone else automates a task.
Hive runs a model.

Industrial autonomy has had two generations in fifty years. The third is a categorical shift — one model, every machine, every task.

Generation 01

1960s →

Task automation

Fixed programs

Conveyors, AGVs, scripted hydraulic cycles. One route, one job. Change the layout, re-engineer the system.

Generation 02

2010s →

Scripted autonomy

SLAM & rules

Mine haulers, AMRs, SLAM-based navigation. One task-specific script per operation. New task = new engineering cycle.

Generation 03 · Frontier

2024 →

Physical AI

One model, every machine

VLA foundation models. Vision in, language in, action out. Same intelligence across machines and tasks.

04 / 22

Where we are · today

Supervised physical AI, live in industrial operations.

Every live deployment is also a data source — every decision the operator reviews becomes training data for the next generation of the model.

Bulk handling

Yara · Herøya

Fertilizer loading, 18+ months in live industrial environment.

Hazard ops

Presis · Vikafjellet

Remote snow clearing in avalanche zones. Operator never on mountain.

Quarry load-out

Hamar Pukk · Nordic

Autonomous wheel loader feeding aggregate trucks continuously.

5G teleoperation

Veidekke · Telia

Europe's first industrial 5G pilot, alongside Boston Dynamics.

05 / 22

The operating model

Humans in the loop.
Machines at the edge.

One operations center. Trained remote operators. Every machine on Hive's platform has a human supervising, and every supervised decision is a labeled training signal.

When the model is confident, it runs. When it hesitates, the operator takes over. When the operator acts, the model learns.
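The handover rule above can be sketched in a few lines of Python. This is an illustration only, not Hive's implementation: the `SupervisedController` class, the 0.8 threshold, and the toy policy and operator functions are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

Action = Tuple[float, ...]


@dataclass
class SupervisedController:
    """Runs the policy when it is confident, defers to the operator otherwise."""
    policy: Callable[[dict], Tuple[Action, float]]  # returns (action, confidence)
    operator: Callable[[dict], Action]              # human takeover
    threshold: float = 0.8                          # hypothetical value
    training_log: List[Tuple[dict, Action]] = field(default_factory=list)

    def step(self, observation: dict) -> Tuple[Action, str]:
        action, confidence = self.policy(observation)
        if confidence >= self.threshold:
            return action, "model"
        # The model hesitated: the operator acts, and the
        # (observation, correction) pair is kept as a training example.
        correction = self.operator(observation)
        self.training_log.append((observation, correction))
        return correction, "operator"


def toy_policy(observation: dict) -> Tuple[Action, float]:
    # Hypothetical policy: confident only when the path is clear.
    confidence = 0.95 if observation["path_clear"] else 0.40
    return (1.0, 0.0), confidence


def toy_operator(observation: dict) -> Action:
    # Hypothetical operator response: stop the machine.
    return (0.0, 0.0)


controller = SupervisedController(toy_policy, toy_operator)
clear = controller.step({"path_clear": True})
blocked = controller.step({"path_clear": False})
```

Raising `threshold` hands more decisions to the operator; lowering it hands more to the model. Only operator takeovers are logged in this sketch.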

There is no version of physical AI where the human disappears on day one. Our bet is that the human disappears slowly, by design — one task, one confidence threshold, one deployment at a time.

06 / 22

Why this is hard

The sim-to-real gap closed in 2024.

Physical AI was a research agenda for twenty years. Four things changed in rapid succession.

01 · Architecture

Transformers learned to act. Vision-Language-Action (VLA) models — RT-2, OpenVLA, π0 — proved that the same architecture that writes text can also control a robot body. Language becomes the interface to action.

02 · Data

Pretraining transfers to robots. Vision and language models pretrained on web-scale data give robotic policies a world-model they could never learn from robot data alone. You no longer train from zero.

03 · Policy

Diffusion models took over action generation. Diffusion policies (Chi et al., Toyota Research, 2023) produce smooth, multi-modal action trajectories, a step change over behavior cloning that regresses a single action per step.

04 · Hardware

Edge compute caught up. Sensors and inference hardware dropped ~10× in cost since 2020. A full perception + control stack runs on a machine-mounted compute cabinet.

07 / 22

The model class

VLA: Vision · Language · Action.

A single neural network that takes what the machine sees, what the operator said, and what the machine is doing — and produces the next action.

Input

Vision

Camera streams
Lidar point clouds
Depth maps

+

Input

Language

"Pick the pallet and move it to the loading bay."

→

Output

Action

Continuous control tokens
Drive · lift · steer
End-effector poses

The first VLAs — RT-2 (Google DeepMind, 2023), OpenVLA (Stanford, 2024), π0 (Physical Intelligence, 2024) — showed the paradigm works on robot arms. Hive is adapting the paradigm to industrial machinery. Same class of model. Bigger bodies.

08 / 22

What the model sees

Four modalities, one shared embedding space.

Vision. Multi-camera RGB. Front, side, rear, and task-focused. Stereo for depth.

Lidar. 3D point clouds for spatial awareness in dust, fog, darkness.

Proprioception. Joint positions, hydraulic pressures, wheel velocity, load sensors — the machine's sense of its own body.

Language. Natural-language task goals from the operator. No command DSL. No scripting language.
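One way to picture "one shared embedding space": each modality has its own encoder, and every encoder outputs a vector of the same width so the results can be processed together. The sketch below uses random linear maps as stand-ins for learned encoders. The width of 8, the input sizes, and the function names are all assumptions for illustration.

```python
import random

EMBED_DIM = 8  # hypothetical shared embedding width


def make_projection(in_dim, out_dim, rng):
    """Random linear map standing in for a learned per-modality encoder."""
    return [[rng.gauss(0.0, 1.0) for _ in range(in_dim)] for _ in range(out_dim)]


def project(matrix, features):
    """Matrix-vector product: maps raw features to the shared width."""
    return [sum(w * x for w, x in zip(row, features)) for row in matrix]


rng = random.Random(0)

# Hypothetical raw feature vectors; each modality has a different size.
raw_inputs = {
    "vision": [0.2] * 12,
    "lidar": [1.5] * 6,
    "proprioception": [0.1, 0.4, 0.9],
    "language": [1.0, 0.0, 0.0, 1.0],
}

encoders = {
    name: make_projection(len(features), EMBED_DIM, rng)
    for name, features in raw_inputs.items()
}

# One token per modality, all with the same width.
tokens = {name: project(encoders[name], raw_inputs[name]) for name in raw_inputs}
```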

09 / 22

Action generation

Continuous control is a diffusion problem.

Robot arms, wheel loaders, and forklifts don't take discrete actions. They operate on continuous, multi-modal trajectories.

Why diffusion: behavior cloning with a regression loss collapses to the mean when multiple actions are valid. Diffusion policies sample from the full action distribution, preserving the fact that swinging left is sometimes the correct behavior and swinging right is sometimes the correct behavior.

What it looks like in production: Muninn iteratively denoises an action trajectory conditioned on the current state and Huginn's understanding. The same family of models that generates images — now generating machine motion.
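A toy one-dimensional example shows the collapse and the fix. Assume half the demonstrations swing left (-1) and half swing right (+1). The best single prediction under squared error is their mean, 0, which no operator ever chose. The sampler below is a stand-in for a learned denoiser, not Muninn: it uses annealed Langevin sampling with the closed-form score of the noised two-mode data. The noise levels and step sizes are assumptions.

```python
import math
import random

# Half the demonstrations swing left, half swing right.
demonstrations = [-1.0, 1.0] * 50

# The squared-error-optimal single prediction is the mean: an action nobody took.
mse_prediction = sum(demonstrations) / len(demonstrations)


def score(x, sigma):
    """Gradient of the log-density of an equal mixture of
    N(-1, sigma^2) and N(+1, sigma^2), i.e. the data at noise level sigma."""
    return (math.tanh(x / sigma**2) - x) / sigma**2


def sample(rng, noise_levels=(1.0, 0.5, 0.25, 0.1), steps_per_level=20):
    """Start from pure noise and denoise through decreasing noise levels."""
    x = rng.gauss(0.0, 1.0)
    for sigma in noise_levels:
        step = 0.5 * sigma**2  # hypothetical step-size schedule
        for _ in range(steps_per_level):
            noise = rng.gauss(0.0, 1.0)
            x += 0.5 * step * score(x, sigma) + math.sqrt(step) * noise
    return x


rng = random.Random(0)
samples = [sample(rng) for _ in range(200)]
left = sum(1 for x in samples if x < 0)
right = len(samples) - left
```

Each sample ends near one of the two demonstrated actions rather than the average between them, and both modes appear in the sample set.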

Trajectory

Horizons of ~16–64 timesteps ahead (1–3 seconds): the machine plans its next few moves, executes, re-plans.

Multi-modal

Captures the fact that two skilled operators will execute the same task with different-but-valid trajectories. The model keeps that nuance.

Smoothness

Diffusion-generated control is continuous and physically plausible, without the jittery "policy oscillation" that single-step behavior cloning can suffer from.
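The plan, execute, re-plan cycle described under Trajectory is a receding-horizon loop. Here is a minimal sketch for a machine moving along one axis. The planner is a trivial stand-in for a sampled trajectory. The 16-step horizon is the low end of the range above; the 4 executed steps and the 0.25 m per-step limit are assumptions.

```python
HORIZON = 16     # steps planned per call (low end of the ~16-64 range)
EXECUTE = 4      # hypothetical: steps executed before re-planning
MAX_STEP = 0.25  # hypothetical per-step travel limit, in metres


def plan(position, target, horizon=HORIZON):
    """Stand-in planner: per-step displacements that move toward the target."""
    moves = []
    for _ in range(horizon):
        delta = max(-MAX_STEP, min(MAX_STEP, target - position))
        moves.append(delta)
        position += delta
    return moves


def run(position, target, max_replans=50):
    """Plan a full horizon, execute only the first few steps, then re-plan."""
    replans = 0
    while abs(target - position) > 1e-9 and replans < max_replans:
        for delta in plan(position, target)[:EXECUTE]:
            position += delta
        replans += 1
    return position, replans


final_position, replans = run(position=0.0, target=3.0)
```

Only the first few steps of each plan are executed, so later steps are always replaced by a plan made from fresher observations.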

10 / 22

Embodiment transfer

A robot arm shares geometry with an excavator.

Physical AI generalizes across bodies when the underlying control primitives transfer. Degrees of freedom are the unit of transfer.

Robot arm

6 DOF · Shoulder, elbow, wrist, yaw, pitch, roll. The canonical manipulation platform. Cheap, fast, reproducible — ideal for data collection.

Excavator

6 DOF · Swing, boom, stick, bucket, plus two base tracks. Geometric twin of the robot arm: the control problem is the same.

Wheel loader

5 DOF · Drive, steering, boom, bucket tilt, articulation. Slightly simpler than an excavator. Easier policy transfer.

Forklift

5 DOF · Drive, steering, mast lift, tilt, fork spacing. Intralogistics body. Shares control primitives with wheel loaders.

Terminal tractor

3 DOF · Drive, steering, hitch. The "easy" body — navigation-heavy, minimal manipulation.
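The bodies above can be written down as a table of named degrees of freedom. One common way to let a single policy drive bodies of different sizes is a fixed-width action vector with a validity mask. That padding scheme is an assumption here, not something the deck specifies. The DOF names follow the list above, except `track_left` and `track_right`, which are illustrative names for the excavator's two base tracks.

```python
EMBODIMENTS = {
    "robot_arm": ["shoulder", "elbow", "wrist", "yaw", "pitch", "roll"],
    "excavator": ["swing", "boom", "stick", "bucket", "track_left", "track_right"],
    "wheel_loader": ["drive", "steering", "boom", "bucket_tilt", "articulation"],
    "forklift": ["drive", "steering", "mast_lift", "tilt", "fork_spacing"],
    "terminal_tractor": ["drive", "steering", "hitch"],
}

MAX_DOF = max(len(names) for names in EMBODIMENTS.values())


def to_padded_action(body, command):
    """Map named per-DOF commands to a fixed-width vector plus a validity mask."""
    names = EMBODIMENTS[body]
    unknown = set(command) - set(names)
    if unknown:
        raise KeyError(f"{body} has no DOF named {sorted(unknown)}")
    padding = MAX_DOF - len(names)
    values = [command.get(name, 0.0) for name in names] + [0.0] * padding
    mask = [1] * len(names) + [0] * padding
    return values, mask


def shared_primitives(body_a, body_b):
    """Control primitives (by name) that two bodies have in common."""
    return sorted(set(EMBODIMENTS[body_a]) & set(EMBODIMENTS[body_b]))


values, mask = to_padded_action("terminal_tractor", {"drive": 0.5, "hitch": 1.0})
loader_forklift = shared_primitives("wheel_loader", "forklift")
```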

11 / 22

The data engine

Deployment is data.
Data is the next model.

01

Machines deployed under supervised physical AI, running live in real industrial environments.

02

Every operator intervention is a labeled trajectory. When the operator takes over, the system captures state, goal, and the correct action — free-of-charge training data.

03

ODIN retrains on the growing corpus. Each training run expands the distribution of tasks, environments, and machines the model handles.

04

The policy improves. Confidence thresholds rise. Operator interventions drop. Operator-to-machine ratio scales. More deployments follow. Loop.
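As a rough sketch of what the loop stores and measures: each operator takeover can be kept as a record of state, goal, and corrective action, and the share of control steps needing a takeover is the number step 04 expects to fall. The record fields, the `intervention_rate` helper, and the sample values are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class InterventionRecord:
    """One operator takeover: the state, the goal, and the action the operator chose."""
    state: Dict[str, float]
    goal: str
    operator_action: List[float]


def intervention_rate(records: List[InterventionRecord], total_steps: int) -> float:
    """Fraction of control steps in which the operator had to take over."""
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    return len(records) / total_steps


# Hypothetical shift: 2 takeovers in 400 control steps.
corpus = [
    InterventionRecord({"load_kg": 1200.0, "speed_mps": 1.5}, "load the truck", [0.0, 0.2]),
    InterventionRecord({"load_kg": 0.0, "speed_mps": 2.0}, "return to the pile", [0.5, 0.0]),
]
rate = intervention_rate(corpus, total_steps=400)
```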

12 / 22

Part Two

ODIN.

The foundation model for industrial physical AI.

13 / 22

ODIN · overview

One model. Every industrial machine.

ODIN is trained on real-world industrial operations. Generalizes across machine types. Deployed machine-by-machine as capability matures.

Input

Industry datasets

Forklifts · Wheel loaders
Reach stackers · Excavators
Haulers · Terminal tractors

→

Foundation model

ODIN

Hive foundation model
for industrial physical AI

→

Output

Autonomous machines

One model, every machine
Light tasks · heavy tasks
Cross-task generalization

14 / 22

Huginn · perception + understanding

Huginn · thought

Huginn

Sees the world. Understands the goal.

Vision Language Fusion

From raw pixels to a grounded goal.

Huginn is the sensory and reasoning half of ODIN. It takes multi-camera images, lidar, proprioception, and the operator's natural-language instruction — and produces a single grounded representation of what needs to happen next.

Built on a vision-language backbone pretrained on web-scale image-text data. Fine-tuned on industrial imagery: construction sites, quarries, warehouses, terminals.

Pretrained on web-scale VLM data → fine-tuned on industrial operations → distilled for edge inference. The same kind of model that describes an image can now describe a task in progress.

15 / 22

Muninn · action + memory

From grounded goal to machine control.

Muninn is the motor half of ODIN. It takes Huginn's representation of the task, plus the current machine state, and generates the next few seconds of continuous control — via diffusion.

Memory lives here too: short-term operational context (what the operator said twenty seconds ago, what just happened on the site) conditions every trajectory the policy samples.

Diffusion policy architecture. Samples continuous action trajectories conditioned on Huginn's embedding. Multi-modal by construction — preserves that many trajectories are valid for the same goal.

Muninn · memory

Muninn

Acts. Remembers. Samples smooth trajectories.

Policy Diffusion Execution

16 / 22

ODIN · composition

Two models. One worker.

Text input

"Please pick the pallet and move it to the loading bay."

Machine state

Camera feeds · lidar · joint positions · velocity

Huginn · thought

Perception & Understanding

Sees and understands

Vision + Language
+ Fusion

Muninn · memory

Action & Memory

Acts and remembers

Policy + Diffusion
+ Execution

Machine control

Continuous trajectories — drive, lift, navigate, adapt.
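The data flow in this diagram can be sketched as two functions and a small memory buffer. The bodies below are placeholders that show only the shape of the interfaces; they are not the actual models. The function names, the memory size, and the 16-step horizon are assumptions.

```python
from collections import deque
from typing import Deque, Dict, List

HORIZON = 16  # hypothetical number of control steps per trajectory


def huginn(observation: Dict[str, List[float]], instruction: str) -> List[float]:
    """Placeholder perception: one summary number per sensor stream,
    plus the instruction length in words."""
    summary = [sum(values) / len(values) for values in observation.values()]
    return summary + [float(len(instruction.split()))]


class Muninn:
    """Placeholder action model with short-term memory of recent context."""

    def __init__(self, memory_size: int = 4):
        self.memory: Deque[List[float]] = deque(maxlen=memory_size)

    def act(self, grounded: List[float], machine_state: List[float]) -> List[List[float]]:
        # Remember the latest grounded representation; old entries fall off.
        self.memory.append(grounded)
        # Placeholder trajectory: hold the current state for the whole horizon.
        return [list(machine_state) for _ in range(HORIZON)]


observation = {"camera": [0.1, 0.3], "lidar": [2.0, 4.0], "joints": [0.5, 0.5]}
instruction = "Please pick the pallet and move it to the loading bay."

grounded = huginn(observation, instruction)
muninn = Muninn()
trajectory = muninn.act(grounded, machine_state=[0.0, 0.0, 0.0])
```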

Named after Odin's two ravens in Norse mythology — Huginn (thought) and Muninn (memory) — who fly across the world each day and return with what must be remembered. Hive is a Norwegian company.

17 / 22

The lab · Kristiansand · 2024

We started with a robot arm.

Six degrees of freedom, hundreds of pick-and-place trajectories a day, operator-driven teleoperation. Every trajectory (what the operator did, what the cameras saw, what the joints measured) was labeled training data for the first ODIN checkpoint.

18 / 22

In production · 2026

Same model class. Industrial body.

Toyota forklift running under ODIN in VLA mode. Camera in. Natural-language goal in. Machine control out. No per-task script. The transfer from robot arm to industrial machine is the thesis that made Hive possible.

19 / 22

Generalization

The model generalizes. The verticals open in sequence.

Once a foundation model handles one class of industrial body, the marginal cost of adding the next class collapses. Every vertical below is a body — not a new model.

Vertical 01 · Live

Heavy Machinery

Wheel loaders, excavators, haulers. Quarries, construction, fertilizer facilities, road maintenance.

Vertical 02 · Live pilots

Intralogistics

Forklifts, reach stackers, terminal tractors. Warehouses, ports, distribution centers.

Vertical 03 · Horizon

Clean industry

Food processing, pharma, nuclear decommissioning — environments where human presence is risky or prohibited.

Vertical 04 · Horizon

Defense & dual-use

Autonomous logistics in contested environments. Machines that operate where operators shouldn't be.

Vertical 05 · Horizon

Humanoid robotics

When the body catches up to the brain. The same model class, a different mechanical substrate.

Vertical 06 · +

Wherever operator cost dominates TCO

Physical AI becomes viable wherever the hourly cost of a human operator exceeds the amortized cost of supervised autonomy.
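The viability condition can be written as a one-line comparison. The sketch below assumes a simple cost model: the hourly cost of supervised autonomy per machine is an amortized system cost plus the operator's hourly cost divided by the number of machines that operator supervises. Both the cost breakdown and the numbers are hypothetical.

```python
def supervised_cost_per_machine_hour(operator_hourly_cost, machines_per_operator,
                                     system_cost_per_hour):
    """Amortized system cost plus this machine's share of the supervising operator."""
    if machines_per_operator <= 0:
        raise ValueError("machines_per_operator must be positive")
    return system_cost_per_hour + operator_hourly_cost / machines_per_operator


def is_viable(operator_hourly_cost, machines_per_operator, system_cost_per_hour):
    """True when supervised autonomy costs less per machine-hour than a dedicated operator."""
    supervised = supervised_cost_per_machine_hour(
        operator_hourly_cost, machines_per_operator, system_cost_per_hour
    )
    return supervised < operator_hourly_cost


# Hypothetical numbers in an arbitrary currency.
cost_at_three = supervised_cost_per_machine_hour(60.0, 3, 25.0)
viable_at_three = is_viable(60.0, 3, 25.0)
viable_at_one = is_viable(60.0, 1, 25.0)
```

Under this model the operator-to-machine ratio is the lever: at one machine per operator, supervised autonomy only adds cost, and each additional supervised machine lowers the operator share per machine.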

20 / 22

The horizon

Fully autonomous industrial operations.
Supervised from one operations center.

Today, one operator supervises two to three machines.

As ODIN improves — each deployment feeding the next model — that ratio scales: one operator to four, to six, to dozens of machines across sites.

Unlimited access to skilled industrial labor. Decoupled from the human hour.

21 / 22

The decade of physical AI

This is only the start.

Company: Hive Autonomy AS · Kristiansand, Norway
Web: hiveautonomy.no

22 / 22
