Optical metasurfaces for general vision processing on the edge
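A team led by Chaoran Huang at The Chinese University of Hong Kong recently studied optical metasurfaces for general vision processing on the edge. The results were published in Nature on June 17, 2026.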
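Large-scale artificial intelligence (AI) models achieve notable performance in computer vision, but they require substantial computational resources, which limits their deployment on edge devices. Optical neural networks (ONNs) exploit the inherent parallelism of light and promise lower latency and energy consumption. However, current ONNs are difficult to scale and remain confined to simple tasks, because replicating the exact algebraic operations of digital models in physical (analogue) systems is challenging.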
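The team introduced a new paradigm that embeds core computer vision principles, including similarity-based recognition, attention-guided perception, and detail–context fusion, directly into a large-scale optical metasurface. By unifying optical physics with these computer vision fundamentals, they developed a photonic–electronic engine that overcomes the scalability and generality barriers, enabling high-accuracy, general-purpose computer vision at the edge.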
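The resulting system combines a 41-million-parameter optical metasurface front end with a co-designed, ultraefficient digital back end of only 87,000 parameters. It outperforms many digital models with tens of millions of parameters on tasks such as object detection, segmentation, 3D reconstruction, and video understanding. The team built a deployable prototype and demonstrated real-time edge vision processing in natural scenes. The work opens a path towards practical optical computing for general vision tasks in complex natural environments and offers a new paradigm for low-energy, low-latency, real-time on-device vision intelligence.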
Appendix: original English abstract
Title: Optical metasurfaces for general vision processing on the edge
Author: Peng, Jiayong; Luo, Mingcheng; Han, Yuxi; Wu, Siying; Li, Hongsheng; Shastri, Bhavin J.; Shu, Chester; Dou, Qi; Chai, Yang; Huang, Chaoran
Issue&Volume: 2026-06-17
Abstract: Large-scale artificial intelligence (AI) models achieve notable performance in computer vision but require substantial computational resources, limiting their deployment on edge devices. Optical neural networks (ONNs) promise reduced latency and energy consumption by making use of the inherent parallelism of light. However, present ONNs struggle to scale and are confined to simple tasks, owing to the challenges of replicating exact algebraic operations of digital models using physical (analogue) systems. This work introduces a new paradigm that directly embeds core computer vision principles, including similarity-based recognition, attention-guided perception and detail–context fusion, into a large-scale optical metasurface. By unifying optical physics with these computer vision fundamentals, we develop a photonic–electronic engine that overcomes scalability and generality barriers, enabling high-accuracy, general-purpose computer vision at the edge. The resulting system combines a 41-million-parameter optical metasurface front end with a co-designed, ultraefficient 87,000-parameter digital back end, outperforming many digital models with tens of millions of parameters across object detection, segmentation, 3D reconstruction and video understanding. We build a deployable prototype and demonstrate real-time edge visual processing in natural scenes. This work represents a path towards practical optical computing for general vision tasks in complex natural environments, enabling a new paradigm for low-energy, low-latency, real-time on-device vision intelligence.
DOI: 10.1038/s41586-026-10635-z
Source: https://www.nature.com/articles/s41586-026-10635-z
Journal information
Nature: founded in 1869 and published by Springer Nature. Latest IF: 69.504
Official website: http://www.nature.com/
Submission link: http://www.nature.com/authors/submit_manuscript.html


