# A New Vision Chip with SPAD Imaging and Spiking Neural Network Processing

Liyuan Liu liuly@semi.ac.cn



Institute of Semiconductors, Chinese Academy of Sciences, Beijing, China

ISSW 2024 @ Trento, Italy



# Outline

- **Background**
- Chip Architecture
- Key Techniques
- Results and Comparison
- Discussion

# Motivation

#### **Edge machine vision applications**

- Agile drone
- Intelligent robots
- Autonomous vehicles

**Requirements:**

- Versatile (2D/3D/HDR)
- Intelligent
- Energy efficient
- Small size



Figure: application scenarios with high dynamic range, complex terrain, obstacles, and a limited power supply.

# **Necessity for 2D/3D Vision**

#### A single chip with 2D/3D vision

# **Necessity for Sensing and In-situ Processing**

A single chip with sensing and intelligent in-situ processing ability

# **Vision Chip Concept**

A vision chip integrates an image sensor, memory, and a vision processor, so it can both acquire visual information and process it in situ.

Vision chips are a key technology for enabling edge vision and IoT applications.

# Challenge

Current vision chips acquire, transmit, and process visual information as multi-bit real-valued data.

Edge applications demand:

- High-speed 2D/3D imaging
- Real-time intelligent processing
- Operation under a limited power supply

**Challenges**

- Large data volume
- Complex computation
- High cost for 2D/3D vision
- High latency and power

Performance gains from process scaling alone are not enough to meet this demand, so a new paradigm for future vision chips is urgently needed.

# **The Human Visual System**



The proposed vision chip adopts a full spiking visual flow to mimic the visual flow of the human visual system.

# **Proposed Bio-inspired Spiking Vision Chip**



The bio-inspired spiking vision chip integrates a spike-based image sensor and a processor to mimic the human visual system and realize a full spiking visual flow.

# **Our Approach**



- Versatile spike-based imaging and processing
- Decrease data volume and computation load
- Spiking vision processor
  - Spiking ISP and SNN-based intelligent recognition
- Adaptive imaging adjustment

Versatile vision ability with low latency!

# **Advantage of Spiking Vision Chip**

#### **Traditional vision chip**

- 🙁 Large data volume
- 🙁 Complex ALU (MAC)
- 🙁 Large latency and high cost

VS.

#### **Spiking vision chip** (SPAD image sensor + spiking vision processor with SNN PEs)

- ☺ Low data volume (spikes)
- ☺ Simplified ALU (ACC)
- ☺ Low latency and compact size
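The MAC-versus-ACC contrast can be made concrete. Because spiking inputs are 1 bit, each multiply in a weighted sum reduces to "add the weight if the input spiked", so the multiplier can be dropped. The sketch below only illustrates that equivalence; the function names are hypothetical and it does not model the chip's datapath.

```python
from typing import Sequence


def mac_input(weights: Sequence[int], activations: Sequence[int]) -> int:
    """Conventional neuron input: multiply-accumulate over multi-bit activations."""
    total = 0
    for w, x in zip(weights, activations):
        total += w * x  # one multiply and one add per synapse
    return total


def acc_input(weights: Sequence[int], spikes: Sequence[int]) -> int:
    """Spiking neuron input: spikes are 0/1, so only accumulation is needed."""
    total = 0
    for w, s in zip(weights, spikes):
        if s:  # a 1-bit spike gates the weight; no multiplier required
            total += w
    return total


weights = [3, -2, 5, 1]
spikes = [1, 0, 1, 1]
print(mac_input(weights, spikes), acc_input(weights, spikes))  # 9 9
```

Both functions give the same result for binary inputs; only the ACC version avoids multiplication, which is what allows the simplified ALU.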



# Outline

- Background
- **Chip Architecture**
- Key Techniques
- Results and Comparison
- Discussion

# **Chip Architecture**



- SPAD image sensor → naturally generates spike-based 2D/3D imaging data
- Spiking vision processor → reconfigurable as a preprocessor and an SNN processor
- Processor → MPU → configurable SPAD image sensor → system-level feedback adjustment

# **Spiking Visual Flow**



**Bio-inspired full spiking visual flow**  $\rightarrow$  low end-to-end latency  $\rightarrow$  light-adaptation

- SPAD imaging data and spike-based computing → versatile intelligent 2D/3D vision
- Spiking map stream → regular data flow for dynamic reconfigurable design

# **On-chip Feedback Adjustment**



Light change estimation: subsampled 8×8 pixel data → detect changes in avalanche count (CNT) → programmable threshold setting
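A minimal sketch of the light-change check, assuming the 8×8 samples are taken on an evenly spaced grid of the avalanche-count (CNT) map and that a change is flagged when their mean count moves by more than a programmable threshold. The sampling pattern, the use of the mean, and the function names are assumptions for illustration; the slide specifies none of them, nor how the MPU then reconfigures the sensor.

```python
import numpy as np


def subsample_8x8(cnt_map: np.ndarray) -> np.ndarray:
    """Pick an evenly spaced 8x8 grid of pixels from a 2D avalanche-count map."""
    h, w = cnt_map.shape
    rows = np.linspace(0, h - 1, 8).astype(int)
    cols = np.linspace(0, w - 1, 8).astype(int)
    return cnt_map[np.ix_(rows, cols)]


def detect_light_change(cnt_map: np.ndarray, prev_mean: float, threshold: float):
    """Return (changed, current_mean).

    `threshold` plays the role of the (assumed) programmable threshold: a change
    in the mean subsampled count larger than it is treated as a light change.
    """
    current_mean = float(subsample_8x8(cnt_map).mean())
    changed = abs(current_mean - prev_mean) > threshold
    return changed, current_mean


# Usage: a uniform 128x128 scene whose count doubles between frames.
dark = np.full((128, 128), 10)
bright = np.full((128, 128), 20)
_, ref = detect_light_change(dark, prev_mean=0.0, threshold=5.0)
changed, _ = detect_light_change(bright, prev_mean=ref, threshold=5.0)
print(changed)  # True
```

Reading only 64 samples instead of the full map keeps the check cheap enough to run every frame.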

# **Advantage of Spiking Vision Chip**

#### **SPAD-based spiking vision chip**

Features:

1. Low data volume
2. Spike-based computing
3. Low latency
4. Structured spiking map

#### **Advantages:**

- Decreased hardware cost
- Simplified ALU (ACC)
- Instant feedback adjustment
- Time-division multiplexing reconfigurable design
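To put "low data volume" in numbers, the sketch below compares the raw size of one 1-bit spiking map with an 8-bit intensity frame of the same size. It assumes the 128×128 resolution and 100,000 maps/s imaging rate listed in the specification and comparison tables; the 8-bit baseline is an illustrative choice, not a measured competitor.

```python
def frame_bytes(width: int, height: int, bits_per_pixel: int) -> int:
    """Raw size of one frame in bytes (no compression, no headers)."""
    return width * height * bits_per_pixel // 8


def data_rate_bps(width: int, height: int, bits_per_pixel: int, frames_per_s: int) -> int:
    """Raw sensor output rate in bits per second."""
    return width * height * bits_per_pixel * frames_per_s


W, H, RATE = 128, 128, 100_000
print(frame_bytes(W, H, 1), frame_bytes(W, H, 8))  # 2048 16384
print(data_rate_bps(W, H, 1, RATE) / 1e9, "Gb/s")  # 1.6384 Gb/s
```

Per map, the 1-bit spike representation is 8× smaller than the 8-bit frame.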

# **Key Techniques for the Spiking Vision Chip**



# Outline

- Background
- Chip Architecture
- **Key Techniques**
  - SPAD image sensor
  - Reconfigurable spiking vision chip
  - Spike-based processing algorithm
- Results and Comparison
- Discussion

# Outline

- Background
- Chip Architecture
- Key Techniques
  - **SPAD image sensor**
  - Reconfigurable spiking vision chip
  - Spike-based processing algorithm
- Results and Comparison
- Discussion

# **Configurable Adaptive SPAD Imaging**



- Gated pixels → configurable exposure time
- Rolling-shutter operation → stabilized SPAD array bias
- Adaptive → imaging parameters adjusted based on visual processing results



- Gated pixel structure
  - External reset
  - Passive quench



- Configurable gating
  - via SEL & RST

# **Configurable Imaging Mode**



Configurable gating enables:

1. Adaptive 2D imaging
2. iToF-based 3D imaging
3. Dim imaging
4. Color imaging with an RGB color filter

# Outline

- Background
- Chip Architecture
- Key Techniques
  - SPAD image sensor
  - **Reconfigurable spiking vision chip**
  - Spike-based processing algorithm
- Results and Comparison
- Discussion

# **Reconfigurable Spike-based Processing**



# Spiking Neuron: Integrate-and-Fire (IF) Neuron

#### **Spiking neuron model**



- Input/output: spikes, 1 bit
- $W$: synaptic weights, 8 bit
- $V_m$: membrane potential, 13–16 bit
- $\tau$: firing threshold, 1 bit

#### **Fire-reset operation**

If $V_m(t) \ge \tau$, then $S_o(t) = 1$ and $V_m(t) = 0$.
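The fire-reset rule can be sketched as a small integer model. It assumes the standard IF integration step, in which the membrane potential adds the weights of all input synapses that spiked in the current time step, and it assumes $V_m$ saturates at the limits of a signed register of configurable width (the slide gives 13 to 16 bits but does not describe overflow handling). Names are illustrative.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class IFNeuron:
    threshold: int       # firing threshold tau
    vm_bits: int = 16    # assumed signed width of the membrane-potential register
    v_m: int = 0         # membrane potential V_m

    def step(self, weights: Sequence[int], spikes_in: Sequence[int]) -> int:
        """Integrate one time step of 1-bit input spikes; return the output spike."""
        # Integrate: accumulate the weights of synapses that received a spike.
        self.v_m += sum(w for w, s in zip(weights, spikes_in) if s)
        # Assumed saturation to the register range instead of wrap-around.
        lo, hi = -(1 << (self.vm_bits - 1)), (1 << (self.vm_bits - 1)) - 1
        self.v_m = max(lo, min(hi, self.v_m))
        # Fire-reset: if V_m >= tau, emit S_o = 1 and reset V_m to 0.
        if self.v_m >= self.threshold:
            self.v_m = 0
            return 1
        return 0


neuron = IFNeuron(threshold=10)
print([neuron.step([4, 3], [1, 1]) for _ in range(4)])  # [0, 1, 0, 1]
```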



# **Processing Element (PE)**



PE offers IF neuron computing, flexible local data access, and nearby data sharing.

# **IF Neuron for Preprocessing**

#### IF neuron model with temporal filtering



*f*-function:

1. 2D visual signal enhancement, e.g., $f(R) = \log_{1-\mathrm{PDE}}\left(\frac{1-R}{1-\Delta t \times \mathrm{DCR}}\right)$
2. 3D visual reconstruction

Features of PE for preprocessing:

1. Pixel-wise
2. Temporal accumulation
3. Flexibly programmable ALU
4. I-F process

# **PE for Preprocessing**



# **IF Neuron for SNN**

IF neuron model with dense synaptic connection

Dense connection: $k \times k \times C_{in} \times C_{out}$ (e.g., $3 \times 3 \times 16 \times 64$); weight kernels are shared within a layer.

Features of PE for SNN:

1. Accelerate synapse integration
2. Increase local data reuse
3. Efficient data access
4. I-F process
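A sketch of one time step of a spiking convolutional layer with IF neurons, showing how the dense $k \times k \times C_{in} \times C_{out}$ connection reduces to weight accumulation when inputs are spikes. It assumes valid (unpadded) convolution with stride 1 and uses plain loops for clarity rather than the chip's PE-chain mapping. For the example $3 \times 3 \times 16 \times 64$ layer, the shared kernel holds 9,216 weights, i.e. 9 KiB at 8 bits per weight.

```python
import numpy as np


def spiking_conv_step(spikes_in: np.ndarray, weights: np.ndarray,
                      v_m: np.ndarray, threshold: int) -> np.ndarray:
    """One time step of a spiking conv layer with IF neurons.

    spikes_in: (C_in, H, W) array of 0/1 input spikes
    weights:   (k, k, C_in, C_out) integer kernel, shared across all positions
    v_m:       (C_out, H-k+1, W-k+1) integer membrane potentials, updated in place
    Returns the (C_out, H-k+1, W-k+1) output spike map.
    """
    k, _, _, c_out = weights.shape
    _, h, w = spikes_in.shape
    active = spikes_in.astype(bool)
    for co in range(c_out):
        kern = weights[:, :, :, co].transpose(2, 0, 1)  # -> (C_in, k, k)
        for y in range(h - k + 1):
            for x in range(w - k + 1):
                patch = active[:, y:y + k, x:x + k]
                # Accumulate only: add the weights under the input spikes.
                v_m[co, y, x] += kern[patch].sum()
    fired = v_m >= threshold
    v_m[fired] = 0  # fire-reset
    return fired.astype(np.uint8)


rng = np.random.default_rng(0)
spikes = rng.integers(0, 2, size=(16, 8, 8))
kernel = rng.integers(-8, 8, size=(3, 3, 16, 64))
vm = np.zeros((64, 6, 6), dtype=np.int32)
out = spiking_conv_step(spikes, kernel, vm, threshold=20)
print(out.shape)  # (64, 6, 6)
```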

# **PE for SNN**



# **PE Chain Parallel Computing**



- **Nearby data sharing** → weight kernels up to $7 \times 7$
- **PE-chain column-parallel computing** → PE-chain length of 8, 16, 32, 64, 128, or 256

# Outline

- Background
- Chip Architecture
- Key Techniques
  - SPAD image sensor
  - Reconfigurable spiking vision chip
  - **Spike-based processing algorithm**
- Results and Comparison
- Discussion

# **2D Visual Signal Enhancement**



#### **2D imaging reconstruction**

Temporal accumulation $\sum_t$ followed by the denoise function $f$:

$$f(R) = \log_{1-\mathrm{PDE}}\left(\frac{1-R}{1-\Delta t \times \mathrm{DCR}}\right)$$

This corrects for the effects of DCR and PDE variation. Under weak lighting, the preprocessing recovers the signal from the noise.
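A sketch of the pixel-wise preprocessing: temporal accumulation of $T$ binary frames gives the firing rate $R$ per pixel, and $f$ maps $R$ to a corrected intensity. One reading consistent with the formula, and an assumption here: if a pixel receives $n$ photons in a gate window of length $\Delta t$, the probability of no avalanche is $(1-\Delta t \cdot \mathrm{DCR})(1-\mathrm{PDE})^n$, so $f$ recovers $n$ from the observed non-firing probability $1-R$. Clipping $R$ below 1 and flooring the result at 0 are also assumptions that keep the logarithm finite; the DCR and $\Delta t$ values are illustrative, while PDE = 15.75% is the measured value.

```python
import numpy as np


def temporal_accumulate(frames: np.ndarray) -> np.ndarray:
    """Sum_t over a (T, H, W) stack of 0/1 spike frames, as a firing rate R in [0, 1]."""
    return frames.mean(axis=0)


def denoise_f(rate: np.ndarray, pde: float, dcr: float, dt: float) -> np.ndarray:
    """f(R) = log_{1-PDE}((1 - R) / (1 - dt * DCR)), computed with natural logs."""
    rate = np.clip(rate, 0.0, 1.0 - 1e-6)        # assumed guard against log(0)
    ratio = (1.0 - rate) / (1.0 - dt * dcr)
    return np.maximum(np.log(ratio) / np.log(1.0 - pde), 0.0)  # assumed floor at 0


# Round trip for n = 3 photons per gate; DCR = 1 kHz and dt = 10 us are assumed.
pde, dcr, dt = 0.1575, 1_000.0, 1e-5
r = 1.0 - (1.0 - dt * dcr) * (1.0 - pde) ** 3
print(float(denoise_f(np.array(r), pde, dcr, dt)))  # ~3.0
```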

# **3D Visual Reconstruction**



#### **3D depth imaging reconstruction**



Depth-solving function, with $a = \mathrm{CNT}_{0} - \mathrm{CNT}_{180}$ and $b = \mathrm{CNT}_{90} - \mathrm{CNT}_{270}$:

$$d = \frac{c}{8f} \begin{cases} \frac{b}{a+b}, & a > 0 \,\&\, b > 0, \\ \frac{-a}{b-a} + 1, & a < 0 \,\&\, b > 0, \\ \frac{b}{a+b} + 2, & a < 0 \,\&\, b < 0, \\ \frac{a}{a-b} + 3, & a > 0 \,\&\, b < 0 \end{cases}$$

- Modulation frequency $f$
- Several exposures for each phase → light intensity (avalanche count CNT)

Depth is solved and encoded into a rate-coded spike output.
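A direct transcription of the piecewise depth-solving function, plus a rate-coding step. The quadrant boundaries (where $a$ or $b$ is exactly 0) are assigned with $\ge$ so that every non-zero $(a, b)$ pair maps to one branch; the slide gives only strict inequalities. The rate encoder, which emits a spike count proportional to $d$ relative to the unambiguous range $c/(2f)$ over a fixed window, is an assumed encoding, since the slide does not say how depth maps to spike rate.

```python
def itof_depth(cnt0: float, cnt90: float, cnt180: float, cnt270: float,
               f_mod: float, c: float = 299_792_458.0) -> float:
    """Four-phase iToF depth from avalanche counts, piecewise-linear in phase."""
    a = cnt0 - cnt180
    b = cnt90 - cnt270
    if a == 0 and b == 0:
        raise ValueError("no modulated signal; depth is undefined")
    if a >= 0 and b >= 0:        # phase in [0, pi/2]
        x = b / (a + b)
    elif a < 0 and b >= 0:       # phase in (pi/2, pi]
        x = -a / (b - a) + 1
    elif a < 0 and b < 0:        # phase in (pi, 3pi/2)
        x = b / (a + b) + 2
    else:                        # a >= 0, b < 0: phase in [3pi/2, 2pi)
        x = a / (a - b) + 3
    return c / (8 * f_mod) * x   # x in [0, 4) spans the range c / (2 f)


def rate_encode(depth: float, f_mod: float, window: int,
                c: float = 299_792_458.0) -> int:
    """Assumed rate code: spikes in a `window`-step interval proportional to depth."""
    d_max = c / (2 * f_mod)
    return round(window * min(max(depth / d_max, 0.0), 1.0))


d = itof_depth(10, 10, 5, 5, f_mod=10e6, c=3e8)
print(d, rate_encode(d, f_mod=10e6, window=64, c=3e8))  # 1.875 8
```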

# **Spiking Convolutional Neural Network**



- Hierarchical neural network: convolutional (Conv), pooling (Pool), fully connected (FC), and spike-counting (SC) layers
- Training method → converted from a well-trained real-valued CNN with the same network structure

**Network structure**

| Task | Structure | Output |
|---|---|---|
| 2D classification | conv5-12, avg-pool/2, conv5-64, avg-pool/2, FC-10 | Confidence of digits 0–9 (MNIST) |
| 3D localization | Conv64×1-1 | Horizontal and depth position (X, Z) |

On-chip deployment: quantized weights, with outliers cut off.
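A sketch of one common way to implement "quantized weights with outliers cut off" when deploying a converted SNN: clip weights at a high percentile of their magnitude, then map them symmetrically to signed 8-bit integers (the IF-neuron slide gives 8-bit synaptic weights). The percentile value and the symmetric per-layer scale are assumptions; the slides do not state the exact scheme used on chip.

```python
import numpy as np


def quantize_weights(w: np.ndarray, bits: int = 8, clip_percentile: float = 99.9):
    """Clip outliers, then quantize symmetrically to signed `bits`-bit integers.

    Returns (q, scale) with w_clipped ~= q * scale; q values fit in `bits` bits.
    """
    limit = float(np.percentile(np.abs(w), clip_percentile))  # outlier cut-off
    qmax = 2 ** (bits - 1) - 1                                # 127 for 8 bits
    if limit == 0.0:
        return np.zeros_like(w, dtype=np.int32), 1.0
    w_clipped = np.clip(w, -limit, limit)
    scale = limit / qmax
    q = np.round(w_clipped / scale).astype(np.int32)
    return q, scale


rng = np.random.default_rng(0)
w = np.append(rng.normal(0.0, 0.05, size=10_000), 3.0)  # one large outlier
q, scale = quantize_weights(w)
print(q.min() >= -127, q.max() <= 127, q[-1])  # True True 127
```

Clipping first keeps a single outlier from stretching the scale, so the bulk of the weights retain resolution in 8 bits.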



# Outline

- Background
- Chip Architecture
- Key Techniques
- **Results and Comparison**
- Discussion

#### **Chip Microphotograph**



| Specification | Value |
|---|---|
| Technology | 180 nm CMOS |
| Clock frequency | 80 MHz |
| Supply voltage | 1.8 V (logic), 11 V (VHH), 0.3 V (VG) |
| SRAM | 256 kB (data), 64 kB (instruction) |
| Imaging rate | 100,000 SMps |
| PE array | 256 |
| # neurons | 256 (preprocessor), 1024 (SNN processor) |
| Peak performance | 20.48 GSOPS (preprocessor), 81.9 GSOPS (SNN processor) |
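The peak-performance and timing figures in the table are mutually consistent under a simple reading, assumed here rather than stated on the slide: each neuron performs one synaptic operation (SOP) per 80 MHz clock cycle, and one spiking map is produced per imaging period.

$$
\begin{aligned}
\text{Preprocessor:}\quad & 256 \times 80\,\text{MHz} = 20.48\,\text{GSOPS} \\
\text{SNN processor:}\quad & 1024 \times 80\,\text{MHz} = 81.92\,\text{GSOPS} \approx 81.9\,\text{GSOPS} \\
\text{Temporal resolution:}\quad & 1 / (100{,}000\,\text{maps/s}) = 10\,\mu\text{s}
\end{aligned}
$$

The last line matches the 10 µs temporal resolution listed in the state-of-the-art comparison.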

# **Measurement Setup**





Experimental setup: objects for imaging, and an LCD displaying images for classification.

Different lighting conditions and noise levels are simulated by setting:

- screen brightness of the LCD
- screen contrast of the LCD
- contrast of the dataset pictures

#### **Test board**

# **Measured Imaging Results**





2D imaging



- Dynamic range: 100 dB
- PDE: 15.75% @ 503 nm
- 3D depth imaging error: 2.7 cm
- 2D color vision and dim-vision ability
- Dim-vision classification @ 0.02 lux



# **MNIST Classification**





#### **Bright vision**

- A 5-layer SCNN
- 99.33% accuracy at 300 inferences/s on MNIST

**Dim vision w. preprocessing**

- Efficiently improves the SNR
- Only 3.9% accuracy loss @ 20 mlx
- ~4× latency improvement

# **Spike-based Imaging Signal Enhancement**



On-chip approximate computation with fixed-point data representation achieves SNR improvements similar to floating-point computation.

#### @ 0.02 lux

Figure panels: w/o enhancement vs. floating-point enhancement; w/o enhancement vs. fixed-point approximate computation.
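One way to obtain a fixed-point approximation of $f$ cheaply, offered as an assumption rather than the chip's documented method: with a fixed number $T$ of accumulated frames, $R = \text{count}/T$ takes only $T+1$ values, so $f$ can be precomputed once into a lookup table of fixed-point entries and each pixel then needs only a table read. The sketch builds such a table with `frac_bits` fractional bits, so the quantization error per entry is at most half an LSB; $T$, the DCR, and $\Delta t$ are illustrative values.

```python
import math


def f_float(r: float, pde: float, dcr: float, dt: float) -> float:
    """Reference f(R) = log_{1-PDE}((1 - R) / (1 - dt * DCR)), floored at 0."""
    return max(math.log((1.0 - r) / (1.0 - dt * dcr)) / math.log(1.0 - pde), 0.0)


def build_f_lut(num_frames: int, pde: float, dcr: float, dt: float,
                frac_bits: int = 8) -> list[int]:
    """Fixed-point f for every possible count 0..num_frames (frac_bits fractional bits)."""
    lut = []
    for count in range(num_frames + 1):
        # Assumed guard: treat count == num_frames as R = 1 - 1/(2T) to avoid log(0).
        r = min(count / num_frames, 1.0 - 0.5 / num_frames)
        lut.append(round(f_float(r, pde, dcr, dt) * (1 << frac_bits)))
    return lut


T, FRAC = 64, 8
lut = build_f_lut(T, pde=0.1575, dcr=1_000.0, dt=1e-5, frac_bits=FRAC)
count = 26                      # e.g. a pixel that fired in 26 of 64 frames
print(lut[count] / (1 << FRAC), f_float(count / T, 0.1575, 1_000.0, 1e-5))
```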

# **Obstacle Localization**



# State-of-the-Art Comparison

|                        | ISSCC-2017 (Sony)             | JSSC-2019                           | ISSCC-2021 (Sony) | Ours                                   |
|------------------------|-------------------------------|-------------------------------------|-------------------|----------------------------------------|
| Process                | 90 nm 1P4M / 40 nm 1P7M       | 130 nm 1P6M / 130 nm 1P6M           | 65 nm / 22 nm     | 180 nm 1P8M                            |
| Integration            | Stacked BSI                   | Stacked BSI                         | Stacked BSI       | Single chip, FI                        |
| Photoreceptor          | PD                            | PD                                  | PD                | SPAD                                   |
| Resolution             | 1296×976                      | 1024×769                            | 4056×3040         | 128×128                                |
| Dynamic range          | 80 dB                         | 54 dB                               | -                 | 100 dB                                 |
| Frame rate             | 60 fps                        | 340 fps                             | 120 fps           | 100,000 fps                            |
| Temporal resolution    | 16.7 ms                       | 2.9 ms                              | 8.3 ms            | 10 µs                                  |
| Vision mode            | 2D vision                     | 2D vision                           | 2D vision         | 2D vision, dim vision, 3D depth vision |
| Processor architecture | PE array                      | PE array                            | ISP + CNN DSP     | PE array + MPU                         |
| Parallelism            | 1034                          | 3072                                | 2304              | 1024                                   |
| Bit-width              | 1/4                           | 8                                   | 8/16/32           | 1/8/16                                 |
| On-chip memory         | 168 KB                        | 171 KB                              | 9 MB              | 256 KB                                 |
| CV                     | Spatial filtering, morphology | Spatial filtering, motion detection | Signal processing | Temporal filtering                     |
| NNs                    | N/A                           | N/A                                 | CNN               | SNN                                    |
| Light adaptation       | No                            | No                                  | No                | Yes (3.85 µs)                          |
| Clock frequency        | 108 MHz                       | 80 MHz                              | 262.5 MHz         | 80 MHz                                 |
| Peak performance       | 140 GOPS @ 4 bit              | 61 GOPS @ 8 bit                     | 1210 GOPS @ 8 bit | 81.9 GOPS @ 1 bit                      |

# Summary

## **A vision chip is proposed with:**

- Full spiking visual flow based on SPAD imaging and spike-based computing
- Configurable gated SPAD image sensor
- Reconfigurable spike-based vision processor

#### Measurement results show:

- Versatile visual capabilities (2D/3D/dim vision)
- 99.3% accuracy and 300 inferences/s on MNIST classification
- Only 3.9% accuracy loss @ 20 mlx
- 1.68 cm obstacle detection error
- 3.85 µs self-adaptation to ambient light changes

# Outline

- Background
- Chip Architecture
- Key Techniques
- Results and Comparison
- **Discussion**

# **Discussion**

#### **Future vision chip**



- SPAD device array → high resolution → high fill factor
- Low-level in-pixel processing circuits → denoising preprocessing → ROI extraction
- High-level intelligent processor → AI-based signal enhancement → more bio-inspired mechanisms

# Thanks! Q&A