The State-of-the-Art of CMOS Image Sensors

Ziad Shukri, TechInsights Inc., 1891 Robertson Road, Suite 500, Ottawa, ON K2H 5B7, Canada
Phone: 1-877-826-4447, email: zshukri@techinsights.com

Abstract - CMOS Image Sensors (CIS) have continued to evolve in response to performance requirements of current applications for Smartphone Imaging, Security & Surveillance, Biometrics, Automotive and Depth Sensing and Ranging. In this paper, we present the latest analysis and observed trends on CMOS image sensors in terms of resolution, pixel pitch and silicon thickness. We also present the latest PDAF and CFA observations and discuss pixel-level DBI. Additionally, we present trends on recent Near-Infrared (NIR) sensors and show the current State-of-the-Art of back-surface NIR structures. We also present trends on Time-of-Flight (ToF) sensors for Front- and Back-Illuminated image sensors.

I. INTRODUCTION

Today, CMOS Image Sensors have become ubiquitous in our daily lives, from smartphones to automobiles, security cameras, robotics, and AR/VR entertainment devices. Much of this is driven by strong demand for smart, connected, and autonomous consumer products as we gradually transition into the IoT era. In response, leading image sensor designers, providers and world foundries have continued to advance technological innovations to facilitate greater miniaturization of pixel-pitch down to and beyond 0.7 µm [1], [2] and greater CIS/ISP integration through pixel-level interconnect. Enhanced longer wavelength detection and improved SPAD design have also facilitated 3D-ToF imaging to deliver a broader CIS capability to meet emerging applications.

II. CONFIGURATION & DIE SIZE

For mobile imaging, Stacked, Back-Illuminated image sensors continue to dominate the market. They represented nearly 90% of smartphone imagers analyzed in 2020 (Fig 1). Their use in smartphones is expected to continue to grow over monolithic Back-Illuminated imagers as on-chip image processing becomes vital for improved performance.

In addition, Stacked-Back-Illuminated imagers show increased utilization of the Die surface area for the active array. This trend is driven by the need for increased resolution as smartphone imagers surpassed 100 MP in 2020 and continue to trend upwards. This is shown in Fig. 2 for Stacked Die analyzed since 2013. Here we see the ratio of Active Area to total CIS Die Area has surpassed 80%.
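To make the area ratio concrete, the following is a minimal sketch of how an active-area fraction could be estimated from nominal resolution and pixel-pitch, assuming square pixels and ignoring dummy and optical-black pixels. The 108 MP, 0.8 µm figures are the nominal HM1 values quoted in Section IV; the 85 mm² die area is a hypothetical value chosen only for illustration and is not a measurement from this work.

```python
def active_area_mm2(pixel_count: float, pixel_pitch_um: float) -> float:
    """Approximate active array area in mm^2, assuming square pixels."""
    area_um2 = pixel_count * pixel_pitch_um ** 2
    return area_um2 / 1e6  # 1 mm^2 = 1e6 um^2


def active_ratio(active_mm2: float, die_mm2: float) -> float:
    """Fraction of the CIS die occupied by the active array."""
    return active_mm2 / die_mm2


# Nominal 108 MP array at 0.8 um pixel-pitch.
array_mm2 = active_area_mm2(108e6, 0.8)

# HYPOTHETICAL die area, for illustration only.
die_mm2 = 85.0

print(f"active array: {array_mm2:.2f} mm^2")
print(f"active/die ratio: {active_ratio(array_mm2, die_mm2):.1%}")
```

Under these assumptions the array alone occupies roughly 69 mm², which illustrates why little die area remains for peripheral circuitry once the ratio exceeds 80%.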

Fig. 1 CIS configuration vs year of analysis for smartphone image sensors. Front-Illuminated (FI), Back-Illuminated (BI) and Back-Illuminated - Stacked.

Fig. 2 Ratio of Active CIS area to total CIS Die area.

III. PIXEL-PITCH & ACTIVE SILICON

Attaining higher resolutions is facilitated by the continued miniaturization of the pixel footprint. In addition, as pixel-pitch is reduced, silicon thickness needs to increase to maintain a good pixel photo-response. Fig. 3 shows the trend for silicon thickness and the ratio of thickness to pixel-pitch, both increasing as pixel size decreases.

The highest aspect ratio observed was on the Samsung GW3 (Fig. 4(a)), a 64 MP resolution, 0.7 µm pixel-pitch image sensor with a 4.1 µm active EPI thickness and a full Front-Deep Trench Isolation (F-DTI). OmniVision’s 0.7 µm OV64B (Fig. 4(b)), with a partial Back-DTI, is observed to have an EPI thickness of 3.0 µm.
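The aspect ratios implied by these two measurements can be checked with a short calculation. This sketch simply divides the reported EPI thickness by the pixel-pitch; the function and dictionary names are illustrative.

```python
def aspect_ratio(epi_thickness_um: float, pixel_pitch_um: float) -> float:
    """Ratio of active silicon (EPI) thickness to pixel-pitch."""
    return epi_thickness_um / pixel_pitch_um


# (EPI thickness in um, pixel-pitch in um), as quoted in the text above.
sensors = {
    "Samsung GW3": (4.1, 0.7),
    "OmniVision OV64B": (3.0, 0.7),
}

for name, (epi_um, pitch_um) in sensors.items():
    print(f"{name}: {aspect_ratio(epi_um, pitch_um):.2f}")
```

This gives a ratio of about 5.9 for the GW3 and about 4.3 for the OV64B.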

Fig. 3 Trend of silicon thickness to pixel pitch ratio.

Fig. 4 SEM cross sections (a) S5KGW3 (b) OV64B.

IV. PHASE DETECTION AUTO FOCUS

Shrinking pixel size makes it harder to maintain a high output signal from Phase Detection Auto Focus (PDAF) pixels. Fig. 5 shows the PDAF method for smartphone imagers in terms of resolution and pixel-pitch. While Masked PDAF and Dual Photodiode remain in use for lower-resolution, larger pixel-pitch imagers, On-Chip Lens (OCL) has become the PDAF method of choice as pixel-pitch decreases below about 1.0 µm.

a) Masked PDAF

As pixel-pitch decreases, so does the Fill-Factor for Masked pixels, and with it the PDAF signal. The smallest Masked PDAF pixels observed were implemented by Samsung in 2019 on the 0.8 µm pixel GM1 and, more recently in 2020, on the GD1 (Fig. 6). Both are Tetracell imagers and use a Clear channel to enhance PDAF output.

b) On-Chip Lens (OCL)

As OCL-based PDAF pixels do not sacrifice surface area, and resist-based processes are easily scalable, OCL-PDAF has been adopted for the smaller 0.7 µm pixel-pitch generation of imagers and is expected to remain in use as the 0.6 µm pixel generation is realized. Additionally, as auto focus is further refined, the legacy 2x1 OCL structure has evolved into a 2x2 OCL structure to facilitate PDAF in both X- and Y-directions. Fig. 7 shows the different 2x2 OCL approaches observed in use. OmniVision employs a large OCL equivalent to 2x2 pixels, recently observed in the 0.7 µm pixel 64 MP OV64B (Fig. 7(a)), whereas Samsung utilizes a pair of adjacent 2x1 OCL to achieve a 2x2 effect, as observed last year in the 0.8 µm pixel 108 MP HM1 and HM3, and more recently in the 0.7 µm pixel-pitch HM2 (Fig. 7(b)). OCL-PDAF typically replaces the red and blue channels with a green channel to maximize output signal, with a PDAF unit cell density of 32:1 or 36:1.

In contrast, Sony introduced a full array 2x2 OCL approach with the microlens having twice the pixel size over the entire active array. This was first observed in 2020 on the IMX689, a 1.12 µm pixel-pitch 48 MP resolution imager (Fig 7(c)), and more recently on the IMX766 and IMX789. A full array autofocus has no PDAF-dedicated pixels and therefore all pixels can be utilized for image acquisition.

c) Dual Photodiode (DP)

As highlighted in Fig. 8, Dual Photodiode full array PDAF remains a “larger-pixel” PDAF approach due to the in-pixel trench isolation. Sony released its Octa-PD technology in the IMX700, a 1.22 µm pixel-pitch, 50 MP Quad-Bayer CIS with 2 photodiodes per pixel in all color channels. Samsung introduced an improved Dual Photodiode PDAF approach in the GN2, a 1.40 µm pixel-pitch 50 MP imager with slanted green channel in-pixel DTI, facilitating X- and Y-direction PDAF. Samsung currently holds the record for the smallest pixel-pitch Dual Photodiode PDAF approach, at 1.2 µm, observed in the GN1 (Fig. 8(b)).

Fig. 7 Observed effective 2x2 OCL PDAF by (a) OmniVision, (b) Samsung, and (c) full array 2x2 OCL PDAF by Sony

Fig. 8 Dual Photodiode, full-array PDAF; (a) Sony IMX700 with Octa PD, and (b) Samsung S5KGN1 Dual photodiode.

V. COLOR FILTER ARRAY

For small pixel-pitch smartphone imagers, high signal output under low light conditions continues to drive Color Filter Array (CFA) mosaic and pixel-binning strategies. Fig. 9 shows smartphone imager CFA pattern in terms of resolution and pixel-pitch. In 2019, Sony utilized a 4x4 CFA in the IMX608. In 2020, Samsung introduced the 3x3 Nonacell CFA [3] in the 108 MP HM1, HM2, and HM3.

Fig. 10 presents smartphone imager CFA-pitch versus pixel-pitch for the Bayer, 2x2, 3x3 and 4x4 mosaics. The largest CFA-pitch observed is in the IMX608 at 4.48 μm, followed by the OmniVision 4-cell OV12D2Q and the Samsung Tetracell GN2 at 2.8 μm. In the 2020 observations, CFA-pitch for smartphone imagers varied significantly, between 1.4 μm and 2.8 μm. As pixel-pitch is reduced further, we may see increased use of 3x3 and 4x4 mosaics.
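The relationship between CFA-pitch and pixel-pitch can be sketched as follows, assuming CFA-pitch is the width of one same-color block, i.e. the mosaic factor times the pixel-pitch. This definition is an assumption, though it is consistent with the values quoted here (a 2x2 Tetracell at 1.40 μm gives 2.8 μm). The 1.12 μm pixel-pitch used for the IMX608 is inferred from its 4.48 μm CFA-pitch and 4x4 mosaic rather than stated in the text.

```python
# Number of same-color pixels along one side of a CFA block.
MOSAIC_FACTOR = {"bayer": 1, "2x2": 2, "3x3": 3, "4x4": 4}


def cfa_pitch_um(pixel_pitch_um: float, mosaic: str) -> float:
    """CFA-pitch in um, ASSUMED to be mosaic factor times pixel-pitch."""
    return MOSAIC_FACTOR[mosaic] * pixel_pitch_um


examples = [
    ("Samsung GN2 (Tetracell)", 1.40, "2x2"),
    ("Samsung HM1 (Nonacell)", 0.80, "3x3"),
    ("Sony IMX608 (pitch inferred)", 1.12, "4x4"),
]

for name, pitch_um, mosaic in examples:
    print(f"{name}: {cfa_pitch_um(pitch_um, mosaic):.2f} um")
```

Under this definition, a 3x3 mosaic on a 0.8 μm pixel yields a 2.4 μm CFA-pitch, which falls within the 1.4 μm to 2.8 μm range reported for 2020.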

VI. CHIP STACKING AND PIXEL-LEVEL DBI

With Stacked CIS/ISP Die becoming mainstream, a smaller TSV/DBI interconnect pitch becomes essential, both to reduce Die footprint and, more importantly, to facilitate pixel-level interconnect [4]. Fig. 11 shows the trend of DBI pitch for Cu-Cu hybrid bonding for all stacked imagers analyzed between 2014 and 2021. For the most part, Cu-Cu DBI remains a peripheral (Row/Column) interconnect. The minimum TSV/DBI pitch observed for Row/Column interconnect is 3.1 μm.

![Image](image1.png)

**Fig. 11 Stacked Back-Illuminated Cu-Cu Direct Bond Interconnect**

To date, only three of the commercially available imagers analyzed utilize pixel-level interconnect. These are the Sony SPAD array (150 x 200) from the Apple 2020 iPad Pro, the Sony SenSWIR IMX990/991 VGA visible/infrared imager [5], and the OmniVision OG01A1B, a 1 MP machine vision imager. The OG01A1B holds the record for the smallest pixel-level DBI pitch at 2.2 μm (Fig. 12(a)). The Sony SPAD array DBI pitch measures 5.0 μm (Fig. 12(b)).

![Image](image2.png)

**Fig. 12 Pixel Level Cu-Cu Direct Bond Interconnect (a) OmniVision OG01A1B, (b) Sony SPAD**
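To give a sense of what these pitches mean for interconnect density, the sketch below converts a DBI pitch into bonds per square millimetre. It assumes a uniform square grid with one bond per pitch cell, which is a simplification and not a description of any specific device layout.

```python
def bonds_per_mm2(pitch_um: float) -> float:
    """Bond density for a uniform square grid (ASSUMED layout)."""
    bonds_per_mm = 1000.0 / pitch_um  # bonds along 1 mm
    return bonds_per_mm ** 2


# DBI pitches quoted in the text, in um.
pitches_um = {
    "OG01A1B pixel-level": 2.2,
    "Row/Column minimum": 3.1,
    "Sony SPAD pixel-level": 5.0,
}

for name, pitch_um in pitches_um.items():
    print(f"{name}: {bonds_per_mm2(pitch_um):,.0f} bonds/mm^2")
```

Under this assumption, moving from a 5.0 μm to a 2.2 μm pitch raises the achievable density by roughly a factor of five.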

VII. NEAR-INFRA-RED ENHANCEMENT

Near-Infrared (NIR) enhancement has gained much attention in recent years for security & surveillance, ranging applications and machine vision. One measure to enhance Quantum Efficiency in the NIR range is to increase the CIS active silicon thickness [6] beyond that used for mainstream mobile applications. Fig. 13 shows the trend of active silicon thickness for Back-Illuminated imagers, highlighted by application. NIR-enhanced imagers have an EPI thickness between 5.9 and 7.1 μm. This contrasts with values of approximately 3.0 to 4.1 μm typically in use for mainstream CIS.

![Image](image3.png)

**Fig. 13 Silicon Active Thickness Trend**
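The benefit of thicker silicon at NIR wavelengths can be illustrated with a simple Beer-Lambert estimate of single-pass absorption, 1 - exp(-αd). This is a rough sketch only: the absorption coefficients below are approximate room-temperature values for crystalline silicon taken as assumptions, and the model ignores surface reflection, back-surface structures and multiple passes, so it does not represent measured QE.

```python
import math

# ASSUMED approximate absorption coefficients of silicon, in 1/cm.
ALPHA_PER_CM = {850: 535.0, 940: 183.0}


def single_pass_absorption(thickness_um: float, wavelength_nm: int) -> float:
    """Fraction of light absorbed in one pass through the silicon."""
    alpha_per_um = ALPHA_PER_CM[wavelength_nm] * 1e-4  # 1/cm -> 1/um
    return 1.0 - math.exp(-alpha_per_um * thickness_um)


# EPI thicknesses quoted in the text: mainstream vs NIR-enhanced.
for thickness_um in (3.0, 4.1, 5.9, 7.1):
    for wavelength_nm in (850, 940):
        frac = single_pass_absorption(thickness_um, wavelength_nm)
        print(f"{thickness_um} um @ {wavelength_nm} nm: {frac:.1%}")
```

Even at 7.1 μm, a single pass absorbs only a modest fraction of 940 nm light under these assumptions, which is consistent with the interest in structures that lengthen the optical path inside the EPI.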

Another important approach to improve QE is to reduce incident IR back-scattering by facilitating diffraction into the EPI [7]. We have observed both Shallow Trench/Grid and Inverted Pyramid Arrays (IPA) in use. Sony holds the record for the thickest CIS EPI with IPA, at 6.2 μm, and for the smallest pixel-pitch with an IPA (1.12 μm, with a 2x2 IPA). OmniVision transitioned from a shallow trench to IPA and successfully scaled IPA for higher resolution on a recent imager, the 8 MP OS08A20. ON Semiconductor and SmartSens have both demonstrated IPA use, in the ARX3A0 and the SC5035 respectively (Fig. 14).

![Image](image4.png)

**Fig. 14 Thickness trend of Back-Illuminated Image sensors with Back-surface treatment (IPA or shallow trench)**

Examples of back-surface structures are shown in Fig. 15. Here we see some variation in the number and depth of IPA, as they are optimized for specific imaging wavelengths. For Samsung, a shallow trench grid was observed in use on a 7.0 μm pixel-pitch i-ToF image sensor.

VIII. TIME-OF-FLIGHT

In 2020, we saw continued development of Time-of-Flight (ToF) 3D image sensors [8], [9]. We witnessed the first use of a d-ToF/LiDAR Stacked Back-Illuminated imager, the Sony SPAD array, in the rear camera module of the iPad Pro and the iPhone 12 Pro/Max. Fig. 16 provides a trend of ToF devices analyzed in terms of pixel count, ToF method, and sensor configuration. Here we see a continuation of VGA-type imagers for i-ToF from Sony, Samsung and, recently, Gpixel. A shift towards Back-Illumination takes advantage of a higher QE at the 850 nm to 940 nm NIR wavelengths in use. For proximity/gesture control, smaller pixel count sensors continue to dominate. STMicroelectronics recently introduced the VL53L5, a 64-channel Front-Illuminated d-ToF SPAD array with multi-object tracking capability for front-facing mobile applications. With emphasis placed on face ID/biometrics, it is foreseeable that higher resolution, possibly VGA-grade, front-facing ToF may see increased adoption.
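For readers less familiar with the two ToF methods, the sketch below shows the standard textbook range relations: direct ToF (d-ToF) converts a measured round-trip time to distance, while indirect ToF (i-ToF) converts a measured phase shift of a modulated signal. The 100 MHz modulation frequency and the 10 ns round-trip time are hypothetical values for illustration and are not taken from any device discussed here.

```python
import math

C_M_PER_S = 299_792_458.0  # speed of light in vacuum


def dtof_distance_m(round_trip_s: float) -> float:
    """d-ToF: distance from the measured round-trip time."""
    return C_M_PER_S * round_trip_s / 2.0


def itof_distance_m(phase_rad: float, f_mod_hz: float) -> float:
    """i-ToF: distance from the phase shift of the modulated light."""
    return C_M_PER_S * phase_rad / (4.0 * math.pi * f_mod_hz)


def itof_unambiguous_range_m(f_mod_hz: float) -> float:
    """Maximum i-ToF range before the phase wraps past 2*pi."""
    return C_M_PER_S / (2.0 * f_mod_hz)


# HYPOTHETICAL operating points, for illustration only.
print(f"d-ToF, 10 ns round trip: {dtof_distance_m(10e-9):.3f} m")
print(f"i-ToF, pi/2 at 100 MHz: {itof_distance_m(math.pi / 2, 100e6):.3f} m")
print(f"i-ToF range at 100 MHz: {itof_unambiguous_range_m(100e6):.3f} m")
```

The unambiguous-range relation shows the usual i-ToF trade-off: a higher modulation frequency improves depth resolution but shortens the distance that can be measured without phase wrapping.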

Looking at pixel-pitch for ToF, Fig. 17 shows that smaller pixels are increasingly in use as the trend moves towards Back-Illuminated imagers. To date, the smallest pixel-pitch and highest resolution have been observed on the Microsoft Azure Kinect, a 3.5 μm pixel-pitch, 1 MP i-ToF imager.

IX. TRANSISTORS PER PIXEL

As a final note, we look at pixel complexity per application. Fig. 18 shows the trend in the number of effective transistors per pixel (Teff) over the past decade. CIS for Mobile applications typically have shared pixels, so that Teff is reduced. In contrast, i-ToF and particularly event-based image sensors often utilize non-shared pixels with a higher transistor count. Examples are the 32 T Samsung i-ToF 33D, with a 4-Tap pixel design for depth resolution, and the 36 T Samsung 231YX Dynamic Vision sensor. The highest transistor count observed, 52 T, was on the recently analyzed Sony/Prophesee event-based imager. Event-based imagers involve more pixel complexity as they incorporate pixel-level functionality not found in conventional CIS, such as measuring the log of the signal intensity and in-pixel time stamping. Accordingly, event-based imagers would benefit from chip stacking and pixel-level interconnect as they continue to develop and find greater use in automotive and machine vision, among other applications.
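The effect of pixel sharing on Teff can be sketched with the commonly used convention that Teff is the total transistor count of a shared unit divided by the number of pixels sharing it. The paper does not state how Teff was counted, so this convention is an assumption. The example uses a generic 4T-style pixel with one transfer gate per photodiode and three shared readout transistors (reset, source follower, row select), not any specific device above.

```python
def effective_transistors(
    pixels_sharing: int,
    per_pixel_transistors: int,
    shared_transistors: int,
) -> float:
    """Teff under the ASSUMED convention: unit total / pixels in unit."""
    if pixels_sharing < 1:
        raise ValueError("pixels_sharing must be at least 1")
    total = pixels_sharing * per_pixel_transistors + shared_transistors
    return total / pixels_sharing


# Generic 4T-style pixel: 1 transfer gate per pixel, 3 shared transistors.
for label, n in (("non-shared", 1), ("2x2 shared", 4), ("2x4 shared", 8)):
    print(f"{label}: Teff = {effective_transistors(n, 1, 3):.3f}")
```

Under this convention a non-shared 4T pixel has Teff = 4, while 2x2 sharing brings it down to 1.75, which illustrates why shared mobile pixels sit far below the 32 T to 52 T counts of non-shared i-ToF and event-based pixels.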

REFERENCES