# A 316MP, 120FPS, High Dynamic Range CMOS Image Sensor for Next Generation Immersive Displays

Abhinav Agarwal<sup>1</sup>, Jatin Hansrani<sup>1</sup>, Sam Bagwell<sup>1</sup>, Oleksandr Rytov<sup>1</sup>, Varun Shah<sup>1</sup>, Kai Ling Ong<sup>1</sup>, Daniel Van Blerkom<sup>1</sup>, Jonathan Bergey<sup>1</sup>, Neil Kumar<sup>1</sup>, Tim Lu<sup>1</sup>, Deanan DaSilva<sup>2</sup>, Michael Graae<sup>2</sup>, and David Dibble<sup>2</sup>

> <sup>1</sup>Forza Silicon (Ametek Inc.), Pasadena, California, USA <sup>2</sup>Madison Square Garden Entertainment, New York City, New York, USA Corresponding Author: abhinav.agarwal@ametek.com

*Abstract*—We present a 2D-stitched, 316MP, 120FPS, high dynamic range CMOS image sensor with 92 CML output ports operating at a cumulative data rate of 515Gbit/s. The total die size is $9.92\text{cm} \times 8.31\text{cm}$, and the chip is fabricated in a 4-metal BSI process with an overall power consumption of 23W. The $4.3\mu$m dual-gain pixel has high and low conversion gain full wells of 6600e- and 41000e-, respectively, with a total high gain temporal noise of 1.8e-, achieving a composite dynamic range of 87dB.

#### I. INTRODUCTION

Large format, next generation immersive displays impose challenging requirements on video capture: the combination of display size and resolution that demands detailed imagery also clearly exposes any deficiencies in the image. This requires a sensor that produces very high resolution imagery while maintaining image quality: low noise, high dynamic range, and minimal shutter/image artifacts.

In this paper, we present a very large 2D-stitched 316MP CMOS image sensor capturing video at  $18K \times 18K$  resolution. This rolling shutter sensor operates in either a high frame rate single-gain readout mode or a reduced frame rate HDR mode. The HDR mode leverages the dual-gain capability of the pixel to allow extended dynamic range within a single exposure, mitigating motion artifacts that might appear in other HDR approaches.

#### II. READOUT ARCHITECTURE

The sensor stitch plan (Figure 1) has a total of 8 vertical and 13 horizontal stitch lines. The top and bottom halves of the pixel array are read out independently, with each half sampling 6 pixel rows at a time. To meet the high frame rate requirement, the sensor timing is pipelined as much as possible, with overlapping pixel sampling, ADC conversion, and digital readout phases. Figure 5 shows a detailed block diagram of the sensor partitioning.

The sensor readout is divided into repeating units. Stitch blocks 'B' and 'H' consist of six readout units each, whereas stitch blocks 'A', 'C', 'G', and 'I' contain two readout units each (Figure 5). Every readout unit is responsible for sampling [400 pixel columns $\times$ 6 pixel rows] per 'row time'. Each pixel column feeds into a set of 12 single slope ADCs (6 rows $\times$ ping/pong), so there are a total of 441,600 ADCs in this design. To fit in the $4.3\mu$m pixel pitch, each pixel column has 4 vertically stacked rows of 3 ADCs arrayed at a $1.4\mu$m pitch.
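The counts quoted above can be cross-checked with a few lines of arithmetic. The sketch below is only a consistency check, not part of the design flow. It takes the column count from Table II, and it assumes that blocks 'A', 'B', 'C' lie along one horizontal edge and 'G', 'H', 'I' along the other, which is an inference from the stitch plan rather than something stated in the text. All function names are illustrative.

```python
# Geometry taken from the text and Table II.
TOTAL_COLUMNS = 18400      # total pixel columns (Table II)
COLUMNS_PER_UNIT = 400     # pixel columns handled by one readout unit
ROWS_PER_SAMPLE = 6        # pixel rows sampled per row time in each half
ADC_BANKS = 2              # ping and pong
ARRAY_HALVES = 2           # top and bottom halves are read out independently

# Readout units per stitch block type, as described in the text.
UNITS_PER_BLOCK = {"A": 2, "B": 6, "C": 2, "G": 2, "H": 6, "I": 2}


def readout_units_per_half() -> int:
    """Readout units needed to cover all pixel columns on one edge."""
    return TOTAL_COLUMNS // COLUMNS_PER_UNIT


def adcs_per_column() -> int:
    """ADCs attached to one pixel column in one half of the array."""
    return ROWS_PER_SAMPLE * ADC_BANKS


def total_adcs() -> int:
    """Total single slope ADCs across both halves of the sensor."""
    return TOTAL_COLUMNS * adcs_per_column() * ARRAY_HALVES


def middle_blocks_per_half() -> int:
    """Inferred number of 'B' (or 'H') blocks along one edge.

    Assumes one two-unit corner block at each end of the edge.
    """
    corner_units = UNITS_PER_BLOCK["A"] + UNITS_PER_BLOCK["C"]
    return (readout_units_per_half() - corner_units) // UNITS_PER_BLOCK["B"]


if __name__ == "__main__":
    print("readout units per half:", readout_units_per_half())
    print("ADCs per column:", adcs_per_column())
    print("total ADCs:", total_adcs())
    print("inferred 'B' blocks per edge:", middle_blocks_per_half())
```

Under these assumptions there are 46 readout units per half (92 in total, which matches the number of output ports) and 441,600 ADCs, in agreement with the figure quoted above.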

#### III. OPERATING MODES

There are two modes of operation in this sensor: single gain mode and dual gain (HDR) mode. Die size limitations constrained the height of the readout circuitry, making it necessary to use the same ADC stack for both readout modes of the sensor. Detailed sensor timing for the single and dual gain modes is illustrated in Figure 2.

Fig. 1. Sensor on 12-inch wafer (4 die per wafer), die photo and stitch plan

#### *A. Single Gain Mode@120FPS*

Every pixel is sampled with a fixed conversion gain (either high or low) using standard 4T read timing. The ADC array operates with ping-pong timing, where half of the array samples while the other half converts the previous sample.
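To give a feel for the time budget of one conversion, the following sketch models an idealized single slope ADC: a counter runs while a linear ramp rises, and the count is latched when the ramp crosses the sampled input. The 12-bit resolution and 2.8GHz count rate come from Table II. The reference voltage `v_ref`, the function names, and the absence of noise, offset, or correlated double sampling are simplifying assumptions made for illustration, not details of the actual ADC.

```python
ADC_BITS = 12            # Table II
COUNT_RATE_HZ = 2.8e9    # Table II


def full_scale_ramp_time_s(bits: int = ADC_BITS,
                           count_rate_hz: float = COUNT_RATE_HZ) -> float:
    """Time for the counter to sweep all 2**bits codes once."""
    return (2 ** bits) / count_rate_hz


def single_slope_convert(v_in: float, v_ref: float = 1.0,
                         bits: int = ADC_BITS) -> int:
    """Idealized single slope conversion.

    The ramp steps by one LSB per count. The returned code is the first
    count at which the ramp reaches or exceeds v_in, clipped to full scale.
    v_ref is a hypothetical full-scale voltage.
    """
    full_scale = 2 ** bits
    lsb = v_ref / full_scale
    for code in range(full_scale):
        if code * lsb >= v_in:
            return code
    return full_scale - 1


if __name__ == "__main__":
    print(f"full-scale ramp: {full_scale_ramp_time_s() * 1e6:.2f} us")
    print("code for mid-scale input:", single_slope_convert(0.5))
```

With these numbers a full-scale ramp takes roughly 1.46µs, which fits within the 5.5µs single gain row time listed in Table II.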

#### *B. Dual Gain Mode@60FPS*

To implement single exposure HDR functionality, we leveraged the existing programmable conversion gain of the pixel, where the high conversion gain is set by the floating diffusion (FD) capacitance and the low conversion gain is achieved by connecting a secondary capacitor in parallel with the FD. The effective extended dynamic range is set by the full well capacity in the low gain mode and the noise of the high gain mode. Each integration is read out in both high and low conversion gains, and the resulting two values are transferred off-chip and processed into a single HDR value capturing more scene contrast.
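The off-chip processing that merges the two readings is not described in this paper. As a minimal sketch of one common approach, the code below uses the high gain sample for dark pixels and switches to the low gain sample, rescaled by the conversion gain ratio from Table II, once the high gain sample approaches saturation. The switch threshold `switch_dn`, the assumption that both samples are already offset-corrected and share the same ADC scale, and the function name are all hypothetical.

```python
# Conversion gains from Table II, in microvolts per electron.
CG_HIGH_UV_PER_E = 150.0
CG_LOW_UV_PER_E = 19.1

# Ratio used to express a low gain sample in high gain units.
GAIN_RATIO = CG_HIGH_UV_PER_E / CG_LOW_UV_PER_E


def combine_hdr(high_dn: float, low_dn: float,
                switch_dn: float = 3500.0) -> float:
    """Merge one pixel's high and low gain samples into a linear value.

    The result is expressed in high gain DN. Below switch_dn (a
    hypothetical threshold) the low-noise high gain sample is used;
    above it, the low gain sample is rescaled by the gain ratio.
    """
    if high_dn < switch_dn:
        return float(high_dn)
    return float(low_dn) * GAIN_RATIO


if __name__ == "__main__":
    print(combine_hdr(100.0, 13.0))     # dark pixel: high gain sample is kept
    print(combine_hdr(4000.0, 1000.0))  # bright pixel: rescaled low gain sample
```

A hard switch like this is the simplest choice; a practical pipeline would more likely blend the two samples near the threshold to avoid a visible seam, but that is beyond what the text specifies.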

*1) Proposed Dual Gain Timing:* Whereas the single-gain mode uses the ADC array in a pipelined ping-pong type operation, the dual-gain mode re-configures the ADC array such that the 'ping' ADCs sample the high gain pixel readout and the 'pong' ADCs sample the low gain pixel readout, and all ADCs convert at once. This effectively doubles the readout time in this mode and thus halves the frame rate. The first operation in this readout is to reset the pixel FD and sample that reset value into both the high-gain and low-gain ADCs. Then there is a first charge transfer in the pixel onto the FD node, and this value is read into the high-gain ADCs.

Finally, the low-gain capacitor is switched into the pixel and another charge transfer is done (combining what was on the FD with any charge left in the photodiode), and this value is read into the low-gain ADCs. All ADCs are then converted simultaneously.
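As a rough consistency check on the frame rates of the two modes, the frame time can be estimated from the row times in Table II. The sketch assumes that a frame consists only of the row times needed to read all rows of one half (6 rows per row time, both halves in parallel) and ignores any blanking or overhead, which the text does not specify. Names are illustrative.

```python
TOTAL_ROWS = 17712          # total pixel rows (Table II)
ARRAY_HALVES = 2            # top and bottom halves are read in parallel
ROWS_PER_ROW_TIME = 6       # rows sampled per row time in each half

# Row times from Table II, in seconds.
ROW_TIME_S = {"single": 5.5e-6, "dual": 11e-6}


def row_times_per_frame() -> int:
    """Number of row times needed to read one half of the array."""
    return TOTAL_ROWS // ARRAY_HALVES // ROWS_PER_ROW_TIME


def max_frame_rate_fps(mode: str) -> float:
    """Upper bound on frame rate if readout is the only time cost."""
    return 1.0 / (row_times_per_frame() * ROW_TIME_S[mode])


if __name__ == "__main__":
    for mode in ("single", "dual"):
        print(f"{mode} gain: {max_frame_rate_fps(mode):.1f} FPS")
```

Under these assumptions the readout alone permits about 123 FPS in single gain mode and about 61.6 FPS in dual gain mode, consistent with the 120FPS and 60FPS specifications and with the halving of frame rate described above.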

*2) Pros and Cons:* The approach implemented here provides substantially less dynamic range improvement than the well-known LOFIC approach [1]. However, it allows for simple re-use of the existing single-gain ADC array, whereas the low-gain readout from a LOFIC pixel is typically in the opposite polarity from the high-gain readout, necessitating modifications to the ADC sampling network and ramp. Due to the already extreme density of the ADC array and the readout height limitations, it was not possible to support a LOFIC type readout in addition to the high frame rate single gain readout. This approach also obviated concerns about overflow path control and process tuning, improving the chances of first silicon success.

#### IV. CHALLENGES AND SOLUTIONS

We present some of the key challenges in designing this large footprint CIS chip along with our proposed solutions.

#### *A. ADC Electrical Crosstalk*

Due to the vertically stacked arrangement of ADCs, the outputs of the top ADCs and the pixel output lines travel the full height of the ADC column. The ADC layout is done at a small pitch of $1.4\mu$m with limited metal layers for shielding. This results in unavoidable parasitic coupling between the pixel line (victim) and the ADC outputs (aggressor), causing electrical crosstalk. To mitigate this, we implemented a novel shielding scheme that dynamically configures the vertical ADC routing and shielding network according to the timing and mode of operation. The ADC output multiplexing scheme is illustrated in Figure 3.

*1) Single Gain Mode:* The overlap of the sampling and ADC conversion phases (ping-pong timing) results in electrical crosstalk. In a large sensor like this, there can be a significant intensity difference between the two six-row groups within twelve adjacent rows, resulting in ADC crosstalk that is distinctly visible in an image. To mitigate this crosstalk, the outputs of the top two ADCs are multiplexed at the end of the second ADC. This frees up an additional routing line, which is used as a shield to reduce crosstalk. The simulated ADC crosstalk improvements resulting from this multiplexing scheme are summarized in Table I for various scenarios.



Fig. 3. ADC Output Multiplexing Network



Fig. 2. Sensor Timing for single (high gain) and dual gain operation

*2) Dual Gain Mode:* Because the sampling and conversion operations do not overlap in the dual gain mode, this coupling pathway does not cause crosstalk, allowing the use of individual output lines without any multiplexing.

TABLE I ADC CROSSTALK IMPROVEMENT (SIMULATED): SINGLE GAIN MODE

| Aggressor-Victim     | No Dynamic Shield | With Dynamic Shield |
|----------------------|-------------------|---------------------|
| Ping0-Pong1 (Dark)   | 4.01 DN           | 0.25 DN             |
| Ping1-Pong1 (Dark)   | 2.03 DN           | 0.18 DN             |
| Ping0-Ping1 (Bright) | -16.5 DN          | 0.04 DN             |
| Ping0-Pong1 (Bright) | -3.7 DN           | 0.3 DN              |
| Ping1-Pong1 (Bright) | -1.7 DN           | 0.1 DN              |
| Pong0-Ping1 (Dark)   | 0.76 DN           | 0.05 DN             |
| Pong0-Ping1 (Bright) | -0.72 DN          | 0.01 DN             |
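To put the simulated values of Table I on a common footing, the short sketch below computes the reduction in crosstalk magnitude for each aggressor-victim pair. The numbers are copied from Table I; the dictionary layout and function names are illustrative only.

```python
# (no dynamic shield, with dynamic shield) in DN, copied from Table I.
CROSSTALK_DN = {
    "Ping0-Pong1 (Dark)": (4.01, 0.25),
    "Ping1-Pong1 (Dark)": (2.03, 0.18),
    "Ping0-Ping1 (Bright)": (-16.5, 0.04),
    "Ping0-Pong1 (Bright)": (-3.7, 0.3),
    "Ping1-Pong1 (Bright)": (-1.7, 0.1),
    "Pong0-Ping1 (Dark)": (0.76, 0.05),
    "Pong0-Ping1 (Bright)": (-0.72, 0.01),
}


def reduction_factor(before_dn: float, after_dn: float) -> float:
    """Ratio of crosstalk magnitudes without and with the dynamic shield."""
    return abs(before_dn) / abs(after_dn)


def all_reduction_factors() -> dict:
    """Reduction factor for every aggressor-victim pair in the table."""
    return {pair: reduction_factor(before, after)
            for pair, (before, after) in CROSSTALK_DN.items()}


if __name__ == "__main__":
    for pair, factor in all_reduction_factors().items():
        print(f"{pair:22s} {factor:7.1f}x")
```

In every simulated scenario the magnitude drops by more than a factor of ten, with the largest reduction in the Ping0-Ping1 (Bright) case.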

#### *B. High Speed Clock Generation and Distribution*

The large amount of data generated by the sensor necessitates a large aggregate data rate. Even with 92 output data ports, the required data rate is 5.6Gbps per port (DDR), requiring a 2.8GHz clock to be distributed to each of the data ports along the $\approx 8$cm horizontal chip edges. There are a total of 18 PLLs (one each in stitch blocks 'A', 'C', 'G', 'I', 'B', and 'H'), each generating an output clock at 2.8GHz from an externally provided 50MHz reference clock. The 2.8GHz CMOS clock at the PLL output is converted into the CML domain (for common mode noise rejection) and distributed to all readout cores within the stitch block through a high speed CML clock distribution network.

Two CML buffers in the clock distribution are separated by a readout core pitch of $\approx 1.72$mm, which is comparable to a quarter wavelength of 2.8GHz in silicon. This necessitates very careful design and optimization of the routing traces between two CML buffers to avoid transmission line artifacts. As a starting point, we routed the CML traces in the top thick metal, which has minimum resistance and the least coupling capacitance to the substrate. The trace width and separation were carefully fine tuned based on post-layout simulation with an RLCK extracted model to minimize attenuation at the operating frequency. Increasing the number of CML buffers to reduce inter-buffer separation involves a tradeoff with power and device noise. We selected a T-shaped clock distribution and, based on simulation results, settled on 4 cascaded CML buffers in each of the left and right directions from the center (in stitch blocks 'B' and 'H').

The output of one CML buffer is AC-coupled into the input of the next stage buffer. This allows us to set a well defined common mode voltage at every CML buffer input pair, which increases robustness by suppressing common mode imbalance caused by systematic or process mismatch, as well as low frequency noise. One downside of this AC coupling approach is the additional attenuation in the signal path due to the use of MOS capacitors for AC coupling. At the nominal operating frequency, the overall attenuation due to the lossy trace and the AC coupling stage should be compensated by the large signal gain of the CML buffer.
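The data rate and clocking figures quoted in this subsection can be tied together with simple arithmetic. The sketch below is a consistency check only. It uses the port count, clock frequencies, pixel counts, ADC resolution, and pixel pitch given in the text and Table II; the assumption that the raw payload is simply total pixels times ADC bits times frame rate, and all names, are illustrative.

```python
# Values quoted in the text and Table II.
NUM_PORTS = 92
CLOCK_HZ = 2.8e9
REFERENCE_HZ = 50e6
TOTAL_PIXELS = 18400 * 17712
ADC_BITS = 12
FRAME_RATE_FPS = 120
PIXEL_PITCH_M = 4.3e-6
COLUMNS_PER_UNIT = 400


def per_port_rate_bps() -> float:
    """DDR signalling transfers two bits per clock cycle."""
    return 2 * CLOCK_HZ


def aggregate_rate_bps() -> float:
    """Combined rate of all output ports."""
    return NUM_PORTS * per_port_rate_bps()


def pll_multiplier() -> float:
    """Frequency multiplication from the reference to the PLL output."""
    return CLOCK_HZ / REFERENCE_HZ


def raw_pixel_rate_bps() -> int:
    """Payload if every pixel yields one ADC word per frame (assumption)."""
    return TOTAL_PIXELS * ADC_BITS * FRAME_RATE_FPS


def readout_core_pitch_m() -> float:
    """Width of one readout unit, i.e. the spacing between CML buffers."""
    return COLUMNS_PER_UNIT * PIXEL_PITCH_M


if __name__ == "__main__":
    print(f"per port:   {per_port_rate_bps() / 1e9:.1f} Gbit/s")
    print(f"aggregate:  {aggregate_rate_bps() / 1e9:.1f} Gbit/s")
    print(f"PLL ratio:  {pll_multiplier():.0f}x")
    print(f"raw pixels: {raw_pixel_rate_bps() / 1e9:.1f} Gbit/s")
    print(f"core pitch: {readout_core_pitch_m() * 1e3:.2f} mm")
```

The per-port and aggregate rates reproduce the 5.6Gbps and 515Gbit/s figures, and the readout core pitch reproduces the 1.72mm buffer spacing. The raw pixel payload under the stated assumption comes to about 469Gbit/s; the text does not break down how the remaining link capacity is used.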

#### *C. Horizontal Smearing*

Horizontal smearing is one of the primary array crosstalk artifacts in any large CIS chip using column parallel readout structures. This artifact occurs when the readout of one portion of the image has a global effect on components that are common to the entire readout. The classical manifestation of this is a very bright region in the image causing a disturbance extending horizontally from the bright region. Significant design effort went into minimizing the absolute value and curvature of this specific artifact.

The ramp generator output is buffered after every two readout units to reduce the ramp propagation delay and minimize the impact of any local ramp kickback causing smearing. Additionally, all the ADC references are re-biased locally in every readout block to create a low impedance net to the center of the array. This helps to locally restrict any reference disturbance caused by aggressor ADCs, which further improves horizontal smearing performance. Another important 'global' factor causing smearing is the non-uniformity of the pixel and various ADC supplies. Due to the sheer size of the sensor and the limited routing metal available in this process, it is challenging to bring power to the pixel and the various supply domains of the readout array. At the readout unit level, careful floor planning and layout optimization went into minimizing the IR drop across the pixel array and readout blocks to improve supply uniformity. Among the various supply domains, a larger routing area (lower resistance to pad) was allocated to the supplies that simulation models showed to have a more significant impact on smearing performance. Finally, a comprehensive set of EMIR simulations was performed (Figure 4) for all the supply domains to ensure supply uniformity, thus minimizing the impact of smearing.

Fig. 4. Generated Thermal Map from Static IR drop Simulation

#### V. RESULTS AND SUMMARY

With the $4.3\mu$m dual conversion gain pixel, we measured a linear full well of 6600e- and 41000e- in high and low conversion gain modes, respectively. The measured RMS temporal noise in the high gain mode is 1.8e-, giving a composite dynamic range of 87dB. The measured SNR plot for the HDR mode and the high and low gain transfer functions are shown in Figure 6. Detailed specifications of the CIS sensor are outlined in Table II, and a full resolution color image captured at 120FPS (single gain mode) is shown in Figure 7.
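The composite dynamic range follows from the low gain full well and the high gain noise floor, using the usual definition of 20 times the base-10 logarithm of their ratio. The sketch below reproduces the 87dB figure from the measured values and, for comparison, computes the dynamic range of each gain mode on its own using the noise values in Table II. The function name is illustrative.

```python
import math

# Measured values from the text and Table II, in electrons.
FULL_WELL_HIGH_E = 6600.0
FULL_WELL_LOW_E = 41000.0
NOISE_HIGH_E = 1.8
NOISE_LOW_E = 13.0


def dynamic_range_db(full_well_e: float, noise_e: float) -> float:
    """Dynamic range in dB: 20 * log10(full well / temporal noise)."""
    return 20.0 * math.log10(full_well_e / noise_e)


if __name__ == "__main__":
    composite = dynamic_range_db(FULL_WELL_LOW_E, NOISE_HIGH_E)
    high_only = dynamic_range_db(FULL_WELL_HIGH_E, NOISE_HIGH_E)
    low_only = dynamic_range_db(FULL_WELL_LOW_E, NOISE_LOW_E)
    print(f"composite (HDR): {composite:.1f} dB")
    print(f"high gain only:  {high_only:.1f} dB")
    print(f"low gain only:   {low_only:.1f} dB")
```

Combining the two readouts therefore gains roughly 16dB over either single gain mode, each of which sits near 70dB on its own.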

#### **REFERENCES**

[1] Akahane, Nana, et al. "A sensitivity and linearity improvement of a 100-dB dynamic range CMOS image sensor using a lateral overflow integration capacitor." IEEE Journal of Solid-State Circuits 41.4 (2006): 851-858.

TABLE II SPECIFICATION TABLE

| Parameter                       | Specification                                        |
|---------------------------------|------------------------------------------------------|
| Pixel Pitch                     | $4.3\mu$m                                            |
| Total Pixels                    | $18400(H) \times 17712(V)$                           |
| Active Pixels                   | $18000(H) \times 17568(V)$                           |
| Row Time                        | Single Gain: $5.5\mu$s, Dual Gain: $11\mu$s          |
| Maximum Frame Rate              | Single Gain: 120FPS, Dual Gain: 60FPS                |
| ADC Resolution                  | 12-bits (2.8GHz count rate)                          |
| Linear OSat Full Well           | High Gain: 6600e-, Low Gain: 41000e-                 |
| Conversion Gain                 | High Gain: 150$\mu$V/e-, Low Gain: 19.1$\mu$V/e-     |
| Total Temporal Noise            | High Gain: 1.8e-, Low Gain: 13e-                     |
| Dynamic Range                   | 87dB                                                 |
| Image Lag                       | 0.45e-                                               |
| PRNU (ROI: $4000 \times 3000$)  | 0.8%                                                 |
| Dark Current                    | 55e- (measured at 70C)                               |
| Total Sensor Power              | 23W                                                  |
| Die Size                        | $9.92\text{cm} \times 8.31\text{cm}$                 |



Fig. 5. Detailed Block Diagram Showing Sensor Partitioning





Fig. 6. SNR and Transfer Function in HDR Mode

Fig. 7. Full Resolution Color Image Captured in Single Gain Mode at 120FPS