Macromodel-based Signal and Power Integrity simulations of an LP-DDR2 interface in mSiP

Availability:
This version is available at: 11583/2560336 since:

Publisher:
IEEE / Institute of Electrical and Electronics Engineers Incorporated, 445 Hoes Lane, Piscataway, NJ 08854

Published
DOI:10.1109/PRIME.2014.6872719

Terms of use:
openAccess
This article is made available under terms and conditions as specified in the corresponding bibliographic description in the repository


Gianni Signorini*, Stefano Grivet-Talocia‡, Igor Simone Stievano‡ and Luca Fanucci†

*Intel Mobile Communications GmbH
Am Campeon 10-12, Neubiberg, Germany 85579
email: gianni.signorini@intel.com
†Department of Information Engineering, University of Pisa
Via G.Caruso 16, Pisa, Italy 56122
‡Department of Electronics and Telecommunications, Polytechnic University of Turin
Corso Duca degli Abruzzi 24, Torino, Italy 10129

Abstract—Signal and Power Integrity (SI/PI) analyses are of paramount importance to ensure the reliable integration of high-speed communication interfaces in low-cost, highly integrated System-in-Package (SiP) devices for mobile applications. Design and time-domain SI/PI verification are alternated iteratively to assess and optimize system functionality. The resulting complexity of the analysis limits simulation coverage and requires extremely long runtimes (hours or days). To expedite simulations while preserving post-silicon correlation, electrical macromodels of package/PCB parasitics and high-speed I/Os can be generated and included in the testbenches. Using as an example an LP-DDR2 memory interface that supports a mobile digital base-band processor, we have developed and applied a macromodelling flow that demonstrates simulation runtime speed-up factors above 1200x and enables interface-level analyses of the effects of package/PCB parasitics on signals and PDNs, as well as of the corresponding degradation in the timing budget.

I. INTRODUCTION

The design of state-of-the-art mobile platforms is becoming increasingly challenging: to meet the demand for advanced computing capabilities and ubiquitous connectivity, an ever-increasing number of data-processing architectures, multi-mode multi-band wireless communication interfaces and leading-edge peripherals are integrated on PCBs (Printed Circuit Boards) with a shrinking form factor. Furthermore, the trends and the competition in the mobile market dictate low-cost devices, minimal power consumption, optimal battery efficiency and short design cycles [1], [2], [3]. Significant progress in multi-layer PCB cross-sections and packaging technologies has made it possible to integrate multiple silicon dies inside a single package, satisfying the demand for small, high-performance devices.

In the context of modern digital base-band processors for mobile applications, the most common package structures adopted in commercial devices are reported in figure 1, namely single-die fcBGA, multiple-die SiP (System-in-Package), PoP (Package-on-Package) and PiP (Package-in-Package). In these devices, packages not only provide mechanical support and a first level of interconnection towards the outer world, but also implement internal chip-to-chip communication links [4]. The routing of signals and power distribution networks (PDNs) is highly dense, and the risk of performance degradation increases because of potential mutual interference between different portions of the system. To guarantee system reliability and assess performance prior to tapeout, a tremendous effort is required to perform reliable time-domain simulations that include input-output circuit (I/O) descriptions, as well as the effects of package/PCB parasitics on PDNs and interconnections [4]; unfortunately, such system-level simulations are extremely difficult to perform, and are often made impossible by the resulting complexity.

In this paper, we present a fast and reliable fully macromodel-based flow, aimed at supporting the design challenges of reliably integrating an LP-DDR2 (Low-Power Double Data-Rate) interface in an mSiP (SiP for mobile applications); in particular, we consider a structure composed of a digital base-band processor and a memory device, placed in a stacked configuration. Focusing on Signal and Power Integrity (SI/PI) aspects, section II discusses the major implications of low-cost, small-area and high-integration constraints on the design procedures of packages and PCBs. Section III presents a macromodel-based approach to expedite complex SI/PI time-domain verifications and increase their reliability; the benefits of such an approach are presented in section IV.

II. LP-DDR2 INTERFACE IN A mSiP

The LP-DDR2 memory interface is characterized by a 16/32-bit parallel bus supporting high-speed data rates (200...1066 Mb/s) with a considerably large voltage swing (1.2V for unterminated applications) [5]. For an mSiP in a stacked-die configuration, the processor-memory interconnections are completely routed inside the package. This occupies a significant area and inevitably implies critical crossings of memory signals with other communication interfaces and/or sensitive portions of the system. Furthermore, the package distributes the supply both to the memory interface on the processor side (core logic, timing interfaces and I/Os) and to the memory device itself. Low-cost constraints reduce the number of available metal layers in the package stackup to the minimum; the small feature size also limits the number of solder balls and solder bumps reserved for power/ground connections. Furthermore, the number of available bypass capacitors on the PCB and their placement depend on product requirements, and often cannot be optimal. Because of all these constraints, common good-design practices (target impedance for the PDN, avoidance of return-path discontinuities (RPDs), etc.) cannot be completely implemented and require careful trade-offs against cost and area implications. As depicted in figure 2, following an iterative approach and starting from preliminary layouts, the impact of package and PCB parasitics on system performance is assessed through time-domain simulations, aimed at checking compliance with the target operating specifications. Based on the simulation results, designs are either reworked and optimized, or delivered for the final physical implementation [4], [6].

A. SSN-induced jitter in LP-DDR2 interface

One of the most critical effects to be optimized in an mSiP is the so-called Simultaneous Switching Noise (SSN), graphically illustrated in figure 3. The short rise and fall times required by the communication interface, together with the large voltage swing (1.2V), cause large current pulses to flow through the supply rails of each switching I/O (3-a). This dynamic current, in combination with the parasitic components of the PDN and a weak current return path (3-b), produces significant voltage fluctuations around the nominal supply value (3-c), inevitably affecting all the other I/Os that share the same supply domain. SSN effects have to be carefully analyzed: the voltage fluctuations injected on signals and supplies introduce a data-dependent jitter on the memory lines. As an example, considering the 'write' operation of the LP-DDR2 interface (i.e., data are sent from the processor to the memory), in the testbench of figure 4 we assume a purely inductive parasitic component ($L_{PDN}$) on the supplies and investigate how the jitter at the outputs of a DQ and a DQS I/O (respectively, $t_{j,DQ}$ and $t_{j,DQS}$) depends on it; the corresponding simulation results are depicted in figure 5. The current required to perform the logic-state transitions at the output of the DQ pad ($i_{DD}$) flows through the inductor and triggers a supply variation $\Delta V_{DD}(t)$ that follows $\Delta V_{DD}(t) = L_{PDN} \cdot \frac{d}{dt} i_{DD}(t)$. The input transitions on the DQS pad occur after a time $\Delta T_{DQ2DQS}$, ideally set to half a bit period, and the residual bouncing on its supply implies a jitter $t_{j,DQS}$ on the switching event. $t_{j,DQS}$ rapidly increases with $L_{PDN}$, reducing the residual timing margins by $\Delta t_{setup}$ and $\Delta t_{hold}$; a graphical representation of the setup/hold time degradation due to SSN effects is reported in figure 6.
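
As a rough numerical illustration of the relation $\Delta V_{DD}(t) = L_{PDN} \cdot \frac{d}{dt} i_{DD}(t)$, the following Python sketch evaluates the supply bounce produced by a triangular current pulse and the half-bit-period DQ-to-DQS delay at the maximum data rate. The inductance, peak current, ramp duration and number of switching I/Os are hypothetical values chosen only for illustration; only the 1066 Mb/s data rate comes from the text.

```python
import numpy as np

# Hypothetical values, chosen only for illustration (not taken from the paper).
L_PDN = 0.5e-9      # purely inductive supply parasitic [H]
I_PEAK = 10e-3      # peak switching current of one I/O [A]
T_RAMP = 200e-12    # duration of each current ramp [s]
N_IO = 4            # I/Os switching simultaneously on the same supply rail

# From the text: maximum LP-DDR2 data rate, DQS delayed by half a bit period.
BIT_RATE = 1066e6
t_bit = 1.0 / BIT_RATE
dt_dq2dqs = 0.5 * t_bit


def supply_bounce(t, i_dd, l_pdn):
    """Return dV_DD(t) = L_PDN * d/dt i_DD(t) via a finite-difference derivative."""
    return l_pdn * np.gradient(i_dd, t)


# Triangular current pulse: linear ramp up over T_RAMP, then linear ramp down.
t = np.linspace(0.0, 2.0 * T_RAMP, 2001)
i_single = I_PEAK * (1.0 - np.abs(t - T_RAMP) / T_RAMP)
i_total = N_IO * i_single

dv = supply_bounce(t, i_total, L_PDN)
peak_bounce = np.max(np.abs(dv))

# Closed-form check: the ramp slope is N_IO * I_PEAK / T_RAMP.
expected_bounce = L_PDN * N_IO * I_PEAK / T_RAMP

print(f"peak supply bounce : {peak_bounce * 1e3:.1f} mV")
print(f"closed-form value  : {expected_bounce * 1e3:.1f} mV")
print(f"DQ-to-DQS delay    : {dt_dq2dqs * 1e12:.1f} ps")
```

The sketch only captures the inductive term; it does not model the jitter itself, which in the paper is obtained from transient simulations of the I/O circuits.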
III. MACROMODELS FOR SIGNAL AND POWER INTEGRITY SIMULATIONS

To assess the quality of the memory interface after its integration in an mSiP, complex time-domain simulations must be performed, including the netlists of all the transmitting and receiving ends and the complete set of parasitics that affect the signal routing and the power distribution networks. In this paper, we propose a fully macromodel-based simulation flow, which casts each system component as a compact SPICE-compatible behavioural netlist.

A. LTFM Macromodels of Package and PCB

Real PDN structures are much more complicated than the simple inductive model of figure 4; only full-wave electromagnetic characterizations can represent all parasitics with sufficient accuracy. In this paper, a commercial hybrid 2.5D full-wave solver is used to extract the scattering-parameter matrix (S-parameters) of the signal interconnections and PDN of the complete memory interface. Because of the width of the parallel bus (16 or 32 bits) and the number of power/ground terminals involved (significant because of the number of I/Os), the number of ports included in the EM extraction can easily exceed 100. For this reason, we adopt a rational curve-fitting methodology with passivity enforcement [8] to cast the linear structure as a lumped Linear Transfer Function Model (LTFM), which is converted to a state-space system and realized as a SPICE-compatible behavioural netlist. This procedure is standard and is not discussed further here [7].
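
To make the step from a rational model to a state-space system more concrete, the following Python sketch builds a diagonal state-space realization of a scalar pole-residue model $H(s) = d + \sum_k r_k/(s - p_k)$ and checks that both forms give the same frequency response. It is a minimal illustration under stated assumptions, not the tool flow used in the paper: the poles, residues and direct term are hypothetical, the model has a single port, the fitting and passivity-enforcement steps are not shown, and the complex diagonal form is used for brevity (a SPICE synthesis would normally use an equivalent real realization).

```python
import numpy as np

# Hypothetical one-port rational model H(s) = d + sum_k r_k / (s - p_k).
# In a real flow these values would come from fitting tabulated S-parameters.
poles = np.array([-1.0e9, -2.0e9 + 2.0e10j, -2.0e9 - 2.0e10j])
residues = np.array([2.0e8, 1.0e8 + 5.0e7j, 1.0e8 - 5.0e7j])
d = 0.1


def pole_residue_to_state_space(poles, residues, d):
    """Complex diagonal realization: A = diag(p), B = ones, C = residues, D = d."""
    a = np.diag(poles).astype(complex)
    b = np.ones((len(poles), 1), dtype=complex)
    c = residues.reshape(1, -1).astype(complex)
    dmat = np.array([[d]], dtype=complex)
    return a, b, c, dmat


def eval_state_space(a, b, c, dmat, s):
    """Evaluate H(s) = C (sI - A)^-1 B + D at one complex frequency s."""
    n = a.shape[0]
    return (c @ np.linalg.solve(s * np.eye(n) - a, b) + dmat)[0, 0]


def eval_pole_residue(poles, residues, d, s):
    """Evaluate the pole-residue form directly at one complex frequency s."""
    return d + np.sum(residues / (s - poles))


a, b, c, dmat = pole_residue_to_state_space(poles, residues, d)
freqs = np.logspace(7, 10, 200)  # 10 MHz to 10 GHz
h_ss = np.array([eval_state_space(a, b, c, dmat, 2j * np.pi * f) for f in freqs])
h_pr = np.array([eval_pole_residue(poles, residues, d, 2j * np.pi * f) for f in freqs])

max_mismatch = np.max(np.abs(h_ss - h_pr))
stable = bool(np.all(poles.real < 0.0))
# For a scattering one-port, passivity requires |H(jw)| <= 1 at all frequencies;
# this is only a sampled check, not a passivity enforcement.
max_gain = np.max(np.abs(h_ss))

print(f"max mismatch between forms: {max_mismatch:.2e}")
print(f"all poles stable: {stable}, max sampled |H|: {max_gain:.3f}")
```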

B. The need for I/O Macromodels

To improve the correlation of SI/PI simulation results with post-silicon measurements, post-layout extracted views containing on-chip parasitics of devices and interconnections should be used for each I/O. Unfortunately, this results in a tremendous explosion of netlist complexity, and prevents the execution of interface-level transient simulations. However, black-box macromodelling methodologies can be applied to the same netlists, and equivalent representations can be extracted and used in SI/PI testbenches. In this context, in order to guarantee confidence in the results and enable complex analyses, macromodels shall be:

- compact, to expedite analysis and extend the simulation coverage;
- accurate, reproducing the currents and voltages of the corresponding netlists with high fidelity, both at the output and at the supply terminals.

C. Mpilog Macromodels of Drivers and Receivers

In this paper, Mpilog macromodels [9] have been developed and used to represent the behaviour of LP-DDR2 I/Os. The generation of the macromodels is based on a DC sweep and a transient simulation, in which the device-under-modelling (DUM) is excited with suitable voltage stimuli at the input, output and supply terminals. By post-processing the resulting current and voltage waveforms at the same pins, Mpilog tunes a non-linear parametric mathematical model so that it reproduces both the static and the dynamic i-v characteristics of the DUM in the low and high logic states. Time-domain weighting functions multiply these non-linear functions to implement the dynamics of logic-state transitions, both at the output and at the supply terminals. The structure of Mpilog macromodels and the generation procedures are described in detail in [9], [10]. For a single-ended driver structure [10], the model equations are:

\[
\begin{aligned}
i_O ={}& w_H\, k_H(v_{dd})\, f_{sH}(v_O) + w_H\, f_H(v_O, \partial/\partial t) \\
       & + w_L\, k_L(v_{dd})\, f_{sL}(v_O) + w_L\, f_L(v_O, \partial/\partial t), \\
i_{dd} ={}& w_H\, k_H(v_{dd})\, f_{sH}(v_O) + w_H\, f_H(v_O, \partial/\partial t) \\
          & + w_{HH}\, f_{HH}(v_{dd}, \partial/\partial t) + w_{LL}\, f_{LL}(v_{dd}, \partial/\partial t) + \delta_i(t).
\end{aligned}
\]

\( f_{sH} \) and \( f_{sL} \) represent the static i-v output characteristics when the driver is held in a fixed high and low logic state, respectively, while \( f_H, f_L, f_{HH} \) and \( f_{LL} \) are discrete-time Local-Linear State-Space (LLSS) models that account for the non-linear dynamics of the buffer; \( w_H, w_L, w_{HH} \) and \( w_{LL} \) are time-varying weighting functions that reproduce the logic-state evolution; \( k_H \) and \( k_L \) account for the effect of supply fluctuations on the static characteristics; \( \delta_i \) reproduces the current drawn by the pre-driver stages.

These equations are then synthesized as an electrical representation, which can be simulated in any SPICE circuit solver.
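
The weighted two-piece structure of the output-current equation can be sketched in a few lines of Python. The sketch below keeps only the static terms \( w_H k_H f_{sH} + w_L k_L f_{sL} \) and omits the LLSS dynamic sub-models. Everything in it is a hypothetical stand-in rather than an identified Mpilog model: the resistive static characteristics, the linear supply scaling, the raised-cosine weight and the simplification \( w_L = 1 - w_H \) are assumptions made only to show how the weights blend the two logic states.

```python
import numpy as np

VDD_NOM = 1.2  # nominal supply [V], matching the LP-DDR2 swing quoted in the text


def f_sH(v_o, r_up=40.0):
    """Hypothetical static high-state characteristic: resistor from VDD_NOM to the pad."""
    return (VDD_NOM - v_o) / r_up


def f_sL(v_o, r_dn=40.0):
    """Hypothetical static low-state characteristic: resistor from the pad to ground."""
    return -v_o / r_dn


def k_supply(v_dd):
    """Hypothetical supply-scaling factor (used for both k_H and k_L), 1 at nominal supply."""
    return v_dd / VDD_NOM


def w_H(t, t0=1.0e-9, t_sw=100e-12):
    """Hypothetical weight for a low-to-high transition: raised-cosine ramp from 0 to 1."""
    x = np.clip((t - t0) / t_sw, 0.0, 1.0)
    return 0.5 * (1.0 - np.cos(np.pi * x))


def i_out_static(t, v_o, v_dd):
    """Static part of i_O = w_H k_H f_sH + w_L k_L f_sL, assuming w_L = 1 - w_H."""
    wh = w_H(t)
    wl = 1.0 - wh
    return wh * k_supply(v_dd) * f_sH(v_o) + wl * k_supply(v_dd) * f_sL(v_o)


# Output current (positive when flowing out of the pad) with the pad held at mid-swing.
i_low = i_out_static(0.0, 0.6, VDD_NOM)        # before the transition: low state
i_mid = i_out_static(1.05e-9, 0.6, VDD_NOM)    # halfway through the transition
i_high = i_out_static(2.0e-9, 0.6, VDD_NOM)    # after the transition: high state
print(i_low, i_mid, i_high)
```

In an actual Mpilog model the weights and sub-models are identified from the DC and transient responses of the device-under-modelling, as described in [9], [10].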
IV. RESULTS

Table I reports the time required to perform a transient simulation of a 40-bit PRBS pattern using three different electrical representations of a DQ transmitting pad (schematic, post-layout RC-extracted netlist and Mpilog model). Because it neglects layout-induced parasitics, the I/O schematic netlist can only deliver approximate results and may lead to inaccurate predictions of system performance. The post-layout RC-extracted netlist ensures post-silicon correlation, but its complexity implies a significant increase in simulation runtime (about 6x with respect to the schematic); however, an Mpilog macromodel can be generated from this netlist and used to represent the I/O with the same accuracy while offering a tremendous simulation speed-up (1293x with respect to the RC-extracted netlist). This is shown in figure 7, which compares the output voltage and the supply currents of the model and of the corresponding RC-extracted netlist for a 1-0-1-0 logic-state transition.

TABLE I. TRANSIENT SIMULATION RUNTIME FOR A 40-BIT PRBS INPUT PATTERN USING DIFFERENT I/O ELECTRICAL REPRESENTATIONS

|             | Schematic | RC-full | Mpilog |
|-------------|-----------|---------|--------|
| Runtime [s] | 122.4     | 732.1   | 0.566  |
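
The ratios quoted in the text follow directly from the runtimes in Table I. The short Python sketch below recomputes them; the dictionary keys and the helper name `runtime_ratio` are only illustrative.

```python
# Runtimes in seconds, copied from Table I.
runtimes_s = {"Schematic": 122.4, "RC-full": 732.1, "Mpilog": 0.566}


def runtime_ratio(slower, faster):
    """Return how many times longer `slower` runs compared with `faster`."""
    return runtimes_s[slower] / runtimes_s[faster]


rc_vs_schematic = runtime_ratio("RC-full", "Schematic")
mpilog_vs_rc = runtime_ratio("RC-full", "Mpilog")
mpilog_vs_schematic = runtime_ratio("Schematic", "Mpilog")

print(f"RC-full is {rc_vs_schematic:.2f}x slower than the schematic")
print(f"Mpilog is {mpilog_vs_rc:.0f}x faster than RC-full")
print(f"Mpilog is {mpilog_vs_schematic:.0f}x faster than the schematic")
```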

A macromodel-based SI/PI testbench has been developed to analyze the 'write'-mode operation of the complete 32-bit LP-DDR2 interface. In figure 8, the eye-diagrams of the 32 DQ lines and the corresponding DQS strobes are superimposed. Such an analysis would not have been possible without the I/O behavioural models, which offer outstanding runtime speed-up (more than 1200x), high accuracy and easy integration in SPICE simulation environments. Their use in combination with LTFM equivalent netlists of package/PCB parasitics ensures reliable performance predictions, expedites simulations and improves the effectiveness of the optimization process for the design of the overall system.
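
For readers unfamiliar with how an eye-diagram is obtained from a transient waveform, the following Python sketch folds a waveform onto one bit period and measures the vertical opening at the centre of the eye. It is a generic illustration, not the post-processing used for figure 8: the bit pattern, sampling density and edge duration are hypothetical, and the waveform is an ideal noise-free NRZ signal, so the eye is fully open.

```python
import numpy as np

V_SWING = 1.2            # LP-DDR2 unterminated swing from the text [V]
T_BIT = 1.0 / 1066e6     # bit period at the maximum data rate [s]
SAMPLES_PER_BIT = 64     # hypothetical sampling density
EDGE_SAMPLES = 16        # hypothetical edge duration, in samples


def synth_waveform(bits):
    """Ideal NRZ waveform with linear edges (moving average of a staircase)."""
    ideal = np.repeat(np.asarray(bits, dtype=float) * V_SWING, SAMPLES_PER_BIT)
    kernel = np.ones(EDGE_SAMPLES) / EDGE_SAMPLES
    v = np.convolve(ideal, kernel, mode="same")
    t = np.arange(v.size) * (T_BIT / SAMPLES_PER_BIT)
    return t, v


def fold_into_eye(t, t_bit):
    """Map every time sample to its phase within one bit period."""
    return np.mod(t, t_bit)


def eye_height(phase, v, t_bit, v_th, window=0.1):
    """Vertical opening near the eye centre: lowest 'high' minus highest 'low' sample."""
    centre = np.abs(phase - 0.5 * t_bit) < window * t_bit
    high = v[centre & (v > v_th)]
    low = v[centre & (v <= v_th)]
    return high.min() - low.max()


bits = [0, 1, 0, 0, 1, 1, 0, 1, 1, 0]   # hypothetical data pattern
t, v = synth_waveform(bits)
phase = fold_into_eye(t, T_BIT)
height = eye_height(phase, v, T_BIT, v_th=0.5 * V_SWING)
print(f"eye height at centre: {height:.3f} V")
```

Plotting `v` against `phase` for each of the 32 DQ lines on the same axes would give a superimposed eye-diagram; supply noise and inter-symbol interference would show up as a reduced `height`.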

V. CONCLUSIONS

Low-cost, small feature-size and area constraints in the design of SiPs for mobile applications require accurate SI/PI time-domain simulations to support the layout of packages and PCBs and to ensure compliance with the target operating specifications. A macromodel-based approach to SI/PI analysis enables interface-level simulations that study the impact of package/PCB parasitics on system functionality; such simulations are hardly achievable with transistor-level descriptions because of the excessive complexity of the resulting netlist. Mpilog has been used to generate I/O behavioural models that offer reliable performance predictions, expedite simulations and improve the effectiveness of the optimization process for the design of the overall system.

ACKNOWLEDGMENT

The authors would like to thank Intel Mobile Communications GmbH for the support offered to this work: special thanks go to Vincenzo Costa, Venkatesh Kasturirangan, Dr. Alexander Olbrich, Pietro Brenner, Kay Schiller and Alexander Ruehle for the valuable discussions.

REFERENCES


Fig. 7. Comparison of output voltage and supply-currents for post-layout RC-extracted netlists and the corresponding Mpilog SPICE macromodel

Fig. 8. Eye-diagram of DQ<31:0> lines and the corresponding DQS<3:0> for a Write-operation