# POLITECNICO DI TORINO Repository ISTITUZIONALE

Nanomagnetic Logic: From Devices to Systems

# Original

Nanomagnetic Logic: From Devices to Systems / Riente, Fabrizio; Becherer, Markus; Csaba, Gyorgy - In: Emerging Computing: From Devices to SystemsELETTRONICO. - [s.l]: Springer, 2023. - ISBN 978-981-16-7486-0. - pp. 107-143 [10.1007/978-981-16-7487-7\_5]

Availability:

This version is available at: 11583/2971331 since: 2023-02-28T14:26:34Z

Publisher: Springer

Published

DOI:10.1007/978-981-16-7487-7\_5

Terms of use:

This article is made available under terms and conditions as specified in the corresponding bibliographic description in the repository

Publisher copyright

Springer postprint/Author's Accepted Manuscript (book chapters)

This is a post-peer-review, pre-copyedit version of a book chapter published in Emerging Computing: From Devices to Systems. The final authenticated version is available online at: http://dx.doi.org/10.1007/978-981-16-7487-7\_5


# Nanomagnetic Logic: from devices to systems

Fabrizio Riente, Markus Becherer and Gyorgy Csaba

Abstract A digital computing system with ferromagnets as switches, magnetic stray fields for computation, and domain walls for information transport: is it a curiosity or ready for ultra-large-scale integration? Over the last decade, starting from sub-micrometer sized nanomagnets comprised of Co/Pt multilayers, a functionally complete set of logic gates and memory elements has been experimentally demonstrated as a potential co-processing unit for CMOS microprocessors, called perpendicular Nanomagnetic Logic (pNML). From the beginning of this endeavor, investigations covered not only single devices but also complex circuits like full adders and multiplexers, and finally culminated in an EDA tool called ToPoliNano. It offers a complete design flow for system-level exploration of field-coupled technologies, including pNML. In particular, its layout editor MagCAD provides the possibility to design, simulate and re-use pNML modules in larger architectures. The underlying compact models are continuously adapted to the newest developments in pNML technology, e.g. improvements in materials, device design and exploitation of novel physical effects. With that, efficient and reliable benchmarking against CMOS implementations is possible, and important system-level aspects are directly fed back to the technology and device engineers.

Fabrizio Riente

Politecnico di Torino, Department of Electronics and Telecommunications Engineering, 10129, Turin, Italy e-mail: fabrizio.riente@polito.it

Markus Becherer

Technical University of Munich, Chair of Nanoelectronics, Arcisstraße 21, 80333 Munich, Germany e-mail: markus.becherer@tum.de

Gyorgy Csaba

Peter Pazmany Catholic University, Faculty for Information Technology and Bionics, Práter u. 50A, 1083 Budapest, Hungary e-mail: csaba.gyorgy@itk.ppke.hu

## 1 Introduction

Nanomagnetic logic is a spin-based computing architecture that uses the magnetization state of nanoscale ferromagnets to store, propagate and process information. It is radically different from conventional electronics as it does not use charge, only the spin degree of freedom for computation, yet it is functionally equivalent to digital electronics. A Nanomagnet Logic (NML) circuit can do everything that, say, a standard CMOS digital circuit could do, and perform its tasks in a potentially significantly more power efficient way and with robust, nonvolatile storage integrated into every computing step.

It must be admitted that, as of the time of writing, NML is not the fanciest of all emerging nanoelectronic devices. For example, molecular electronics promises much smaller devices as molecules (being much smaller than nanomagnets) can be packed more densely. NML is also a digital computing technology, so powerful analog computing ideas and quantum computing cannot be done in the NML framework. Photonic computers can be orders of magnitude faster. But there is at least one aspect of NML that is unique and way beyond most emerging nanosystems: NML is inherently a system concept, where device and interconnections are essentially the same thing, and there is a clear, demonstrated path toward large-scale computing devices. Unlike most other novel nanoelectronics, NML is ready to be engineered to large-scale circuits, with no roadblocks hindering its scalability.

The purpose of this work is to present the principles of NML to a VLSI designer (or VLSI-minded researcher) and to collect all the information that is required to design large systems. We will discuss in detail the theory needed to actually design NML circuitry. A few case studies on more complex systems will be presented using the ToPoliNano / MagCAD design suites.

The physics of NML will be discussed in less detail. For the physical background, the fabrication details, and more physics- or device-oriented reviews, the reader is referred to e.g. [14] [10] [36] [9] [5] [55] [20] [37].

This work is also meant as an invitation to design and benchmark NML circuitry and to find killer applications for this technology.

#### 1.1 A brief history and nomenclature

The roots of NML go back to Quantum-dot Cellular Automata (or QCA), a ground-breaking exotic electronic concept from the late 1980s. The idea behind QCA was to use Coulombic interaction, rather than current flow, to perform computation. At its heart, QCA is an architecture idea, and many different systems may realize QCA computation. The originally devised (quantum-dot based) implementations were challenging to realize; single electron devices and nanomagnets turned out to be a much more fertile experimental playground. Due to this origin, early NML devices were named magnetic QCA and can be found as such in the corresponding papers.

The promise of nanomagnets for computation was first realized by Cowburn [1] and the first NML gates were built a few years later [33]. These implementations used in-plane nanomagnets, i.e. where the digital information is represented by an in-plane magnetization direction [53][18][52]. Sometimes such devices were referred to as iNML circuits. While nontrivial devices were realized (such as a 1-bit full adder [54]), it turned out that the in-plane magnetization seriously restricts how the circuit can be arranged in 2D so scaling up the circuits became impossible.

A newer version of NML (often referred to as pNML) uses out-of-plane (perpendicularly magnetized) nanomagnets, realized from Co/Pt multilayers. The introduction of out-of-plane magnetization required perpendicular magnetic anisotropy (which Co/Pt has), and it turned out that engineering the anisotropy came with many additional benefits. pNML devices have a well-defined signal propagation direction (nonreciprocity) [13], and the shape of the magnets and their arrangement can be varied with much larger freedom. The subject of this work is exclusively pNML devices, because they provide a clear path toward larger-size computing systems.

## 2 pNML Working Principle

#### 2.1 Structure of the magnetic stack

In pNML technology, the binary information is encoded in two stable magnetization states of single-domain nanomagnets. The magnetization pointing up or down represents the logic 1 or 0, respectively. The elementary bit of information is stored in a multi-layer stack that shows strong perpendicular magnetic anisotropy (PMA). The stack is composed of ultrathin ferromagnetic (FM) layers separated by non-magnetic metal layers [35]. The thickness of the layers determines the magnetic properties of the device. The effective magnetic anisotropy  $K_{\rm eff}$ , together with the saturation magnetization  $M_s$ , are the most important parameters to design pNML devices. The effective anisotropy in PMA devices should be greater than zero and is given by:

$$K_{eff} = K_u - \frac{1}{2}\mu_0 M_s^2 + \frac{2K_s}{t_{eff_{FM}}} \tag{1}$$

with  $K_u$  the uniaxial anisotropy,  $M_s$  the saturation magnetization,  $K_s$  the surface anisotropy and  $t_{eff_{FM}}$  the effective thickness of the ferromagnetic layer. The anisotropy energy, expressed in the general form as:

$$E_{anis} = K_{eff} \cdot \sin^2 \theta \tag{2}$$

is minimized for  $\theta \in \{0, \pi\}$  in PMA films. This means that the magnetization is perpendicular to the plane. On the contrary, if  $K_{eff}$  is lower than zero, the anisotropy energy is minimized for  $\theta \in \{-\frac{\pi}{2}, \frac{\pi}{2}\}$ . To guarantee the single-domain state, the minimum feature size should be smaller than the domain size. In other words, for a feature size larger than the domain size, a multi-domain configuration is energetically more stable. Thus, to keep the bistable behavior, the critical domain size derived from the domain wall theory (Eq. 3) should be considered in the design of pNML devices.

$$DW_{crit} \approx \frac{72\sqrt{AK_{eff}}}{\mu_0 M_s^2} \tag{3}$$
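As a numerical sanity check, Eq. 3 can be evaluated with the typical 5 bi-layer Co/Pt values quoted in this section. The Python sketch below is illustrative only: the function name is hypothetical, and $A$ is assumed to be the exchange stiffness, which the text gives only by value.

```python
import math

MU_0 = 4 * math.pi * 1e-7  # vacuum permeability, T m / A


def critical_domain_width(a_exchange, k_eff, m_s):
    """Critical domain size of Eq. 3, in metres (hypothetical helper)."""
    return 72 * math.sqrt(a_exchange * k_eff) / (MU_0 * m_s ** 2)


# Typical 5 bi-layer Co/Pt values quoted in the text
A_EXCHANGE = 1.3e-11  # J/m (assumed to be the exchange stiffness)
K_EFF = 2.8e5         # effective anisotropy, J/m^3
M_S = 7.2e5           # saturation magnetization, A/m

dw_crit = critical_domain_width(A_EXCHANGE, K_EFF, M_S)
print(f"DW_crit = {dw_crit * 1e9:.0f} nm")
```

With these inputs the result lands close to the value of about 210 nm quoted in the text.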

For a typical 5 bi-layer Co/Pt stack,  $K_{\rm eff} \approx 2.8 \times 10^5 \, {\rm J \, m^{-3}}$ ,  $A = 1.3 \times 10^{-11} \, {\rm J \, m^{-1}}$ ,  $M_s = 7.2 \times 10^5 \, {\rm A \, m^{-1}}$  and the critical domain width reads  $DW_{crit} \approx 210 \, {\rm nm}$ . The magnetic properties of the film can be tailored by adjusting the composition of the stack. The perpendicular magnetic anisotropy can be engineered by varying the material of the ferromagnetic and non-magnetic layers, the number of layers, their thickness, and the crystal orientation. The crystal orientation is induced by the seed layer, which is interposed between the substrate and the magnetic stack. The Pt seed layer induces the crystal orientation, i.e. the texture of the whole magnetic stack [35]. Its thickness ranges between 3 nm and 5 nm to enforce the PMA [17]. The general structure of the magnetic stack is schematized in Fig. 1.a.

Fig. 1 a, Schematic representation of the Co/Pt magnetic stack. b, Stoner-Wohlfarth particle with uniaxial anisotropy.

The magnetization reversal in PMA thin films and magnetic multi-layers starts from the weakest point (e.g. a defect in the crystal), where a domain wall nucleates, and is followed by domain wall propagation. The domain wall nucleation is often modeled as a coherent reversal process and can be described by the so-called Stoner-Wohlfarth model. This analytical model describes a uniformly magnetized ellipsoid with uniaxial anisotropy, whose energy density is given by the contributions of the uniaxial anisotropy and the external magnetic field, Eq. 4.

$$E_{sw} = K_u \sin^2 \theta - \mu_0 M_s H \cos(\theta - \alpha) \tag{4}$$

The particle is subjected to a static external field H at an angle  $\alpha$  with respect to the easy axis, and the magnetization M is rotated by the angle  $\theta$ . The uniaxial anisotropy and the external field compete to determine the final orientation of M. The total energy is minimized for  $\alpha=0$ ,  $\theta=0$  if  $K_u>0$ . The minimum field required to saturate the particle can be obtained by minimizing  $E_{sw}$  with  $\theta=\pi/2$  and results in Eq. 5.

$$H_{anis} = \frac{2K_u}{\mu_0 M_s} \tag{5}$$

The obtained expression represents the field required to reverse the magnetization of the particle with a certain anisotropy value.
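The link between Eq. 4 and Eq. 5 can be illustrated numerically. The sketch below is a minimal example under stated assumptions: it uses a brute-force grid search (not an analytical solution), applies the field along the hard axis ($\alpha=\pi/2$), and borrows the $K_{\rm eff}$ value quoted earlier as a stand-in for $K_u$, which the text does not give. All function names are hypothetical.

```python
import math

MU_0 = 4 * math.pi * 1e-7  # vacuum permeability, T m / A


def sw_energy(theta, h, alpha, k_u, m_s):
    """Stoner-Wohlfarth energy density of Eq. 4."""
    return k_u * math.sin(theta) ** 2 - MU_0 * m_s * h * math.cos(theta - alpha)


def equilibrium_angle(h, alpha, k_u, m_s, steps=20000):
    """Grid search for the energy minimum with theta in [0, pi/2]."""
    best_theta, best_energy = 0.0, float("inf")
    for i in range(steps + 1):
        theta = (math.pi / 2) * i / steps
        energy = sw_energy(theta, h, alpha, k_u, m_s)
        if energy < best_energy:
            best_theta, best_energy = theta, energy
    return best_theta


K_U = 2.8e5  # J/m^3, assumption: K_eff value used as a stand-in for K_u
M_S = 7.2e5  # A/m
H_ANIS = 2 * K_U / (MU_0 * M_S)  # Eq. 5, in A/m

print(f"mu_0 * H_anis = {MU_0 * H_ANIS:.2f} T")
for fraction in (0.25, 0.5, 1.0, 1.2):
    theta = equilibrium_angle(fraction * H_ANIS, math.pi / 2, K_U, M_S)
    print(f"H = {fraction:.2f} H_anis -> theta = {math.degrees(theta):.1f} deg")
```

Below $H_{anis}$ the magnetization tilts only partially toward the field; at and above $H_{anis}$ the energy minimum sits at $\theta=\pi/2$, which is the saturation condition behind Eq. 5.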

For building logic devices, it is important to control where the domain wall nucleation takes place. The magnetization reversal generally starts in a low-anisotropy area (Eq. 5). In an as-grown stack, it occurs at defects or inhomogeneities of the film. In order to control the domain wall nucleation and to override the random distribution of weak-anisotropy spots, the local reduction of  $K_{eff}$  by Focused Ion Beam (FIB) irradiation is widely used. To avoid nucleation from as-grown, randomly distributed defects, one side of the magnet is usually irradiated. The irradiation defines an artificial nucleation center (ANC) where the domain wall nucleates.

#### 2.2 Properties of the hysteresis loop

Before going into the detail of the artificial nucleation center, the most important parameters of the hysteresis curve are recalled. The hysteresis curve plots the variation in the magnetization as a function of the applied field.

Fig. 2 a, Ideal hysteresis curve where the most important parameters are highlighted. b, Measured hysteresis loop on  $Ta_{1.7nm}Pt_{4nm}[Co_{0.75nm}Pt_{1.4nm}]_{x4}Pt_{2.75nm}$  as-grown film.

The main parameters of the hysteresis loop are:

- the saturation magnetization M<sub>s</sub>: it is reached when all the magnetic moments are aligned along the same direction of the applied magnetic field.
- the coercive field H<sub>c</sub>: it is the field required to reverse the overall magnetization
  of the sample. It is usually considered equal to the nucleation field.
- the remanent magnetization M<sub>r</sub>: it represents the magnetization left in a ferromagnetic material when the external field is set to zero after saturation.
- the nucleation field H<sub>nuc</sub>: it is the magnetic field required to reverse the magnetization in a small area of the magnet. It identifies the field required to nucleate a domain wall.
- the propagation field  $H_{prop}$ : it is the field required to propagate the domain wall after its nucleation. Usually  $H_{prop} < H_{nuc}$ .

The remanent magnetization is extremely important in pNML devices because the magnetization has to keep its alignment even when no external field is applied. Ideally,  $M_r$ = $M_s$ . The remanence makes it possible to define two stable conditions which encode the logic 0 and 1.

#### 2.3 Defining artificial nucleation centers

The definition of the ANC by FIB irradiation during the fabrication process generates a low-anisotropy region and a step in the anisotropy between the irradiated and non-irradiated portions of the film, see Fig. 3.b. Therefore, the nucleation field depends not only on the anisotropy field at the ANC, but also on the depinning field needed to overcome the step in the anisotropy. Fig. 3.a schematizes the region that is usually irradiated during the fabrication process.

The real challenge in pNML technology is to precisely control the nucleation mechanism from surrounding input magnets. The locally reduced PMA decreases the switching field from its intrinsic value  $H_{c,0}$  to its coercivity  $H_c$ . Fig. 3.c shows a typical hysteresis curve of a common magnetic multi-layer stack after FIB irradiation. The highly accelerated  $Ga^+$  ions intermix the magnetic stack, resulting in a narrower hysteresis. The FIB-induced defects in the nanomagnet should dominate the domain wall nucleation over the randomly distributed defects in the film. The user-defined ANC makes it possible to determine where the magnetization reversal starts, enabling the implementation of logic functionalities. Indeed, the stray field from neighboring magnets can support or prevent the domain wall nucleation.

Fig. 3.e shows a chain of four nanomagnets. Each magnet is coupled ferromagnetically or anti-ferromagnetically to its closest neighbor according to its magnetization. Without a FIB-irradiated side on the magnets (left in this case), it is not possible to determine in which direction the signal is going to propagate. Therefore, the definition of the ANC and its position is extremely important not only to control the domain wall nucleation but also to define the signal propagation direction.



**Fig. 3** a, The local FIB irradiation partially reduces the PMA. b, Step in the magnetic anisotropy after FIB irradiation. c, FIB irradiated magnets show a narrower hysteresis loop, reducing the switching field to its coercivity. d, Binary information encoded in the two stable magnetization states. e, The FIB irradiation on one side of the magnet defines the signal propagation direction.

## 3 Computing with no current

#### 3.1 Coupling and clocking field

In section 2.3 the important aspects of the ANC have been described, highlighting the necessity of reducing the PMA on one side of the magnet. The logic computation is achieved by dipole coupling and its strength decays rapidly with the distance. The coupling strength can be approximated with Eq. 6.

$$C \approx \frac{M_s V}{4\pi r^3} \tag{6}$$

It is proportional to the volume (V) of the magnet and inversely proportional to the cube of the distance  $(r^3)$ . It is clear that to reach a high coupling field, the distance should be as small as possible. However, the coupling field from the surrounding magnets alone is not sufficient to nucleate the domain wall on its closest neighbor [36][21]. The magnetization reversal is achieved by the superposition of the input stray field and the external field.

Fig. 4 a, Random initial magnetization for M1, M2, M3, and schematic hysteresis curve of FIB irradiated magnets. b, The hysteresis curve of magnet M2 is shifted to the right when a positive external field is applied and the coupling from the input magnet supports the domain wall nucleation. c, The hysteresis curve of magnet M3 is shifted to the left when a negative external field is applied and the negative coupling supports the magnetization reversal. d, The minimum pulse width to determine the operating frequency should consider the time required for the domain wall nucleation and its propagation.

Fig. 4.a shows schematically the hysteresis curve after FIB irradiation of magnets M1 and M2, where the switching field is reduced to H<sub>c</sub>. The ANC is located on the left side of the magnets. Therefore, according to Eq. 6, M2 and M3 are mostly influenced by their left neighbors, M1 and M2 respectively. On the contrary, the magnet on the right is too far away to influence the ANC of its left neighbor. The three magnets are considered initially randomly magnetized to M1=0, M2=0, M3=1. When a positive external field is applied (H<sub>clock</sub>), M2 switches anti-parallel to M1 due to the anti-ferromagnetic coupling. In this case, the switching is supported by the coupling field (C) from M1 to M2. Therefore, the hysteresis curve is shifted to the right (Fig. 4.b). Similarly, when a negative external field is applied, the hysteresis of M3 is shifted to the left and its magnetization is reversed to -1 (Fig. 4.c). On the contrary, the domain wall nucleation on M2 is prevented because the coupling field is opposite to the external field. Therefore, the external field reduces the energy barrier for domain wall nucleation, which is then supported or prevented according to the sign of the effective coupling from the surrounding magnets. The effective field on the ANC can be estimated as:

$$H_{eff} = H_{clock} + \sum_{i} C_i M_i \tag{7}$$

where  $C_i$  is the coupling contribution from input i and  $M_i$  its magnetization. The external field amplitude can range within a clocking window, see Fig. 4.d. This window contains the range of clock amplitudes around  $H_c$  that give correct pNML operation. Its width is lower than 2C and it is further reduced by the switching field distribution (SFD), which depends on the fabrication process.
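The switching sequence of the three-magnet chain and the role of the clocking window can be sketched with Eq. 7. The code below is a simplified illustration, not a device model: magnetizations are taken as $\pm 1$ (logic 0 mapped to $-1$), the anti-ferromagnetic coupling is modeled as a negative coefficient $-C$, the SFD is ignored, and the field values (in mT) and function names are hypothetical.

```python
def effective_field(h_clock, couplings, inputs):
    """Eq. 7: clock field plus the coupling contribution of every input."""
    return h_clock + sum(c * m for c, m in zip(couplings, inputs))


def apply_pulse(chain, h_clock, h_c, c):
    """Apply one clock pulse to a chain of magnets with states +1 / -1.

    Magnet 0 is a fixed input. Every other magnet has its ANC on the left
    and sees only its left neighbour, coupled anti-ferromagnetically (-c).
    Magnets are updated left to right, a simplification of the real dynamics.
    """
    state = list(chain)
    for i in range(1, len(state)):
        h_eff = effective_field(h_clock, [-c], [state[i - 1]])
        if h_eff >= h_c:
            state[i] = +1
        elif h_eff <= -h_c:
            state[i] = -1
    return state


H_C = 50.0      # hypothetical coercivity of the irradiated magnets, mT
C = 5.0         # hypothetical coupling field, mT
H_CLOCK = 50.0  # clock amplitude inside the window (H_C - C, H_C + C)

initial = [-1, -1, +1]  # M1=0, M2=0, M3=1
after_positive = apply_pulse(initial, +H_CLOCK, H_C, C)
after_negative = apply_pulse(after_positive, -H_CLOCK, H_C, C)
print(after_positive, after_negative)

# Outside the window the gate fails: too weak never switches,
# too strong switches regardless of the input.
too_weak = apply_pulse(initial, +(H_C - 2 * C), H_C, C)
too_strong = apply_pulse([+1, -1], +(H_C + 2 * C), H_C, C)
print(too_weak, too_strong)
```

In this toy setting the positive pulse sets M2 anti-parallel to M1 and the negative pulse then sets M3 anti-parallel to M2, while M2 stays put because its coupling opposes the negative clock field.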

In terms of timing, the minimum pulse width  $t_{pulse}$  should take into account the domain wall nucleation time  $t_{nuc}$  and its propagation time  $t_{prop}$ . Both these quantities determine the maximum operating frequency of a pNML circuit. If  $t_{pulse}$  is too short, the domain wall in some magnets may get stuck without a full magnetization reversal. The magnet will then switch back to its previous value when the opposite clock pulse is applied, leading to errors in the signal propagation.

#### 3.2 Basic gates

In pNML, directed signal flow is achieved by controlling the domain wall nucleation on a specific region of the magnet. Highly accelerated Ga<sup>+</sup> ions are used to lower the PMA and define the signal propagation direction. By playing with the effective field acting on the ANC, according to Eq. 7, it is possible to implement simple logic functions. Two basic gates are available in pNML: the inverter (Fig. 5.a) and the majority voter (Fig. 5.b).

#### 3.2.1 Inverter

The inverter provides the NOT Boolean function and can be simply obtained by cascading two anti-ferromagnetically coupled nanomagnets. The ANC is defined on the side of the magnet that should be sensitive to the input stray field (left in this case).

Fig. 5 a, Simplest implementation of the inverter using two anti-ferromagnetically coupled magnets. b, Schematic implementation of the majority gate, obtained by surrounding the ANC with three inputs. It can implement the NAND/NOR function by setting one of its inputs to logic 0 or 1. c, SEM image of a fabricated inverter. d, SEM image of a fabricated majority gate.

The authors in [12] investigated the inverter gate using square-shaped magnets. They measured the coercivity and the SFD for every logic value. The SFDs for positive and negative magnetization were shifted by  $2C \approx 10\,\mathrm{mT}$ . However, the two SFDs overlapped, meaning that no clocking window can be defined for reliable switching. In [36], an improved version of the inverter, with a fork-like structure, was presented. In the experiment, the SFDs were shifted by  $2C \approx 26\,\mathrm{mT}$ , with a clear separation between the positive and negative SFD. Therefore, the geometry of the magnets has a great impact on the achievable coupling. The inverter depicted in Fig. 5.a is designed with a fork-like structure to maximize the coupling strength on the ANC. This is the common shape used in pNML to design inverters. To further increase the stray field on the ANC, the gap between the input and the output should be minimized.

#### 3.2.2 Majority gate

The majority voter is the most important gate in pNML. Combined with the inverter, it provides a functionally complete set of logic gates. This gate is composed of three inputs and one output, as depicted in Fig. 5.b. The majority voter requires that the contributions from every input acting on the ANC are balanced, meaning that  $C_1 = C_2 = C_3$ . The implemented function can be written as:

$$Out = \overline{MAJ(M_1, M_2, M_3)} \tag{8}$$

The output is equal to the inverted majority of the three inputs. As the truth table in Fig. 5.b shows, the gate can behave as a NAND or NOR function by programming one of the inputs to logic 0 or 1, respectively. This is a double advantage in digital circuit design. First, the NAND is a universal gate in digital electronics. Moreover, the designer can choose among majority-based [7][3][4][2] or NAND/NOR-based logic synthesis tools to optimize the logic network [34]. To operate as NAND/NOR, one of the inputs can be a fixed magnet, saturated in the proper direction, or it can be a programmable input. If for example  $M_1$  is the programmable input, the output function can be rewritten as reported in Eqs. 9 and 10.

$$Out|_{M_1=0} = NAND(M_2, M_3) \tag{9}$$

$$Out|_{M_1=1} = NOR(M_2, M_3) \tag{10}$$

$$Out|_{M_1 \neq M_2} = NOT(M_3) \tag{11}$$
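Eqs. 8 to 11 can be checked exhaustively over all input combinations. The short Python sketch below does this at the purely Boolean level; the helper names are hypothetical and no magnetic behavior is modeled.

```python
from itertools import product


def inverted_majority(m1, m2, m3):
    """Eq. 8: complement of the majority of three binary inputs."""
    return 0 if (m1 + m2 + m3) >= 2 else 1


def nand(a, b):
    return 1 - (a & b)


def nor(a, b):
    return 1 - (a | b)


checks = {"NAND": True, "NOR": True, "NOT": True}
for m2, m3 in product((0, 1), repeat=2):
    # Eq. 9: programmable input fixed to 0 gives NAND
    checks["NAND"] &= inverted_majority(0, m2, m3) == nand(m2, m3)
    # Eq. 10: programmable input fixed to 1 gives NOR
    checks["NOR"] &= inverted_majority(1, m2, m3) == nor(m2, m3)
for m1, m3 in product((0, 1), repeat=2):
    # Eq. 11: two inputs with opposite values leave NOT of the third
    checks["NOT"] &= inverted_majority(m1, 1 - m1, m3) == 1 - m3

print(checks)
```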

The inversion (NOT) can also be obtained by setting two programmable inputs to opposite values. Their stray fields cancel each other and the output is the inverted value of the remaining input (Eq. 11). Fig. 5.d shows an SEM image of a fabricated 3-input majority gate. The symmetric geometry of the inputs is required to reach equal contributions from every input on the ANC. The small tip contains the region with reduced PMA and defines where the domain wall nucleates. In contrast to the inverter, the majority gate shows four SFDs, two for each logic value. Indeed, depending on the effective coupling on the ANC, two SFDs can be identified per magnetization (0/1):

- $C_{eff} = +3C/-3C$ , for inputs 000/111
- $C_{eff} = C/-C$ , for inputs 001, 010, 100/011, 101, 110
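The grouping of input patterns by effective coupling can be reproduced by enumeration. The sketch below assumes a sign convention chosen to match the list above: logic 0 maps to magnetization $-1$, logic 1 to $+1$, and every input contributes $-C$ times its magnetization (anti-ferromagnetic coupling). The function name is hypothetical and the result is expressed in units of $C$.

```python
from collections import defaultdict
from itertools import product


def effective_coupling(bits, c=1):
    """Net coupling on the ANC for a binary input pattern, in units of C."""
    return sum(-c * (2 * b - 1) for b in bits)


groups = defaultdict(list)
for bits in product((0, 1), repeat=3):
    pattern = "".join(str(b) for b in bits)
    groups[effective_coupling(bits)].append(pattern)

for c_eff in sorted(groups, reverse=True):
    print(f"C_eff = {c_eff:+d}C : {', '.join(groups[c_eff])}")
```

Only two magnitudes of effective coupling appear, $3C$ and $C$, which is why two SFDs per logic value are observed.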

The reason for two different SFDs per logic value can be understood by observing the graph in Fig. 6. The effective coupling depends on the input pattern. When all the input magnetizations point in the same direction, they all support/prevent the switching of the output with  $C_{eff} = +3C/-3C$ . On the contrary, for all other input patterns, two inputs have opposite values and cancel each other. Therefore, the switching is supported/prevented by only a single input ( $C_{eff} = C/-C$ ). From experimental measurements, the calculated coupling is approximately 5 mT [11]. The lower coupling per input results in a reduced clocking window. The authors in [11] studied the effect of variations of the ANC position during the fabrication process. If the variation in x,y is higher than 10 nm, the output error rate significantly increases. However, 10 nm alignment is feasible in the modern manufacturing processes of the semiconductor industry.

Fig. 6 The graph shows the different effective coupling on the ANC when a positive and a negative pulse is applied. The green circles represent contributions that prevent the switching of the magnet, while the others support the magnetization reversal.

In general, the majority gate requires an odd number of inputs to avoid an undetermined output. Therefore, the number of inputs can in principle be increased to 5 or 7, as long as the coupling field from every input is strong enough. A planar implementation of the five-input majority gate has been experimentally demonstrated in [8]. In principle, the input weights can also be tuned by engineering the magnet geometry and the distance from the ANC, leading to a different coupling field from each input. This possibility can be interesting for threshold logic computation [40].
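The idea of weighted inputs can be sketched at the Boolean level as an inverting threshold gate. This is a minimal illustration under assumptions, not a description of a fabricated device: integer weights stand in for the relative coupling fields, the output is assumed to be inverted as in the majority gate, and the function names are hypothetical.

```python
from itertools import product


def threshold_gate(inputs, weights):
    """Inverted weighted vote of binary inputs (0/1).

    Each input contributes +w (logic 1) or -w (logic 0); the output opposes
    the sign of the weighted sum, mirroring the inverting majority gate.
    """
    total = sum(w * (2 * b - 1) for b, w in zip(inputs, weights))
    if total == 0:
        raise ValueError("balanced weighted sum: output is undetermined")
    return 0 if total > 0 else 1


def inverted_majority(bits):
    """Complement of the plain majority of an odd number of bits."""
    return 0 if 2 * sum(bits) > len(bits) else 1


# Equal weights reproduce the inverted 3-input majority.
for bits in product((0, 1), repeat=3):
    assert threshold_gate(bits, (1, 1, 1)) == inverted_majority(bits)

# One double-weighted input behaves like a 5-input majority
# in which that input is counted twice.
for a, b, c, d in product((0, 1), repeat=4):
    expected = inverted_majority((a, b, c, d, d))
    assert threshold_gate((a, b, c, d), (1, 1, 1, 2)) == expected

# An even total weight can leave the output undetermined.
try:
    threshold_gate((1, 0), (1, 1))
except ValueError as err:
    print("undetermined:", err)
```

The last case mirrors the requirement stated above: the total input weight must be odd so that the stray fields can never cancel completely.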

#### 3.2.3 Signal routing

The correct operation of every logic device requires the transfer of the digital information from one point of the circuit to another. Logic gates (inverter and majority) need to be interconnected for complex logic operations. In pNML, the shape-independent anisotropy makes it possible to define structures of any shape, and in particular magnetic nanowires. Indeed, elongated nanomagnets are used to route the signals in pNML circuits. Magnetic wires are supposed to nucleate and propagate the domain wall in a single clock pulse. The wire length mainly influences the propagation time (Fig. 4.d) and, as a consequence, the maximum operating frequency. Therefore, the wire length should be as small as possible, and the design should consider breaking longer paths into multiple shorter wires.

#### 3.3 Monolithic 3D integration

The pNML technology claims many advantages: it is low power, it can combine logic and memory in the same device, and it can operate at room temperature. Another important advantage, which is not obvious among emerging technologies, is the compatibility with standard silicon back-end processes. This makes pNML very appealing to integrate with CMOS and enrich its capabilities. A further advantage is provided by the monolithic 3D integrability offered by the technology. It results in higher circuit compaction and, as a consequence, higher energy efficiency and shorter magnetic wires for signal routing. The comparison between the 2D and the 3D implementation of the basic pNML building blocks is illustrated in Fig. 7.

**Fig. 7 a**, 2D horizontal signal propagation can be achieved by a chain of anti-parallel coupled magnets. The 3D implementation exploits ferromagnetic coupling to transfer the binary information from one layer to another. **b**, 2D implementation of the majority gate and corresponding 3D implementation. The input moved to the layer above/below can be a reconfigurable input. **c**, The 2D implementation of a cross-wire in pNML is not yet available. The 3D integration of pNML offers easy signal crossings by moving the signal to another functional layer.

The horizontal signal propagation in 2D is obtained by a chain of nanomagnets or by using elongated wires. Depending on which side of the magnet is FIB irradiated, different propagation directions can be defined, see Fig. 7.a. The same figure shows how the vertical propagation is achieved by exploiting the ferromagnetic/anti-ferromagnetic coupling. If the FIB irradiated edge overlaps the magnet lying below, the magnets align parallel and the information moves from the lower layer to the upper layer. Similarly, when the magnet above overlaps the ANC of a magnet placed below, the signal propagates ferromagnetically from the upper layer to the lower layer. In case no overlap is present, the magnets align anti-parallel and the information is inverted when transferred from one functional layer to another. This approach makes it possible to cross two signals by detouring one of them onto another functional layer [25].

The 3D implementation of the majority gate (Fig. 7.b) makes it possible to move the programmable input into another functional layer. Its correct logic operation has been experimentally demonstrated in [23].

From a system-level perspective, this solution enables the definition of programmable/reconfigurable layers in the architectures, separated by memory and logic layers.

Finally, magnetic signal crossings, not available in 2D, are enabled thanks to the possibility to 3D-stack magnetic functional layers. Here, the magnetic via is exploited to detour the signal [22].

From the technology point of view, the fabrication process can be summarized in five steps:

- 1. Deposition of the first magnetic stack on a thermally oxidized Si wafer and structuring of the magnets
- 2. FIB irradiation of the first functional layer, defining the signal propagation direction
- 3. Planarization of the bottom layer with hydrogen silsesquioxane (HSQ), a spin-on dielectric
- 4. Deposition of the second magnetic stack, the same as in 1
- 5. Definition of the ANC by FIB irradiation of the patterned structure, the same as in 2

The distance between two subsequent layers has to be lower than 100 nm to maintain reliable coupling and switching. Moreover, the FIB acceleration voltage and the HSQ thickness are the parameters that have the greatest impact on the Ga<sup>+</sup> ion penetration depth during the definition of the ANC [24].

#### 3.4 Experimental study of pNML circuits

The basic pNML gates described in section 3.2 enable the design of complex architectures. Classical mapping on NAND/NOR gates can be used by programming one of the majority inputs. However, to fully exploit the majority logic function, majority gate synthesis is preferable. It has many advantages, among which are increased circuit compaction, and hence reduced area, and lower dissipated power. In the following, the most important experimentally demonstrated circuits are recalled. The first complex logic circuit experimentally demonstrated was the full adder [9]. This circuit is a milestone in beyond-CMOS technology benchmarking [38]. The second was a 2-to-1 multiplexer [51], another fundamental device in digital electronics, and the last was a proposed memory element [44]. To the best of our knowledge, these devices are the most complex pNML circuits for which the correct logic operation has been experimentally demonstrated. In the following, the main results about the adder and multiplexer are reported.

#### 3.4.1 Full adder

The full adder is the core logic element in arithmetic logic units. Therefore, its experimental demonstration represents a milestone for the pNML technology. The first implementation, in chronological order, exploits the advantages of the majority function available in pNML. The equations based on NOT/AND/OR for the sum (S) and the carry  $(C_{out})$  are reported in Eqs. 12-13.

$$S = \overline{A}\,\overline{B}C_{in} + \overline{A}B\overline{C_{in}} + A\overline{B}\,\overline{C_{in}} + ABC_{in} \tag{12}$$

$$C_{out} = BC_{in} + AC_{in} + AB \tag{13}$$

Different majority-based implementations of the full adder have been proposed in the literature. The first version was proposed for QCA [49] and it was further developed in [56]. However, in pNML the output of the majority gate is inverted due to the antiferromagnetic coupling among coplanar inputs. Therefore, pNML majority-based equation of the full adder requires additional inverters and is reported for the sake of clarity in Eq.14-15.

$$S = \overline{\overline{MAJ}\left(\overline{MAJ}(A, B, C_{in}),\, C_{in},\, \overline{\overline{MAJ}(A, B, \overline{C_{in}})}\right)} \tag{14}$$

$$C_{out} = \overline{\overline{MAJ}(A, B, C_{in})} \tag{15}$$
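Boolean identities of this kind are easy to get wrong by hand, so an exhaustive check over the eight input patterns can be useful. The Python sketch below is illustrative only: `maj`, `nmaj` and `inv` are hypothetical helper names, and the placement of the four inverters is an assumption chosen to be logically consistent with three inverting majority gates; the fabricated circuit in [9] may arrange them differently.

```python
from itertools import product


def maj(a, b, c):
    """Ideal (non-inverting) 3-input majority function."""
    return int(a + b + c >= 2)


def nmaj(a, b, c):
    """Inverting majority, modelling the inverted output of a pNML majority gate."""
    return 1 - maj(a, b, c)


def inv(x):
    """Inverter."""
    return 1 - x


def full_adder_sop(a, b, cin):
    """Reference NOT/AND/OR (sum-of-products) full adder."""
    s = ((1 - a) & (1 - b) & cin) | ((1 - a) & b & (1 - cin)) \
        | (a & (1 - b) & (1 - cin)) | (a & b & cin)
    cout = (b & cin) | (a & cin) | (a & b)
    return s, cout


def full_adder_pnml(a, b, cin):
    """Full adder built from three inverting majority gates and four inverters.

    Assumption: inverters sit on C_in, after the second majority gate,
    and on both outputs.
    """
    n_cout = nmaj(a, b, cin)              # inverted carry
    m2 = inv(nmaj(a, b, inv(cin)))        # MAJ(A, B, not C_in)
    s = inv(nmaj(n_cout, cin, m2))        # sum
    cout = inv(n_cout)                    # carry
    return s, cout


def check_all():
    """Compare both forms against integer addition for all input patterns."""
    for a, b, cin in product((0, 1), repeat=3):
        expected = ((a + b + cin) % 2, (a + b + cin) // 2)
        assert full_adder_sop(a, b, cin) == expected
        assert full_adder_pnml(a, b, cin) == expected
    return True
```

Calling `check_all()` confirms that the sum-of-products form and the inverting-majority form agree with integer addition for every input pattern.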

Fig. 8.a shows the gate-level representation of the adder presented in [9], which requires only three majority gates and four inverters. The SEM image of the corresponding full adder is depicted in Fig. 8.c. Here, the gap between the inputs and the ANC ranged from 25 nm to 50 nm, and the footprint is about $17 \, \mu m^2$. Fig. 8.e shows the timing diagram of the corresponding full adder. The circuit is assumed to be initially saturated to 0, so all outputs and intermediate signals start at logic 0. The input pattern A=1, B=0, $C_{in}$=1 is set before applying the clock signal. The timing diagram shows that the input signals cannot be changed before two clock cycles have elapsed. In general, the outputs are valid after 1.5 clock cycles, but for simplicity, new inputs are applied every two clock cycles and the outputs are considered valid with the same period. The input-output latency is strictly related to the circuit critical path, which is determined by the number of ANCs the signal needs to cross and the length of the interconnections. A more detailed discussion on the critical path estimation can be found in section 4. For more details on the fabricated and measured device, please refer to [9].

**Fig. 8** a, Gate-level representation of the pNML majority-based full adder. b, Gate-level representation of the pNML threshold gate-based full adder. c, SEM image of the fabricated majority-based full adder [9]. d, SEM image of the fabricated threshold gate-based full adder [15].

Better results in terms of circuit compaction have been obtained by using threshold logic gates (TLGs) as the logic blocks of the full adder [15]. Threshold logic enables the design of more complex functions, reducing the overall circuit complexity [32][19]. The gate-level representation of this adder is reported in Fig. 8.b. Here, TLG1 has equal weights and provides the 3-input majority function, while TLG2 has one double-weighted input ($C_{out}$) and provides the 5-input majority function. Fig. 8.d shows the SEM image of the fabricated device presented in [15]. In this layout, only two TLGs are required, resulting in a footprint reduced by a factor of 8.7 with respect to the circuit in Fig. 8.c. The bounding box of this full adder is only $1.95\,\mu m^2$, without considering the two additional inverters required at the output to complement the sum and the carry [15]. The coupling estimated from the SEM image and the numerical simulation reported in [15] was $\approx 2.5\,\mathrm{mT}$ for the 5-input majority and $\approx 3.5\,\mathrm{mT}$ for the 3-input majority. Simulations using compact models showed that a minimum coupling per input of 5 mT is required for reliable logic operation. This can be achieved with a gap between the inputs and the ANC smaller than 48 nm for the 5-input majority and 80 nm for the 3-input majority gate [15]. Moreover, the ANC must be properly aligned between the inputs to correctly balance their contributions. Fig. 8.f reports the timing diagram of the TLG-based full adder for comparison with the 3-input majority version. The MFM measurements of the fabricated device, together with details of the fabrication process, can be found in [15].
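The two-gate structure described above can be checked with a small behavioural model. The following Python sketch is an illustration under stated assumptions, not the layout of [15]: `inverting_tlg` is a hypothetical helper, both gates are assumed to invert their weighted majority (as pNML majority gates do), the double-weighted input of TLG2 is assumed to be the inverted carry delivered by TLG1, and the two output inverters mentioned in the text restore $S$ and $C_{out}$.

```python
from itertools import product


def inverting_tlg(inputs, weights):
    """Inverting threshold gate: returns 0 when the weighted majority of inputs is 1."""
    total = sum(weights)
    ones = sum(w for x, w in zip(inputs, weights) if x)
    return 0 if 2 * ones > total else 1


def tlg_full_adder(a, b, cin):
    """Full adder from two inverting threshold gates plus two output inverters."""
    # TLG1: equal weights, 3-input majority (output is the inverted carry).
    n_cout = inverting_tlg((a, b, cin), (1, 1, 1))
    # TLG2: 5-input majority with the TLG1 output counted twice.
    n_sum = inverting_tlg((a, b, cin, n_cout), (1, 1, 1, 2))
    # Output inverters complement the sum and the carry.
    return 1 - n_sum, 1 - n_cout


def check_tlg_adder():
    """Compare the model against integer addition for all input patterns."""
    for a, b, cin in product((0, 1), repeat=3):
        total = a + b + cin
        assert tlg_full_adder(a, b, cin) == (total % 2, total // 2)
    return True


# Footprint ratio quoted in the text: 17 um^2 versus 1.95 um^2.
footprint_ratio = 17.0 / 1.95
```

The footprint ratio evaluates to roughly 8.7, in line with the reduction factor quoted above.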

#### 3.4.2 1-bit multiplexer

In [51], the generic equation of the multiplexer (Eq. 16) was synthesized using only NAND gates, resulting in Eq. 17.



**Fig. 9** a, SEM image of the fabricated 2-to-1 multiplexer presented in [51]. b, Timing diagram reporting the magnetization of all internal signals. c, Wide-field MOKE image of the measured multiplexer [51]. d, Gate-level representation of the multiplexer.

$$Z = (A \cdot \overline{S}) + (B \cdot S) \tag{16}$$

Therefore, the circuit was designed by fixing one of the inputs of each majority gate to logic 0.

$$Z = \overline{\overline{(A \cdot \overline{S})} \cdot \overline{(B \cdot S)}} \tag{17}$$
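The equivalence of Eqs. 16 and 17 can be verified exhaustively. The Python sketch below is illustrative: it models a NAND gate as an inverting majority gate with one input fixed to logic 0, as described above. The helper names are hypothetical, and realizing the select inversion with a separate inverter is an assumption.

```python
from itertools import product


def nmaj(a, b, c):
    """Inverting 3-input majority, as provided by a pNML majority gate."""
    return 0 if a + b + c >= 2 else 1


def nand(a, b):
    """NAND obtained by fixing the third majority input to logic 0."""
    return nmaj(a, b, 0)


def inv(x):
    """Inverter (assumed to be a dedicated gate)."""
    return 1 - x


def mux_reference(a, b, s):
    """Eq. 16: Z = A*not(S) + B*S."""
    return (a & (1 - s)) | (b & s)


def mux_nand(a, b, s):
    """Eq. 17: NAND-only form of the 2-to-1 multiplexer."""
    return nand(nand(a, inv(s)), nand(b, s))


def check_mux():
    """Confirm that Eq. 17 matches Eq. 16 for all input patterns."""
    for a, b, s in product((0, 1), repeat=3):
        assert mux_nand(a, b, s) == mux_reference(a, b, s)
    return True
```

For example, `mux_nand(1, 0, 1)` selects B and `mux_nand(1, 0, 0)` selects A.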

The SEM image of the fabricated circuit is reported in Fig. 9.a. The correct logic behavior of the multiplexer was verified by wide-field MOKE microscopy [51]. The structure was first saturated to 0 by applying a negative pulse. Afterward, alternating positive and negative pulses with an amplitude of 20 mT were applied. Fig. 9.c reports the correct ordering when the inputs are (A=1, S=1, B=0). The timing diagram reported in Fig. 9.b shows the signal propagation across the circuit for the input patterns (A=1, S=1, B=0) and (A=1, S=0, B=0). More details on the magnetic stack and the fabrication process can be found in [51].

#### 3.4.3 Memory element

Magnetic devices naturally retain their digital information even without a power supply. This is also the case for pNML, which maintains its binary values when no external field is applied. However, in realistic circuits, where the external clocking field is repeatedly applied, storage and read-out operations are a crucial issue. In [44], a pNML memory element was presented in which the read/write operation can be controlled by two independent signals. The idea is to confine the domain wall between two notches and control the read/write operation by means of current pulses (Fig. 10.a). The magnetic notch introduces an energy barrier for the incoming domain wall. Therefore, in the normal operation of the pNML circuit, the propagating signal is pinned at the notch. The value can be stored by applying a current pulse on the write signal, which lowers the energy barrier of the notch and releases the domain wall. The energy barrier can be adjusted via the notch size [31] [44]. The current wire, buried in the substrate, generates an in-plane field that tilts the angle of the domain wall. Fig. 10.b reports the SEM image of the experiment presented in [44]. The dummy circuit is used to generate the input. The left image in Fig. 10.c shows the input at logic 0 pinned at the first notch. The storage was achieved by applying a 30 ns wide pulse (Fig. 10.c, center). Similarly, the read-out is performed by applying a current pulse on the second wire, which is independent of the write signal. This approach could be used not only to build high-density memory arrays, but also to retain and safely store signals at the input of the logic network before starting the computation.

#### 4 System-level design

In this section, the main aspects related to the design of pNML architectures are discussed. The system-level design should take into account signal propagation, layout floorplanning, synchronization, and buffering of the information to properly perform the required computation. In recent years, the research on pNML has been extended to complex architectures such as accumulators [45], Programmable Logic Arrays (PLAs), Finite State Machines (FSMs), memories and different adder implementations [30] [26] [16] [28], thanks to the support of specific EDA tools [41] [43]. Before going into the details of the methodology proposed to study such systems, some considerations on signal propagation and the processing of magnetic information are given.

**Fig. 10** a, Schematic representation of the memory element proposed in [44] where the read/write operation can be controlled by current pulses. b, SEM image of the fabricated device composed of a dummy circuit for generating the input and two notches. c, WMOKE measurements of the fabricated sample where it is possible to observe the pinned domain wall, the write and read operation when the current pulse is applied.

#### 4.1 Signal synchronization

In a general CMOS processing system, logic units and memories are properly organized over the chip area and interconnections enable intra-logic or logic-memory communications. A similar organization can be implemented in pNML, where magnetic nanowires, instead of metallic wires, are used to interconnect logic and memory blocks. The intrinsic pipelining of the digital signals requires proper buffering of the information to control the motion of the domain walls.

In the previous section, some of the pNML milestone circuits have been recalled. However, in order to analyze and optimize the circuit performance, it is fundamental to study its timing.

#### 4.1.1 Signal delays

Fig. 11 shows the propagation of the logic 0 and logic 1 in a chain of magnets. The initial configuration represents the worst-case scenario, where the input coming from the result of a computation needs to propagate through an inverter chain or, in general, a chain of ANCs. The signal must therefore travel across five magnets that are FIB irradiated on their left side. It is possible to observe that the logic 0 reaches the output earlier than logic 1. The $0\rightarrow1$ transition is favored when the clock signal starts with a positive pulse (Fig. 11.a). On the contrary, the $1\rightarrow0$ transition requires an entire clock cycle to start the propagation of the logic 0. In general, according to the length of the critical path, defined as the maximum number of ANCs to be crossed, the input-output delay in terms of clock cycles for a 2D layout can be defined as:

$$N_{cycles} = \frac{N_{ANC_{path}} + 1}{2} \tag{18}$$

where $N_{ANC_{path}}$ is the length of the longest ordering path, i.e. the maximum number of ANCs to be crossed. In the example in Fig. 11, $N_{ANC_{path}} = 5$ and, as a consequence, the output is valid and can be sampled after three clock cycles. In the case of 3D layouts, where the ferromagnetic coupling is exploited, i.e. when the digital signal is transferred from one physical layer to another, an entire clock cycle per ANC needs to be considered. As a consequence, for a correct estimation of the delays, the designer should consider the longest path from the input to the output, counting both the ferromagnetically and the antiferromagnetically coupled ANCs that are crossed.
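As a worked illustration of Eq. 18, the following Python sketch computes the latency of a 2D path. The function names are hypothetical. The second helper, which adds one full clock cycle for every ferromagnetically coupled (inter-layer) ANC on top of the 2D estimate, is only one possible reading of the 3D rule stated above and should be treated as an assumption rather than as the model used by the authors.

```python
def cycles_2d(n_anc_path):
    """Eq. 18: input-output delay in clock cycles for a 2D layout."""
    if n_anc_path < 0:
        raise ValueError("the number of ANCs on the path cannot be negative")
    return (n_anc_path + 1) / 2


def cycles_mixed(n_afm, n_fm):
    """Assumed estimate for a 3D path.

    n_afm: antiferromagnetically coupled (in-plane) ANCs on the path
    n_fm:  ferromagnetically coupled (inter-layer) ANCs on the path
    Each inter-layer ANC is assumed to cost one full clock cycle.
    """
    if n_fm < 0:
        raise ValueError("the number of ANCs on the path cannot be negative")
    return cycles_2d(n_afm) + n_fm


# Example from Fig. 11: five ANCs on the critical path.
latency_fig11 = cycles_2d(5)
```

For the five-magnet chain of Fig. 11, `cycles_2d(5)` gives three clock cycles, matching the text; a path crossing two ANCs gives 1.5 cycles.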

#### 4.1.2 Glitches

As happens with CMOS, glitches may propagate errors inside the circuits and need to be suppressed. The main difference with CMOS is that in pNML all paths are sequential. The evolution of the circuit is entirely clock-driven. Therefore, glitches are latched like any other signal. They do not automatically vanish after a while, as happens with CMOS in combinational paths, but last for at least half a clock cycle.

Fig. 11 a, Propagation of the logic 1 in an inverter chain. b, Propagation of the logic 0 in an inverter chain.

Here, a simple circuit with more than one input is considered as a case study. The circuit implements the AND function and its layout is shown in Fig. 12.a. The delays, in this case, depend on both inputs. As an example, consider the situation with inputs A=1, B=0, and suppose that both inputs then switch $(A\rightarrow 0, B\rightarrow 1)$. What happens to the output? Fig. 12.b and Fig. 12.c show two timing diagrams where the inputs are changed on the negative and on the positive pulse, respectively. When the inputs are changed on a negative pulse (Fig. 12.b), no glitch occurs at the output, even though there is a time window in which both intermediate signals $\overline{A}$ and $\overline{B}$ are 1.



**Fig. 12** a, AND function implemented in planar pNML. b, Timing diagram of the AND gate where no glitches occur. c, Timing diagram of the AND gate where one glitch occurs at the output.

The signal  $\overline{A}$  goes to 1 on the first positive pulse. Afterward, the subsequent negative pulse switches  $\overline{B}$  preventing any 1 from propagating at the output.

On the contrary, if the inputs are changed on a positive pulse (Fig. 12.c), $\overline{B}$ goes to 0 in the immediately following negative half cycle, letting a glitch propagate to the output. Indeed, in the presence of the positive field, the output goes to 1 for half a clock cycle. This is explained by the different propagation delays of logic 0 and 1.

Glitches can also be generated by reconvergent paths that are not correctly balanced. Fig. 13.a shows an example where multiple inputs are applied to two logic networks. Each logic network performs some kind of computation and, if some conditions are met, generates a logic 1 at its output. Then, if both logic networks produce a logic 1, some action needs to be taken. The two logic networks implement different logic functions and require a certain number of clock cycles to perform the computation, identified by $t_{ln1}$ and $t_{ln2}$ in Fig. 13. What could happen is reported in Fig. 13.b. The different delays introduced by each logic network result in a wrong behavior of the circuit. Indeed, the logic ones on LN1 and LN2 do not overlap and the output of the AND gate remains at logic 0. The simplest solution to adopt in these cases is the introduction of delays in the faster network by means of inverter chains.

Another possibility is to buffer the information on the magnetic nanowires and release it at the right time instant. This approach requires a small control unit to generate current pulses in the proper time windows [42] [45]. Therefore, the designer needs to analyze and carefully characterize the timing of each logic network in order to:

- Sample/read a glitch-free output
- Synchronize the logic networks by properly inserting delays
- Buffer the information by introducing magnetic notches [44] [45]



**Fig. 13** a, Circuit example where two logic networks having potentially different delays are fed with the same inputs. b, Sample timing diagram when paths are not balanced. The different delays result in a wrong behavior of the circuit.

Glitches may propagate errors in pNML circuits if they are sampled by the subsequent logic networks. Therefore, it is up to the designer to carefully define the sampling interval according to the delays involved in the circuit.
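The delay-insertion strategy for reconvergent paths can be sketched numerically. The Python example below is a minimal illustration with hypothetical names. It assumes that every inserted inverter adds one ANC to the path and therefore, following Eq. 18, half a clock cycle of delay, and it flags the case where an odd number of inverters would also invert the signal polarity.

```python
def padding_inverters(t_fast, t_slow):
    """Number of inverters to add to the faster network to balance two paths.

    t_fast, t_slow: latencies of the two logic networks in clock cycles.
    Returns (n_inverters, polarity_inverted).
    Assumption: each added inverter contributes half a clock cycle.
    """
    diff = t_slow - t_fast
    if diff < 0:
        raise ValueError("t_fast must not exceed t_slow")
    n_inverters = round(2 * diff)
    # An odd inverter count delivers the complemented signal.
    polarity_inverted = n_inverters % 2 == 1
    return n_inverters, polarity_inverted
```

For instance, balancing a network that needs 1.5 cycles against one that needs 3 cycles would take three inverters under this assumption, and the delayed signal would arrive complemented, so the designer would have to account for the polarity in the downstream gate.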

#### 4.2 pNML as Co-processor

The experiments reported in the literature, developed at the academic level, demonstrate the correct behavior of fabricated pNML circuits. Devices ranging from simple logic gates to monolithically 3D integrated structures have been shown. Moreover, simple arithmetic, memory and data selector devices have been demonstrated. However, it is not realistic to imagine a complete replacement of CMOS technology in the coming years, but rather an integration of promising emerging devices into the CMOS ecosystem. The idea is to integrate a low-power pNML co-processor in the back end of line (BEOL) of current CMOS processes. The pNML processing system can cooperate with the CMOS circuits and be interfaced to them through MTJs. From a practical point of view, the envisioned on-chip clocking system for pNML was first proposed in [6].



Fig. 14 a, Main portions of integrated circuits including the front end of line (FEOL), the back end of line (BEOL) and the packaging. b, Closeup view of the on-chip inductor integrated in the BEOL that can clock the pNML co-processor, controlled by the CMOS circuitry.  $\mathbf{c}$ , Top view of the inductor.

The integration in the BEOL is easier thanks to the lower resolution required. The pNML fabrication is simple compared to CMOS and can potentially be integrated into the BEOL. Fig. 14.a shows a schematic representation of a CMOS chip where the front end of line (FEOL) and the BEOL are highlighted. Fig. 14.b shows a closeup view of the BEOL portion of the chip, where pNML could be integrated. The on-chip clocking structure proposed in [6] sandwiches the magnetic stack and provides a homogeneous field. Indeed, the requirements for a pNML clocking system are:

- high field amplitude
- homogeneous field distribution

In [6], the authors assume Co/Ni magnets with a switching field < 25 mT and a required clock amplitude of 20 mT. The study focuses on the main sources of loss in an on-chip inductor designed to provide a homogeneous field. The pNML circuit, whether 3D stacked or 2D, is sandwiched between two ferromagnetic cladding layers (Fig. 14.b). The investigated cladding materials were NiFe, Supermalloy (Spy) and CoZrTa [27]. Their findings show that a power density of 3 W cm<sup>-2</sup> can be achieved at 50 MHz, with a 35x power saving compared to CMOS at comparable throughput [6]. The proposed clocking system offers a rectangular region with a homogeneous field, clocked in the MHz regime.
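To give a feeling for what these figures imply, the short Python sketch below converts the quoted power density into a power and an energy per clock cycle for a single clocking zone. The function name is hypothetical, and the zone area of 0.5 mm² is an illustrative assumption taken from the upper bound on clocking zones mentioned later in this chapter; the result is plain arithmetic, not a value reported in [6].

```python
def clock_budget(power_density_w_cm2=3.0, frequency_hz=50e6, zone_area_mm2=0.5):
    """Power and energy per clock cycle of one clocking zone (illustrative).

    power_density_w_cm2: clocking power density (3 W/cm^2 quoted from [6])
    frequency_hz:        clock frequency (50 MHz quoted from [6])
    zone_area_mm2:       assumed area of the clocked zone
    """
    area_cm2 = zone_area_mm2 / 100.0            # 1 cm^2 = 100 mm^2
    power_w = power_density_w_cm2 * area_cm2
    energy_per_cycle_j = power_w / frequency_hz
    return power_w, energy_per_cycle_j


power_w, energy_j = clock_budget()
```

Under these assumptions the zone dissipates on the order of 15 mW, i.e. a few tenths of a nanojoule per clock cycle shared by all magnets in the zone.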

#### 4.3 EDA tools and compact model

Many studies on pNML devices have been reported in the literature, both at the experimental and at the micromagnetic level. However, to understand the potential and drawbacks of the technology, it is fundamental to exploit the validated technological principles at a higher level of abstraction. Compact models [51] [57] that describe the behavior of the experimentally validated building blocks with sufficient accuracy are useful to circuit designers. Indeed, compact models bridge the two worlds of the technology engineers and the system-level designers. They provide high computational speed, enabling system-level explorations.

Fig. 15 Design flow provided by MagCAD for the pNML technology

As happened in the mid-1970s with CMOS, EDA tools are a fundamental help to designers for improving the technology at the architectural level and for giving feedback to the technologists on what needs to be improved. These tools facilitate the design and simulation of logic circuits.

At the time of writing, the software environment ToPoliNano [43] offers a complete design flow for the system-level exploration of field coupled technologies, including pNML. In particular, its layout editor MagCAD [41] provides the possibility to design, simulate and re-use pNML modules in larger architectures. It integrates all the low-level building blocks experimentally demonstrated in the literature. It is able to automatically detect simple logic functions and errors in the design [29], generating the circuit netlist that integrates the compact model of every building block. Moreover, it helps the designer deal with signal synchronization and delays. MagCAD reports circuit metrics such as the critical path, the minimum pulse width to achieve reliable switching and the bounding box area. The compact model available in MagCAD is based on the works presented in [57] [51] [50].

To model a pNML circuit, three main contributions have to be considered:

1. The artificial nucleation center (ANC).
2. The magnetic nanowire.
3. The coupling field on the ANC.

The ANC behavior is modeled as a Stoner-Wohlfarth particle [48]. This assumption is valid for small particles that show coherent reversal. The switching probability of a given particle is derived from the Arrhenius equation [57]. The characteristics of the ANC are used to describe the switching behavior of the particle. The magnetic nanowire introduces a propagation delay between the nucleation of the domain wall and the complete reversal of the nanomagnet. The coupling field, summed with the external field, provides the effective field acting on the ANC, which supports or prevents the switching.

Therefore, both the nucleation and propagation of the domain wall describe the signal propagation in pNML. The propagation can take place in-plane, in the case of 2D layouts, or vertically, in the case of 3D layouts. As a consequence, the minimum pulse width $(t_{pulse})$ can be defined as the sum of the time required to nucleate the domain wall $(t_{nuc})$ and the time required to propagate it over the entire structure $(t_{prop})$.

$$t_{pulse} \ge t_{nuc} + t_{prop} \tag{19}$$

The nucleation time can be expressed in terms of the desired probability as:

$$t_{nuc} = -\tau(H_{eff}) \cdot \ln\left(1 - P_{nuc}\right) \tag{20}$$

where $\tau(H_{eff})$ represents the switching time constant under the effective field $H_{eff}$ and $P_{nuc}$ the desired nucleation probability. The switching time constant can be derived as the inverse of the well-known Arrhenius equation:

$$\tau(H_{eff}) = f_0^{-1} \cdot \exp\left(\frac{E_0 \left(1 - \frac{H_{eff}}{H_0}\right)^2}{k_B T}\right) \tag{21}$$

Where  $f_0$  is the reversal attempt frequency,  $k_B$  the Boltzmann constant and T is the temperature in Kelvin. The numerator at the exponent represents the total energy barrier of the structure, with  $E_0$  the energy barrier at zero field:

$$E_0 = K_{ANC}V_{ANC} \tag{22}$$

which depends on the volume of the ANC and its effective anisotropy; the temperature enters through the $k_B T$ term in Eq. 21. The field required to reverse the magnetization direction of a magnet at zero temperature, $H_0$ in Eq. 21, is derived from Eq. 5 and can be expressed as:

$$H_0 = \frac{2K_{ANC}}{\mu_0 M_s} \tag{23}$$

The Arrhenius equation also gives the probability of nucleating the domain wall under an effective field $H_{eff}$:

$$P_{nuc}(t_{nuc}, H_{eff}) = 1 - e^{\left(-\frac{t_{nuc}}{\tau(H_{eff})}\right)} \tag{24}$$

After modeling the nucleation behaviour of a Stoner-Wohlfarth particle (the ANC), the propagation of a domain wall has to be described. The propagation time under an effective field $H_{eff}$ can be defined as:

$$t_{prop} = \frac{l_{mag}}{v_{DW}(H_{eff})} \tag{25}$$

where $l_{mag}$ is the length of the magnet and $v_{DW}$ is the domain-wall velocity. The velocity depends on the field intensity and can be classified into three different regimes: the creep, the depinning and the flow regime. More details can be found in [51] [57]. The model includes material properties and geometrical information of the pNML circuit in order to estimate propagation delays and nucleation times under specific user-defined conditions.

Some high-level studies reported the exploration of architectural solutions for pNML memories [30], the use of 3D pNML as a co-processor for the execution of the summed-area table algorithm [42] and the development of reversible gates [46].

The design flow offered by the MagCAD tool is schematically represented in Fig. 15. The element pool represents all the low-level blocks available in pNML. The designer is supported by an intuitive graphical user interface, which also handles 3D circuit designs. When the design is completed, the netlist integrating the compact model can be automatically extracted thanks to a technology-independent algorithm [29]. Connections and functions are automatically detected and the final netlist is made available to the user for simulation. The user can verify the circuit timing in order to properly synchronize logic networks and properly sample output signals, avoiding glitches.
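The timing part of the compact model (Eqs. 19-25) can be condensed into a few functions. The Python sketch below is a minimal illustration, not the MagCAD implementation: all function names are hypothetical, the attempt frequency default of 1 GHz is an assumed placeholder, $E_0$ is passed as an energy in joules so that the exponent of Eq. 21 is dimensionless, the barrier is assumed to vanish for $H_{eff} \ge H_0$, and the domain-wall velocity is supplied directly by the caller instead of being derived from the creep, depinning and flow regimes.

```python
import math

K_B = 1.380649e-23        # Boltzmann constant in J/K
MU_0 = 4e-7 * math.pi     # vacuum permeability in T*m/A


def switching_field(k_anc, m_s):
    """Eq. 23: zero-temperature switching field H0 in A/m."""
    return 2.0 * k_anc / (MU_0 * m_s)


def tau(h_eff, h0, e0, temperature, f0=1e9):
    """Eq. 21: switching time constant in seconds (e0 in joules)."""
    # Assumption: no barrier is left once the effective field reaches H0.
    reduced = max(0.0, 1.0 - h_eff / h0)
    # Note: very large barriers (e0 >> k_B*T) overflow math.exp.
    return math.exp(e0 * reduced ** 2 / (K_B * temperature)) / f0


def t_nuc(p_nuc, h_eff, h0, e0, temperature, f0=1e9):
    """Eq. 20: time needed to nucleate with probability p_nuc."""
    return -tau(h_eff, h0, e0, temperature, f0) * math.log1p(-p_nuc)


def p_nuc(t, h_eff, h0, e0, temperature, f0=1e9):
    """Eq. 24: nucleation probability after a time t."""
    return -math.expm1(-t / tau(h_eff, h0, e0, temperature, f0))


def t_prop(l_mag, v_dw):
    """Eq. 25: domain-wall propagation time over a magnet of length l_mag."""
    return l_mag / v_dw


def min_pulse_width(p_target, h_eff, h0, e0, temperature, l_mag, v_dw, f0=1e9):
    """Eq. 19: lower bound on the clock pulse width."""
    return t_nuc(p_target, h_eff, h0, e0, temperature, f0) + t_prop(l_mag, v_dw)
```

A typical use would first compute `h0 = switching_field(k_anc, m_s)` from the ANC anisotropy and the saturation magnetization, and then call `min_pulse_width` with the effective field (clock field plus coupling field) to size the clock pulse for a target nucleation probability.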

#### 5 Design example

In this section, the design of a 4-bit carry select adder is described to show the design flow provided by the MagCAD tool, presented in section 4.3. This circuit has been chosen to highlight the hierarchical approach offered by the framework. Indeed, the presented layout exploits the reuse of sub-modules as much as possible. The architecture of the implemented adder is reported in Fig. 16.b for the sake of clarity. The carry select adder is usually used to split the critical path of the standard ripple carry adder. Therefore, the 4-bit adder is divided into blocks, each made of two full adders. To speed up the computation, the addition is computed for both values of the input carry, "0" and "1". The multiplexer at the output is used to select the correct result based on the carry output of the previous stage. The entire architecture is made of three 2-bit ripple carry adders and three 1-bit 2to1 multiplexers. The internal layout of the ripple carry adder is reported in Fig. 16.d. Here, two full adders are instantiated as sub-modules. The full adder implementation adopted is the one presented in section 3.4.1, in particular Fig. 8.b. Similarly, the multiplexer block depicted in Fig. 16.a includes three 2to1 multiplexers. Its layout is reported in Fig. 16.c. Its internal structure is a NAND-based implementation, equal to the fabricated device presented in section 3.4.2. Fig. 16 shows that the entire design has been developed exploiting a multilayer design approach. The two colors represent two physical layers that have been used not only to arrange the gates but also to route the interconnections. The width of the magnetic nanowires is 220 nm, resulting in a bounding box area of $186.12\,\mu\text{m}^2$.

**Fig. 16** Hierarchical design of a 4-bit carry select adder designed with MagCAD. **a**, Top level of the architecture that includes three 2-bit ripple carry adders and one 3-bit 2to1 multiplexer. **b**, Schematic representation of the presented adder. **c**, Inner view of the 3-bit 2to1 multiplexer block that includes three single-bit multiplexers. **d**, Inner view of the 2-bit ripple carry adder including two full adders.

The netlist, automatically extracted by the tool, has been used to simulate the circuit behavior. The parameters used for the simulation are reported in Table 1. An extract of the VHDL simulation is reported in Fig. 17. It takes into account the propagation delays within the magnetic nanowires and the nucleation time. It is also possible to observe the pipelined behavior of the technology and evaluate its latency. The input signals require 11 clock cycles, in the worst case, to reach the output of the circuit. It can be easily understood that this kind of analysis is unfeasible using micromagnetic simulators.

Fig. 17 Extract of the simulation where input A=1000, B=0100 and $C_{in}=1$ are applied. The simulation shows the input-output latency of the circuit.

Table 1 Simulation parameters according to [51]

| Parameter                                      | Value                               |
|------------------------------------------------|-------------------------------------|
| Clock field amplitude                          | 560 Oe                              |
| Intrinsic pinning field                        | 190 Oe                              |
| Coupling field strength for the inverter       | 153 Oe                              |
| Coupling field strength for the majority voter | 48 Oe                               |
| Coupling field strength for the magnetic via   | 75 Oe                               |
| Effective anisotropy                           | $2.0 \cdot 10^{5}\ \text{J/m}^3$    |
| Saturation magnetization of Co                 | $1.4 \cdot 10^{6}$ A/m              |
| Thickness of the Co                            | $3.2 \cdot 10^{-9}$ m               |
| Thickness of the stack                         | $6.2 \cdot 10^{-9}$ m               |
| ANC volume                                     | $1.68 \cdot 10^{-23}\ \text{m}^3$   |
| Notch apex angle                               | 51.5°                               |
| Notch width                                    | $54 \cdot 10^{-9}$ m                |
| Nanowire width                                 | $220 \cdot 10^{-9}$ m               |
| Temperature                                    | 293 K                               |

The modular approach provided by the tool makes it possible to validate every sub-module in the hierarchy and reduce the possibility of making errors in the design. This is just an example of how this methodology can be exploited to investigate the pNML technology. Other works in the literature have explored even larger architectures and made a comparison with state-of-the-art CMOS. In [42], the authors presented the implementation of a summed-area table algorithm using pNML. The circuit compactness is very promising thanks to the 3D integrability offered by the technology. On the other hand, the throughput is lower in the pNML case, as expected. However, the presented layout was intended to show the potential of the multilayer structure, without much effort being put into performance optimization.
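The hierarchy of the adder can be mirrored in a purely behavioural model, which is handy for generating reference outputs for the simulation. The Python sketch below is an illustration with hypothetical function names; it models only the logic function (three 2-bit ripple carry adders and three 1-bit 2to1 multiplexers) and ignores the magnetic timing, pipelining and latency that the VHDL simulation captures.

```python
def full_adder(a, b, cin):
    """1-bit full adder: returns (sum, carry_out)."""
    total = a + b + cin
    return total & 1, total >> 1


def ripple_carry_2bit(a_bits, b_bits, cin):
    """2-bit ripple carry adder; bit lists are LSB first. Returns (s0, s1, cout)."""
    s0, c0 = full_adder(a_bits[0], b_bits[0], cin)
    s1, c1 = full_adder(a_bits[1], b_bits[1], c0)
    return s0, s1, c1


def mux2to1(x0, x1, sel):
    """1-bit 2to1 multiplexer: x0 when sel is 0, x1 when sel is 1."""
    return x1 if sel else x0


def carry_select_4bit(a, b, cin):
    """4-bit carry select adder: returns (4-bit sum, carry_out)."""
    a_bits = [(a >> i) & 1 for i in range(4)]
    b_bits = [(b >> i) & 1 for i in range(4)]
    # Lower block uses the real input carry.
    s0, s1, c_low = ripple_carry_2bit(a_bits[:2], b_bits[:2], cin)
    # Upper block is computed for both possible carries.
    high0 = ripple_carry_2bit(a_bits[2:], b_bits[2:], 0)
    high1 = ripple_carry_2bit(a_bits[2:], b_bits[2:], 1)
    # Three 1-bit multiplexers select s2, s3 and the output carry.
    s2, s3, cout = (mux2to1(x0, x1, c_low) for x0, x1 in zip(high0, high1))
    return s0 | (s1 << 1) | (s2 << 2) | (s3 << 3), cout


def check_adder():
    """Exhaustively compare the model against integer addition."""
    for a in range(16):
        for b in range(16):
            for cin in (0, 1):
                total = a + b + cin
                assert carry_select_4bit(a, b, cin) == (total & 0xF, total >> 4)
    return True
```

For the input pattern of Fig. 17 (A=1000, B=0100, $C_{in}=1$), the model returns the sum 1101 with an output carry of 0.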

#### 6 Conclusion and Outlook

A lot of magnetic device research is going on to identify promising candidates for the development of the next-generation low-power devices [47] [38] [39].

The previous sections introduced and exemplified the concept of pNML technology, from the device level of engineered Co/Pt islands up to system-level investigations with ToPoliNano/MagCAD. With 3D integrated pNML devices, it has been shown that, in principle, pNML devices in their present state are competitive with conventional CMOS and could gain in monolithic 3D integrated, low-power, highly pipelined and systolic architectures. Furthermore, no severe showstopper can be seen at this technology readiness level (TRL3-TRL4).

#### 6.1 Material improvements

However, there is of course room for improvement, and we claim that the next generation of pNML (let's call it pNML 2.0, if this book chapter was pNML 1.0) would give rise to further improvement and readiness for ultimate-scale integration in a CMOS-compatible technology. At the materials level, we think that ultrathin bilayers like Ta/CoFeB/MgO, already applied in the tunnelling barriers of magnetic tunnel junctions (MTJs), would be the ideal fit, as they could be directly integrated in electrical writing/reading actuators/sensors for pNML 2.0 technology. CoFeB offers low switching fields, decent field-driven domain-wall speed and spin-transfer torque and, in combination with Ta, the so-called spin-orbit-torque effects for efficient electrical transduction concepts. Compared to Co/Pt multilayers, CoFeB has less PMA and is more sensitive to fabrication variations but, as such, can also be engineered and tailored to a much greater extent. Film stacks only a few nanometers thick could be sufficient for a functional NML layer when vertically stacked in an ultimately scaled layout.

#### 6.2 ANC engineering

Besides new materials, the consistent development of improved device designs is needed. 3D arrangements of islands are mandatory, and the generation of self-aligned artificial nucleation centers is needed to render the serial technique of focused ion beam irradiation superfluous. Photoresist masks combined with ion implantation technology for mass fabrication should be introduced and applied on wafer scale. Ga ions have been heavily used for direct magnetic patterning. However, the ion dose for fine-tuned anisotropy control in multilayer films is extremely small. In current pNML technology, about 50 ions at 50 keV are sufficient to create an artificial nucleation center. In contrast, He ions are known to be less invasive. Most likely, an areal dose 3 orders of magnitude higher has to be applied for a similar impact. We speculate that the use of He ions would lead to much better control of the nucleation centers in pNML devices. The drawback is that depth control is more difficult to achieve with He ions. But certainly, light ion irradiation with He ions combined with highest-resolution lithography would be a prospective path to follow. However, it might turn out that FIB irradiation for prototyping pNML devices does not work for novel films like CoFeB/MgO. Completely avoiding ion implantation, and instead controlling PMA via interfacial strain, lateral control of the stack composition or even electrical control of PMA, seems to be within reach.

#### 6.3 Switching field distributions

As one is dealing with a large number of devices in integrated logic circuits, switching field distributions cannot be overcome by system-level compensation, as is done e.g. in memory technology. Every device should be fully functional and performant over a range of several sigma of deviation. To our knowledge, this is perhaps the hardest problem to tackle in nanomagnetic logic device research. Even though sub-microsecond switching distributions have been investigated, it turns out that an artificial nucleation center is difficult to fully understand in terms of dynamic effects. There is strong indication that, at very fast rising field amplitudes, Arrhenius-type models for the underlying distribution reach their limit. Future research has to address second-order effects in ANC magnet reversal and should be able to time-resolve the domain-wall formation and propagation in an ANC and its surroundings. From such investigations, an ANC optimized in both position and geometry could be deduced. As pointed out, for pNML circuits the underlying distributions of the devices are essential.

#### 6.4 Clocking mechanism

Logic circuits in pNML technology need a global clock both for computation and signal propagation. On the one hand, this is advantageous, as electrical connections are avoided and an oscillating magnetic field acts as power supply and clock signal at once. On the other hand, it is disadvantageous, as on-chip field generation seems to be rather difficult to achieve with low power. However, the proposed clocking scheme with magnetically cladded planar wires could be used for clocking zones of up to 0.5 square millimeter area with supply voltages smaller than 1 V, i.e. in the range of current processor chip supplies. Experimental results from the research community on high-frequency, high-Q on-chip inductors should be adopted to form efficient on-chip clock generators with highly permeable cladding materials for power-efficient clock fields. Furthermore, by stacking many active pNML layers in one on-chip clocking inductor, the overhead of the clock-generation circuitry (typically controlled by the CMOS main processor chip) is minimized. Nevertheless, it is an open question how the sandwiched pNML circuits can be interfaced with electrical wiring from the main processor chip or for electrical input/output.

# 6.5 System level explorations

As reported, a functionally complete device family was demonstrated for 3D NML, including 3D signal routing via domain wall conduits. For system-level investigations, compact modelling and especially the ToPoliNano simulation environment have proven extremely helpful. The architectural solutions proposed in the literature showed the main aspects to be considered during the design phase. Moreover, the 3D designs available in the literature demonstrated high compactness compared with state-of-the-art CMOS, even with large pNML magnet widths. The flexibility of the environment makes it possible to update the compact model as the technology develops. It would also be interesting to develop a standard cell set of logic gates and to enrich the framework with an automatic pNML layout engine. An automatic layout engine would increase circuit compactness by better exploiting the functional layers, and would reduce the possibility of errors in custom-made layouts.

## 6.6 Magnetic devices for ULSI

However, everyone familiar with ultra-large-scale integration is sceptical of paradigm changes in digital computation, simply because MOS technology has been improved and optimized over many decades and has been hugely successful. In other words, disruptive approaches in digital technologies are unlikely to happen. But does that simply mean CMOS forever? It might be true for general-purpose digital computation, but there is a strong trend towards hybrid systems, and non-Boolean concepts of computation might gain even more momentum. Magnetic devices are highly attractive for such implementations, as they combine very interesting characteristics such as robustness against radiation, bi-stability, scaling potential and room-temperature operation with rich high-frequency dynamics. Hence, it is worth studying them and exploiting them for real-world applications in the future, e.g. for classical digital computing concepts but also, and even more so, for non-conventional concepts like spin-wave, neuromorphic or logic-in-memory computation.

Acknowledgements There are numerous people who greatly contributed to this work, too many to name them all personally. We would like to thank the Bachelor's and Master's students of the involved research groups, the staff of the former TUM Chair for Technical Electronics, the researchers Xueming Yu, Josef Kiermaier, Stephan Breitkreutz-v. Gamm, Irina Eichwald, Gražvydas Žiemys, Simon Mendisch and Valentin Ahrens, the staff from the VLSI group of Politecnico di Torino, the professors Maurizio Zamboni, Mariagrazia Graziano and Marco Vacca, and the researchers Giovanna Turvani, Umberto Garlando and Fabrizio Cairo.

Further thanks to our mentors Wolfgang Porod, Doris Schmitt-Landsiedel, Paolo Lugli.

Funding by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) – project numbers 403505866, 114933698 and 229838035 – and by the Technical University of Munich - Institute for Advanced Study, funded by the German Excellence Initiative, is gratefully acknowledged.

#### References

- Allwood, D.A., Xiong, G., Cooke, M.D., Faulkner, C.C., Atkinson, D., Vernier, N., Cowburn, R.P.: Submicrometer ferromagnetic NOT gate and shift register. Science 296, 2003–2006 (2002)
- Amarú, L., Gaillardon, P., Chattopadhyay, A., De Micheli, G.: A sound and complete axiomatization of majority-n logic. IEEE Transactions on Computers 65(9), 2889–2895 (2016)
- Amarú, L., Gaillardon, P., De Micheli, G.: Bds-maj: A bdd-based logic synthesis tool exploiting majority logic decomposition. In: 2013 50th ACM/EDAC/IEEE Design Automation Conference (DAC), pp. 1–6 (2013)
- Amarú, L., Gaillardon, P., De Micheli, G.: Majority-inverter graph: A new paradigm for logic optimization. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 35(5), 806–819 (2016)
- Becherer, M., Breitkreutz-v. Gamm, S., Eichwald, I., Žiemys, G., Kiermaier, J., Csaba, G., Schmitt-Landsiedel, D.: A monolithic 3D integrated nanomagnetic co-processing unit. Solid-State Electronics 115, Part B, 74–80 (2016). DOI 10.1016/j.sse.2015.08.004
- Becherer, M., Kiermaier, J., Breitkreutz, S., Eichwald, I., Žiemys, G., Csaba, G., Schmitt-Landsiedel, D.: Towards on-chip clocking of perpendicular nanomagnetic logic. Solid-State Electronics 102, 46–51 (2014). DOI 10.1016/j.sse.2014.06.012. URL http://www.sciencedirect.com/science/article/pii/S0038110114001452. Selected papers from ESSDERC 2013
- Brayton, R., Mishchenko, A.: Abc: An academic industrial-strength verification tool. In: Proceedings of the 22Nd International Conference on Computer Aided Verification, CAV'10, pp. 24–40. Springer-Verlag, Berlin, Heidelberg (2010)
- Breitkreutz, S., Eichwald, I., Kiermaier, J., Papp, A., Csaba, G., Niemier, M., Porod, W., Schmitt-Landsiedel, D., Becherer, M.: 1-bit full adder in perpendicular nanomagnetic logic using a novel 5-input majority gate. EPJ Web of Conferences 75(05001) (2014)

- Breitkreutz, S., Kiermaier, J., Eichwald, I., Hildbrand, C., Csaba, G., Schmitt-Landsiedel, D., Becherer, M.: Experimental demonstration of a 1-bit full adder in perpendicular nanomagnetic logic. IEEE Transactions on Magnetics 49(7), 4464–4467 (2013)
- Breitkreutz, S., Kiermaier, J., Eichwald, I., Ju, X., Csaba, G., Schmitt-Landsiedel, D., Becherer, M.: Majority Gate for Nanomagnetic Logic with Perpendicular Magnetic Anisotropy. IEEE Transactions on Magnetics 48(11), 4336–4339 (2012). DOI 10.1109/TMAG.2012.2197184
- Breitkreutz, S., Kiermaier, J., Eichwald, I., Ju, X., Csaba, G., Schmitt-Landsiedel, D., Becherer, M.: Majority gate for nanomagnetic logic with perpendicular magnetic anisotropy. IEEE Transactions on Magnetics 48(11), 4336–4339 (2012)
- Breitkreutz, S., Kiermaier, J., Eichwald, I., Ju, X., Csaba, G., Schmitt-Landsiedel, D., Becherer, M.: Investigations on nanomagnetic logic by experiment-based compact modeling. Nanoelectronic Device Applications Handbook, ser. Devices, Circuits, and Systems. CRC Press pp. 779–90 (2013)
- Breitkreutz, S., Kiermaier, J., Ju, X., Csaba, G., Schmitt-Landsiedel, D., Becherer, M.: Nanomagnetic Logic: Demonstration of Directed Signal Flow for Field-coupled Computing Devices. In: IEEE Proceedings of the 41st European Solid-State Device Research Conference ESSDERC, pp. 323–326 (2011). DOI 10.1109/ESSDERC.2011.6044169
- Breitkreutz, S., Kiermaier, J., Karthik, S.V., Csaba, G., Schmitt-Landsiedel, D., Becherer, M.: Controlled reversal of Co/Pt dots for nanomagnetic logic applications. Journal of Applied Physics 111(7), 07A715 (2012). DOI 10.1063/1.3675171
- Breitkreutz, Stephan, Eichwald, Irina, Kiermaier, Josef, Papp, Adam, Csaba, György, Niemier, Michael, Porod, Wolfgang, Schmitt-Landsiedel, Doris, Becherer, Markus: 1bit full adder in perpendicular nanomagnetic logic using a novel 5-input majority gate. EPJ Web of Conferences 75, 05001 (2014). DOI 10.1051/epjconf/20147505001. URL https://doi.org/10.1051/epjconf/20147505001
- Cairo, F., Turvani, G., Riente, F., Vacca, M., Gamm, S.B., Becherer, M., Graziano, M., Zamboni, M.: Out-of-plane nml modeling and architectural exploration. In: 2015 IEEE 15th International Conference on Nanotechnology (IEEE-NANO), pp. 1037–1040 (2015)
- Carcia, P., Coulman, D., McLean, R., Reilly, M.: Magnetic and structural properties of nanophase Pt/Co multilayers. Journal of Magnetism and Magnetic Materials 164(3), 411–419 (1996). DOI 10.1016/S0304-8853(96)00410-6. URL http://www.sciencedirect.com/science/article/pii/S0304885396004106
- Causapruno, G., Riente, F., Turvani, G., Vacca, M., Ruo Roch, M., Zamboni, M., Graziano, M.: Reconfigurable systolic array: From architecture to physical design for nml. IEEE Transactions on Very Large Scale Integration (VLSI) Systems 24(11), 3208–3217 (2016)
- Church, A.: Sze-tsen hu. threshold logic. university of california press, berkeley and los angeles 1965, xiv 338 pp. Journal of Symbolic Logic 40(2), 250–250 (1975). DOI 10.2307/2271930
- Csaba, G., Becherer, M.: Nanomagnet Logic: Computing by magnetic ordering. IEEE Nanotechnology Magazine 14(1), 6–13 (2020). DOI 10.1109/MNANO.2019.2952232
- Eichwald, I., Bartel, A., Kiermaier, J., Breitkreutz, S., Csaba, G., Schmitt-Landsiedel, D., Becherer, M.: Nanomagnetic logic: Error-free, directed signal transmission by an inverter chain. IEEE Transactions on Magnetics 48(11), 4332–4335 (2012)
- Eichwald, I., Breitkreutz, S., Kiermaier, J., Csaba, G., Schmitt-Landsiedel, D., Becherer, M.: Signal crossing in perpendicular nanomagnetic logic. Journal of Applied Physics 115(17), 17E510 (2014). DOI 10.1063/1.4863810
- Eichwald, I., Breitkreutz, S., Ziemys, G., Csaba, G., Porod, W., Becherer, M.: Majority logic gate for 3d magnetic computing. Nanotechnology 25(33), 335202 (2014)
- Eichwald, I., Breitkreutz-v. Gamm, S., Ziemys, G., Csaba, G., Becherer, M.: Demonstration of monolithically fabricated 3d magnetic computing devices in perpendicular nanomagnetic logic. In: BIT's 4th Annual World Congress of Advanced Materials (2015)
- Eichwald, I., Kiermaier, J., Breitkreutz, S., Wu, J., Csaba, G., Schmitt-Landsiedel, D., Becherer, M.: Towards a signal crossing in double-layer nanomagnetic logic. IEEE Transactions on Magnetics 49(7), 4468–4471 (2013)

- Ferrara, A., Garlando, U., Gnoli, L., Santoro, G., Zamboni, M.: 3d design of a pnml random access memory. In: 2017 13th Conference on Ph.D. Research in Microelectronics and Electronics (PRIME), pp. 5–8 (2017)
- Gardner, D., Schrom, G., Paillet, F., Jamieson, B., Karnik, T., Borkar, S.: Review of on-chip inductor structures with magnetic films. IEEE Transactions on Magnetics 45(10), 4760–4766 (2009). DOI 10.1109/TMAG.2009.2030590
- Garlando, U., Riente, F., Cirillo, G.A., Graziano, M., Zamboni, M.: Design and characterization of circuit based on emerging technology: the magcad approach. In: 2018 IEEE 18th International Conference on Nanotechnology (IEEE-NANO), pp. 1–4 (2018)
- Garlando, U., Riente, F., Graziano, M.: Funcode: Effective device-to-system analysis of field coupled nanocomputing circuit designs. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems pp. 1–1 (2020)
- Garlando, U., Riente, F., Turvani, G., Ferrara, A., Santoro, G., Vacca, M., Graziano, M.: Architectural exploration of perpendicular nano magnetic logic based circuits. Integration 63, 275–282 (2018). DOI 10.1016/j.vlsi.2018.05.001. URL http://www.sciencedirect.com/science/article/pii/S0167926017306090
- Goertz, J.J.W., Ziemys, G., Eichwald, I., Becherer, M., Swagten, H.J.M., Breitkreutz-v. Gamm, S.: Domain wall depinning from notches using combined in- and out-of-plane magnetic fields. AIP Advances 6(5), 056407 (2016). DOI 10.1063/1.4944698. URL https://doi.org/10.1063/1.4944698
- Hampel, D., Winder, R.O.: Threshold logic. IEEE Spectrum 8(5), 32–39 (1971)
- Imre, A., Csaba, G., Ji, L., Orlov, A., Bernstein, G., Porod, W.: Majority logic gate for magnetic quantum-dot cellular automata. Science 311, 205–208 (2006). DOI 10.1126/science.1120506
- Jiang, J.H.R., Devadas, S.: Chapter 6 logic synthesis in a nutshell. In: L.T. Wang, Y.W. Chang, K.T.T. Cheng (eds.) Electronic Design Automation, pp. 299–404. Morgan Kaufmann, Boston (2009)
- Johnson, M.T., Bloemen, P.J.H., den Broeder, F.J.A., de Vries, J.J.: Magnetic anisotropy in metallic multilayers. Reports on Progress in Physics 59(11), 1409–1458 (1996)
- Kiermaier, J., Breitkreutz, S., Eichwald, I., Engelstädter, M., Ju, X., Csaba, G., Schmitt-Landsiedel, D., Becherer, M.: Information transport in field-coupled nanomagnetic logic devices. Journal of Applied Physics 113(17), 17B902 (2013). DOI 10.1063/1.4794184. URL https://doi.org/10.1063/1.4794184
- Mendisch, S., Ahrens, V., Kiechle, M., Papp, A., Becherer, M.: Perpendicular nanomagnetic logic based on low anisotropy Co/Ni multilayer. Journal of Magnetism and Magnetic Materials p. 166626 (2020). DOI 10.1016/j.jmmm.2020.166626
- Nikonov, D.E., Young, I.A.: Overview of beyond-cmos devices and a uniform methodology for their benchmarking. Proceedings of the IEEE 101(12), 2498–2533 (2013)
- Nikonov, D.E., Young, I.A.: Benchmarking of beyond-cmos exploratory devices for logic integrated circuits. IEEE Journal on Exploratory Solid-State Computational Devices and Circuits 1, 3–11 (2015)
- Perricone, R., Liu, Y., Dingler, A., Hu, X.S., Niemier, M.: Design of stochastic computing circuits using nanomagnetic logic. IEEE Transactions on Nanotechnology 15(2), 179–187 (2016)
- Riente, F., Garlando, U., Turvani, G., Vacca, M., Roch, M.R., Graziano, M.: Magcad: A tool for the design of 3d magnetic circuits. IEEE Journal on Exploratory Solid-State Computational Devices and Circuits 3, 65–73 (2017). DOI 10.1109/JXCDC.2017.2756981
- Riente, F., Melis, D., Vacca, M.: Exploring the 3-d integrability of perpendicular nanomagnet logic technology. IEEE Transactions on Very Large Scale Integration (VLSI) Systems 27(7), 1711–1719 (2019)
- Riente, F., Turvani, G., Vacca, M., Roch, M.R., Zamboni, M., Graziano, M.: Topolinano: A cad tool for nano magnetic logic. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 36(7), 1061–1074 (2017). DOI 10.1109/TCAD.2017.2650983
- Riente, F., Ziemys, G., Mattersdorfer, C., Boche, S., Turvani, G., Raberg, W., Luber, S., Breitkreutz-v. Gamm, S.: Controlled data storage for non-volatile memory cells embedded in nano magnetic logic. AIP Advances 7(5), 055910 (2017). DOI 10.1063/1.4973801. URL https://doi.org/10.1063/1.4973801
- Riente, F., Ziemys, G., Turvani, G., Schmitt-Landsiedel, D., Gamm, S.B., Graziano, M.: Towards logic-in-memory circuits using 3d-integrated nanomagnetic logic. In: 2016 IEEE International Conference on Rebooting Computing (ICRC), pp. 1–8 (2016)
- Sayedsalehi, S., Azadi Motlagh, Z.: Characterisation of a perpendicular nanomagnetic cell and design of reversible xor gates based on perpendicular nanomagnetic cells. IET Circuits, Devices Systems 14(1), 17–24 (2020)
- Stamps, R.L., Breitkreutz, S., Åkerman, J., Chumak, A.V., Otani, Y., Bauer, G.E.W., Thiele, J.U., Bowen, M., Majetich, S.A., Kläui, M., Prejbeanu, I.L., Dieny, B., Dempsey, N.M., Hillebrands, B.: The 2014 magnetism roadmap. Journal of Physics D: Applied Physics 47(33), 333001 (2014)
- Stoner, E.C., Wohlfarth, E.P.: A mechanism of magnetic hysteresis in heterogeneous alloys. London (1948)
- Tougaw, P.D., Lent, C.S.: Logical devices implemented using quantum cellular automata. Journal of Applied Physics 75(3), 1818–1825 (1994). DOI 10.1063/1.356375. URL https://doi.org/10.1063/1.356375
- Turvani, G., Riente, F., Plozner, E., Schmitt-Landsiedel, D., Breitkreutz-v. Gamm, S.: A compact physical model for the simulation of pnml-based architectures. AIP Advances 7(5), 056005 (2017). DOI 10.1063/1.4974015. URL https://doi.org/10.1063/1.4974015
- Turvani, G., Riente, F., Plozner, E., Vacca, M., Graziano, M., Gamm, S.B.: A pnml compact model enabling the exploration of three-dimensional architectures. IEEE Transactions on Nanotechnology 16(3), 431–438 (2017)
- Turvani, G., Tohti, A., Bollo, M., Riente, F., Vacca, M., Graziano, M., Zamboni, M.: Physical design and testing of nano magnetic architectures. In: 2014 9th IEEE International Conference on Design Technology of Integrated Systems in Nanoscale Era (DTIS), pp. 1–6 (2014)
- Vacca, M., Frache, S., Graziano, M., Riente, F., Turvani, G., Roch, M.R., Zamboni, M.: ToPoliNano: NanoMagnet Logic Circuits Design and Simulation, pp. 274–306. Springer Berlin Heidelberg, Berlin, Heidelberg (2014)
- Varga, E., Niemier, M., Csaba, G., Bernstein, G., Porod, W.: Experimental Realization of a Nanomagnet Full Adder Using Slanted-Edge Magnets. Magnetics, IEEE Transactions on 49(7), 4452–4455 (2013). DOI 10.1109/TMAG.2013.2249576
- Žiemys, G., Ahrens, V., Mendisch, S., Csaba, G., Becherer, M.: Speeding up nanomagnetic logic by DMI enhanced Pt/Co/Ir films. AIP Advances 8(5), 056310 (2018). DOI 10.1063/1.5007308
- Wei Wang, Walus, K., Jullien, G.A.: Quantum-dot cellular automata adders. In: 2003 Third IEEE Conference on Nanotechnology, 2003. IEEE-NANO 2003., vol. 1, pp. 461–464 vol.2 (2003)
- Žiemys, G., Giebfried, A., Becherer, M., Eichwald, I., Schmitt-Landsiedel, D., Gamm, S.B.: Modelling and simulation of nanomagnetic logic with cadence virtuoso using verilog-a. In: 2015 45th European Solid State Device Research Conference (ESSDERC), pp. 97–100 (2015)