Resource Efficient Design and Implementation of Truncated Multiplier on FPGA

AKULA SHALINI, PG STUDENT

Mr.P.MADHAVA RAO M.Tech Asst. Prof

Mr. Rupa kumar Dhanavath, Associate Professor
Dept.of ECE

NAGOLE INSTITUTE OF TECHNOLOGY AND SCIENCE, Hayathnagar (M), Hyderabad, R.R.Dist– 501 505

Abstract- Multiplication is frequently required in digital signal processing. Parallel multipliers provide a high-speed method for multiplication, but require a large area in VLSI implementations. In most signal processing applications, a rounded product is desired to avoid growth in word size. Thus an important design goal is to reduce the area requirement of the rounded-output multiplier. This paper presents a method for parallel multiplication which computes the product of two n-bit numbers by summing only the most significant columns with a variable correction method. This paper also presents a comparative study of Field Programmable Gate Array (FPGA) implementations of 8x8 standard and truncated multipliers using Very High Speed Integrated Circuit Hardware Description Language (VHDL). Truncated multipliers can be used in finite impulse response (FIR) filters and discrete cosine transforms (DCT). The truncated multiplier shows a considerable reduction in device utilization compared to the standard multiplier. A significant reduction in FPGA resources, delay, and power can be achieved using truncated multipliers instead of standard parallel multipliers when the full precision of the standard multiplier is not required.

Index Terms- Digital Signal Processing (DSP), Field Programmable Gate Array (FPGA), Truncated Multiplier, Variable correction method, VHDL

I. INTRODUCTION

Parallel multipliers provide a high-speed method for multiplication, but require a large area in VLSI implementations. In most signal processing applications, a rounded product is required to avoid growth in word size. Thus an important aim is to design a multiplier which requires less area, and that is possible with the truncated multiplier. In the wireless multimedia world, DSP systems are ubiquitous. DSP algorithms are computationally intensive and test the limits of battery life in portable devices such as cell phones, hearing aids, MP3 players, digital video recorders and so on. Multiplication is the main operation in many signal processing algorithms, hence efficient parallel multipliers are desirable. A full-width digital n × n bit multiplier computes the 2n-bit output as a weighted sum of partial products. A multiplier whose output is represented on n bits is useful, for example, in DSP data paths which save the output in the same n-bit registers as the input.

A truncated multiplier is an n × n multiplier with an n-bit output. Since in a truncated multiplier the n least significant bits of the full-width product are discarded, some of the partial products are removed and replaced by a suitable compensation function, trading off accuracy against hardware cost. As more columns are eliminated, the area and power consumption of the arithmetic unit are significantly reduced, and in many cases the delay also decreases.
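To make the definition concrete, the following is a minimal behavioural sketch in Python (not the VHDL used in this work). The function names, the `dropped_columns` parameter and the constant `compensation` argument are illustrative assumptions; the sketch simply omits the partial products of the chosen least significant columns and returns the upper n bits of what remains.

```python
def partial_products(a, b, n):
    """Return (weight_exponent, bit) for every AND term a_i & b_j of an n x n multiplier."""
    return [(i + j, ((a >> i) & 1) & ((b >> j) & 1))
            for i in range(n) for j in range(n)]


def truncated_product(a, b, n, dropped_columns, compensation=0):
    """Sum only the partial products outside the dropped least significant
    columns, add a constant compensation term, and return the upper n bits."""
    kept = sum(bit << exp
               for exp, bit in partial_products(a, b, n)
               if exp >= dropped_columns)
    return (kept + compensation) >> n


if __name__ == "__main__":
    n = 8
    a, b = 0xB7, 0x5D                      # arbitrary example operands
    exact = (a * b) >> n                   # upper n bits of the full product
    approx = truncated_product(a, b, n, dropped_columns=6)
    print("exact upper bits:", exact, "truncated:", approx)
```

With `dropped_columns=0` the function reproduces the upper half of the full-width product; increasing it removes more AND terms and adder cells at the cost of a larger (always non-positive, when uncompensated) error.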

The trade-off is that truncating the multiplier matrix introduces additional error into the computation. Recent advancements in VLSI technology, and in particular the increasing complexity and capacity of state-of-the-art programmable logic devices, have made hardware emulation possible. The underlying key of the emulation system is the use of SRAM-based field programmable gate arrays (FPGAs), which are very flexible and dynamically reconfigurable. In many cases, implementation of a DSP algorithm demands Application Specific Integrated Circuits (ASICs). Because the development cost of ASICs is high, algorithms should be verified and optimized before implementation. Digital signal processing (DSP), image processing and multimedia require extensive use of multiplication. The truncated multipliers can easily be implemented using Field Programmable Gate Array (FPGA) devices.

In FPGAs, the choice of the optimum multiplier involves three key factors: area, propagation delay and reconfiguration time. An FPGA is a digital integrated circuit that comes in a wide variety of sizes and with many different combinations of internal and external features. State-of-the-art FPGAs consist of relatively small blocks of programmable logic. These blocks, each of which typically contains a few registers and a few dozen low-level configurable logic elements, are arranged in a grid pattern and tied together using programmable interconnects.

II. BLOCK DIAGRAM

The basic design of the multiplier is the same as that of a constant correction fixed-width multiplier. The least significant N-2 partial product columns of a full-width multiplier are truncated. The partial product terms in the (N-1)th column are then added to the partial product terms in the Nth column using full adders. This is done in order to offset the error introduced by truncating the least significant N-2 columns.

Fig 1: Block Diagram

The correction term that is generated is based on the following arguments:

1) The biggest column in the entire partial product array of a full-width multiplier is the Nth column.

III. BLOCK DIAGRAM EXPLANATION

Fig. 1 shows the block diagram of the truncated 8x8 multiplier, and Fig. 5 shows its architecture. The internal RTL schematic of the truncated 8x8 multiplier is shown in Fig. 6. The total equivalent gate count is 702 for the standard 8x8 multiplier and is reduced to 456 in the truncated 8x8 multiplier. The power consumption is 419 mW for the standard 8x8 multiplier and is reduced to 156 mW in the truncated 8x8 multiplier. The number of occupied slices also improves, from 60 in the standard 8x8 multiplier to 42 in the truncated 8x8 multiplier.
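As a quick arithmetic check on the figures quoted above, the relative savings can be computed directly. This is only a small Python sketch; the dictionary layout and the function name are illustrative, while the numbers are the ones reported in the text.

```python
# Figures quoted in the text: (standard 8x8, truncated 8x8).
REPORTED = {
    "equivalent gate count": (702, 456),
    "power (mW)": (419, 156),
    "occupied slices": (60, 42),
}


def percent_reduction(standard, truncated):
    """Relative saving of the truncated design with respect to the standard one."""
    return 100.0 * (standard - truncated) / standard


if __name__ == "__main__":
    for metric, (standard, truncated) in REPORTED.items():
        saving = percent_reduction(standard, truncated)
        print(f"{metric}: {saving:.1f}% lower")
```

On these numbers the truncated design is roughly 35% smaller in equivalent gate count, 63% lower in power and 30% smaller in occupied slices.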

Continuing the arguments on which the correction term of Section II is based:

2) The Nth column contributes more information to the most significant N-1 columns than the least significant N-1 columns do. The information presented can be made more accurate if the carry from the (N-1)th column is preserved and passed on to the Nth column.

3) Adding the elements of the (N-1)th column to the Nth column provides a variable correction, because the information presented depends on the input bits. When all the partial product terms in the (N-1)th column are zero, the correction added is zero. When all the terms are one, a different correction value is added.

The accuracy of truncated multipliers can be significantly improved using variable correction truncated multipliers that compensate the effect of the dropped terms with a non constant compensation function.
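The variable correction scheme described above can be sketched behaviourally as follows. This is a Python model under stated assumptions, not the authors' VHDL: columns are numbered from 1 at the least significant end (so column c has weight 2^(c-1)), the function names are hypothetical, and returning the upper n bits of the corrected sum is an assumption about how the n-bit output is taken.

```python
def variable_correction_product(a, b, n):
    """Behavioural model of an n x n variable correction truncated multiplier.

    Columns 1..n-2 are dropped, the terms of column n-1 are added with the
    weight of column n, and columns n..2n-1 are summed unchanged.
    """
    total = 0
    for i in range(n):
        for j in range(n):
            if ((a >> i) & 1) & ((b >> j) & 1):
                column = i + j + 1              # 1-based column index
                if column >= n:
                    total += 1 << (i + j)       # kept at its own weight
                elif column == n - 1:
                    total += 1 << (n - 1)       # promoted into column n
                # columns below n-1 are truncated (not formed at all)
    return total >> n                           # assumed: upper n bits


def max_abs_error(n, step=1):
    """Largest deviation from the upper n bits of the exact product."""
    return max(abs(variable_correction_product(a, b, n) - ((a * b) >> n))
               for a in range(0, 1 << n, step)
               for b in range(0, 1 << n, step))


if __name__ == "__main__":
    print("max |error| for 8x8 (output LSBs):", max_abs_error(8, step=3))
```

Because the promoted terms depend on the operand bits, the correction is zero when column N-1 is all zeros and grows with the number of ones in that column, which is the behaviour argued for in points 1) to 3).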

The objective of this paper is to present a comparative study of the variable correction truncated multiplier and the standard multiplier by implementing the respective 8x8-bit multipliers on a Spartan-3AN FPGA device. The paper is organized as follows: the mathematical basis of truncated multiplication is briefly discussed first, followed by the FPGA design and implementation results, and finally the conclusion is provided.

The standard and truncated 8x8-bit multipliers are designed in VHDL and implemented in a Xilinx Spartan 3AN XC3S700AN (package: fg484, speed grade: -5) FPGA using the Xilinx ISE 9.1i design tool. Fig. 1 shows the block diagram of the standard multiplier. The internal RTL schematic of the standard 8x8 multiplier is shown in Fig. 3. The behavioural simulation shows the use of the MSBs as the required value in the truncated multiplier.


Truncated Multiplier:

![Proposed Truncated Multiplier](image)

3.1 Dadda Tree

Dadda reduction performs compression only when it is required, whereas Wallace tree reduction always compresses the partial product bits. The proposed method uses the RA reduction method, so that the final bit width is reduced.

The Dadda multiplier is a hardware multiplier design invented by computer scientist Luigi Dadda in 1965. It is similar to the Wallace, but it is slightly faster (for all operand sizes) and requires fewer gates (for all but the smallest operand sizes). In fact, Dadda and Wallace multipliers have the same three steps:

1. Multiply (logical AND) each bit of one of the arguments by each bit of the other, yielding \( n^2 \) results. Depending on the position of the multiplied bits, the wires carry different weights; for example, the wire carrying the result of \( a_2b_3 \) has weight 32.
2. Reduce the number of partial products to two by layers of full and half adders.
3. Group the wires in two numbers, and add them with a conventional adder.

However, unlike Wallace multipliers, which reduce as much as possible on each layer, Dadda multipliers do as few reductions as possible. Because of this, Dadda multipliers have a less expensive reduction phase, but the numbers may be a few bits longer, thus requiring slightly bigger adders. To achieve this, the structure of the second step is governed by slightly more complex rules than in the Wallace tree. As in the Wallace tree, a new layer is added if any weight is carried by three or more wires. The reduction rules for the Dadda tree, however, are as follows:

Take any three wires with the same weights and input them into a full adder. The result will be an output wire of the same weight and an output wire with a higher weight for each three input wires. If there are two wires of the same weight left, and the current number of output wires with that weight is equal to 2 (modulo 3), input them into a half adder. Otherwise, pass them through to the next layer.
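The "as few reductions as possible" policy is usually expressed through a sequence of target column heights. The text above does not list these heights; the sketch below uses the sequence commonly given for Dadda trees (d_1 = 2, d_{j+1} = floor(1.5 d_j)), and the function name is an illustrative choice. It is intended for a maximum column height greater than 2.

```python
def dadda_heights(max_height):
    """Target column heights for successive Dadda reduction stages.

    Returns the heights below max_height, largest first, so each entry is
    the height to which the partial product matrix is reduced in one stage.
    """
    heights = [2]
    while heights[-1] * 3 // 2 < max_height:
        heights.append(heights[-1] * 3 // 2)
    return heights[::-1]


if __name__ == "__main__":
    # An 8x8 multiplier has a tallest column of 8 partial product bits.
    stages = dadda_heights(8)
    print("stage targets:", stages, "-> number of stages:", len(stages))
```

For the 8x8 case considered in this paper the tallest column holds 8 bits, so the matrix is brought down through heights 6, 4, 3 and 2 before the final carry propagate addition.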

3.2 Wallace Tree Multiplier

A Wallace tree is an efficient hardware implementation of a digital circuit that multiplies two integers. The Wallace tree has three steps:

1. Multiply (that is, AND) each bit of one of the arguments by each bit of the other, yielding \( n^2 \) results. Depending on the position of the multiplied bits, the wires carry different weights; for example, the wire carrying the result of \( a_2b_3 \) has weight 32.
2. Reduce the number of partial products to two by layers of full and half adders.
3. Group the wires in two numbers, and add them with a conventional adder.

In the second step, take any three wires with the same weight and input them into a full adder. The result will be an output wire of the same weight and an output wire with a higher weight for each three input wires. If there are two wires of the same weight left, input them into a half adder.
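The three Wallace steps can be checked with a small bit-level simulation. The following Python sketch is a behavioural model only (names and data layout are illustrative, and no claim is made about the gate-level structure of the synthesized design): it forms the AND terms, applies full and half adders column by column until no column holds more than two wires, and then adds the two remaining rows.

```python
def wallace_multiply(a, b, n):
    """Multiply two n-bit unsigned integers by simulating Wallace reduction.

    Returns (product, number_of_reduction_layers).
    """
    # Step 1: AND every bit pair; columns[w] holds the wires of weight 2**w.
    columns = [[] for _ in range(2 * n)]
    for i in range(n):
        for j in range(n):
            columns[i + j].append(((a >> i) & 1) & ((b >> j) & 1))

    # Step 2: reduce with layers of full and half adders.
    layers = 0
    while any(len(col) > 2 for col in columns):
        nxt = [[] for _ in range(len(columns) + 1)]
        for w, col in enumerate(columns):
            k = 0
            while len(col) - k >= 3:            # full adder on three wires
                x, y, z = col[k:k + 3]
                nxt[w].append(x ^ y ^ z)
                nxt[w + 1].append((x & y) | (x & z) | (y & z))
                k += 3
            if len(col) - k == 2:               # half adder on a leftover pair
                x, y = col[k:k + 2]
                nxt[w].append(x ^ y)
                nxt[w + 1].append(x & y)
            elif len(col) - k == 1:             # single wire passes through
                nxt[w].append(col[k])
        columns = nxt
        layers += 1

    # Step 3: group the wires into two numbers and add them conventionally.
    row0 = sum(col[0] << w for w, col in enumerate(columns) if len(col) > 0)
    row1 = sum(col[1] << w for w, col in enumerate(columns) if len(col) > 1)
    return row0 + row1, layers


if __name__ == "__main__":
    product, layers = wallace_multiply(183, 93, 8)
    print(product == 183 * 93, "layers used:", layers)
```

Every full or half adder preserves the weighted sum of its inputs, so the final addition always reproduces the exact product; only the number of layers and adders differs between reduction strategies.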

IV. FLOW CHART OF SYSTEM MODULE

![Flow Chart of System Module]

V. PROPOSED WORK:

The objective of a good multiplier is to provide a physically compact, high-speed and low-power chip, so as to save a significant share of the power consumption of a VLSI design. In a truncated multiplier, several of the least significant columns of bits in the partial product matrix are not formed. This reduces the area and power consumption of the multiplier. It also reduces the delay of the multiplier in many cases, because the carry propagate adder producing the product can be shorter.

The new method for parallel multiplication computes the product of two \( n \)-bit numbers by summing only the most significant columns with the variable correction method. This work also presents a comparative study of FPGA implementations of standard and truncated multipliers using Very High Speed Integrated Circuit Hardware Description Language (VHDL). A significant reduction in FPGA resources, delay, and power can be achieved using truncated multipliers instead of standard parallel multipliers when the full precision of the standard multiplier is not required. The power and area of a truncated \( 8 \times 8 \)-bit multiplier show significant improvement compared to a standard \( 8 \times 8 \)-bit multiplier.

In the proposed architecture, 8x8 bits can be multiplied, and the bits are reduced in a step-by-step manner. Deletion is the first operation, performed in Stage 1 to remove PP bits as long as the magnitude of the total deletion error is no more than \( 2^{p-1} \). A number of reduction stages then follow to reduce the final bit width without increasing the error. In a normal truncated multiplier design, the architecture produces the output with some truncation error.

In the proposed design of the truncated multiplier, however, the truncation error is not more than 1 ulp, so the precision of the final result is improved. Fig. 6 shows the proposed truncated multiplier.

VI. SIMULATION RESULTS

Multiplication Result

![Fig 3: Multiplication Result](image)

Synthesis Results:

The top module above shows the process of BCD multiplication. Two inputs are taken, each 64 bits wide, and the output is 128 bits wide. The modified Booth encoding technique (radix-8) is used in this process, so the partial products are reduced to 16, which in turn reduces area and power.

RTL Schematic:

The developed design is simulated and its functionality is verified. Once functional verification is done, the RTL model is taken through the synthesis process using the Xilinx ISE tool. In the synthesis process, the RTL model is converted to a gate-level netlist mapped to a specific technology library. Many different devices of the 3E family are available in the Xilinx ISE tool. To synthesize this design, the device “XC3S500E” has been chosen, with package “FG20” and speed grade “-4”. The design is synthesized and its results are analyzed as follows.
Fig 6: Result

VII. CONCLUSION
In this paper we have presented the hardware design and implementation of an FPGA-based parallel architecture for standard and truncated 8x8 multipliers using VHDL. Both designs were implemented on a Xilinx Spartan 3AN XC3S700AN FPGA device. The aim is to present a comparative study of the standard and truncated 8x8 multipliers. Compared to the standard multiplier, the truncated multiplier shows a considerable reduction in device utilization. The power consumption of the standard 8x8 multiplier is 419 mW, whereas that of the truncated 8x8 multiplier is only 156 mW. The truncated 8x8 multiplier uses only 42 slices out of 2352 slices. Truncated multiplication provides an efficient method for reducing the power dissipation and area of parallel multipliers.


Miss Akula Shalini is pursuing an M.Tech in VLSI & ES at Nagole Institute of Technology and Science, Hyderabad. She completed her B.Tech in ECE at TKR College of Engineering & Technology, affiliated to JNTUH.

Mr. P. Madhava Rao is Assistant Professor of Electronics and Communication Engineering, Nagole Institute of Technology and Science, Hyderabad. He received his B.Tech degree in Electronics and Communication Engineering from JNT University, Hyderabad, and his M.Tech degree in VLSI System Design from JNT University, Hyderabad. He has about six publications in national and international journals. His areas of interest are microelectronics and communications.

Mr. Rupa Kumar Dhanavath is Associate Professor of Electronics and Communication Engineering, Nagole Institute of Technology and Science, Hyderabad. He received his B.Tech degree in Electronics and Communication Engineering from JNT University, Hyderabad, and his M.Tech degree in VLSI System Design from JNT University, Hyderabad. He has about six publications in national and international journals. His areas of interest are microelectronics and communications.