Fast and Efficient Division Technique Using Vedic Mathematics in Verilog Code

Contributed by:
Harshdeep Singh
This PDF contains :
2.1 Vedic Mathematics Sutras and Up-sutras,
3.1 Binary Dhwajanka,
3.1.1 Limitations and Solutions,
3.1.2 Negative Subtraction,
3.1.3 Correct Remainder,
3.1.4 Partial remainder overflow,
4.1 simulation results of 8 bits Vedic Division,
4.2 Synthesis Results,
4.3 Timing Results,
1. International Journal of Scientific & Engineering Research Volume 8, Issue 10, October-2017
ISSN 2229-5518
Fast and Efficient Division Technique Using
Vedic Mathematics in Verilog Code
Ugra Mohan Kumar1, Sandeep Kumar2, Madan Pal Singh3, Ashok Kumar Yadav4
1 Assistant Professor, Uttaranchal University, Dehradun
2 Assistant Professor, JB Institute of Technology, Dehradun
3 Assistant Professor, JB Institute of Technology, Dehradun
4 Senior Lecturer, JB Institute of Technology, Dehradun
Abstract: Division is the most fundamental and commonly used operations in a CPU. These operations furthermore form
the origin for other complex operations. With ever increasing requirement for faster clock frequency it becomes essential
to have faster arithmetic unit. In this paper a new structure of Mathematics – Vedic Mathematics is used to execute
operations. In this paper mainly algorithm on vedic division technique which are implemented for division in Verilog and
performance is evaluated in Xilinx ISE Design Suite 13.2 platform then compared with different parameters like delay
time and area (number of LUT) for several bits algorithms.
Key words: Vedic mathematics, multiplication, division, delay time, Verilog.
1. INTRODUCTION: consumption of required circuits are very necessary
Division is an important essential function in arithmetic requirements for various applications such as in digital
operations. Multiplication-based different operations signal processing and in digital image processing [2, 3].
considered as Multiply and Accumulate (MAC) and This work presents different division techniques and
internal product are among some of the commonly used architectures. Multiplier based on Vedic or ancient
Computation- Intensive Arithmetic Functions (CIAF) Mathematics is one of the fast and efficient with low
presently implemented and designed in many Digital propagation delay and low power consumption
Signal Processing (DSP) appliances considered as multiplier.
convolution of two or more than two information, Fast 2. VEDIC MATHEMATICS:
Fourier Transform (FFT) of different sequences, Vedic Mathematics introduces the magnificent
filtering of signals or information and in applications to Arithmetical calculation and verification,
microprocessors its used in arithmetical and logical unit theory of numbers, complex multiplications,
(ALU) [1]. Since multiplying is the most important fundamental algebraic operations, complex
factor for the implementation time for most of the DSP factorizations, simple quadratic and advanced order
algorithms or techniques, so there is a need of most equations, concurrent quadratic equations, partial
efficient and high speed division. Currently, division fractions, in differential calculus and integral calculus,
time is still the most important factor in determining the squaring of complex number, cubing, square root of
instruction cycle time and the delay time of a DSP chip complex number, cube root, 2-Dimensional and 3-
The requirement for high speed processing has been Dimensional coordinate geometry and brilliant Vedic
growing as a result of increasing work for computer and Numerical code.
signal processing applications. Higher throughput 2.1 Vedic Mathematics Sutras and Up-sutras:
arithmetical and logical operations are important to Entire mechanics of Vedic mathematics is based on 16
accomplish the required performance in various real- sutras – formulas and 13 up-sutras meaning –
time signal and image processing applications [2]. The corollaries.
main key of arithmetical and logical operations in these Sutras
applications is multiplication and division techniques 1. Ekadhikena Purvena
and the development and designing of fast and efficient 2. Nikhilam Navatascharamam Dashatah
multiplier circuit has been a subject of interest over last 3. Urdhva-tiryagbhyam
few years. Sinking the execution time and power 4. Paravartya Yojayet
IJSER © 2017
2. International Journal of Scientific & Engineering Research Volume 8, Issue 10, October-2017
ISSN 2229-5518
5. Shunyam Samyasamucchaye coefficient of polynomial-2 is added to 4th degree
6. Anurupye Sunyamanyat coefficient of polynomial-2 multiplied by 3rd degree
7. Sankalana vyavakalanabhyam coefficient of polynomial-1 to get (AY+BZ).
8. Puranaprranabhyam A B C D E
9. Calana – Kalanabhyam
10. Yavadunam
11. Vyastisamashtih
12. Sheshanynkena Charmena Figure 2 - Vertically Crosswise First Cross Product
13. Sopantyadvayamantyam
14. Ekanyunena Purvena Similar logic of cross multiplication and addition can be
15. Ginitasamucchayah extended till all 5 coefficients of both polynomials are
16. Gunaksamucchayah used as follows. Every iteration gives a coefficient of
Up-sutras A B C D E
1. Anurupyena
2. Shishyate Sheshsamjnah
3. Adyamadye Nantyamantyena
4. Kevalaih Saptakam Gunyat Z Y X W V
5. Vestanam
6. Yavadunam Tavadunam Figure 3 - Vertically Crosswise Intermediate Cross
7. Yavadunam Tavadunikutya Varganka ch Product
In this iteration, coefficient of degree 4 of product is
8. Antyayordhshakepi
obtained. For next iteration we drop A and Z which are
9. Antyatoreva
the highest degree polynomial coefficients. The
10. Samucchayagunitah
resulting operation gives coefficient of the degree 3 of
11. Lopanasthapanabhyam
multiplication of polynomials. As follows
12. Vilokanam
13.Gunitasamucchyah Samucchayagunitah
The Nikhilam and Anurupyena are for special cases,
whereas Urdhva-tiryagbhyam is general formula Z Y X W V
applicable to all [4]. Its algebraic principle is based on
multiplication of polynomials. Consider we want to Figure 4 - Vertically Crosswise Intermediate Cross
multiply two 4th degree polynomials
Ax4 + Bx3 + Cx2 + Dx + E
Zx4 + Yx3 + Xx2 + Wx + V
Continuing with this process last coefficient is obtained
AZ x8 + (AY+BZ) x7 + (AX+BY+CZ) x6 + by multiplication of 0th degree terms of both
(AW+BX+CY+DZ) x5 + polynomials as E*V. This process can be done is both
(AV+BW+CX+DY+EZ) x4 + (BV+CW+DX+EY) x3 + ways as it is symmetric. In summary the process can be
(CV+DW+XE) x2 + (DV+EW) x + stated as, process of addition of product of coefficients
EV of two polynomials in crosswise manner with increase
Figure 1 - Multiplication of two fourth degree and then decrease in number of coefficients from left to
polynomials right with crosswise meaning product of coefficients for
Highest degree coefficient can be obtained by one polynomial going rightwards while for other
multiplication of two highest degree coefficients of leftwards.
individual polynomial namely A and Z. A next degree Any decimal number can be thought as a polynomial
coefficient is obtained by addition of cross with unknown or x equal to 10. Being said that, formula
multiplication of coefficients of 4th degree and 3rd degree stated above can be utilized to calculate product of two
of other polynomials[5]. It means A which is 4th degree decimal numbers. Each digit of decimal number is
coefficient of polynomial-1 is multiplied by 3rd degree though as coefficient of power of 10. Only restriction in
IJSER © 2017
3. International Journal of Scientific & Engineering Research Volume 8, Issue 10, October-2017
ISSN 2229-5518
this case is each cross product should be only one digit, addition of cross-product of bits the sum is subtracted
if not it is added to the next power of 10. from combination of previous remainder and next digit
3. PROPOSED ALGORITHM: of dividend. In the example above first bit of quotient is
The algorithms will be compared to conventional equal to MSB of dividend and hence the remainder is 0.
algorithm which is considered as restoring algorithm This 0 is combined with next digit of dividend 1, to
and/or Non-restoring algorithm of division. In restoring form 01. Cross-product of quotient and rest bits of
algorithm shifted divisor is repeatedly subtracted from divisor gives 1 as only one bit in quotient at this point of
dividend and result of subtraction is stored temporarily. the process. Cross-product is subtracted from partial
The algorithm can be formulated as [6]. dividend 01 to get 0. Now when 0 is divided by 1 we get
N = Q×D + R quotient 0 and remainder 0. Quotient 0 of this partial
Where N dividend, D is divisor, Q is quotient and R is division forms the next bit of quotient and remainder as
remainder. next prefix [8]. In short, Dhwajanka formula for binary
is further simplified by the nature of the binary numbers.
Restoring Algorithm Despite being very easy to solve by Dhwajanka, there
1. R(m) = N m as width of N are some limitations of the process which will be
2. Repeat for i from m-1 to 0 described in next section.
3.1.1 Limitations and Solutions:
Z = R(i+1) - D×2i
A combinational model of this process was designed in
If Z ≥ 0 then Q(i) = 1, R(i) = Z Verilog [9]. The reason to choose combinational method
was to compare it with equivalent Verilog
Else Q(i) = 0, R(i) = R(i+1) implementation block and a sequential model will
Verilog implements restoring algorithm for its require varying number of clock cycles to finish the
division block. For restoring algorithm worse case N division depending on the input – dividend and divisor.
subtractions has to be performed to get N digits of The nature of problem in building the combinational
quotient. Each subtraction is equal to width of divisor. model and respective solutions on them are described in
next sections.
3.1 Binary Dhwajanka:
In binary number system, similar to decimal system, 3.1.2 Negative Subtraction:
MSB of divisor is kept aside and remaining digits are
used for cross-products[7].
In binary number system, similar to decimal system,
MSB of divisor is kept aside and
Figure 6 - Dhwajanka – Problem of recalculation
During the calculation of third bit of quotient partial
dividend is 0 and cross-product is 0 which results in
Figure 5 - Complete Example of Dhwajanka negative 1 as partial dividend which is unacceptable
remaining digits are used for cross-products. As digits in according to algorithm. Solution for this is given as
binary can only be 0 or 1, the process of Dhwajanka recalculate previous iteration with smaller quotient digit
becomes simpler as division has to be carried out with 1 to result in sufficiently large partial remainder. In this
always. Hence MSB of dividend becomes the MSB of case previous quotient bit being 0, it cannot be further
quotient. Cross-product is taken between quotient and reduced to make remainder bigger. Hence former to this
rest bits. Again, in cross-product digits being only 0 and iteration has to be recalculated shown below.
1 multiplication is replaced by AND logic. After the
IJSER © 2017
4. International Journal of Scientific & Engineering Research Volume 8, Issue 10, October-2017
ISSN 2229-5518
Figure 7 - Dhwajanka – Solution of recalculation
In case of combinational design reiteration would result
in feedback loop which is unacceptable and in
sequential design this would lead to a design which is Figure 9 - Dhwajanka – Solution for Negative Quotient
data dependant and hence undesirable. Solution on this
Correct remainder is obtained by subtracting divisor
is to allow bits of quotient as well as partial remainders
from remainder. If the subtraction gives remainder more
to be negative. This obviously is an overhead of
than divisor, process is repeated. Above correct
calculation as the state of each bit of quotient and partial
remainder is obtained by subtracting divisor once, so
remainders has to be maintained, but this enables us to
correct quotient is obtained by adding 1.
build a combinational design. The complete illustration
is as follows. 3.1.4 Partial remainder overflow:
As the width of dividend and divisor increases, in some
cases it is observed that last partial remainder is itself a
large number which when combined with right part of
dividend becomes a number which may sometimes
exceed the width of dividend or divisor itself. This
results in large correction logic and hence is
undesirable. There can be different approaches to deal
with this like checking the correctness of remainder
after every 3-4 bit calculation of quotient, use of
sequential design model. To check correctness of
Figure 8 - Dhwajanka – Problem of Negative Quotient remainder after every 3-4 bits can work for
During the calculations of third bit of quotient partial combinational design but has huge overhead of
dividend 00 is subtracted by cross-product 01 to get calculation partial quotient repeatedly and again results
quotient as -1 and next partial remainder as 0. If the is considerably large logic. As seen previously
subtraction is more than 1 then both the quotient bit and sequential model results in a design which would
partial remainder would be negative. depend on data to calculate the answer.
3.1.3 Correct Remainder 4. RESULTS:
There is another problem in the illustration
above. While calculating the remainder we subtract 4.1 simulation results of 8 bits Vedic Division
cross-products from right part of dividend prefixed with
last partial remainder. Cross-product consists of rest
digits of divisor and quotient with first cross-product
contains all bits and is also shifted left by one less than
bits in rest digits of divisor. If any cross-product is
negative then it is added. If last partial remainder is
negative then right part of dividend becomes inherently
negative. After all the calculations for remainder if it is
more than divisor or less than zero it is illegal. Also, it is
imperative to have legal remainder to get correct
quotient. In the illustration above calculations for
remainder are as follows. Figure 10 – Simulation result of 8 bits Vedic Division
IJSER © 2017
5. International Journal of Scientific & Engineering Research Volume 8, Issue 10, October-2017
ISSN 2229-5518
Description:- It is therefore seen that the Vedic division is much faster
x : Input data 8 – bit than the conventional division for higher order bits. The
algorithms of Vedic mathematics are much more
d : Input data 8 – bit
efficient than of conventional mathematics.
clk : Input clock
a : Output data 8 – bit 6. FUTURE SCOPE:
x = 00011001 In future this work can be extended to higher bit
Division which can be implemented using Vedic
d = 00000101
Mathematics. Floating Point Vedic Processor could be
a = 10010011 also a good extension of this work.
Thus simulated result and calculated result match
correctly. 7 REFERENCES: Figure 12 – Simulation result of 16X16 bits Vedic
[1] Purushottam D. Chidgupkar and Mangesh T. Karad, “The
Implementation of Vedic Algorithms in Digital Signal Processing”,
Global J. of Engng. Educ., Vol.8, No.2 © 2004 UICEE Published
in Australia.
4.2 Synthesis Results
[2] Himanshu Thapliyal and Hamid R. Arabnia, “A Time-Area-
Power Efficient Multiplier and Square Architecture Based On
Device utilization summary:
Ancient Indian Vedic Mathematics”, Department of Computer
Selected Device: 3s500efg320-4 Science, The University of Georgia, 415 Graduate Studies
Research Center Athens, Georgia 30602-7404, U.S.A.
Number of Slices: 248 out of 4656 5%
[3] E. Abu-Shama, M. B. Maaz, M. A. Bayoumi, “A Fast and Low
Number of 4 input LUTs: 450 out of 9312 4% Power Multiplier Architecture”, The Center for Advanced
Computer Studies, The University of Southwestern Louisiana
Number of IOs: 25 Lafayette, LA 70504.
Number of bonded IOBs: 25 out of 232 10%
[4] Honey Durga Tiwari, Ganzorig Gankhuyag, Chan Mo Kim,
IOB Flip Flops: 8 Yong Beom Cho, “Multiplier design based on ancient Indian Vedic
Mathematics”, 2008 International SoC Design Conference.
Number of GCLKs: 1 out of 24 4%
[5] Parth Mehta, Dhanashri Gawali, “Conventional versus Vedic
Total memory usage is 198936 kilobytes mathematical method for Hardware implementation of a
multiplier”, 2009 International Conference on Advances in
4.3 Timing Results Computing, Control, and Telecommunication Technologies.
Minimum input arrival time before clock: 98.119ns
[6]Prabir Saha, Arindam Banerjee, Partha Bhattacharyya, Anup
Maximum output required time after Dandapat, “High Speed ASIC Design of Complex Multiplier
Using Vedic Mathematics”, Proceeding of the 2011 IEEE Students'
clock: 4.283ns Technology Symposium 14-16 January, 2011, lIT Kharagpur.
Total REAL time to Xst completion: 13.00 [7] Ramalatha M, Thanushkodi K, Deena Dayalan K, Dharani P,
“A Novel Time and Energy Efficient Cubing Circuit using Vedic
Mathematics for Finite Field Arithmetic”, 2009 International
Total CPU time to Xst completion: 12.90 Conference on Advances in Recent Technologies in
Communication and Computing.
[8] Shamim Akhter, “VHDL Implementation of Fast NxN
5. CONCLUSION: Multiplier Based on Vedic Mathematics”, 2007 IEEE.
The designs of 8 bits Vedic division have been
[9] Anvesh Kumar, Ashish Raman, Dr. R.K. Sarin, Dr. Arun
implemented on Spartan3E (3s500efg320-4) device. The Khosla, Small area Reconfigurable FFT Design by Vedic
computation delay for 8 bits Vedic division is 98.119ns. Mathematics”, 20
10 IEEE.
IJSER © 2017