next up previous
Next: Future Work Up: Building Bayesian Networks with Previous: A Simple Bayesian Network

Subsections

Analysis and Discussion

Sum-product circuits are only valuable if they can outperform traditional digital circuits in some way, and they do have that potential. In this section, the sum-product circuits presented in this paper are compared to their digital counterparts in several areas. Some quantitative analysis is given, and possible applications for the circuits are discussed.

Die Area and Cost

One of the largest wins for analog computing is the size of the circuits. The function-to-variable circuit presented in figure 12 performs eight multiplications of three values in parallel, followed by two additions of four values each. The number of multiplications and additions being performed in parallel in this type of node increases exponentially with each added input. As messages propagate down a network built from sum-product circuits, they will at some point reach the widest part of that network. This is the point where the most computations are happening in parallel. In order to implement a comparable circuit digitally, you would need one multiplier and one adder for each multiplication and addition that the the analog network performs at this point. If we assume that this is possible to build on one chip, and in most cases it would not be, we see that the analog circuits are orders of magnitude smaller than their digital counterparts.

A smaller die has cost advantages as well. Not only is a small chip cheaper because it is physically smaller, but the yield for smaller chips is higher than for larger chips. It may not be appropriate, however, to compare yeild for analog chips directly to the yield for digital chips. This is because the yield metric for a batch of analog chips must take into account the accuracy of the circuit in addition to its functionality. Regardless, the sheer savings in area should outweigh any additional costs incurred due to the inconsistencies between individual transistors.

Wile the actual fabrication of analog chips is cheap, they lose in terms of development cost and testing. There are several reasons for this, the most fundamental of which is sheer difficulty. In analog chip design, calculus and complicated algebra replace boolean logic. Standard industry tools for layout and testing do not work, simulation tools are not always accurate, and CMOS processes are not tailored to analog VLSI applications. The result is longer development times and the need for highly skilled engineers, which translates to higher overall costs.

Power Consumption

The continuing trend towards small and portable devices brings up the question of power efficiency. In digital circuits with millions of transistors switching on and off, some amount of power loss is inevitable. Techniques such as adiabatic design and reversible computing attempt to conserve or recover the power that is needed to charge the small capacitances of the transistors, but there is no way to recover power that is dissipated as heat. Subthreshold CMOS design spearheads the problem by using less power rather than conserving or recovering it--the transistors never actually turn on. Typical current values in a subthreshold CMOS circuit are between $ 10^{-11}$ and $ 10^{-6}$ amps, several orders of magnitude smaller than typical values for standard CMOS circuits.

Accuracy and Reliability

Accuracy and reliability are the hallmarks of digital logic. With a digital circuit, you are assured that, up to some temperature, your circuit will always function in the exact same way, with some finite precision, and at some exact speed. That precision can be augmented by adding more logic at the cost of higher price, higher power consumption, and lower speeds. Analog circuits do not afford us the ability to increase their precision at will, nor are they as reliable as digital circuits.

The variability of $ I_{0}$ between individual transistors presents a difficult problem in the design of accurate analog circuits. Carver Mead states that a 20 percent variation in $ I_{0}$ is typical for two nominally identical adjacent transistors [14]. This directly affects the performance of any circuit that assumes they are identical like a Gilbert multiplier or a current mirror. Small errors from transistor mismatch can accrue like rounding error in a digital adder. A mismatch may cause a computed value to be slightly off, which will then be used in another calculation where that inaccuracy could be multiplied. The end result could be entirely incorrect.

Another source of error in these circuits is demonstrated by the several levels of approximation we had to go through to make equation 9 look like equation 12. This is the question of whether or not we believe that the current through a saturated subthreshold MOS transistor can actually be modeled as a simple exponential. If not, the derivation of the multiplier does not hold, and the circuit will not actually multiply.

The reality is somewhere between perfect multiplication and a non-functional circuit. We know that the response of a the subthreshold MOS transistor is actually some form of an exponential described by equation 9. Because the transistor's response is generally exponential, the circuit will multiply, but we should be concerned with exactly how well it will multiply. A mathematical solution to this problem is all but impossible. Equation 9 can not be solved or manipulated without simplification, so we must turn to a software simulator which can use that equation as a transistor model and some form of iteration to solve the problems we present to it.

UC Berkeley's SPICE circuit simulator has a subthreshold model similar to equation 9. For a more quantitative evaluation of the sum-product circuit's accuracy, SPICE has been applied to the two-input, one-output circuit shown in figure 12. The values for the inputs and $ f(a,b,e)$ are the same values used for the calculations in the example Bayesian network of section 8. The results are presented below.

Term Expected ($ 10^{-9}$A) Simulated ($ 10^{-9}$A) Percent Error
$ I_{1}$ 5 5 0
$ I_{2}$ 5 5 0
$ I_{3}$ 4.5 4.55 1.1
$ I_{4}$ 0.5 0.428 14.4
$ I_{5}$ 4.5 4.55 1.1
$ I_{6}$ 0.5 0.428 14.4
$ I_{7}$ 3.825 4.13 7.3
$ I_{8}$ 0.675 0.467 30.8
$ I_{9}$ 0.198 0.162 18.1
$ I_{10}$ 0.304 0.274 9.8
$ I_{11}$ 0.425 0.400 5.8
$ I_{12}$ 0.075 0.045 40.0
$ I_{13}$ 2.844 2.99 4.8
$ I_{14}$ 1.656 1.55 6.4


This data shows that while some error in the circuit may be tolerable, some values are too far off to consider correct. The most egregious errors occurred in the computation of $ I_{8}$ and $ I_{12}$, which correspond to $ p(b = 0)p(e = 0)f(a = 1, b = 0, e = 0)$ and $ p(b = 0)p(e
= 1)f(a = 1, b = 0, e = 1)$. Examining these computations further shows that this error is due to multiplication by the two smallest values of $ f(a,b,e)$. In both cases, $ f(a,b,e) = 0.15$, meaning that the input to the current mirror was scaled down by 85 percent. The most accurate value whose computation involved scaling is $ I_{13}$, which corresponds to $ p(b = 1)p(e = 0)f(a = 0, b = 1, e = 0)$. In this case, $ f(a,b,e) =
0.632$, which is the largest value of $ f(a,b,e)$. Scaling down by larger amounts generates more error, and the relationship between scaling and error is not linear as shown in figure 16.

Figure 16: Graph of percent error vs. f(a,b,e) for a two-input, one-output sum-product circuit.
\includegraphics[width=10cm]{graphics/circuits/scale.eps}

Error is also seen in the values $ I_{1}$ through $ I_{4}$, which are the results of the two partial Gilbert multipliers. Again, computation of smaller values leads to more error, but this error is related to smaller currents rather than smaller physical transistor sizes. This type of error appears to be related to non-linearities in the multiplier circuits, most likely related to the fact that a subthreshold MOS transistor does not respond as a simple exponential. There may also be some doubt about the validity of the subthreshold CMOS model for DC analysis, which would invalidate the results presented above.

If indeed these circuits are as inaccurate as simulated, which is unlikely, there are some possible ways to increase accuracy. We know that the more a value is scaled down with a current mirror, the more inaccurate the result is. We can alleviate this problem by using multiple levels of scaling. Given a tolerance, we can stay within that tolerance by finding the level of scaling with a percent error such that, when the inputs are scaled some amount of times, the total percent error does not exceed our tolerance. This is possible because the relationship between percent error and different amounts of scaling is not linear.

Another solution to this is to have some type of compiler that attempts to adjust the physical sizes of the transistors to compensate for the non-idealities of the current mirrors. If we can assure ourselves that a certain current mirror always scales down a value more than needed, we can change the relationship between the sizes of its transistors to fix the problem. Because, in our example, some of the mirrors scaled the values down too much and others up too much, it is unclear how effective this technique will be, but it is likely that there are some situations where it would help.

Error caused by the partial Gilbert multiplier circuits appears to be due to the fact that the subthreshold CMOS transistors do not simply take the exponential of their input, and diode-connected transistors do not simply take the logarithm. Again, the solution to this problem depends on choosing a tolerance, much like a digital circuit has some resolution, and making the circuit function within that range. Staying within a small enough range of currents wherein the response of the transistor looks exponential may fix this problem, but may also introduce other problems due to noise or error overpowering the computed values.

A final improvement that can be made to each circuit is to scale the sum if its computed values to some reference current. This type of scaling ensures that the values, at the very least, comprise a valid pair of probabilities, in that they sum to some reference current that represents unity. If a circuit constantly computes a smaller or larger than expected value for both of its outputs, scaling to a reference will bring them both closer to their correct values. If one is larger than the other, the scaling will not help, but will ensure that the message is valid, which may or may not be important to the next stage of the network.

Performance

With multiplications on today's fast processors taking three clock cycles, and additions only one cycle, analog circuits have to be exceptionally fast to compete. At 2GHz, a Pentium can compute can as sum in 0.5ns and multiply in 1.5ns. An analog circuit that performs the same function, but may be much less accurate, must be faster to be competitive. Intuitively, the sum-product circuits presented in this paper should be faster than even their most inaccurate digital siblings. This is because where a digital circuit would have hundreds or thousands of transistors arranged into a multiplier or an adder, the sum-product circuits have a single transistor or a shorted wire.

The speed at which a sum-product circuit computes a value depends on how fast it settles onto a stable value. This settling depends on the speed at which the transistors respond to a changing input, as well the time it takes for any instability in the circuit to stabilize. Transient analyses with the UC Berkeley SPICE tool using the BSIM3 circuit model show instability in the circuits, however this instability is probably artifact of the model. A possible cause of this is overestimation of the parasitic capacitances in the BSIM3 subthreshold transistor model. The result is enough capacitive coupling between the drain and the gate of the transistor to make a single transistor in the train of a current mirror oscillate when a current step is applied to the mirror's input.

Simulations show that the change in current increases the voltage at the gate of the diode-connected transistor, which turns on both it and the transistor in its train. As the transistor in the train turns on, the voltage at its drain falls. The capacitive coupling between its drain and gate reduces the voltage at the gate, which turns it off a bit. The connected current source does not yield, so the gate voltage then rises and oscillations ensue. It is for this reason that accurate performance measurements for these networks are not possible with the available tools.

Without proper simulation or measurement, exact performance parameters for the circuits presented cannot be given, but a more qualitative approach is possible. We know that current processes allow for transistor switching times in excess of 2 GHz and approaching 10GHz, but we also know that the accuracy of analog circuits is dependant on the size of the transistors. For this reason, we will not assume that the transistors in the sum-product circuits can settle on correct values as fast as the transistors in digital circuits can switch, although it's not implausible. We know from Gilbert's original paper that a Gilbert multiplier built from bipolar transistors can compute a value in less than a nanosecond [6]. If we assume that a single subthreshold transistor takes longer, say 1ns, to settle on a value, we can compare these circuits to a fast processor.

The two-input one-output circuit in figure 12 performs eight multiplications of three numbers at the same time. In a conventional processor, multiplications would have to be done with two operands, meaning that there would be four multiplications of the two inputs followed by eight more for the function $ f(x,y,z)$. If we assume the processor has 2 multipliers that can each compute a value in three cycles, we can schedule the 12 multiplications, two at a time, for a total 18 clock cycles or 9ns. In comparison, the sum-product circuit computes all of the values at the same time. The current must propagate through and settle at three transistors between any input to any output. If each takes 1ns to settle, the time is 3ns, which is better than the fast processor. In reality the transistors do not wait for the previous ones to finish before beginning their computation, so the time will be significantly less.

The circuit then adds the eight products together in two groups of four. This is essentially a free computation in the analog circuit, as it is accomplished by the shorting of wires and will happen at nearly the speed of light. On the other hand, the processor will have to do six additions, four of which could happen in parallel, followed by the remaining two. Assuming the processor has four adders, this could be accomplished in 2 cycles or 1ns. With these very conservative estimates, the circuit looks to be at least 3 times as fast as the fastest processor on the market. A more likely speedup is in the range of 1 or 2 orders of magnitude [9]. If we extend the single node to an entire network, the processor begins to fall farther behind. Where the network can compute all of the messages at any level in parallel, the processor can only compute part of one message. One could argue for multiple processors, but with 50 million transistors each, that is an unlikely and very expensive solution.

Feasibility

Analog subthreshold design is difficult and there are very few if any commercial applications of the technology at this time, 13 years after the publication of Carver Mead's seminal work. Digital design has advanced by leaps and bounds while analog design has been marginalized almost to the point of becoming a lost art. The result of this marginalization is a dearth of knowledge and tools for analog VLSI design and simulation, which brings the feasibility of any commercial subthreshold analog VLSI chip into question. Despite the difficulty of commercial subthreshold analog chip design, we will always have a need to compute faster than our current fastest digital technology allows. This is the niche where analog computing has a home, as it will always be faster than the fastest digital processors in certain applications.

It seems that a commercial subthreshold analog VLSI chip is not feasible at this point. The limited commercial applications for such a chip would render it a risky business venture at best. The simple truth is that much more work needs to be done in academia before analog subthreshold VLSI becomes a lucrative and practical technology.

Applications

Because they are so specialized in their function, subthreshold analog chips are better suited to small specific tasks such as interfaces between the analog and digital domains. Because of their speed and accuracy, they work best in applications where accuracy is not as important as speed, and the speed required is beyond the limits of current digital technology.

One excellent example of this is decoding messages that have been corrupted by a noisy channel. In the best coding schemes, the decoder infers the value that has been transmitted and possibly corrupted by the channel using some knowledge of the transmitter's behavior. The decoder decides the probable value of each incoming bit of information given its apriori knowledge and the sequence of previous bits [18].

One such coding scheme is the Turbo code. In 1993, the Turbo code was introduced and is still the best-performing type of channel coding we know. In a 1997 paper, Frey and Kschischang showed that a turbocode could be described as a multiply-connected graphical network, and that decoding the turbocode only required observing the data from the channel and solving the network given that data [4]. This decoding in iterative--the same computation must be performed several times.

This work is particularly interesting because the iterative algorithm used to solve the network is computationally intense and cannot currently be performed in real time with a practical digital system. This presents a plausible opportunity for the speed and limited accuracy of analog subthreshold CMOS, where the iterations disappear and the network only needs to settle on some value. In theory, this work has the potential to change the way we transfer high-speed data over noisy channels; currently sub-optimal coding schemes are used. This work has clear commercial applications in DSL and cable modems where a phone line or cable television line corrupt the digital signals they carry.


next up previous
Next: Future Work Up: Building Bayesian Networks with Previous: A Simple Bayesian Network
Samuel Luckenbill 2002-05-08