Brief Discussion of SOC Burn-In Test


Burn-in is the application of thermal and electrical stress to accelerate electrical failure and screen out “marginal” devices with inherent defects or other manufacturing imperfections. Electrical device failure rate over time takes the famous “bathtub” shape shown below. The goal of burn-in testing is to expose and screen out parts that fall in the early-failure stage.

[Figure: bathtub curve of device failure rate over time]

A good introductory article on burn-in is Burn-In 101 on EDN, authored by Mayank Para and Sandeep Jain.

 

As described in that article, there are three types of burn-in tests: static burn-in, in which extreme voltages and temperatures are applied without operating or exercising the device; dynamic burn-in, in which various input stimuli are applied while the device is exposed to extreme voltages and temperatures; and dynamic burn-in with (DFT) test, which is used to “monitor device outputs at different points in burn-in process”.

So during static burn-in, the device under test (DUT) has voltage applied but stays idle, with no signal or clock toggling inside. Contrary to intuition, this state is not reached by applying no vector at all; some vector is needed to put the device into it. The reason is that today’s SOC chips are complex and contain multiple power domains. If voltage is applied without turning the device on (leaving the reset or power-on signal de-asserted), most internal logic and power domains stay off and unpowered; this is intentional, to keep leakage low. Even with the SOC turned on, the internal microprocessor boots up and eventually enters an idle state that still leaves most power domains off, again by design to keep leakage low while idle. But this works against the intent of burn-in, which requires voltage to actually reach the logic. So either the burn-in tester talks to the SOC DUT through some functional IO channel to turn on the power domains, or a burn-in vector is needed to put the device into a state where all power domains are up. In most cases, a burn-in vector is preferred over a functional channel: it eases burn-in board design (no need to wire the functional channel), and the burn-in tester likely lacks the capability to drive a functional channel anyway.
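As a hypothetical sketch of the idea, a static burn-in vector can be thought of as a short sequence of register writes that releases reset and forces every power domain on, after which the device sits idle under voltage. The register addresses, bit positions, and domain names below are illustrative, not from any real SOC:

```python
# Hypothetical sketch: building a static burn-in vector that forces every
# power domain on. Addresses and domain names are made up for illustration.

POWER_DOMAINS = {          # domain -> (control register address, enable bit)
    "cpu":   (0x1000, 0),
    "gpu":   (0x1004, 0),
    "modem": (0x1008, 0),
}

def build_static_burnin_vector():
    """Return a list of (address, value) writes that de-assert reset and
    power up every domain, then leave the device idle (no clock toggling)."""
    vector = [(0x0FFC, 0x1)]             # de-assert global reset (illustrative)
    for _, (reg, bit) in POWER_DOMAINS.items():
        vector.append((reg, 1 << bit))   # set this domain's enable bit
    return vector

vector = build_static_burnin_vector()
print(len(vector))  # one reset write plus one write per domain
```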

So the boundary between static burn-in and dynamic burn-in is blurred. The difference is not whether stimuli are applied but whether the device logic actively toggles inside. In many cases, dynamic burn-in is preferred for its better performance: it accelerates failures and catches weak parts in less test time. But burn-in patterns differ from production screening vectors. One reason is that production screening vectors such as LBIST toggle many flip-flops at the same time, which causes very high power consumption. Combined with the high voltage and high temperature of burn-in, this can easily damage DUTs, including good parts. So the burn-in pattern set is a selected subset of production vectors. In addition, the burn-in tester can differ from the production ATE tester and may have more limitations; for example, certain clocks may not be available from the burn-in tester. In some cases the production pattern needs to be adjusted, and sometimes you may even have to change the design.
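The subset selection described above can be sketched as a simple power-budget filter over the production vector list. The vector names and power estimates here are invented for illustration; a real flow would use power numbers from simulation or characterization:

```python
# Illustrative sketch: choosing burn-in patterns as the subset of production
# vectors whose estimated toggle power stays under a budget, so that high
# voltage and temperature during burn-in do not damage good parts.

production_vectors = [
    ("lbist_full",   95.0),   # (name, estimated power as % of max) - made up
    ("scan_chain_a", 40.0),
    ("scan_chain_b", 45.0),
    ("mbist",        30.0),
]

POWER_BUDGET = 50.0  # reject anything that toggles too many FFs at once

def select_burnin_patterns(vectors, budget):
    """Keep only vectors whose estimated power is within the burn-in budget."""
    return [name for name, power in vectors if power <= budget]

print(select_burnin_patterns(production_vectors, POWER_BUDGET))
# lbist_full is dropped: too much simultaneous toggling under high V/T
```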

Another important consideration is whether to power cycle the DUT during burn-in. Power cycling has been found to stress the DUT harder and accelerate failures, so it is normally applied during burn-in. But there is a caveat: power cycling is fine for digital logic but not for analog power modules (buck converters, LDOs, etc.), which are not designed to tolerate many power cycles. So the burn-in pattern needs to separate the analog power modules from the digital logic and power cycle only the digital logic. This may even require design changes if the original design cannot support it. For example, digital logic may use an output from the analog PMU module as its reset, the idea being that the digital logic should stay in reset while power is not yet stable. But since the analog PMU and digital logic must be separated during burn-in, the burn-in pattern must be able to bypass this reset so the digital logic can come out of reset even when the analog PMU is off.
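The separation can be sketched as a burn-in loop that cycles only the digital rails while the analog PMU is left alone. All function names here are hypothetical stand-ins for tester/DUT control calls:

```python
# Hedged sketch of a dynamic burn-in loop that power-cycles only the digital
# logic while the analog PMU stays untouched. The callbacks are hypothetical
# placeholders for real tester control functions.

def burnin_loop(cycles, apply_patterns, digital_power):
    """Power-cycle the digital domain `cycles` times, applying burn-in
    patterns each cycle. The analog PMU is never toggled here, because
    buck converters / LDOs are not designed for many power cycles."""
    for _ in range(cycles):
        digital_power(on=True)    # digital rails up; a reset bypass must let
                                  # logic run even though the PMU is off
        apply_patterns()
        digital_power(on=False)   # digital rails down; analog PMU untouched

# Tiny dry run with stub callbacks that just record events:
events = []
burnin_loop(3,
            apply_patterns=lambda: events.append("pat"),
            digital_power=lambda on: events.append("on" if on else "off"))
print(events.count("pat"))  # 3 pattern applications across 3 power cycles
```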

There are two scenarios for running burn-in: lab burn-in and production burn-in. Lab burn-in analyzes chip failures over the chip’s lifetime; it is conducted once per chip line, not on every single chip as production burn-in is. Lab burn-in lasts a long time, typically 1000 hours, with several readout points at 24, 48, 200, 500, and 1000 hours. During lab burn-in, we repeat the process of power cycling the DUT, applying burn-in patterns, power cycling again, and so on. At the readout points, we run the full production vectors and identify failing parts. A lab burn-in result is required before a chip line can be released into production.
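The lab burn-in schedule can be sketched as a loop that stresses the parts between the readout hours quoted above and runs the full production test suite at each readout. The production-test callback is a stub standing in for the real tester flow:

```python
# Sketch of a lab burn-in schedule: stress continuously, then run the full
# production vectors at each readout point. Hours are those quoted in the
# text; the failure check is a hypothetical stub.

READOUT_HOURS = [24, 48, 200, 500, 1000]

def lab_burnin(run_production_tests):
    """Return {readout hour: list of failing parts} for each readout point."""
    results = {}
    previous = 0
    for hour in READOUT_HOURS:
        # ... stress for (hour - previous) hours: power cycle the DUT,
        # apply burn-in patterns, power cycle again, repeatedly ...
        previous = hour
        results[hour] = run_production_tests()   # full production vectors
    return results

results = lab_burnin(run_production_tests=lambda: [])  # stub: no failures
print(sorted(results))
```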

Production burn-in is applied to every part, but unlike lab burn-in it takes much less time, to save test cost. A common practice is to start production burn-in with a long duration such as 48 hours. After a certain number of chips, e.g. 200-500, have gone through the process with no failures detected, the burn-in time is reduced. This process repeats until the final burn-in time settles at 1 or 2 hours.
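This time-reduction practice can be sketched as a simple rule: after a clean window of parts, cut the duration, but never below a floor. The halving step and the 2-hour floor are illustrative assumptions; real programs tune both:

```python
# Sketch of burn-in time reduction: start long, and after each window of
# parts (e.g. 200-500 chips) with zero failures, shorten the duration.
# The halving step and 2-hour floor are illustrative assumptions.

def next_burnin_hours(current_hours, window_failures, floor_hours=2):
    """Halve the burn-in duration after a clean window of parts, but
    never drop below the floor; hold the duration if failures were seen."""
    if window_failures == 0:
        return max(floor_hours, current_hours // 2)
    return current_hours

hours = 48
for _ in range(6):                # six consecutive clean windows of parts
    hours = next_burnin_hours(hours, window_failures=0)
print(hours)  # settles at the 2-hour floor
```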

[Figure: production burn-in time-reduction process]
Even a 1-2 hour burn-in test is still very costly in production, so in many cases production burn-in is skipped. A lot of companies are studying and pushing for zero-run-time production burn-in, and one of the good candidates is the IDDQ test.

The IDDQ test checks the DUT’s quiescent current. To illustrate the relationship between IDDQ and burn-in, we can refer to “Introduction to IDDQ Test“, an ebook on Google Books, which lists some interesting industry experiments, shown below.

[Figures: industry IDDQ vs. burn-in results from “Introduction to IDDQ Test”]

So replacing burn-in with an IDDQ test looks very promising. Note that the IDDQ test is not trivial by itself, and one consideration is how to select the current threshold (if the DUT’s IDDQ current is above this threshold, the part is labeled as failing the IDDQ test). If the threshold is too high, weak parts are not caught; if it is too low, too many good parts are rejected. One practice industry adopts is to first measure IDDQ on a large number of chips, e.g. 15,000 parts, and then select the IDDQ threshold accordingly. This threshold is normally found to be 2-3x the mean IDDQ current.
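The threshold-selection practice can be sketched as: measure a large characterization sample, then set the limit at roughly 2-3x the mean. The current values below are synthetic, generated only to illustrate the flow:

```python
# Illustrative sketch of picking an IDDQ pass/fail threshold from a large
# characterization sample, as described above. All data is synthetic.

import random

random.seed(0)
# Synthetic IDDQ sample (mA): ~15,000 good parts near 10 mA plus two outliers.
sample = [random.gauss(10.0, 1.0) for _ in range(15000)] + [35.0, 50.0]

mean_iddq = sum(sample) / len(sample)
threshold = 2.5 * mean_iddq       # 2-3x the mean, per the practice above

def iddq_pass(measured_ma):
    """A part fails IDDQ screening if its quiescent current exceeds the limit."""
    return measured_ma <= threshold

print(iddq_pass(9.8), iddq_pass(50.0))  # typical part passes, outlier fails
```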

 
Staff Hardware Engineer