the magazine for computer applications # … your eq 8 additional questions resource links • solid...

®

# 1 2 5 D E C E M B E R 2 0 0 0

Finding DSP Application Algorithms

A Controller-Based LCD Panel

Design2K Winner: The QuizWiz

Designingwith SDRAM

T H E M A G A Z I N E F O R C O M P U T E R A P P L I C A T I O N S

THE ENGINEERS

TECH-HELPRESOURCE

Let us help keep yourproject on track or sim-plify your design deci-sion. Put your toughtechnical questions to theASK US team.

The ASK US researchstaff of engineers hasbeen assembled to shareexpertise with others.The forum is a placewhere engineers cancongregate to get sometough questions an-swered, or just browsethrough the archivedQ&As to broaden theirown intelligence base.

Test Your EQ8 Additional Questions

RESOURCE LINKS• Solid State Relays• SnubbersBob Paddock• Flattened DisplaysRick Prescott

THE ETHERNET DEVELOPMENT BOARDby Fred EadyPart 2: The Software and Firmware ExposedFred picks up where he left off and takes us through embedded Ethernet, as heexplores part two of his online series. Looking at the Ethernet development board from afirmware angle, he covers a lot of information, but still reminds us, just as he does in allhis print columns, that “It’s not complicated, it’s embedded.”November 2000

IR REMOTE-CONTROLLED VIDEO MULTIPLEXERby Peter GibbsCostly technicians and intricate equipment doesn’t have to coincide with multi-channel videomonitoring. Peter shows us a system for visual monitoring that can be designed so you canmake simple modifications yourself, rather than having to call in that pricey expert. Using off-the-shelf components, Peter succeeded at his task, while not obliterating the budget.

November 2000

A BETTER BATTERY CHARGERby Thomas RichterPart 2: Hardware and Software Implementation

Now that you’ve been introduced to the AVR Battery Charger Reference Design, Tom isready to delve into the hardware and software. The ATtiny15 has special features thatmake it ideal for battery-charging applications. Take a look at the charge methods hedetails this time around.

November 2000

STATE MACHINESLearning the Ropesby Ingo CyliaxIn this installment, Ingo illustrates state machines using algorithmic state machine charts,implementing them through one-hot state encoding, the implementation of choice forFPGAs. Ingo takes us through the basic design methodology, using a simple design withtwo states. Here, the hardware has to depend on internal state, instead of justcomputing an output value.November 2000

SCHEDULING REVISITEDLessons From the Trenchesby George MartinThis month, George addresses the wide response he received for his past schedulingarticle. How can you schedule for something you’ve never done before? Well, keepingrecords is a good start. And, asking around to gather information isn’t a bad idea. Even ifit comes down to taking a shot in the dark, George gives us some tips about breakingthe schedule down.November 2000

INTERNET, OR ELSE!Silicon Update Onlineby Tom CantrellThe Embedded Internet Conference has morphed from a workshop to a conference,occupying an even bigger space next year. Today’s designers are anticipating ’Netconnection for all manner of electronics, from elaborate A/V remotes to LED gadgetsgalore. Everyone’s in hot pursuit, working under the premise that the answer will comelater as to whether or not an Internet toaster is something people really want.November 2000

CIRCUIT CELLAR® Issue 125 December 2000 3www.circuitcellar.com

Task ManagerRob Walker

How Will You BePaying for That?

New Product Newsedited by Harv Weiner

Reader I/O

Test Your EQ

Advertiser’s IndexJanuary Preview

Priority InterruptSteve Ciarcia

ContinuingThe PlanISSUEINSIDE125125

SDRAM: The New Embedded SolutionMark Balch

Make Your Data ComfortableGet Bit CushionsMichael Smith & Laurence Turner

Design2K WinnerQuizWiz: A Hand-Held Scoring DevicePaul Kiedrowski

Using a T6963 Controller-Based Graphics LCD PanelBrian Millier

Designing for Reliability, Maintainability, and SafetyPart 1: Getting StartedGeorge Novacek

I From the BenchSharing Technology with Mother NatureOut of State with an Internet-Compatible Cell PhoneJeff Bachiochi

I Silicon UpdateHot Chips 12Tom Cantrell

6

8

11

84

95

96

12

20

32

58

68

74

78

EMBEDDED PC40 Nouveau PC

edited by Harv Weiner

43 RPC Real-Time PCsA Cup of JavaPart 1: Embedded and Real-Time ApplicationsIngo Cyliax

50 APC Applied PCsRabbit SeasonPart 4: The Wonderful World of TCP/IPFred Eady

6 Issue 125 December 2000 CIRCUIT CELLAR® www.circuitcellar.com

TASK MANAGER

EDITORIAL DIRECTOR/PUBLISHERSteve Ciarcia

MANAGING EDITORRob Walker

TECHNICAL EDITORSJennifer BelmonteRachel HillJennifer Huber

WEST COAST EDITORTom Cantrell

CONTRIBUTING EDITORSMike Baptiste Ingo CyliaxFred Eady George MartinGeorge Novacek

NEW PRODUCTS EDITORSHarv WeinerRick Prescott

PROJECT EDITORSSteve Bedford Bob PaddockJames SoussounisDavid Tweed

ASSOCIATE PUBLISHERJoyce Keil

CHIEF FINANCIAL OFFICERJeannette Ciarcia

CUSTOMER SERVICEElaine Johnston

ART DIRECTORKC Zienka

GRAPHIC DESIGNERSNaomi Hoeger

Mary Turek

STAFF ENGINEERSJeff Bachiochi

John Gorsky

QUIZ MASTERDavid Tweed

EDITORIAL ADVISORY BOARDIngo Cyliax

Norman JacksonDavid Prutchi

Cover photograph Ron Meadows—Meadows MarketingPRINTED IN THE UNITED STATES

For information on authorized reprints of articles,contact Jeannette Ciarcia (860) 875-2199 or e-mail [email protected].

CONTACTING CIRCUIT CELLARSUBSCRIPTIONS:

INFORMATION: www.circuitcellar.com or [email protected] SUBSCRIBE: (800) 269-6301, www.circuitcellar.com/subscribe.htm, or [email protected]: [email protected]

GENERAL INFORMATION:TELEPHONE: (860) 875-2199 FAX: (860) 871-0411INTERNET: [email protected], [email protected], or www.circuitcellar.comEDITORIAL OFFICES: Editor, Circuit Cellar, 4 Park St., Vernon, CT 06066

AUTHOR CONTACT:E-MAIL: Author addresses (when available) included at the end of each article.

CIRCUIT CELLAR®, THE MAGAZINE FOR COMPUTER APPLICATIONS (ISSN 1528-0608) and Circuit Cellar Online are publishedmonthly by Circuit Cellar Incorporated, 4 Park Street, Suite 20, Vernon, CT 06066 (860) 875-2751. Periodical rates paid at Vernon,CT and additional offices. One-year (12 issues) subscription rate USA and possessions $21.95, Canada/Mexico $31.95, allother countries $49.95. Two-year (24 issues) subscription rate USA and possessions $39.95, Canada/Mexico $55, all othercountries $85. All subscription orders payable in U.S. funds only via VISA, MasterCard, international postal money order, or checkdrawn on U.S. bank.Direct subscription orders and subscription-related questions to Circuit Cellar Subscriptions, P.O. Box 5650, Hanover, NH03755-5650 or call (800) 269-6301.Postmaster: Send address changes to Circuit Cellar, Circulation Dept., P.O. Box 5650, Hanover, NH 03755-5650.

ADVERTISINGADVERTISING SALES REPRESENTATIVE

Kevin Dows Fax: (860) 871-0411(860) 872-3064 E-mail: [email protected]

ADVERTISING COORDINATORValerie Luster Fax: (860) 871-0411(860) 875-2199 E-mail: [email protected]

ADVERTISING CLERK Sally Collins

[email protected]

How Will You Be Paying for That?

n othing in life is free. This is what my mothertold me when I turned nine. She also informed me

that I was old enough to get a paper route and wastherefore capable of earning my own money, which was

interpreted to mean, if I ever wanted to buy anything again, I had better callthe Hartford Courant to see if they had an open route in the area.

The pride and honor of delivering the nation’s oldest continuously-published newspaper faded with each falling snowflake that winter. But, I’msure anyone who’s ever delivered newspapers could share stories abouttrying to run away from a German Shepherd while wearing five layers ofwinter clothes, or the excitement and satisfaction that comes from ridingthrough Mrs. NoTipper’s tulip bed in the predawn darkness, so I’ll move on.

The intended lesson was that if you want something, you have to workfor it (the actual lesson went something like, delivering reading material tosleeping people helps you stop wanting things). In short, I learned to valuethe things I had to pay for.

When it comes to value and paying for things, one of the greatest de-bates of this year has been whether or not “available on the Internet” shouldbe synonymous with “free.” It’s an interesting debate because there aregood reasons to support both sides. Of course, the arguments can changedepending on whether you’re talking about music, books, videos, informa-tion, or services.

If I want information about refinishing furniture, I can find plenty ofhobbyist sites with interesting (and sometimes useful) information or I canbrowse the shelves of the local Barnes & Noble. The problem with free stuffis, you can’t complain when it doesn’t work. Let’s say I find a web site withinformation about a great new technique for refinishing furniture by burningoff the old layers of paint. Who’s to blame when my eyebrows and grandma’soriginal Hitchcock chair are nothing more than a pile of smoldering ashes atmy feet? For me, the value of credible information (that $20 book fromB&N) is much higher than the possible “costs” of free information.

Besides, by the time you figure in the cost of your PC, hardware andsoftware upgrades, the extra phone lines, ISP charges, and the time youspend searching the web and waiting for flashy ads (which pay for theinformation) to load, you’ll see that “available on the Internet” comes with itscosts. Don’t get me wrong, there’s more information on the Internet than inall of the local bookstores combined, but I think there are some things thatare worth paying for.

Sure, I could log on to CNN.com or one of the local newspaper’s websites to get the news each morning, but waking up to the sound of a news-paper thumping against my door each morning gives me the satisfaction ofknowing that I’m doing my part to help another young person learn thatnothing in life is free.

Circuit Cellar® makes no warranties and assumes no responsibility or liability of any kind for errors in these programs or schematics or for theconsequences of any such errors. Furthermore, because of possible variation in the quality and condition of materials and workmanship of reader-assembled projects, Circuit Cellar® disclaims any responsibility for the safe and proper function of reader-assembled projects based upon or fromplans, descriptions, or information published by Circuit Cellar®.

The information provided by Circuit Cellar® is for educational purposes. Circuit Cellar® makes no claims or warrants that readers have a right to buildthings based upon these ideas under patent or other relevant intellectual property law in their jurisdiction, or that readers have a right to construct oroperate any of the devices described herein under the relevant patent or other intellectual property law of the reader’s jurisdiction. The readerassumes any risk of infringement liability for constructing or operating such devices.

Entire contents copyright © 2000 by Circuit Cellar Incorporated. All rights reserved. Circuit Cellar and Circuit Cellar INK are registered trademarks ofCircuit Cellar Inc. Reproduction of this publication in whole or in part without written consent from Circuit Cellar Inc. is prohibited.


NEW PRODUCT NEWSEdited by Harv Weiner

HIGH-VOLTAGE FLOATING AMPLIFIERThe Model 237 DC coupled amplifier provides a fre-

quency response from DC to 40 KHz (–3 dB at 40 kHz). Itsinput floats at ±20,000 V and its output is connected toordinary lab instruments for data display and collection.

It has both real-time analog and digital outputs. Poweron/off and gain is set with an ordinary infrared remotecontrol unit (furnished) for hands-off operation.

The amplifier provides four gain ranges (x10, x1, x0.1and x0.01) for input voltages of 0 to ±500 mV, ±5 V, ±50V, and ±150 V (all peak-to-peak). The highest voltagerange is limited by component withstand voltages. Inputresistance is 1 MΩ on all ranges.The input can withstand ±150 Vwithout damage whether thepower is on or off

A real-time analog output of 0to 5 V full scale (on highest threegain ranges) is available. An outputof 0 to ±1.5 V is available on thex0.01 gain range. The low outputimpedance of the amplifier willdrive voltmeters, oscilloscopes,analog recorders, etc. A 12-bitparallel, real-time digital output

for 0 to 5 V full scale is available. This consists of two 8-bit bytes with the gain setting in the two most signifi-cant bits. This output connects to a computer parallelport with full handshaking.

The amplifier is housed in a plastic enclosure measur-ing 11″ × 7.75″ × 3″. It features BNC connectors for ana-log input and output, and a DB-25 connector for thedigital output. Power is supplied from an included DC“wall wart” supply (the input amplifier board uses threeAAA batteries).

The Model 237 costs $359. A full datasheet and UserGuide are available on thecompany’s web site.

TDL Technology, Inc.(505) 382-3173Fax: (505) 382-8830www.zianet.com/tdl

http://www.zianet.com/tdl


PIXEL ARRAYThe TSL3301 Linear Optical Sensor Array combines

a 300 dpi 102 × 1 pixel array with an 8-bit A/D con-verter and control circuitry.The monolithic IC operatesdown to 3 V and features asleep mode for reduced powerconsumption. It is ideal forportable battery-operatedapplications such as hand-heldscanners and OCR readers.Other applications includeautomotive steering control,robotics, linear and rotaryencoders, and spectrometers.

It converts pixel data at a 1µ/s. The array is split intothree 34-pixel segments, each with its own set of pro-grammable gain and offset registers. These registersprovide the system designer with the capability tocompensate for offset errors and to normalize the gainacross the array.

The TSL3301 command set provides you with fullcontrol of the device. You can set pixel integrationtime, read and write to gain and offset registers, read

NEW PRODUCT NEWSdata, and perform asynchronous pixel reset. TheTSL3301 utilizes an easy-to-use serial interface for all

control functions. Pixel output isserial, and each pixel value isrepresented by 8 bits. The serialdata clock can run as high as 10MHz for a 1 megapixel through-put rate.

The TSL3301 is supplied in an8-pin clear epoxy DIP packagewith standard commercial tem-perature ratings (0° to 70°C). It isin an extended temperatureglass-windowed package (–40° to115°C) that is designated as theTSLW3301. The TSL3301 and

TSLW3301 cost $4.65 and $5.58 respectively in 1000-piece quantities.

TAOS, Inc.(972) 673-0759Fax: (972) 943-0610www.taosinc.com

http://www.taosinc.com


READER I/OON TRACKGeorge, I really enjoyed your series “The Joys ofWriting Software.” (Circuit Cellar 121–123). Itwas thought provoking and is a must read for allof the software mongers out there. You’re righton about how we expect hardware designers tofollow rigorous design rules and practices inorder to reduce costs of bad ASICs, new boardspins, and missed market opportunities, but weneglect that this can and should also be donewith software as well.

Some faculty I worked with in the past wrotea book titled The Art of Hardware Design(Prosser/Winkel), which explains that the art ismore about applying proper design methodologyto hardware design (e.g., using ASM charts (itwas written in the ’80s), control/data pathdecomposition, synthesizing state machinesfrom it, following good synchronous designpractice, and so on). Perhaps a similar book onsoftware design would be a good idea. No doubt,you’ll probably get your share of negative orcynical feedback from engineers who considerthemselves to be “software artists.” I justwanted to make sure you also get some positivefeedback.

Ingo Cyliax

Thanks, Ingo. You’re absolutely right aboutdesign process. My problem was finding a way tosqueeze all I wanted to say into a manageablesize for the articles. Hardware engineers alsohave a tendency to make the same mistake assoftware guys—they would like to draw schemat-ics from the word go!

In my company, I insist that not a singlecircuit be drawn until the entire system hasbeen defined in terms of functional blocks. It’sdifficult to enforce, but the price for cowboyingis too high.

George Novacek

STILL LEARNINGPlease do continue to keep us on your college-

program mailing list. Now more than everCircuit Cellar is welcome and continues to beviewed as the leader in computer applications. Ibelieve that students want to continue (and thatmeans buy) your magazine when they leavehere.

I am now teaching introduction to micro-controllers and expect to add one or two ad-vanced classes to meet the needs of our localtechnical and engineering students. I knowhow odd this may sound that at a two-yearcollege we get students who already have theirBachelors in engineering, but I guess we aregiving them something they can’t get else-where. I love creating classes that bridgeelectronics, computers, and biomedical elec-tronics together.

Please pass along how much this publicationmeans to me and the students. Thank you.

Prof. William SchlickSchoolcraft CollegeLivonia, MI

If you are a professor who would like toreceive Circuit Cellar to use and distribute inthe classroom, please mail your request and e-mail address on college letterhead to:

Valerie LusterCircuit Cellar College Program4 Park StreetVernon, CT 06066

Editor’s note: The software for Edward Cheung’sInternet project article (Circuit Cellar 123) ismoving from the URL listed at the end of thepublished article to www.cheung.place.cc.

STATEMENT REQUIRED BY THE ACT OF AUGUST 12, 1970, TITLE 39, UNITED STATESCODE SHOWING THE OWNERSHIP, MANAGEMENT AND CIRCULATION OF CIRCUIT CELLAR, THE MAGAZINE FOR COMPUTER APPLICATIONS, published monthly at 4 Park Street, Vernon, CT 06066. Annual subscription price is $21.95. The names and addresses of the Publisher, Editorial Director,and Editor-in-Chief are: Publisher, Steven Ciarcia, 4 Park Street, Vernon, CT 06066; Editorial Director, Steven Ciarcia, 4 Park Street, Vernon, CT 06066; Editor-in-Chief, Steven Ciarcia, 4 Park Street, Vernon, CT 06066. The owner is Circuit Cellar, Inc., Vernon, CT 06066. The names and addresses ofstockholders holding one percent or more of the total amount of stock are: Steven Ciarcia, 4 Park Street, Vernon, CT 06066. The average number of copies of each issue during the preceding twelve months is: A) Total number of copies printed (net press run) 31,559; B) Paid Circulation (1) Salesthrough dealers and carriers, street vendors and counter sales: 5,460, (2) Mail subscriptions: 18,061; C) Total paid circulation: 25,432; D) Free distribution by mail (samples, complimentary and other free): 854; E) Free distribution outside the mail (carrier, or other means): 4,905; F) Total free distribution:5,759; G) Total Distribution: 31,191; H) Copies not distributed: (1) Office use leftover, unaccounted, spoiled after printing: 368; I) Total: 31,559. Percent paid and/or requested circulation: 80%. Actual number of copies of the single issue published nearest to filing date is (November 2000, Issue #124); A)Total number of copies printed (net press run) 31,300; B) Paid Circulation (1) Sales through dealers and carriers, street vendors and counter sales: 4,905, (2) Mail subscriptions: 18,946; C) Total paid circulation: 25,945; D) Free distribution by mail (samples, complimentary and other free): 800; E) Freedistribution outside the mail (carrier, or other means): 4,450; F) Total free distribution: 5,250; G) Total Distribution: 31,195; H) Copies not distributed: (1) Office use leftover, unaccounted, spoiled after printing: 105; I) Total: 31,300. Percent paid and/or requested circulation: 83.2%. I certify that the statementsmade by me above are correct and complete. Joyce E. Keil, Associate Publisher.


e mbedded sys-tems have come a

long way since thevenerable ’8051

microcontroller was hooked up to a2716 EPROM. Embedded controllerstoday cover a wide spectrum including8-bit single chip computers and 32-bitPCs. Each application imposes a differ-ent set of requirements for the design.A modern embedded controller mayrun at 32 kHz or 500 MHz, it may have16 bytes of SRAM or 256 MB ofDRAM and a huge hard drive.

I have always been attracted to thelower end of this spectrum wheresmall size, low power, and customhardware are keys to solving an em-bedded control task. In many of theseapplications, little scratchpad memoryis required and the 32–128 bytes oflocal SRAM on the MCU are adequate.

On the other hand, some other applica-tions require data logging and often acommon SRAM chip external to theMCU fits the bill perfectly. But whatabout those situations when a largequantity of local RAM is required?Enter DRAM.

DRAM has been around for manyyears and is one of the most commonICs manufactured. It’s ubiquitous,found in PCs, printers, and consumerelectronics. DRAM’s advantage is itshigh density of storage; it packs morevolatile RAM onto a square millimeterof silicon than any other technology. Inmost conventional computing applica-tions, this density is the bottom linethat renders DRAM’s disadvantagesinsignificant next to SRAM.

For its density, DRAM typicallyconsumes more power than SRAM, hasa more complex interface, and isslower. However, power consumptionusually is not an overriding factor in acorded appliance. The increased com-plexity of DRAM’s interface was ad-dressed a long time ago, first bydedicated DRAM controllers and nowby integrated controllers on micropro-cessors and support chips. Finally,multilevel cache architectures miti-gate the slower access time of DRAM.

Many embedded design engineersstay away from DRAM because of itspower consumption and added com-plexity. The good news is that newerDRAM with lower supply voltage andpower-saving modes uses less power.And, some embedded processors con-tain internal DRAM controllers, and ifnot, rolling your own for certainmicrocontrollers is not difficult.

Before you run off and start drawingschematics, let me provide one piece ofadvice: don’t design DRAM into your

FEATUREARTICLE

Mark Balch

The next time you’redesigning a small,low-power embeddedsystem, you mightwant to consider us-ing an SDRAM con-troller. As Markshows us, an embed-ded SDRAM control-ler just might be thesimplest and mostcost-effective solution.

Figure 1—SDRAMconsists of a multibankconventional asynchro-nous DRAM array sur-rounded by synchronousstate and interface logic.Separate enable logic ismaintained for each bank,enabling simultaneousoperations on multiplebanks.

SDRAM: The NewEmbedded Solution

Controlsignal

and rowaddress

latch

Columnaddresscounter

DRAMarray

Bank 0

Bank 1

Bank 2

Bank 3

Synchronousstate logic

*RAS

*CAS*WE

Row[]

Col[]

DQ[]

Control,addressinputs

Data bus


embedded system! Ordinary DRAM isa thing of the past, and you don’t wantto get stuck designing with parts thatare already obsolete. You want to usethe new technology (e.g., synchronousDRAM (SDRAM)). SDRAM is easier touse than regular DRAM because it hasa clean synchronous interface insteadof an asynchronous enable architec-ture. Additionally, the benefits ofriding the mass-market technology/economic wave of SDRAM are numer-ous. They range from widely availableparts, to decreasing costs, to increasingspeeds and densities.

WHAT’S SDRAM?SDRAM is a new twist on the same

DRAM technology that has beenaround for years. At the basic level, itcan be thought of as a familiar asyn-chronous DRAM array surrounded bya synchronous interface on the samechip (see Figure 1). However, a keyarchitectural feature of SDRAM ICs ismultiple independent DRAM arrays—usually two or four banks. Multiplebanks can be activated independentlyand their transactions interspersedwith those of other banks on the IC’ssingle interface. Rather than creating abottleneck, this functionality allowsbetter efficiency, and therefore higherbandwidth across the interface.

The synchronous interface andinternal state logic enable these com-plex multibank operations and burstdata transfers. After a transaction hasbeen started, one data word flows intoor out of the IC on every clock cycle.So, SDRAM running at 100 MHz has atheoretical 100-million-words-per-

second peak bandwidth. In reality, thisnumber is lower because of refresh andthe overhead of beginning and termi-nating transactions. In fact, the trueavailable bandwidth for a given appli-cation is dependent on that applica-tion’s data transfer patterns and theSDRAM controller.

Rather than presenting a DRAM-style asynchronous interface to thememory controller, the SDRAM’sinternal state logic operates on dis-crete commands that are presented toit. There still are signals called RASand CAS, but their functions are nowdifferent and they are sampled on arising clock edge. The full set of con-trol signals combine to form the stan-dard SDRAM command set that theIC’s internal state logic processes.These commands configure theSDRAM for certain interface charac-teristics such as default burst length,begin and terminate transactions, andperform regular refresh operations.

SDRAM provides high bandwidthin conventional computers becauseindividual burst transfers require afixed number of cycles of overhead tobegin and finish. A cache controller, forexample, might be able to fetch 256words in only 260 cycles (98.5% effi-ciency). And that’s just for starters. Itcould improve the overall efficiencyby keeping track of its many transfersand interleaving them between mul-tiple banks inside the SDRAM ICs.

This mode of operation allows anew burst transfer to be requested justprior to the current burst ending.Therefore, the overhead to start atransaction could be hidden within the

time of another transaction. Theoreti-cally, the SDRAM interface could bekept busy for every cycle, with theexception of the few cycles per secondrequired to refresh the DRAM array.

A SLEDGEHAMMER?Don’t worry if this sounds like a

sledgehammer to hit your small, em-bedded nail. All this means is that thePC industry is supplying inexpensive,high-density, easy-to-use memorydevices. Most of the whiz-bangSDRAM features are not useful for aslow microcontroller, therefore youcan ignore them when designing anembedded SDRAM controller. In fact,for the basic data transfers, designingwith SDRAM is practical and afford-able. Just because these chips aremeant to run faster than 133 MHz doesnot mean they can’t run at 12 MHz.

If your embedded processor isheavy-duty enough to include an inte-grated SDRAM controller, you canstop reading, you’re done! If not, youcan make a simple controller to fit intoa midsized programmable logic device(PLD). This controller must handleonly the basics—a small subset of anSDRAM’s full feature set. These tasksinclude powerup initialization, peri-

odic refresh, and single-wordread/write operations.

The main drawbacks ofattaching an SDRAM control-ler to many small processorsare setup latency and variableaccess time. At low frequen-cies, the setup latency is actu-ally less than at the intendedoperational frequencies, but isstill about three cycles for aread. The overall access time isvariable because of the needfor periodic refresh operations.Eventually a refresh will con-flict with a memory access.

Figure 2—SDRAMinterface signals looksimilar to those ofasynchronous DRAM,but the signals aresampled on theclock’s rising edgeand combine to formdiscrete commands.

SDRAMIC

CLK

CKE

*CS

*RAS

*CAS

*WE

ADDR[ ]

DQM[ ]

DQ[ ]

Command *CS *RAS *CAS *WE Address AP/A10

Bank activate L L H H Bank, Row A10Read L H L H Column LRead with auto-pre-charge L H L H Column HWrite L H L L Column LWrite with auto-pre-charge L H L L Column HNo operation L H H H X XBurst stop L H H L X XBank pre-charge L L H L X LPre-charge all banks L L H L X HMode register set L L L L Config ConfigAuto-refresh L L L H X XDevice deselect H X X X X

Table 1—Some common SDRAM commands (H is logic high, L is logic low, X is don’t care) are formed by combina-tions of control signals and samples on the rising clock edge. The AP/A10 signal contains valid address informationfor bank activate and mode register set commands.


When this happens and the refresh canno longer be held off, the processorwill have to wait several cycles longerfor the transaction to complete.

These drawbacks are not problemswhen dealing with processors thathave acknowledge-terminated buscycles. Such processors simply requesta data transfer and then wait for anacknowledge signal to complete theaccess. But, most 8-bit processors donot have such advanced bus interfaces.This does not mean that the game islost. Depending on the processor andapplication, there are a variety ofclever solutions, which I’ll cover later.

THE SDRAM INTERFACEBefore diving into designing an

embedded SDRAM controller, let meexplain the operation of the standardSDRAM interface. As mentioned, thememory controller uses a set of syn-chronous control signals to issue com-mands to SDRAM. Figure 2 illustratesthe full set of interface signals, and thecorresponding basic commands arelisted in Table 1. The SDRAM samplesthe control signals on every risingclock edge and takes action as dictated.Some common functions include acti-vating a row for future access, perform-ing a read, and pre-charging a row(deactivating a row, often in prepara-tion for activating a newrow). For complete de-scriptions of SDRAM in-terface signals andoperational characteristics,refer to any of the ICmanufacturers’ datasheets.

Figure 3 provides anexample of how thesesignals are used to imple-ment a transaction and

serves as a useful vehicle forintroducing the synchronousinterface. *CS (not shown inFigure 3) is assumed to betied low. The first require-ment to read from SDRAM isto activate the desired row in

the desired bank. This is done by as-serting an activate (ACTV) command,which is simply asserting *RAS for onecycle while presenting the desiredbank and row addresses.

The next command issued to con-tinue the transaction is a read (RD).The controller must wait the necessarynumber of clock cycles that equate tothe *RAS-to-*CAS delay time. Thisinteger is different for each designbecause it is a function of the particu-lar SDRAM’s timing specification (innanoseconds) and the clock period ofyour interface.

If, for example, your SDRAM’s*RAS-to-*CAS delay is 20 ns and yourclock period is 50 MHz or slower, youwould be able to issue the RD com-mand on the cycle immediately fol-lowing the ACTV. Figure 3 shows anadded cycle of delay, indicating a clockfrequency between 50 and 100 MHz.During the idle cycles, the controlsignals are inactive.

The RD command is performed byasserting *CAS and presenting thedesired column address along with theauto-pre-charge (AP) flag. The AP flag isconveyed by address bit 10 duringcertain commands such as reads orwrites. Depending on the type of com-mand, AP has a different meaning. Inthe case of a read or write, asserting APtells the SDRAM to automatically pre-

charge the activated row after therequested transaction completes. Pre-charging a row returns it to a quiescentstate and also clears the way for an-other row in the same bank to be acti-vated. Remember, a single DRAMbank can’t have more than one bankactive at any given time.

When would you want or not wantto have the SDRAM automaticallypre-charge the row after a transaction?This feature mainly serves to simplifythose implementations that will notattempt to perform seamless, back-to-back transactions from the same row.If you are reading or writing just a fewwords to a row, having the SDRAMautomatically pre-charge that row willspare your controller the effort ofperforming the pre-charge.

If your controller wants to takeadvantage of the SDRAM’s seamlessbursting capabilities, it may be worth-while to let the controller decide whento pre-charge a row. This way, thecontroller can quickly access the samerow again without having to issue aredundant ACTV command.

The AP flag also comes into playwhen issuing separate pre-charge com-mands. In this context, the AP flagdetermines if the SDRAM should pre-charge all of its banks or only the bankselected by the address bus.

Continuing the example, after thecontroller issues the RD command(called RDA if AP is asserted to enableauto-pre-charge), it must wait a prede-termined number of clock cycles be-fore the data appears. This is known asCAS latency in SDRAM jargon.SDRAM typically implements twolatencies—two and three cycles.

Why would anyone select threecycles when two are faster? Like mostthings, you don’t get anything for free.The SDRAM trades access time (clockto Q) for CAS latency. This becomes

important at higher clockfrequencies where accesstime is crucial to systemoperation. In these cir-cumstances, you are will-ing to accept one cycle ofadded delay to achievethe highest clock fre-quency. This is becauseone cycle of delay will be

Clock

Command

Address[]

Data[] D0y D1y D2y D3y

ACTV × RD

B, R × AP, C AP, C

RDy

D0x D1x D2x D3x

Clock

*RAS

*CAS

*WE

Command

Address[]

DQM[]

Data[] D0 D1 D2 D3

ACTV × RD

B, R × AP, C

tRAS to CAS CASlatency = 2

Figure 3—A typical SDRAM readtransaction consists of activating thedesired bank, issuing the read com-mand, and then waiting a specificnumber of cycles for the data burst tobegin.

Figure 4—Greater efficiency can be obtained by initiating a new transaction duringthe idle command time of a previous transaction. This results in seamless, back-to-back transfers that do not waste additional cycles of overhead.


one, two, four, or eight words, or theentire row. It is also possible to config-ure a long burst length for reads andonly single-word writes.

Writes, on the other hand, are notassociated with latency. Write databegins to flow on the same cycle thatthe WR/WRA command is asserted.

The DQM[] bus provides one data/Qmask signal for each byte lane. So, an8-bit SDRAM will have only one DQMsignal, and a 32-bit device will containDQM[3:0]. These mask signals enable

or disable outputs during readsand enable/mask data duringwrites. DQM is useful forwrites because it enables writ-ing to fewer bytes than the databus supports.

DQM assertion follows thetiming of the read or writecommand. This means that theDQM for the first word of atransaction always will be as-serted during the same cyclethat the read or write is as-serted. In the case of a read, theDQM leads the data by CAS

latency cycles. In the case of a write,the DQM lines up with the data be-cause writes do not have an associatedlatency.

After the transaction has com-pleted, the row will either be left acti-vated or pre-charged depending on theprevious contents of the AP flag. If leftactivated, the controller may immedi-ately issue a new RD or WR commandto the same row. Alternatively, therow may be explicitly pre-charged. Ifautomatically pre-charged, a new rowin that bank may be activated in prepa-ration for other transactions.

SEAMLESS TRANSACTIONSThus far, I’ve alluded to the capa-

bilities and benefits of seamless back-to-back transactions (see Figure 4).Because the command bus is essen-tially idle during the execution of atransaction request, new requests maybe asserted prior to the completion ofan in-progress transaction. The con-troller asserts a new read command forthe row that was previously activated.By asserting this command two cycles(CAS latency) before the end of thecurrent transaction, the controllerguarantees that there will be no idletime on the data bus between transfers.

This concept can be extended to thegeneral case of multiple active banks.Just as the controller is able to assert anew RD, it could also assert an ACTVto activate a different bank. Therefore,any of the SDRAM IC’s banks can beasserted independently during the idlecommand time of an in-progress trans-action. As these transactions near theirends, the previously activated bankscan be seamlessly read or written to in

Data latch(optional)

Configuration, read,write, refreshstate machine

Refresh timer

CPLD

SDRAM MCU/MPU

Data Data

Addresscontrol Enable

AddressACK

Figure 5—A simple SDRAM controller for an embedded system canbe designed into a modest CPLD. The basic elements are a commandprocessing state machine, a refresh timer, and an optional data latchfor certain types of implementations.

balanced by a higher bursttransfer rate. Remember thatSDRAM is powerful because ofits superior bursting capabili-ties. At lower clock rates, how-ever, it is possible to accept theslightly increased access time infavor of a shorter CAS latency.

When the CAS latency haspassed, data begins to flowduring every clock cycle. Datawill flow for as long as thespecified burst length. In thisexample, the standard burstlength is four words. This pa-rameter is configurable and adds to theflexibility of SDRAM. The controllercan set certain parameters at startup,including CAS latency and burstlength. The burst length defines thefundamental unit of data transferacross an SDRAM interface. Therefore,longer transactions must be built fromback-to-back bursts and shorter trans-actions must be achieved by terminat-ing a burst before it has completed.SDRAM enables the controller toconfigure the standard burst length as


the same manner shown in Figure 4.This provides a performance boost andcan eliminate nearly all non-refreshoverhead in an SDRAM interface.

SDRAM MAINTENANCEAs I stated, there are some opera-

tions that must be performed on theSDRAM to keep it functioning prop-erly, including interface configurationand periodic refresh. At powerup (or atany other time), the memory control-ler is able to configure various param-eters of the synchronous interface.These include the aforementioned CASlatency, burst length parameters, aswell as a sequential or interleave burstmode setting. The controller config-ures these parameters by asserting amode register set (MRS) command andencoding the parameters onto theaddress bus. At powerup, the control-ler must establish a known, stable stateinside the SDRAM by executing sev-eral pre-charge and refresh commandsprior to the MRS.

Periodic refresh is a universal re-quirement of standard DRAM devices,and SDRAMs are no exception. Thebasic concept behind DRAM refresh isthat the tiny capacitors that store thestate of each bit have a finite leakagerate. If left alone, they would lose theircharge and the memory would forgetits contents.

But, if the capacitors are refreshedbefore they can lose their charge, theywill retain their state; hence, DRAMrefresh. DRAM is refreshed one row ata time by feeding the outputs of theon-chip row sense amplifiers back intothe same row. If the capacitors are ableto reliably hold their charge for N sand there are M rows in the DRAM,you must refresh the array no less thanM times every N s. Modern DRAMcontains internal row counters thatincrement with every refresh cycle.This ensures that if M refresh cycles areexecuted every N s, each row will haveits chance to be properly refreshed.

In reality, typical refresh periods arein milliseconds. A modern SDRAMmay contain 4096 rows per bank, andrequire that all rows be refreshed every64 ms. The controller must ensure thatthe rate is accomplished. They couldbe evenly spaced every 15.625 µs, or

the controller might wait until a cer-tain event passes and then rapidlycount out the commands.

Different types of SDRAMs havediffering refresh requirements, but themeans of executing refresh are stan-dard. All banks are pre-charged becausethe refresh (REF) command operates onall banks at once. When this is done, anexecuted REF command takes advan-tage of the SDRAM’s internal refreshcounter. The counter incrementsthrough each row every time the REFcommand is executed.

DON’T FORGET TIMINGIt’s easy to forget the asynchronous

timing requirements of the DRAMcore when designing around theSDRAM’s synchronous interface. Afterstudying state transition tables andcommand sets, the idea that an asyn-chronous element is lurking can be-come an elusive memory. Make surethat you double-check your integerclock multiples versus the nanosecondtiming specs that are included in yourSDRAM datasheet.

Examples include minimum andmaximum RAS active time, RAS pre-charge time, and write recovery time.The tricky side of these timing specifi-cations is that they affect your systemdifferently depending on your operat-ing frequency. At 25 MHz, a 20-ns timedelay is less than one cycle. The delayis two cycles at 100 MHz. Errors likethese may manifest themselves asintermittent data corruption.

IT’S EASYYou’re probably wondering how

this information is useful to a small,embedded system that isn’t running a

Clock

Command

Address[]

Data[] DIN

ACTV × WR

B, R × AP, C

Figure 6—The simplest read transaction consistsof a bank activation, an auto-pre-charge readcommand, and a single word of outgoing data. Thistransaction is atomic because it does not rely onthe state of a previous transaction and leaves theSDRAM in a “clean” pre-charged state.


SOURCESSDRAM ICsFijitsu Ltd.(81-41) 754-3763Fax: (81-41) 754-3329www.fujitsu.com

Toshiba America, Inc.(212) 596-0060Fax: (212) 593-3875www.fujitsu.com

32-bit microprocessor at 100 MHz. AsI stated, you can pick and choose fromwhat SDRAM has to offer. Take advan-tage of the high memory density andrun at speeds of only a few megahertzto match your microcontroller. If 32-bit memories do not fit your applica-tion, you can use 8-bit devices.

The key to incorporating SDRAMinto your embedded system is provid-ing a simple memory controller to takecare of the SDRAM housekeepingfunctions. The minimal set of SDRAMcontrol logic can fit into modest PLDs.Some basic features are powerup con-figuration, periodic refresh, single-word read, and single-word write (seeFigure 5).

That’s it! All you need is controllogic that can perform these four op-erations and you have a minimalSDRAM interface to your controller.There are as many ways to implementthis basic idea as there are engineers.One technique is a single binary-en-coded state machine inside a PLD. Thisstate machine would be large, but ifyour system is only running at 12 MHzand you use a modern, mainstreamPLD, the timing may be on your side.

HOW WILL THIS WORK?The basic idea behind this simple

SDRAM controller architecture isthat all transactions are single wordsand they all execute atomically. Thatis, each read/write is automaticallypre-charged so subsequent transac-tions do not have to take into accountthe previous state of the device; theSDRAM’s banks will spend their idletime in pre-charge.

The logic to accomplish an atomicread transaction is simple: activate thebank/row, wait for the RAS-to-CASdelay (if necessary), issue the RDA,wait one cycle, and then latch the dataon the next rising clock edge (see Fig-ure 6). Assume the SDRAM was config-ured with a CAS latency of two cyclesand a read burst length of one word.Depending on the operating frequency,you may be able to skip the single-cycle delay between ACTV and RDA.

A one-word atomic write is eveneasier: issue a WRA along with dataafter the RAS-to-CAS delay and return(see Figure 7).

Implementing the periodic refreshcan be done with a counter that rollsover no slower than the regular refreshinterval specified by your device’sdatasheet. At each rollover, the statemachine can simply issue the REFcommand, insert the appropriate de-lay, and then return to execute thenext microprocessor request. Thepower-up MRS command will addseveral states to your state machine,but only has to execute once.

The only thorny issue related toconnecting an SDRAM controller to acommon 8-bit microprocessor is thelack of a request/acknowledge busarchitecture. Such processors can’t beheld off when it takes longer than ex-pected to provide data requested on aread transaction. The SDRAM’s peri-odic refresh creates the problem thatsome transactions will take longerthan others. In processors with an ACKsignal, this is not a problem becausethe ACK would simply be delayedseveral cycles. But, if the controllercan’t hold off the processor, the proces-sor will return to its software withfalse data. Solutions differ dependingon the processor and circumstances.

In some situations, it may be validto run the SDRAM at a higher multipleof the processor clock such that evenworst-case transactions will completein time for the processor’s data validwindow. Alternatively, some proces-sors can be held off by stretching theirclocks at the appropriate time duringtheir bus cycles.

Inserting a dummy transaction priorto the processor’s true data fetch op-eration is a third method. The proces-sor tells the controller that it wants toread data. Then, the controller ex-ecutes the read and holds the data.

The processor issues a second readrequest where it latches the read data.This usually works because there’s

ample time to execute a single-wordread in the time it takes a slow proces-sor to issue successive transactions.

In the case of a write, the processorcould issue the write request alongwith data. The controller would thenbe able to latch that data and hold ituntil the WRA command is issued.This requires that the processor pausebetween successive writes to allowtime for the operation. As with reads,this delay should not be more than oneNOP or equivalent cycle.

IS SDRAM RIGHT FOR YOU?Whether or not SDRAM is useful

for your controller application de-pends on a variety of factors. Do youneed more memory than can be easilyaccommodated by conventionalSRAM? Will your controller boardcontain a PLD that is large enough toaccommodate an SDRAM controller?

I hope you now agree it’s a goodidea to keep SDRAM in mind whensolving controller design problemsthat require substantial quantities oflocal memory. SDRAMs are designedfor high performance, leading edgesystems, yet they may be applied tosmaller and slower systems withoutexcessive cost or complexity. Thismay enable you to design your way outof a memory tight spot in the future. I

Figure 7—The simplest write transaction differsfrom the read case only in that incoming data ispresented in tandem with the auto-pre-charge writecommand. This transaction is also atomic.

Clock

Command

Address[]

Data[] DOUT

ACTV × RD

B, R × AP, C

Mark Balch is an electrical engineerwho designs FPGAs and PCBs in thedata networking industry. He previ-ously worked in the digital TV fieldwhere he participated in the develop-ment of various MPEG-2 broadcastproducts including an HDTV encoder.Mark earned a B.S. in Electrical Engi-neering from The Cooper Union. Youcan reach him at [email protected].

http://www.fujitsu.com

http://www.fujitsu.com


Get Bit Cushions

n ew floating-point DSP chips

are hitting the marketin the $5 to $10 range.

Analog Devices came out with theADSP21065L general-purpose 32-bitDSP processor with on-chip dual-ported memory and an independent I/Oprocessor. When the algorithmic windblows from the right direction, the’065L is capable of 180 MFLOPS (mil-lions of floating-point operations persecond). Texas Instruments recentlyannounced the 120 MFLOP TMSV33.For a little more money, there is thequad-SHARC or TI’s 150-MHzTMS320C6711 for even faster perfor-mance. If these new floating-pointprocessors are so great, why do thesame manufacturers still produce inte-ger DSP processors in such large quan-tities? Analog Devices has its new2.5-V ADSP2188 to compete againstTI’s TMS320VC5410.

In this article, we are going to take adesign-oriented look at implementingalgorithms on integer and floating-point DSP processors. How can youdetermine whether or not your algo-rithm will match the characteristics ofa specific processor? Using integerprocessors means that you have tobecome more aware of the finite andquantized nature of the numbers youare handling. How do you determine

the number of guard bits necessary tocushion your data against overflow orloss of precision? Are there equivalentproblems and solutions when usingfloating-point processors?

BASIC DSP ALGORITHMSListeners perceive sounds as located

inside their head when listening toprerecorded music on earphones (seeFigure 1 [1]). An interesting context inwhich to look at the characteristics ofDSP algorithms is to examine ways ofaltering these perceived sounds so thatyou can generate virtual speakers infront and behind the listener. This isone case when a code developer wouldbe proud to announce, “Look, nothingbetween my ears!”

The simplest way of moving theperceived sound location is to delaythe relative arrival of the sound fromone ear to another. Imagine a singlesound source placed 1 meter in front ofyour head. If your head is about 20-cmwide, the sound takes about 3.35 ms toarrive at each ear. Delay the sound byan additional 0.53 ms and 0.24 ms tothe left and right ears respectively, andthe position of the sound shifts 50 cmto the right. Listing 1 shows how to usea delay line to generate two indepen-dent sound sources from one, assumingsound sampling is occurring at 44 kHz.

There is only one problem, thisdoesn’t work. Ears perceive relative,rather than absolute, delays. Althoughyou might recognize two sounds inyour headphones, it would be difficultto convince yourself that the soundsare not still positioned in the centerand to the right of your head, along theline connecting your ears.

FEATUREARTICLE

Michael Smith& Laurence Turner

What does it take todetermine if your al-gorithm matches thecharacteristics of theprocessor you wantto use? Tune in whileMichael and Laurenceillustrate the bestways for implement-ing algorithms on bothinteger and floating-point DSPs.

Make Your DataComfortable

30˚

Delay 1

Delay 2

Figure 1—The sound source is perceived to be inthe center of the head if the same sound comesfrom both left and right earphones. If the sound isdelayed before being sent to the left earphone, theperceived position of the sound source shifts to theright of the head. Further modeling of the audiochannel using FIR and IIR filters can move theperceived sound out to a virtual speaker in front orbehind the listener.


To get a proper relocation of thesounds, you need a much more detailedalgorithm capable of handling theconcepts shown in Figure 2. [1] Here,we have taken into account the trans-fer functions of both direct and indi-rect sound arriving at your ears fromthe virtual speakers.

The direct transfer functions haveto take into account that sounds con-structively, or destructively, interfereafter bouncing off of the earlobes or areblocked by facial anatomy. The degreeof interference and blocking dependson sound frequency as well as soundlocation. The indirect transfer functionaccounts for multi-reflections andother characteristics of the room inwhich the sound is produced.

The direct transfer functions can beimplemented by applying a real-timefinite impulse filter (FIR) to the origi-nal sounds coming from the head-phones. Listing 2 shows the Cimplementation of a FIR filter, whichproduces the output signal Output(n)

as a weighted sum of the delayedinput sound values Input(n).

In Listing 2, the storage ofolder input values is handledusing a circular buffer imple-mented with pointer arithmetic.Some processors can handle thisaddressing more efficiently thanthe gross data movement of thedelay line in Listing 1.

On more recent DSP proces-sors such as the SHARC 2106X, circu-lar buffer operations are implementedthrough hardware. This allows delayedvalues to be handled with zero over-head, which is a distinct advantagewhen trying to handle multiple FIRfilters between audio samples.

Reverberation and other indirectsound effects can be achieved throughthe application of an infinite impulseresponse (IIR) filter:

IIR filters make use of delayed versionsof both the previous input and outputvalues. In Listing 3, the C code of abasic IIR filter is implemented using afast masking operation on an arrayoffset to achieve circular buffer opera-tions without the continual checkingnecessary in Listing 1. Masking, ormodulus, operations on array indexesor pointers permit processors withoutspecialized hardware to efficientlyhandle circular buffers.

Looking at the sound stage prob-lem, there are eight major characteris-tics of DSP algorithms. First, they arememory- or register-intensive to storeprevious or intermediate results. Sec-ond, DSP memory address calcula-tions compete for scarce processorresources. Third, DSP algorithms alsotypically involve multiplication.

Fourth, they involve lengthy sum-mations where intermediate resultscan be large or small compared to theoriginal input or final output values.And, DSPs involve loop operationswhose checking can compete for pro-cessor resources. Next, DSPs need fastand easy access to peripherals such asA/D and D/A converters and timers.

Fancy, hardware-based, circularbuffer (delay lines) and bit-reverse(FFT) addressing modes are character-istic of DSP CPUs. Lastly, there areno-overhead hardware loops.

Any processor, in principle, canhandle DSP algorithms. However,real-time applications require a lot ofprocessing between data samples. Thecustom features in a DSP-specificprocessor can prove invaluable. Fast

Right Left

Hh,0(f) Hh,0(f)

H0,30(f)H0,330(f) H0,330(f) H0,30(f)

Reverberation

Figure 2—The audio channel must be accurately modeled tomake the sound from headphones be perceived as a set ofleft and right speakers positioned in front of the listener.

Listing 1—C pseudo-code uses a delay line to generate a second perceived stereo sound sourcewhen only a single sound source is present. The delay line is implemented using gross data moves.

! "#

$% &'%(%)(%)*+, - %*.,%)/

0*/0*.1*23

+%)4

55 6+ ' ,%) %

7 8 %*.,%)4

55 6+& )9 %*. ,%) ,%)*+ *%: 0* 8 7 7 4 0*.1* 8 7 7 4

55 6+& )9 ,+% ,%) ,%)*+ *%:

0* ;8 7 7 4 0*.1* ;8 7 7 ! 4

55 <), $%):

0* 8 0* == 4 0*.1* 8 0*.1* == 4

55 :9: $* >,+ %9*% %* -+%) 8 4 +%) ? 4 +%);;2

+%) 7 8 +%)4@


access to intermediate or old valuesstored in memory is improved on theSHARC processor by on-chip memory.

Three data buses allow simulta-neous access to two data memoryblocks and an instruction cache toavoid data/data and instruction/dataclashes. Two independent data addressgenerators off-load address calcula-tions from the main processor core.

However, all the fancy addressingmodes and processing power availablethrough the super-scalar processor canvanish into simple processed sand ifone thing is not taken into account:Precise coefficient and data values areneeded to achieve the specific desiredDSP results.

SIGNAL QUANTIZATIONThe signals and coefficients manipu-

lated in a digital algorithm must berepresented (stored) by finite, quan-tized, binary values. With integer pro-cessors, this process has two mainconsequences. At every point, or node,within the algorithm, there must be afinite range for the signals. There alsois certain granularity in the signal. Thegranularity, or resolution, is the small-est value associated with changes inthe least significant bit of the numberrepresentation. This is in the range of–2N – 1 to 2N – 1 – 1, with a resolution of 1for 2’s complement numbers storedwith a word length of N bits.

The quantization effects also arepresent with floating-point (FP) proces-sors. The 32-bit IEEE single-precisionFP format can store signals with maxi-mum amplitude near 2127. Althoughthe smallest, nonzero magnitude signalthat can be stored is about 2–127, that isnot the resolution available for FPnumbers. The FP granularity changesfrom a fine resolution for small magni-tude signals to a coarse resolution forlarge magnitude numbers.

This change is clear when you lookat how FP numbers 178.125 and0.6958008 are stored in memory.Table 1 shows how these decimal num-bers are first converted to scientificdecimal and then biased exponentbinary numbers. Finally, the number istruncated to the 32 bits that can bestored in memory or register.

FP numbers are stored using 1 signbit, 8 bits for the unsigned magnitude,biased exponent, and 24 bits for thesignal magnitude. That’s a neat trickwith 33 bits stored in a 32-bit loca-tion! The signal magnitude is stored inthe form of A+%9*. The1 is “James Bond-ed”—implied, but notstored—in the number representation.

The granularity of the FP numbersis associated with changes in the lastbit of the fractional part of the numberrepresentation. This granularity iseffectively 2–23 times the value of theFP number that has been stored, so theresolution is continually changing.Changes near the floating point 0.695can be recorded with 256 times greateraccuracy than changes near 178.

COEFFICIENT PRECISIONAll designers must become worried

about the range and granularity ofsignals and coefficients imposed uponthem by the finite word length associ-ated with memory and data buses,registers, and processing units.

Listing 2—C pseudo-code is necessary. The right ear sound is colored by the free-field transferfunctions Hh,0(f), H0,30(f), and H0,300(f). Here, the delay line is handled by a circular buffer implementedusing software pointer arithmetic. Many DSP processors provide zero-overhead circular buffersthrough specialized hardware.

BC DE

% +*+>)*BC

55 +%+, 7 !1/-2 '1 !

/D-2

% !D 8 3 FA/ FA/ FA @

55 +%+, 7 !1/-2 '1 !

/DD-2

% !DD 8 3 FA/ FA/ FA @

$% &'%(%)*+, - %*.,%)/

%0*/%0*.1*23

+%)4

,+ % 0'$)94

% 09 8 '$)94

55 * 9%*, % +%+,

% 09!D 8 G!D 7 4

% 09!DD 8 G!DD 7 4

55 6+ ' ,%) % +*+)* >)*

09 8 -%2 %*.,%)4

55 !$ 1 * 1* 1 ,*.1 ,%) 0* 8 %*.,%)4

55 H%%* *.1 * ,%) '1 >%1 *,

0*.1* 8 4

%* -+%) 8 4 +%) ? 4 +%);;2 30*.1* ;8 09 0 -09!D ; 09!DD24

55 %$ % I * +%+

9!DJ J4 000 JJ

9!DD J J4 000 JJ55 %$ % I : +*+)* >)*

9JJ4000JJ

-9 ? +*+>)*2 9 8 9 ; BC4@

55 6*9* +*+)* >)* %* I ' ,%) *

'$)9;;4 -'$)9 =8 +*+>)* ; BC2

'$)9 8 '$)9 7 BC4

@

T T T

x x

x

x

Inputx(k)

y(k – 1)

–b1 –b2 –b3

y(k – 2) y(k – 3)

y(k)

Output

Σ

Figure 3—Here’s the direct form of an implementa-tion of a third-order digital IIR Chebychev filter.


Figure 3 shows the direct form of animplementation of a third-order digitalIIR Chebychev filter. The frequencyresponse of this filter (see Figure 4) wasdesigned for a passband finishing at1 kHz and a stopband starting at 8kHz. As Figure 4b illustrates, the maxi-mum passband ripple is 0.1 dB. Achiev-ing this filter response requires thefollowing filter coefficients:

b1 = 1.266 765 327 585 4

a sign). Later, we will discuss whatdetermines the effective precision ofthe coefficients.

With 7-bit precision, the best thatcan be done for the b2

coefficient–0.816 689… is to implement it as thefraction 52⁄64, or 0.8125. The coefficientmultiplication operation would beimplemented as a multiplication of 52followed by an arithmetic downshiftby 6 bits (fast division by a factor of64). As a result, the implemented trans-fer function of the filter is far fromwhat was intended (see Figure 5).

Multi-precision operations maydelay the onset of this quantizationproblem, but with loss of performance.However, a better solution is to usespecial filter design techniques. Thesemanipulate all the filter coefficients sothat, within the quantization limita-tions, the filter response is the closestpossible to the desired filter response.This manipulation can be achieved byrandomly hacking at the coefficientsso that three wrongs nearly make aright. A more systematic approach isto use simulated annealing (i.e., equiva-lent optimization techniques).

Another technique is to change theform in which the algorithm is imple-mented. Figures 6a and b show thestructure of a third-order LosslessDiscrete Integrator (LDI) Chebychevfilter and the associated passbanddetail respectively for 7-bit precisionfilter coefficients. [2] The transferfunction for an LDI structure has re-duced sensitivity to the values (wordlength) of the coefficients.

INTEGER OVERFLOWOverflow can occur on integer pro-

cessors when the algorithm involvesrepeated addition of numbers of thesame sign. Implementations of FIR andIIR filters are subject to overflow. Theoverflow condition causes a result thatis outside the range of the processorword length. The result is larger thanthe most positive number, or smallerthan the most negative number thatcan be represented within the integernumber representation.

If overflow is allowed to occur,severe signal distortion appears. For2’s complement, the effect of overflowat any node within the algorithm will

b2 = –0.816 689 070 728 84b3 = 0.211 876 355 985 14

Unfortunately, whether using floatingpoint, block-floating point, fixed point,or other number representations, thefilter coefficients must be representedby a finite number of bits.

Let’s assume for the moment thatthis third-order filter’s coefficientsmust be quantized to an effective wordlength with 7-bit precision (6 bits plus

Listing 3—C pseudo-code uses circular buffers so that the right ear sound is modified by an IIRfilter. The circular buffers are implemented using masking operations on an offset rather than directpointer arithmetic. This approach is useful on processors without hardware circular buffer capability.

BC DE BC(K I

% >)*BC4

% %)>)*BC4

H( #

55 +%+,

% H( 8 3 FA/ FA/ FA @

55 B +%+,

% BH( 8 3 FA/ FA/ FA @

$% &'%(%)*+, - %*.,%)/ 0*/

0*.1*23

+%)4

55 , % +*+)* >)*

,+ %,4

% 09 8 >)*4

% 0%)9 8 %)>)*4

55 * 9%*, % +%+,

% 09 8 GH( 7 4

% 0B9 8 GBH( 7 4

55 6+ ' ,%) % +*+)* >)*

0-9 ; %,2 8 -%2 %*.,%)4

55 !$ 1 * 1* 1 ,*.1 ,%)

0* 8 %*.,%)4

55 H%%* *.1 * ,%) '1 *, 0*.1* 8 4

%* -+%) 8 4 +%) ? H(4 +%);;2 3

0*.1*;80-9;-%,J+%)2G(K20-0924

9JJ4

@

%* -+%) 8 4 +%) ? H(4 +%);;2 3

0*.1*;80-%)9;-%,J+%)2G(K20

0B924B9JJ4

@

0%)9 8 0*.1*4 55 (%* ' %)9) >)*4 %, 8 -%, ; 2 G (K4

@


change the sign of the result. A severe,large amplitude, nonlinearity is intro-duced. To prevent this, the digitalhardware must be capable of handling(representing) the largest number thatcan occur within the algorithm for anyinput signal.

Preventing overflow can be handledin two ways. If the word length associ-ated with all internal nodes of thealgorithm can be made sufficientlylarger than the I/O signal word length,then all possible internal signal valuescan be accommodated and no overflowwill occur. Some DSP algorithms canbe efficiently performed using multi-ply and accumulate (MAC) instruc-tions, which use an 80-bit accumulatorfor storing intermediate results.

If there is insufficient processorword length, then the input signalamplitude must be reduced so that nooverflow is possible with any inputsignal. This reduction is obtained byscaling the input signal using shiftoperators rather than via implicit(slow) division. But, this scaling meansa penalty in useful signal range. Thereis also an increase in the noise level atthe output, because 1-bit inaccuracy atany stage in the algorithm has a greatereffect on the scaled-down signal.

INTEGER OVERFLOW VERSUS FPRENORMALIZATION

Integer processors overflow whenthe intermediate results of an algo-rithm are larger than values that can be

represented in the integer numberrepresentation. An equivalent effectcan be seen in FP processors when theintermediate results are larger than canbe represented in the fractional part ofthe FP representation.

When the FP result becomes toolarge, the processor renormalizes thenumber again. This means that thenumber associated with the fractionalpart still can be represented asA*+%9*. But, the biasedexponent of the FP number representa-tion is adjusted to compensate. Thelarger number is still validly stored

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7

"df3.splmag""df3.sqplmag"

Normalized frequency (radians)0.8

Figure 5—Quantizing the multiplication coefficientsto meet the available processor word length willgenerate band-pass characteristics that were notintended.

0–2–4–6–8

–10–12–14–16–18–20

0 0.5 1 1.5 2 2.5 3 3.5

"ldi3.lmag"

Atte

nuat

ion

(dB

)

Normalized frequency (radians)

0

–0.02

–0.04

–0.06

–0.08

–0.1

–0.120 0.1 0.2 0.3 0.4 0.5 0.6 0.7

"df3.almag"

Normalized frequency (radians)0.8

Figure 4—This is atheoretical frequencyresponse of a third-order direct formChebychev filter. (b)is a zoomed versionof the passband of thenormalized filterresponse in (a).

a) b)


within the FP representation, but witha reduced accuracy, because the granu-larity has changed.

Renormalization is a straightfor-ward, but time-consuming operation.It is typically performed as part ofoperations using specialized fast, and/or highly pipelined, dedicated FP hard-ware. It will result in subtle nonlineareffects in most cases. However, if suc-cessive operations are performed onlarge and small FP numbers with differ-ing signs, algorithm instability andinaccuracy can quickly appear.

ADVANTAGES OF FP AND INTE-GER PROCESSORS

Until recently, one ofthe major advantages ofinteger processors wasthe speed of operations(particularly multiplica-tion) and the lesseramount of siliconneeded to implementthe arithmetic units.Less silicon also meanslower battery drain,

which is important in small, stand-alone, embedded systems. Althoughthis cost and performance effect is stillthere, changes in technology havedecreased its importance.

The major advantage of an FP pro-cessor is its automatic renormalizationif the number to be represented be-comes too large or, just as importantly,too small for the available wordlength. Renormalization allows thedesigner to be more flexible whenhandling worse-case scenarios of theinteractions between algorithms andsignals. This does not mean program-mers can use FP processors withoutdue diligence. Indeed, you need to

know the fundamental limitations ofboth integer and floating-point numberrepresentations.

A useful feature in many integerprocessor architectures that partiallycompensates for the advantages ofautomatic FP renormalization is thesticky flag. This can play a useful role,as demonstrated in this DSP exampleof averaging 64 values of a noisy inputsignal that has been sampled using a12-bit precision A/D converter:

Although the averageis guaranteed to be rep-resentable in 12 bits, itis possible for the sumgenerated to grow aslarge as 18 bits. Such asum is not possible toaccurately represent a16-bit integer processor.

A number of thingscan be done to prevent

0–0.02–0.04–0.06–0.08–0.01

–0.012–0.014–0.016–0.018

"ldi3.plmag""ldi3.pqlmag"

Atte

nuat

ion

(dB

)

Normalized frequency (radians)0 0.1 0.2 0.3 0.4 0.5 0.6 0.7

TT

Tx

x

x

InputΣ Σ

Σ

Σ

Σ

Σ

Σ

Output

Figures 6a and b—These show the structure of a third-order LDI Chebychev filter andassociated band-pass detail, respectively, for 7-bit precision filter coefficients.

a) b)


the overflow of the sum. As suggestedearlier, this problem can be overcomeby prescaling the input. In this case,the input would need to be prescaledby four prior to calculating the sum:

Output(n) equals the sum divided by16. The problem with this approach isthat the output value loses precision asthe input signal is effectively sampledwith only 10-bit accuracy.

There is also an annoying side ef-fect. The precision is lost even forinput signals so small that no overflowwould have occurred. To get aroundthis problem, you need an algorithmthat continually checks whether or notthe sum is about to overflow and scalesonly when appropriate. This continualchecking process places a strain onprocessor resources unless shifted tospecialized hardware.

Many processors have sticky bits,which remember overflow conditionsfor many instructions. Rather than theslow checking for overflow at everyindividual stage in the algorithm, thesticky flag is set atthe end. Then, if thesticky bit is set, thewhole algorithm isrepeated with scaledinput values.

Other processorshave accumulatorswith two or threetimes the standardword length. Thisguarantees that in-termediate calcula-tions of a sum can

grow without loss of precision. But, ifthe algorithm requires storage of thesewide intermediate values, then scalingmust occur prior to storage with anassociated loss of precision.

Many processors can be switched toemploy saturation arithmetic. But, itcan cause long-term inaccuracy, espe-cially with recursive DSP algorithmssuch as IIR filters. Typical problemsthat can arise with saturation arith-metic are dead bands and limit cycles.Algorithms have dead bands whenchanges in the input are not reflectedby expected changes in the output.Limit cycles are the opposite—changesoccur in the output when there are noinput changes.

An alternate approach with greaterflexibility is to employ an FP numberformat. Here the numbers will be auto-matically scaled during the algorithm,only if necessary. This implies that FPoperations allow for easier DSP algo-rithm design and provide results atleast as accurate as an integerprocessor’s results.

Although the first concept is true,the second idea can be misleading.Algorithm implementation in integerand floating point can’t be discussedwithout specifying the processor!Imagine implementing the 64-pointaverage using a 16-bit integer formatand an imaginary processor capable ofperforming operations on the packed16-bit FP format that is used for shortword storage on the 2106 SHARCprocessors.

The 16-bit integer algorithm in-volves software scaling inputs by four,whether needed or not. The 16-bit FPalgorithm requires no software scaling,because hardware renormalization will

occur if needed. Keep in mind though,this 16-bit FP format uses 1 bit forsign, 4 bits to store a biased offsetexponent, and 11 bits to store therounded upper 11 bits of the source.With this format, even the addition ofonly two 12-bit input FP values has abetter than 50% chance of causingrenormalization and an associated lossof precision.

For the average algorithm with a12-bit input, the 16-bit integer imple-mentation will be more difficult todesign, but gives better results. Therejust are not enough significant binarybits available in a 16-bit FP numberrepresentation. This is why FP proces-sors have word lengths of 32 bits, toprovide 24 bits to store the signalwithout significant loss of precision.

STAND ON GUARDThe design and speed of integer

algorithms would significantly im-prove if you could guarantee that nooverflow could occur and sufficientprecision could be maintained, so thatthe output value is an accurate repre-sentation of the desired signal. Thiscan be achieved by conditioning theinput signal. Guard bits are placedabove the sampled signal to preventoverflow. Other guard bits can beplaced below the sampled signal toprotect against loss of precision.

Table 1—This shows the conversion of decimal values 178.125 and 0.6958008 (178.125 / 256) into their stored hexadecimalrepresentation for 32-bit IEEE single-precision FP numbers.

x +[ /64]

e(k) = error due to quantization

ideal

Figure 8—Modeling signal quantization effects interms of an ideal operation and an introducedquantization error signal is shown here.

T T T

x x

x

x

Input

x(k)

y(k–1)

–b1 –b2 –b3

y(k–2) y(k–3)

y(k)

Output

Σ

Figure 7—The number of guard bits needed toprotect against overflow can be calculated byknowing the maximum gain between the input andany internal node of the algorithm.

Example 1 Example 2

Decimal value 178.1250 0.695808Best integer 178 1 approximationScientific decimal 1.78125 × 102 6.958005 × 10–1

Scientific binary 1.0110010001 × 2111 1.0110010001 × 2–1

Biased exponent (1).0110010001 × 210000110 (1).0110010001 × 201111110

Stored binary 01000011001100100010000000000000 00111111001100100010000000000000 representationStored hexadecimal 0x43322000 0x2F322000 representation


The analysis also provides designinformation useful when developing FPsystems. First, the calculation gives anestimate of the dynamic range avail-able before precision is reducedthrough renormalization. Equally im-portant is that worse-case informationis made available for deterministictiming calculations of subtasks involv-ing real-time FP applications. Manypipelined FP processors take additionalcycles if renormalization becomesnecessary during an instruction.

For any algorithm, ask how manyguard bits are needed. Also, be sure toask where they should be placed in theprocessor word.

GUARDING AGAINST OVERFLOWIn principle, the determination of

the optimum number of angel guardbits for any specific DSP algorithm isstraightforward. You determine themaximum possible gain between theinput step or node and any other stepor node in the algorithm. Given thisinformation, the input signal can bepositioned on the data bus with a num-ber of additional sign bits.

For the average example, the gain issimple to calculate (64), indicating theneed for six additional sign bits beyondthe input signal’s word length to avoidoverflow. Calculation of the necessaryguard bits for the third-order digitalIIR Chebychev filter (see Figure 3) ismore complex because of the feedback.Such calculations are best automatedusing a tool such as DIGICAP. [3]

The maximum gain between theinput and any internal nodes can beexpressed in terms of the L1 Norms(see Figure 7). The L1 Norms werecalculated using DIGICAP as:

L1(IN, OUT1) = 1.267L1(IN, OUT2) = 1.267L1(IN, DELAY1) = 3.6877L1(IN, DELAY2) = 3.6853L1(IN, DELAY3) = 3.6823

This calculation shows a maximumgain between the input and any inter-nal node of 3.6877. Because this gain isless than 4 but more than 2, there is aneed for two additional sign bits toprotect against overflow for signalsprocessed using this filter form.

GUARD BITS FOR PRECISIONBecause of the limited resolution of

the digital signals used to implementthe digital operations, it is not possibleto represent the result of all operationsexactly. Therefore, the signals in thefilter must be quantized. The nonlineareffects caused by signal quantizationcan result in limit cycles. This meansthat the filter output may oscillatewhen the input is zero or a constant. Inaddition to that, the filter may exhibitamplitude dead bands where it doesnot respond to small changes in theinput signal.

The effects of signal quantizationare most obvious when coefficients arenot exact integers. For example, thecoefficient 0.8125, which would beimplemented as an exact multiplica-tion followed by a division and signalpath quantization (truncation). Oneway to understand the effect of quanti-zation is to treat the calculation of theresult for every node in the algorithmas perfect, but introduce a small, ran-dom error (noise) generator at everynode (see Figure 8).

The maximum amplitude of thenoise generator is associated withchanges of one in the least significantbit of the number representation. Themaximum effect of this error noisesignal at the output can also be deter-mined if the L1 Norm (gain) between

1212

Sign

Input

0

Digitalfilter

Unused

Output

Quantizationerror

(ignore)

Figure 9—Here you see positioning of a 12-bit inputsignal to avoid overflow and quantization errors for athird-order direct form digital Chebychev filter.

T T T

x x

x

x

Input

x(k)

y(k–1)

–b1 –b2 –b3

y(k–2) y(k–3)

y(k)

Output

Σ

Figure 10—Calculation of the number of guard bitsneeded to protect against quantization error requiresdetermination of the gain between any internal nodeand the output.


REFERENCES[1] E. Bessinger, “Localization of

Sound Using Headphones,”Master’s thesis, University ofCalgary, 1994.

[2] L. Bruton, “Low Sensitivity Digi-tal Ladder Filters,” IEEE Transac-tions on Circuit and Systems, vol.CAS-22, March 1975.

[3] L.E. Turner, D.A. Graham, andP.B. Denyer, “The Analysis andImplementation of Digital FiltersUsing a Special-Purpose CADTool,” IEEE Transactions on Edu-cation, Special Issue on Teachingand Research in Circuits and Sys-tems, vol. CAS-32, no. 3, 1989,287–297.

RESOURCEM. R. Smith and L. E. Turner, “Fi-

nite Precision Effects in DigitalFilter Implementations,” SHARC’99 International DSP Conference,U.K., November 1999, 28–33.

any internal node in the algorithm andthe output are known. The calculationis shown in Figure 9.

The L1 Norms calculated for thethird-order digital IIR Chebychev filterusing DIGICAP are:

L1(OUT1, OUT2) = 1.0000L1(DELAY1, OUT2) = 1.2657L1(DELAY2, OUT2) = 1.2657L1(DELAY3, OUT2) = 1.2657

The values indicate that the maxi-mum loss of precision between anyinternal node and the output is 1.26bits. Also, it’s possible for the quanti-zation errors from one node to add tothe quantization errors from anothernode. Therefore, the total possiblequantization error is determined bysumming the individual gains, giving4.7973 bits. This indicates that threeguard bits are needed to ensure thatthe least significant bit of the outputhas meaning.

EXAMPLEYou have seen that for an input

signal of N bits, the third-order digitalIIR Chebychev filter requires imple-mentation with two guard bits to pro-tect against overflow and three guardbits to protect against quantizationerrors. Figure 10 shows the ideal situa-tion for a 12-bit input signal.

What do these calculations mean inreal life when dealing with integer andfloating-point processors? Figure 11shows the ideal positioning of an 8-bitinput signal within the word length ofa 16-bit word processor. There aresufficient guard bits to protect againstboth overflow and quantization error.However, the 16-bit processor wouldnot satisfactorily handle the third-order digital IIR Chebychev filterwhen the input signal is 12-bits wide.

The calculations for necessary guardbits to protect against overflow and

quantization errors on integer proces-sors can be reinterpreted for floating-point processors. Guard bits foroverflow on integer processors trans-late into additional bits necessary toavoid possible loss of accuracy whenfloating-point numbers are renor-malized. Guard bits that protectagainst quantization noise on integerprocessors effectively translate intothe additional bits necessary to avoidfloating-point truncation errors.

Figure 12 shows an 8-bit input valuestored inside the 16-bit floating-pointrepresentation available on theSHARC processor. In this representa-tion, there is 1 bit for the sign, 4 bitsfor the exponent, and only 11 bits tostore the fractional part of the number.With two guard bits necessary to pro-tect against normalization errors andthree guard bits to protect againsttruncation errors, it appears this 16-bitFP representation is 1 bit short of accu-rately processing an 8-bit input signalthrough the Chebychev filter. How-ever, the FP numbers are stored using aA*+ format, which makes an addi-tional bit available.

Using an argument based on thisanalysis, a 16-bit floating-point proces-sor has the precision of a 13-bit integerprocessor. And, a 32-bit floating-pointprocessor essentially has the precisionof a 25-bit integer processor.

In this article, we looked at a vari-ety of techniques to ensure precision in

DSP algorithmson both integerand floating-pointprocessors. Bothhardware andsoftware featureswere discussed.The analysis toolDIGICAP is par-

S d5 d4 d3 d2 d1 d0 QEQE QENENE

Figure 12—For a third-order digital IIR Chebychev filter, a 16-bit floating proces-sor provides an adequate word length to protect an 8-bit signal against floating-point normalization and truncation errors because of the hidden bit associatedwith the format of floating-point numbers.

S S S d6 d5 d4 d3 d2 d1 d0

S d6 d5 d4 d3 d2 d1 d0

QEQE QE ???

S S S d10 d9 d8 d7 d6 d5 d4 d2d3 d1 QEQEQE

Overflow protection Quantization protection

8 bits of useable signal

Usable signal

For a 12-bit input signal1 bit of output is lost to quantization error

Figure 11—For a third-order digital IIRChebychev filter, a 16-bit integer processorprovides an adequateword length to protectan 8-bit signal againstoverflow and quantiza-tion error. A 12-bit signalwould lose 1 bit ofprecision in the output-to-quantization error.

ticularly useful, because it calculatesthe necessary guard bits to protectagainst overflow and quantizationerror. Copies of the DIGICAP diskwith user manuals and examples can beobtained from Laurence Turner. I

Mike Smith and Laurence Turner bothare professors of electrical and com-puter engineering at the University ofCalgary, Canada. Mike teaches andresearches microprocessor operationswith a biomedical slant. You mayreach him at [email protected] teaches programming lan-guages and ASIC design while research-ing digital signal processing systemdesign and implementation. You mayreach him at [email protected].


A Hand-Held Scoring Device

f or automaticscoring of multiple

choice tests, manyschools use a commer-

cially available system based on adesktop card reader machine, whichrequires that students mark their an-swers on preprinted forms of specificsize and layout. This method is expen-sive because of the equipment andscore cards, therefore, usually it’s usedonly for critical testing.

In most cases, because only onecentralized scoring machine is avail-able to teachers, it is not located in theclassroom where it would offer themost convenience. Perhaps more im-portantly, the most useful time toevaluate test results is immediatelyafter a test so that feedback can be

given and the missed questions dis-cussed promptly. This is especiallydesirable for periodic quizzes wherethe intent is to allow the teacher toquickly gauge the classroom’s learningprogress. What is needed is a better,lower cost, convenient way of quicklyscoring quizzes.

INTRODUCING THE QUIZWIZTo answer these needs, I developed

a hand-held scoring device based on aPhilips 8-bit microprocessor. The87LPC764 is a new 20-pin offering thatcombines fast speed, 8051 code com-patibility, and low cost. This processoris ideally suited for the scoring deviceproject because of its 4-KB code space,power-saving modes, serial port, andremaining 16 I/O pins. My objectivewas to create a device that is simple tooperate and affordable enough thatevery teacher could own one (seePhoto 1).

The QuizWiz has many featuresthat make the teacher’s ability to scoremultiple-choice quizzes fast and easy.It reduces scoring time to only 10 to15 s per page. It is capable of scoringtests printed on standard paper with-out preprinted forms, using a word-processing template.

The QuizWiz does not require ma-chine-assisted paper handling mecha-nisms. Also it operates on three AAAbatteries. The device is inexpensive($30 for parts) and measures only 6″ ×1.7″ × 1″. Simple to learn, the newteacher’s aid requires only two buttonsto operate.

A 2 × 8 LCD shows the status andresults, and a buzzer provides distinc-tive audio feedback. Automatic power

FEATUREARTICLE

Paul Kiedrowski

When it comes tomaking the grade,Paul’s Design2Kproject passed thetest with flying colorsand won a grandprize. With the Quiz-Wiz, a teacher has asimple and cost-effec-tive tool that can re-duce the time spentgrading quiz forms.

Photo 1—The prototype QuizWizsports a 2 × 8 character LCDdisplay and just two operatingswitches labeled “scores” and“save.” The quiz format can beseen here, requiring a startingsync section, dark areas betweenanswer selections, and a minimumamount of white space betweenquestions. A mini-DIN connectoris on one end to provide an op-tional serial port interface.

The QuizWiz


shutdown during idle periods provideslong life. The flexible quiz formatallows multiple columns and pages.

For convenience, there is temporarymemory storage of results duringpower shutdowns. Totals, per questionand per quiz, are available. TheQuizWiz can handle eight choices perquestion, four columns of questions,and 32 questions per quiz. The proces-sor provides an RS-232 serial connec-tion for uploading results to a PC,which allows tracking of which ques-tions were missed per student.

CIRCUIT DESCRIPTIONFigure 1 shows the circuitry has

been partitioned by the two chassissections. A PCB in the lower half con-tains most of the components, whereasthe sensor tip, LCD, and battery com-partment are in the upper half. Theprocessor was DIP-socketed for thedevelopment phase (see Photo 2).

A single reflective optosensor waschosen to perform the scanning detec-tion process, with an optimum sensor-

to-paper distance of 1 mm. Topreserve battery power, theoptosensor LED is only acti-vated when the QuizWiz ispressed against the paper, de-pressing a mechanical switchlocated in the tip. An alternatescheme that I initially consid-ered required two sensors, thesecond one used for scanning aparallel column of markingsintended only to synchronizethe scan position. The addi-tional LED would have signifi-cantly increased the powerconsumption, however.

Normal battery operatingcurrent is approximately 25 mAwhen all circuits are operating, 15 mAwhen not scanning, and 20 µA duringshutdown. Using three AAA batteriesin series with a typical capacity of1000 mAh, the teacher can score ap-proximately 100 quizzes for 30 stu-dents before replacing the batteries.

The QuizWiz uses a simple three-chip design consisting of the processor,

a 5-V DC/DC converter, and an RS-232 three-wire interface. The87LPC764 is a good match for therequired features, because all of its pinsand most of its features are used in thisproject. The only features not used arethe I2C interface and analog compara-tors. To minimize cost further, noexternal crystal is required, because

the processor conve-niently includes an inter-nal 6-MHz RC oscillator.

OPERATIONPhoto 1 exemplifies

how the quiz format isarranged to aid in thescanning process. Theformat requires questionsto be arranged as usual incolumns, with threeadditional constraints.These constraints includea short sync marking atthe top of each column toaid in establishing thescan rate, a black or dark-ened area between eachof the answer selections,and a minimum amountof white space betweeneach question to distin-guish them.

To use the device,teachers simply place theQuizWiz on the paperand slide it down alongthe check boxes used formultiple-choice selectionby the students. The

Photo 2—A design-for-manufacturing approach was taken forthe construction of the aluminum-chassis prototype. The maintwo-sided PCB has ample room for parts, mainly because ofthe housing size needed to fit the AAA battery pack and LCDdisplay.

Figure 1—Here you can see the 87LPC764 processor, MAX221 RS-232 interface, and MAX710 DC/DC converter. Thedevice supplies 5 V to the LCD, which uses a 4-bit interface to save I/O pins.


device scans the selectionsmade and compares theresults to the correct an-swers that were previouslyscanned in via a masterquiz. After each quiz isscanned, the score is dis-played as both the numberof correct answers and apercentage. The teacheralso can select the currentcumulative scoring statis-tic for each question or theoverall quiz, so the appro-priate review focus can beestablished.

As stated earlier, theQuizWiz includes a sound-ing device to provide au-dio feedback. A short beepwill be heard if a column isscanned properly, a longerbeep when all columns ofthe quiz are correctlyscanned, or various alarmbeeps if an error is de-tected. After each success-ful quiz scan, the teacherpresses Save or scans again.

At any time after amaster quiz is scanned, theteacher can press thescores button for results ofeach question, as well as for otherinformation. Simply continue to pressthe button to advance to the results forthe next question. Note that holdingdown the scores button for at least 1 sresets the display and shows additionalpertinent data.

Results are stored in memory unlessbattery power is removed or both thescores and save buttons are held downfor more than 5 s. Then, the QuizWizwill erase all results and start over,directing the teacher that the next quizscanned must be the master.

The QuizWiz goes into low-powershutdown mode after 2 min. of inactiv-ity and is returned to normal operationwhen both scores and save are pressedtogether. A flow diagram of the opera-tion is shown in Figure 2.

For access to quiz scoring details asthey are scanned, you may connect aPC to the QuizWiz using a standardRS-232 serial port connection at 9600bps. The QuizWiz automatically de-

tects the presence of the serial portconnection, and power usage is re-duced when not connected.

ADDITIONAL FEATURESThere is no main switch needed to

disconnect electronics from the bat-tery, and existing data is preserved inRAM during shutdown. Although theprocessor has keyboard interrupt capa-bility, a polling technique is used in-stead, because a main ISR timing loopis required. The circuit contains provi-sions to detect a low-battery conditionand displays an appropriate message,then shuts down if necessary.

The device must not only correctlyidentify marked answers, but also becapable of reading multiple columns ofquestions and ensure that the totalquestion count is the same as that ofthe master quiz.

Because the LCD used was chosenfor its minimal cost and small size, itwas important to carefully choose the

formatting of the dis-played results. For ex-ample, pressing the scoresbutton presents the fol-lowing sequence:

SCORING RESULTS:TOTAL # QUES = XXTOTAL # COLMN = XTOTAL # QUIZ = XXAVERAGE SCOR = XXXQUES #01 SCOR = XXXQUES #02 SCOR = XXX(And so on, until all ques-tions are listed.)QUIZ #01, SCOR = XXXQUIZ #02, SCOR = XXX(And so on, until all quiz-zes are reported.)NO MORE QUIZZES(Then it starts again.)

HARDWARE ANALYSISThe microprocessor has

the capability to monitorkeyboard interrupts, butcannot distinguish keystates or releases. Thekeyboard interrupt is levelsensitive and will continuegenerating interrupts untilthe key is released. There-fore, polling must be used

to determine when multiple keys arepressed as well as for debouncing.For these reasons, it is simpler to usethe Timer0 interrupt to monitor keystates and actions, and only enable thekeyboard interrupts for wake up frompower shutdown. The nominal Timer0interrupt rate is:

s

The behavior of the device during ascan is detailed in Figure 3. Assuming a10″ column of 0.1″ markings per an-swer (dark plus light), scanned in 0.5 s,the minimum value of TSYNC is:

s.

Assuming markings are .25 and thecolumn scanned in 3 s, the maximumTSYNC is: s.

So, during scans, if you use Timer1in 16-bit mode, it has a range of 0 to

Main program

Timer0 interruptWatchdog timeout (1 s)

Activity timeout (2min.)

Battery power-on"cold" reset

Power shutdown mode

Scores and Savekeys both pressed?

Initialize CPU,display intro messages

Do master scan

Scan

Scan

Scan

Select Saveor Scan

Setup master data table

Select Scanor Scores

Scores pressedfor >1s?

Do student scandisplay total score

Select Saveor Scan

Go to next quiz results

Calculate score

Display quiz results

Reset "next quiz"pointer to

master info

Show wrong answers and/or update student data table

Inc. big timers

Set delay flags

Read key states

Set key flags

Check battery

Check stack ovf

Feed watchdog

Return

Save

Save

YesScores

No

Figure 2—In the operation flow diagram, notice that the scan switch is depressedwhenever the tip is pressed to the paper. This approach minimizes power con-sumption by allowing the device to use only a single optosensor.


65.5 µs, which is close enough for start-ers. You can improve the range byindependently measuring TDARK andTLIGHT. To conserve memory usage,if you only store the upper eight bits ofTimer1 when measuring, the precisionwill be 256 µs. That’s good enough,because TDARK won’t be less then2 µs, and this error does not accumu-late during the scan.

The Timer0 interrupt latency is notcritical because the fastest operationyou are trying to measure is TDARKto within 1/10 of its period, or ap-proximately:

.

Therefore, you can either disabletimer0 interrupts during this criticaltime (with an insignificant effect) ortry to keep the timer0 ISR under 250µs. If enabled, Timer1 should be set toa higher priority than Timer0.

The “big timers” are all 8-bitcounts (reset at different times) ofTimer0 interrupts, so the maximumrange is 255 × 65.5 ms = 16.7 s. Thereis a 3-bit additional counter used forthe activity timeout, which is 2 min.

The scan key is usually closed, so itdraws current normally. It is biasedthrough the 5-V line that is turned offduring shutdown, and its keyboardinterrupt doesn’t need to be enabledfor wake up. The Save and Scores keysare biased to battery because they needto be monitored for wake up detection.Either can cause wake up, but pollingmust be done to ensure that both arepressed. This can be done withoutactivating the 5-V line.

In addition, this can help ignorefalse key presses. Key press states aredetermined when two successive keystate reads are the same in the Timer0ISR. Because you need to determinehow long a key is pressed as well as if

Listing 1—Take a look at how the ISR functions.

!

"

#$ $

!

!

$ $

% &

'

#!

(

!

)* )*

)*

)* )* + , )*

!

- .+ /' ' #$$


more than one is pressed, no action istaken on key presses until after theyare released.

The buzzer and sensor LED can bedriven directly from the port pins(20 mA maximum), however, the cir-cuit uses a 2N3904 NPN transistorbuffer to provide noise isolation.

The battery voltage was selected tobe 4.5 V (i.e., three AAA cells in series)because the LCD’s high state outputvoltage must be within 0.5 V of theprocessor’s supply voltage. Using threeAAA cells instead of two is allowedbecause of the height of the LCD andbecause the 5-V converter has betterefficiency there. A standard AAA alka-line battery (0.41″ × 1.75″) is typicallyrated for approximately 1000 mAh, or40 h at 25 mA of continuous usage.

By contrast, you could eliminate theconverter by using a standard 9-V

battery and a linear regulator. How-ever, that would be less efficient. Fur-thermore, with a capacity ofapproximately 500 mAh, it wouldyield only a 20-h lifetime.

By driving the LCD in 4-bit modeinstead of 8, you free an additional fourI/O lines, which is convenient andallows you to add power shutdown,battery monitor, buzzer drive, andserial port detection features.

The scan switch could be conve-niently replaced by another opto-sensor, but it would draw too muchcurrent. Another alternative is to peri-odically pulse the scan optosensor (i.e.,turn on the LED) and check for a highdetect level, indicating that it wassensing a reflective surface. Then, ifsynchronization wasn’t found within aprescribed time, you would turn it offagain. However, this also would draw

Scan switch is depressed when device touches paper; sensor LED activated---Wait for switch and optosensor to stabilize, then monitor detect signal---Disable serial port of scan if QuizTimeout or Column Timeout set NewQuiz; otherwise assume new column---Set QCurrent, QTotal as appropriate; read and save ScanCtartTime from Timer0---Wait for first dark edge to occur

Transition occurs, start Timer1, monitor detectstop Timer1, measure TDARK, restart Timer1

Call this period TSYNC = TDARK + TLIGHT

Next transition; stop Timer1, measure TLIGHT, restart

Repeat above process, then compute averagevalues for TDARK, TLIGHT, TSYNC

Wait for next light edge

Monitor detect , any dark to light records Timer0; this is reference point QRefTime used to determine when the next question has started---Set SyncDone flag

A gap between dark markings of more than1.5 × TSYNC, indicates start of next question

Transition occurs, stop Timer0 and measure (Timer0–QRefTime); restartTimer0; if delay >1.5 × TSYNC, increment QCurrent; then always startTimer1 for interrupt at t + TDARK +

Timer1 interrupt, read and record detect state; if light clears DarkSpaceflag and waits for int0 at start of next dark edge; if dark restarts Timer1 forinterrupt at t + TSYNC and sets DarkSpace flag

Transition doesn’t occur if previous space is dark

A new dark to light transition resets reference pointQRefTime is used to determine when next question starts

If the previous space was marked, the next detectsample will occur here based on Timer1 interrupt

Note: The scan switch is constantly monitored by a Timer0 loop every 65 ms.If the scan switch is released at any point other than between questions, abort thecurrent process, display “SCAN ERROR,” and wait for new sync to begin. Awhite space delay of > 1.5 × TSYNC is required after the last QRefTime before scan is released, in case the device is prematurely lifted from the paper.

Allow up to eight choices per question, eight questions per column.

Scan switch released; if QCurrent = QTotal then start saveroutine, otherwise restart NewColumnDelay (assume new column if next scan in less than 5 s) and NewQuizDelay (assume new Quizafter 10 s)

Thisdiagramdepictsa typicalcolumn of quiz

markings.

LightDark

Syncportion

requiredat startof eachcolumn

Scandirection

orincreasing

time

This is amarkedanswer

2–30 mstypical

Gap hereindicates

newquestionor end ofcolumn

time TDARK

time TLIGHT

Figure 3—The scanning process diagram shows how the quiz markings are interpreted and how timing isused to detect a marked answer. The sync portion provides an estimate of the scanning speed.


SOURCE87LPC764Philips Electronics N.V.+31 40 27 34184Fax: +31 40 27 32250www.philips.com

Special function registers(up to128 bytes)

Per question scoresand master data

(32 bytes)

Per student scoresand general-purpose flags

(32 bytes)Variable storage

(16 bytes)Bit-addressable memory

(16 bytes)

Stack(24 bytes)

Register R0–R7(8 bytes)

Current quiz results (Q1-32)(4 bytes)

Key and I/O status flags (2 bytes)

Scan progress flags (1 byte)

Error flags (1 byte)

Timer enables and overflow (2 bytes)

Byte variable (5 bytes)

Value = 7E used for stack monitoring20h

2Fh

00h

07h08h

1Fh20h2Fh30h3Fh40h

5Fh60h

7Fh

FFh

80h

16 bytes

Figure 4—The internal RAM memory map shows that all ofthe 87LPC764 memory is used.

too much current because the samplingwould have to be done even if therewas no intention of using the device.

The best solution is to embed thescan switch into the tip of the deviceso that it gets pressed automaticallyduring use. Note that a small lever-activated micro-switch is well suitedfor this purpose.

Optosensor testing revealed that forthe best rail-to-rail output signal, asimple op-amp comparator is needed.The processor’s fixed internal voltagereference of 1.28 V is not well centeredfor this application (2.5 V is desired),and using an external reference woulduse another I/O pin, so an external op-amp is used instead of one of theprocessor’s comparators. An externalop-amp also allows buffering closer tothe sensor and away from digital de-vices for noise protection.

The external op-amp chosen isMC33171, designed for single-supplyoperation, low 180-µA supply drain,and 1.8-MHz bandwidth. In addition,pin INT0/P1.3 is used for the detectsignal, so scanning can be interrupt-driven. This pin is a Schmitt triggerinput, which improves noise rejection.

The detect sensor signalmust be such that moving thesensor from a light to darkmarking causes the detectline to go from high to low, soInterrupt0 can be used (set foredge triggering). When thedevice is lifted from the pa-per, there is no reflection andthe detect line is low, similarto that of a dark marking.This is not a problem, be-cause the scan switch is de-pressed first, followed by ashort wait before the sensor isread. At this point, detectshould be high because of thepaper’s light area reflections.

SOFTWAREResults for the current

quiz scan are maintained infour bytes, where each bitindicates right or wrong for aparticular question (see Fig-ure 4). The cumulative scor-ing for up to 32 questions aswell as the master results

require 32 bytes.An external hardware reset signal is

not needed because the QuizWiz isusing the watchdog. Whenever a resetoccurs, the WatchdogOverflow bit ischecked to determine if the watchdogcaused it. Because the watchdog timercan’t be turned off by the softwarewhen it enters shutdown mode, thedevice must expend a small amount ofpower to refresh it. Alternatively, theWD timer can be disabled completelyat startup and no monitoring is neededduring shutdown.

The main ISR loop uses Timer0 inthe 16-bit mode 1 and always runs (seeListing 1). The stack will be used forthree levels of subroutines. Interruptswill be disabled as needed to ensurethis. Assuming that a single subroutinewill need to push PC, DPTR, ACC, B,R0, and R1 (worst case), this will re-quire 8 bytes. Therefore the 24 bytesfrom 08h to 1Fh are reserved for thestack. Store 07Eh at location 20h atthe bottom of the bit-addressablememory to monitor stack overflow.This value will be checked during eachTimer0 interrupt, and if changed, itwill cause a software reset.

LAST THOUGHTSThe most challenging aspect may

have been cramming all of the cir-cuitry into the smallest possible pack-age. Several iterations were required tooptimize mechanical constraints withbattery size, circuit layout, and assem-bly methods.

The software was limited mainly bythe amount of available RAM, but the8051 architecture proved adequatebecause of bit-flag memory. If evenmore RAM was available, then addi-tional scoring details could be storedor perhaps more complex grading re-sults. And, if more I/O pins were avail-able, then you might consider using afull 8-bit LCD interface and a largercharacter display.

Another improvement would be ifthe LCD could run directly off batteryvoltage, which of course varies overtime. This might lower current drainand eliminate the need for a 5-V regu-lator. However, a brief search for sucha display was not successful.

The scan timing currently is deter-mined by the use of simple sync marksat the top of each column and, there-fore, depends on a uniform scan speed.In a more elaborate approach, the de-vice would determine the scan ratefrom the timing of the markings them-selves and adapt to any changes as thescan progresses.

But, the difficulty is that the proces-sor must store a significant amount ofscan data and post-process it to evalu-ate the variable scan rate and take intoaccount marked spaces. I

Paul Kiedrowski obtained a BSEE fromRose-Hulman Institute in 1982 andhas worked in the RF, microwave, andcommunications industries since then.Currently, he is senior staff engineer atMotorola in Fort Worth, Texas, devel-oping base station infrastructureequipment. You can reach him [email protected].

http://www.philips.com


NOUVEAUPC Edited by Harv Weiner

SINGLE-BOARD COMPUTERThe Arcom SBC-386 is a single-board computer de-

signed for use in high-volume applications where net-working capabilities are required. It provides theperformance and reliability of previous generationboards, with the addition of onboard Ethernet. Applica-tions include communications products, intelligenttransportation products, medical products, and indus-trial control.

The heart of the computer is the 25-MHz 386EXCPU from Intel. Included on the board are 2 MB ofDRAM, 1 MB of flash memory, up to 512 KB of bat-tery-backed SRAM, three serial ports, a debug port,10BaseT Ethernet, and an onboard 8- to 18-VDC (10- to16-VAC) power supply.

The SBC-386 is rated at an operating temperature of–20° to 170°C and has been designed to meet EuropeanCF standards. The board is supplied with U.S. Software’sSupertask Multitasking RTOS and Treck’s real-timeTCP/lP stack. Arcom also offers a full suite of softwaredevelopment tools.

A development kit that contains all the hardware andsoftware needed to allow immediate development isavailable.

The SBC-386 (including software) costs $199 inOEM quantities. The SBC-386 development kit costs$415.

Arcom Control Systems(888) 941-2224Fax: (816) 941-7807www.arcomcontrols.com

http://www.arcomcontrols.com


PC/104 CPU MODULEThe CPU-1220 is an embedded 486 AT computer

in a PC/104 form factor.The module is built around a 486DX core operat-

ing at 75 or 100 MHz. Onboard features include 32MB of DRAM and SVGA with 2 or 4 MB of DRAM(for resolutions up to 1024 × 768 with 16 millioncolors at 75 Hz). Also included are IDE and floppydisk interfaces, four RS-232 serialports (two configurable as RS-485), and one 10/100-Mb Ethernetcontroller.

A keyboard port, bidirectionalparallel port EPP-ECP, SSD socketwith up to 288 MB of solid statedisk, watchdog timer, and real-time clock are included. The par-allel port is internallymultiplexed with the floppy diskcontroller, allowing the floppy tobe connected.

The flash BIOS has a capacityof 1 MB, 128 KB of which is used

NOUVEAUPCfor the BIOS and its extensions. The remaining 896 KBcan be used as read-only and can store the operatingsystem and user programs and data. The BIOS isreprogrammable onboard, and the setup parameters aresaved in flash memory, allowing the module to operatewithout battery.

The CPU-1220 supports any operating system avail-able for the standard PC platformssuch as DOS, ROM_DOS, Windows3.11/95/98/2000/NT, Linux, aswell as real-time operating systemslike QNX, pSOS, PHARLAP,VxWork, WinCE, and RT_Linux.

Options for the board include acustom BIOS, fast BIOS boot, ex-tended operating temperaturerange, and custom connectors.

Eurotech SpA+39-0433-486258Fax: +39-0433-486263www.eurotech.it

http://www.eurotech.it


EPCREAL-TIME PCs

Ingo Cyliax

A Cup of Java

Whether you’re start-ing your morning atStarbucks or cappingoff the evening atBorders, a cup of joecan make a lot ofthings better. Don’tknow beans aboutJava? Then tryIngo’s fine blend oftheory and applica-tion on the subject.

Part 1: Embedded and Real-TimeApplications

i might be in-clined to think

that you’ve beenliving under a rock if

you’ve never heard about Java. TheJava strategy was developed by SunMicrosystems several years ago. I usethe term “strategy” because Java isactually a collection of things (lan-guage, object libraries, and Java VirtualMachine), rather than one component.

At first, it seemed like there was alot of hype about Java, and it wasn’treally clear what it was suited for.Initially, it was marketed as a solutionto everything. On the client side, Javaquickly caught on, with a model todownload a Java application (applet) toyour web browser and execute itwithin a safe sandbox kind of environ-ment. This was nice be-cause it allowed you todevelop complex GUIsthat could run locally andprovide solid perfor-mance instead of usingthe traditional HTML-style GUI interface. Also,because the code ex-ecutes in a sandbox and(by default) only allowsInternet connections tothe server that the appletis downloaded from, it’s

easy to develop Internet and Intranetapplications without bothering withInternet security issues.

After client-side Java applicationsbecame popular, some server-basedJava applications gained acceptance,where Java programs can run on a webserver to implement applications.Usually this would be a kind of appli-cation gateway to a database server, orperhaps a reservation system. Therehas been a lot of press about thesetypes of applications, but I’m not goingto talk about that in this article. What Ido want to write about is Java in em-bedded and real-time applications.

WHY JAVA?There are several features that make

the Java strategy attractive for embed-ded applications. Java is an object-oriented language that maintains muchof the basic C syntax. Think of Java as aC++ with all of the complexity re-moved. This makes it an easy languageto learn and, in theory, you can expendmore brain power solving the problemsand implementing the algorithms thanfighting the syntax of the language.

Another feature is that it is stronglytyped, and this is enforced by the com-piler and some JVMs or class checkers.I’ll get to this a little later.

Besides the strong typing, there is notraditional pointer type. So, like it ornot, you can’t typecast a pointer to anarray into an integer. This is deemed asafety feature because now you can’tjust access memory willy-nilly, like inC or C++. If you want to access deviceregisters, you’ll have to write nativefunctions of an object class. This re-stricts access to the pointer by using aprotocol or API.

Machine type Length Java wrapper

— — BooleanByte 8 bit ByteShort 16 bit ShortInt 32 bit IntegerLong 64 bit LongFloat 32 bit IEEE floatDouble 64 bit IEEE doubleChar 16 bit Unicode character— — Void— — String

Table 1—Here’s a list of the basic data types in Java. These sizesare defined in order to make Java code portable across architectures.


Listing 2—Here is a simple in Java. The source code is more complex than in C,but the object file is only 436 bytes.

!

"

"

Finally, the language uses garbagecollection instead of relying on theprogrammer to explicitly free memory.This could be a foreign concept to Cand C++ programmers at first, but inpractice it’s actually not a bad idea.

In a nutshell, you allocate memorywhen instantiating new objects orarrays. As long as your program is stillreferencing the object or array, thememory allocated to it stays around.After the reference has been forgotten(i.e., none of the object reference vari-ables or array variables exist), thememory is freed at some point.

Right about now, C and C++ pro-grammers are probably saying, “Yeah,but....” Garbage collection does haveissues that need to be addressed whenyou want to use it in memory-con-strained or real-time applications, butit’s a slick way of dealing with memorymanagement issues. It pretty muchguarantees that there is no memoryleak. Memory leaks, along with illegalpointer problems, probably account formost bugs in C and C++ programs.

I will get to these issues next month.This time around, I want to begin witha discussion about the Java program-ming language and the Java VirtualMachine (JVM).

THE LANGUAGE TO LEARNAs I mentioned, the Java language is

fairly simple. There are basic intrinsicdata types (see Table 1).

In order to avoid confusion, all ofthe basic machine data types havedefined lengths and encoding. Thisimplies that a long machine type isalways a 64-bit integer, no matter whattype of machine the code is actuallyrunning on. Also, all of the integerquantities are signed integers. There isno unsigned integer.

You should be familiar with all ofthese data types if you program in Cand C++, except for the byte and char-acter type. A byte is an 8-bit integer,and a character is a 16-bit quantitythat uses Unicode (rather than ASCII)to encode character representations.The difference is that you can do mathon a byte but not on a character. In Cand C++ it’s possible to do math on achar, even if it holds an ASCII charac-ter representation. This is necessary to

force programmers to be disciplinedabout what they mean when they rep-resent a value in their programs.

I’ve added the types for Boolean,Void, and String, which are not reallydata types represented in the hardware.Strings are a special kind of type thatare implemented in an external classlibrary, so they are objects as far as theJVM is concerned. The compiler knowsabout them because it needs to be ableto deal with String constants. A Bool-ean is a true/false value, and it’s up tothe JVM to figure out how to store it ina convenient format.

In Java there is an assignment opera-tor (=) just like in C. In fact, most of theactual language syntax is the same.However, it’s stricter about type con-versions and whether or not variableshave been initialized. For example, thecode segment in Listing 1 wouldn’tmake it past the compiler in Java, butit will pass most C compilers withoutwarning and will actually run.

I don’t know how many times I’vemade this mistake. Granted, a com-monly used tool like lint would catchthe uninitialized variable in C and C++,but because the assignment in the #

expression is legal C syntax, it’s hard tospot. The reason Java flags it is becausethe # expression requires a Booleanvalue, but the constant 1 is compatibleonly with the integer types.

If you’re a C programmer, objectsmay be a new thing for you. Think ofthem as instances of data types. Theobject class is similar to a structuredefinition. When you instantiate anobject of a class, the system will allo-cate memory for the data that is de-fined in the class. The data elements ofa class are called fields. There are a fewdifferences between structures andclasses, of course, but these are gravy.

One difference is that, as part of thedata type, a class lets you define spe-cific functions you can use to operateon the fields that are stored in an ob-ject. These functions are called meth-ods in Java. So, a class is the definitionof an object type, which is made up offields and methods.

Another twist is that you can makefields and methods invisible outside ofthe class. This way, if you instantiatethe object, the implementation detailsdon’t have to be visible to the applica-tion. All the application has to know is

Listing 1—This shows an example of a common mistake you can make in C that would not passthe Java compiler.

$

%&

#%&

%

"


that there is an object and each in-stance is a reference to the objects thathave been allocated in memory.

Furthermore, you can inherit fieldsand methods from another class. Thislets you build complex classes fromsimpler classes. Also in Java, a collec-tion of classes that belong together canbe bundled into a package. Two ex-amples of packages are ,

basic classes made up of expectedStrings, and , classes that areused to perform I/O on the system.

Let’s look at some examples of Javaand how this might work. In Java, ev-ery software module is a class. Even ifyou want to code a simple ' ( application (see Listing 2), youhave to define a class. In this case, youhave to define ' .

Listing 3—This example shows that methods in Java can take different argument types and stillsort it out. The implementation of has several versions for the different types, and thedynamic linker figures out which one is called based on the signature.

)*

)

#%& +% ,--

"

"

The ' class is based on the class, which gives youthe ability to write top-level applica-tion classes. In this case, the ' class has only one method, main. Thismethod expects an array of Strings,which is the argument list from a com-mand line, and returns void. Noticethat in order to use Strings, you do notneed to import a class because thecompiler knows that Strings are de-fined in . The onlything the application has to do is callan output method to display the' ( String.

There’s something a little differentin Listing 3. Notice that in this ex-ample, I print integers as well asStrings using the same method. Themethods in a class are matched upusing signatures. The signature of amethod includes the return value, thename of the method, and the argu-ments expected. This means that thereare at least two different print methodsdefined in the class , pub-lic void print(String) and public voidprint(Integer).


LIGHTS ON, LIGHTS OFFI built a class and named it .',

(see Listing 4). The .' class has onlyone field, the state of the light, which Irepresent internally as simply on or off.The actual field is hidden from theoutside and only the methods, /0,/0##, and 1 , are visible.

If you want to use this class in anapplication, define the object to createstorage (see Listing 5). In this case, eachobject contains the state field, so it’ssmall. To use it, call the methods thathave been exported by the class.

Now you have a class that behavesas if there’s a light somewhere. In fact, Idefined several lights in this example.This is great for testing your systembefore you have the hardware to turnlights on and off. When the hardwarebecomes available, you can replace thesimulated .' class with one thatimplements talking to hardware. List-ing 6 shows that I changed the code toturn the lights on and off by making acall to poke, which is a native call thattoggles bits in a hardware port.

This illustrates one use for object-oriented programming. If you combinethe hardware components or sub-systems into objects, it’s easy to re-place objects that implement a testversion of the real thing. In Java, itbinds classes at run time, rather than atcompile time. So replacing the testfunction would simply involve copyingthe right class file into place, or chang-ing the search path to the class file.Whenever you start the software, itwill automatically find the appropriateclass and run.

If you’re seriously considering learn-ing Java and have some C or C++ expe-rience, I recommend reading Thinkingin Java, by Bruce Eckel, which is alsoavailable online. [1]

CLASS FILEAfter you’ve written a module in

Java, you need to compile it into a classfile. This is the object file format forJava and normally ends in extensions. A class file contains all ofthe information a module needs to beintegrated into another program orclass. It also includes what otherclasses and modules it depends on andhints to where the JVM can find them.

In the examples I’ve provided, theclass file that contains the mainmethod is the one you have to startwith. To run these programs, you needto tell the JVM to execute this module.Typically, Java hello will do the job.The JVM will load the class file andlook for the main module that matchesthe signature Void main(String []). Ifthere is no such method in the top-level class, it will not start. Whenfound, it will start executing the codefor the main method. When the JVMfinds a reference to a method or field, itwill look in the class files symbol tableto figure out where to find the class fileto resolve that reference. The JVM

then loads that class file and executesthe referenced method or accesses thefield. If an object of that class is instan-tiated, then the class will get loadedwhen the JVM allocates the memoryfor it, because it needs to resolve howmuch memory the object requires.

A CLOSER LOOKUp to now I’ve mentioned the Java

Virtual Machine in passing, so you’reprobably wondering what it really is.All Java class files contain machineinstructions for a fictional processor(virtual machine). Tim Lindhol andFrank Yellin of Sun Microsystems havewritten a document called “The Java

2

.'2

.' % ( .'

/0

/' 3 '

- -

/0##

/' 3 '

- -

"

"

Listing 5—This is the software-only implementation of the class. It uses a state variable totrack the state of a virtual light.

Listing 4—This is the main program for the Light example. It uses a class called , whichhas three interface methods, , , and .

2

.'

/0

%&

"

/0##

%4

"

# %%4

0##

0

"

"


Virtual Machine Specification,” whichdescribes the class file format and whateach instruction needs to do in order tobe compatible. [2]

The JVM is stack-based and hasseveral different classes of instructionsthat implement the basic data types(byte, char, int, short, long, float, anddouble). There are mathematical in-structions (add, subtract, etc.), logicalexpressions (and, or, xor), conditionaljumps, support for arrays, and instruc-tions to deal with field accesses andinvoking methods. The last class ofinstructions is the most difficult toimplement because fields and methodsare specified symbolically, and theinstructions that deal with them haveto contend with the dynamic loadingand linking of the classes.

NATIVE MACHINE CODEMany compilers produce intermedi-

ate code that can be interpreted orconverted into native machine code foran architecture. There are concernswhen you interpret intermediate code,as is true with naive JVMs. However,most Java instructions can be trans-lated into native machine instructionsefficiently. This code translation canbe done in several ways.

One way is to statically convert allof the code in a class into native ma-chine code. This is typically done with

2

.'

/0

4*56$4*56744&

"

/0##

4*56$4*568944&

"

# 4*56844& %% 4

0##

0

"

$

"

Listing 6—The hardware version of the class uses native C function calls to actuallytwiddle some bits on a hardware port. It has the same name and interface as the software-onlyversion, and the application picks up whichever one is the default in the current directory.

code that you don’t expect to changefor a particular architecture. The draw-back is that Java byte codes are denseand memory efficient. After translatedinto machine code, the code takes upmore space, which might be a problemif your device is memory-constrained.

Another way is to compile the codedynamically, because then the Javamethods are translated by the JVMinto native machine code before itexecutes them. This can be done whenthe class is first loaded, or even when amethod is executed the first time. Thelatter is sometimes referred to as just-in-time (JIT) compilation.

JIT COMPILATIONThere are problems with JIT compi-

lation as well. Obviously, there is aslowdown the first time a method hasto be translated before being executed.Some JIT-based JVMs try to analyzehow often a method is being calledbefore compiling it into native ma-chine code. The strategy is that there issome threshold when translating themethod, and running it in native ischeaper than interpreting it. Some JITswill optimize the code after it’s trans-lated. When translated, you have toallocate memory for the native code.

Performance-enhancing techniqueswere developed for desktop and serverapplications where you need the best


REFERENCES[1] B. Eckel, Thinking in Java,

Prentice Hall, Upper Saddle River,NJ, June 2000.

[2] T. Lindhol and F. Yellin, TheJava Virtual Machine Specifica-tion, Addison-Wesley, Reading,MA, 1997.

Ingo Cyliax is a computer and electri-cal engineer (BSCEE) and the founderof EZComm Consulting, which special-izes in embedded systems and FPGAdesign services as well as troubleshoot-ing. Ingo has been writing about vari-ous topics ranging from real-timeoperating systems to nuts-and-boltshardware issues for several years. Hecan be reached at [email protected].

performance. However, for embeddedand real-time applications, JIT andstatic compiling methods will get inthe way. The natively translated codeis likely to be bigger than the Java bytecodes. All of the Java instructions areexpressed in instructions that are 8 bitsin length. Also, because the machine isstack-based, many frequently-executedinstructions do not have operands.This makes Java byte codes dense.

Also, doing the dynamic compila-tions or JIT introduces uncertaintyabout timing. The first time a methodis invoked or a class is loaded, the sys-tem will go away and translate thecode. Some JITs will cache translatedcode blocks and throw out code if it’snot refreshed by executing oftenenough. Clearly, this won’t work forreal-time applications.

For those, you typically want tostatically pre-compile your methods tonative code or just use a JVM that doesnot perform JIT compilation on meth-ods that need to run in real time.

Fortunately, you don’t have to writeyour own JVM implementation, unless

you’re planning to run Java on a newarchitecture that isn’t supported.There are many commercial and non-commercial JVMs. Sun implementedJVMs on several architectures. IBM andAgilent have JVMs. On the real-time orembedded front, Newmonics and WindRiver have Java-based solutions. Thereare JVMs for 8- and 16-bit micropro-cessor platforms like Dallas Semicon-ductors’ Tini or Smart NetworkDevices. Finally, vendors like Ajile,Sun’s PicoJava, and Derivation Systemssell hardware Java solutions that ex-ecute Java byte codes directly.

ON THE HORIZONNext month, I plan to write about

some of the real-time issues in Java. Inthe third article, I will look at some ofthese JVM solutions in more detail.

As usual, when I introduce a newtopic, keep in mind that a single toolwon’t solve every problem. Just asthere are various CPUs and operatingsystems, there are various languages.

With this said, think of Java as yetanother tool you can put in your em-

bedded systems engineering toolbox tohelp you solve problems. Java makes iteasier to code large projects in a shorttime, but remember that there aresome performance issues. I


EPCApplied PCs

Fred Eady

Rabbit Season

With only one morepart to go in this se-ries, Fred wants tocover a few more ofthe Rabbit Ethernetproject details. So,this month he showsus how the boardhandles TCP/IP andeverything from col-lisions to pointers tocalling home.

rank Sinatra didit his way, and last

month, so did I. Inmy previous article, I

took a CS8900A-CQ Ethernet IC andstrapped it to the back of a Rabbit. TheEthernet IC and silicon bunny combi-nation worked well, although theintercard wiring between the Rabbitand my Ethernet development boardwas a bit tricky. As I pointed out ear-lier in this series, there’s more thanone way to force a Rabbit to generateEthernet packets. Move over Frank,there’s one more Rabbit coming to thestage in this magic show. Tell Alice tobring along her teacup in Wonderland,and let’s follow that silicon hare (wear-ing a TCP/IP stack over its ears) as ithops down the Ethernet trail.

f

Part 4: The Wonderful World of TCP/IP

DYNAMIC C PREMIERThe Rabbit 2000 TCP/IP develop-

ment kit is a single-board computerthat is based on the Rabbit 2000 mi-croprocessor. Although it comes outof the box with its own version ofDynamic C, I happened to have aversion of Dynamic C Premier that Iused instead. Dynamic C Premier iscontained on a single CD and is ac-companied by 1.2″ of excellent docu-mentation. The manual set consistsof an introduction to TCP/IP, TCP/IPhigh-level protocols, TCP/IP functionreference, and the Dynamic C Pre-mier user’s manual.

Everything for the Rabbit 2000hardware set is included with Dy-namic C Premier. All of the samplecode and libraries for the standardRabbit 2000 Basic development kit,the Rabbit 2000 TCP/IP developmentkit, and the Rabbit 2000 core moduleare shrink-wrapped into the package.

Dynamic C Premier also containsfull TCP/IP source code. Almost all ofTCP/IP’s rowdy friends are in on thisparty. ICMP and HTTP with facilitiesfor SSI, CGI routines, cookies, andbasic authentication are all rolled in.Also joining this magic show areSMTP, FTP, and TFTP (client andserver) capabilities. The Rabbit TCP/IP SBC wouldn’t rate without anEthernet interface, so, just to keep mehappy, the folks at Rabbit Semicon-ductor dropped Ethernet drivers forthe Realtek NE2000 chipset into theDynamic C Premier package.

As you read on, you’ll see that I’mgoing to use some of this TCP/IP andEthernet driver software magic to

Photo 1—This proves it. Theyreally aren’t invisible whenthey sit still!


implement a hare-raising Internetappliance that monitors its environ-ment and sends e-mails to inform youof the good, the bad, and the ugly.

The microprocessor bunny on theRabbit 2000 TCP/IP Development Kithops along at 18.432 MHz. A 10BaseTEthernet interface is about the onlything that logically differentiates theRabbit 2000 TCP/IP Development Kitfrom the Rabbit 2000 Basic Develop-ment Kit. All of the timers, I/O ports,and memory resources that come withthe latter are included in the formeras well. A still TCP/IP Rabbit is cap-tured in Photo 1.

REALTEK 8019ASUnlike your CS8900A-CQ-based

home-brewed Ethernet bunny, theRabbit 2000 TCP/IP Development Kituses the Realtek 8019AS. The Realtek8019AS is an inexpensive NE2000-compatible Ethernet IC that’s basedon the National DP8390 NetworkInterface Controller, which (like theCS8900A-CQ) provides all the mediaaccess control layer functions required

for transmission and reception ofpackets in accordance with the IEEE802.3 CSMA/CD standard.

In 8-bit mode, using DMA resourcesto transfer data from the NIC’s on-chipbuffer memory to the host micropro-cessor-controlled memory is not anavailable option on the CS8900A-CQ,but DMA is used to an advantage onthe Realtek 8019AS.

Within the Realtek 8019AS, theonboard FIFO and DMA channels worktogether to form a simple packet man-agement scheme that provides up to10-MBps internal DMA transfers.

A second set of Realtek 8019ASDMA channels is provided to get dataout of the internal buffer ring andinto the microprocessor-controlledmemory for processing. The first setof DMA channels is termed local andthe second set is called remote. LocalDMA channels burst data into andwithin the Realtek 8019AS. RemoteDMA channels help get data from theRealtek 8019AS NIC to the hostmicroprocessor.

Taking a look at Figure 1 and com-

paring it to the CS8900A-CQ imple-mentation in earlier articles, you cansee that physically the Realtek8019AS and the CS8900A-CQ hookup to external support circuitry in analmost identical manner. Just like theCS8900A-CQ, the Realtek 8019ASwas originally designed for majorEthernet applications in desktop per-sonal computers. 10/100 Ethernet ICscame along and washed out the use-fulness of the Realtek 8019AS as amajor desktop player. Because of itsaffinity to embedded-oriented 8- and16-bit microprocessors, the Realtek8019AS (like the CS8900A-CQ) foundits way into embedded applications.

ALMOST ONE AND THE SAMEThe Realtek 8019AS is controlled

through an array of on-chip registers.It isn’t CS8900A-CQ PacketPage tech-nology, but it’s logically identical.Again, just like inside the CS8900A-CQ, the Realtek 8019AS registers areused during initialization, packettransmission, and reception. There arealso registers for remote DMA opera-

Figure 1—An extra address line for the RealTek 8019AS’s internal register set is the major physical difference between implementing the RealTek 8019AS and the CS8900A-CQ.


tions on the Realtek 8019AS thatdon’t exist on the CS8900A-CQ. Byusing the Realtek 8019AS internalregisters you can perform the samelogical operations as when using theCS8900A-CQ’s PacketPage registersduring initialization. These operationsinclude defining the physical address,setting receive and transmit param-eters, and masking interrupts.

On the Realtek 8019AS, add config-uring DMA channels and hacking outtransmit and receive buffer ring areasto the aforementioned list. BecauseDMA is an integral part of the Realtek8019AS’s NIC-to-microprocessorinterface, there must be a controlmechanism or register to act as thetraffic cop for the dataflow betweenNIC and microprocessor memory andthe NIC-to-Ethernet interface. This isdone within the Realtek 8019AS viathe command register (CR), which isused to initiate remote DMA opera-tions as well as data transmission.

On the original CS8900A-CQ de-sign I presented, the interrupt linewas polled by the microprocessor to

check for receive activity. If you takea closer look at Figure 1, you’ll seethat the Rabbit 2000 TCP/IP Develop-ment Kit follows the CS8900A-CQlogic that polls the NIC interrupt line.Only the names have been changed toprotect the innocent. The micropro-cessor senses the Realtek 8019ASinterrupt via polling and consults theRealtek 8019AS’s interrupt statusregister (ISR) to determine what typeof interrupt has occurred.

The CS8900A-CQ generates inter-rupts in the same way, except theregister called the ISR on the Realtek8019AS is called the the InterruptStatus Queue on the CS8900A-CQ.Hmm…I wonder if they call the topon an IC a “bonnet.” (For those of youwho think us Southern boys aren’t“edumacated,” a bonnet is the com-mon term for a hood when cars arethe topic of discussion.)

COLLISION COURSEWe’re working with Ethernet here

and dealing with it is similar towatching a NASCAR Busch Grand

National Race. It’s a foregone conclu-sion that collisions are going to occur.So, the CS8900A-CQ and Realtek8019AS transmit packets are in accor-dance with the CSMA/CD protocol.Both the Realtek 8019AS and theCS8900A-CQ schedule retransmissionof packets on collisions up to 15 timesaccording to the truncated binaryexponential backoff algorithm. TheCS8900A-CQ datasheet calls this thestandard backoff algorithm. After youcut the transmit process loose, eachNIC is on its own until the transmis-sion cycle is aborted or completed.

Assuming that buffer memory isfree for the taking, transmitting pack-ets with the Realtek 8019AS entailssetting up an IEEE 802.3 conformalpacket in memory with 6 bytes of thedestination address, 6 bytes of thesource address, the data byte count,and data.

If you came in on the ground floorof this set of articles, you may recog-nize the above sequence of bits as astandard Ethernet packet. When therequired packet items are laid in by


the controlling microprocessor, thefollowing Realtek 8019AS registersare loaded with TPSR (the packetstarting address) and TBCR0 andTBCR1 (the length of the packet).

Finally, to initiate the transmis-sion, the PTX (transmit packet) bit ofthe Realtek 8019AS command registeris set. If the total length of theEthernet packet is less than 46 bytes,the Realtek 8019AS (and theCS8900A-CQ) will automatically padthe packet to avoid sending a runtpacket onto the network. If you’reinto research and debugging, by con-figuring the right bits in the rightregisters, you can tell either theCS8900A-CQ or the Realtek 8019ASto send the runt anyway.

All of the Realtek 8019AS acro-nyms I just exposed you to can beeffectively put into place by followingthe setup in Figure 2.

The CS8900A-CQ is a little differ-ent in that the on-chip buffer space isasked for and permission is granted toload the buffer area before any data istransferred for transmission. This iscalled a bid for buffer space in theCS8900A-CQ documentation. TheRealtek 8019AS documentationstresses that if the buffer ring area ofthe Realtek 8019AS is set up correctlyat initialization, there should never beany contention for transmit buffermemory during normal operations.

The transmission of data to theether is an area that both NICs (theCS8900A-CQ and the Realtek8019AS) run in parallel as far as opera-tion is concerned. Before jumping intothe ether, both NICs will check them-selves to see if they are receiving.

If the coast is clear, the CS8900A-CQ starts transmission of the 8-bytepreamble after a specified number ofbytes (5, 381, 1021, or all) are loadedinto the transmit buffer. Note thatthis threshold is determined by bitsettings in the CS8900A-CQ TransmitCommand Register.

The Realtek 8019AS uses its DMAchannels and FIFO to follow the pre-amble with valid data. The Realtek8019AS DMA bursts data to the FIFO,which is then serialized out asclocked NRZ data. Then, the Realtek8019AS DMA refreshes the FIFO

when the FIFO additional maximumthreshold is reached. The FIFO thresh-old is register-programmable.

In both cases, each NIC continuesthe transmission as long as the trans-mission byte count in the registers aregreater than zero. After all bytes aresent, the CRC is calculated by eachNIC and sent to complete the packet.

If a collision occurs during trans-mission, the transmission is stoppedand 32 ones (a jam sequence) aretransmitted to make sure the segment

knows a collision just took place. Thestandard backoff algorithm is ex-ecuted by the CS8900A-CQ and theRealtek 8019AS, and the transmissionis tried again. When the transmissionfinishes, both NICs have transmitstatus registers that can be queried asto how the transmission went.

Receiving data from the ether is asimilar process for each NIC, as well.Both NICs listen to the wire, sense acarrier, and start synchronizing withthe alternating ones and zeros pre-


amble. After the two consecutiveones of the SFD (start of frame de-limiter) are sensed, the preambleends and the NICs expect everythingbehind this set of ones to be validdata. Both NICs check the destina-tion address to see if the incomingpacket is addressed to them. If it’snot, then the packet is moved intobuffer memory and subsequentlydiscarded.

On the other hand, if the packetdestination address matches theNIC’s address filter setting (hashed orindividual), the packet is moved intomemory so it can be transferred tothe microprocessor memory for pro-cessing. If everything goes OK duringthe receive cycle, both NICs postreceive statuses in their respectiveregisters and generate an interrupt tolet the microprocessor know that anevent needs attention.

SOME POINTERSThe Realtek 8019AS differs from

the CS8900A-CQ in that the datacoming into the Realtek 8019AS isput into a receive buffer ring,whereas the CS8900A-CQ stuffs thedata into a flat predefined buffer area.The CS8900A-CQ method is validand there is nothing special about theRealtek 8019AS’s ring buffer. It’s aclassic circular, head and tail bufferscheme. The following four pointerscontrol the ring:

• PSTART—the beginningaddress of the buffer ring

• PSTOP—the address atthe end of the buffer ring

• CURR—the current pagepointer, which points tothe next available bufferarea for the next incom-ing packet

• BNRY—the boundarypointer, which points tothe next packet to beunloaded from the bufferring

The buffer ring size isdetermined by the bytesbetween PSTART andPSTOP. PSTART andPSTOP are loaded at initial-ization time. As packets

come in, the CURR pointer movesahead of the BNRY pointer around thering. If CURR reaches BNRY, thebuffer ring is full. All receptions areaborted and missed packet registers areupdated until this condition is cleared.The remote DMA channels help re-move packets from the buffer ring. Theoriginal datasheets tell you directlythat enough receive buffer memoryshould be allocated initially to avoidthe CURR = BNRY condition.

Each Realtek 8019AS buffer is256 bytes long. A valid received packetand a 4-byte offset are placed at loca-tion CURR. Buffers are automaticallylinked together to receive packetslarger than 256 bytes. When all thebits are in, the receive status register(RSR), a pointer to the next packet,and the byte count of the currentpacket are written into the 4-byteoffset. That’s basically how theRealtek 8019AS works.

To be honest, I was disappointedwith the Realtek 8019AS datasheetsand support. I realize that, as a profes-sional, I’m supposed to know aboutparts like the Realtek 8019AS. So far,I haven’t received an airline ticket toTaiwan for a personal briefing. So,how do other people learn how tomanipulate this IC? Besides, whodoesn’t know that NE2000 andDB8390 don’t correlate?

The Realtek 8019AS datasheet onthe web site leaves much to be desired.

It states, “This documents only thedifferences between the 8019 and the8019AS.” Nothing plus nothing equalsnothing. So, you know me, I fired offan e-mail to Realtek asking for moredetailed documentation. They pointedme to the DP8390 where I looked at aNational AT/Lantic datasheet. Thisturned on the gas but didn’t boil thewater. Not yet satisfied at this point, Iwrote a second note to Rabbit Semi-conductor. I figured if they were usingthis part, they had to know how tomake a go of it.

JUST LIKE MAGIC…Well, what do you think I ended up

with? If you guessed more DP8390stuff, you’re right! In the end, every-thing I received from all partiesproved useful. I found that the codeoffered in the National DP8390 data-sheets from Rabbit Semiconductorwas portable as well as informative.All of the datasheet material descrip-tions are posted at the end of thisarticle for your reading enjoyment.

I’m amused that, by putting theNE2000-compatible label on theRealtek 8019AS, you’re expected toknow what to do with it. Get a grip.I’m not caught in an episode of StarTrek here but if I was, my TriCordermust not be working. If you want tosee how the Rabbit 2000 TCP/IP de-velopment kit brings up its onboardRealtek 8019AS, just start some of theRabbit’s TCP/IP sample code and stepthrough it. When you do that, theRealtek 8019AS packet driver sourcecode will magically appear.

RABBIT PHONES HOMENow that you have a sense of what

the Rabbit 2000 TCP/IP developmentkit is and some knowledge of theRealtek 8019AS Ethernet IC that re-sides on it, let’s get on with a descrip-tion of the supporting hardware you’lluse to make it send e-mail.

If your Rabbit was in a briar patch(a remote location void of otherfriendly cottontails), and it sensedsomething was not as it should be, itwould most likely hop to the nearesttelephone to dial, connect, and log onto the ISP, compose an e-mail, and fireit off. When the e-mail was en route,

Destination address6 bytes

Source address6 bytes

Type length2 bytes

Data 46 bytes minimum

TPSR

TBCI

Figure 2—You can fiddle around with the transmit buffer’s locationand size on the RealTek 8019AS. This is fixed on the CS8900A-CQ.


your silicon bunny would return to itspost in the briar patch to listen forevents to write home about. The prob-lem is, there’s no modem includedwith the Rabbit 2000 TCP/IP develop-ment kit to help any of the onboardserial ports send data or e-mail viadialup through the Internet. And, ifthere was a modem, there’s no dial-upIP software to make the connection tothe ISP. Basically, it would be outthere collecting intelligence at Rabbitspeed with no way of communicatingits findings to Cottontail Central.

Well, I’ve been talking about devel-opment platforms from the beginningof this article. So, why not bring onone more development board to solvethe modem problem?

You’ll need a modem that can dowhat desktop modems do in an em-bedded size and price range. For the e-mail packets the bunny will besending, the modem you select won’thave to be fast, but it would be nice tohave all the goodies you’ve come tolove on internal and external personalcomputer models. For instance, theAT command set is a lot easier to dealwith than some obscure proprietarymodem command set.

Another factor is the location ofthe briar patch. Your modem shouldbe just as at home with a Texas jack-rabbit as it would be with a nosetwitching Aussie puddle jumper.Thanks to my peers at Circuit Cellar,the search for a suitable modem endedquickly; Jeff Bachiochi wrote aboutthe Si2400 in issue 117. Silicon Labs’V2.BIS ISOmodem is imitating aninvisible Rabbit in Photo 2.

ISOMODEMWhat you see in

Photo 2 is the Si2400evaluation board. Look-ing closer at the photoyou’ll notice two SOICpackages, the Si2400and the Si3015, sepa-rated by a couple oflarge SMT capacitors.Those four parts areessentially the wholeenchilada. Actually, theSi2400 ISOmodem is achipset that integrates aglobally programmable

telephone line DAA (direct accessarrangement). Normally, I list thingsthat are included with a whiz-bangpart. With the Si2400 ISOmodemchipset, the following are what youdon’t need to include:

• DSP data pump• modem controller• AFE (analog front end)• isolation transformer• relays

Photo 2—I’ve never heard bunnies babble, but if they do, the Si2400ISOmodem can send it over.

• optoisolators• 2- to 4-wire hybrid• voice codec

Now here are the things the Si2400ISOmodem chipset does without theprevious list of items:

• 2400-bps V.22bis• 1200-bps V.22, V.23, Bell 212A• 300-bps V.21, Bell 103• V.25 fast connect• V.23 reversing• security protocols including SIA• Caller ID detection and decode• DTMF tone generation and detection• UART functionality with flow control• globally programmable integrated

DAA with parallel phone detect,overcurrent protection, and capaci-

tive isolation• AT command set• integrated voice codec• call progress support• HDLC framing in hardware• PCM DATA pass-through mode• 3.3- or 5-V power


Fred Eady has more than 20 years ofexperience as a systems engineer. Hehas worked with computers and com-munication systems large and small,simple and complex. His forte is em-bedded-systems design and commu-nications. Fred may be reached [email protected].

SOURCESRealtek 8019ASRealtek Semiconductor Corp.(886) 3 5780211Fax: (886) 3 5776047www.realtek.com.tw

Si2400 ISOmodemSilicon Laboratories(877) 444-3032Fax: (512) 416-9669www.silabs.com

S-7600ASeiko Instruments USAElectronic Components Division(310) 517-7771Fax: (310) 517-7792www.seiko-instruments-usa.com

Ethernet development boardE D Technical Publications(800) 499-3387Fax: (321) 452-1721www.edtp.com

Rabbit 2000 TCP/IP development kitRabbit Semiconductor(530) 757-8400Fax: (530) 757-8402www.rabbitsemiconductor.com

RESOURCEApplication notesAN-475, AN-874, AN-93

Photo 3—Here are 48 pins of Internet Express.

MP

U in

terf

ace

Physical layerinterface

Network stack

PPP

IP

TCP

SR

AM

inte

rfac

e

SRAM 10 KB

RESETX

CLK

SD(7:0)CSPSXC86RSREADXWRITEXBUSYXINTCTLINT1INT2X

CT

SX

DS

RX

RI

RX

DD

CD

DT

RIX

RT

SX

TX

D

UDP

It’s a good thing Rabbits can’tspeak, or the Si2400 ISOmodemwould come as standard equipment onevery bunny in the field. Voice codecis nice, but I won’t be using it here.

The silicon bunny now has some-thing to say and a new modem, butwith the equipment it has now, itcan’t get any e-mail through to theISP. That’s because there’s no protocolin place to complete the ISP connec-tion. If you back up a bit, you’ll noticethat I didn’t mention anything aboutPPP (point-to-point protocol) being aRabbit virtue. Most ISPs use PPP fordial-up connections today. Well, Iheard on the QT that the Rabbit pro-gramming staff is close to releasingPPP code for the Rabbit 2000 TCP/IPdevelopment kit. Just in case itdoesn’t hit the streets by press time,I’ll revert to Plan B from the Floridaroom. Plan B entails the implementa-tion of Seiko Instruments’ S7600Aengine for the Rabbit (see Photo 3).

The iChip S-7600A in Photo 3 hasa TCP/IP protocol stack built into itshardware that provides TCP/IP func-tionality for small microprocessorsthat are unable to support a full TCP/IP stack and their application codingat the same time. The Rabbit is notincluded in the small microprocessorequation because it can indeed sup-port a full TCP/IP stack and carry onwith its daily duties as well. Rightnow, all you need the S-7600A to dois provide the PPP functionality thathasn’t been found in the Rabbit’swinter coat yet.

PLAN IN MOTIONAs you can see in Figure 3, the S-

7600A takes Rabbit data in and spitsPPP, TCP/IP, and UDP out the otherend to a modem interface. I’ll have toside with Steve Ciarcia on this one.I’d rather solder in the S-7600A thanwrite the TCP/IP stack code.

So, the plan’s set. The Rabbit 2000TCP/IP development kit includes thecapability to send e-mail using SMTPcode in the Dynamic C Premier librar-ies. The Rabbit 2000 TCP/IP develop-ment kit has ample I/O to senseevents that will trigger and send e-mails. Si2400 ISOmodem hardware isstanding by to affect the ISP connec-tion and the S-7600A is ready to fill inif you don’t get the PPP native codefor the Rabbit 2000 TCP/IP develop-ment kit before press time.

The ISOmodem evaluation board isavailable for $150. Considering theneed for FCC approval of the DAA, Iopted to purchase the eval board forthis development project. I held off onthe S-7600A evaluation kit because

$199 was too expensive for what itoffers in terms of hardware, and itassumes that an ISA interface wouldbe used for evaluation. I wasn’t readyto pay for an ISA FPGA I may neveruse. Seiko did stick a standalone S-7600A on the board but that wasn’tenough to sway me. So, next time I’llconstruct a nifty little S-7600A boardwe all can afford, send e-mail from theRabbit 2000 TCP/IP development kit,and again prove without a shadow of adoubt that it really doesn’t have to becomplicated to be embedded. I

Figure 3—There’snot much I need tosay about thispicture.

http://www.realtek.com.tw

http://www.silabs.com

http://www.seiko-instruments-usa.com

http://www.edtp.com

http://www.rabbitsemiconductor.com


Using a T6963 Controller-Based Graphics LCD Panel

FEATUREARTICLE

rWith the latest ad-vances in display-panel technology, youwouldn’t think thatBrian would have ahard time finding asimple 200 × 64 pixeldisplay. But, findingone that was easy todesign into a micro-based project had itschallenges.

odneyDangerfield’s fa-

vorite expression is “Idon’t get no respect,” a

sentiment I’m sure LCD panels wouldshare if they had feelings. Consumersown so many gadgets containingLCDs that they don’t give them asecond thought. Like most electronicsenthusiasts, I take them for grantednow that I’ve used so many $10 alpha-numeric LCDs. Had I not recentlyreplaced my laptop with one thatfeatures a 15″ XGA screen, I probablywould feel blasé about that LCDpanel, too.

When faced with designing an in-strument incorporating a modest-sizedgraphics LCD, this cavalier attitudeevaporates quickly. I recently de-signed an instrument that needed agraphics display with about 200 hori-zontal pixels and at least 64 verticalpixels. My first thought was that theremust be tons of small LCD panels lan-guishing in warehouses, the victims ofa technology moving so quickly thatmany products become obsolete al-most as soon as they are introduced.Unfortunately, many of those LCDpanels are hard to design into theaverage micro-based project.

In this article, I’ll outline the expe-rience I gained while designing a “se-rial backpack” that makes it easy to

Brian Millier

use a popular LCD panel in smallmicro-based projects.

THE PLAYERSI searched the Internet for available

devices and learned that they fall intothree broad categories including VGAand sub-VGA panels, NTSC videopanels, and LCD panels that containonboard controller and memory.

The first panels, VGA and sub-VGApanels, lack both an “intelligent” con-troller and raster memory. They aredifficult to use in a project containingonly a conventional microcontrollerbecause they must be constantly re-freshed and the timing is critical. Theyare designed to use with a dedicatedcontroller LSI chip and usually areinterfaced to a full microprocessor bus.I tip my hat to those Circuit Cellarauthors who have managed to use thistype of display with a PIC micro andstill have some CPU time left to douseful tasks. Although not for the faint-hearted, their advantage is that theycan be obtained on the surplus marketfor a song.

NTSC video panels are designed tobe driven with standard TV videosignals and have an aspect ratio similarto that of a standard TV screen. Theycome in all sizes from 0.75″ to about5″ and cost about $50 on the surplusmarket. These displays are best left toapplications where NTSC video is themain requirement. Incidentally, if youneed video with a text overlay, con-sider Decade Engineering’s BOB-II.

The third category, LCD panelsthat contain an onboard controller andmemory, generally use a CMOS LSIcontroller such as the Toshiba T6963,Sanyo LC7981, S-MOS SED133x se-ries, or Samsung KS0708. Unlike thedevices mentioned earlier, these unitshave onboard raster memory, whichmeans that they don’t require constantrefreshing. This relieves your project’smicro of a time-consuming chore. Ofthe three controller chips, the ToshibaT6963 has the best mix of features,availability, and documentation.

THE T6963 CONTROLLERThe T6963 controller isn’t an “in-

telligent” controller, so don’t go look-ing for a screen clear command, line


draw command, or automatic cursorhandling. Anything more involvedthan putting up a text character on thescreen requires firmware routines. Thegood news is that the T6963 has awell-designed memory architectureand instruction set that provide effi-cient text and graphics routines. Un-like some other controllers, the T6963has a character generator built-in.

An important feature is the screenmemory organization. The T6963 cancontrol a 64-KB SRAM directly. Com-monly this controller is used with 240× 64-pixel LCD displays and uses an8-KB SRAM, which provides at leasttwice as much memory as is neededto perform all functions.

Another significant feature of thiscontroller is that it uses separatememory areas for its text and graphicsbitplanes. You can, for example, erasetext without disturbing underlyinggraphics images. Or, you may drawdynamic graphics displays such ascharts and graphs without worryingabout clobbering text labels in thesame region of the screen.

The custom character generatorarea (CGRAM) is a third memory area.In normal text operation, the T6963

uses an internalROM-based charactergenerator for the first128 characters of itsfont. The last 128characters are definedby patterns that youmay download intothe CGRAMmemory.

There also is anexternal charactergenerator mode forusers who want todefine a complete256-character font.This could be usefulfor a foreign language,for example. Eightbytes per characher ofSRAM memory isneeded for this func-tion. That’s a maxi-mum of 2048 byteswhen using the exter-nal character genera-tor mode.

The location and size of each of thethree memory areas is user-defined. Inother words, you must send the T6963commands to set up pointers to thethree regions during the initializationphase of your program.

The T6963 controller is interfacedto the host via an 8-bit parallelinterface, using two registers,command/status and data. Toaccess the raster SRAM usingan interface such as this, youmust first issue a commandthat loads the desired memoryaddress. Then you must issuea command that reads orwrites to that address.

This requires quite a fewdata transfers just to read orwrite one memory byte. Tospeed up the process for se-quential memory accesses,there are two alternatives.There are data read/write com-mands that automatically in-crement the SRAM memorypointer every time they areexecuted, which eliminatesthe need to send a newmemory pointer for eachmemory access.

And, for large, sequential accessesto the SRAM memory, there is thedata auto mode. After the data autowrite command has been sent, youwrite a continuous stream of data tothe T6963 data register, which thenloads it into sequential SRAM loca-tions. Alternatively, using the dataauto read command followed by arepeated reading of the data registerwill return contiguous SRAMmemory data. To exit this mode,there is the data auto reset command.

Text is placed on the screen bycalculating the proper text memoryaddress to correspond to the desiredrow and column and then writing thedesired character code to this address.The only caveat is that the charactercodes that Toshiba uses are not theASCII character codes; instead, theT6963 uses the ASCII code minus 32.Because the first 32 ASCII charactercodes are not displayable control char-acters, there is logic to this decision.It frees 32 character codes in the up-per 32 locations of the ASCII 7-bitcharacter set.

Toshiba uses this area for accentedvowels and currency figures that areuseful in an international context. Itwould have made more sense to placethese international characters in thelower 32 ASCII code locations (in

1 FGND To bezel

LCD controller

U1

11 D011 D018 D75 *WR6 *RD7 *CE8 C/*D10 *Reset19 FS

DecoderU6

SRAMU7

Data addresscontrol bus

Bias circuit

ComU2

SEGU3

SEGU4

SEGU5

240 × 64 LCD panel

EL/LED backlight

CCFL backlight

64

80 8080

9 A

20 K

VFL

VFL

R18

R19

2 VSS

3 VDD

4 VEE

LCD driver control signals

Bias voltage for LCD

Figure 1—Although not labeled as such, U1 is a Toshiba T6963 controllerin the AZ Displays’ AGM2464D LCD panel.

Pinnumber Symbol Function

1 FGND Frame ground (0 V)2 VSS Ground3 VDD Logic supply (5 V)4 VEE Negative supply for LCD5 –WR Data write6 –RD Data read7 –CE Chip enable8 C/–D H—command/status

L—data9 A Backlight anode10 –RESET Controller reset11 D0 Data bit 012 D113 D214 D315 D416 D517 D618 D7 Data bit 719 FS Font select

H—6 × 8, L—8 × 820 K Backlight cathode

Table 1—The T6963 controller uses Intel-style interface signals.Note that the LCD backlight signals may differ on other displayswith different or no backlight.


Item

C/*D setup time

C/*D hold time

*CE,*RD,*WR clock width

Data setup time

Data hold time

Access time

Data output hold time

Item Condition Min. Max. Unit

tCDS Figure 100 _ ns

tCDH Figure

Figure

Figure

Figure

Figure

Figure

ns

ns

ns

ns

ns

ns

150

50

_

_

_

_

tDS

tDH

tACC

tOH

_

10

10

80

80

40

tCP, tRP,tWP

C/*D

tCDS tCDH

tCP,tRP,tWP

tDS

DO~D7(read)

DO~D7(write)

*RD,*WR

*CE

tDH

tOHtACC

Figure 2—If you are using a fast microcontroller, you should note theaccess time specification of the T6963 controller.

place of the control characters), thusleaving the commonly used charactersaccessible by using their proper ASCIIcodes.

An additional enhancement is avail-able when you select the text attributemode using the mode command, whichI’ll describe later. In this case, you losegraphics capabilities, but can writevarious bytes to the graphics bitplaneto modify the way each text characteris displayed.

Text cursor control is flexible, butmust be handled entirely by your firm-ware routines. You can define eightdifferent cursors from a simple under-line to a full block, turn on/off thecursor, or make it blink, all under pro-gram control. Unlike those alphanu-meric LCD modules you’ve probablyused, the T6963 does not move itscursor along with the text while youplace it on the screen. When you writea routine that places an ASCII textstring on the screen, you may want toinclude some code that places thecursor to the right of the lastcharacter displayed.

There are two methods ofplacing graphics data on thescreen. For bitmapped graph-ics, you can transfer thebitmap data directly to thegraphics area of the T6963SRAM. This is straightfor-ward; the T6963’s graphicsformat is similar to that usedby the Windows BMP imageformat. I’ll cover this in moredetail later.

The second method, thebit set/reset command, placesindividual points on thescreen and draws lines. Touse this command, you basi-cally convert the x, y (i.e.,Cartesian) coordinate of thatpoint into the proper graphicsmemory location, then usethe appropriate bit set/resetcommand to access the de-sired pixel within the 8-pixelrange represented by thatmemory location.

The T6963 is a slowCMOS device and it mustconstantly refresh the multi-plexed LCD screen (which is

connected through several ASIC de-vices). For this reason, it is not alwaysready to accept commands or datathrough its 8-bit parallel interface. Toaddress this, the T6963 supports astatus register, which informs thehost of its readiness to accept furthercommands or data.

The rule is: check the status regis-ter and wait until the controller isready before sending a command ordata byte to the T6963. The only com-plication is that both bits 0 and 1 ofthe register must equal 1 in order toproceed with any command or datatransfer except when using the dataauto mode. In the latter case, bit 2must equal 1 for data reads and bit 3must equal 1 for data writes.

Alphanumeric LCD panels havesimilar timing restraints. When usingthem, I often employed software delayroutines in lieu of the status checkingroutine to reduce the number of I/Olines needed (i.e., the –RD line) and toeliminate the need to switch the data

direction of the port used by the hostas the data bus. According to Toshiba,you don’t want to try that scheme onthis controller! In fact, to reinforcethat rule, Toshiba does not indicatethe amount of time to perform any ofthe commands.

T6963-BASED LCD PANELThe T6963 LCD controller chip can

handle LCD panels up to 640 horizon-tal pixels by 256 vertical pixels. Ichose AZ Displays’ AGM2464D panel(240 × 64 pixels) because it is reason-ably priced and supported by gooddocumentation in PDF format on thecompany web site. The following dis-cussion applies to the T6963 control-ler as it is configured in Figure 1.

If you’re doing prototypes or smallproduction runs, it’s important to notethat this display is easily connected toyour target board via a 20-pin ribboncable terminated in a 10 × 2 IDCheader. The unit contains a convenientLED backlight that requires only the

usual 5 V at about 220 mA.Units that use an electrolu-minescent (EL) backlightrequire less power, but need a600- to 800-V inverter circuit.

The T6963 controlleronboard the AGM2464DLCD panel interfaces to themicro via an 8-bit data bususing Intel standard controlsignals *WR, *RD, and *CSand register select line C/–D.Table 1 shows the pinout ofthe interface connector. Thenegative reset line may beleft unconnected, because thedisplay has an onboard powerreset circuit.

The interface timing isshown in Figure 2. Note thatthe T6963 has a 150-ns readaccess time, which is slowerthan the bus timing require-ments of many microcontrol-lers available today. Iexperienced intermittentproblems when I tried drivingthe T6963 using the Atmel90S8515’s (with a 7.37-MHzclock) extended SRAM busmode with wait states en-abled. When I compared


Atmel’s bus timing diagram tothe T6963, apparently therewasn’t a timing violation. But Iexperienced sporadic data errors.When using this mode, I couldn’tconnect my oscilloscope’s 10×probe to the *WR line withoutdisturbing the data transfers!

For this reason, I recommenddriving the T6963 using directport I/O (i.e., one 8-bit port for thedata bus and several lines of anotherport for the control lines). In thisconfiguration, the *CS line can be tiedto ground. Conveniently, because theT6963 is slow, using this slowermethod of I/O access does not imposea performance penalty.

SELECTING FONT SIZEThe T6963 controller chip has two

font select lines, FS0 and FS1. Theseare character width select lines, asthere is only one font built into theT6963’s CGROM and it uses a 5 × 7matrix for the actual character pattern.However, by changing the levels on theFS0 and FS1 lines, you can change thecharacter spacing to four differentwidths of five, six, seven, or eight.Note that the character size is alwayseight pixels vertically.

On the AGM2464D, the FS0 line istied low, reducing the choice of widthsto either six or eight. The remainingFS1 line is brought out to the interfaceconnector as the font select line.When tied to VCC it produces a charac-ter width of six andhas a width of eightwhen grounded.

It’s tempting tochoose the characterwidth of six becauseit closely matchesthe 5 × 7 characterfont and results in a40-character-widedisplay. However,there’s a side effectwhen using thegraphics displayfunctions. In thiscase, the graphicspixels are organizedhorizontally ingroups of six on thescreen. When dis-

playing bitmaps (and to a lesser ex-tent, drawing lines), the required algo-rithms are slower and complexbecause of this mismatch with the 8-bit byte size. For this reason, I config-ured the display for a character widthof eight pixels.

THAT PESKY VEE REQUIREMENTUnlike those alphanumeric LCD

panels that you’ve probably used,most graphics panels, including thisone, require both a 5-V (VCC

) and anegative (VEE) supply. The negativepower supply must have several im-portant characteristics. It must bestable, but adjustable over a rangefrom approximately –8 to –14 V.

Achieving optimum contrast re-quires a different VEE at different am-bient temperatures, so it is ratherimportant to be able to adjust VEE

using an accessible pot or softwarecommands. The VEE current require-ment is less than 3 mA.

Before you hook up your new LCDpanel to your bench supply, be awarethat power supply sequencing is im-

portant to prevent damage to theLCD. For this particular unit, it sim-ply means that VEE

must be kept offfor at least 20 ms after the VCC supplycomes up and must disappear beforethe VCC supply shuts down. Becausethe LCD unit draws little currentfrom the negative supply, using amulti-output bench supply almostguarantees that the sequencing re-quirement will be violated, causingthe LCD to be damaged. I didn’t takeany chances, and designed a VEE sup-ply based on a MAX749 digitally-adjustable LCD bias supply.

INITIALIZING THE CONTROLLERWhen the T6963 is powered, it is

not configured and the LCD panelattached will display nothing. Evenafter the controller has been properlyinitialized, the LCD panel still willdisplay garbage until the text orgraphics areas of the RAM is cleared.

When testing your new LCD panel,it makes sense to start by executingthe initialization code without thescreen clear routines. And then use

the resultant garbage-filled screen to adjustVEE

for the best con-trast. There is a nar-row range of VEE

thatproduces a readabledisplay.

As mentioned ear-lier, the T6963’s statusmust be checked priorto writing commandsor data. Therefore, inall of the followingcommand sequences,it is assumed that thestatus checking rou-tine precedes each bytesent. With that inmind, the data writeand command write

Figure 3—If you follow my initialization recommendations, the graphics memory organizationwill look like this.

2016

0

32

MSB LSB

240 × 64 data matrix

07 06 05 04 03 02 01 00 07 06 05 04 03 02 01 00

240 data

29

2045

64 DOT

UnusedRAM30, 31

2046, 2047

Step Command description Parameter 1 Parameter 2 Command byte

1 Set graphics home address LSB (0) MSB (0) 0x422 Set graphics area N (0x20) 0 (0) 0x433 Set text home address LSB (0) MSB (0x08) 0x404 Set text area N (0x20) 0 (0) 0x415 Set offset register (for CGRAM) M (02) 0 0x22

Table 2—This table outlines some of the commands used to initialize the controller. The numbers in parentheses inthe parameter columns are the specific values that I used.


routines that I wrote include the sta-tus checking code at the start of eachof those two routines.

The T6963 expects commands tobe sent to it with the data parameterbyte(s) preceding the actual commandbyte. Commands require 0 to 2 pa-rameter bytes. In the following com-mands, the parameters must be sentto the T6963’s data register and thecommand (last byte) must be sent tothe command register.

Now, I’ll outline the procedureneeded to initialize the T6963. If youwant to read more detail, check outthe T6963 datasheet and a detailedT6963 application note available onmy web site.

INITIALIZATION DETAILSThe first issue is setting the graph-

ics home address. Because graphicsoperations require the most calcula-tions, it makes sense to place thegraphics home address at RAM ad-

dress 0. That eliminates the need toadd an offset for graphics operations(see Table 2).

Setting the graphics area is thenext issue. Because the T6963 cancontrol many different sizes of LCDpanels, you have to tell it how manyRAM bytes are needed for each hori-zontal line. For a 240-pixel display,this is 30. As I’ll explain later, itmakes sense to use a value that is thenext largest binary number (32). Al-though this wastes some RAM space,it’s moot because the 8 KB used in theAGM2464D is twice as big as needed.

Then, you want to set the texthome address. You can place the textRAM anywhere, but I choose to placeit directly after the graphics RAM.The graphics RAM is 32 bytes × 64lines, or 2048 bytes, so the text RAMcan be placed at 0x0800. Check outTable 2 for the details.

Setting the text area is the nextstep. Like the graphics area, the text

area must reflect the LCDpanel size. For a given panelsize, the number of charactersdisplayed per line depends onthe font width selected. Forexample, using a 240-pixeldisplay and a character width of8 (as I configured theAGM2464D), the number ofcharacters per line is 30. Set

the text area (desig-nated N in Table 2) tobe equal to or largerthan this number.

Similar to the graph-ics area set command,it’s advantageous touse the next largestbinary number (32).Note again, this wastes2 bytes of RAM perline, but the routinesthat convert the textrow and column into atext RAM address aresimplified.

Now, let’s talkabout setting the dis-play mode. This com-mand sets up the waythe controller handlesseveral features. Thereare individual text and

graphics bitplanes, and you mustselect which one (or both) will beused. This command also controlswhether or not the text cursor is vis-ible and whether or not it will blink(see Table 3).

Table 4 states another commandthat sets up other aspects of the dis-play mode not covered by the preced-ing command. I use the value 0x80,which enables both text and graphicsbitplanes in OR mode, and uses theinternal CGROM for the first 128 textcharacters. The OR mode simplymeans that for a given screen loca-tion, if either a text pixel or graphicspixel has a value of 1, then the screenwill be black at that location.

The set offset register command(start address of CGRAM) is optional,needed only if you are defining cus-tom characters. The external CGRAMmode requires 8 bytes per character(2048 bytes total). Place the CGRAMstarting address at some location inRAM that will not interfere with thetext and graphics RAM area. Thisaddress is expressed differently thanthat used by both the text and graph-ics start setup commands. The valueM, shown in Table 2, is as follows:

M = 0 for 0x0000 startM = 1 for 0x0800M = 2 for 0x1000

Figure 4—The serial backpack comprises only four ICs, including the optional serial EEPROM, which stores only bitmap images.

Table 3—For the display mode command, I use the value of 0x9Eto enable text, graphics, and a non-blink text cursor.

D7 D6 D5 D4 D3 D2 D1 D01 0 0 1 GRPH TEXT CURS BLNK

If GRPH is 1, graphics bit plane is activeIf TEXT is 1, text bit plane is activeIf CURS is 1, cursor is visibleIf BLNK is 1, cursor will blink (if visible)


Photo 1—I developed a Visual Basic program to test the serial backpack. It alsodownloads bitmap images to the backpack’s serial EEPROM.

M = 3 for 0x1800

and so on up to the sizeof the RAM fitted on theLCD panel.

After the precedinginitialization has beendone, the LCD screenwill be active and filledwith random text charac-ters superimposed onrandom graphics patterns.To clear the screen, fillboth the text and graphicsRAM areas with zeros.This is the perfect placeto use the data automode. Check out the steps of thegraphics clear screen example (withgraphics screen at RAM address 0).

• set address pointer to graphics RAM start (0x0, 0x0, 0x24)• set data auto mode on (0xB0)• execute a loop that sends 2048 zeros to the T6963’s data register• exit data auto mode (0xB2)

Clearing the text screen is similar.Set the address pointer to the textRAM start address and write 256 ze-roes to the data register.

BASIC GRAPHICS OPERATIONSGraphics operations can be broken

down into two categories, vector op-erations (drawing points and lines) anddisplaying bitmap images. Let’s look atvector operations first. Because thebasis for all vector operations is theplacement of a point on the screen, I’llstart there.

A common operation for all vectoroperations involves converting an x, ycoordinate(s) into an offset in thegraphics RAM memory space. Fig-ure 3 shows the graphics screen mapwhen using a 240 × 64 LCD and thevalues recommended previously.

The offset into graphics RAM is(x\8) + 32 × y, where \ denotes an inte-ger divide. Because you want fastgraphic operations, you can speed upthis conversion by substituting inher-ently fast shift operations for theslower divide and multiply opera-tions. Thus, the offset into graphicsRAM is (x SHR × 3) + y SHL × 5,

where x SHR × 3 denotes three rightshifts on x (you may use an 8-bit shiftroutine) and y SHL × 5 denotes fiveleft shifts on y (you must use a 16-bitshift routine).

You use this offset with the setaddress pointer command to point tothe proper RAM location. The com-mand structure is LSB, MSB, 0x24,where LSB and MSB are the lower andupper bytes, respectively, of the offsetjust calculated.

Now that you’re pointing at thecorrect location in graphics RAM,you must determine which bit atthat location must be set or reset andissue the correct bit set/reset com-mand. To place a dot, use the follow-ing command:

bit set/reset = 0xF8 + ([255 – x] AND 7)

To erase a dot, use:

bit set/reset = 0xF0 + ([255 – x] AND 7)

DRAWING LINESUntil you’ve actually written the

code, it’s easy to assume that draw-ing lines wouldn’t be more difficultthan cobbling together some loopsusing the routine for placing pointsdescribed earlier. This is true only fortwo special cases—vertical and hori-zontal lines. Angled lines requiremore attention.

I’m not a mathematician. I’m surethere have been many algorithmsdeveloped to break down a line into aseries of points. However, knowingthe way that the T6963 controller

organizes its graphicsmemory and the best wayto use the AtmelAT90S8515’s instructionset, I designed my ownroutines that work effi-ciently.

The line draw routinepasses through the x, ycoordinates of the end-points of the desired line.First the endpoints arechecked to see if either ahorizontal or vertical lineis being specified; if so,jump to dedicated rou-tines, which are faster

than the ones for angled lines.For angled lines, the routine used

will depend on the angle of the line.In theory, you should divide the 360°into eight 45° zones. These zones canbe reduced to four if you allow thestart and end point of a line to beswapped in certain cases. This is rea-sonable because you don’t carewhether a line is drawn from start toend or vice versa. It draws so quicklythat the end effect is visually equiva-lent anyway.

Determining which zone the linefalls into is accomplished by compar-ing the absolute value of ∆x with theabsolute value of ∆y. If |∆x| is greater,then the proper quadrant is deter-mined by whether y0 or y1 is greater. If,on the other hand, |∆y| is greater, thenthe proper quadrant depends onwhether x0 or x1

is greater.Based on these four quadrants, there

are four specialized line drawing rou-tines:

∆x > ∆y and x1 > x0

and y1 > y0

∆x > ∆y and x1 > x0

and y0 > y1

∆y > ∆x and x1 > x0 and y1

> y0

∆y > ∆x and x0 > x1 and y1

> y0

The first and second cases have alarger ∆x. For the first case, calculate∆y⁄∆x and then step through all x integervalues from x0

to x1 while adding this

fraction to the y value each time. Dothe same for the second case, exceptsubstract the fraction.

In the latter cases, which have alarger ∆y, calculate ∆x⁄∆y and then stepthrough all y integer values from y0 to


y1, while adding (for the third

case) this fraction to the x valueeach time. For the fourth case,subtract the fraction.

In practice, the routines haveto be more complex to produceaccurate angled lines. You mustadd a bias of half the value of thefraction when calculating thecoordinates of the second pointdrawn. I used an integer divideroutine with scaling instead offloating-point routines to calcu-late the value of the fraction.Because of the routine I chose, Ihad to add the remainder of thedivision operation to the valuecalculated for the last point ofthe line.

The speed of these routines isso fast that the Atmel microcontrolleris waiting for the T6963 controller toreturn an unbusy status byte for asignificant portion of time. That is tosay, the T6963 is what limits theultimate line drawing speed. Actually,it takes the liquid crystal panel about100 ms to turn on a pixel and about

250 ms for it to extinguish, so theT6963 is not the bottleneck.

The line drawing routines used bysome serially controlled, intelligentLCD panels only erase a line properlyif you specify the starting and finalcoordinates in the same order inwhich they were specified in the cor-

responding draw command. Forexample, a line drawn from 0, 0to 99, 50 may not be erasedproperly if you pass the coordi-nates 99, 50 to 0, 0 to the eraseroutine. My line drawing algo-rithms do not suffer this short-coming.

DISPLAYING BITMAPPEDIMAGES

After text and vector graph-ics, the next most commondisplay task is bitmapped graph-ics. The T6963’s graphics RAMorganization lends itself nicelyto this, because it is similar tothe format used by WindowsBMP images. Although you cannot take a BMP file from a PC

and stream it directly into theT6963’s graphics RAM space, there isnot much preprocessing necessary.

Table 5 gives the locations in aBMP image file where the parametersrelevant to B/W images are located.The format of the bitmap array storedin the BMP file differs from what is

Photo 2—My prototype was built on a DonTronics SimmStickprotoboard, shown plugged into the matching DT003 motherboard.


needed by the T6963 in the followingthree ways.

First, BMP files store image datafrom left to right in groups of 4 bytes(long integers). For example, in animage that is 42-pixels wide, 5 byteswould be needed, but an additionalthree 0 bytes would be appended tothe datastream to meet the 4-byterule mentioned earlier. You have toweed out these extra 0 bytes whenthey occur in the BMP file.

Second, BMP files store data fromthe bottom of the image to the top,whereas the T6963’s graphics coordi-nate system has the origin at the top ofthe screen.

The third difference is that BMPfiles use a bit value of 0 to denoteblack, and the T6963 uses a bit valueof 1 for a darkened pixel.

Considering these issues, the datafrom the bitmap file simply can betransferred to the T6963’s graphicsRAM directly. The quickest way toaccomplish this is to set the T6963’saddress pointer to the correct loca-tion, enter data auto write mode, andsend one complete horizontal line ofthe BMP file data to the T6963, onebyte at a time. Then, add 32 to theaddress calculation performed earlierand send another line until the entireimage is transferred. When finished,turn off the data auto mode by issuingthe data auto reset command.

And, there’s a lot more to the BMPfile format. Microsoft publishes thefull specifications of the format.

SERIAL BACKPACKThe background for this article was

gathered while designing an instru-ment that required a graphics display.I first bought a commercial LCD that

had a serial data interfaceand an intelligent control-ler. Although expensive, itwas easy to use because ofthe serial interface andhigh-level commands.

What I needed that thecommercial unit lacked,was a function best de-scribed as an oscilloscopedisplay mode. In otherwords, I wanted to be ableto send a stream of ampli-

tude values to the LCD that would bedisplayed quickly from left to right.Then, without doing a full screenerase (which would result in a dimimage with an annoying flicker), Iwanted to be able to refresh the imagewith new data.

Doing this with commercial unitsinvolved keeping track of the old data,sending this data and a command toerase it one point at a time, then writ-ing new data to the display. Theamount of data that had to be trans-ferred over the serial data link wassignificant and the whole process wasslow. Additionally, the host had tokeep two complete sets of data, whichrequired more RAM than was availablein my microcontroller.

To solve this problem, I designedmy own serial backpack that workswith commonly available T6963-basedLCD panels such as AZ Display’sAGM2464D or Optrex’s DMF5005.This serial backpack performs thefollowing functions.

For the text, the serial backpackdoes cursor placement and attributes,writes ASCII string to the LCD, clearsthe text screen, defines custom fontcharacters, and adjusts LCD contrastand backlight. For vector graphics, itdraws and erases points, lines, rect-angles, and blocks.

The serial backpack performsbitmap graphics, too. It clears thegraphics screen, downloads thebitmap image to flash EEPROM, dis-plays the bitmap image fromEEPROM, and displays the bitmapimage from serial data input (seePhoto 1). More of its functions in-clude displaying and refreshing a dy-namic x, y coordinate data array.

Although the use of meaningful

Table 4—Here’s the structure of the mode command.

D7 D6 D5 D4 D3 D2 D1 D0 1 0 0 0 CG MC2 MD1 MD0

CG = 0, internal ROM character generator 1, external RAM character generator

MD2 MD1 MD00 0 0 OR mode0 0 1 XOR mode0 1 0 AND mode1 0 0 Text attribute mode


commands and ASCII strings for pa-rameters make it easy when using ahigh-level language, it results inlonger command/datastreams. Be-cause this device gets its commands/data through a serial data link, I de-signed a compact command/param-eter structure to speed up things.

Figure 4 is a schematic diagram ofthe backpack. I chose the Atmel90S8515-8PC microcontroller largelybecause it has 8 KB of ISP flashmemory, 512 bytes of RAM, and a fastRISC instruction set. The prototypewas built on a small DonTronicsSimmStick protocard (see Photo 2).The unit’s 5-V power supply andMAX232 reside on a DonTronicsDT003 mini-motherboard. The firm-ware is written entirely in assemblylanguage and uses about half of theavailable flash memory.

The design also includes an AtmelAT25256 32K × 8 SPI EEPROM. Thisnonvolatile memory can be used tostore up to 127 bitmap images. Theseimages can be quickly referenced bynumber and displayed using thebitmap display command. Becausemonochrome bitmap images lendthemselves to data compression, Iimplemented a run-length-encodingalgorithm in the EEPROM imagedownloading routine. It crams five to10 times as many images into theavailable EEPROM and is quicker toaccess a small, encoded image file fromthe serial EEPROM, even though ittakes a few additional CPU cycles todecode the RLE data.

A MAX232 is used for RS-232level shifting. One section of thedevice is used for the receive datafrom the host. The backpack imple-ments both hardware and softwarehandshaking, so two additional sec-tions are needed for the transmit dataand CTS signals.

Unfortunately, the MAX232’sbuilt-in negative charge pump doesn’tgenerate enough voltage for the LCD’s

VEE supply. Therefore, the next easiest

way to generate the VEE supply for theLCD panel is to use an IC that wasdesigned specifically for that purpose.I chose the Maxim MAX749 digitallyadjustable LCD bias supply. This 8-pin DIP plus six discrete componentsfits the bill nicely. It provides a regu-lated negative supply that can be setto the appropriate range via a singleresistor selection. The voltage thencan be adjusted in 64 steps from 33%to 100% of this value, under digitalcontrol. This is sufficient resolutionto provide good LCD contrast at dif-ferent temperatures.

The requirement that the VEE sup-ply be sequenced is easily addressedby keeping the MAX749 in shutdownmode for at least 20 ms after the logic(VCC) supply comes up. The two digitalcontrol lines used for digital voltageadjustment also are used to place theunit in shutdown mode. At powerup,the firmware in the AT90S8515 micro-controller holds the MAX749 instandby by holding these two digitallines low for the required 20 ms.

The value of R6 was chosen to pro-vide –7.5-V output with the MAX749’sinternal DAC set at mid-scale (defaultsetting). This voltage provides a rea-sonable contrast on the LCD panel atnominal room temperatures.

The AGM2464D uses an LEDbacklight that is controlled by soft-ware using a port pin on the microand an IRFD110 HexFet. The datainterface to the LCD panel is via a 2 ×20 header connector. The AGM2464Dand Optrex DMF5005 panels use thesame pinout, which may be commonto many 240 × 64 T6963-based panels.

WRAP UPIf you’ve considered using a graph-

ics LCD panel but were scared off bythe thought of a lot of programming, Ihope this article changes your mind.There is a lot of good information onthe ’Net, but it takes effort to find it. I

SOURCESAGM2464DAZ Displays, Inc.(949) 360-5830Fax: (949) 360-5839www.azdisplays.com

DMF5005Timeline, Inc.(310) 784-5488(800) 872-8878Fax: (310) 784-7590www.digisys.net/timeline

AT90S8515-8PCAtmel Corp.(714) 282-8080Fax: (714) 282-0500www.atmel.com

MAX749, MAX232Maxim Integrated Products(408) 737-7600(800) 998-8800Fax: (408) 737-7194www.maxim-ic.com

Programmed AT90S8515-8PC for the serial backpack

Brian MillierComputer Interface Consultants(902) 494-3709Fax: (902) 494-1310bmillier.chem.dal.ca

SimmStick protoboardDonTronicsFax: +613 9338 2935www.dontronics.com

SOFTWAREThe object code is available on the

Circuit Cellar web site.

Offset Size Type Description of the variable

1 2 Character “BM” to indicate that the file is a bitmap11 4 Long int. Offset into file of start of bitmap data19 4 Long int. Width of the image in pixels23 4 Long int. Height of the image in pixels

Table 5—TheseBMP file param-eters determine thesize of the imageand locate theposition of thebitmap in the file.

Brian Millier is an instrumentationengineer in Dalhousie University’schemistry department, Halifax, NS,Canada. He also runs Computer Inter-face Consultants. You may reach himat [email protected].

posted datasheets from the variousmanufacturers, application notes, andmore. on my web site, as well as thecomplete documentation package formy serial backpack. I

Author’s Note: I’d like to acknowl-edge Steve Lawther for his good appli-cation note on the T6963 controller.

http://www.azdisplays.com

http://www.digisys.net/timeline

http://www.atmel.com

http://www.maxim-ic.com

bmillier.chem.dal.ca

http://www.dontronics.com


Designing for Reliability,Maintainability, and Safety

e lectronic equip-ment designers are

familiar with the con-cepts of reliability, main-

tainability, and safety (R/M). But sadly,we use these disciplines much less thanthey deserve, especially in consumerproduct development. Many peoplefind them boring and think they chewup a big chunk of the precious develop-ment budget but add no value.

In this article, I want to show youthe basic steps in performing R/Manalyses and how they do add value toyour design. You’ll learn to appreciatetheir benefits and, hopefully, come tothe conclusion that your time is wellinvested using them. Not all manufac-turers spend the time to ensure gooddesign, but as you will learn, we couldbe buying better, safer products atlower prices if they did.

To be effective, the analyses mustbe performed concurrently and as anintegral part of product development. Iwill take you through development ofa hypothetical controller to illustratethe fundamentals of the R/M engineer-ing approach. I set up the project to

demonstrate the use of R/M tools, notthe design of a controller. So, somedesign decisions are tailored to exag-gerate R/M aspects. For the same rea-son, some issues have been simplifiedor omitted.

HAZARD ANALYSISLet’s assume you are contracted to

develop an electronic controller for ahot tub, to maintain its water tempera-ture at 100°F (±2°). It is heated by agas-fired burner capable of raising thewater temperature to the boiling pointquickly. The hot tub supplier (yourcustomer) should start the project byperforming a system hazard analysis.And, you should know enough aboutthe system to prepare your own. Therequirements pertinent to the sub-system become a part of the perfor-mance specification.

The simple analysis shown inTable 1 has only two potential failures.The failures and their permitted prob-abilities of occurrence per hour ofoperation are the result of a trade-offbetween requirements and cost.

With the possibility of personalinjury at hand, I’m not aware of anysystem where it would be considered anacceptable risk to allow such a designweakness. Performing the hazard analy-sis and implementing its results reducesrisk and shows reasonable care andprudent design in court (just in case).

The probability of 10–9 failures perhour of operation generally is acceptedas “never.” With the exponential faultdistribution, which is most popular inelectronics and yields a constant fail-ure rate, it represents a 50% chance ofexperiencing such a failure after about79,000 years of continuous operation.

The second line of Table 1 showsnoncritical subsystem failures whenthe hot tub is not as hot anymore. Youdon’t want such failures but can livewith them. Statistically, all compo-nents will fail at some point. As youwill learn during this design exercise,decreasing the probability of a failureis expensive. When there is no poten-tial for personal injury, the decisionboils down to the manufacturing costversus potential customer unhappi-ness, cost of service, warranty, main-tainability, life cycle cost, and so on.

FEATUREARTICLE

George Novacek

If you think that de-signing for reliabilityand maintainabilityare just steps in thedevelopment processthat eat up more bud-get without addingany value, then youmight want to listenup because Georgehas evidence thatproves otherwise.

Part 1: Getting Started


Imagine that the controller fails. Itdoesn’t matter why, now you have acustomer complaint and must send outa maintenance technician. If it happensduring the warranty period, you payfor the repair. With the probability offailure pegged at 10–5/h as a designgoal, you must expect a failure every100,000 h of the controller operation.But, suppose sales take off and duringthe next few years you sell 10,000units. If they run 24 h per day, you canexpect one failure every 10 h.

Is it possible to lower the failurerate by an order of magnitude? Onemillion hours of mean time betweenfailures (MTBF) would drastically re-duce the service cost but will be ex-pensive to achieve. What is the cost ofimproving MTBF compared to thesavings in maintenance and warrantycost? Will there be enough parts toservice the equipment several yearsfrom now, recognizing that the currentlife cycle of microelectronic parts ismerely five years or less?

Let’s say you must extend a five-year warranty. Based on the probablefailure rate, you can estimate the war-ranty cost. How will it affect profit-ability? R/M analyses also help makethese business decisions. Provided theyare performed concurrently with thedesign, their results are implementedin a closed-loop system for optimalresults. Discovering the laws of statis-tics after the product introduction tothe market may be revealing, but usu-ally too late.

RELIABILITY FUNDAMENTALSTo fully appreciate the aspects of

reliability, you need to review thefundamentals of failure prediction.Because electronic components can bemost often modeled by constant fail-ure rate (λ), which is the characteristic

property of exponential failure distri-bution, you won’t have to go into thegory details of statistical analysis. Themathematics will be straightforward.

Suppose a sample population of acomponent you are interested in istested while you record observed fail-ures, plotting them against time oftheir occurrence. You can plot thenumber of occurrences within given,short, time intervals to obtain a fre-quency distribution plot. Or, you canrecord the cumulative number of fail-ure occurrences against the time as youproceed with the test (cumulativefrequency distribution).

The cumulative frequency distribu-tion of the majority of electronic com-ponents will be exponential andresembles the curves in Figure 1. Themathematical model for the frequencydistribution is called probability den-sity function (PDF) and for exponentialdistribution is expressed as f(t) = λ × e–λt.The cumulative frequency of distribu-tion is modeled by the cumulativedistribution function (CDF):

where f(y)dy is a dummy variable ofintegration. For the exponential distri-bution, you can write the followingfunctions for PDF and CDF, respec-tively:

f(t) = λ × e–λt

F(t) = 1 – e–λt

where λ is a single unknown that de-fines the fault distribution. You cancalculate reliability (the number ofsurviving units) as:

R(t) = 1 – F(t)

Then, use this to arrive at the expres-sion for an instantaneous failure rate:

and solve for constant λ:

h t =λ× e–λt

e–λt =λ

A component following an exponentiallife distribution exhibits the sameprobability of failure in the next hourregardless of whether it is new or used.It does not age or degrade with use.Failure occurs at a constant rate, unre-lated to the hours of use.

This important, seemingly illogicalconcept allows you to gain equivalentinformation from testing 10 units for10,000 h as if from 1,000 units for 100h. It also means that the “impossible”10–9 failure is as likely to happen in thefirst 5 min. of operation as 114,000years from now.

If, based on observation, you sus-pect that the failure rate does dependon the time used, it may be because ofwear caused by improper derating orthe exponential fault distribution doesnot apply for this part.

A measure of reliability more com-monly used at the user level for irrepa-rable parts is mean time to failure

(MTTF). For componentswith exponential life distri-bution, this is a reciprocal ofλ. For assemblies where afailed component can bereplaced, MTBF is appropri-ate. For exponential fault,distribution also equals 1/λ.

Figure 2 shows the well-known bathtub diagram. Thediagram consists of three

Table 1—Even a simple hazard analysis matrix is an effective tool for identifying system weaknesses and potentialhazards that the design must eliminate.

0 1 2 3 4 5 6 7 8 9 10Time

Sam

ples

CDF PDF

Failure description Failure effect Maximum probability

The controller fails to turn off the A critical failure must not happen <10–9

burner when the water under any circumstances, personalreaches 102°F. injury may result.

The controller fails to turn During a noncritical failure, the tub is not <10–5

on the burner when the water useable and the system is no longertemperature drops below 98°F. available. Customer dissatisfaction results.

Figure 1—Probability density function (PDF) andcumulative distribution function (CDF) are the resultsof exponential failure distribution in components andare characteristic of the constant failure rate typicalof electronic parts.


Reliability Models For Electronic ComponentsThe calculated value is λp, which represents the predicted num-ber of failures per 106 hours. The reliability model for microelec-tronic circuits (ICs) states that:

λp = (C1 × πT + C2 × πE) × πQ × πL

where:C1 = die complexity; 0.14 for the PIC controller and 0.020 for the

regulator 7805.πT

= temperature coefficient. Assuming the junction tempera-ture Tj < 100°C for both ICs, it will be 1.5 for the PIC con-troller and 16 for the regulator.

C2 = a constant based on the number of pins. 0.0034 is used forthe PIC with 8 pins and 0.0012 for the 3-pin regulator.

πE = environmental constant. Assume the equipment will oper-ate in a “ground fixed” environment, a benign locationwith average ambient temperature of 25°C, not exceeding45°C.

πL = learning factor; 1 for ICs more than two years in production.πQ = quality factor. This is the most controversial coefficient.

For military screened components it is between 1 and 2, butclimbs to 10 for commercial components. Many criticshave established that the penalty for commercial, off-the-shelf parts is unrealistically high, especially when takinginto account modern manufacturing processes.

For diodes, the equation looks like:

λP = λb × πT × πT × πS × πC × πQ × πE

where:λb = base failure probability related to the construction; 0.0012

for switching and general-purpose diodes, 0.0030 for powerrectifiers, and 0.0013 for transzorbs.

πT = temperature coefficient; 3.9 for junction temperature Tj <70°C.

πS = is based on stress 1.0 for transzorbs and 0.054 for other di-odes in the system, provided they are not exposed to morethan 30% of their rated characteristics.

πC = contact construction factor; 1.πQ = 8.0 for plastic encapsulated devices.πE = environmental constant; 6.0 for the “ground fixed” environ-

ment.Next is the solenoid driver power MOSFET (less than 1-W dissi-

pation):

λP = λb × πT × πA × πQ × πE

where:λb = base failure probability related to the construction; 0.012 for

power MOSFET.πT = temperature coefficient; 2.3 for junction temperature Tj <

70°C.πA = 1.5 for switching applications.πQ = 8.0 for plastic encapsulated devices.πE = environmental constant; 6.0 for the “ground fixed” environ-

ment.Resistors’ reliability calculates as follows:

λP = λb × πT × πP × πS × πQ × πE


for RLR resistors and 0.0019 for thermistors.πT = temperature coefficient; 1.3 for resistor operation less than

50°C and 1 for the thermistor.πP = is determined by the dissipated power; 0.44 for 100 mW and

greater, 1 for 1 W, and 0.44 for the thermistor.πS = 1.1 for stress factor 0.4 (i.e., you don’t operate the device at

more than 40% of its rated characteristics). It equals 1.0 forthe thermistor.

πQ = 3.0 for devices without established reliability.πE = environmental constant; 6.0 for the “ground fixed” environ-

ment.Capacitors’ reliability is expressed as:

λP = λb × πT × πC × πV × πSR × πQ × πE


for fixed, metallic film capacitors and 0.00040 for tantalumcapacitors.

πT = temperature coefficient; 1.6 for capacitor operation lessthan 50°C

πC = a factor for capacitance; 0.81 for the fixed capacitors as-sumed to be 0.1 µF and 1.6 for the electrolytic capacitorsassumed to be 10 µF.

πV = 1 for stress factor 0.3.πSR = 1 for both types.πQ = 3.0 for devices without established reliability.πE = environmental constant; 10.0 for the “ground fixed” envi-

ronment.The next device on the list is the transformer:

λP = λb × πT × πQ × πE

where:λb = base failure probability; 0.022 for low-power transformers.πT = temperature coefficient; 1.4 for operation less than 50°C.πQ = 3.0 for devices without established reliability.πE = environmental constant; 6.0 for the “ground fixed” environ-

ment.And finally, for quartz crystals:

λP = λb × πQ × πE

where:λb = base failure probability related to the crystal frequency;

0.022 for 10 MHz.πQ = 2.1 for nonmilitary devices.πE = environmental constant; 3.0 for the “ground fixed” environ-

ment.The results of the calculations are tabulated in Table 2.


curves and three distinctareas. The quality curve ispredominant during theinitial life period and isoften referred to as infantmortality, which is the result of design,handling, or workmanship problems.

Infant mortality effects can be re-duced by using robust designs andmanufacturing process control. At theend of the manufacturing process, agood burn-in or environmental stressscreening (ESS) period will weed outthe majority of the failures. Productsshipped from the factory should bepast the infant mortality curve.

The useful life period is character-ized by stress-related failures, in otherwords, the MTTF or MTBF. The infantmortality curve can bottom out withina few days of “shake ’n bake,” but theuseful life period is counted in years.

Finally, at the end of their life, com-ponents begin to fail because of wear.Although common in mechanical sys-tems, well-derated electronic systemsseldom reach this period.

RELIABILITY PREDICTIONReliability prediction analysis re-

sults in definition of λp, the predictedfailure rate, which is expressed as anumber of failures per 106. This numberforms the basis for other R/M analyses.In most electronic systems with a con-stant failure rate, the MTBF and MTTFare the reciprocals of λp stated in hours.If a device is not repairable (e.g., a failedtransistor), MTTF is used. If it can berepaired, MTBF is used.

Because they attempt to forecast thefuture, reliability prediction methodshave been the subject of many heateddebates. Several schools of thoughtexist, each having adherents and just asmany opponents. This article is not theforum for getting into the thick of thedebate. I’ll discuss MIL-HDBK-217Fbecause it’s a widely recognized modelbased on the constant failure rate. [1]

Remember that you are extrapolat-ing historical or test data into the

future. This data is used as a yardstickfor performance evaluation and im-provement. So, you need to understandwhat the numbers mean and must notlook at them as the only objective.

Tweaking the numbers will be self-defeating. Field returns are always amore powerful statement of perfor-mance than statistical predictions.Both you and the customer need tounderstand that the failure probabilityof 10–6 is not a guarantee of a million-hour, failure-free operation. A failurecan occur in the first or one-millionthhour. On the positive side, reliabilitymodels are conservative, the equip-ment outperforms the statistics.

Let’s start with preliminary designof the controller. You’ll calculate itspredicted reliability, put it throughFMECA and FTA analyses, and reviewthe results (see Figure 3).

For this example, I used a CMOSPIC micro operating from a 5-V powersupply provided by a three-terminal7805 regulator. The water temperatureis detected by thermistor R3 and thegas-fired water heater is controlled bysolenoid valve (SV) L1. The valve con-tains an internal freewheeling diode tosuppress back-EMF kick and isswitched off and on by powerMOSFET Q1.

Diodes D2 and D4 clamp the analoginput of the PIC within 0 to 5 V toprotect the micro and prevent ESDdamage. One of the built-in ADCsreads the thermistor voltage. The sec-ond ADC monitors the currentthrough the solenoid valve to provideshort-circuit protection and to moni-tor operation. The valve is externaland supplied by the system integrator,so exclude it from the reliability calcu-lation; although, it is an importantplayer for safety. Tranzorb D5 protectsthe MOSFET driver against voltage

Life characteristic curve

Stress-related failures

Infantmortality

Useful life Wear out

Quality failures Wear-out failures

Time

Failu

res

Figure 2—The product life cyclecharacteristic curve is composedfrom the constant failure rate ofcomponents, increased at thebeginning by infant mortality and atthe end by wear. (This figure is notto scale.)


REFERENCES[1] U.S. Department of Defense,

Reliability Prediction of ElectronicEquipment, MIL-HDBK-217F,General Policy Series, no. 14T.

S.E.R. subcommittee, AutomotiveElectronics Reliability Handbook,Society of Automotive Engineers,Warrendale, PA, 1987.

P. Tobias and D. Trindale, AppliedReliability, Van NostrandReinhold, NY, NY, 1986.

U.S. Department of Defense, Elec-tronic Reliability Design Hand-book, MIL-HDBK-338, GeneralPolicy Series, no. 542.

U.S. Department of Defense, Main-tainability Program for Systemsand Equipment, MIL-HDBK-47OB,Washington D.C.: GovernmentPrinting Office, 1995.

SOFTWAREReliability calculations are avail-able on the Circuit Cellar web site.

George Novacek has 30 years of expe-rience in circuit design and embeddedcontrollers. He currently is the generalmanager of Messier-Dowty Electronics,a division of Messier-Dowty Interna-tional, the world’s largest manufac-turer of landing-gear systems. You mayreach him at [email protected].

tial distribution, assuming a constantfailure rate. All the values and calcula-tions come from that source. Read the“Reliability Models for ElectronicComponents” sidebar for the details.

Immediately it is apparent that theresulting failure rate and MTBF ofnearly 300,000 h satisfy the 10–5 sys-tem availability requirement thecustomer defined in the hazard analy-sis. Can it be improved? Let’s take acloser look, because this would mini-mize future warranty claim costs,maintenance requirements, and im-

prove customersatisfaction.

All compo-nents are wellderated (i.e.,working at lessthan 30% to 40%of their specifiedratings). Furtherderating willhave a minimaleffect on theirreliability im-provement. But,five componentshave failure ratesthat are greaterthan the rest: Q1,U1, U2, D5, and

Figure 3—Here’s my first run at the controller schematic diagram. The design issimple, straightforward, and appears to do what is needed.

transients, which could be the result ofthe freewheeling diode failure.

To calculate the predicted failurerate, you could use one of several ex-pensive programs. Some methods evenextract components and their operat-ing conditions out of the schematiccapture program, simplifying the chorethat generations of engineers haveperformed manually. My program is asmall circuit and performing the calcu-lation by hand will be good exercise.As stated previously, you’re using theMIL-HDBK-217F model with exponen-

T1. What can you do?Q1, U1, and U2 work at conserva-

tively estimated junction temperatureTj

= 100°C. Keeping the ambient oper-ating temperature at 27°C and withefficient heat sinking, Tj can be re-duced to 50°C. Heat is the reliabilitykiller and even a small reduction willhave a significant effect.

Transzorb D5 and diodes D2 andD3 conduct current during infrequenttransients only. You can reduce theircontribution by applying a duty cycle.Design T1 to run at a lower tempera-ture and you’ll improve its reliability.

Implementation of these steps willincrease the MTBF to 714,000 h (λp =1.4). And, for the remaining analyses,you will use the results of these calcu-lations to evaluate product safety. I

Component Description lp/106 hours MTTF

R1 Small resistor RLR 2.7936E-02 3.5795E+07R2 Small resistor RLR 1.0794E-02 9.2647E+07R3 Small resistor RLR 1.0032E-02 9.9681E+07R4 Small resistor RLR 2.7936E-02 3.5795E+07R5 Small resistor RLR 1.0794E-02 9.2647E+07R6 Current sensing resistor 6.3492E-02 1.5750E+07C1 Tantalum capacitor 10 mF 3.0720E-02 3.2552E+07C2 Tantalum capacitor 10 mF 3.0720E-02 3.2552E+07C3 Metallic capacitor 0.1 mF 1.9829E-02 5.0432E+07C4 Metallic capacitor 0.1 mF 1.9829E-02 5.0432E+07C5 Metallic capacitor 0.1 mF 1.9829E-02 5.0432E+07C6 Metallic capacitor 0.1 mF 1.9829E-02 5.0432E+07C7 Metallic capacitor 0.1 mF 1.9829E-02 5.0432E+07C8 Metallic capacitor 0.1 mF 1.9829E-02 5.0432E+07Q1 Power MOS-FET PD

= 1 W 1.9872E+00 5.0322E+05

U1 7805 regulator 1.1680E-01 8.5616E+06U2 PIC12C672 microcontroller 1.8160E-01 5.5066E+06D1 1A bridge rectifier 1.2131E-02 8.2436E+07D2 Signal diode (1N914) 1.2131E-03 8.2436E+08D3 Signal diode (1N914) 1.2131E-03 8.2436E+08D4 Signal diode (1N914) 1.2131E-03 8.2436E+08D5 Transzorb 2.4336E-01 4.1091E+06X1 Quartz crystal 1.3860E-01 7.2150E+06T1 Transformer 120 V primary 5.5440E-01 1.8038E+06Controller total 3.5691 280,180 h

Table 2—Preliminary failure rate calculation also indicates that you are on the right track and that thecustomer’s specification is achievable.


FROM THEBENCH

Jeff Bachiochi

Sharing Technology withMother NatureOut of State with an Internet-Compatible Cell Phone

Out ofthe of-fice forthreeweeks

meant that Jeffneeded a way tostay connected. Therecipe for success?Take one new cellphone, one old mo-torcycle battery….

find myselfspending more

and more time onInternet-related mat-

ters. I send and receive dozens ofe-mails every day (I hate the tele-phone). I use the instant messenger tokeep in contact with my family be-cause it’s easier to send a WindowsRealPopup message over the homenetwork than to locate someone inthe house.

By surfing manufacturers’ web sites Iget the most up-to-date news and infor-

i

mation about parts and products.Oftentimes I also find good applicationtutorials to help simplify designs. Databooks have gone from paper to theweb, some without stopping at theintermediate CD stage. I find searchengines indispensable for locating bothproduct and educational references.For me, the Internet truly is the infor-mation superhighway.

I use my local phone company asmy Internet service provider. BecauseI hardly ever travel out of state, alocal number is all I need to connectfrom home. This summer, a numberof events came together in such a waythat I was going to be away fromhome for three weeks straight.

The first week, I was going to CapeCod, Massachusetts with the familyfor some much needed R&R (rest andrelaxation). The second week, Iplanned to be in Arizona at Microchip’sMASTER conference for some hands-on E&E (experience and education).Finally, it was off to Camp JuneNorcross Webster in Connecticut forsome wilderness C&C (camping andcounseling).

All would have been fine if I didn’tneed to bring up a revised online ques-tion and answer forum I was moderat-ing at the time. I was handling it viae-mail, and the new application prom-ised to ease my burden and give theusers a real-time feel. The transitionperiod (while I was to be away) re-quired that old and new applicationsrun concurrently, which promised totake much of my time.

Photo 1—MyStarTAC is connectedto a Toshiba laptopvia Motorola’s con-nectivity kit. InternetExplorer dials out tothe cell phone con-nected as an externalmodem. My ISP getsme on the Internetwhere I can surf orconnect to get my e-mail.


Preparing to visit Massachusetts andArizona was not a big deal; I searchedaround for a free ISP, which offeredlocal numbers in the areas where I’d bestaying. I ended up using freei, whichgave me access to the Internet so Icould log on to Circuit Cellar’s e-mailserver on a daily basis. I could alsoaccess the FTP site and make dailyupdates of questions posed to myresearchers.

To read and answer e-mails andupdate the FTP site required one totwo hours per day. I found that lateevening was the best time for me toaccomplish this. The local Hyannisexchange on the Cape was never busy,however, the Phoenix exchange wasdifficult to get into. The third weekwas the most challenging.

PREPARATIONSOur camp wasn’t exactly far from

civilization, just free from modernhassles. Because the setting lackedpower, phone, and running water, freeInternet wasn’t going to solve theproblem of having no utilities.

First, I needed to overcome myloathing of the telephone. In fact, Ineeded to go a step further, all theway to cellular. Surfing the ’Net forcell phone manufacturers proved to bean empty venture, because few cellphones are Internet-compatible. Ofthose few that are, one stands outabove the rest, the Motorola StarTAC.

The StarTAC 7868W is the dual-mode, analog and digital, dual-bandCDMA cell phone with a host of pre-mier features. It operates at 800 and1900 MHz. Most of you have probablyseen this shirt pocket 4.4-oz. handsetthat provides digital talk and standbytimes of up to three hours over fivedays. The optional data connectivitykit coupled with my laptop providesfax capability, Internet access, and e-mail (send and receive).

The StarTAC also has mini-browser capabilities. This works forgetting weather information, but thefour lines of text definitely won’thandle any e-mail. I used my ToshibaSatellite laptop and a real browser forcomfortable communication, whichwould undoubtedly be necessary forextended periods of time.

Looking at the talk and standbytimes of the StarTAC, I expected to getalmost two days worth of work (basedon two hours a day) before the phonewould need recharging. My laptopwon’t even make that. Something hadto give because there aren’t any outletson trees.

The cell phone has a car adapter forcharging, but I’d be miles from anycars. Fortunately, I just replaced mymotorcycle battery and haven’tbrought the old one to the recyclingcenter yet. There wasn’t enough energyin the failing plates to start an engine,but plenty for what I needed.

A sealed lead acid battery is muchsafer to use in this situation! The cor-rosive electrolyte could leak from theunsealed unit if you tip the battery onits side. Lead acid batteries are capableof high currents and, if they areshorted, can cause serious damage byexplosion. Use breathable batterieswith extreme caution.

I designed a carrier for the batterywith a wide base that would keep itmore stable and give me a place tomount some components. The firstthing I did was wire up a fuse to thebattery. Radio Shack has a good assort-ment of fuses and cigarette lighteraccessories, including sockets andplugs. Mounting a cigarette lightersocket onto the battery carrier madeconnecting to the battery safe and

convenient. I measured about 25-mAcharging current for the cell phone,which could receive a full charge over-night. This calculates to about 3 Whper charge.

DC/AC/DCMy Toshiba laptop, on the other

hand, is not the perfect match. It cannotbe directly powered or charged from a12-V battery. Its AC plug-in powersupply creates 15 VDC for the laptop’sDC input. The simplest way to convertmy 12-VDC source into power for thelaptop would be to use a vehicle powerinverter, which produces 110 VAC, andthen use the laptop’s own power supply.Although simple, it seems inefficient(see Tables 1 and 2).

With the laptop on and idling, itdraws ~21 W of power. The powerconversion circuitry requires 26.7 Wand the idle efficiency is ~78%. Withthe power to the laptop off, the charg-ing current is ~0.5 A (~7.5 W at 15 V).The power conversion circuitry re-quires 11 W. The charging efficiencyis 68%. We’re only talking about ahandful of watts here. Is efficiencyreally important?

If you paid attention last monthwhen I discussed photovoltaic cells,you’ll remember that the panel outputabout 3 W in direct sunlight. If Iwanted to use my laptop for twohours each day, I would require 42 Whof power. That’s 14 h of solar panelcollection (100% efficiency). Theefficiency of the converter wouldrequire 53.4 Wh, or 18 h of solar panelcollection. I don’t know about you,but, I don’t get 18 h of sunlight in oneday. So, how much power can besaved by directly converting 12 VDCto 15 VDC?

Average idle Current measured Watts

12-VDC source 2.28 A 26.715-VDC laptop 1.4 A 21

Table 1—When my laptop is running off its 110-VACpower supply it requires 21 W. If that supply isplugged into a 12-VDC-to-110-VAC inverter, the12-VDC battery must supply about 27 W for bothitems.

From battery

+12V 3L1

422µH

EXTV+FB

SHDN

CSGND

AGNDREFR1

10.0k

43

C5

100pF

C1

470µF

C2

0.1µF

R290.9k 1

28765

C30.1µF

R30.025 ohms

T1NDP603

D1

1N5821

+15

To laptop

C4

470µF

GND

MAX1771

Figure 1—The DC/DC converter booststhe 12 V from mymotorcycle batteryup to the 15 Vneeded to run orcharge my laptop.Efficiency is ~95%,as opposed to ~68%using an off-the-shelf DC/AC inverterand the laptop’s AC/DC external powersupply.


VIN (3 V), circuit efficiencies can reachover 85%.

RECHARGERecharging the cell phone was less

of a hassle. After I placed a cigarettelighter accessory socket on the batterycarrier, I was able to plug the cellphone’s car adapter charging cabledirectly into my portable storagesource. It uses a simple zener diode as aregulated 5-V charging source for thecell phone. The draw on the portablebattery is ~350 mA of initial chargingcurrent. The phone will fully rechargein about 5 h with a final current drawof ~50 mA, so the portable battery isrequired to supply ~9 Wh for a totalcell phone charge. The solar array’soutput was ~3 W, which is about 3 hworth of sun for a total charge.

BENDING THE RULESThe issue of total portability is

based on a delicately balanced system.If you can’t gather enough energy fromthe sun (or another source), then even-tually your portable power source willbecome depleted. If your portablestorage system can’t sustain usage oversome nominal period of time (i.e., totake care of cloudy days), then you willalso run out of electrons. If you happento be located in an area outside of cellcoverage, it doesn’t matter how muchpower you’ve collected; you’ll never beable to make that all-important con-nection.

During my three-week stint awayfrom the office, I received close to 300legitimate e-mails, above and beyondthe normal amount of spam. I only lostthe cell connection a total of threetimes over many hours of continued

service. Although there was noactual cell time charge, the invest-ment can’t be overlooked. If youalready have a web-capable digitalcell phone and you disregard thecost of the phone itself, then theInternet connection costs for my

DC/DCA boost-mode switching regulator

could provide the 15 VDC necessary torun my laptop from the 12-VDC bat-tery. Maxim’s MAX1771 looked like itcould do the job for me. This devicehas a fixed 12-VDC output or can beadjusted using a resistor divider on theoutput (see Figure 1).

Coil current through L1 is allowedto flow when the external MOSFET isenabled by the ’1771. This currentramps up, generating a magnetic fluxin the coil’s core until either the cur-rent reaches a maximum (set viasense resistor R3) or the ON-oneshottimes out (~16 µs). At this point, anOFF-oneshot triggers and the externalMOSFET remains off for ~2.3 µs. Cur-rent no longer flows through the coilbecause the MOSFET now looks likean open circuit. Therefore, the coil’scollapsing magnetic field causes thevoltage on the anode of the diode D1(1N5821) to rise until the current canflow through it, charging the outputcapacitor.

To set the output voltage, a resistordivider combination is selected toproduce 1.5 V at the feedback input tothe ’1771. When the output voltagedips below the nominal level, so doesthe feedback input. An internal com-parator triggers the ON-oneshot againand the cycle starts over. The ’1771differs from the standard pulse widthmodulation (PWM) circuits in that itadds pulse frequency modulation (PFM)and dramatically reduces the operatingcurrent.

With the DC/DC converter in Fig-ure 1 attached to my motorcycle bat-tery, I connected four 1-Ω, 10-Wresistors as a load. The battery’s powerwas measured at 36 W and the loadcurrent was 34 W (see Table 3). Theefficiency of this circuit is ~94%.Charts in the Maxim datasheet sug-gest that this is about right. Betterefficiencies are where the VIN is closestto the VOUT. However, even with lower

phone were around $150 for theStarTAC interface cable and laptopsoftware (Mototola’s connectivity kit).There is an $8 monthly access fee, inaddition to the monthly cell phoneplan, for allowing you to make a con-nection to your ISP, via the cell phonesystem.

While camping, I tried keepingthis Internet thing a secret by loggingon in the late evenings (see Photo 1),after the boys had retired for thenight. Some of the rules we havewhen we go camping are no radios,CD players, or Gameboys because wewant the kids to experience thebeauty of nature without outsidedistractions. However, one morningmy son was asking some rather in-quisitive questions. It seemed thatin those nighttime hours he heardthe familiar beep of Windows load-ing. The jig was up. He was contentwith my explanation of need andnever mentioned it again. Theadults, by comparison, were con-stantly badgering me to send e-mailsto their wives. It just goes to showya’, sometimes it’s tough to tell themen from the boys. I

SOURCESStarTAC cell phones and

accessoriesMotorola, Inc.(847) 576-5000Fax: (847) 576-5372 http://www.gx-2.net/wwow/

phones.html

MAX1771Maxim Integrated Products, Inc.(408) 737-7600Fax: (408) 737-7194www.maxim-ic.com

Free ISPfreei(253) 796-6500Fax: (888) 841-9825www.freeinternet.com

Jeff Bachiochi (pronounced“BAH-key-AH-key”) is an electrical engineer onCircuit Cellar’s engineering staff. Hisbackground includes product designand manufacturing. He may be reachedat [email protected].

DC/DC Current measured Watts/h

12-VDC source 3 A 36 16-VDC load 2.1 A 34

Table 3—Using the circuit from Figure 1, a 4-Ω load has ameasured current of 21 A. Notice the high efficiency.

Charging only Current measured Watts

12-VDC source 0.92 1115-VDC laptop 0.5 7.5

Table 2—When my laptop is off and charging, therequirements are lower, but so is the efficiency.

http://www.gx-2.net/wwow/phones.html

http://www.maxim-ic.com

http://www.freeinternet.com


n owhere are thechips hotter than

on the ’Net, judging bythe action at this year’s

Hot Chips conference.Long-time readers know that Hot

Chips is one of my perennial summer-time favorites. The conference’s focusis bleeding edge, performance-at-any-price silicon, which is a fun changefrom the workaday chips I usuallycover. But, remember that when itcomes to silicon, today’s bleedingedge is tomorrow’s mainstream.

Hot Chips 12

SILICONUPDATE

Tom Cantrell

HotChips isone ofTom’s fa-vorite

conferences becausenothing is more excit-ing than the latest andgreatest. There’s noquestion that the hot-test new chips havepotential, but just howpractical are they?

NET WORTHSo it is with this season’s collection

of haute network processors. The sili-con wizards are banking on network-ing to fuel the next wave of demand,and I think they’re right on the money.

I don’t claim to have a better ideathan the next stuttering dot-com ofwhere this is all is heading, but itdoesn’t take a rocket scientist to un-derstand that the ’Net will be a hugedeal. The only way to find out whatthe ’Net is good for is to hook every-thing up and see what happens.

As you might imagine, hookingeverything up will swallow silicon at aprodigious pace. How many chips doesit take to rebuild, rewire, and retrofitthe global communications infrastruc-ture? It would take a lot, and they’dbetter be hot!

TO EVERY SEASONAs it was with signals, graphics,

and multimedia before, the network-ing crowd is demanding its own flavorof network processor instead of hav-ing to make do with a lash-up of bor-ing standard chips and messy ASICadd-ons.

The premise underlying special-purpose processors is that a perfor-mance advantage can be obtained overa general-purpose design. However,marginal improvement isn’t enoughto overcome standard chip inertia andeconomies of scale. The gain has to besignificant (at least on the order of 2×)

Figure 1—The C-5 sets the stage for the next generation of computing with plenty of processors and a lot ofmemory on a single chip.

Table look upunit

Fabricprocessor

Queuemanagement

unit

Buffermanagement

unit

Channel processor 0

Channel processor 1

Channel processor 2

Channel processor 3

Channel processor 12




Pay

load

bus

Executive processorSerial interfacePROM interface

Ring bus

PCIinterface

CP cluster

CP cluster

Glo

bal b

us

SRAMtable storageand statistics

SRAMqueuestorage

Fabric

SDRAM

Frame/cellstorage

XPprgm./datastorage

ExternalhostCPU

Control logic (optional)

External PROM (optional)

PH

YP

HY

PH

YP

HY

PH

YP

HY

PH

YP

HY

16 c

hann

els


before it becomes compelling.Whether or not that happens dependson the characteristics of the applica-tion (i.e., how amenable it is to opti-mization and acceleration byspecial-purpose features).

Because I’m certainly no expert onthe inner workings of networking apps,the opening day tutorial, “Introductionto Network Processors,” by ChuckNarad and Larry Huston of Intel, was agreat way to kick off a conference thatwould see nearly a dozen new network-ing-related chips take a bow. [1]

Narad and Huston began by classify-ing particular networking tasks interms of the degree to which the pro-cessor must touch, or further compute,data and state (see Table 1). Tradi-tional routing and switching appsspend most of their time doing littlemore than forwarding data from A to B.On the other hand, new and antici-pated features such as QoS (Quality ofService) and VPN (Virtual Private Net-works) call for detailed inspection ofthe traffic and on-the-fly, data-drivendecision making. In any case, withbandwidth aspirations of 1 Gbps andbeyond, time is of the essence.

In answer to their own rhetoricalquestion (“Why not just use a GHz+Pentium?”), the instructors pointed outthat locality of networking apps ispoor. In what’s essentially a dataflowproblem, there’s usually little relation-ship between a packet and other re-cently processed packets.

What that means is that the fancycache schemes found on desktop chipsnot only don’t help much, but cachethrash may make matters worse. In-

deed, as wire speeds increase, a par-ticular piece of netgear more likelyaggregates traffic from disparatesources, exacerbating the problem.

Also filed under the heading of“Cruelty to Memory” is the fact thatnetworking data structures rarelycooperate with the DRAMs that arerequired to buffer all the bits alongtheir way. For instance, DRAMs workbest for power-of-two burst transferlengths, but the ’Net is littered withweird 53-byte ATM cells and Ethernetpackets of variable lengths. There arealso alignment problems as packetsmove up and down the stack, gettingencapsulated and de-capsulated alongthe way. For instance, Ethernet head-ers are 14-bytes long, and a higherlayer (such as PPP) may introduce itsown header jitter of variable length.

In short, networking applicationslend themselves to partitioning be-tween what’s referredto as control and data(i.e., forwarding)planes. Packets thatrequire connectionsetup (and tear down)and other significantstate modificationsare handled by thecontrol plane, andthe data planehandles the bulk ofthe byte blasting.

As you’ll see,various flavors ofsuch partitioning arethe primary differen-tiating factors thatrepresent an opportu-

nity for specialized processors tomake their mark. Now, without fur-ther ado, on with the chips.

BIG BIT BANGERSBy my count, there were a total of

nine networking chips presented atthe conference, mostly from newstartups jumping on the bandwagon.There’s no hope of covering each indetail, but a few examples will serveto demonstrate the major trends.

As usual, the most obvious trend isthat the hot chips are hotter thanever—bigger, faster, and more compli-cated. I must say, it’s a challenge tokeep my jaw from dropping when Isee a chip like the C-5 DCP (DigitalCommunication Processor) fromstartup C-Port (recently acquired bybig leaguer Motorola).

As you can see in Figure 1, C-5 startswith 16 individual channel processors.Each is a Harvard RISC with its ownmemory (24-KB instruction and 48-KBdata), boosted with networking in-structions and four register sets forefficient multitasking. Digging a bitdeeper (see Figure 2), you can see thateach channel processor incorporatesdual (transmit and receive) serial dataprocessors that are programmableenough to handle the many flavors oflinks that abound (10-/100-/1-GbEthernet, HDLC, ATM, SONET, etc.).

Then there’s the executive proces-sor that handles control plane stuffand host (PCI) interface. It functionswith the aid of four specialized

CPRC

(channel processorRISC core)

RxSDP(receive

serial data processor)

TxSDP(transmit

serial data processor)

Instructionmemory

Extractspace

Mergespace

Datamemory

Table look up(s)(TLU via ring bus)

Descriptorto QMU

Data DMA(BMU viapayload

bus)

Control

Data

Application Data touch State touch Compute Hybrid

Switching Low Low/Med Low/Med YesRouting Low Low/Med Low/Med YesQoS Low/Med Low/Med Low/Med YesStateful firewall Low/Med Low/Med Low/High YesProxy firewall Med/High Med Med DependsLoad balancing Med Med/High Low/Med YesCB load balance High Med/High Low/Med YesVPN High Med High *Virus detection High High High NoIDS High High High No

* Crypto processing requires a special processor

Table 1—Here are networking applications characterized by their data, state, and compute requirements.Many applications are hybrids where some packets involve a lot of processing to establish flow state, andsubsequent packets are simply forwarded. The most demanding apps must process all traffic content in real time.

Figure 2—Each channel processor in the C-5 has its own memory and addi-tional serial data sub-processors that handle the ones and zeros.


RISCs each, like the C-5, with fourcontexts so the chip can juggle 16packets in flight at once.

Similarly, you can see that network-ing chips (and others) are starting toincorporate a ton of memory. Con-sider the iFlow address processor fromSilicon Access Networks, bulked upwith a whopping 52 Mb of DRAM!

Finally (although not limited tonetworking chips, but highlighted bythe frenzy in that market), is theobservation that the fab-less chipcompany model is key for innova-tion. Many of the bleeding edge net-working chips announced were madeby foundries, with TSMC (TaiwanSemiconductor Manufacturing Co.)credited for more than a few. Thebottom line is that none of thesehopeful startups would ever have achance of raising enough money tomake chips this hot themselves.

PROCESSOR PROGRESSIONAlthough most of the action was on

the networking front, there were morethan a few interesting presentationsabout traditional processors and DSPsas well. Certainly, everyone with a PCpaid close attention to the Itaniumpresentations from Intel.

I don’t need one yet, but Itaniumlooks like the trailblazer on the fron-tier of computing know-how. Most ofthe obvious concepts (pipelining,superscalar, vector, VLIW, cache, bigregister sets) were discovered longago. For some time, CPU designershave faced a challenge going beyondthe basics, and gains have comemainly from clock rate, bus band-width, and more cache, not from ar-chitecture. Worse yet, does Itaniumstill have to run clunky DOS gamesand all the other ’x86 detritus gather-ing dust on old hard drives? That’s achallenge on the order of a moon shot.

The last time I wrote aboutItanium, I covered the basic conceptssuch as EPIC (VLIW) and predication(conditional execution) and decidedthat, being at the end of my architec-tural rope and all, Intel had probablydone as good a job as could be done.

It’s a simple (but showstopping)dilemma; gains achieved by increasingcircuit complexity are lost by the re-

coprocessors for table look up, bufferand queue management, and 3.2-Gbpslocal (fabric) bus.

That’s about 50 processors, over1 MB of memory, and three internalbuses with 50-Gbps bandwidth for atotal of 56 million transistors, alltidily packaged in an 838-pin BGA!

The C-5 single-handedly illustratesa collective wisdom when it comes tonetworking chips. First, it’s clear thatthe so-called chip multiprocessors(i.e., replicated processors on onepiece of silicon) are moving from thelab to the market. Another example,the Vitesse NPU, incorporates four

Heat pipe lid

Itanium processor organicC4 package

Cartridge basesubstrate

Edge connectorfor power delivery

Surface-mounted pin arrayconnection to system bus

4-MB cache organic C4 MCM

Custom L3 SRAM

Figure 3—As gains by architecture become more difficult to achieve, the Itanium cartridge indicates thatimplementation (cache integration, bus bandwidth, power management, etc.) will move to the fore.

CIRCUIT CELLAR®www.circuitcellar.com

ALU clusterALU cluster

ALU cluster

SDRAM

SDRAM

SDRAM

SDRAM

Str

eam

regi

ster

file

2 GBps 32 GBps 544 GBps

(Stanford) and David Patterson (UCB)at the helm, the pair responsible fortomes of near biblical stature (Com-puter Architecture: A QuantitativeApproach). [2]

Both the Bears and the Cards wereexpected at the conference, so I waslooking forward to seeing the latest bigchips on campus. And thankfully, Iwas not disappointed.

Pushing the chip multiprocessorenvelope, the Stanford team has aptlynamed its IMAGINE chip, foreseeing afuture with not just a few, but hun-dreds or even thousands of processorsonboard. Integrating 48 ALUs (8 clus-ters × 6 ALUs), it’s especially well-suited for media and signal processingapplications that have little data reuse(don’t want or need cache), are highlydata parallel (outputs independent ofeach other), and computationallyintensive (60 ALU operations permemory reference).

With conceptually unlimited pro-cessing power on tap, the challengefor such designs is interconnect.IMAGINE uses a three-stage hierarchycomprised of external synchronousDRAM (SDRAM), global register files,and local (to each ALU) register files(see Figure 4).

duced clock rate that results. There’sindication that Itanium is at the edge,with architectural gains just sufficientenough to offset the faster clock ad-vantage of a simpler design. Neverthe-less, it’s no secret that the term“Itanic” is heard in certain circles.

With the core architecture commit-ted, implementation will decide thefate. The Itanium cartridge (see Fig-ure 3) is a clear reminder that Intelreally is, and has been for some time,a system rather than a chip company.

The cartridge is also a reminder ofwhy Hot Chips is relevant, with theliquid-filled heat pipe recalling a HotChips presentation years ago by theAlpha crew. What’s hot today will beon your desk tomorrow.

SHADES OF CRAYThe presence of (not to mention

the rivalry between) two world-classuniversities has been key to the Sili-con Valley phenomenon. Whether onthe football field or in the computerlab, it’s a big deal when the Bears (UCBerkeley) and the Cardinals (StanfordUniversity) meet.

Between the two schools, they’veset the course for computer researchwith notables like John Hennessy

Instructioncache

MIPS645 Kc core

Data cache

DMA

JTAG

JTAG IF

SysAD IF

FPU

CP

IF

Flag unit 0

Flag register file (512 b)

Arithmeticunit 0

Vector register file (8 KB)

Memory unit TLB

Memory crossbar

Flag unit 1

Arithmeticunit 1

DRAM0(2 MB)

DRAM1(2 MB)

DRAM7(2 MB)

8 b

32 b

32 b 32 b

8 b Figure 5—Vectorsupercomputersaren’t new, andneither is DRAM.But, put ’em to-gether on a singlechip and you’ve gotsomething special—a VectorIRAM.

Figure 4—Taking multiprocess-ing to the “exStream” calls for alot of “IMAGINE-ation,” and alot of bandwidth.


Working with TI and Intel, theIMAGINE team expects to take the22-million transistor, 500-MHz, 10-GOPS bun out of the oven shortly.

By contrast, the Bears are pullingthe memory integration strategy outof their playbook. Their Vector IRAMis what it seems to be: a vector pro-cessor with 128 Mb of DRAM thrownin (see Figure 5)!

The MIPS+ vector coprocessor partof the chip is pretty much the equiva-lent of a traditional vector super-computer, a Cray-on-a-chip if youwill. In fact, the IRAM compiler withC, C++, and Fortran frontends is basedon well-regarded, time-proven Crayvectorizing compiler technology.

Because neither the vector registerlength nor the datapath width isexplictly defined by the architecture,processing power is arbitrarily scal-able. Once again, that puts the burdenon bandwidth, where the on-chip 7.5-ns (page hit), 256-bit wide DRAM and12.8-GBps per direction (load/store)crossbar switch shine.

I don’t know who’s going to winthe big game, but both teams are suregiving it the old college try, wouldn’t

you say? For grad students, that’s notbad at all. Maybe they’ll find sometime in between to get out and hit thebeach or a party or two, but I doubt it.

WORLD BEYOND CHIPSWhether it’s the world’s smallest

PC, phone, or web gadget, when itcomes to electronic stuff, smaller andcheaper is beautiful. Can silicon con-tinue to deliver the goods?

I recall a speech by Intel’s chairmanof the board, Gordon Moore, at a HotChips conference some years back. Heweighed in on the subject of a possiblesilicon wall, the proverbial end of theline for silicon. In a remarkably self-effacing way (for someone who’s defacto chairman of the Valley), hefrankly admitted that he was provenwrong in predicting a wall so manytimes that he finally quit doing it.

Whether or not there is a wall andwhen it might be reached is signifi-cant. When the price and performanceof electronic products quits improv-ing, falling to the ranks of things likeinsurance and hamburgers, then theparty will be over. The business willbecome boring, Silicon Valley will

Sensors

Passive transmitter withcorner-cube retroreflector

Active transmitter with laserdiode and beam steering

Receiver with photodetector

1–2 mm

Thick-film battery

Solar cell

Power capacitor

Analog I/O, DSP, control

Figure 6—If smart dust becomes a reality, I suppose we’ll all be using Dustbusters instead of emulators.

crash, and the entire economy willsuffer, losing the free lunch electron-ics-driven productivity gains nowthoroughly enjoyed.

Folks at IBM’s Almaden ResearchCenter aren’t twiddling their thumbswaiting for the ax to fall. Their presen-tation, “Towards Quantum Computa-tion: A 215-Hz, 5-qubit QuantumProcessor,” foresees a day when 1 bitequals 1 atom. [3]

I must say, the presentation was alittle over my head, what with all thebulk spin resonance, collapsing super-positions, Cooper pairs with Joseph-son junctions, and the like. But anybit-head would sit up and take noticeof a machine that could, for example,search a database a thousand timesfaster or, dig this, factor integers morethan a billion times faster!

Will it be ashes to ashes, dust todust for the wizards? Those guys whoare working on the “Preliminary SmartDust Mote” at UC Berkeley sure hopeso. Their goal: tiny autonomousmotes in 1 mm3 that proliferate likedust bunnies. Currently, they’ve been

Tom Cantrell has been working onchip, board, and systems design andmarketing for several years. You mayreach him by e-mail [email protected].

REFERENCES[1] L. Huston and C. Narad, “Intro-

duction to Network Processors,”HOT Chips 12: A Symposium onHigh-Performance Chips atStanford University,www.hotchips.org, August 2000.

SOURCESmC-5 DCPC-Portwww.cportcorp.com

iFlow Address ProcessorSilicon Access Networks(408) 545-1100Fax: (408) 577-1940www.siliconaccess.com

ItaniumIntel Corp.(408) 765-8080Fax: (408) 765-1399www.intel.com

[2] J. Hennessy and D. Patterson,Computer Architecture: A Quanti-tative Approach, MorganKaufmann Publishers, San Fran-cisco, CA, 1995.

[3] Almaden Research Center, IBM,“Towards Quantum Computation:A 215-Hz, 5-qubit Quantum Pro-cessor,” HOT Chips 12: A Sympo-sium on High-Performance Chipsat Stanford University,www.hotchips.org, August 2000.

experimenting with COTS (commer-cial off-the-shelf) prototypes and havedeveloped a capacitively actuatedreflector passive optical communica-tion scheme (see Figure 6) that’sachieved 118 bps over 150-m range atless than 1 nJ per bit.

I’m an optimist at heart. If andwhen the wall does loom, I have tobelieve the wizards will climb rightover it. May your chips stay hot, but ifthey don’t, meet me at the Hot At-oms, Hot Dust, or Hot something orother conference. I

Problem 2—Given a 16-bit digital signal at sample rate Fs,upsample it to a new sample rate of N×Fs by inserting N-1 zerosbetween each original sample. Then lowpass filter the result withan ideal sinc function to interpolate the N-1 samples. Next, ditherand quantize this signal to 8 bits using, say, simple rounding.Lowpass-filter this result with the ideal brickwall filter with acutoff frequency of Fs/2 Hz. Finally, decimate this result backdown to sample rate Fs by throwing away N-1 out of every Nsamples.

Quantizing a signal results in the same total noise power no matterwhat the sample rate is. If we assume the quantization noise iswhite, then the quantization noise power in any fixed bandwidth(say, 0 to Fs/2) will decrease as we increase the oversamplingratio N. This way we can reduce the 16-to-8-bit quantization noiseto an arbitrarily low level by using a sufficiently high oversamplingfactor.

Right?

Problem 3—The original Ferris Wheel was built for theColumbian Exposition in Chicago in 1893. It was the engineeringhighlight of the exposition and one of the most pervasive, lastinginfluences of the 1893 fair. The Ferris Wheel was Chicago’s answerto the Eiffel Tower, the landmark of the 1889 Paris exhibition. Thewheel was created by Pittsburgh, Pennsylvania bridge builderGeorge W. Ferris. Supported by two 140 foot steel towers, its 45-foot axle was the largest single piece of forged steel at the time inthe world. The wheel itself had a diameter of 250 feet, a circumfer-ence of 825 feet, and the maximum height was 264 feet. It waspowered by two 1000-horsepower reversible engines. Ithad 36 wooden cars that could each hold 60 people.

Assuming that just one car is packed with 60people that have an avereage weight of 150 lbs,how much torque is required in the worst caseto turn the wheel? How fast can the two 1000-hpengines turn it under these circumstances?

Problem 1—What does this function do? Why would you write a

function like this?

Problem 4—When was the IBM PC first available to thepublic? What was Steve Ciarcia doing at the time?

What’s your EQ?—The answers and 4additional questions and answers areposted at www.circuitcellar.com.

You may contact the quizmastersat [email protected].

8 more EQ

questions

each month in

Circuit Cellar Online

see pg. 2

http://www.cportcorp.com

http://www.siliconaccess.com

http://www.intel.com


his afternoon I had to choose among a few tasks. I could spend the afternoon dragging in firewood forthe solarium wood stove. I could fire up the leaf blower and try to finish moving tons of wet leaves before

they freeze tonight and remain like concrete until spring. Or finally, I could keep an already rescheduledappointment at the dentist. As you might guess, they all lacked a certain amount of appeal.Fortunately, before I had to choose among all these evils (I don’t know that I’d call any of them lesser), Rob reminded me

that I owed him an editorial. In truth, I didn’t really need an excuse. Any opportunity to discuss Circuit Cellar should alwaystake precedent over the dentist or getting a hernia lugging wood.

Seriously though, December is a good time to look back and discuss what we’ve attempted and what we’veaccomplished over the year. There was a time when we were simply a technical print magazine. Today, I’d like to thinkwe have become a whole lot more than that. The plan was always in place but it took the explosive expansion of theInternet to properly launch it. 1999 and 2000 have been banner years for Circuit Cellar. We’ve increased ourcirculation, increased our editorial content, and doubled the number of design contests. When many print magazinesare imploding, consolidating, or restructuring, Circuit Cellar continues expanding. The good news is that 2001 lookslike more of the same.

Because hardware applications are near and dear to my heart, design contests were a high priority this year. Itdidn’t hurt that big semiconductor manufacturers were lining up at the door for them either. We started the year withPhilips Design2K (8051-family), did a little parallel online tango with the Microchip PIC2000 contest (PIC-family), andare ending the year with the Zilog Driven to Design contest (Z180-family). In my mind, contests have very real sidebenefits for both entrants and readers. $100,000 worth of combined prize money in this year’s contests is ampleincentive for entrants to stay on their design toes and all the great application articles that come from contests are adelight for the readers.

If a $20,000 first prize looks like enough incentive these days, you’ll be happy to know that there is still time toenter the Zilog Driven to Design contest. If, on the other hand, you have always wanted to enter one of our designcontests but just haven’t seen your favorite processor yet, wait a minute—it will eventually show up! Before you burnup my e-mail in-basket, let me tell you that we are planning a couple more contests in 2001. The first one will be anAtmel AVR design contest starting in February. It will concentrate on AVR processors, FPGA’s, and Atmel’s newFPSLIC components.

The Internet continues to be an integral ingredient. Among our Internet achievements is the continued success ofCircuit Cellar Online and ASK US. I suppose we should really be calling it the Circuit Cellar e-zine or something sothat people aren’t confused and think it is the same as the print magazine you are reading. It’s not. Unlike many websites that simply post rehashed HTML print articles, Circuit Cellar produces a second 100% original-material onlinemagazine that is posted and available for free at ChipCenter.com (soon to be renamed eChips.com). If you like whatyou read here, make sure you check out www.chipcenter.com/circuitcellar/main.htm for a second dose of CircuitCellar each month.

Finally, it’s important to remember that everything you view at Circuit Cellar begins as an idea—and we don’t haveall of them. Many of the editorial products and ideas you see come from the readers. Our future depends on keepingthe ideas flowing. If you’ve got a great inspiration or just think that we need to have a little prodding now and then, tellme about it. If I’ve tried to emphasize one idea more than any other over the years, it’s that Circuit Cellar is acommunity and it’s the contributions from the community that keep it running correctly.

PRIORITY INTERRUPT

[email protected]

Continuing the Plan

t

the magazine for computer applications # … your eq 8 additional questions resource links • solid...

Documents