
  • 2 Issue 102 January 1999 Circuit Cellar INK

    Everything Old is New Again

    TASK MANAGER

    Out with the old, in with the new: isn't that what we tend to say at the beginning of a brand-new year? But this time round, it gives me pause. It's true that we're starting a brand-new year, but it's the last one of the century, last one of the millennium, and that's an odd feeling.

    OK, OK. Yes, I'm educated. Yes, I've heard that the new millennium doesn't really start until January 1, 2001. But give me a break. Most people don't think that way. Twelve months from now, everyone's either going to be wildly singing, "We're gonna party like it's 1999," or huddling under their mattresses with their savings and flipping out over the Y2K bug. We're not going to be thinking too seriously about pronouncements from all those prescriptivists who are so determined that we say things 100% accurately.

    Well, at least I'm not going to. I'm more determined to celebrate this special, once-in-a-lifetime event. It's the event I've been waiting for since I was a child. The event I've been waiting for since I could add up how many years old I was going to be when the calendar did its cartwheel into 2000. So, with the clock ticking down and only 12 months to go, my question is, how to prepare for it?

    In one respect, we (the publishing, editorial types) prepare for these things way ahead of time. For example, the Circuit Cellar 1999 Editorial Calendar has been set for many months already. But it's only now that you, the reader, are starting to see it in action. It's only now that you are starting to see how Circuit Cellar is going to celebrate this special year.

    We start off with this issue on embedded processors, and when next month arrives, we'll see some real-world applications of fuzzy logic. And following that, it's a spring whirlwind of informative issues on automation and control, DSP, measurement and sensors, and communications.

    As we get into summer, you'll be sitting in the sun, seeing what's cool in the robotics field as well as the latest techniques and tools for development and debugging. September brings the embedded apps issue (just right for heading back to school), and in October, we'll be hearing more about software algorithms. Getting back to dealing with the bumps and hassles of the real world, November deals with analog problems. And, in the last issue of 1999, the focus will be on embedded interfacing.

    Hey, wait a sec! All this doesn't sound so new, does it?! Uh-oh. Do you mean to tell me that for all this talk of "brand-new year" and "a special way of celebrating the century," we've scheduled more of what we've been doing all along? Hmm... Well, on second thought, that's probably the best possible way for us to pay tribute to the years that have treated us so well. And, it's a good way for us to remind you that we're committed to providing the same quality of editorial in the future.

    I hope you enjoy a productive 1999, and I wish you a happy new year!

    THE COMPUTER APPLICATIONS JOURNAL

    EDITORIAL DIRECTOR/PUBLISHER: Steve Ciarcia
    MANAGING EDITOR: Elizabeth Laurenot
    TECHNICAL EDITORS: Michael Palumbo, Rob Walker
    WEST COAST EDITOR: Tom Cantrell
    CONTRIBUTING EDITORS: Ingo Cyliax, Ken Davidson, Fred Eady
    NEW PRODUCTS EDITOR: Harv Weiner
    ASSOCIATE PUBLISHER: Sue Skolnick
    CIRCULATION MANAGER: Rose Mansella
    CHIEF FINANCIAL OFFICER: Jeannette Ciarcia
    ART DIRECTOR: KC Zienka
    ENGINEERING STAFF: Jeff Bachiochi
    PRODUCTION STAFF: Phil Champagne, John Gorsky, James Soussounis
    PROJECT EDITOR: Janice Hughes

    Cover photograph: Ron Meadows, Meadows Marketing
    PRINTED IN THE UNITED STATES

    For information on authorized reprints of articles, contact Jeannette Ciarcia (860) 875-2199 or e-mail [email protected].

    Circuit Cellar INK makes no warranties and assumes no responsibility or liability of any kind for errors in these programs or schematics or for the consequences of any such errors. Furthermore, because of possible variation in the quality and condition of materials and workmanship of reader-assembled projects, Circuit Cellar INK disclaims any responsibility for the safe and proper function of reader-assembled projects based upon or from plans, descriptions, or information published in Circuit Cellar INK.

    Entire contents copyright 1999 by Circuit Cellar Incorporated. All rights reserved. Circuit Cellar and Circuit Cellar INK are registered trademarks of Circuit Cellar Inc. Reproduction of this publication in whole or in part without written consent from Circuit Cellar Inc. is prohibited.

    CONTACTING CIRCUIT CELLAR INK

    SUBSCRIPTIONS:
    INFORMATION: www.circuitcellar.com or [email protected]
    SUBSCRIBE: (800) 269-6301 or via our editorial offices: (860) 875-2199

    GENERAL INFORMATION:
    TELEPHONE: (860) 875-2199
    FAX: (860) 871-0411
    INTERNET: [email protected], [email protected], or www.circuitcellar.com
    EDITORIAL OFFICES: Editor, Circuit Cellar INK, 4 Park St., Vernon, CT 06066

    AUTHOR CONTACT:
    E-MAIL: Author addresses (when available) included at the end of each article.
    ARTICLE FILES: ftp.circuitcellar.com

    CIRCUIT CELLAR INK, THE COMPUTER APPLICATIONS JOURNAL (ISSN 0896-8985) is published monthly by Circuit Cellar Incorporated, 4 Park Street, Suite 20, Vernon, CT 06066 (860) 875-2751. Periodical rates paid at Vernon, CT and additional offices. One-year (12 issues) subscription rate USA and possessions $21.95, Canada/Mexico $31.95, all other countries $49.95. Two-year (24 issues) subscription rate USA and possessions $39, Canada/Mexico $55, all other countries $85. All subscription orders payable in U.S. funds only via VISA, MasterCard, international postal money order, or check drawn on U.S. bank.

    Direct subscription orders and subscription-related questions to Circuit Cellar INK Subscriptions, P.O. Box 698, Holmes, PA 19043-9613 or call (800) 269-6301.

    Postmaster: Send address changes to Circuit Cellar INK, Circulation Dept., P.O. Box 698, Holmes, PA 19043-9613.

    ADVERTISING

    ADVERTISING SALES MANAGER: Bobbi Yush, (860) 872-3064, Fax: (860) 871-0411, E-mail: [email protected]

    ADVERTISING COORDINATOR: Valerie Luster, (860) 875-2199, Fax: (860) 871-0411, E-mail: [email protected]

    [email protected]

    EDITORIAL ADVISORY BOARD: Ingo Cyliax, Norman Jackson, David Prutchi


    INSIDE ISSUE 102

    Multiprocessor Communications, Part 1: Methods for Communicating (Stuart Ball)
    Developing a Custom Integrated Processor: Analyzing the Price/Performance Tradeoff (Joe Circello and Sylvia Thirtle)
    Using Java in Embedded Systems (Vladimir Ivanovic and Mike Mahar)
    Music at Your Fingertips: Guitar Effects via Remote Control (Hank Wallace)
    The PCL3013 Step/Servo Motor Controller in Action (Gordon Dick)
    MicroSeries: TPU, Part 1: A Coprocessor for Timing Functions (Joe DiBartolomeo)
    From the Bench: Can You Feel the Beat? (Jeff Bachiochi)
    Silicon Update: Wires, Wires Everywhere: The RF Solution (Tom Cantrell)

    EMBEDDED PC
    Nouveau PC, edited by Harv Weiner
    RPC Real-Time PC: Embedded RT-Linux, Part 4: Developing Under Linux gcc/gdb (Ingo Cyliax)
    APC Applied PCs: In the Face of Medusa, Part 2: A Whole New Solution (Fred Eady)

    Task Manager: Everything Old is New Again (Elizabeth Laurenot)
    Reader I/O
    New Product News, edited by Harv Weiner
    INK On-line
    Advertisers Index/February Preview
    Priority Interrupt: Net Worth (Steve Ciarcia)


    READER I/O

    COULD I SEE YOUR LICENSE, PLEASE?

    For some years now I've followed Linux and considered its use in embedded systems. I found the "Embedded RT-Linux" (INK 100) article to be both timely and informative.

    However, an unfortunate error occurred where BSD versus GNU GPL licensing was discussed. Although the BSD kernels follow a different licensing scheme, the Linux kernel is GNU GPL protected.

    Aside from the brief mention that components developed for a GNU GPL-protected system don't fall under the GNU GPL license, the problem concerns the use of the GNU C library in embedded (especially ROMed) applications.

    This problem has received a lot of attention recently because the GNU C library is protected under a modified license, the GNU LGPL (Library General Public License). The GNU LGPL states that if you distribute binaries that are linked with the GNU C library, you must make available linkable objects (or the source) to your binaries so any downstream users can recompile and/or relink your code with any updated GNU C libraries that are available. This way, downstream users aren't locked into a particular revision of the GNU C library that was linked into a vendor's closed-source application.

    Linking with a library was seen as producing a derivative work. This would have forced GNU GPL licensing issues on the original code, making the GNU C library almost useless for most commercial developers.

    So that the FSF software would get some use, the GNU LGPL was created. Object code generated by the GNU C compiler from your source isn't considered a derivative work. Therefore, using the gcc compiler to generate code doesn't place you under the GNU GPL.

    GNU LGPL was designed to deal with desktop workstation situations where the GNU toolset is installed by default, enabling end users to relink an application (given the linkable objects) or use dynamic linking with distribution-supplied shared libraries. But, an embedded system that boots from read-only media immediately runs afoul of the GNU LGPL. Choosing an alternative library and runtime enables you to ROM an application without violating the FSF licenses.

    In the case of an embedded system that includes enough facilities (writable filesystems, and a console interface or remote network attachment), supplying linkable application objects and the tools and instructions necessary to relink the application (or use the shared libraries) should be sufficient to be compliant with the GNU LGPL.

    This isn't meant to scare anyone away from freely available software, but a little time spent reading and understanding licenses is time well spent.

    Dave [email protected]

    You're right about the GPL issue. Because I'm not a lawyer, I wanted to concentrate on the technical issues and just mention that, in contrast to Net/FreeBSD, Linux is GPL licensed. Also, Pat Villani discusses some of the issues in INK 95-96.

    Anyone developing commercial products should consult a lawyer for advice on legal issues about GPL-licensed code or license agreements of other codes. Because the situation may be different for each project, this is the best way to make sure the intellectual properties of the project are protected. Unfortunately, that's the way it is in this business.

    Ingo Cyliax

    IS THAT ALL I GET?

    I enjoyed the article by Alberto Ricci Bitti about the "Graphing Data Logger" (INK 99), but it left me wanting more: more description of the protocol that the FX9750G uses. Does the Casio FX7400G share the same protocol? What links are available for describing Casio features? What links did Alberto find? What kinds of projects are Circuit Cellar readers undertaking with respect to the Casio/PIC combination?

    Gus [email protected]

    Editor's note: Any thoughts on the topic? We'd love to hear from you. Send any correspondence to [email protected].

    Editor's note: Thanks to James Horton for noticing that the www.res.gatech.edu/~bdixon/rtlinux and www.r52h146.res.gatech.edu/~bdixon/rtlinux URLs mentioned at the close of "Embedded RT-Linux" (INK 100) didn't work. Although they were current when Ingo wrote the article, it doesn't take long for things to get outdated. Now, there's an official RT-Linux site (www.rtlinux.org) with links to projects, documentation, and downloadable modules for different Linux distributions.


    NEW PRODUCT NEWS
    Edited by Harv Weiner

    MULTIPLE TAPE BACKUP UNIT

    Ultera Systems has announced plug-and-play mirroring controllers for producing two or more backup tapes as quickly as one. The Imager series of controllers appears to have a single drive or autoloader but actually mirrors the data being backed up onto two drives or two autoloaders, running them at their maximum recording speed. By cascading the devices, a user can produce four, six, or more copies simultaneously without a sacrifice in speed.

    The Imager series includes two models. Imager 1 operates at up to 20-MBps burst rate over a SCSI-1 or SCSI-2 host channel, and records at up to 10 MBps onto two individual drives or two autoloaders. Imager 2 runs at up to 40 MBps from the host and to the drives and also supports the robotics for controlling tape libraries. Any SCSI tape drive and any backup software can be used with either system.

    Imagers can be managed on-line through a GUI that is compatible with Windows 95, 98, and NT and with DOS. Imagers can also be operated off-line through their own control panel for tape copying, comparing, or verifying. Internal half-height 5.25-inch, desktop, and rack-mount units are available.

    Pricing for the Imagers begins at $2445.

    Ultera Systems
    (949) 367-8800
    Fax: (949) 367-0758
    www.ultera.com

    BATTERY MONITOR IC

    The DS2436 battery identification chip provides a convenient method of tagging and identifying battery packs by manufacturer, chemistry, or other identifying parameters. The chip enables the battery pack to be coded with a unique two-byte identification number, and it stores information about battery life and charge/discharge characteristics in its nonvolatile memory. Applications include cell phones, audio/video equipment, data loggers, scanners, and other hand-held instruments.

    The DS2436 integrates a 10-bit voltage ADC and 13-bit temperature-sensing circuitry that monitors battery temperature without requiring a thermistor in the battery pack. A cycle counter manages battery maintenance intervals and helps the user to determine the remaining cycle life of the battery.

    The DS2436 also measures battery voltage and sends the measured value to a host CPU. This feature is useful for end-of-charge or end-of-discharge determination or for basic fuel-gauge operation. Information is sent to and from the DS2436 over a one-wire interface, so the battery packs need only have three output connectors: power, ground, and the one-wire interface.

    The DS2436 sells for $4.10 in quantities of 1000.

    Dallas Semiconductor
    (972) 371-4448
    Fax: (972) 371-3715
    www.dalsemi.com


    PC DATA INTERFACE ADAPTERS

    A PC can now be used to log data from GPS satellites, depth sounders, radar, and other marine navigational devices with the latest data interface adapters from B&B Electronics. The plug-in connectors convert NMEA (National Marine Electronics Assn.) standard data signals so they can communicate with any RS-232/-422/-485 device, such as a PC or printer.

    Two adapter models are provided to suit either the older or the latest NMEA specs. Model 183COR converts the data signal from the older version of the specifications (NMEA0183 V.1.x) to EIA RS-232/-422/-485 signals. Model 183V2C is for NMEA0183 V.2.x, which is the latest version of the NMEA specification. The 183V2C converts one data signal in each direction between NMEA0183 and EIA RS-232.

    The adapter models sell for $99.95 each.

    B&B Electronics Mfg. Co.
    (815) 433-5100
    Fax: (815) 434-7094
    www.bb-elec.com

    PORTABLE EMBEDDED GUI

    The PEG (Portable Embedded GUI) library is a professional-quality graphical user interface library created for embedded-systems developers. It is small, fast, and easily ported to virtually any hardware configuration capable of supporting graphical output. The default appearance of PEG objects is almost identical to common desktop graphical environments.

    The PEG library is written in C++ and implements an event-driven programming paradigm at the application level. Each control type is built incrementally on its predecessor, enabling users to select and use only objects that meet their requirements. The PEG library provides an intuitive and robust object hierarchy. Objects may be used as provided or enhanced through user derivation.

    PEG provides a set of hardware and OS encapsulation classes, so the PEG user interface can run as a standard 32-bit Windows application.

    PEG includes two PC executable utility programs. PEG Font Capture enables users to convert standard font files into a format required by PEG, and PEG Image Convert converts standard .pcx, .bmp, and .tga images into a compressed format supported by the PEG bitmap functions.

    PEG is designed to work with any compiler/debugger combination. There are no internal restrictions on CPU type or hardware configuration. It currently supports standard EGA/VGA, SVGA, and LCD (320 × 240, 4-color grayscale) video controller/display resolutions. PEG is designed to work with any combination of mouse, touchscreen, or keyboard input.

    PEG is licensed on a per-developed-product basis, eliminating royalty fees. It is delivered with full source code, several example application programs, hardware interface objects for several common video configurations and input devices, and thorough documentation.

    The cost of $5000 includes six months of free support.

    Micro Digital, Inc.
    (800) 366-2491
    (714) 373-6862
    Fax: (714) 891-2363
    www.smxinfo.com


    GLOBAL COMMUNICATION DEVELOPMENT SYSTEM

    The GC1100 development system integrates an embedded controller, GPS receiver, communications modem, and a user-command interface to enable the rapid design of tracking systems for a wide variety of GPS applications. The GC1100 is ideal for GPS-based fleet management, AVL, or asset-tracking systems.

    The GC1100 contains a motherboard, an Ashtech G8 receiver, a Motorola 505SD modem that allows ARDIS Packet Data, an operator display interface, 32 digital user I/O lines, eight analog user inputs, and an active GPS antenna (15 to 30 dB gain). Also, prewritten software provides instant communication among all of the components.

    The GC1100's high I/O count provides many options for user-specific applications. Digital and analog I/O enable monitoring several aspects of vehicle status, ranging from engine performance, cargo integrity, and temperature to fuel stops and door openings. Also, messages to and from the dispatcher and driver can be sent and displayed as text. The variety of I/O enables easy interfacing as well as connecting and monitoring digital and analog sensors.

    The development system is available without a receiver or modem and can accommodate a variety of receivers.

    The individual package includes a modem and receiver and sells for $1895.

    Z-World
    (530) 757-3737
    Fax: (530) 753-5141
    www.zworld.com


    SINGLE-CHIP DATA LOGGER

    The EDE702 serial LCD interface IC permits almost any text-based liquid crystal display screen to be controlled via a one-wire serial link. The chip, from E-Lab Digital Engineering, is ideal for embedded microcontroller applications where minimal I/O pin usage is desired.

    The EDE702 enables full LCD control, including the creation of custom characters, scrolling text, cursor on/off, and so forth. With transfer rates of 2400 and 9600 bps as well as selectable data polarity, the chip can interface to almost any microcontroller that is capable of sending asynchronous serial data.

    Another plus for designers is that this microcontroller connection can be made without any type of voltage-level conversion hardware.

    With the EDE702, circuit designers can easily add an LCD screen to their design without being concerned with the increased software overhead or I/O requirements that typically accompany an 11-pin LCD interface. A serially controlled digital output pin makes the one-pin serial interface effectively a zero-pin interface.

    The EDE702 is available in 18-pin DIP or SOIC packages, and it sells for $4.50 in quantities of 1000.

    E-Lab Digital Engineering
    (816) 257-9954
    Fax: (816) 257-9945
    www.elabinc.com


    UNIVERSAL SECURITY DEVICE

    The Safety-Claw is a universal lock for any drive on a PC or workstation. This device enables disk drives to be protected whether they're on a desktop or tower PC. All other drives, including CD drives, streamers, Zip drives, MO, or Syquest, can be reliably secured against unauthorized use whether they're installed in a PC or as an external drive.

    The Safety-Claw's security plate is affixed to the PC or external drive casing, and its bar is inserted to block the drive. The Safety-Claw protects the PC or external drive from robbery if the user inserts a steel cable through the loop in the bar and attaches the protected device to another firm object.

    Additional uses of the Safety-Claw include preventing a scanner or copier lid from being opened or avoiding the unauthorized use of an interface and/or removal of the cable on any device.

    There are 200 different keys available for the Safety-Claw, and keyed-alike systems can be ordered. The steel bar of the Safety-Claw is 6 mm in diameter, so it is extremely difficult to cut.

    The Safety-Claw sells for $29.95.

    Interface Security Solutions Corp.
    (800) 254-4392
    (203) 743-1228
    Fax: (203) 743-1458
    www.crocodile.de


    FEATURE ARTICLE

    Multiprocessor Communications
    Part 1: Methods for Communicating

    Stuart Ball

    Communication is tricky no matter what, right? But when you have several processors involved, well, it just gets worse. Stuart begins this two-part series by looking at ways to get the messages between processors on a single backplane.

    Although most embedded applications can be handled with a single processor, every now and then, you find a job requiring a system with two or more processors.

    Nearly every multiprocessor design needs a way for the processors to communicate. In this series, I look at the different methods for communicating between processors and the various tradeoffs involved.

    To start off, I'll look at useful approaches when two processors share the same PC board or backplane. Let's say the processor communicates with a higher-level system, like a PC, and distributes commands to a lower-level processor that controls a DC motor (see Figure 1).

    CPU 1 talks to the host system, and CPU 2 controls a DC motor, under the


    Figure 1: Although it's rather simple, this block diagram is representative of a typical multiprocessor application. (Blocks shown: host system, CPU 1, CPU 2, DC motor, shaft encoder, sensors.)


    As Figure 2 shows, only seven bits of the register are used for transferring data. The eighth bit (D7/Q7) is a strobe that indicates when data is available.

    The strobe bit is needed because CPU 2 may read the register anytime, including the exact instant when CPU 1 is writing to it. As you see from the timing diagram, when CPU 1 wants to change the register, it executes two write operations. The first write sets the lower 7 bits (D0-D6), and the second write toggles the strobe bit, D7, without changing D0-D6 again.

    CPU 2 only reads data when it sees the strobe bit change state. So, if CPU 2 happens to be reading the register when CPU 1 is updating the data bits, CPU 2 won't see a change on the strobe bit and will ignore the data. If CPU 2 reads the register at the exact instant that CPU 1 is changing the strobe bit, it won't matter whether CPU 2 sees the change, because the data bits are already stable.

    Let's say you defined some commands for the control system as in Table 1. As Figure 2 shows, CPU 1 previously sent a Motor On command. This command is followed by a command to set the speed to 4, followed by a Motor Off command. Each new command changes the state of the strobe bit.
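The two-write strobe protocol and the Table 1 codes can be modeled in a few lines of Python. This is only an illustrative sketch, not code from the article: the class and function names (`SharedRegister`, `send`, `Receiver`, `decode`) are hypothetical, and a plain integer stands in for the 74xx374.

```python
STROBE = 0x80      # D7 is the strobe bit
DATA_MASK = 0x7F   # D0-D6 carry the command


class SharedRegister:
    """Stands in for the 8-bit register between the two CPUs."""

    def __init__(self):
        self.value = 0


def send(reg, command):
    """CPU 1 side: two writes for every byte transferred."""
    strobe = reg.value & STROBE
    # First write: new data bits, strobe left unchanged.
    reg.value = strobe | (command & DATA_MASK)
    # Second write: toggle the strobe, data bits left unchanged.
    reg.value ^= STROBE


class Receiver:
    """CPU 2 side: accept data only when the strobe bit changes state."""

    def __init__(self, reg):
        self.reg = reg
        self.last_strobe = reg.value & STROBE

    def poll(self):
        snapshot = self.reg.value
        strobe = snapshot & STROBE
        if strobe == self.last_strobe:
            return None  # nothing new, or a write is still in progress
        self.last_strobe = strobe
        return snapshot & DATA_MASK


def decode(command):
    """Decode the example command codes from Table 1."""
    if command == 0x50:
        return "motor on"
    if command == 0x51:
        return "motor off"
    if (command & 0xF0) == 0x40:
        return f"set speed {command & 0x0F}"
    return "unknown"
```

Sending 50, 44, and 51 (hex) in turn and polling after each reproduces the sequence in the timing diagram. A poll that lands between the first and second write returns nothing, which is the behavior the strobe bit is there to guarantee.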

    The advantage to this scheme is simplicity. A single 8-bit register is used and may be embedded in an FPGA or ASIC, or it may be implemented with a multibit parallel I/O IC. Of course, 16- or 32-bit processors can use wider registers.

    A simple system may not even need a command structure. Instead, it can assign a separate bit to each function. Two-way communication can be implemented with a second register, written by CPU 2 and read by CPU 1.

    Unfortunately, there's no feedback to tell CPU 1 when CPU 2 has read the data. This drawback has serious implications for the system's throughput.

    Say CPU 2 checks the register every 10 ms in response to a timer interrupt. CPU 1 can't send data any faster than this or CPU 2 may miss a byte.

    If CPU 2 polls the register on an irregular basis, such as in a background loop, then the shortest interval at which CPU 1 can send data is the longest time it takes CPU 2 to execute the loop.
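As a rough worked example of that limit, the safe send rate is the reciprocal of CPU 2's worst-case polling interval. The helper name `max_send_rate` and the background-loop timings below are made up for illustration; only the 10-ms timer figure comes from the text.

```python
def max_send_rate(poll_intervals_ms):
    """Return the highest safe send rate in bytes per second.

    poll_intervals_ms lists the times between CPU 2's reads of the
    register; the longest one sets the limit.
    """
    worst_case_ms = max(poll_intervals_ms)
    return 1000 / worst_case_ms


# Timer-driven polling every 10 ms, as in the text.
timer_rate = max_send_rate([10])

# Background loop whose pass time varies (illustrative values only).
loop_rate = max_send_rate([2, 5, 40])
```

With a 10-ms timer, CPU 1 is held to 100 bytes per second. In the background-loop case, a single slow 40-ms pass drags the limit down to 25 bytes per second even though most passes are much faster.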

    REGISTER WITH FLAG

    To achieve faster throughput, the circuit in Figure 3 adds a set/reset flip-flop to the basic register circuit. The flip-flop is set when CPU 1 writes to the data register and reset when CPU 2 reads the register. The flip-flop can be constructed from a pair of NAND gates, or it can be half of a 74xx74 with the clock and D inputs grounded.

    The flip-flop's output is provided to both processors so it can be read as an empty/full flag for the data register. It can connect to a status register or to a microcontroller port bit.

    To use this scheme, CPU 1 writes something to the register. CPU 2 sees the data available (because the flip-flop was set) and reads it. CPU 1, polling the flip-flop output, sees it go low and knows that the register is empty and ready for another byte of data.
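The empty/full handshake can be sketched the same way. In this hypothetical model (the names `FlaggedRegister`, `cpu1_try_send`, and `cpu2_poll` are not from the article), a boolean plays the role of the flip-flop: writing sets it, reading resets it.

```python
class FlaggedRegister:
    """Data register plus a set/reset flip-flop used as a full flag."""

    def __init__(self):
        self.data = 0
        self.full = False  # flip-flop output: True means data is waiting

    def write(self, byte):
        """CPU 1 write strobe: latch the data and set the flip-flop."""
        self.data = byte & 0xFF
        self.full = True

    def read(self):
        """CPU 2 read strobe: return the data and reset the flip-flop."""
        self.full = False
        return self.data


def cpu1_try_send(reg, byte):
    """CPU 1 polls the flag and writes only when the register is empty."""
    if reg.full:
        return False  # previous byte not yet read; try again later
    reg.write(byte)
    return True


def cpu2_poll(reg):
    """CPU 2 polls the flag and reads only when the register is full."""
    if not reg.full:
        return None
    return reg.read()
```

Because CPU 1 refuses to write while the flag is set, a byte can no longer be overwritten before CPU 2 has read it, so CPU 1 does not have to pace itself to CPU 2's worst-case polling interval.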

    Now, the basic register circuit is more complex, but the potential throughput is greatly increased. The maximum throughput is still limited, though, because both CPUs must poll the empty/full bit.

    For example, if CPU 2 still polls for data once each pass through a background loop, the worst-case transfer rate is the same as in the single-register

    control of CPU 1. CPU 2 senses themotor position via a shaft encoder andgets other sensor inputs as well.

    In a typical real-world scenario, youmight find this arrangement if CPU 1needs to execute slow, complex tasksin response to the host, and CPU 2 hasto execute fast, simple tasks to controlthe motor speed or position. You mayfind CPU 1 controlling multiple proces-sors like CPU 2.

    Clearly, CPU 1 must communicatewith CPU 2 to get this job done. Thisrequires commands like turn motor on,turn motor off, set motor speed to x, andstart motor when sensor y goes active.

    Figure 2 shows one method of com-municating between processors. Thecircuit is an 8-bit register written byCPU 1 and read by CPU 2. The registeris a 74xx374 (xx = LS, HC, ACT, etc.),but this scheme can be implementedin programmable logic or with anyregister that has tristate outputs.

    The D inputs to the register connect to the data bus of CPU 1 (or to the lower 8 bits if CPU 1 is a 16- or 32-bit processor). The clock input (pin 11) connects to a write strobe from CPU 1. The write strobe is the same type you'd use to clock data into any register or a peripheral IC, and it goes low when CPU 1 writes to the specific address where the communication register is located.

    The register's Q outputs connect to CPU 2's data bus. When CPU 2 wants to read the register, it generates a low-going read strobe (a decoded address strobe) at the register's Output Enable (pin 1). This strobe enables the tristate outputs, so CPU 2 can read the register data.

    Figure 2: Note that CPU 1 performs two writes to the register for every byte transferred. The first write clocks the data in, and the second one toggles the most significant bit as a strobe. (Schematic: a 74xx374 whose D0-D7 inputs connect to the CPU 1 data bus and whose Q0-Q7 outputs connect to the CPU 2 data bus, with the CPU 1 *WRITE strobe on the clock input and the CPU 2 *READ strobe on the output enable. Timing diagram: CPU 1 WRITE strobe, D0-D6 (data) carrying 50, 44, 51, and D7 (strobe).)

    Code (hex)   Command
    4x           Set motor speed to x (x = 0 to F)
    50           Turn motor on
    51           Turn motor off

    Table 1: These are some example codes for various control commands.

    Figure 3: Adding a set/reset flip-flop improves the efficiency and speed of the register-based communication method by providing an empty/full status to the two CPUs. (Schematic: the 74xx374 register with the CPU 1 *WRITE strobe and CPU 2 *READ strobe, plus a flip-flop with S, R, Q, and *Q pins that supplies a register-full flag to CPU 2 and a register-empty flag to CPU 1.)


    implementation. The maximum rate at which data can be transferred increases because CPU 1 can transfer data at the actual polling rate instead of at the slowest possible rate. The price is that both CPUs must have an available port bit or status input to read the empty/full flag.

    INTERRUPT-DRIVEN SYSTEM

    You can improve performance by connecting the outputs of the set/reset flip-flop of Figure 3 to an interrupt on each CPU (instead of to status bits). CPU 2 gets an interrupt when the register is full, and CPU 1 gets an interrupt when the register is empty. Be sure you get the polarity of the interrupts correct if you make this change.

    This scheme greatly increases the potential data throughput. The maximum data rate is now limited by the sum of the interrupt latency and processing time for both processors. The price for this approach is the need for one free interrupt on each CPU (two if a reverse path is also implemented).
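To put numbers on that limit, the time per byte is the sum of each CPU's interrupt latency and service time, and the peak rate is its reciprocal. The function names and every timing value below are assumptions chosen purely for illustration; the article gives no figures.

```python
def byte_time_us(latency1_us, service1_us, latency2_us, service2_us):
    """Time for one full handshake: CPU 2 reads, then CPU 1 refills."""
    return latency1_us + service1_us + latency2_us + service2_us


def max_rate_bytes_per_s(latency1_us, service1_us, latency2_us, service2_us):
    """Upper bound on throughput for the interrupt-driven register."""
    return 1_000_000 / byte_time_us(
        latency1_us, service1_us, latency2_us, service2_us
    )


# Hypothetical timings: 5 us latency on each CPU, 10 us for CPU 1 to
# write the next byte, 20 us for CPU 2 to read and queue it.
example_time = byte_time_us(5, 10, 5, 20)
example_rate = max_rate_bytes_per_s(5, 10, 5, 20)
```

With these assumed values, one byte takes 40 µs, so the bound is 25,000 bytes per second, compared with the 100 bytes per second available when CPU 2 polls on a 10-ms timer.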

    There's also the potential for CPU 2 to get hammered with constant interrupts if CPU 1 has a lot of data to send. One way around this is to have CPU 2 turn off the communication interrupt during critical processing. However, this approach decreases the overall throughput.

    FIFO

    Figure 4 shows a FIFO interface. This example uses a 7203, which is a 2K × 9-bit FIFO. The 7203 is an industry-standard part from a family of parts available in 512 × 9, 1K × 9, 2K × 9, and up. This example circuit doesn't show all the 7203 pins, just those we're interested in here.

    The 72xx family of FIFOs contains an internal SRAM and logic to control access to the RAM. Data written to the FIFO input by CPU 1 is placed in the internal FIFO memory. Any time the FIFO is empty, the Empty Flag (EF) output is low. If the FIFO is not empty, EF is high.

    The incoming data is stored so that CPU 2 reads it out in the same order it was written. So, the FIFO acts as a deep register that allows CPU 1 to write multiple bytes without worrying about how fast CPU 2 is reading them.

    To use this method, CPU 1 typically writes a complete, multibyte message to the FIFO. When EF goes high (not empty), CPU 2 reads the data. This type of interface requires very little overhead from the processors. The rate at which CPU 1 sends data is not limited to the rate at which CPU 2 reads it.

    The first drawback to this system is a throughput limitation. Although CPU 1 can send a message without worrying about how fast CPU 2 can read it, the average transfer rate can't exceed the capacity of CPU 2. If it does, the FIFO fills up and data is lost. So, the FIFO doesn't really increase the overall data throughput, it just decreases the overhead of transferring the data.

    A second problem with the FIFO interface is time delays. If CPU 1 sends a command, such as Motor Off, there may be a delay before CPU 2 reads the message and acts on it. With register-based approaches, CPU 1 always knew CPU 2 was reading data as it was sent. But with the FIFO design, that's no longer the case.

    The third problem is related to the second and involves message priorities. Suppose this hypothetical system had messages of differing priorities. The normal Motor Off command may allow the motor to coast to a stop, but there's an Emergency Stop command that brakes the motor instantly.

    If Emergency Stop is received, you presumably want to service it immediately. If the interface uses a FIFO, there's the possibility that commands can stack up in it, as shown in Figure 5. A low-priority command, like a speed change, can be in front of a command such as Emergency Stop.

    Suppose that your command set consists of long, multibyte messages and that the software in CPU 2 reads the first byte of each command to see what kind of command it is. If the command is low priority, like a speed change, the software may decide to read and process the command later. This decision leaves the possibility that Emergency Stop won't be acted on right away if it's behind a low-priority command in the FIFO.

    The solution to this priority problem is for CPU 2 to read all messages as soon as they are received. Lower priority messages can be stored for later execution, and high-priority messages can be executed immediately. The drawback is that all messages must be treated as high priority because any message could have a high-priority message stacked behind it.
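The read-everything-immediately approach can be sketched in a few lines of Python. This is only an illustration of the idea, not code from the article: the command names, the `HIGH_PRIORITY` set, and the `drain_fifo` helper are all hypothetical, and a `deque` stands in for the hardware FIFO.

```python
from collections import deque

# Hypothetical command names; only the priority split matters here.
HIGH_PRIORITY = {"EMERGENCY_STOP"}


def drain_fifo(fifo, deferred, execute):
    """Empty the FIFO as soon as data is available.

    High-priority messages are passed to `execute` immediately.
    Everything else is parked in `deferred` for later processing.
    """
    while fifo:                       # models EF high (FIFO not empty)
        message = fifo.popleft()      # oldest message comes out first
        if message in HIGH_PRIORITY:
            execute(message)
        else:
            deferred.append(message)


def run_demo():
    """Emergency Stop is stacked behind two low-priority commands."""
    fifo = deque(["SET_SPEED", "MOTOR_OFF", "EMERGENCY_STOP"])
    deferred = deque()
    executed = []
    drain_fifo(fifo, deferred, executed.append)
    return executed, list(deferred)
```

In this sketch, Emergency Stop is acted on during the same drain pass even though it was written last, while the speed change and Motor Off wait in the deferred queue.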

    DMA-BASED INTERFACE

    Some microprocessors, such as the 186 and 386EX, have built-in direct memory access (DMA) controllers. For these applications, the circuit in Figure 6 eliminates nearly all of the disadvantages I've described so far. This circuit goes back to the register-and-flip-flop approach, but with a twist.

    Figure 4: A FIFO provides a very fast interface (A 7203 sits between the CPU 1 data bus on D0 to D7 and the CPU 2 data bus on Q0 to Q7. The CPU 1 *WRITE strobe drives W, the CPU 2 *READ strobe drives R, and *EF signals data available to CPU 2.)

    Figure 5: It's possible for a high-priority message to stack up behind lower priority messages in a FIFO. This situation can cause the high-priority message to be serviced later than expected.

    Figure 6: If the processors support it, adding DMA to the register-based scheme provides extremely fast, low-overhead communication. (A 74xx374 register sits between the two data buses. The flip-flop Q output is the register full DMA request to CPU 2, and *Q is the register empty DMA request to CPU 1.)


    [Figure 7a timing diagram: *WRITE strobe from CPU 1, data ready flag, and *READ strobe from CPU 2. Data ready does not get reset during the read.]

    In Figure 6, the flip-flop outputs do not connect to status bits or to interrupt inputs. They connect to the DMA request signals on both processors. CPU 1 receives a DMA request when the data register is empty and CPU 2 gets one when the register is full.

    Let's look at three ways to implement this interface in software. First, all messages have a predefined length (16 bits, 32 bits, etc.). Shorter messages are padded out to this length. CPU 2 sets up its DMA controller to transfer one complete message. When CPU 1 wants to transfer data, it tells its DMA controller to send the block of memory to the data register.

    As each byte is sent, the DMA request to CPU 1 goes inactive and the DMA request to CPU 2 goes active. When CPU 2's DMA controller reads the byte, the DMA requests swap states again and CPU 1's DMA controller sends the next byte.

    After the entire message is received, CPU 2 gets an interrupt from its DMA controller and processes the received data. The entire transfer is accomplished in a few tens of microseconds, although the processing may take longer.
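The request handshake for the fixed-length case can be modeled in software. The following Python sketch is an assumption-laden toy, not a description of any real DMA controller: the `RegisterLink` class, its property names, and the zero-byte padding are hypothetical choices made for illustration.

```python
class RegisterLink:
    """Data register plus status flip-flop, as in the register-based design."""

    def __init__(self):
        self.data = None
        self.full = False            # state of the status flip-flop

    @property
    def dreq_cpu1(self):
        """Register-empty DMA request to CPU 1."""
        return not self.full

    @property
    def dreq_cpu2(self):
        """Register-full DMA request to CPU 2."""
        return self.full

    def write(self, byte):
        if self.full:
            raise RuntimeError("write while register is full")
        self.data = byte
        self.full = True             # the write sets the flip-flop

    def read(self):
        if not self.full:
            raise RuntimeError("read while register is empty")
        self.full = False            # the read resets the flip-flop
        return self.data


def dma_transfer(message, length):
    """Send `message`, padded to the predefined `length`, one byte at a time."""
    if len(message) > length:
        raise ValueError("message longer than the predefined length")
    source = list(message.ljust(length, b"\x00"))   # pad short messages
    link = RegisterLink()
    received = bytearray()
    while len(received) < length:
        if link.dreq_cpu1 and source:    # CPU 1's DMA controller writes
            link.write(source.pop(0))
        if link.dreq_cpu2:               # CPU 2's DMA controller reads
            received.append(link.read())
    return bytes(received)               # CPU 2 would now get its interrupt
```

Each pass through the loop corresponds to one swap of the two DMA requests. No CPU code runs per byte, which is the point of the scheme.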

    Figure 7: a) If CPU 2 is much faster than CPU 1, CPU 2 may detect the register full condition and read the data while CPU 1 is still performing a write cycle. This results in the register full condition remaining active and CPU 2 reading two bytes instead of one. b) By connecting the CPU 1 write strobe to the clock input of the status flip-flop (instead of the SET input), the status flip-flop is not set until the end of the write cycle. c) The change prevents the race condition from occurring, regardless of the relative CPU speeds.

    If your application requires variable-length messages, you can define each message so the first byte defines the length. As before, CPU 1 sets up its DMA controller to send the entire message. CPU 2 has already set up its DMA controller to transfer a single byte.

    When the first byte of the message is sent, CPU 2 gets an interrupt (from its DMA controller) and reads the byte. CPU 2 then sets up its DMA controller to transfer the rest of the message based on the length byte. This method enables variable-length messages to be sent, but CPU 2 now has to service two interrupts for each message and the maximum transfer rate is slower.
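The two-stage receive can be sketched as follows. This Python fragment only mimics the sequence of events on CPU 2. The `receive_variable_length` name is hypothetical, and it assumes the length byte counts the payload bytes that follow it, which the article does not specify.

```python
def receive_variable_length(stream):
    """Receive one length-prefixed message from an iterator of byte values.

    Returns the payload and the number of DMA-complete interrupts that
    CPU 2 had to service along the way.
    """
    interrupts = 0

    # Stage 1: the DMA controller was set up for a single byte (the length).
    length = next(stream)
    interrupts += 1

    # Stage 2: CPU 2 reprograms its DMA controller for `length` more bytes.
    payload = bytes(next(stream) for _ in range(length))
    interrupts += 1

    return payload, interrupts
```

For a 3-byte payload, `receive_variable_length(iter(b"\x03abc"))` returns the payload `b"abc"` and an interrupt count of 2, matching the two interrupts per message described above.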

    The third method is for CPU 2 to set its DMA controller to transfer more data than the longest possible message. When CPU 1 sends data, it gets an interrupt (from its DMA controller) indicating that the transfer is complete.

    CPU 1 notifies CPU 2, via another interrupt path, that a message is available. CPU 2 then reads the length from its DMA controller pointer registers and processes the message normally.

    In a DMA scheme, CPU 2 sets up a block of memory as a buffer for the DMA data. For example, if each message is 16 bytes in length, CPU 2 can set up a 256-byte block of memory that contains 16 message buffers.

    Using DMA also avoids the FIFO priority issue in two ways. First, the transfers are executed directly to memory in hardware, making the process of reading the data less of a bottleneck. Second, if CPU 2 uses multiple buffers, lower priority messages can be left in their buffers until they are acted on, whereas high-priority commands can be executed immediately.

    Even if only one of your CPUs has built-in DMA, you can take advantage of this approach. The CPU with DMA can transfer messages using DMA, eliminating the overhead of polling or servicing one interrupt per byte. While you won't get the throughput of a dual-DMA design, sometimes you get simpler software with this approach.

    With any DMA-based approach, make sure the timing of setting and resetting the flip-flop meets the requirements for the DMA controller. The primary drawback to this approach is that one or both processors must have DMA capability.

    The primary advantage is extremely fast data throughput, and although it's probably not worth changing processors just to use this technique, it can provide a fast communication path if the processors support it.

    RACE CONDITION

    All the variations of the register-and-flip-flop design are susceptible to a race condition if one processor is considerably faster than the other. In Figure 7a, CPU 2 is much faster than CPU 1, so CPU 2 sees the flip-flop get set and reads the data while CPU 1 is still writing to the register. As a result, the flip-flop doesn't get reset properly.

    A typical scenario where this might occur is if CPU 2 is a very fast DSP communicating with a slower general-purpose microprocessor.

    The solution: use a synchronous design (where everything is referenced to one of the CPU clocks) or use a clocked flip-flop. Figure 7b shows how a flip-flop like the 74xx74 would be connected to fix the timing problem.

    Figure 7c shows the new timing. Because the write strobe from CPU 1 is connected to the clock input of the 74xx74, the data ready flag doesn't get set until the end of the write cycle, eliminating any timing conflict.
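A small discrete-time simulation shows why the clocked flip-flop helps. This Python sketch is a simplified model under stated assumptions, not a timing-accurate description of the parts: time advances in arbitrary steps, the level-sensitive SET input is assumed to win while the write strobe is still active, and the `count_reads` function and its parameters are hypothetical.

```python
def count_reads(write_len, read_latency, read_len, edge_triggered, horizon=50):
    """Count how many reads CPU 2 performs for a single write by CPU 1.

    write_len      steps the CPU 1 write strobe stays active (from t = 0)
    read_latency   steps between CPU 2 seeing the flag and starting a read
    read_len       steps the CPU 2 read strobe stays active
    edge_triggered True models the clocked 74xx74 fix, False the SET input
    """
    flag = False
    reads = 0
    read_start = None                    # None means CPU 2 is idle
    for t in range(horizon):
        writing = t < write_len
        if edge_triggered:
            if t == write_len:           # end of the write cycle clocks the flag
                flag = True
        elif writing:                    # SET held active for the whole cycle
            flag = True

        if read_start is None:
            if flag:                     # CPU 2 notices data ready
                read_start = t + read_latency
        elif t >= read_start:
            # Read strobe active: the reset only sticks if SET is released.
            if edge_triggered or not writing:
                flag = False
            if t == read_start + read_len - 1:
                reads += 1
                read_start = None
    return reads
```

With a 4-step write and a fast CPU 2 (latency 1, read length 2), the level-sensitive version performs two reads for one write, while the edge-triggered version performs exactly one.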

    DUAL-PORT RAM

    I also want to mention dual-port RAM (i.e., RAM that can be accessed by either processor). One option is to use an off-the-shelf dual-port RAM IC with two address and data buses. You can get controller ICs that convert standard RAM devices to dual port.

    The second method is to use the existing RAM associated with one of the processors. This approach is simpler than an external RAM, but it can affect the throughput of both processors.

    All of these approaches can be mixed and matched. For instance, if your application has a CPU 1-to-CPU 2 interface that requires long data messages at high rates, you might implement a DMA-controlled register for that interface. The return path, from CPU 2 back to CPU 1, might carry only infrequent status bytes, so it may be a simple polled register interface.

    Next time, I'll look at methods you can use if your processors must communicate over a greater distance.

    Stuart Ball works at Organon Teknika, a manufacturer of medical instruments. He has been a design engineer for 18 years, working on projects as diverse as GPS and single-chip microcontroller designs. He has also written two books on embedded-system design. You may reach Stuart at [email protected].


    Developing a Custom Integrated Processor

    Analyzing the Price/Performance Tradeoff

    FEATURE ARTICLE
    Joe Circello & Sylvia Thirtle

    If standard processor configurations aren't quite what you need, consider a processor-independent core. Joe and Sylvia bring to your doorstep a customizable core, so you can manipulate valuable variables in the price/performance equation.

    New uses for advanced embedded microprocessors are emerging everywhere, especially in the highly competitive, fast-paced market of consumer electronics. Thanks to cooperative efforts with silicon vendors, embedded-system developers can manipulate powerful variables in the price/performance equation that were previously beyond their control.

    Optimizing an embedded processor presents an earnest challenge, and it requires the system designer to perform a delicate balancing act between performance and cost. Ultimately, this approach produces an embedded-processor solution that is fine-tuned for a given system and/or application.

    DESIGN MODELS

    With the traditional system design model, the engineer remains at the mercy of standard product offerings from the semiconductor vendor. A chipmaker's catalog of standard processor configurations may or may not include precisely what's required for a given application.

    With a standard product, system designers may have to pay for functions (and silicon) they'll never use. Additionally, the device may lack certain functions that could significantly enhance the system performance of a given application if integrated on-chip.

    But a new system design model is emerging, brought on by the availability of modular, fully synthesizable, process-independent microprocessor cores. For the first time, design engineers have unprecedented control over defining and configuring embedded processors.

    Customizable cores, like the Motorola ColdFire family, can be cost-effectively tailored to meet the demands of specific applications. The ColdFire architecture was developed to address this class of applications.

    Based on variable-length RISC technology, ColdFire combines the architectural simplicity of conventional 32-bit RISC with a memory-saving, variable-length instruction set. In defining the ColdFire architecture, Motorola incorporated a RISC-based processor design and a simplified version of the variable-length instruction set found in the 68k family.

    The result is a family of 32-bit microprocessors suited for those embedded applications requiring high performance in a small core size. The ColdFire family provides balanced system solutions to a variety of embedded markets. Here are some of the basic philosophies that have guided all ColdFire designs.

    When it comes to small, fully synthesizable processor cores, developments are on track with a publicly announced performance roadmap reaching 300 MIPS by the year 2001. Using compiled memory arrays and 100% synthesizable designs enables system designers to easily define CPU configurations.

    Figure 1 depicts the standard ColdFire microprocessor configuration. The hierarchical bus structure and the modular architecture are apparent. You can add other logic, in the form of predefined macros, from Motorola's library or synthesize your own proprietary circuits.

    Figure 1: This generalized block diagram shows a custom integrated processor using a ColdFire core. (Block labels: ColdFire core, debug unit, misalignment, K-bus, RAM, ROM, and cache with their controllers, K-to-M, arbitration module, alternate master, M-bus, system bus controller (SBC), S-bus, slave peripherals, E-bus, I/O.)

    MAXIMUM ARCHITECTURE

    Fine-tuning a custom embedded processor for optimal price and performance requires some insight into the specific architecture's variables. The difficulty of this process is influenced by the sophistication of the silicon vendor's development environment as well as the system designer's ability to provide accurate, real-world application data for the target system.

    For example, the system OEM may be able to provide information from a previous-generation system. The data can be a key piece of software that represents a critical execution path of the given application. If it's possible to extract the key software routines and recompile them for the target system, the process becomes much easier.

    As an alternative, trace data captured from a previous-generation system can also provide critical information for sizing the processor's local memories (e.g., cache, RAM, ROM). These dynamic traces, whether captured from an earlier design or created by the application code running on a software simulator of the target system, are crucial for the price and performance optimization analysis.

    PREDICTING PERFORMANCE

    Although ratings for microprocessors are expressed in MIPS, this number often fails to accurately predict the performance of an embedded microprocessor system for a given application. Many times, these ratings need a "mileage you get may vary" disclaimer. Unless the effects of the memory subsystems are taken into account, these simplistic ratings can't accurately indicate performance.

    Today, more precise performance estimates of a hypothetical or actual processor core can be made. By taking specific system and memory subsystem variables into account, this methodology provides a more accurate representation of completely different CPU configurations and architectures.

    The predicted performance of a processor can be developed using an average-instruction-time methodology. In its simplest form, this cycles per instruction (CPI) metric represents the number of machine cycles per instruction and is calculated for a single-issue architecture as:

    $$\text{CPI}\ [\text{cycles/inst}] = \sum_i F(i) \times ET(i)$$

    where CPI is the average instruction time expressed in cycles per instruction, F(i) represents the dynamic frequency of occurrence per instruction, and ET(i) is the execution time for a given instruction i. By summing the product of relative frequency and execution time for each instruction type, the average instruction time for a processor executing any given instruction mix can be calculated.
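As a worked example of the summation, here is a minimal Python sketch. The instruction classes, frequencies, and cycle counts in `example_mix` are invented for illustration and are not ColdFire figures.

```python
def average_cpi(mix):
    """Average instruction time in cycles per instruction.

    `mix` maps an instruction type to a pair (F, ET): its relative dynamic
    frequency and its execution time in cycles.
    """
    total_frequency = sum(f for f, _ in mix.values())
    if abs(total_frequency - 1.0) > 1e-9:
        raise ValueError("relative frequencies must sum to 1")
    return sum(f * et for f, et in mix.values())


# Hypothetical instruction mix, for illustration only.
example_mix = {
    "alu":        (0.50, 1),
    "load_store": (0.30, 2),
    "branch":     (0.10, 3),
    "multiply":   (0.10, 5),
}
```

For this made-up mix, the sum is 0.50 × 1 + 0.30 × 2 + 0.10 × 3 + 0.10 × 5 = 1.9 cycles per instruction.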

    Consider the definition of a base average instruction time (base CPI). Let the base CPI represent maximum processor performance strictly as a function of the instruction mix. Stated differently, this metric represents the processor's performance assuming the rest of the system (caches, memory modules, etc.) is ideal.

    Figure 2a shows the base CPI where the summation product was previously defined and the sequence-related pipeline stalls include all pipeline breaks caused by the instruction sequence. You can calculate the base CPI by summing the product of the relative frequency of occurrence and execution time for each instruction type plus the sequence-related holds.

    This base CPI provides a parameter to quantify the performance of a given processor microarchitecture. To convert this value into a more realistic measure of predicted system performance, you have to consider a series of degradation factors.

    Let the effective average instruction time (effective CPI) represent this more realistic measure of performance. By quantifying the degradation factors associated with these other system components, the effective CPI can be calculated. As an example, the processor stalls resulting from cache misses typically represent the largest degradation factor in the effective CPI equation.

    a) $$\text{base CPI}\ [\text{cycles/inst}] = \sum_i F(i) \times ET(i) + \text{sequence-related pipeline stalls}$$

    b) $$\text{effective CPI}\ [\text{cycles/inst}] = \text{base CPI} + \sum \text{memory factors} + \sum \text{system factors}$$

    c) $$\text{effective CPI}\ [\text{cycles/inst}] = \text{base CPI} + \text{IC\_miss} \times \text{IF} \times \text{IF\_stall} + \text{OC\_miss} \times \text{REF} \times \text{OP\_stall}$$

    where the cache memory degradation factors include:

    IC_miss = Cache miss rate on instruction fetches (miss/fetch)
    IF = Instruction fetches per instruction
    IF_stall = Time [cycles] the processor core is stalled servicing an instruction fetch miss
    OC_miss = Cache miss rate on operand fetches (miss/operand fetch)
    REF = Operand references per instruction
    OP_stall = Time [cycles] the processor core is stalled servicing an operand miss

    d) $$\text{effective CPI} = \text{base CPI} + \{(\text{IC\_miss} \times \text{IF}) + (\text{OC\_miss} \times \text{REF})\} \times \{2 + t_1 + 0.6\,(t_2 + t_3 + t_4)\}$$

    Figure 2: a) Here's the simplified expression for the processor's performance measured by base CPI. b) This generic expression defines performance as measured by effective CPI. c) This more detailed equation defines effective CPI performance for a processor with cache memory. d) And, this is the effective CPI equation for the ColdFire V.2 and V.3 processors.

    Figure 3: This diagram gives you an overview of the ColdFire performance-analysis methodology. (Flow: a program hex image run on the ISA simulator, or a hardware trace, produces a stream of addresses (PC, operand, instructions). The stream feeds the profile tool, which produces an address histogram, the CFxPipe models, which produce the base CPI, and the memory models, which produce hit rates. These results are combined to give the effective CPI.)

    Figure 4: The operand address histogram is taken from a set-top box application. (Axes: memory address, 0x00000000 to 0x000f4000, versus reference count, 0 to 100,000.)

    In Figure 2b, the calculated effective CPI is reached by summing the individual degradation factors. Let's define the memory subsystem factors as those associated with a cache memory, and assume the remaining system factors are negligible.

    The effective CPI equation can then be rewritten as in Figure 2c. The first degradation term quantifies the CPI contribution due to instruction fetch cache misses, and the second term quantifies the operand reference cache misses.
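The two degradation terms of Figure 2c translate directly into code. In this Python sketch the function name and every numeric value in the usage note are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def effective_cpi(base_cpi, ic_miss, if_per_inst, if_stall,
                  oc_miss, ref_per_inst, op_stall):
    """Effective CPI for a processor with cache memory (form of Figure 2c).

    ic_miss, oc_miss           cache miss rates (instruction, operand)
    if_per_inst, ref_per_inst  fetches and operand references per instruction
    if_stall, op_stall         stall cycles per miss
    """
    instruction_term = ic_miss * if_per_inst * if_stall
    operand_term = oc_miss * ref_per_inst * op_stall
    return base_cpi + instruction_term + operand_term
```

With a base CPI of 1.5, a 2% instruction miss rate at one fetch per instruction, a 5% operand miss rate at 0.4 references per instruction, and a 10-cycle stall for either kind of miss, the result is 1.5 + 0.2 + 0.2 = 1.9 cycles per instruction.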

    The relative performance between two systems, x and y, can be expressed as:

    $$\frac{\text{performance}_x}{\text{performance}_y} = \frac{\text{eff CPI}_y}{\text{eff CPI}_x} \times \frac{\text{cycle time}_y}{\text{cycle time}_x} \times \frac{\text{executed insts}_y}{\text{executed insts}_x}$$

    where the first ratio defines the architectural factor, the second ratio is the technology factor, and the third ratio is the instruction set/compiler factor.

    Using the system performance equation, you can analyze the relative performance of different generations of a microprocessor family, or compare different architectures. For benchmarks where the same binary code image is executed on different designs, the relative performance equation reduces to the product of the architectural and technology factors.
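The three-factor comparison can be written as a short Python function. The dictionary keys and the two example systems are hypothetical, so treat this as a sketch of the arithmetic rather than measured data.

```python
def relative_performance(x, y):
    """Performance of system x relative to system y.

    Each system is a dict with keys "eff_cpi", "cycle_time", and "insts"
    (executed instruction count for the same workload).
    """
    architectural = y["eff_cpi"] / x["eff_cpi"]
    technology = y["cycle_time"] / x["cycle_time"]
    instruction_set = y["insts"] / x["insts"]
    return architectural * technology * instruction_set


# Hypothetical systems running the same binary image, so the executed
# instruction counts match and the third factor is 1.
system_x = {"eff_cpi": 1.5, "cycle_time": 10.0, "insts": 1_000_000}
system_y = {"eff_cpi": 2.0, "cycle_time": 15.0, "insts": 1_000_000}
```

Here system x comes out twice as fast as system y: an architectural factor of 2.0/1.5 multiplied by a technology factor of 15/10.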

    MODELING TOOLS

    Given CPI methodology, a number of tools have been developed to assist in this kind of performance analysis for the ColdFire architecture.

    You can use a number of architectural models to analyze various factors within the effective CPI performance equation. These tools are typically high-level C language models of certain functions within the design and are driven with information from the ColdFire ISA simulator or trace data.

    The ISA model is a C-language program that defines the expected results of execution of the instruction set architecture. By inputting a memory image file, the ISA model executes the program on an instruction-by-instruction basis, updating all program-visible machine registers and memory as required. This ISA model is instrumented to optionally output information on instruction fetch, operand addresses, and program counter values.

    By executing the target application on the ISA simulator with the appropriate outputs enabled, a stream of data from the executing application can be input to one of the architectural models. This input data provides the required stimulus to the architectural models.

    Processor pipeline models are used for base CPI analysis. There's also a program that gathers detailed statistics about dynamic opcode usage. Recalling the base CPI equation, this program provides the F(i) factors associated with the various opcodes for the application.

    ADDITIONAL ANALYSIS MODELS

    The ColdFire cache model quantifies numerous performance parameters for various cache sizes, associativity, and organizations. It uses the stream of reference addresses generated by the simulator as input, and models the behavior of Harvard and unified caches of sizes from 512 bytes to 32 KB.

    Additionally, the associativity can vary between two-way and four-way, and the operands can be mapped into copyback or store-through space. This model can also include a RAM, mapped to a specific region, for heavily referenced operands or code segments. Mapping the active region of the stack frame to this type of RAM is often effective.
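To give a feel for what a trace-driven cache model does, here is a minimal Python sketch of a set-associative cache that reports a miss rate for an address stream. It is not the ColdFire cache model: the 16-byte line size and the LRU replacement policy are assumptions made for this example, and the `miss_rate` function is hypothetical.

```python
from collections import OrderedDict


def miss_rate(addresses, cache_bytes, ways, line_bytes=16):
    """Fraction of references that miss in a set-associative cache.

    Assumes LRU replacement within each set and a fixed line size.
    """
    num_sets = cache_bytes // (line_bytes * ways)
    sets = [OrderedDict() for _ in range(num_sets)]
    misses = 0
    for address in addresses:
        line = address // line_bytes
        cache_set = sets[line % num_sets]
        tag = line // num_sets
        if tag in cache_set:
            cache_set.move_to_end(tag)         # mark as most recently used
        else:
            misses += 1
            if len(cache_set) >= ways:
                cache_set.popitem(last=False)  # evict least recently used
            cache_set[tag] = True
    return misses / len(addresses)


# Synthetic trace: two passes over a 1-KB region, one 4-byte word at a time.
trace = list(range(0, 1024, 4)) * 2
```

On this synthetic trace, a 512-byte two-way cache thrashes on the second pass (miss rate 0.25), while a 2-KB two-way cache holds the whole region and only takes the first-pass misses (miss rate 0.125). Sweeping size and associativity over a real trace gives the kind of miss ratios that feed the effective CPI equation.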

    A second model provides information for memory address profiling. Using the stream of reference addresses as input, this model profiles the memory access patterns to identify critical functions and/or heavily referenced operand locations. For some systems, such profiling helps you understand the required amount of RAM as well as which variables to map into this space to maximize performance.
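Address profiling of this kind amounts to building a histogram over the reference stream. The Python sketch below is a hypothetical illustration: the 4-KB bucket size and the sample addresses are assumptions, not values from the ColdFire tools.

```python
from collections import Counter


def address_histogram(addresses, bucket_bytes=0x1000):
    """Count references per memory region, most heavily used region first.

    Each address is rounded down to the start of its bucket.
    """
    counts = Counter((a // bucket_bytes) * bucket_bytes for a in addresses)
    return counts.most_common()


# Made-up operand reference stream for illustration.
sample_stream = [0x3000, 0x3004, 0x3FF0, 0x6000, 0xF1008]
```

For the sample stream, the region starting at 0x3000 collects three of the five references, which would make it the first candidate for placement in a local RAM.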

    Of prime importance is verification of the architectural models. So, at various times throughout the analysis process, the accuracy of the architectural models is validated.

    The V.2 processor pipeline architectural model was initially verified by comparing predicted base-CPI values versus those directly measured from silicon. When the measured base-CPI values were compared with those predicted by the pipeline model, the error was less than 0.5% across a large set of embedded benchmarks.

    The cache architectural models were validated against the design descriptions for several ColdFire MPU designs.

    Table 1: Here's the relative performance and area for various ColdFire configurations executing a set-top box application.

    Memory configuration        Relative performance    Relative area
    2-KB cache                  1.00                    1.00
    2-KB cache + 4-KB RAM       1.05                    1.19
    4-KB cache                  1.19                    1.11
    4-KB cache + 4-KB RAM       1.27                    1.31
    8-KB cache                  1.52                    1.32
    8-KB cache + 4-KB RAM       1.61                    1.52
    16-KB cache                 1.98                    1.79
    16-KB cache + 4-KB RAM      2.06                    1.98
    32-KB cache                 2.71                    2.71
    32-KB cache + 4-KB RAM      2.91                    2.91

    Another area of interest is the modeling of the {IF, OP}_stall times. These degradation factors represent the pipeline stall that occurs on a cache miss. For the nonblocking streaming cache designs of the V.2 and V.3 cores, these terms are modeled as:

    $$\text{\{IF, OP\}\_stall} = (1 + t_1) + 1.0 + 0.6\,(t_2 + t_3 + t_4)$$

    where the response time of the external memory for a line-sized fetch is specified as t1-t2-t3-t4 when viewed from the microprocessor pins.
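The stall model and the V.2/V.3 form of the effective CPI equation can be combined in a few lines of Python. The function names, the 3-1-1-1 memory timing, and the miss rates in the usage note are hypothetical values picked to illustrate the formula.

```python
def miss_stall_cycles(t1, t2, t3, t4):
    """Pipeline stall in cycles for one cache miss, given the t1-t2-t3-t4
    response time of external memory for a line-sized fetch."""
    return (1 + t1) + 1.0 + 0.6 * (t2 + t3 + t4)


def effective_cpi_with_stall(base_cpi, ic_miss, if_per_inst,
                             oc_miss, ref_per_inst, timing):
    """Effective CPI when instruction and operand misses share one stall time."""
    stall = miss_stall_cycles(*timing)
    return base_cpi + (ic_miss * if_per_inst + oc_miss * ref_per_inst) * stall
```

For a hypothetical 3-1-1-1 memory, the stall is (1 + 3) + 1.0 + 0.6 × 3 = 6.8 cycles. With a base CPI of 1.4, a 2% instruction miss rate at one fetch per instruction, and a 4% operand miss rate at 0.5 references per instruction, the effective CPI is 1.4 + 0.04 × 6.8 = 1.672.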

    Using the equation in Figure 2d for the V.2 and V.3 designs, the relative error between the predicted and measured effective CPI was less than 2% across a wide suite of embedded benchmarks.

    Figure 3 summarizes the process. The architectural models are driven by trace data captured from existing hardware or from a compiled application executed on the instruction set simulator. The resulting streams of addresses and instructions are then input to the specific models.

    The profiling tool determines any hot spots in the code or data areas that might be considered for placement in local RAMs or ROMs. The pipeline model produces the base CPI performance metric for a given version of the ColdFire microarchitecture.

    The local-memory models determine all the performance parameters associated with the cache, RAM, and ROM modules. The miss ratios are based on size, organization, and the dynamic stream of reference addresses. The base CPI and memory parameters are combined to produce an effective CPI value that provides an accurate measure of predicted performance for a given configuration.

    OPTIMIZATION EXAMPLES

    To see how the performance analysis procedure works, consider the following real-world examples.

    To begin, let's say you are implementing a digital set-top box. By instrumenting an existing 68k system, trace data is captured for two critical execution paths.

    The challenge is to determine the appropriate amount of local processor memories (cache and possibly RAM) to optimize price and performance for a V.3 ColdFire design. When implemented in 0.35-µm process technology, the V.3 core provides 70 Dhrystone 2.1 MIPS when operating at 90 MHz.

    The trace data is profiled to identify any potential hot spots that might benefit from placement in a RAM. The profile in Figure 4 shows several spikes representing heavily referenced operand areas.

    The largest reference area is generally the system stack and the first candidate for mapping into a local RAM. Using the architectural models, the relative performance and area calculations across a range of cache and RAM configurations are given in Table 1. The reference design is a V.3 core with 2 KB of cache memory.

    In Table 1, the relative performance ranges from 1.0 to 2.6 as a function of local memory configurations with a corresponding relative area of 1.0 to 2.9. Depending on system requirements, the appropriate configuration can be selected, as shown in Figure 5.
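One way to pick a configuration is to rank the candidates by performance per unit area, optionally subject to a minimum performance. The Python sketch below uses the 2-KB through 16-KB rows of Table 1 as transcribed, reading each "+4-KB RAM" row as the preceding cache size plus RAM. The `best_configuration` helper and the selection rule are assumptions, not the authors' procedure.

```python
# (configuration, relative performance, relative area), from Table 1
TABLE_1_ROWS = [
    ("2-KB cache",             1.00, 1.00),
    ("2-KB cache + 4-KB RAM",  1.05, 1.19),
    ("4-KB cache",             1.19, 1.11),
    ("4-KB cache + 4-KB RAM",  1.27, 1.31),
    ("8-KB cache",             1.52, 1.32),
    ("8-KB cache + 4-KB RAM",  1.61, 1.52),
    ("16-KB cache",            1.98, 1.79),
    ("16-KB cache + 4-KB RAM", 2.06, 1.98),
]


def best_configuration(rows, min_performance=0.0):
    """Return the configuration with the highest performance per area
    among those meeting the minimum relative performance."""
    candidates = [row for row in rows if row[1] >= min_performance]
    if not candidates:
        raise ValueError("no configuration meets the performance target")
    name, _, _ = max(candidates, key=lambda row: row[1] / row[2])
    return name
```

With no performance floor, the 8-KB cache gives the best performance per area in this subset. If the system needs at least 1.9 times the reference performance, the choice moves to the 16-KB cache.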

    Figure 5: a) This graph depicts the relative performance as a function of cache and RAM sizes. b) By contrast, this graph shows the relative performance per area as a function of cache and RAM sizes.

    In the second example, a customer provided a C-language benchmark that represented four execution paths in a servo control application. In this real-time application targeted for a V.2 core, absolute performance in response to certain interrupts was critical.

    There was a fixed amount of time to service the interrupt and the algorithm implemented a number of digital filters. Given the signal-processing nature of the application, this analysis attempted to quantify the impact of the ColdFire multiply-accumulate unit (MAC). The optional MAC is tightly coupled to the basic execution pipeline and is designed to accelerate signal-processing algorithms.

    Initial analysis indicated that the dynamic frequency of occurrence for multiply instructions (i.e., F(mul)) was ~10%. Applying the MAC provides faster execution time for multiply instructions, reducing ET(mul).

    Many implementations of digital filters can be optimized using MAC instructions directly. First, the benchmark code was compiled and executed on a V.2 core and its performance measured. This value provided the reference.

    The code was recompiled using C-language macros to use MAC instructions for arithmetic calculations. The compiler-generated MAC assembly-language code was optimized by hand to provide an upper bound of performance. Table 2 shows the results.

    The baseline core configuration included the processor complex with 8 KB of RAM. Including the MAC unit increased this area by only 11% but increased performance by a factor of 1.5 to 1.7.

    WHICH MEANS

    This analysis methodology provides a powerful tool: system designers can now balance processor performance, clock speed, and relative die size.

    Given a highly configurable architecture, system designers now have access to the key silicon variables needed to create embedded processor solutions optimized for a given application. And, the result? Smart, intuitive, and user-friendly products.

    Table 2: Depending on the servo control application, the relative ColdFire performance will vary.

    Configuration                                         Relative performance
    CF2, no MAC                                           1.00x
    CF2 + MAC with compiler-generated MAC instructions    1.45x
    CF2 + MAC with hand-optimized MAC instructions        1.69x

    Joe Circello works as an advanced microprocessor architect for Motorola's Semiconductor Products Sector and was the chief architect for the MC68060 and the ColdFire family of microprocessors. With 23 years of experience, he is a veteran designer specializing in pipeline organization, control structures, and performance analysis. You may reach Joe at [email protected].

    Sylvia Thirtle is a principal staff engineer for Motorola's Semiconductor Products Sector, specializing in high-speed digital ASIC design. In her five years at Motorola, she's been involved in various design activities with ColdFire and is currently leading the design of the debug module for the next-generation development. You may reach Sylvia at [email protected].

    SOURCE

    ColdFire
    Motorola, Inc.
    (512) 895-2134
    Fax: (512) 895-8688
    www.mot.com/ColdFire


    Using Javain EmbeddedSystems

    lthough Java hasproperties that

    would be useful forembedded-system design,

    the versions of Java used in desktopsystems just arent suitable for embed-ded systems. There are some alterna-tives, but they do have drawbacks.

    When considering the alternatives,its important to consider issues likemultithreading and debugging support.Regardless of whichever option emergesas the preferred form, two key issuesmust still be addressed by any embed-ded Java programming environment:how to provide determinism and howto interface to hardware.

    JAVA IS GOOD

    One of Java's strengths is its reasonably clean syntax, which is strongly reminiscent of C or C++. So although it's a new language, it's familiar. Getting up to speed with Java is easy.

    More importantly, Java is both object-oriented and strongly typed. Almost everything in Java is an object (the primitive types are the exception), and there are no loopholes to circumvent Java's strong typing. Since the advent of C++, these features have been considered essential in a programming language because they contribute enormously to the correctness of programs.

    Anecdotal evidence bandied about in Java newsgroups and mailing lists suggests that developers take less time to produce a working Java program than a program in C or C++. Debugging is also easier because Java has removed prolific sources of hard-to-find bugs, including those related to the incorrect use of pointers.

    Example bugs include memory leaks and memory access errors (wild pointers, referencing freed memory, returning a pointer to a local variable, etc.). Java doesn't allow its pointer equivalent (i.e., object references) to be manipulated in the same way as pointers are in C or C++, and it provides automatic garbage collection.

    Another strength of Java is its large reusable code base. In the standard distribution, Java supports threads, TCP/IP networking, and remote invocation. It even has a full set of classes for building GUIs.

    Additional APIs support a variety of needs, such as database access, communication, multimedia, a way to use GUI components, and security.

    With Java's strengths as a language, a development environment, and a reusable code base, it's easy to see why developers (and not just embedded-system developers) are eager to put it to use.

    DESKTOP JAVA DRAWBACKS

    Unfortunately, as I mentioned, desktop Java has some drawbacks when used in embedded systems. Although Java was originally intended for use in set-top boxes, it was first used in a web browser, which is a desktop application.

    First, desktop Java is too big for embedded applications. Not only must the entire Java virtual machine (JVM) be present, but a Java interpreter or a just-in-time (JIT) compiler must be present as well.

    On top of that, all the standard classes must be present. These take up to 8 MB on disk, more when loaded. Fonts take even more space.

    The bottom line is that desktop Java needs on the order of 16 MB just to run, and the application's needs are additional. Very few embedded systems have that kind of memory available.

    Also, Java is too slow. Sun's first releases were usually more than 30x slower than equivalent C code. Subsequent releases, which use JIT compilers, are significantly faster but still perhaps 5x slower than equivalent C. If you're used to squeezing the last few cycles out of a processor, this is a heavy penalty to pay just to use Java.

    FEATURE ARTICLE
    Vladimir Ivanovic & Mike Mahar

    If you're set on putting some desktop-Java functionality into an embedded system, chances are that you've had some sleepless nights. No more! That's what Vladimir and Mike promise if you'll consider the available alternatives.

    [Figure 1 diagram: Objects A, B, and C in a linked list, with the "GC scan here" marker at Object A. The application moves the pointer to Object C from Object B to Object A, so the GC never sees the pointer to Object C.]

    But the most important drawback of desktop Java is that it doesn't meet the constraints of most embedded systems. One such constraint is the requirement for real-time behavior (i.e., execution that's both predictable and bounded in duration).

    Many embedded systems have severe real-time requirements. For instance, the collision-detection system on a jetliner has seconds in which to respond. Computation must finish in a certain amount of time, so execution has to be predictable.

    Another constraint of embedded systems is their limited resources. Consumer devices, which may be manufactured in the millions, are very sensitive to cost, so designers tend to use the smallest processor and the smallest amount of memory possible to do the job. A programming language that's slow and uses up a lot of memory just isn't competitive with existing alternatives.

    More importantly, Java doesn't possess the notion of an address. Embedded systems, almost by definition, are required to access hardware. Most often, that hardware is accessed by referring to a specific address. Because addresses aren't part of Java, you have to go outside the language to overcome this constraint.

    Finally, desktop Java has some attributes that get in the way of successful use in embedded systems. These attributes may be useful and even necessary in desktop systems, but not in embedded systems.

    For instance, Java is interpreted (the source of much of its slowness), and it is dynamic because it supports the downloading of new classes on-the-fly. Java is portable across many different systems because its source code is compiled, not to native code, but to bytecodes, an architecture-neutral format. Also, Java supports a comprehensive security model designed to prevent many kinds of attacks.

    However, for embedded systems, which frequently exist in completely closed environments, portability and security aren't issues. Unless an embedded system is connected to a network, the ability to load new classes dynamically is useless.

    These attributes of desktop Java prevent its use in embedded systems. And, the issues of performance, memory consumption, and poor real-time behavior make it hard to retarget the desktop version to an embedded system.

    EMBEDDED ALTERNATIVES

    What are the alternatives? How can an embedded-systems developer use the great features of Java without quadrupling the system's cost or writing piles of non-Java code?

    Essentially, there are only three options: use a special-purpose JVM, use a JVM with a JIT compiler, or use compiled Java instead of some form of interpreted Java.

    Many vendors have come up with specially tailored versions of Java that are a better fit for the needs of embedded developers. For instance, Sun offers PersonalJava for systems with 2-4 MB of memory and EmbeddedJava for smaller systems (Mentor Graphics' Microtec Division is a licensee of PersonalJava). Hewlett-Packard, NSI Com, Insignia Solutions, NewMonics, and others have similar offerings.

    Another approach, even in versions tailored for embedded use, is to use a dynamic compilation technique, typically a JIT compiler, to increase performance. But there are several tradeoffs involved.

    First, JVMs with JITs have potentially longer start-up times because the JIT compiler has to compile Java bytecodes into native machine language before executing. Second, it's difficult to do a good job of optimizing native code while keeping memory consumption low. The more optimizations that are done, the larger and slower the JIT compiler becomes.

    Several vendors provide knobs to tune the dynamic compilation process so you can choose on a case-by-case basis exactly what the performance, memory consumption, and start-up-time tradeoffs are going to be.

    But, for some embedded applications, even a JVM with dynamic compilation is too slow and takes up too much memory. One option that is increasingly being considered is compiling Java directly to a native machine language, thereby eliminating both the JVM and either the interpreter or the JIT compiler.

    Of course, the resulting application is no longer portable, but embedded developers typically don't care about portability. For a given design, their application needs to run on a single well-known hardware configuration.

    The other attribute that compiled Java forces a developer to give up is the ability to load new classes on-the-fly. Because all the code is precompiled, there's no facility for dynamic loading of classes. Again, this issue probably isn't too serious for embedded-system developers, most of whom don't want random classes downloaded onto their system.

    object_a->next = object_c;
    if (object_c != NULL)
        if (object_c->garbage_flags == WHITE) {
            object_c->garbage_flags = GRAY;
            gc_make_gray(object_c);
        }

    Listing 1: Since a compiler knows about every write to memory, it inserts a write barrier automatically.

    Figure 1: Object C is lost if the application changes pointers between garbage-collection scans.


    If you're willing to tolerate the lack of portability and the lack of dynamic class loading, you can still reap all the benefits of Java as a great language and keep the system small and fast. That is, if you can resolve the issues of determinism and low-level programming.

    RESOLVING THE ISSUES

    Any version of Java for embedded systems must first be deterministic and predictable. It also has to be able to access memory directly.

    One bugaboo of embedded systems is ensuring real-time response. In the case of Java-based systems, the primary cause of nondeterminism is the garbage collector.

    In desktop systems, it doesn't matter much that the JVM stops for several seconds to collect unused memory. But in an embedded system, several seconds can be the difference between correct operation and the loss of human life.

    The biggest threat to an embedded operation is that most garbage collectors work in what's called stop-the-world mode. Usually, the collector is called only when an allocation fails because memory is exhausted. Therefore, allocation time is impossible to predict, and when the collector is running, no other processing is being done. This situation is unacceptable in a real-time system.

    An obvious solution is to have the garbage collector run concurrently with the application so that the impact of garbage collection is spread around more evenly. This way, time-critical events are processed in a timely manner.

    Ensuring real-time response still isn't enough to make Java useful for developing embedded systems. Because the added value of embedded systems is their specialized hardware, the embedded software must always be able to access or control the hardware. That requires an extension to Java, either through a Java Native Interface (JNI), for which there are several possible options, or through a nonstandard extension of the Java language.

    Sun's JNI permits portability across different JVMs on a particular processor architecture, but it suffers from poor efficiency. An earlier version of JNI was more efficient, but it required the Java code to know the layout of an object in memory. In any case, it makes sense for an embedded system to offer a package specifically for accessing physical memory.

    import COM.mentorg.microtec.phys.*;

    class m68561 {
        PhysByte this_uart;
        PhysByte Tsr, Tdr, Rsr, Rdr;
        int baseAddr;

        m68561(int base_addr) {
            baseAddr = base_addr;
            this_uart = new PhysByte(baseAddr);
            Tsr = new PhysByte(baseAddr + 0x8 * 4);
            Tdr = new PhysByte(baseAddr + 0xa * 4);
            Rsr = new PhysByte(baseAddr + 0x0 * 4);
            Rdr = new PhysByte(baseAddr + 0x2 * 4);

            // Initialize UART. Reuse this_uart object because registers
            // are only used once or twice
            this_uart.setAddress(baseAddr + (0x1 * 4));   // RCR
            this_uart.set(1);                             // Reset receiver
            this_uart.setAddress(baseAddr + (0x9 * 4));   // TCR
            this_uart.set(1);                             // Reset transmitter
            this_uart.setAddress(baseAddr + (0x19 * 4));  // PSR2
            this_uart.set(0x1e);                          // 1 stop, 8 bit
            this_uart.setAddress(baseAddr + (0x1c * 4));  // BRDR1
            this_uart.set(0x8c);                          // 9600 bps
            this_uart.setAddress(baseAddr + (0x1d * 4));  // BRDR2
            this_uart.set(0);
            this_uart.setAddress(baseAddr + (0x1e * 4));  // CLKCR
            this_uart.set(0x1c);                          // Divide by 3, TCS out, TXC in
            this_uart.setAddress(baseAddr + (0x1f * 4));  // ECR
            this_uart.set(0);                             // No parity, no error check
            this_uart.setAddress(baseAddr + (0xd * 4));   // TIER
            this_uart.set(0);                             // No transmitter interrupt
            this_uart.setAddress(baseAddr + (0x15 * 4));  // SIER
            this_uart.set(0);                             // No serial interrupt
            this_uart.setAddress(baseAddr + (0x05 * 4));  // RIER
            this_uart.set(0x1e);                          // No receiver interrupt
            this_uart.setAddress(baseAddr + (0x1 * 4));   // RCR
            this_uart.set(0);                             // Enable receiver
            this_uart.setAddress(baseAddr + (0x9 * 4));   // TCR
            this_uart.set(0x80);                          // Enable transmitter
            this_uart.setAddress(baseAddr + (0x11 * 4));  // SICR
            this_uart.set(0x80);                          // RTS, DTR low
        }

        // Assumption: bit 7 of RSR is set when received data is available and
        // bit 7 of TSR is set when the transmitter can accept a byte. Java
        // needs boolean conditions, so the tests are explicit comparisons.

        int pollReceive() {
            if (Rsr.andByte(0x80) != 0) {
                return Rdr.getByte();   // Check for break or error
            }
            return -1;                  // -1 means no character
        }

        int receive() {
            // Wait for character. To be called in a separate thread.
            while (Rsr.andByte(0x80) == 0) {
                ;
            }
            return Rdr.getByte();
        }

        void send(byte character) {
            // Wait for transmitter to be ready. Should probably be
            // in a separate thread.
            while (Tsr.andByte(0x80) == 0) {
                ;
            }
            Tdr.setByte(character);
        }
    }

    Listing 2: This code gives you an example of how the Phys package can be used.

    REAL-TIME GARBAGE COLLECTION

    The Java language requires garbage collection of unused objects, and there's no corresponding delete operator to go with the new operator. One advantage of garbage collection is that you avoid a whole class of memory-allocation bugs when the responsibility for detecting unused memory and reallocating it is automatic. The application simply clears a reference to memory to make it available for future use.

    However, the advantages of garbage collection come with a price. Finding unused memory and freeing it can take a long time, causing critical deadlines to be missed in a real-time environment. A garbage collector for a real-time system must be predictable and fast, in addition to allowing high-priority threads to run. Unfortunately, most collectors fail on all three of these requirements.

    Garbage collectors work in the opposite way to what their name implies. They find all the memory blocks that are in use and free up what's left over. There are many different algorithms for garbage collection, but most of them share the following steps:

    1. Scan the local and statically allocated variables for pointers to the heap.
    2. Mark each memory object that can be reached from these pointers.
    3. Scan each marked memory object for pointers to the heap.
    4. Repeat steps 2 and 3 until no new pointers are found.
    5. Sweep the heap and free up any memory that is not marked.
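The steps above can be sketched as a small stop-the-world mark-and-sweep pass over a simulated heap. This is only an illustration under stated assumptions: the names MarkSweepSketch, Obj, and collect are hypothetical, the "heap" is an ordinary Java list, and the roots stand in for the local and statically allocated variables. It is not the collector of any particular JVM.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

class MarkSweepSketch {
    /** A simulated heap object; refs plays the role of its pointers to the heap. */
    static class Obj {
        final String name;
        final List<Obj> refs = new ArrayList<>();
        boolean marked;
        Obj(String name) { this.name = name; }
    }

    /** Runs one stop-the-world collection and returns the names of the freed objects. */
    static List<String> collect(List<Obj> heap, List<Obj> roots) {
        for (Obj obj : heap) obj.marked = false;
        Deque<Obj> pending = new ArrayDeque<>();
        // Steps 1 and 2: scan the roots and mark what they point to.
        for (Obj root : roots) {
            if (!root.marked) { root.marked = true; pending.push(root); }
        }
        // Steps 3 and 4: scan marked objects until no new pointers turn up.
        while (!pending.isEmpty()) {
            Obj current = pending.pop();
            for (Obj child : current.refs) {
                if (!child.marked) { child.marked = true; pending.push(child); }
            }
        }
        // Step 5: sweep the heap and free whatever is still unmarked.
        List<String> freed = new ArrayList<>();
        heap.removeIf(obj -> {
            if (!obj.marked) { freed.add(obj.name); return true; }
            return false;
        });
        return freed;
    }

    public static void main(String[] args) {
        Obj a = new Obj("A"), b = new Obj("B"), c = new Obj("C"), d = new Obj("D");
        a.refs.add(b);
        b.refs.add(c);
        List<Obj> heap = new ArrayList<>(Arrays.asList(a, b, c, d));
        System.out.println(collect(heap, Arrays.asList(a))); // prints [D]
    }
}
```

Nothing else can run while collect() is executing, which is exactly the stop-the-world behavior that makes this simple form unsuitable for real-time work.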

    MOVING POINTER PROBLEM

    A major problem with concurrent garbage collection is that while the collector is scanning the heap for pointers, the application is changing those same pointers. In essence, the entire heap is a critical section.

    Suppose an application is manipulating a linked list of three objects, as illustrated in Figure 1. Object A points to object B, which points to object C. The garbage collector scans object A for pointers, but has yet to scan objects B or C. The application then deletes object B from the list by moving the pointer to C from object B into object A.

    The collector has already scanned object A, so it won't look there again, and by the time it scans object B, the pointer to C is gone. It therefore never finds a pointer to object C. When the collector completes scanning, it frees object C even though there's a live pointer to it in object A.

    APPLICATION AND INTERFERENCE

    Interference between the application and garbage collection is the only situation you have to worry about. Other accesses to the heap don't affect garbage collection. To make concurrent garbage collection work, the application must tell the collector every time it writes an object pointer to another memory object. This is called a write barrier.

    Write barriers sound expensive, but there are several ways to speed things up. Every allocated object on the heap has a flag word containing the state of that memory object. There are three possible states: black, gray, or white.

    The collector knows a black-state object is live and has scanned it for pointers. The collector knows a gray-state object is live but has not yet scanned it for pointers. In the white state, the collector has not yet found a pointer to the object.

    There are several variations on how a write barrier is implemented. Listing 1 is one example of a write barrier. The if statements are generated automatically by the compiler, so it's not necessary to put in write barriers by hand.

    Every pointer assignment to the heap has two additional tests. Usually, programs manipulate the same pointers multiple times. The garbage_flags update and the gc_make_gray function call occur only the first time an object is seen.

    Subsequent stores find the object already marked GRAY and don't have to call the collector. Making a WHITE object GRAY isn't an expensive process, often taking fewer than ten instructions.
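To make the three colors and the barrier concrete, here is a minimal simulation of the Figure 1 scenario in Java, run once without and once with a write barrier in the style of Listing 1. Everything here is an illustrative assumption: WriteBarrierSketch, store, scanOne, and objectCLost are hypothetical names, and the collector is advanced one object at a time by hand rather than by a real concurrent thread.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class WriteBarrierSketch {
    enum Color { WHITE, GRAY, BLACK }

    static class Obj {
        Obj next;                   // the single pointer field of a list node
        Color color = Color.WHITE;  // plays the role of garbage_flags
    }

    /** Objects known to be live but not yet scanned. */
    private final Deque<Obj> grayList = new ArrayDeque<>();

    void makeGray(Obj o) {
        o.color = Color.GRAY;
        grayList.add(o);
    }

    /** Pointer store; applies the Listing 1 style barrier when enabled. */
    void store(Obj holder, Obj target, boolean barrier) {
        holder.next = target;
        if (barrier && target != null && target.color == Color.WHITE) {
            makeGray(target);
        }
    }

    /** One increment of collector work: scan a single gray object. */
    boolean scanOne() {
        Obj o = grayList.poll();
        if (o == null) return false;
        if (o.next != null && o.next.color == Color.WHITE) makeGray(o.next);
        o.color = Color.BLACK;
        return true;
    }

    /** Replays the Figure 1 scenario; returns true if C would be freed. */
    static boolean objectCLost(boolean barrier) {
        WriteBarrierSketch gc = new WriteBarrierSketch();
        Obj a = new Obj(), b = new Obj(), c = new Obj();
        a.next = b;
        b.next = c;
        gc.makeGray(a);              // A is reachable from a root
        gc.scanOne();                // collector scans A and finds B
        gc.store(a, c, barrier);     // application moves the pointer to C into A
        b.next = null;               // ... and out of B
        while (gc.scanOne()) { }     // collector finishes its scan
        return c.color == Color.WHITE; // WHITE objects are swept
    }

    public static void main(String[] args) {
        System.out.println("without barrier, C lost: " + objectCLost(false));
        System.out.println("with barrier, C lost: " + objectCLost(true));
    }
}
```

With the barrier enabled, the store into A turns C gray, so the collector scans it later and never sweeps it.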

    Once the garbage collector and the application are cooperating, it's possible to run the garbage collector as a separate thread of execution. The garbage collector's priority may have to be set differently depending on the characteristics of the application. The priority can be set low if the application spends a lot of time waiting for external events.

    In fact, applications that are I/O bound or event driven may perform better with garbage collection than with explicit freeing of objects, because the garbage collector can run while the processor is not otherwise busy and still keep up with the demand for new memory. However, that's not always the case.

    If an application has a mix of event, I/O, and compute-bound processing threads, the priority of the garbage collector can be set lower than the event and I/O threads and at the same priority as the compute-bound threads.

    And, when all free memory is exhausted, the garbage collector's priority can change dynamically to take the priority of the thread that was trying to allocate memory.

    The garbage collector can run until completion, and then the allocating thread can continue. It's also reasonable to have just two priorities for the garbage collector: one that is low for when free memory is plentiful and one that is higher for when free memory is exhausted, or nearly exhausted.
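The priority policy described above can be written down as a small decision function. This is a sketch only: GcPrioritySketch and gcPriority are hypothetical names, the 10% "nearly exhausted" threshold is an arbitrary assumption, and the levels are borrowed from java.lang.Thread's constants purely for illustration. On a desktop JVM, thread priorities are only hints to the scheduler, so a real-time system would rely on its own scheduler instead.

```java
class GcPrioritySketch {
    // Hypothetical priority levels, expressed with java.lang.Thread's constants.
    static final int LOW = Thread.MIN_PRIORITY;
    static final int HIGH = Thread.NORM_PRIORITY;

    /** Picks a collector priority from the current heap state. */
    static int gcPriority(long freeBytes, long totalBytes, int allocatorPriority) {
        if (freeBytes <= 0) {
            // Memory exhausted: take the priority of the thread that is
            // waiting on the allocation.
            return allocatorPriority;
        }
        // Assumed threshold: less than 10% free counts as nearly exhausted.
        boolean nearlyExhausted = freeBytes * 10 < totalBytes;
        return nearlyExhausted ? HIGH : LOW;
    }

    public static void main(String[] args) {
        Thread collector = new Thread(() -> { /* collector loop would go here */ });
        collector.setPriority(gcPriority(512, 1024, Thread.MAX_PRIORITY));
        System.out.println(collector.getPriority()); // prints 1 (memory is plentiful)
    }
}
```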

    MEMORY ALLOCATION

    Garbage collection isn't the only memory-management consideration in a real-time system. The allocation of memory must be fast and predictable. The system must allocate an object in nearly the same amount of time for every allocation.

    Therefore, the memory manager can't maintain long linked lists of objects that must be searched every time memory is requested. The memory heap structure must be organized in a way that ensures predictability.
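One common way to get constant-time allocation is a pool of fixed-size blocks whose free list is only ever pushed and popped at the head, so nothing is searched. The sketch below assumes that technique for illustration; the article does not say which heap organization any particular product uses, and FixedBlockPoolSketch is a hypothetical name. Blocks are represented by their indices rather than by real memory.

```java
class FixedBlockPoolSketch {
    private final int[] nextFree; // nextFree[i] = next free block after i, or -1
    private int head;             // first free block, or -1 when the pool is empty

    FixedBlockPoolSketch(int blockCount) {
        nextFree = new int[blockCount];
        for (int i = 0; i < blockCount; i++) {
            nextFree[i] = (i + 1 < blockCount) ? i + 1 : -1;
        }
        head = blockCount > 0 ? 0 : -1;
    }

    /** Returns a block index in constant time, or -1 if no block is free. */
    int allocate() {
        int block = head;
        if (block != -1) head = nextFree[block];
        return block;
    }

    /** Returns a block to the pool in constant time. */
    void free(int block) {
        nextFree[block] = head;
        head = block;
    }

    public static void main(String[] args) {
        FixedBlockPoolSketch pool = new FixedBlockPoolSketch(2);
        int first = pool.allocate();
        int second = pool.allocate();
        int third = pool.allocate();   // pool is empty at this point
        pool.free(first);
        System.out.println(first + " " + second + " " + third + " " + pool.allocate());
    }
}
```

Both allocate() and free() touch a fixed number of words regardless of how full the pool is, which is the property the text asks for.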


    INTERFACING HARDWARE

    The Java language doesn't have pointers to physical memory or any other built-in method for accessing specific memory addresses. This limitation makes it difficult to write device drivers or any other code that needs to talk to physical devices entirely in Java. Additionally, there may be existing modules written in C++ that you want to keep.

    For this reason, the Java language specifies that certain methods may be declared native. Originally, native methods enabled high-performance functions to be performed in the native instruction set of the host machine without incurring the performance overhead of the virtual machine.

    Native methods can interface between Java and C++ in a compiled environment as well. The native declaration tells the compiler that the method is externally defined, so you can write this external method in C or C++.
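A native declaration looks like an ordinary method signature with the native modifier and no body. The sketch below is illustrative only: StatusPort and readStatus are hypothetical names, and no C or C++ implementation is supplied, so the example merely inspects the declaration through reflection instead of calling it (calling it without a linked implementation would fail at run time).

```java
import java.lang.reflect.Modifier;

class NativeDeclSketch {
    /** Hypothetical driver class; the method body would be written in C or C++. */
    static class StatusPort {
        native int readStatus(int address);
    }

    /** Reports whether the named method of cls is declared native. */
    static boolean isNative(Class<?> cls, String name, Class<?>... params) {
        try {
            return Modifier.isNative(cls.getDeclaredMethod(name, params).getModifiers());
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isNative(StatusPort.class, "readStatus", int.class)); // prints true
    }
}
```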

    Besides allowing native methods to be written in C or C++, the Java compiler has a switch (-xj for the Microtec Java compiler) that tells the compiler to emit two files for every class that has native methods. The first file is a C++ header file called Class.h that contains a C++ definition of the Java class. The second file is a Class.cpp file that contains a stub C++ program for every native method.

    To implement the native methods, just edit the .cpp file and add code to the method stubs (see Photo 1).

    PHYSICAL MEMORY PACKAGE

    Because accessing memory directly is such a common request, Microtec included a package of classes called COM.mentorg.microtec.Phys, which enables you to create objects that access memory directly. There are three classes, one for each size of memory.

    Photo 1: This screenshot demonstrates implementing the native method by editing the .cpp file.

    These classes are PhysicalByte, PhysicalShort, and PhysicalInt, and each class contains the following methods:

    Physicalsize(int address)
    set(size value)
    size get()
    int getAddress()
    setAddress(int address)
    size and(size value)
    size or(size value)

    The constructor for each of these classes takes one int argument: the memory address at which you want the data to reside. For example, PhysicalInt myInt = new PhysicalInt(0x0100000C); creates a PhysicalInt variable associated with memory location 0x0100000C.

    To set the data to something useful, use the set() method. For example, myInt.set(256); gives your myInt a value of 256.

    Later, when you need to retrieve the data in myInt, use the get() method. int newInt = myInt.get(); will make newInt contain the value (e.g., 256) of the data in myInt. If you need to find out the address of a variable in memory, try something like int myAddress = myInt.getAddress();.

    You can change the address of myInt in memory with the setAddress() method. For instance, myInt.setAddress(0x0100000A); changes the address from whatever it was before to 0x0100000A.

    If you want to perform a bitwise and or or with the data, they work the same way:

    newInt = myInt.or(31);
    newInt = myInt.and(31);

    Each function takes one argument, the number to and or or with the data already in place, and returns the result.

    These classes can be subclassed, and any of the methods may be overridden to add functionality. For example, the and and or methods may need to be noninterruptible, so they can be overridden with a method that disables interrupts during their execution. Listing 2 shows how to use COM.mentorg.microtec.Phys.
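The subclassing idea can be sketched without the real package by using a stand-in class. Everything in this example is an assumption made for illustration: SimPhysicalInt only imitates the method names listed for PhysicalInt and stores values in a simulated memory map, the and and or methods are assumed to write their result back as well as return it, and disableInterrupts and enableInterrupts are hypothetical placeholders that merely count calls.

```java
import java.util.HashMap;
import java.util.Map;

class PhysSubclassSketch {
    /** Simulated memory, standing in for real hardware addresses. */
    static final Map<Integer, Integer> MEMORY = new HashMap<>();
    static int interruptLockDepth = 0;  // hypothetical interrupt-mask state
    static int criticalSections = 0;    // how many times interrupts were disabled

    /** Stand-in that imitates the method names listed for PhysicalInt. */
    static class SimPhysicalInt {
        private int address;
        SimPhysicalInt(int address) { this.address = address; }
        void set(int value) { MEMORY.put(address, value); }
        int get() { return MEMORY.getOrDefault(address, 0); }
        int getAddress() { return address; }
        void setAddress(int address) { this.address = address; }
        // Assumption: and/or write the result back as well as returning it.
        int and(int value) { int r = get() & value; set(r); return r; }
        int or(int value) { int r = get() | value; set(r); return r; }
    }

    /** Subclass whose read-modify-write operations run with interrupts disabled. */
    static class LockedPhysicalInt extends SimPhysicalInt {
        LockedPhysicalInt(int address) { super(address); }
        @Override int and(int value) {
            disableInterrupts();
            try { return super.and(value); } finally { enableInterrupts(); }
        }
        @Override int or(int value) {
            disableInterrupts();
            try { return super.or(value); } finally { enableInterrupts(); }
        }
    }

    static void disableInterrupts() { interruptLockDepth++; criticalSections++; }
    static void enableInterrupts() { interruptLockDepth--; }

    static int demo() {
        SimPhysicalInt reg = new LockedPhysicalInt(0x0100000C);
        reg.set(256);
        return reg.or(31);  // 256 | 31 = 287
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints 287
    }
}
```

The try/finally pairing guarantees that interrupts are re-enabled even if the underlying access throws.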

    RECOMMENDATIONS

    You've seen some of Java's advantages as a language for developing embedded systems, but you've also seen some of the nonobvious pitfalls of using a version of desktop Java in an embedded system.

    Of the three options (special-purpose JVM, JVM with a JIT, or compiled Java), compiled Java probably has the best set of tradeoffs. It's clearly the winner in raw CPU speed, and it has memory usage similar to a conventional language like C or C++. Although compiled Java isn't portable and doesn't immediately offer the ability to load classes dynamically, for many embedded systems, these drawbacks aren't issues.

    If portability or dynamic class loading is needed, the next-best alternative is probably a special-purpose JVM.



    Although a JIT compiler seems to offer the most straightforward way to add performance to Java, it runs the risk of taking up too much memory and causing unacceptable pauses while compiling a just-loaded class.

    These alternatives resolve the problem of using Java in embedded systems. So, all the benefits of Java as a language and as a source for reusable code are available to both embedded and desktop application developers.

    Mike Mahar has worked in the software development industry for more than 20 years and has been involved in

    SOURCES

    Mentor Graphics Microtec Division
    (800) 950-5554
    (408) 487-7000
    Fax: (408) 487-7001
    www.mentorg.com/microtec/java

    Hewlett-Packard
    (800) 452-4844
    (650) 857-1501
    www.hpconnect.com/embeddedvm

    Insignia Solutions
    (800) 848-7677
    (510) 360-3700
    Fax: (510) 360-3701
    www.insignia.com/embedded

    NewMonics
    (515) 296-0897
    Fax: (515) 296-4595
    www.newmonics.com

    NSI Com
    (212) 717-9615
    Fax: (212) 734-4079
    www.nsicom.com

    EmbeddedJava, Java Native Interface, PersonalJava
    Sun Microsystems
    (650) 960-1300
    www.java.sun.com/products/embeddedjava
    www.java.sun.com/products/jd