
Printed by Jouve, 75001 PARIS (FR)

(11) EP 2 858 323 A1

(12) EUROPEAN PATENT APPLICATION

(43) Date of publication: 08.04.2015 Bulletin 2015/15

(21) Application number: 13306357.8

(22) Date of filing: 01.10.2013

(51) Int Cl.: H04L 29/06 (2006.01) G06Q 40/00 (2012.01) G06F 17/00 (2006.01)

(84) Designated Contracting States: AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR

Designated Extension States: BA ME

(71) Applicant: Enyx SA, 75003 Paris (FR)

(72) Inventor: Kodde, Edward, 75010 PARIS (FR)

(74) Representative: Lopez, Frédérique, Marks & Clerk France, Immeuble Visium, 22 Avenue Aristide Briand, 94117 Arcueil Cedex (FR)

(54) A method and a device for decoding data streams in reconfigurable platforms

(57) Embodiments of the present invention provide a decoding device (10) implemented on an integrated circuit, for decoding a market data input stream received in a given data representation format. The decoding device comprises an engine (4) built around a finite state machine (41), the engine (4) being generated from at least one description file (5) and configured to perform the following steps, in a current state of the finite state machine:

i) dividing the market data input stream into a number of tokens and reading a set of tokens,
ii) accumulating the set of read tokens in internal registers,
iii) generating output commands from the tokens accumulated in the internal registers depending on a condition related to the tokens accumulated in the internal registers, and
iv) selecting the next state of the finite state machine based on a triggering condition.
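The four steps above can be modeled in software to make their interplay concrete. The following is a minimal Python sketch, not the generated HDL engine: it assumes a hypothetical toy binary format in which a one-byte message type is followed by a fixed number of payload bytes (`MSG_LENGTHS`), and a hypothetical window of at most `MAX_READ` tokens presented per cycle. All names are illustrative. The FSM reports how many tokens it used, so unused tokens are presented again in the next cycle.

```python
from typing import Dict, List, Tuple

MAX_READ = 3  # hypothetical maximum number of tokens presented per cycle

# Hypothetical toy format: message type byte -> number of payload bytes that follow.
MSG_LENGTHS: Dict[int, int] = {ord("A"): 2, ord("D"): 1}


def run_fsm(tokens: List[int]) -> List[Tuple[str, List[int]]]:
    """Decode an already tokenized stream (one token per byte) with a two-state FSM."""
    state = "TYPE"                  # current FSM state
    registers: List[int] = []       # internal registers accumulating read tokens
    msg_type = 0
    commands: List[Tuple[str, List[int]]] = []
    pos = 0
    while pos < len(tokens):
        window = tokens[pos:pos + MAX_READ]       # (i) set of tokens presented this cycle
        if state == "TYPE":
            msg_type = window[0]
            used = 1                              # only the type token is read
            registers = []
            next_state = "PAYLOAD"                # (iv) triggering condition: type known
        else:
            needed = MSG_LENGTHS[msg_type] - len(registers)
            used = min(needed, len(window))
            registers.extend(window[:used])       # (ii) accumulate in registers
            if len(registers) == MSG_LENGTHS[msg_type]:
                # (iii) emit a command once the registers hold a complete message
                commands.append((chr(msg_type), list(registers)))
                next_state = "TYPE"               # (iv) back to the initial state
            else:
                next_state = "PAYLOAD"            # (iv) keep reading payload
        pos += used        # number of tokens used, reported back to the tokenizer
        state = next_state
    return commands
```

For example, `run_fsm([ord("A"), 1, 2, ord("D"), 9])` yields one "A" command carrying `[1, 2]` and one "D" command carrying `[9]`; an incomplete trailing message produces no command.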

EP 2 858 323 A1

Description

Field of invention

[0001] The invention generally relates to data processing systems for processing market data, and more particularly to a method and a device for decoding data streams in reconfigurable platforms.

Background art

[0002] As there is a growing need for faster processing of large volumes of data in the financial industry, data processing systems based on clusters of general-purpose CPUs show a number of limitations. Although cluster approaches involve inexpensive hardware and provide tools that simplify development, they have a number of constraints that become more significant as the requirement for high-performance computing increases: high electricity consumption, costly maintenance, and the large floor space required for data centers. Further, the overall performance obtained with a cluster does not increase proportionally with the number of machines in the cluster. Unlike the cluster approach, data processing systems based on FPGAs allow complex tasks to be executed in parallel with high throughput on a limited number of FPGA-equipped machines. Accordingly, this hardware approach appears particularly suitable for the development of applications in the financial and investment industries, where fast calculation is key to remaining competitive.

[0003] An FPGA (Field-Programmable Gate Array) is an integrated circuit that can be configured after manufacturing. The configuration is generally specified using a hardware description language (HDL). FPGAs contain a very large number of programmable logic components ("logic blocks") and a hierarchy of reconfigurable interconnections that allow the blocks to be "wired together". Logic blocks can be configured to perform complex combinational functions or merely simple logical operations (boolean AND, OR, NAND, XOR, etc.). As an FPGA can perform calculations in parallel, the same algorithm can be executed simultaneously on a number of independent inputs in only a few clock cycles. FPGAs are thus particularly suited to executing complex computations very fast.

[0004] For these reasons, more and more market data processing systems are designed using FPGAs.

[0005] Existing market data processing systems receive data from external sources (such as exchanges), publish financial data of interest to their subscribers (such as traders at workstations), and route trade data to various exchanges or other venues.

[0006] They generally comprise at least one decoder that interacts with the feed sources to handle real-time data streams in a given format (FAST, FIX, binary) and decodes them, converting the data streams from source-specific formats into an internal format (data normalization process). According to the message structure of each data feed, the decoder processes each field value with a specified operation, fills in missing data with the values and state of its cached records, and maps the result to the format used by the system.

[0007] Currently, input data streams are decoded in software or in hardware in a purely sequential way, without any parallelization. Existing software decoders often suffer from bandwidth limitations because the decoder's processor cannot decode the packets fast enough. This stems from the fact that a software decoder needs to decode every message to determine whether it concerns an instrument of interest to the application(s). Furthermore, when the rest of the processing is done in hardware, two transfers are required: one from the hardware to the software and one in the other direction. These transfers are very time-consuming compared to the typical processing time and add significant latency.

[0008] Market data rates have increased dramatically over the past few years, approaching a peak of 1 million messages per second. As market data rates continue to increase, high-speed, ultra-low-latency and reliable market data processing systems are becoming increasingly critical to the success of financial institutions. In particular, there is currently a need for high-performance decoders capable of processing market data feeds of up to 10 Gb/s, so as to feed the order management core with normalized commands that do not depend on the market being processed, while keeping latency as low as possible.

[0009] Further, market data formats evolve quite often, especially FAST-based ones. This does not raise any major issue for classic software decoders, which can usually be modified easily. In the case of FAST formats, the exchange provides the updated templates file, and the software either loads this file dynamically or has its code (or part of it) regenerated automatically from these templates.

[0010] However, decoders using reconfigurable platforms (FPGAs) have difficulty adapting efficiently to such format changes. While a general-purpose CPU can easily be updated to execute any task, once an FPGA is programmed for a particular task it is quite complicated to update it so that it can execute another task. This requires reprogramming the FPGA, which is both expensive and complex.

Summary of the invention

[0011] In order to address these and other problems, there is provided a device for decoding an input market data stream as defined in appended independent claim 1, and a method for decoding an input market data stream as defined in appended claim 15. Preferred embodiments are defined in the dependent claims.

[0012] The invention thus provides high-performance decoders capable of processing market data feeds of up to 10 Gb/s to feed the order management core with normalized commands that do not depend on the market being processed, while keeping latency as low as possible and retaining the ease of use and update of software decoders.

[0013] The decoder according to the embodiments of the invention can be transparently adapted to evolutions of data formats. Such adaptation can easily be performed by updating the description file, which is written in a format such as XML (eXtensible Markup Language), recompiling the description file, and providing a new version of the firmware to be downloaded into the reconfigurable platform.

[0014] Further advantages of the present invention will become clear to the skilled person upon examination of the drawings and detailed description. It is intended that any additional advantages be incorporated herein.

Brief description of the drawings

[0015] Embodiments of the present invention will now be described by way of example with reference to the accompanying drawings, in which like references denote similar elements, and in which:

- Figure 1 represents an exemplary data processing architecture including a decoding device according to the embodiments of the invention;

- Figure 2 shows a decoder architecture, according to certain embodiments of the invention;

- Figure 3 is a block diagram illustrating the generation of the decoder engine, in accordance with the embodiments of the invention;

- Figure 4 shows the tokenizer architecture, according to certain embodiments of the invention;

- Figure 5 is a flowchart of the steps performed for decoding an input stream, according to certain embodiments of the invention; and

- Figure 6 shows an exemplary Finite State Machine, according to certain embodiments of the invention.

Detailed description

[0016] Referring to figure 1, there is shown an exemplary data processing system 100 provided to acquire and process market data.

[0017] As used herein, the term "market data" refers to data received in the form of a data stream from a number of external sources, comprising quote and trade-related data associated with equities, fixed income, financial derivatives, currencies, etc.

[0018] The data processing system 100 comprises at least one market data packet decoder 10 (also referred to as a "decoder" or "decoding device") that interacts with feed sources to process and decode market data feeds received from exchange networks 1 according to any source-specific protocol.

[0019] More specifically, the decoding device 10 is configured to receive input messages such as UDP (User Datagram Protocol) payloads or TCP (Transmission Control Protocol) payloads, decode them into messages, and output normalized commands 8 based on the decoded messages.

[0020] The output commands 8 provided by the decoding device 10 may be fed to an order management system 12. The order management system 12 comprises at least one memory for storing details related to each order so that they can be retrieved when needed.

[0021] The system 100 may further include a Limits Aggregation and Book Building unit 13 that aggregates the pending orders into order books, presenting for each instrument a list of orders, possibly aggregated into limits and sorted by price. Generally, client applications 15 essentially need to access the first limits of a book. Alternatively, the client applications may access the orders directly.

[0022] As used herein, the terms "order", "limit order" or "market order" refer to an order to buy or sell a given quantity of a financial instrument at a specified limit price or better, or at the market price for market orders.

[0023] Further, an order book refers to the electronic collection of the outstanding limit orders for a financial instrument, such as a stock. As used herein, the term "limit" refers to a "line" or "entry" in an order book that corresponds to one or several orders. When it corresponds to several orders, it may also be referred to as an "aggregated limit". Limits are aggregated by price, i.e. all orders with the same price have their quantities added up to form the limit's quantity. An aggregated limit may also have an "order count" property reflecting the number of orders that have been aggregated in this limit. The position of a limit inside a book is referred to as a "line number" or a "level".

[0024] As shown, the data processing system 100 may also comprise a Message Dispatch and Transport unit 14 for formatting the processed data into messages and dispatching them to selected client applications 15 for further processing and decision making. The client applications 15 may be located on different servers, so that message transport may be done over a network.

[0025] In many fields of data processing, there is a need to improve processing speed, while modern data processing systems are faced with a growing amount of data. The first step of a data processing chain comprises the data acquisition phase, which in network applications consists of network (generally UDP or TCP) acquisition 2 and is generally performed by the NIC (Network Interface Card) and the operating system's network stack, and the data packet decoding phase.

[0026] The data packet decoding phase performed by the decoding device 10 depends on the format of the incoming data, which itself depends on the specific application. For example, in data processing systems processing market data exchanged by financial institutions, the data packet decoding phase depends on the market, since each market has its own data format.

[0027] Generally, three types of formats are used for market data feeds:

- binary formats,
- FIX-based formats, and
- FAST-based formats.

[0028] In binary formats, generally used in equity markets, all fields are sent in binary and message structures are fixed. This allows the decoding device 10 to use these structures to parse the messages. Messages are generally encapsulated in packets, with a binary header indicating the number of messages included in the packet. Messages generally start with a field indicating their size, so that the decoding device can skip the messages it does not handle based on the size information, and possibly perform some integrity checking.

[0029] In some market applications, ASCII strings, or more rarely UNICODE strings, may also be put in some fields. However, when those strings have a fixed length, with padding at the end of the string if necessary, they can be treated as regular binary fields.

[0030] FIX stands for Financial Information eXchange. In this kind of format, fields are coded in ASCII, each preceded by its field ID and separated by SOH (Start Of Header) characters. The example below represents a FIX-encoded message, where SOH characters are represented by the sign "|":

"8=FIX.4.2|9=178|35=8|49=PHLX|56=PERS|52=20071123-05:30:00.000|11=ATOMNOCCC9990900|20=3|150=E|39=E|55=MSFT|167=CS|54=1|38=15|40=2|44=15|58=PHLXEQUITY TESTING|59=0|47=C|32=0|31=0|151=15|14=0|6=0|10=128|"
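The tag=value structure of such a message can be illustrated with a short parser. This is a minimal Python sketch under stated assumptions: the function name `parse_fix` is hypothetical, the separator defaults to the real SOH byte (0x01) but can be set to "|" to match the printed example, and only the first "=" of each field separates the tag from the value. It does not handle FIX length-prefixed data fields, whose values may themselves contain SOH.

```python
from typing import List, Tuple

SOH = "\x01"  # Start Of Header, the FIX field separator


def parse_fix(message: str, sep: str = SOH) -> List[Tuple[int, str]]:
    """Split a FIX message into (tag, value) pairs, preserving field order."""
    fields: List[Tuple[int, str]] = []
    for part in message.split(sep):
        if not part:
            continue  # empty piece after the trailing separator of the checksum field
        tag, _, value = part.partition("=")  # only the first "=" splits tag and value
        fields.append((int(tag), value))
    return fields
```

A list rather than a dictionary is returned because repeating groups may legitimately repeat a tag; `dict(parse_fix(...))` is convenient only for messages without repeating groups.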

[0031] For example, the first part of the above message, "8=FIX.4.2", defines the value of field 8, which is the version of the FIX protocol used, and the part "35=8" defines the value of field 35, representing the message type.

[0032] In FIX, fields do not have fixed sizes and can appear in almost any order, except for some fields that have a mandatory order, such as the fields in the message header or the fields in repeating data groups.

[0033] A FIX-based format is available for most exchanges, usually as a legacy format that supplements their proprietary one.

[0034] FAST stands for FIX Adapted for STreaming. It is a compressed variant of the FIX protocol designed to use less bandwidth by removing redundant information. Fields are coded in binary, but with "stop bits": each byte contains 7 bits of payload and one bit, called the "stop bit", which signals the end of a field. As in FIX, FAST fields thus have variable sizes.

[0035] Further, in FAST, fields are always in the same order, but operators are applied to them and in some cases they can be completely removed from the input stream.

[0036] In futures and derivatives market applications, where feeds are encoded in FAST, the templates file(s) may be provided by the exchange. These templates describe the structure of the messages and the operators to apply to each field. For example, a field with a "delta" operator that remains constant can be completely removed from the stream. If this same field is incremented by 1, the value transmitted in the stream is 1, which is coded on 1 byte using the stop bits, instead of the whole value of the field, which can require several bytes. Some fields represent "presence maps" and are used to code the presence of other fields: when bits in such a field are set to 0, the corresponding optional fields are absent. Generally, no redundant information is transmitted, and all the information can still be recovered by the decoding device.

[0037] The FAST protocol is mainly used by futures and derivatives exchanges.

[0038] Conventional FIX, FAST and even binary decoders generally decode an input stream serially. Every field is generally read sequentially. Operators may be applied if required, and decisions about the next steps of the decoding may be taken after each field.

[0039] In the FIX protocol, the fields in a message can be in any order, and the end of a message is detected by the beginning of the next one. Similarly, for repeating data groups, the end of one group is detected by the beginning of another group. The number of fields to be read thus cannot be known in advance, and fields must be read one after the other.

[0040] In the FAST protocol, each byte is appended to a current field buffer until a stop bit is found. The suitable operator is then applied to this field. However, if a field is absent (i.e. the corresponding presence map bit is set to 0), no data can be read and this field must be skipped. The same applies to some operators and data types, for example:

- the "delta" operator: when the old and new values are the same, the delta operator can set a presence map bit to 0 and put nothing in the stream;

- data types: "optional decimals" can have only 1 field used; the exponent is then in the stream and null, and no mantissa is present; the number of fields to be read cannot be known in advance.
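The stop-bit encoding of [0034] and the presence maps of [0036] can be illustrated as follows. This is a minimal Python sketch, assuming standard FAST conventions (the stop bit is the high bit, 0x80, of the last byte of a field; signed integers carry their sign in bit 6 of the first byte; presence map bits are read most significant first). The function names are hypothetical, nullable optional encodings are not handled, and `apply_delta` follows this document's simplified description (an absent delta keeps the previous value) rather than the full FAST operator rules.

```python
from typing import List, Optional

STOP_BIT = 0x80  # high bit set on the last byte of a FAST field


def split_fields(data: bytes) -> List[bytes]:
    """Cut a FAST byte stream at each stop bit (one entry per field)."""
    fields, current = [], bytearray()
    for b in data:
        current.append(b)
        if b & STOP_BIT:
            fields.append(bytes(current))
            current = bytearray()
    return fields  # an incomplete trailing field is not returned


def decode_uint(field: bytes) -> int:
    """Concatenate the 7-bit payloads of a stop-bit encoded unsigned integer."""
    value = 0
    for b in field:
        value = (value << 7) | (b & 0x7F)
    return value


def decode_int(field: bytes) -> int:
    """Signed variant: bit 6 of the first byte is the sign (two's complement)."""
    value = -1 if field[0] & 0x40 else 0
    for b in field:
        value = (value << 7) | (b & 0x7F)
    return value


def decode_pmap(field: bytes) -> List[bool]:
    """Expand a presence map into one flag per optional field, in order."""
    bits = []
    for b in field:
        for shift in range(6, -1, -1):  # the 7 payload bits, high to low
            bits.append(bool((b >> shift) & 1))
    return bits


def apply_delta(previous: int, delta: Optional[int]) -> int:
    """Simplified delta operator: an absent field keeps the previous value."""
    return previous if delta is None else previous + delta
```

For instance, the bytes `0xC0 0x39 0x45 0xA3 0x81` split into a presence map whose first bit is set, the unsigned value 942755 (three bytes), and a one-byte delta of 1, illustrating how a small increment costs a single byte.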

[0041] For binary protocols, the size of each message is known in advance. However, there are still some constructs, such as the selection of the suitable message type, that are sequential.

[0042] Currently, the decoding of messages in binary or based on the FIX or FAST protocols is performed in software or in hardware in a purely sequential way, without any parallelization.

[0043] Most existing decoding devices are handwritten to be optimized for a specific market and are thus different for each market. Some existing software decoding devices are compiled from description files, but the main known software decoders for the FAST protocol are generic decoders that use templates, possibly precompiled. However, these approaches do not reach the same data rates, even when the templates are precompiled into some kind of binary code.

[0044] The decoding device 10 according to the various embodiments of the invention relies on a code generation approach that provides specially generated and optimized code, allowing higher data rates to be reached than with conventional generic decoders. Such code generation according to the embodiments of the invention is unusual in the field of hardware decoders.

[0045] Reference is now made to figure 2, showing the internal structure of the decoding device 10 (also referred to as a decoder) according to certain embodiments of the invention.

[0046] The decoding device 10 is configured to receive an input stream 2 and generate output commands 8 on its output bus. The input stream 2 is received in a given data format (which can be FIX-based, FAST-based or binary) and is provided by a market (for example, the NASDAQ market provides a binary data stream). In particular, the input stream 2 may be provided in any data representation format that extends the FAST templates specification.

[0047] The decoding device 10 comprises an engine 4 generated by a compiler based on at least one description file 5. The description files 5 describe the commands that are to be generated by the engine 4. The engine 4 provides most of the code of the decoding device, which is adapted to process the input stream 2 and provide normalized output commands 8.

[0048] Thus, the decoder is further configured to convert a variety of messages from the market into output commands 8. For by-order markets (for example, NASDAQ, BATS), the messages may include:

- An add message for requesting addition of an order to an order book;

- A cancel or delete message for requesting deletion (total or partial) of an order from an order book;

- An execute message, requesting execution (total or partial) of an order from an order book;

- A replace or modify message for requesting modification of one or more properties of a limit order comprised in an order book (for example, modification of the quantity or price).
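The effect of these by-order messages on a book, together with the price aggregation into limits defined in [0023], can be sketched as below. This is a minimal Python illustration under stated assumptions: the class and method names are hypothetical, a single instrument is modeled, execution and cancellation have the same effect on the book, and a replace keeps the same order ID (some markets assign a new one, which is not modeled).

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Order:
    side: str   # "B" (buy) or "S" (sell)
    price: int  # fixed-point price
    qty: int


class ByOrderBook:
    """Toy by-order book for one instrument; limits are aggregated by price."""

    def __init__(self) -> None:
        self.orders: Dict[int, Order] = {}

    def add(self, order_id: int, side: str, price: int, qty: int) -> None:
        self.orders[order_id] = Order(side, price, qty)

    def cancel(self, order_id: int, qty: int) -> None:
        """Total or partial deletion."""
        self._reduce(order_id, qty)

    def execute(self, order_id: int, qty: int) -> None:
        """Total or partial execution (same book effect as a cancel here)."""
        self._reduce(order_id, qty)

    def replace(self, order_id: int, price: int, qty: int) -> None:
        order = self.orders[order_id]
        order.price, order.qty = price, qty

    def _reduce(self, order_id: int, qty: int) -> None:
        order = self.orders[order_id]
        order.qty -= qty
        if order.qty <= 0:
            del self.orders[order_id]  # fully removed orders leave the book

    def limits(self, side: str) -> List[Tuple[int, int, int]]:
        """Aggregated limits as (price, total qty, order count), best price first."""
        agg: Dict[int, List[int]] = defaultdict(lambda: [0, 0])
        for o in self.orders.values():
            if o.side == side:
                agg[o.price][0] += o.qty
                agg[o.price][1] += 1
        best_first = sorted(agg, reverse=(side == "B"))  # buys: highest price first
        return [(p, agg[p][0], agg[p][1]) for p in best_first]
```

The index of a tuple in the list returned by `limits` corresponds to the "level" of [0023].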

[0049] For "by-limit" markets (for example, CME, EUREX), the messages may further comprise:

- A limit creation message, to create a limit, indexed by its level, and shift all the limits under it down;

- A limit deletion message for deleting a limit in the book, indexed by its level, and shifting all the limits under it up; and

- A limit modification message, to modify a limit, indexed by its level.
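The level-indexed shifting semantics of these by-limit messages map naturally onto list insertion and deletion. This is a minimal Python sketch of one side of a book; the names, the 0-based level numbering and the `depth` bound (limits shifted past the maximum depth are dropped) are assumptions for illustration, not details taken from the description.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Limit:
    price: int
    qty: int


class ByLimitBookSide:
    """One side of a by-limit book; levels are 0-based here (an assumption)."""

    def __init__(self, depth: int) -> None:
        self.depth = depth                # maximum number of levels kept
        self.levels: List[Limit] = []

    def create(self, level: int, price: int, qty: int) -> None:
        """Insert at `level`, shifting the limits under it down."""
        self.levels.insert(level, Limit(price, qty))
        del self.levels[self.depth:]      # limits shifted past the depth fall off

    def delete(self, level: int) -> None:
        """Remove `level`, shifting the limits under it up."""
        del self.levels[level]

    def modify(self, level: int, price: int, qty: int) -> None:
        """Overwrite the limit at `level` in place."""
        self.levels[level] = Limit(price, qty)
```

Because positions, not prices, index the book, a decoder for these markets needs no price lookup; the "by-price" update of [0050] instead requires finding the limit with a given price.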

[0050] For "by-price" markets (for example, LIFFE, TSE), the messages may also comprise a price update message for updating a limit indexed by its price; such a message creates the limit if it does not already exist, and deletes it if its quantity reaches 0.

[0051] These messages are transformed into commands by the decoding device 10. Output commands 8 are carried on a bus that may comprise several signals, such as:

- An operation code, or opcode, used to identify the command type (such as add, delete, replace, etc. for "by-order" streams, or create limit, delete limit, etc. for "by-limit" streams);

- An instrument identifier for identifying the instrument to which the command is related;

- An order identifier for identifying the order to which the command is related;

- Price and quantity information representing the price and quantity parameters of the command;

- Additional data signals depending on the command.
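The signals of the output command bus can be mirrored by a record type in which unused signals are left empty. This is a minimal Python sketch; the opcode values, field names and the helper `with_instrument` are hypothetical. The helper illustrates the situation described in [0052], where a DELETE command carries only an order ID and the order management system recovers the instrument from it.

```python
from dataclasses import dataclass, replace
from enum import Enum
from typing import Dict, Optional


class Opcode(Enum):
    # by-order commands
    ADD = 1
    DELETE = 2
    EXECUTE = 3
    REPLACE = 4
    # by-limit commands
    CREATE_LIMIT = 5
    DELETE_LIMIT = 6
    MODIFY_LIMIT = 7


@dataclass(frozen=True)
class OutputCommand:
    opcode: Opcode
    instrument_id: Optional[int] = None  # may be unknown, e.g. on DELETE
    order_id: Optional[int] = None       # only for order-related commands
    price: Optional[int] = None          # normalized fixed-point price
    quantity: Optional[int] = None
    extra: bytes = b""                   # command-specific additional data


def with_instrument(cmd: OutputCommand, order_to_instrument: Dict[int, int]) -> OutputCommand:
    """OMS-side step: fill a missing instrument ID from the order ID."""
    if cmd.instrument_id is None and cmd.order_id is not None:
        return replace(cmd, instrument_id=order_to_instrument[cmd.order_id])
    return cmd
```

In hardware each optional field would typically be a bus signal plus a valid flag; `None` plays that role here.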

[0052] Some of the above signals may not be used. For example, the order identifier may only be used for commands related to an order, and the instrument identifier may only be used for commands related to a particular instrument when the instrument is known. The DELETE commands for deleting a particular order, for example, relate to an order on a given instrument, but this instrument is generally not transmitted by the market and is thus not present on the output command bus 8. The Order Management System 12 is then responsible for finding the instrument ID using the order ID.

[0053] The decoding device 10 may further comprise a set of conversion units 6. The conversion units 6 are configured to further normalize the commands output by the engine 4, which are already as close as possible to the desired normalized output commands 8. Indeed, some operations may be hard to describe efficiently in the description files 5, or need specially optimized code that cannot be automatically generated. The conversion units 6 may comprise, for example, the following units:

- ASCII to binary converters 60: some fields in binary streams are coded in ASCII, as are all the fields in FIX streams. When such fields correspond to quantities, for example, it may be useful to convert them to a binary format, because that allows arithmetic operations to be applied to them. This may apply, for example, when the bus on which the output commands 8 are sent requires some fields to be sent as integers, which is the case for quantities.

- Price format converters 61: market data streams can contain prices in various formats depending on the market, such as floating point, fixed point with various coefficients (x100, x10000, ...), or "ticks" associated with a "tick size" (in which case the actual price is obtained by multiplying the number of ticks by the tick size). In order to handle several markets and compare their prices easily, prices may be normalized to fixed-point values with a 10^8 coefficient. This allows all prices from all known markets to be coded without losing any information.

- Hashtables 62 for transforming strings into integers: integers are generally easier to handle. As a result, the output bus may preferably use instrument and group IDs (identifiers) instead of the names used by some markets. To transform the names into IDs, hashtables 62 can be used that contain the relation between the names and the IDs. Even when the market sends IDs, a hashtable may still be added to transform the market's IDs into internal IDs.
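The arithmetic performed by these conversion units can be sketched in software. This is a minimal Python illustration, not the hardware units themselves: the function names are hypothetical, floating-point prices are assumed to arrive as decimal strings, and the `IdTable` assigns internal IDs on first sight, whereas a real hashtable 62 might instead be preloaded with a reference list.

```python
from decimal import Decimal
from typing import Dict

SCALE = 10**8  # normalized prices are fixed point with a 10^8 coefficient


def ascii_to_int(field: bytes) -> int:
    """ASCII-to-binary conversion of an unsigned decimal field, e.g. b"150"."""
    value = 0
    for b in field:
        if not 0x30 <= b <= 0x39:
            raise ValueError(f"non-digit byte {b:#x}")
        value = value * 10 + (b - 0x30)
    return value


def _scale(price: Decimal) -> int:
    """Scale to 10^8 fixed point, refusing any loss of precision."""
    scaled = price * SCALE
    if scaled != scaled.to_integral_value():
        raise ValueError("price needs more than 8 decimal places")
    return int(scaled)


def from_fixed_point(raw: int, coefficient: int) -> int:
    """Rescale a market fixed-point price (e.g. coefficient 100 or 10000)."""
    if SCALE % coefficient:
        raise ValueError("coefficient does not divide 10^8")
    return raw * (SCALE // coefficient)


def from_decimal(text: str) -> int:
    """Convert a decimal or floating-point price given as a string."""
    return _scale(Decimal(text))


def from_ticks(ticks: int, tick_size: str) -> int:
    """Price = number of ticks x tick size (tick size as a decimal string)."""
    return _scale(ticks * Decimal(tick_size))


class IdTable:
    """Name -> internal ID table; a Python dict stands in for hashtable 62."""

    def __init__(self) -> None:
        self._ids: Dict[str, int] = {}

    def lookup(self, name: str) -> int:
        # assign the next free internal ID the first time a name is seen
        return self._ids.setdefault(name, len(self._ids))
```

With this scaling, a price of 123.45 reaches the bus as 12345000000 whether it arrived as `"123.45"`, as 12345 with a x100 coefficient, or as 493.8 ticks of 0.25 expressed exactly.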

[0054] As shown in figure 3, the compiler 3 is configured to process the description file 5 and generate an engine 4 comprising a Finite State Machine 41 (also referred to as "FSM") in a hardware description language such as Verilog or VHDL. The following description is made with reference to a description file 5 comprising at least one XML description file, and to an engine 4 written in a hardware description language such as VHDL and comprising a Finite State Machine 41.

[0055] The description file 5 may be, for example, in the XML (Extensible Markup Language) format. The syntax of the description files 5 used according to the embodiments of the invention to generate the decoding device is similar to the syntax of conventional FAST description files. More specifically, the description file 5 may comprise a description of the commands generated by the engine and of the format of the input stream.

[0056] If the input stream is derived from the FAST format, the description file 5 may incorporate the content of the templates obtained from the exchange. Such templates may be in XML format. The content of the templates received from the market can then be referenced in the description file 5, for example by means of a specific <xi:include> tag, and the templates file can then be supplemented to describe the output commands of the decoding device. This tag is used to incorporate the content of the market description file (the template description file) at the tag location. The original template description file is kept as is, so that it can be compared to future versions that may be created as the stream format evolves.

[0057] When the input stream is in a binary format, the description file is not based on templates provided by the exchange. It may be written from the specifications published by the exchange. The description file then includes the description of the input market data feed and the description of the output commands to generate.
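The front end of such a compiler can be illustrated by reading a FAST-style template and deriving generated code from it. This is a minimal Python sketch under loud assumptions: the template is simplified (no FAST namespace, no decimals, sequences or groups), and the names `compile_templates` and `emit_state_type`, as well as the one-state-per-field mapping, are hypothetical and are not the compiler 3 described here. It only shows the idea of turning an XML description into an HDL fragment, here a VHDL enumerated state type.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, NamedTuple, Optional

# Simplified, hypothetical template in the style of FAST template files.
TEMPLATES = """
<templates>
  <template name="Quote" id="1">
    <uInt32 name="SecurityID" id="48"><copy/></uInt32>
    <int64 name="Price" id="270"><delta/></int64>
    <uInt32 name="Size" id="271" presence="optional"/>
  </template>
</templates>
"""

# Field operators defined by FAST 1.1.
OPERATORS = {"constant", "default", "copy", "increment", "delta", "tail"}


class FieldSpec(NamedTuple):
    name: str
    fix_id: int
    ftype: str
    operator: Optional[str]
    optional: bool


def compile_templates(xml_text: str) -> Dict[int, List[FieldSpec]]:
    """Turn template XML into a table of field specifications per template ID."""
    root = ET.fromstring(xml_text.strip())
    table: Dict[int, List[FieldSpec]] = {}
    for template in root.iter("template"):
        fields = []
        for field in template:
            ops = [child.tag for child in field if child.tag in OPERATORS]
            fields.append(FieldSpec(
                name=field.get("name"),
                fix_id=int(field.get("id")),
                ftype=field.tag,
                operator=ops[0] if ops else None,
                optional=field.get("presence") == "optional",
            ))
        table[int(template.get("id"))] = fields
    return table


def emit_state_type(fields: List[FieldSpec]) -> str:
    """Emit a VHDL enumerated type with one hypothetical FSM state per field."""
    names = ", ".join(f"S_{f.name.upper()}" for f in fields)
    return f"type state_t is ({names});"
```

Resolving an <xi:include> tag before this step could be done with Python's `xml.etree.ElementInclude` module, which offers basic XInclude processing.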

[0058] The structure of the XML description file 5 accordingly depends on the specific market application.

[0059] For example, for a EUREX stream using the FAST format, a corresponding templates file is provided by the EUREX market. It describes the format of the stream, the "template" used for each message, and the operators used for each field. The content of the templates file is then included in the description file 5 by using an "xi:include" XML tag.

[0060] In another example, where the input stream is a NASDAQ stream in a binary format, NASDAQ provides no template describing its streams, in XML or in any other computer-readable format. Thus, the description of the NASDAQ stream and the commands are written entirely in the same XML file 5, which makes them more readable.

[0061] The description file 5 written according to the invention adds two feature sets to the existing FAST template format:

- the first set is configured to describe feeds that are not FAST-encoded and that might have features not supported by FAST; for feeds in FIX format, the compiler is made compatible with XML descriptions of the FIX specifications (as generated by the QuickFIX project);

- the second set is configured to describe the output commands of the decoding device; in particular, when the engine 4 cannot directly output fully normalized commands, the description file 5 is adapted to describe commands that are as close as possible to the normalized ones.

[0062] The description files 5 thus written make it possible to decode the input stream in any data representation format that extends the FAST format. By supplementing the input description files provided by the market with an additional tag section, it is possible to support additional formats and evolutions of the conventional FAST formats.

[0063] The XML description files 5 have a higher level of abstraction than the VHDL file 41 generated from them, and contain less code, making modifications easier and bugs less likely.

[0064] According to one aspect of the invention, each engine 4 instantiates at least one tokenizer 40, which is configured to break the input stream into meaningful elements, referred to as tokens, depending on the input stream format.

[0065] To dynamically generate the engine 4, the invention provides a common code structure, based on the tokenizer 40 and the Finite State Machine 41, which can be adapted to all formats.

[0066] Figure 4 illustrates the architecture of a tokenizer 40 according to certain embodiments of the invention.

[0067] The tokenizer 40 is configured to receive the input stream, process it, and output tokens separated from each other that can be used by the Finite State Machine 41. According to one aspect of the invention, the tokens may comprise bytes, depending on the format of the market data input stream: for example, in binary format, each token corresponds to an individual byte; in FIX format, each token corresponds to a FIX field comprising bytes; and in FAST format, each token corresponds to a FAST field comprising bytes.

[0068] The tokenizers 40 may comprise:

- For FAST streams, a tokenizer of a first type for outputting FAST fields, which cuts the input stream at each stop bit;

- For FIX streams, a tokenizer of a second type for outputting FIX fields and field IDs as separate tokens, so as to be able to read the next field ID at the same time as the current field value. The tokenizer of the second type is arranged to cut the stream at both the SOH character (|) and the = character;

- For binary streams, a tokenizer of a third type for outputting individual bytes.
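The three tokenizer types can be compared side by side in a sequential software sketch. This Python illustration is hypothetical (function names included) and processes one byte at a time, whereas the tokenizer 40 handles a whole bus word per clock cycle. In the FIX tokenizer, "=" is treated as a cut only while a field ID is expected, an added assumption so that a value containing "=" stays a single token.

```python
from typing import List

SOH = 0x01
STOP_BIT = 0x80


def tokenize_fast(stream: bytes) -> List[bytes]:
    """First type: cut the stream after each byte carrying a stop bit."""
    tokens, start = [], 0
    for i, b in enumerate(stream):
        if b & STOP_BIT:
            tokens.append(stream[start:i + 1])
            start = i + 1
    return tokens


def tokenize_fix(stream: bytes, soh: int = SOH) -> List[bytes]:
    """Second type: cut at '=' and SOH so field IDs and values are separate tokens."""
    tokens, current = [], bytearray()
    expecting_id = True
    for b in stream:
        if (b == ord("=") and expecting_id) or (b == soh and not expecting_id):
            tokens.append(bytes(current))  # the delimiter itself is consumed
            current = bytearray()
            expecting_id = not expecting_id
        else:
            current.append(b)
    return tokens


def tokenize_binary(stream: bytes) -> List[bytes]:
    """Third type: every byte is a token."""
    return [stream[i:i + 1] for i in range(len(stream))]
```

Because the FIX tokenizer consumes one delimiter per token, it yields at most half as many tokens as input bytes, whereas the FAST tokenizer can yield one token per byte.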

[0069] A tokenizer 40 according to the embodiments of the invention not only has deserializer functions for outputting single tokens but can also output the number of tokens required by the rest of the engine logic, up to a maximum set at compilation. This allows several tokens to be read per clock cycle. The tokenizer 40 is configured to output an array of tokens (3 tokens in the example of Figure 4) which can be used by the Finite State Machine 41. In return, the Finite State Machine 41 is configured to send back the number of tokens it used. During the next clock cycle, the tokenizer 40 then reads and presents at its output interface the unused tokens together with new tokens obtained from the input stream.

[0070] As shown in figure 4, a tokenizer 40 may comprise a parser 400, which forms the core of the tokenizer. The parser 400 actually cuts (or divides) the stream into a set of tokens.

[0071] Additionally, the tokenizer 40 may be configured to absorb the full input stream bandwidth without any back-pressuring effect. A back-pressuring effect generally occurs when a core lowers its "ready" signal to make the core connected to its input wait. The side effect is that it lowers the maximum bandwidth a core supports, because clock cycles are wasted waiting. To this end, the parser 400 may be configured to output the maximum number of tokens that can be provided in one word of data from the input stream 2. Some tokens at the output of the parser 400 may not be used: if a token spans multiple bytes, fewer than the maximum number of tokens are produced, and the unused tokens can be marked as invalid. For example, in the case of a FAST parser, if the bus at the input of the parser 400 has a width of 8 bytes, the maximum number of tokens, as defined at compilation time, will be 8, since a token is at least one byte long. However, some tokens can be several bytes long. For example, if 2 tokens are two bytes long and 1 token is four bytes long, only 3 valid tokens may be presented at the output of the parser, and the other five tokens may be marked as invalid. According to another example, if a token that is 20 bytes long is encountered, it may span 3 words of input data, and there may be no valid token output for at least 2 clock cycles.

[0072] The tokenizer 40 may further comprise a buffer 401 for buffering the tokens so that all the tokens in the array of tokens at its output are valid at each transfer. The tokens of the array are then all marked as valid, unlike at the output of the parser 400, where some tokens of the array may not be used. This eases the processing performed by a read management core 402 (also referred to hereinafter as the "read align core"), provided downstream of the buffer 401, and makes higher operating frequencies possible.

[0073] The read align core 402 of the tokenizer 40 is provided to allow partial reads. The FSM 41 may read all the available tokens presented on the output interface of the "read align" core 402. Alternatively, it may be configured to read only some of the available tokens. In such an embodiment, the set of tokens read among the available tokens may depend on a condition, in particular a condition setting the number of tokens that are to be read. For example, the number of tokens that are to be read may depend on the value of a specific token. The operation of the read align core 402 is particularly advantageous when the Finite State Machine 41 is not ready. For example, if the number of tokens that are to be read depends on the value of a specific token that is being read, the specific token is read first and then, depending on its value, more or fewer tokens will be read during the next cycle(s).
The number of tokens to be read may also depend on some parameters related to the structure of the finite state machine 41.

[0074] The number of tokens read is sent back to the "read align" core 402 so that the remaining tokens, which have not been read, may be appended to new tokens and presented again on its output interface during the next clock cycle(s).

[0075] For binary streams, the tokenizer 40 need not comprise a parser 400. Further, a special version of the read align core 402 can be used to allow random byte-length reads in the stream, each word of the input stream already containing the maximum number of tokens, since the tokens are the individual bytes of the stream. Indeed, in the case of a binary stream, the tokens correspond to bytes, so that no parser is required to separate the stream into bytes and then assemble the obtained bytes in a buffer.

[0076] For FIX streams, the parser 400 can create up to half as many tokens as there are input bytes, since each token consumes a delimiter character (either a "|" or a "=").

[0077] For FAST streams, the parser 400 can create as many FAST tokens as there are input bytes, since each byte can carry a stop bit.

[0078] Tokens thus obtained may comprise, along with the original extracted field, information about the field such as its length and its value converted into various formats (for example, the binary representation of the ASCII string for FIX, or the right-padded version of the field for FAST).

[0079] In accordance with the embodiments of the invention, the operation of the tokenizers 40 is controlled by the finite state machine 41.

[0080] In particular, the finite state machine 41 is configured to perform at least some of the following actions:

- reading a selected number of tokens/fields from the tokenizer 40,

- copying the read fields to storage elements,

- applying selected operators for FAST streams, and/or

- initiating the output commands at a determined moment.
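The fixed-width parser output described in paragraph [0071], and its compaction by buffer 401 in paragraph [0072], can be modelled in software as below. This is a hypothetical sketch that assumes an 8-byte bus and FAST stop-bit tokens. Invalid slots are represented by `None`, and the names `parse_words`, `buffer_valid` and `valid_count` are illustrative only. The usage lines reproduce the two examples of paragraph [0071].

```python
from typing import List, Optional

Slot = Optional[bytes]  # None marks an invalid token slot


def parse_words(words: List[bytes], width: int = 8) -> List[List[Slot]]:
    """Model of parser 400: one array of `width` slots per input bus word."""
    pending = b""            # bytes of a token started in an earlier word
    outputs = []
    for word in words:
        assert len(word) <= width, "a bus word cannot exceed the bus width"
        slots: List[Slot] = []
        for b in word:
            pending += bytes([b])
            if b & 0x80:                        # stop bit: the token is complete
                slots.append(pending)
                pending = b""
        slots += [None] * (width - len(slots))  # unused slots marked invalid
        outputs.append(slots)
    return outputs


def buffer_valid(outputs: List[List[Slot]]) -> List[bytes]:
    """Model of buffer 401: keep only the valid tokens, in stream order."""
    return [tok for slots in outputs for tok in slots if tok is not None]


def valid_count(slots: List[Slot]) -> int:
    return sum(tok is not None for tok in slots)


# Two 2-byte tokens and one 4-byte token in a single 8-byte word.
word = bytes([0x01, 0x81, 0x02, 0x82, 0x03, 0x04, 0x05, 0x83])
print(valid_count(parse_words([word])[0]))   # 3 (the other 5 slots are invalid)

# A 20-byte token spread over three words, followed by four 1-byte tokens.
long_token = bytes([0x10] * 19 + [0x90])
words = [long_token[0:8], long_token[8:16], long_token[16:20] + bytes([0x81] * 4)]
print([valid_count(s) for s in parse_words(words)])  # [0, 0, 5]
```

The parser always presents a full-width array, so it never has to stall the input. Downstream logic only ever sees fully valid arrays after the buffer stage.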

[0081] The finite state machine 41, dynamically generated from the XML description file 5, forms the heart of the engine 4. The structure of the state machine 41 corresponds to the structure of the decoded feed, with branches at each message or part-of-message selection, and loops for repeating elements inside messages.

[0082] The finite state machine 41 allows as many tokens as possible to be read in each state. However, the presence of some fields, and thus the number of tokens to read, may depend on the value or the presence of the preceding field.

[0083] The state machine 41 may also be configured to handle any error that could happen while decoding the packet, such as a malformed message, an unexpected end of packet, or an unknown message.

[0084] Each state machine 41 is specific to the market data input stream, since it is generated from the XML description files, and can be updated at each input stream format update. Each state machine 41 may have a very large number of states. For example, in options and equity market applications, where the FAST message templates provided by the exchanges can be quite large, the finite state machine 41 can include more than 600 states.

[0085] According to another aspect of the invention, the engine 4 may include a set of storage elements 7 for storing data read in the input stream and later outputting commands containing the data, according to the information comprised in the description file 5. In particular, the storage elements 7 may comprise internal registers. The following description refers to storage elements 7 implemented as internal registers, for illustrative purposes only.

[0086] The information comprised in the input stream 2 depends on the market. For binary markets, the input stream generally carries only key information, so that most fields from the input stream are stored and sent in the output commands. In FAST or FIX markets, the input streams generally comprise a lot of information. Only key information (i.e. information determined as being of interest for the target clients) may be stored and sent in the output commands.
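The partial reads of the read align core (paragraphs [0073] and [0074]), combined with a token count that depends on the value of an earlier field (paragraph [0082]), can be sketched as follows. This is a toy Python model under stated assumptions: tokens are plain integers, the hypothetical message consists of a count token followed by that many item tokens, and at most `width` tokens are presented per cycle. The class `ReadAlign` and the function `decode_counted_list` are illustrative names, not part of the described device.

```python
from collections import deque
from typing import Deque, List, Tuple


class ReadAlign:
    """Toy model of read align core 402."""

    def __init__(self, width: int) -> None:
        self.width = width                  # max tokens presented per cycle
        self.queue: Deque[int] = deque()    # unread tokens, oldest first

    def push(self, tokens: List[int]) -> None:
        """Append tokens newly delivered by buffer 401."""
        self.queue.extend(tokens)

    def present(self) -> List[int]:
        """Tokens visible on the output interface during this cycle."""
        return list(self.queue)[: self.width]

    def consume(self, n: int) -> None:
        """The FSM reports how many presented tokens it actually read."""
        for _ in range(n):
            self.queue.popleft()


def decode_counted_list(arrivals: List[List[int]], width: int = 3) -> Tuple[List[int], List[int]]:
    """Hypothetical FSM: a count token followed by that many item tokens.

    arrivals[c] holds the tokens delivered to the read align core in cycle c.
    Returns the decoded items and the number of tokens read in each cycle.
    """
    ra = ReadAlign(width)
    state, remaining = "COUNT", 0
    items: List[int] = []
    reads: List[int] = []
    for new_tokens in arrivals:
        ra.push(new_tokens)
        visible = ra.present()
        n = 0
        if state == "COUNT" and visible:
            n = 1                             # read only the count token
            remaining = visible[0]
            state = "ITEMS" if remaining else "DONE"
        elif state == "ITEMS":
            n = min(remaining, len(visible))  # partial read: stop at the last item
            items += visible[:n]
            remaining -= n
            if remaining == 0:
                state = "DONE"
        ra.consume(n)                         # unread tokens are re-presented
        reads.append(n)
    return items, reads


print(decode_counted_list([[4, 10, 11], [12, 13, 99], []]))
# ([10, 11, 12, 13], [1, 3, 1]); token 99 stays queued for the next message
```

In the first cycle only the count is read, even though three tokens are visible. In the last cycle only one of the two visible tokens is consumed, and the other is re-presented in the following cycle.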

[0087] A separate, clocked process is provided to take the signals from the state machine 41 that initiate the output commands. The process then carries out these commands by selecting internal registers and copying them to output command ports. In particular, the Finite State Machine 41 can generate a signal called "command sending" and a further signal called "command number". The registers copied into the command can be selected depending on the command number. The description of the commands that are to be sent is provided in the description file. Using a separate, clocked process adds a register stage between the state machine 41 and the output bus, after the multiplexers that select the registers to copy to the output bus, which eases timing closure.

[0088] The decoding device 10 according to the invention may further include statistic counters (also referred to as statistical registers) generated by the engine compiler. The statistic counters may be incremented or set based on the signals taken from the state machine 41. They can be read from a standard interface such as an Avalon MM interface. The information maintained in the statistic counters can be used for debugging and detecting misconfigurations. Some registers are added automatically by the compiler (counters of decoding errors and of unexpected ends of packets). The other registers can be described in the description files 5. Counters can be added for each message type in order to monitor the message counts for different message types.
They can be used in particular to check whether test vectors contain a sufficient number of messages of each type.

[0089] Figure 5 is a flowchart illustrating the decoding method according to certain embodiments of the invention.

[0090] At step 500, in the current state of the finite state machine, the input stream 2 in a given format is received, separated into tokens, and accumulated in the internal registers. The tokens may comprise FAST fields if the input stream is encoded in FAST format or in a format derived from FAST, FIX fields if it is encoded in FIX, or alternatively bytes (binary format).

[0091] In step 502, it is determined whether enough tokens have been received. In this step, the number of tokens (bytes in the case of a binary input stream, FIX fields in the case of a FIX-based input stream, or FAST fields in the case of a FAST-based input stream) received in the previous step is compared to a threshold number corresponding to the number of tokens (bytes/fields) that are expected to be received in the current state of the Finite State Machine 41. The threshold number may be fixed by the compiler at compilation time, depending on the content of the description file 5.

[0092] If the number of tokens is sufficient, the engine then proceeds with steps 503 to 505. Otherwise, the process remains in the same FSM state (step 506) and waits for the next clock cycle. At the next clock cycle, steps 500 and 502 are repeated to determine whether more tokens can be read.

[0093] At step 503, output commands are generated.

[0094] Depending on the XML description file, some states of the finite state machine 41 generate commands on the output interface of the engine 4. These commands may be composed of data read during the first step 500 or during the previous cycles (previous iterations of step 500 in other FSM states).

[0095] Some errors due to the formatting of the input stream may further be checked in step 504. Such errors may occur due to an unexpected end of a packet caused by a truncated packet, or when a record is exceeded. A record designates a set of fields whose length is given in the input stream. When a record is exceeded, either the record length provided in the input stream is wrong, or too much data has been read, for example due to an error in the description file 5 that was used to generate the engine.

[0096] At step 505, the next state of the Finite State Machine 41 is selected and the process jumps to this state. The next state of the finite state machine 41 may be selected based on:

- the data read in the first step 500 or in a previous cycle (previous iterations of step 500 in other FSM states);

- error check results obtained at step 504 (if errors are detected, the Finite State Machine jumps to error-specific states where it waits for the end of the packet before processing the next packet normally); or

- a back-pressure signal from the next core in the chain, in particular the order management core 12, which may request the engine to slow down, in which case the engine moves to a special rest state where it is inactive.
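The per-cycle behaviour of steps 500 to 506 can be summarised by the simplified, hypothetical Python model below. Each state declares a token threshold (step 502), the registers its tokens are copied into (step 500), an optional command (step 503), and a next-state function that also absorbs the error checks (steps 504 and 505). As a simplifying assumption, back-pressure is reduced to holding the current state and reading nothing, instead of moving to the dedicated rest state described above. `State` and `clock_cycle` are illustrative names.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class State:
    need: int                                    # step 502 threshold for this state
    store: List[str]                             # registers filled by the read tokens
    command: Optional[str]                       # step 503: command to emit, or None
    next_state: Callable[[Dict[str, int]], str]  # steps 504 and 505


def clock_cycle(name: str, states: Dict[str, State], available: List[int],
                regs: Dict[str, int], backpressure: bool = False
                ) -> Tuple[str, int, Optional[tuple]]:
    """One clock cycle of the simplified method; returns (next, consumed, command)."""
    if backpressure:                        # simplification: hold, read nothing
        return name, 0, None
    st = states[name]
    if len(available) < st.need:            # step 502 fails -> step 506: wait
        return name, 0, None
    regs.update(zip(st.store, available[: st.need]))        # accumulate in registers
    cmd = (st.command, dict(regs)) if st.command else None  # step 503
    return st.next_state(regs), st.need, cmd                # steps 504 and 505


# Hypothetical two-state machine: 2-token header, then a 3-token body.
states = {
    "HDR": State(2, ["len", "type"], None,
                 lambda r: "BODY" if r["type"] in (1, 2) else "ERROR"),
    "BODY": State(3, ["a", "b", "c"], "MSG", lambda r: "HDR"),
}
regs: Dict[str, int] = {}
state, pending, log = "HDR", [], []
for arrived in ([5, 1, 7], [8], [9]):      # tokens delivered in each cycle
    pending += arrived
    state, used, cmd = clock_cycle(state, states, pending, regs)
    pending = pending[used:]                # unread tokens stay available
    log.append((state, used, cmd))
print([(s, u) for s, u, _ in log])  # [('BODY', 2), ('BODY', 0), ('HDR', 3)]
```

In the second cycle only two of the three tokens needed by `BODY` are available, so the machine stays in `BODY` (step 506). It completes the read in the third cycle, when the command is emitted.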

[0097] It should be noted that all steps 500 to 506 are executed in a single clock cycle, and that all these steps are repeated at each new clock cycle.

[0098] The person skilled in the art will readily understand that some steps of figure 5 can be performed in a different order. For example, it is possible to check errors (step 504) before generating output commands (step 503).

[0099] Figure 6 is a flowchart illustrating the operation of a Finite State Machine, according to a simplified example of the invention. In the example of figure 6, the data streams received from the market are in binary format, and each packet received comprises a header followed by a single message (among three possible messages A, B and C). The header and the different messages that can be used have the following features:

- A header comprising:
    - a first field of 4 bytes;
    - a second field of 2 bytes;
    - a message type of 1 byte.

- A Message A comprising:
    - a first field A1 of 4 bytes;
    - a second field A2 of 2 bytes.

- A Message B comprising:
    - a first field B1 of 3 bytes;
    - a second field B2 of 1 byte.

- A Message C comprising:
    - a first field C1 of 4 bytes; and
    - for a number of sub-messages (corresponding to 1 byte):
        - a second field C2 of 1 byte;
        - a third field C3 of 2 bytes.

[0100] Fields C2 and C3 are repeated as many times as the number of sub-messages.

[0101] In the example of figure 6, for each of messages A and B, a command is sent at the end of the message, comprising the information contained in the message as well as the header. A similar command is sent for message C. However, the command is sent with each sub-message, in order to transmit all the information comprised in the sub-messages.

[0102] The Finite State Machine 41 is configured to store at least a part of the information obtained from the input stream by the decoding device 10 in the internal registers, and to send at least a part of this information in output commands.

[0103] The Finite State Machine 41 generated for a 4-byte input bus is illustrated in figure 6. As shown, in certain states (2, A2, C2, C3), the maximum number of bytes (4 bytes in the example) is not read:

- In state A2, this occurs because the end of the packet is reached;

- In state 2, this occurs because the decision made in state 3 is needed. Reading a fourth byte in state 4 could be considered; in such a case, it would have to be decided which internal register to use (A1, B1, C1, etc.). If the fourth byte is not stored in any register, it can no longer be used.

- In states C2 and C3, this is due to the presence of the loop and to the fact that the number of sub-messages can be equal to zero;

- In states 3 and 4, no data is read.
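To make the walk through figure 6 more concrete, the hypothetical Python model below steps through one packet, one clock cycle per loop iteration, on a 4-byte bus. It is reconstructed from the text only and is not a transcription of figure 6. The zero-read decision states are named DISPATCH and LOOP here. The sub-message count is assumed to be a 1-byte field read in its own state CN. Fields are assumed to be big-endian, and message types are assumed to be the ASCII codes of "A", "B" and "C". The whole packet is assumed to be already buffered, so the wait of step 506 never occurs.

```python
from typing import Dict, List, Optional, Tuple

BUS = 4  # bytes the FSM can read per clock cycle

# state -> fields read in that state, as (register, size in bytes)
READS: Dict[str, List[Tuple[str, int]]] = {
    "1": [("h1", 4)],
    "2": [("h2", 2), ("type", 1)],   # 3 bytes: the 4th would depend on the type
    "DISPATCH": [],                  # decision only, no data read
    "A1": [("a1", 4)],
    "A2": [("a2", 2)],               # end of packet: only 2 bytes left
    "B": [("b1", 3), ("b2", 1)],     # B1 and B2 fit in one 4-byte read
    "C1": [("c1", 4)],
    "CN": [("count", 1)],            # number of sub-messages
    "LOOP": [],                      # loop test, no data read
    "C2": [("c2", 1)],
    "C3": [("c3", 2)],
}
assert all(sum(size for _, size in f) <= BUS for f in READS.values())
TYPES = {ord("A"): "A1", ord("B"): "B", ord("C"): "C1"}  # assumed type codes


def step(state: str, r: Dict[str, int]) -> Tuple[str, Optional[tuple]]:
    """Return (next state, command or None) once `state` has read its fields."""
    hdr = (r.get("h1"), r.get("h2"))
    if state == "DISPATCH":
        return TYPES.get(r["type"], "ERROR"), None
    if state == "A2":
        return "END", ("A", *hdr, r["a1"], r["a2"])
    if state == "B":
        return "END", ("B", *hdr, r["b1"], r["b2"])
    if state == "LOOP":
        return ("C2" if r["count"] > 0 else "END"), None
    if state == "C3":
        r["count"] -= 1                  # one command per sub-message
        return "LOOP", ("C", *hdr, r["c1"], r["c2"], r["c3"])
    simple = {"1": "2", "2": "DISPATCH", "A1": "A2", "C1": "CN", "CN": "LOOP", "C2": "C3"}
    return simple[state], None


def run(packet: bytes):
    """Decode one packet; returns (commands, [(state, bytes read)], final state)."""
    regs: Dict[str, int] = {}
    state, pos, cmds, trace = "1", 0, [], []
    while state not in ("END", "ERROR"):
        n = sum(size for _, size in READS[state])
        if len(packet) - pos < n:        # unexpected end of packet
            state = "ERROR"
            break
        for reg, size in READS[state]:
            regs[reg] = int.from_bytes(packet[pos:pos + size], "big")
            pos += size
        trace.append((state, n))
        state, cmd = step(state, regs)
        if cmd:
            cmds.append(cmd)
    return cmds, trace, state


# Header (1, 2, type "C"), C1 = 0x0A0B0C0D, two sub-messages.
pkt = bytes.fromhex("00000001" "0002" "43" "0a0b0c0d" "02" "11" "2222" "33" "4444")
cmds, trace, final = run(pkt)
print(cmds)   # [('C', 1, 2, 168496141, 17, 8738), ('C', 1, 2, 168496141, 51, 17476)]
print(final)  # END
```

The byte counts in `READS` show where a cycle reads fewer than 4 bytes and why. State 2 stops before the type-dependent register, A2 stops at the end of the packet, and C2 and C3 stay separate because the loop may run zero times.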

[0104] As will be readily understood by the skilled person, figure 6 is only a simplified example to facilitate the understanding of certain embodiments of the invention. The invention is not limited to the exemplary structure of packets, messages, headers and fields described in relation with figure 6. In particular, input data streams as provided by the market may comprise several message types, several messages per packet, headers with packet sizes and message sizes that are checked at the reception of the packet/message, etc. Further, although error and back-pressure handling states of the finite state machine 41 are not represented in the exemplary representation of figure 6, the skilled person will readily understand that such states can be included.

[0105] For market data input streams in FAST format, operators are applied to each field. Further, some fields of the FAST input stream may be "presence map" fields, on which the number of tokens read in certain states depends. In the example of figure 6, the number of tokens read in a given state is predefined: for example, in state A2, two tokens are read (= 2 bytes). In FAST, the number of read tokens may further depend on presence maps.

[0106] In addition, for market data input streams in FAST format, the operators may be applied in parallel to each of the different read fields.

[0107] It should be noted that a specific encoding may be used for decimals in FAST: when the value to be transmitted in the stream is optional, a unique null token may be used to indicate that the value is absent, a non-null token then being interpreted as the exponent of the decimal, and the Finite State Machine 41 then reads a further token for the mantissa of the decimal (the value is considered as present).
This is another example where the number of read tokens may depend on the value of other tokens.

[0108] The decoding device according to the invention is particularly suited for implementation in reconfigurable platforms, such as FPGAs.

[0109] The invention also allows parallelized decoding of input streams provided in any data representation format, based on the code generation approach (engine 4 built around the Finite State Machine 41, generated by the compiler 3 from description files 5 and instantiated by the decoding device 10).

[0110] This makes it possible to meet the performance requirements of financial markets, while remaining easy to adapt to new input stream formats. For each market data feed to decode, the decoders according to the embodiments of the invention sustain 10 Gb/s data decoding with no back-pressure applied on the network.

[0111] The decoding device 10 is easy to use, with a normalized output bus to the user logic that includes a market-specific part depending on the feed's characteristics. Most of the common market data fields used for trading fit in the normalized part of the output bus, enabling user logic to support different market data feeds with no design changes.

[0112] According to the invention, the tokens can be processed in parallel while the overall process of the decoding device is sequential. By allowing several tokens to be processed in parallel, the invention improves the performance of the decoding device 10.

[0113] In a particular embodiment of the invention, the decoding device 10 could be implemented in the form of a plurality of decoders executed by respective FPGA logic, in particular two decoders, to process in parallel the input streams received from the market. In such an embodiment, each decoder may comprise its own output formatting unit and respective tokenizers 40. Such an embodiment may apply in specific situations, for example when the decoders are connected to different 10G ports and each only processes 10G, and/or when decoding several market data streams from different markets in different formats simultaneously. Further, in such an embodiment, an arbiter device may be used between the decoders and the order management core so that the order management core does not receive the commands twice.
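Paragraph [0107] gives a case where the number of tokens read depends on a token value: an optional decimal takes one token when it is absent and two when it is present. The Python sketch below decodes such decimals from a byte string. It relies on FAST 1.1 conventions for stop-bit encoded signed integers as we understand them: 7 data bits per byte, two's complement, and a nullable field that encodes NULL as zero and offsets non-negative values by one. These details are not stated in the text above and should be checked against the FAST specification. Field operators and presence maps ([0105]) are ignored, and all function names are hypothetical.

```python
from decimal import Decimal
from typing import List, Optional, Tuple


def split_tokens(data: bytes) -> List[bytes]:
    """FAST tokens: runs of bytes ending with a stop bit (0x80)."""
    tokens, start = [], 0
    for i, b in enumerate(data):
        if b & 0x80:
            tokens.append(data[start:i + 1])
            start = i + 1
    return tokens


def decode_signed(tok: bytes) -> int:
    """Stop-bit encoded two's-complement integer, 7 data bits per byte."""
    value = 0
    for b in tok:
        value = (value << 7) | (b & 0x7F)
    if tok[0] & 0x40:                   # sign bit: top data bit of the first byte
        value -= 1 << (7 * len(tok))
    return value


def decode_nullable_signed(tok: bytes) -> Optional[int]:
    """Nullable variant: 0 means NULL, non-negative values are offset by one."""
    raw = decode_signed(tok)
    if raw == 0:
        return None
    return raw - 1 if raw > 0 else raw


def read_optional_decimal(tokens: List[bytes], pos: int) -> Tuple[Optional[Decimal], int]:
    """Return (value or None, tokens consumed) for the decimal at tokens[pos]."""
    exponent = decode_nullable_signed(tokens[pos])
    if exponent is None:
        return None, 1                  # absent: no mantissa token follows
    mantissa = decode_signed(tokens[pos + 1])
    return Decimal(mantissa).scaleb(exponent), 2


def decode_decimals(data: bytes, count: int) -> List[Optional[Decimal]]:
    tokens, pos, out = split_tokens(data), 0, []
    for _ in range(count):
        value, used = read_optional_decimal(tokens, pos)
        out.append(value)
        pos += used                     # 1 or 2 tokens, depending on the value
    return out


# An absent decimal (0x80), then 123.45 as exponent -2 (0xFE) and mantissa 12345.
print(decode_decimals(bytes([0x80, 0xFE, 0x00, 0x60, 0xB9]), 2))
# [None, Decimal('123.45')]
```

The leading 0x00 byte of the mantissa keeps its sign bit clear. Without it, the first data byte 0x60 would make 12345 decode as a negative number.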

Claims

1. A decoding device (10), implemented on an integrated circuit, for decoding a market data input stream received in a given data representation format, said decoding device comprising an engine (4) built around a finite state machine (41), the engine (4) being generated from at least one description file (5) and configured to perform the following steps, in a current state of the finite state machine:

i) dividing the input market data stream into a number of tokens and reading a set of tokens,
ii) accumulating said set of read tokens in storage elements,
iii) generating output commands from the tokens accumulated in said storage elements depending on a condition related to the tokens accumulated in the storage elements, and
iv) selecting the next state of the Finite State Machine based on a triggering condition.

2. The decoding device of claim 1, wherein steps i, ii, iii and iv are performed in the same clock cycle if the condition related to the tokens accumulated in the storage elements is satisfied.

3. The decoding device of claim 1, wherein steps i, ii are performed in the same clock cycle, and if the condition related to the tokens accumulated in the storage elements is not satisfied in step iii, steps i to iv are iterated for the next clock cycles until the satisfaction of the condition related to the tokens accumulated in the internal registers.

4. The decoding device of any preceding claim wherein said condition of step iii relates to the number of tokens accumulated in the storage elements.

5. The decoding device of any preceding claim, wherein the decoding device comprises at least one tokenizer (40) for performing said step i of dividing the market data input stream into a number of tokens and reading a set of tokens, said at least one tokenizer being controlled by the finite state machine (41).

6. The decoding device of claim 5, wherein said at least one tokenizer (40) comprises a parser (400) for dividing the input stream (1) into tokens depending on the data representation format of the input stream, and a buffer (401) for buffering the tokens provided by the parser (400).

7. The decoding device of claim 5 or 6, wherein said at least one tokenizer (40) comprises a read management core (402) to read a set of tokens obtained from said division of the input stream, and present the read tokens at its output interface.

8. The decoding device of claim 7, wherein the set of tokens that are to be read in certain states is determined based on conditions on the number of tokens that are to be read at each clock cycle.

9. The decoding device of claim 8, wherein said condition on the number of tokens that are to be read depends on the value of a specific token, the Finite State Machine (41) being configured to:

- read the value of said specific token in the current clock cycle, and
- read a number of tokens among the available tokens depending on the read value of said specific token during the next clock cycles.

10. The decoding device of any of claims 7 to 9, wherein the tokens that have not been read from the read management core by the finite state machine are appended to new tokens and presented again on the output interface of said read management core (402) during subsequent clock cycle(s).

11. The decoding device of claim 10, wherein the input stream has a binary format, and the read management core (402) is configured to allow for random byte length reads in the input stream.

12. The decoding device of any preceding claim, wherein said triggering condition comprises at least one among the following conditions: a condition related to the result of error checks performed in the current state of said finite state machine to determine if the input data stream comprises formatting errors, a condition depending on the data stored in storage elements, and a condition depending on a back-pressure signal received from a next core in the processing thread.

13. The decoding device of any preceding claim, wherein the format of the input stream is either FIX-based, FAST-based or in a Binary format.

14. The decoding device of any preceding claim, wherein it comprises a set of conversion units (6) to further normalize the commands output by the engine (4).

15. A method for decoding an input market data stream, received in a given data representation format, said method being implemented on an integrated circuit, wherein it comprises, for each received market data stream, providing a finite state machine generated from at least one description file (5), said method further comprising the following steps, in a current state of the finite state machine:

i) dividing the market data input stream into a number of tokens and reading a set of tokens,
ii) accumulating said set of read tokens in storage elements,
iii) generating output commands from the tokens accumulated in said storage elements depending on a condition related to the tokens accumulated in the storage elements, and
iv) selecting the next state of the Finite State Machine based on a triggering condition.
