UNITED STATES PATENT AND TRADEMARK OFFICE
______________________
BEFORE THE PATENT TRIAL AND APPEAL BOARD
______________________
INTEL CORPORATION, Petitioner
v.
ALACRITECH, INC., Patent Owner
________________________
Case IPR No. Unassigned
U.S. Patent No. 7,237,036
Title: FAST-PATH APPARATUS FOR RECEIVING DATA CORRESPONDING A TCP CONNECTION
________________________
Declaration of Robert Horst, Ph.D. in Support of Petition for Inter Partes Review
of U.S. Patent No. 7,237,036
INTEL Ex.1003.001
Petition for Inter Partes Review of 7,237,036 Ex. 1003 (“Horst Decl.”)
TABLE OF CONTENTS Page
I. INTRODUCTION AND QUALIFICATIONS .......................................... 1
II. MATERIALS RELIED ON IN FORMING MY OPINION ..................... 3
III. UNDERSTANDING OF THE GOVERNING LAW ................................ 4
A. Invalidity by Anticipation ..................................................................... 4
B. Invalidity by Obviousness ..................................................................... 5
IV. LEVEL OF ORDINARY SKILL IN THE ART ........................................ 6
V. STATE OF THE ART AND OVERVIEW OF TECHNOLOGY AT ISSUE ................................................................................................... 7
A. Layered Network Protocols ................................................................... 8
1. OSI Layers .................................................................................. 8
2. TCP/IP Layers ............................................................................. 8
B. TCP/IP ................................................................................................. 10
1. Encapsulation ............................................................................ 11
2. Ethernet Header ......................................................................... 12
3. IP Header ................................................................................... 14
4. TCP header ................................................................................ 15
5. RFC 793 – TCP Specification................................................... 16
6. Prepending Headers .................................................................. 19
7. TCP Control Block (TCB) ........................................................ 20
8. Segmentation ............................................................................. 23
9. Advertising a Receive Window ................................................ 24
C. Protocol Offload .................................................................................. 25
1. RFC 647 – Front-Ending .......................................................... 25
2. RFC 929 – Outboard Processing .............................................. 26
3. Mediation Levels ....................................................................... 28
D. Offloaded Protocols ............................................................................. 31
1. OSI Protocol Offload ................................................................ 31
2. TCP/IP Protocol Offload ........................................................... 31
3. VMTP and XTP Protocol Offload ............................................ 31
4. Multi-Protocol Offload ............................................................. 31
E. Portions of the Protocol Offloaded ..................................................... 32
1. Checksum Offload .................................................................... 32
2. Full Offload ............................................................................... 33
3. Multi-Level Offload .................................................................. 34
4. Header Prediction ...................................................................... 34
F. Offload Implementation ...................................................................... 37
1. Multiprocessor Offload ............................................................. 37
2. Offload Adapters based on Microprocessors ............................ 39
3. Offload Adapters based on Custom Processors or Custom Logic ......................................................................................... 40
G. Protocol Offload Summary ................................................................. 43
H. Additional Background Technology ................................................... 44
1. DMA ......................................................................................... 44
2. Virtual and Physical Memory Addresses .................................. 46
VI. OVERVIEW OF 036 PATENT ............................................................... 47
VII. 036 PATENT PROSECUTION HISTORY ............................................. 50
VIII. CLAIM CONSTRUCTIONS ................................................................... 52
A. Legal Standard ..................................................................................... 52
B. “context” .............................................................................................. 52
C. “prepend” ............................................................................................. 53
IX. THE PRIOR ART ..................................................................................... 54
A. Tanenbaum96: A. Tanenbaum, Computer Networks, 3rd ed. (1996) ...................................................................................... 54
B. U.S. Patent No. 5,768,618 (“Erickson”) ............................................. 61
X. OBVIOUSNESS COMBINATIONS – MOTIVATIONS TO COMBINE ......................... 73
A. Erickson in Combination with Tanenbaum96 .................................... 73
XI. GROUNDS OF INVALIDITY ................................................................ 80
I, Robert Horst, hereby declare as follows:
I. INTRODUCTION AND QUALIFICATIONS
1. My name is Robert Horst. I have been retained on behalf of Petitioner
Intel Corporation (“Intel”) to provide this Declaration concerning technical subject
matter relevant to the petition for inter partes review (“Petition”) concerning U.S.
Patent No. 7,237,036 (Ex.1001, the “036 Patent”). I reserve the right to
supplement this Declaration in response to additional evidence that may come to
light.
2. I am over 18 years of age. I have personal knowledge of the facts
stated in this Declaration and could testify competently to them if asked to do so.
3. My compensation is not based on the resolution of this matter. My
findings are based on my education, experience, and background in the fields
discussed below.
4. I am an independent consultant with more than 30 years of expertise
in the design and architecture of computer systems. My current curriculum vitae is
submitted as Exhibit 1004 and some highlights follow.
5. Currently, I am an independent consultant at HT Consulting where my
work includes consulting on technology and intellectual property. I have testified
as an expert witness and consultant in patent and intellectual property litigation as
well as inter partes reviews and re-examination proceedings.
6. I earned my M.S. (1978) in electrical engineering and Ph.D. (1991) in
computer science from the University of Illinois at Urbana-Champaign after
earning my B.S. (1975) in electrical engineering from Bradley University. During
my master’s program, I designed, constructed and debugged a shared memory
parallel microprocessor system. During my doctoral program, I designed and
simulated a massively parallel, multi-threaded task flow computer.
7. After receiving my bachelor’s degree and while pursuing my master’s
degree, I worked for Hewlett-Packard Co. While at Hewlett-Packard, I designed
the micro-sequencer and cache of the HP3000 Series 64 processor. From 1980 to
1999, I worked at Tandem Computers, which was acquired by Compaq Computers
in 1997. While at Tandem, I was a designer and architect of several generations of
fault-tolerant computer systems and was the principal architect of the NonStop
Cyclone superscalar processor. The system development work at Tandem also
included development of the ServerNet System Area Network and applications of
this network to fault tolerant systems and clusters of database servers.
8. Since leaving Compaq in 1999, I have worked with several
technology companies, including 3Ware, Network Appliance, Tibion, and AlterG
in the areas of network-attached storage and biomedical devices. From 2012 to
2015, I was Chief Technology Officer of Robotics at AlterG, Inc., where I worked
on the design of anti-gravity treadmills and battery-powered orthotic devices to
assist those with impaired mobility.
9. In 2001, I was elected an IEEE Fellow “for contributions to the
architecture and design of fault tolerant systems and networks.” I have authored
over 30 publications, have worked with patent attorneys on numerous patent
applications, and I am a named inventor on 80 issued U.S. patents.
10. My patents include those directed to networks (e.g., U.S. Pat. No.
6,157,967: Method of data communication flow control in a data processing
system using busy/ready commands), storage (e.g., U.S. Pat. No. 6,549,977: Use of
deferred write completion interrupts to increase the performance of disk
operations), and multi-processor systems (e.g., U.S. Pat. No. 5,751,932: Fail-fast,
fail-functional, fault-tolerant multiprocessor system). My publications include a
conference paper that examined the performance and efficacy of protocol offload
engines. Ex.1004.
11. My Curriculum Vitae, which is filed as a separate Exhibit (Ex.1004),
contains further details on my education, experience, publications, and other
qualifications to render this opinion as an expert.
II. MATERIALS RELIED ON IN FORMING MY OPINION
12. In addition to reviewing U.S. Patent No. 7,237,036 (Ex.1001), I also
reviewed and considered the prosecution history of the 036 Patent (Ex.1002). I
also reviewed U.S. Pat. No. 5,768,618, to Erickson (Ex.1005), and A. Tanenbaum,
Computer Networks, 3rd ed. (1996) (Ex.1006). I also considered the background
materials cited herein.
III. UNDERSTANDING OF THE GOVERNING LAW
13. I understand that a patent claim is invalid if it is anticipated or
rendered obvious in view of the prior art. I further understand that invalidity of a
patent claim requires that the claim be anticipated or obvious from the perspective
of a person of ordinary skill in the relevant art at the time the invention was made.
A. Invalidity by Anticipation
14. I have been informed that a patent claim is invalid as anticipated
under 35 U.S.C. § 102 if each and every element of a claim, as properly construed,
is found either explicitly or inherently in a single prior art reference.
15. I have been informed that a claim is invalid under 35 U.S.C. § 102(a)
if the claimed invention was patented or published anywhere before the applicant's
invention. I further have been informed that a claim is invalid under 35 U.S.C. §
102(b) if the invention was patented or published anywhere more than one year
prior to the first effective filing date of the patent application (critical date). I
further have been informed that a claim is invalid under 35 U.S.C. § 102(e) if an
invention described by that claim was disclosed in a U.S. patent granted on an
application for a patent by another that was filed in the U.S. before the date of
invention for such a claim.
B. Invalidity by Obviousness
16. I have been informed that a patent claim is invalid as obvious under
35 U.S.C. § 103 if it would have been obvious to a person of ordinary skill in the
art, taking into account (1) the scope and content of the prior art, (2) the differences
between the prior art and the claims, (3) the level of ordinary skill in the art, and
(4) any so-called “secondary considerations” of non-obviousness, which include:
(i) “long felt need” for the claimed invention, (ii) commercial success attributable
to the claimed invention, (iii) unexpected results of the claimed invention, and (iv)
“copying” of the claimed invention by others. I further understand that it is
improper to rely on hindsight in making the obviousness determination. I have
been informed that Alacritech claims a filing priority date no later than October 14,
1997 for claims 1-7 of the 036 Patent. Accordingly, my analysis of the prior art for
the claims of the 036 Patent is based on the prior art and knowledge of a person
having ordinary skill in the art (“POSA”) as of October 14, 1997.
17. I have been informed that a claim can be obvious in light of a single
prior art reference or multiple prior art references. I further understand that
exemplary rationales that may support a conclusion of obviousness include:
(A) Combining prior art elements according to known methods to yield
predictable results;
(B) Simple substitution of one known element for another to obtain
predictable results;
(C) Use of known technique to improve similar devices (methods, or
products) in the same way;
(D) Applying a known technique to a known device (method, or product)
ready for improvement to yield predictable results;
(E) “Obvious to try” - choosing from a finite number of identified,
predictable solutions, with a reasonable expectation of success;
(F) Known work in one field of endeavor may prompt variations of it for use
in either the same field or a different one based on design incentives or other
market forces if the variations are predictable to one of ordinary skill in the
art;
(G) Some teaching, suggestion, or motivation in the prior art that would
have led one of ordinary skill to modify the prior art reference or to combine
prior art reference teachings to arrive at the claimed invention.
IV. LEVEL OF ORDINARY SKILL IN THE ART
18. I have been informed that factors that may be considered in
determining the level of ordinary skill in the art may include: (A) “type of
problems encountered in the art;” (B) “prior art solutions to those problems;” (C)
“rapidity with which innovations are made;” (D) “sophistication of the
technology;” and (E) “educational level of active workers in the field.” I also
understand that every factor may not be present in a given case, and one or more
factors may predominate. Here, the 036 Patent is directed to an apparatus and
methods for network protocol offload. In my experience, systems such as those
capable of protocol offload are not designed by a single person but instead require
a design team with wide-ranging skills and experience, including computer
architecture, network design, software development, and hardware development.
Moreover, the design team typically would have comprised individuals with
advanced degrees and some industry experience, or significant industry experience.
19. Accordingly, and while it would be rare to find all of these skills in a
single individual, it is my opinion that a person of ordinary skill in the art
(“POSA”) is a person with at least the equivalent of a B.S. degree in computer
science, computer engineering or electrical engineering with at least five years of
industry experience including experience in computer architecture, network design,
network protocols, software development, and hardware development.
20. The statements that I make in this declaration when I refer to a POSA
are from the perspective of October 14, 1997.
V. STATE OF THE ART AND OVERVIEW OF TECHNOLOGY AT ISSUE
21. In this section, I provide an overview of the technology at issue and
illustrate the state of the art.
A. Layered Network Protocols
22. The primary goal of computer networking is to provide fast, reliable
data communications between computer systems. Interoperability has been
accomplished through adherence to standards, and performance has steadily
increased through new technology and optimizations of hardware and software.
1. OSI Layers
23. Computer networking standards provide inter-system communications
across a wide range of hardware and software implementations. The seven-layer
OSI model describes a logical layering including physical, data link, network,
transport, session, presentation and application as illustrated below.
2. TCP/IP Layers
24. The TCP/IP layering is slightly different and corresponds more
closely to the way the networking code is typically partitioned in some popular
Unix variants. TCP/IP layers include physical (e.g. 100baseT, 1000baseT), data
link1 (e.g. IEEE 802 Ethernet, ATM, Token Ring), Internet (e.g. IPv4, IPv6),
transport (e.g. TCP, UDP, VMTP, XTP), and Application (e.g. FTP, SMTP,
Telnet, HTTP). The following figure shows the relationship between the OSI and
TCP/IP layering.
Available at http://mitigationlog.com/how-tcpip-and-reference-osi-model-works/.2
25. At a conceptual level, each layer is responsible only for its respective
functions. This enables, for example, hiding the complexity of the physical data
connection (that is, actually transmitting the data onto the physical wires) from
the layers above the physical, data link, and network layers. Likewise, the lower
layers must transmit the data on the physical wires, but need not worry about
what application the data belongs to or how the user data has been partitioned into
individual packets.
1 References on TCP/IP use different terminology to describe the layer under IP.
The data link layer is also called the “host-to-network layer” in Tanenbaum96 and
the “interface layer” in Stevens2 (see below for description of these references).
Some Alacritech patents use “data link layer,” “link layer” and “MAC layer.” Prior
art references use many of these terms and also sometimes use the name of a
specific implementation (e.g. Ethernet, ATM).
2 It appears that this diagram was made in 2012. It is being used for illustrative
purposes only.
B. TCP/IP
26. By the mid 1990s, TCP/IP was a firmly entrenched standard and was
a widespread networking protocol to, for example, access the Internet and World
Wide Web. By that time, detailed descriptions of the protocols and open-source
implementations were widely available from books, technical papers, and code
repositories. Standard reference books on TCP/IP included Stevens1 (Ex.1008),
Stevens2 (Ex.1013), and Tanenbaum96 (Ex.1006), all of which were widely cited
and relied upon.3 A series of technical memos called RFCs (Requests for Comments)
document the progression of design concepts of the Internet. A few of the key
RFCs are quoted below to establish when certain concepts were proposed and
documented.
3 These books were well known resources to a POSA. Consistent with that,
Alacritech patents cite editions of the Tanenbaum and Stevens books.
1. Encapsulation
27. Network layering corresponds to the encapsulation of higher levels by
lower levels. The following figure shows an example with application data
accompanied by an application header. The application header-data combination
becomes the application data of a TCP segment. The TCP segment containing the
application header-data combination along with the IP header forms an IP
datagram. The IP datagram along with an appropriate MAC (media access control)
layer header forms the frame that is sent over the physical interconnect. The
diagram below shows an example of such encapsulation where the MAC layer is
Ethernet. Some software implementations implement the layers separately with
data, or pointers to data, passed between the software modules for each layer. In
this case, one module creates the user data and application header, another module
then encapsulates that with a TCP header, etc. The processing occurs sequentially,
from top to bottom, as shown below.
Ex.1008, Stevens1 at .034.
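The top-to-bottom encapsulation in the figure can be sketched in code. The Python fragment below is a simplified illustration rather than a working protocol stack: it prepends placeholder TCP, IP, and Ethernet headers to application data in the same order, and all field values (ports, addresses, checksums) are assumptions chosen for demonstration only.

```python
import struct

def encapsulate(app_data: bytes) -> bytes:
    """Illustrative sketch: wrap application data in TCP, IP, and
    Ethernet headers, mirroring the layering in the figure above."""
    tcp_header = struct.pack("!HHIIHHHH",
                             1024, 80,      # source port, destination port
                             0, 0,          # sequence, acknowledgment numbers
                             5 << 12,       # data offset (5 words, no options)
                             8192,          # advertised window
                             0, 0)          # checksum (unfilled), urgent pointer
    segment = tcp_header + app_data                       # TCP segment
    ip_header = struct.pack("!BBHHHBBH4s4s",
                            0x45, 0, 20 + len(segment),   # version/IHL, TOS, total length
                            0, 0, 64, 6, 0,               # id, flags/frag, TTL, proto=TCP, checksum
                            bytes(4), bytes(4))           # source, destination IP (placeholders)
    datagram = ip_header + segment                        # IP datagram
    eth_header = bytes(6) + bytes(6) + struct.pack("!H", 0x0800)  # dst MAC, src MAC, type=IPv4
    return eth_header + datagram                          # Ethernet frame

frame = encapsulate(b"GET / HTTP/1.0\r\n\r\n")
```

Note that each layer's module touches only its own header, consistent with the sequential, top-to-bottom processing described above.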
2. Ethernet Header
28. The 14-byte Ethernet header includes 48-bit (6 byte) source and
destination MAC (media access control) addresses for uniquely identifying the
network adapters at each end of the link.
Ex.1013, Stevens2 at .125.
29. The MAC address can be determined by a routing table in the
protocol stack. In an Ethernet-based network, the 48-bit MAC address corresponds
to a physical interface, such as a network interface card (NIC) or WiFi modem in a
server or router. The MAC address field of the destination in the Ethernet header
determines the next hop along the route to the destination. At each router along the
path, the MAC address field is changed to the MAC address of the next router. The
final router changes the MAC address field to the MAC address of the destination.
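Pulling the fields out of the 14-byte Ethernet header described above can be sketched in a few lines of Python; the frame bytes and the source address below are made up purely for demonstration.

```python
import struct

def parse_ethernet_header(frame: bytes):
    """Return (destination MAC, source MAC, EtherType) from a frame."""
    dst_mac, src_mac = frame[0:6], frame[6:12]
    (ethertype,) = struct.unpack("!H", frame[12:14])
    as_hex = lambda mac: ":".join(f"{b:02x}" for b in mac)
    return as_hex(dst_mac), as_hex(src_mac), ethertype

# broadcast destination, a made-up source address, EtherType 0x0800 (IPv4)
frame = bytes.fromhex("ffffffffffff" "001122334455" "0800")
dst, src, ethertype = parse_ethernet_header(frame)
```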
3. IP Header
Ex.1008, Stevens1 at .058.
30. An IP header is illustrated by the figure above from Stevens1. The IP
header includes source and destination IP addresses for identifying the end points
of the connection. The 32-bit IPv4 addresses are usually expressed in dotted
decimal notation. For example, an IP address of Google.com is 216.58.216.46.
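The dotted decimal rendering of a 32-bit address can be illustrated with Python's standard struct and socket modules, using the Google.com address from the example above.

```python
import socket
import struct

# Pack the four octets of 216.58.216.46 into one 32-bit big-endian value.
packed = struct.pack("!I", (216 << 24) | (58 << 16) | (216 << 8) | 46)
dotted = socket.inet_ntoa(packed)
# equivalently, each of the four bytes printed as a decimal number:
manual = ".".join(str(b) for b in packed)
```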
4. TCP header
Ex.1008, Stevens1 at .249.
31. A TCP header is illustrated by the figure above from Stevens1. The
TCP header includes 16-bit source and destination port numbers for identifying the
processes that are communicating. TCP is used to establish connections between
processes at IP addresses across the network and the TCP port numbers identify
which processes are communicating. For instance, Email may use SMTP (simple
mail transfer protocol) on port 25 (SMTP’s well-known port number) while a web
server is using HTTP on port 80 (HTTP’s well-known port number). The TCP
layer performs several important functions such as ensuring that the segments are
assembled in the proper order. As shown above, a “sequence number” is included
for several reasons such as identifying segments and performing reassembly. For
more information on TCP, see Stevens1 (Ex.1008) Chapter 17, “TCP:
Transmission Control Protocol,” pp. 223-228. A sender or receiver maintains the
“sequence number” as a variable for these purposes. Accordingly, routing packets
between source and destination processes over Ethernet is based on the MAC
addresses, IP addresses and TCP ports.
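Extracting the demultiplexing fields (ports and sequence number) from a TCP header of the form shown in the Stevens1 figure can be sketched as follows; the port and sequence values here are hypothetical.

```python
import struct

def parse_tcp_header(header: bytes):
    """Return (source port, destination port, sequence, acknowledgment)
    from the first 12 bytes of a TCP header."""
    src_port, dst_port, seq, ack = struct.unpack("!HHII", header[:12])
    return src_port, dst_port, seq, ack

# an ephemeral client port (49152) talking to a web server on port 80
header = struct.pack("!HHIIHHHH", 49152, 80, 1000, 0, 5 << 12, 8192, 0, 0)
src_port, dst_port, seq, ack = parse_tcp_header(header)
```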
5. RFC 793 – TCP Specification
32. The original TCP specification was published in RFC 793 (Ex.1007)
in September 1981. RFC 793 is a full specification for TCP and shows, among
many other things, that identifying a TCP connection by its source and destination
IP addresses and TCP ports were known more than 15 years before the earliest
priority dates of the Alacritech patents.
a) Sockets
33. The combination of an IP address and a port number is sometimes
called a “socket.” A TCP connection is formed by a pair of sockets which includes
a source IP address and TCP port number and a destination IP address and TCP
port number. IP addresses and TCP ports can be specified by the application. For
instance, a browser accessing Google.com may open a socket to IP address
216.58.216.46 and port 80:
The combination of an IP address and a port number is sometimes
called a socket. This term appeared in the original TCP specification
(RFC 793), and later it also became used as the name of the Berkeley-
derived programming interface (Section 1.15). It is the socket pair (the
4-tuple consisting of the client IP address, client port number, server
IP address, and server port number) that specifies the two end points
that uniquely identifies each TCP connection in an internet.
Ex.1008, Stevens1 at .250.
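The role of the socket pair can be illustrated with a toy connection table keyed by the 4-tuple that, as the passage above states, uniquely identifies each TCP connection; the addresses and state value below are hypothetical.

```python
# Connection table keyed by (client IP, client port, server IP, server port).
connections = {}

def lookup(client_ip, client_port, server_ip, server_port):
    """Demultiplex: find the connection matching this socket pair, if any."""
    return connections.get((client_ip, client_port, server_ip, server_port))

# a hypothetical established connection from a client to port 80 at Google.com
connections[("10.0.0.5", 49152, "216.58.216.46", 80)] = "ESTABLISHED"
```

A segment whose 4-tuple matches no table entry belongs to no known connection, which is exactly the lookup a receiver performs on each arriving segment.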
34. Much of the software for network communications leverages standard
application programming interfaces (APIs) and libraries. An early standard is
Berkeley Sockets, also known as BSD Sockets or just “sockets.” Tanenbaum96
offers this overview of sockets:
Let us now briefly inspect another set of transport primitives, the
socket primitives used in Berkeley UNIX for TCP. They are listed in
Fig. 6-6. Roughly speaking, they follow the model of our first
example but offer more features and flexibility. We will not look at
the corresponding TPDUs here. That discussion will have to wait until
we study TCP later in this chapter.
The first four primitives in the list [SOCKET, BIND, LISTEN,
ACCEPT] are executed in that order by servers. The SOCKET
primitive creates a new end point and allocates table space for it
within the transport entity. The parameters of the call specify the
addressing format to be used, the type of service desired (e.g., reliable
byte stream), and the protocol.
Ex.1006, Tanenbaum96 at .504-.505.
Now let us look at the client side. Here, too, a socket must first be
created using the SOCKET primitive, but BIND is not required since
the address used does not matter to the server. The CONNECT
primitive blocks the caller and actively starts the connection process.
When it completes (i.e., when the appropriate TPDU is received from
the server), the client process is unblocked and the connection is
established. Both sides can now use SEND and RECEIVE to transmit
and receive data over the full-duplex connection.
Id. at .505.
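The primitive sequence Tanenbaum96 describes maps directly onto the Berkeley sockets API. A minimal loopback sketch, using Python's standard socket module as a stand-in for the C primitives (the echo behavior and port choice are assumptions for illustration):

```python
import socket
import threading

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)  # SOCKET
server.bind(("127.0.0.1", 0))                               # BIND (any free port)
server.listen(1)                                            # LISTEN
port = server.getsockname()[1]

def serve():
    conn, _ = server.accept()        # ACCEPT (blocks until a client connects)
    conn.sendall(conn.recv(1024))    # RECEIVE then SEND (echo the data back)
    conn.close()

t = threading.Thread(target=serve)
t.start()

client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)  # SOCKET
client.connect(("127.0.0.1", port))  # CONNECT (blocks until ESTABLISHED)
client.sendall(b"hello")             # SEND
reply = client.recv(1024)            # RECEIVE
client.close()
t.join()
server.close()
```

As in the quoted passage, the server executes SOCKET, BIND, LISTEN, ACCEPT in that order, while the client needs only SOCKET and CONNECT before both sides exchange data over the full-duplex connection.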
35. Establishing a connection over TCP is sometimes called “opening a
socket.” As described above, after the server has executed its sequence of
primitives SOCKET, BIND, LISTEN, ACCEPT, the client executes the SOCKET
and CONNECT primitives, then both sides can communicate using SEND and
RECEIVE. These primitives involve control packet transmissions (versus simply
sending a data packet that includes application data). Opening a socket
(establishing a connection) thus requires both the sender and receiver exchanging a
series of control messages, interpreting the messages, and in response to certain
control messages, responding with the appropriate message. Accordingly, it is a
more complex process to open a connection (and enter an ESTABLISHED state)
than simply one side sending a single data packet transmission. As described
below, this is why it was known in the art for the host to open the connection (the
more complex aspect of communication) and to offload only the sending and
receiving of data packets to a separate device.
6. Prepending Headers
36. When a socket is opened and after the connection is established,
application data is sent and received by constructing packets that encapsulate the
data. Standard UDP/IP and TCP/IP implementations, such as BSD 4.4-Lite, copy
headers and data into linked list structures called mbufs. Stevens2 describes how
headers are prepended to the data in the mbuf chain:
Ex.1013, Stevens2 at .043.
37. Figure 1.8 above shows an Mbuf chain for UDP, but Stevens2 later
broadens the discussion to include TCP and shows diagrams of TCP and UDP
mbuf chains.
Figure 2.2 shows an example of two packets on a queue. It is a
modification of Figure 1.8. We have placed the UDP datagram onto
the interface output queue (showing that the 14-byte Ethernet header
has been prepended to the IP header in the first mbuf on the chain)
and have added a second packet to the queue: a TCP segment
containing 1460 bytes of user data. The TCP data is contained in a
cluster and an mbuf has been prepended to contain its Ethernet, IP,
and TCP headers.
Ex.1013, Stevens2 at .060.
38. Note that the outgoing frames include all three headers – MAC (e.g.
Ethernet), IP and TCP. The first hop MAC address is determined based on the
route to the destination. Once the destination MAC address is determined, it is
stored and accessed when constructing the outgoing frames. Accordingly, the
construction of packets, whether on the host or network interface card (see
offloading discussion below), requires adding the TCP, IP, and MAC headers.
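The mbuf-style prepending described above can be sketched with a toy linked-list structure; this is an illustration of the idea, not the BSD implementation.

```python
class Mbuf:
    """Toy buffer node: a chunk of bytes plus a link to the next node."""
    def __init__(self, data: bytes, next_buf=None):
        self.data, self.next = data, next_buf

def prepend(chain, header: bytes):
    """Add a header at the front of the chain without copying the
    existing buffers -- the key efficiency of the mbuf scheme."""
    return Mbuf(header, chain)

def flatten(chain) -> bytes:
    """Walk the chain to produce the wire-order frame bytes."""
    out = b""
    while chain:
        out, chain = out + chain.data, chain.next
    return out

chain = Mbuf(b"user data")            # payload in its own buffer
chain = prepend(chain, b"<tcp>")      # TCP layer prepends its header
chain = prepend(chain, b"<ip>")       # then the IP layer
chain = prepend(chain, b"<eth>")      # then the MAC (Ethernet) layer
frame = flatten(chain)
```

The frame ends up with the Ethernet, IP, and TCP headers ahead of the user data, matching the outgoing-frame layout discussed above.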
7. TCP Control Block (TCB)
39. Established connections need to maintain certain state information.
For example, the state of a TCP connection is used to track acknowledgements
(ACKs) with the connection that requested the data in order to later retransmit the
segment if required. The TCP state is held in a structure called the TCB (TCP or
transmission control block).
The maintenance of a TCP connection requires the remembering of
several variables. We conceive of these variables being stored in a
connection record called a Transmission Control Block or TCB.
Among the variables stored in the TCB are the local and remote
socket numbers, the security and precedence of the connection,
pointers to the user’s send and receive buffers, pointers to the
retransmit queue and to the current segment. In addition several
variables relating to the send and receive sequence numbers are stored
in the TCB.
Ex.1007, RFC 793 at .024.
40. TCBs maintain the state at each end of a TCP connection:
Protocol control blocks (PCBs) are used at the protocol layer to hold
the various pieces of information required for each UDP or TCP
socket. The Internet protocols maintain Internet protocol control
blocks and TCP control blocks.
Ex.1013, Stevens2 at .739.
41. RFC 2140 shows a list of the information contained in a TCB:
The TCP Control Block (TCB)
A TCB is associated with each connection, i.e., with each association of a pair of applications across the network. The TCB can be summarized as containing [9]:
Local process state
pointers to send and receive buffers
pointers to retransmission queue and current segment
pointers to Internet Protocol (IP) PCB
Per-connection shared state
macro-state
connection state
timers
flags
local and remote host numbers and ports
micro-state
send and receive window state (size*, current number)
round-trip time and variance
cong. window size*
cong. window size threshold*
max windows seen*
MSS#
round-trip time and variance#
Ex.1014, RFC2140 at .002.
As part of the TCP layer, the sequence number must be kept. For example, when
sending subsequent packets, the TCP layer must increment the sequence variable,
placing this new number into the next packet. Ex.1006, Tanenbaum96 at .584. As
will be shown below, this function can either be performed by the host or
offloaded from the host to a separate device.
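A toy TCB illustrating a small subset of this state can be written as a simple record holding the socket pair and the send/receive sequence variables; the field names loosely follow RFC 793 conventions, but the specific fields and values chosen here are assumptions for illustration.

```python
from dataclasses import dataclass

@dataclass
class TCB:
    """Sketch of a transmission control block: the socket pair plus a
    subset of the per-connection micro-state listed in RFC 2140."""
    local_ip: str
    local_port: int
    remote_ip: str
    remote_port: int
    state: str = "ESTABLISHED"
    snd_nxt: int = 0      # next sequence number to send
    rcv_nxt: int = 0      # next sequence number expected from the peer

    def on_send(self, payload_len: int) -> int:
        seq = self.snd_nxt            # sequence number placed in this segment
        self.snd_nxt += payload_len   # advance the variable for the next one
        return seq

tcb = TCB("10.0.0.5", 49152, "216.58.216.46", 80, snd_nxt=1000)
first = tcb.on_send(536)    # this segment carries sequence number 1000
second = tcb.on_send(536)   # the next carries 1536
```

Whichever entity maintains this structure, the host stack or an offload device, must perform the same increment of the sequence variable for each segment sent.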
8. Segmentation
42. TCP sends data as a sequence of segments:
The sending and receiving TCP entities exchange data in the form of
segments. A segment consists of a fixed 20-byte header (plus an
optional part) followed by zero or more data bytes. The TCP software
decides how big segments should be. It can accumulate data from
several writes into one segment or split data from one write over
multiple segments. Two limits restrict the segment size. First, each
segment, including the TCP header, must fit in the 65,535 byte IP
payload. Second, each network has a maximum transfer unit or MTU,
and each segment must fit in the MTU.
Ex.1006, Tanenbaum96 at .543.
43. The application programs are generally unaware of the way TCP data
is segmented, buffered and copied by the operating system. Application programs
send or receive a stream of bytes through the TCP connection:
A stream of 8-bit bytes is exchanged across the TCP connection
between the two applications. There are no record markers
automatically inserted by TCP. This is what we called a byte stream
service. If the application on one end writes 10 bytes, followed by a write of 20
bytes, followed by a write of 50 bytes, the application at the other end
of the connection cannot tell what size the individual writes were. The
other end may read the 80 bytes in four reads of 20 bytes at a time. One
end puts a stream of bytes into TCP and the same, identical stream of
bytes appears at the other end.
Ex.1008, Stevens1 at .248.
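Stevens' 10/20/50-byte example can be modeled with a toy buffer in which writes are simply appended and reads simply drain bytes, so the write boundaries disappear; this is an illustrative sketch, not a socket API:

```python
class ByteStream:
    """Toy model of TCP's byte-stream service: no record boundaries."""

    def __init__(self):
        self.buf = bytearray()

    def write(self, data):
        self.buf += data            # write sizes leave no trace

    def read(self, n):
        out, self.buf = bytes(self.buf[:n]), self.buf[n:]
        return out

s = ByteStream()
s.write(b"a" * 10)   # writes of 10, 20, and 50 bytes...
s.write(b"b" * 20)
s.write(b"c" * 50)
reads = [s.read(20) for _ in range(4)]   # ...read as four 20-byte reads
# the receiver cannot tell what size the individual writes were
```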
44. A large stream of data is sent as a series of segments, generally with
all but the last segment sent as an MSS (maximum segment size) TCP segment.
A “TCP Segment” is defined as: “The unit of data exchanged between TCP
modules (including the TCP header).” Ex.1036, RFC 791 at .034. Segments and
segmentation are commonly discussed in reference to the transport layer of TCP
and ATM networks.
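The segmentation described above can be sketched as splitting a byte stream into payload pieces of at most one MSS, with all but the last piece full-sized; the 1460-byte value below is the common Ethernet-derived MSS and is used only for illustration:

```python
def segment(data, mss):
    """Split a stream of bytes into TCP segment payloads of at most
    `mss` bytes; all but the last segment are full-sized."""
    return [data[i:i + mss] for i in range(0, len(data), mss)] or [b""]

segs = segment(b"x" * 3000, 1460)   # -> payloads of 1460, 1460, and 80 bytes
```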
45. As with all of these protocol processing functions, as will be shown
below, either the host or an offloading device that offloads protocol processing
can perform the function.
9. Advertising a Receive Window
46. In TCP, the amount of data a sender is allowed to send is based on an
advertised window size sent from the receiver:
Window management in TCP is not directly tied to
acknowledgements as it is in most data link protocols. For example,
suppose the receiver has a 4096-byte buffer as shown in Fig. 6-29. If
the sender transmits a 2048-byte segment that is correctly received, the
receiver will acknowledge the segment. However, since it now has
only 2048 of buffer space (until the application removes some data
from the buffer), it will advertise a window of 2048 starting at the
next byte expected.
Now the sender transmits another 2048 bytes, which are
acknowledged, but the advertised window is 0. The sender must stop
until the application process on the receiving host has removed some
data from the buffer, at which time TCP can advertise a larger
window.
Ex.1006, Tanenbaum96 at .551-.552.
47. This effectively allows the receiver to ensure that the sender does not
overflow its buffer with data. The receiver “advertises” this value by including it
in the TCP header. The sender adjusts the amount of data that it sends in view of
this value.
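Tanenbaum's 4096-byte-buffer example can be sketched as follows; the advertised window is simply the free space remaining in the receive buffer, and the class is an illustrative toy, not any cited implementation:

```python
class Receiver:
    """Toy receiver that advertises its buffer free space as the window."""

    def __init__(self, buf_size):
        self.buf_size = buf_size
        self.used = 0

    def receive(self, nbytes):
        assert nbytes <= self.window()   # sender must respect the window
        self.used += nbytes

    def window(self):
        return self.buf_size - self.used  # advertised in the TCP header

    def app_read(self, nbytes):           # application drains the buffer
        self.used -= min(nbytes, self.used)

r = Receiver(4096)
r.receive(2048)   # first 2048-byte segment: window shrinks to 2048
r.receive(2048)   # second segment: window is now 0, sender must stop
r.app_read(1024)  # application removes data: window grows to 1024
```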
C. Protocol Offload
48. To increase performance, designers have employed different
techniques such as parallel processing, improved hardware, memory copy
reduction via hardware and/or software, and hardware to offload all or part of the
protocol stack.
1. RFC 647 – Front-Ending
49. As early as 1974, front-end protocol offload was already being
considered for standardization as described in request-for-comments RFC 647. At
that time, NCP (Network Control Protocol) was the protocol used in ARPANET,
the predecessor to the modern Internet.
“FRONT-ENDING”
In what might be thought of as the greater network community, the
consensus is so broad that the front-ending is desirable that the topic
needs almost no discussion here. Basically, a small machine (a PDP-
11 is widely held to be most suitable) is interposed between the IMP
and the host in order to shield the host from the complexities of the
NCP.
Ex.1019, RFC 647 at .002.
50. RFC 647 goes on to discuss rigid and flexible front-end (FE)
alternatives and includes a high-level discussion of a protocol for interfacing
between the host and FE.
2. RFC 929 – Outboard Processing
51. In 1984, RFC 929 was distributed to begin work on a possible
standard for interfacing between a host and an OPE (Outboard Processing
Environment)4:
4 Other names have been used to describe the OPE concept. Names for protocol
offload implementations included Front-End Processor, Network Front-End,
Protocol Processor, Protocol Engine, Protocol Accelerator, Hardware Bypass,
Smart Network Interface, SMART NIC, Smart Adapter, Protocol Processing
Engine, IO Adapter, Intelligent I/O Processor and intelligent Network Interface
Card.
There are two fundamental motivations for doing outboard
processing. One is to conserve the Hosts' resources (CPU cycles and
memory) in a resource sharing intercomputer network, by offloading
as much of the required networking software from the Hosts to
Outboard Processing Environments (or "Network Front-Ends") as
possible. The other is to facilitate procurement of implementations of
the various intercomputer networking protocols for the several types
of Host in play in a typical heterogeneous intercomputer network, by
employing common implementations in the OPE.
Ex.1009, RFC 929 at .002.
The interaction between the Host and the OPE must be capable of
providing a suitable interface between processes (or protocol
interpreters) in the Host and the off-loaded protocol interpreters in the
OPE. This interaction must not, however, burden the Host more
heavily than would have resulted from supporting the protocols
inboard, lest the advantage of using an OPE be overridden.
Id. at .003.
52. RFC 929 includes a “protocol parameter” for selecting the protocol to
be offloaded. TCP, UDP and IP were among the protocols to be offloaded:
Id. at .013.
3. Mediation Levels
53. The 1984 proposal to standardize offload implementations in RFC
929 is evidence that there was already much activity in offload implementations at
that time. The authors of RFC 929 anticipated different types of outboard
processors and recognized that the amount of work to be done by the outboard
processor might vary from none to partial to full offload. To handle this range, a
“mediation level” parameter was proposed.
The mediation level parameter is an indication of the role the Host
wishes the OPE to play in the operation of the protocol. The extreme
ranges of this mediation would be the case where the Host wished to
remain completely uninvolved, and the case where the Host wished to
make every possible decision. The specific interpretation of this
parameter is dependent upon the particular off-loaded protocol.
The concept of mediation level can best be clarified by means of
example. A full inboard implementation of the Telnet protocol places
several responsibilities on the Host. These responsibilities include
negotiation and provision of protocol options, translation between
local and network character codes and formats, and monitoring the
well-known socket for incoming connection requests. The mediation
level indicates whether these responsibilities are assigned to the Host
or to the OPE when the Telnet implementation is outboard. If no OPE
mediation is selected, the Host is involved with all negotiation of the
Telnet options, and all format conversions.
With full OPE mediation, all option negotiation and all format
conversions are performed by the OPE. An intermediate level of
mediation might have ordinary option negotiation, format conversion,
and socket monitoring done in the OPE, while options not known to
the OPE are handled by the Host.
The parameter is represented with a single ASCII digit. The value 9
represents full OPE mediation, and the value 0 represents no OPE
mediation. Other values may be defined for some protocols (e.g., the
intermediate mediation level discussed above for Telnet). The default
value for this parameter is 9.
Id. at .015-.016.
54. More than a decade passed between the publication of RFC 929 and
the priority date of the earliest Alacritech provisional application. During that
time, protocol offload was the subject of many papers and systems across the range
anticipated by RFC 929. These implementations can be categorized based on the
three principal dimensions of protocol offload: 1) The set of protocols to be
offloaded (e.g. TCP/IP, VMTP, OSI), 2) the portions of the protocol that are
offloaded (e.g. full offload, partial offload, fast path offload, no offload), 3) the
offload implementation (e.g. parallel processor, standard microprocessor, custom
processor, custom hardware). The cited references below include many different
combinations of these three dimensions, but it should be noted that each cited
combination was primarily a design decision among a small, finite number of
choices. It would have been obvious to alter these implementations along one or
more of the dimensions for a new implementation that would have produced
predictable results. In other words, it was well recognized that depending on the
application, it was desirable to vary the extent of offloading. The simplest example
is that while offloading the entire protocol may seem on the surface advantageous,
it was expensive because handling every type of data packet requires a complex
offloading device. For example, it was well known that setting up a connection
and entering the ESTABLISHED state was much more complex than simply
receiving and sending data packets. Ex.1006, Tanenbaum96 at .583 (“The key to
fast TPDU processing is to separate out the normal case (one-way data transfer)
and handle it specially. Although a sequence of special TPDUs are needed to get
into the ESTABLISHED state, once there, TPDU processing is straightforward until
one side starts to close the connection.”).
D. Offloaded Protocols
55. By the mid-1990s, TCP/IP was becoming a predominant network
standard, but many other networks were still in use and new network protocols
were being investigated.
1. OSI Protocol Offload
56. OSI protocol offload engines were built and tested by Thia and
Woodside. Ex.1015, Thia; Ex.1038, Woodside.
2. TCP/IP Protocol Offload
57. TCP/IP offload engines were built or described by many in the field
including Bach, Erickson, Morris, Cooper, Kung, Rütsche and Chesson. Ex.1020,
Bach; Ex.1005, Erickson; Ex.1021, Morris; Ex.1022, Cooper; Ex.1023, Kung;
Ex.1017, Rütsche92; Ex.1018, Rütsche93; Ex.1024, Chesson.
3. VMTP and XTP Protocol Offload
58. VMTP and XTP were proposed as alternatives to TCP. A VMTP
offload engine was described by Kanakia, and an XTP protocol accelerator was
described by Chesson. Ex.1025, Kanakia; Ex.1024, Chesson.
4. Multi-Protocol Offload
59. General-purpose offload engines were also proposed. Erickson
discloses a range of protocol scripts for offloading different protocols.
Each type of protocol will have its own script. Types of protocols
include, but are not limited to, TCP/IP, UDP/IP, BYNET lightweight
datagrams, deliberate shared memory, active message handler, SCSI,
and [Fibre] Channel.
Ex.1005, Erickson at 5:47-51.
60. Kung and Cooper describe the Nectar network-based multicomputer
system in which the processors communicate via Communications Acceleration
Boards (CABs) that can run different protocols.
The CAB runtime system currently supports several transport
protocols with different reliability/overhead tradeoffs [10]. They
include the standard TCP/IP protocol suite besides a number of
Nectar-specific protocols.
Ex.1026, Kung and Cooper at .003.
E. Portions of the Protocol Offloaded
61. The portion of the protocol offloaded (called “mediation level” in
RFC 929) falls into several types that range from partial offload to full offload.
That is, either part of the protocol processing can be offloaded (partial offload) or
the entire protocol processing can be offloaded (full offload).
1. Checksum Offload
62. One of the first parts of protocol processing to be offloaded was the
checksum calculation (a partial offload). An adapter doing only checksum offload
is less complex because it does not require the adapter to maintain the connection
state.
63. Dalton describes the HP Afterburner card with optional hardware for
checksum calculation:
To support the use of the on-card memory as clusters, we have written
a small number of functions. The most important is a special copy
routine, functionally equivalent to the BSD function bcopy. It is
optimized for moving data over the I/O bus, and also optionally uses
the card's built-in unit to calculate the IP checksum of the data it
moves. Another function converts a single-copy cluster into a chain of
normal clusters and mbufs; it also calculates the checksum.
Ex.1027, Dalton at .011 (emphasis added).
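The IP checksum being offloaded here is the standard Internet one's-complement checksum (RFC 1071). A plain software version, of the kind such hardware replaces, can be sketched as:

```python
def internet_checksum(data: bytes) -> int:
    """Compute the 16-bit one's-complement Internet checksum (RFC 1071).

    This is the calculation that checksum-offload hardware performs on
    the adapter instead of the host CPU.
    """
    if len(data) % 2:
        data += b"\x00"                              # pad odd-length input
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]        # sum 16-bit words
        total = (total & 0xFFFF) + (total >> 16)     # fold carries back in
    return ~total & 0xFFFF                           # one's complement
```

For example, the eight-byte sequence from the worked example in RFC 1071 (00 01 f2 03 f4 f5 f6 f7) checksums to 0x220D.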
2. Full Offload
64. Exemplary full offload papers and systems include Murphy, Bach,
MacLean, Cooper and Rütsche.5 Ex.1028, Murphy; Ex.1020, Bach; Ex.1029,
MacLean; Ex.1022, Cooper; Ex.1017, Rütsche92; Ex.1018, Rütsche93.
5 In a “full offload,” the adapter does not typically initiate connections on its own.
The host initiates the connection by opening a socket to an IP address and TCP
port. The host establishes the connection and directs the stack of protocol layers to
create the connection. Yet those of skill in the art often still refer to such systems
as “full offload.”
3. Multi-Level Offload
65. Chesson describes a protocol chip plus an optional control processor
that can do a range of offloads from partial (checksum, sequence numbers, etc.) to
full offload. Ex.1024, Chesson.
4. Header Prediction
66. In 1988, Van Jacobson proposed a header prediction algorithm for
improving the performance of TCP/IP implementations. This “header prediction”
teaching led to various types of partial offload. The code, which uses header
templates, is partitioned into one module for the commonly executed path (the fast
path) and another module to handle the more complex cases and exception
handling (the slow path).
67. Code to implement the header prediction algorithm was incorporated
in the BSD 4.4-Lite distribution.
Most IP packets carry no options. Of the 20-byte header, 14 of the
bytes will be the same for all IP packets sent by a particular TCP
connection. The IP length, ID, and checksum fields (6 bytes total) will
probably be different for each packet. Also, if a packet carries any
options, all packets for that TCP connection will be likely to carry the
same options.
The Berkeley implementation of UNIX makes some use of this
observation, associating with each connection a template of the IP and
TCP headers with a few of the fixed fields filled in. To get better
performance, we designed an IP layer that created a template with all
the constant fields filled in. When TCP wished to send a packet on
that connection, it would call IP and pass it the template and the
length of the packet. Then IP would block-copy the template into the
space for the IP header, fill in the length field, fill in the unique ID
field, and calculate the IP header checksum.
This idea can also be used with TCP, as was demonstrated in an
earlier, very simple TCP implemented by some of us at MIT [6]. In
that TCP, which was designed to support remote login, the entire state
of the output side, including the unsent data, was stored as a
preformatted output packet. This reduced the cost of sending a packet
to a few lines of code.
A more sophisticated example of header prediction involves applying
the idea to the input side. In the most recent version of TCP for
Berkeley UNIX, one of us (Jacobson) and Mike Karels have added
code to precompute what values should be found in the next incoming
packet header for the connection. If the packets arrive in order, a few
simple comparisons suffice to complete header processing.
Ex.1030, Clark at .003.
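The template technique Clark describes can be sketched as follows: the constant IPv4 header fields are filled in once per connection, and the per-packet code block-copies the template and patches only the varying fields. The field layout is the standard IPv4 header, but the functions and their simplifications (no options, checksum omitted) are mine:

```python
import struct

def make_ip_template(src, dst, proto=6):
    """Build a 20-byte IPv4 header once per connection with the constant
    fields filled in; length, ID, and checksum stay zero until send time
    (illustrative sketch: no options, checksum calculation omitted)."""
    return bytearray(struct.pack(
        "!BBHHHBBH4s4s",
        0x45, 0,      # version/IHL, TOS: constant for the connection
        0,            # total length: per packet
        0, 0,         # ID, flags/fragment offset: ID is per packet
        64, proto,    # TTL, protocol: constant
        0,            # header checksum: per packet
        src, dst))    # addresses: constant for the connection

def fill_packet(template, length, ident):
    """Per-packet work: block-copy the template and patch the few
    varying fields, as in the Berkeley implementation Clark describes."""
    hdr = bytearray(template)             # block-copy the template
    struct.pack_into("!H", hdr, 2, length)  # fill in the length field
    struct.pack_into("!H", hdr, 4, ident)   # fill in the unique ID field
    return hdr

tmpl = make_ip_template(b"\x0a\x00\x00\x01", b"\x0a\x00\x00\x02")
pkt = fill_packet(tmpl, 576, 42)
```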
68. The 1995 book (Stevens2) walks through the Jacobson BSD header
prediction code including the conditions for selecting the fast or slow path. In order
to take the fast receive path, six conditions must be met, including:
1. The connection must be established.
2. The following four control flags must not be on: SYN, FIN,
RST, or URG. The ACK flag must be on.
3.-6. [Conditions to assure that the received segments are in-order]
Ex.1013, Stevens2 at .962-.963.
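These conditions can be summarized as a single predicate; the flag bit values below follow the standard TCP header assignments, and conditions 3-6 are reduced to the in-order sequence check for this sketch:

```python
# Standard TCP header flag bits
FIN, SYN, RST, PSH, ACK, URG = 0x01, 0x02, 0x04, 0x08, 0x10, 0x20

def header_prediction_hit(state, flags, seq, rcv_nxt):
    """Take the fast receive path only for an in-order segment with ACK
    set and none of SYN/FIN/RST/URG, on an established connection
    (conditions 3-6 simplified to the in-order sequence check)."""
    return (state == "ESTABLISHED"                    # condition 1
            and not flags & (SYN | FIN | RST | URG)   # condition 2
            and bool(flags & ACK)                     # ACK must be on
            and seq == rcv_nxt)                       # in-order arrival

hit = header_prediction_hit("ESTABLISHED", ACK | PSH, 5000, 5000)   # fast path
miss = header_prediction_hit("ESTABLISHED", ACK | FIN, 5000, 5000)  # slow path
```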
a) Partial Offload with Header Prediction
69. The fast and slow paths described by Stevens gave a natural division
for protocol offload implementations. Building on the Jacobson BSD header
prediction code, Biersack (Ex.1016) describes TCP protocol offload with fast and
slow paths. Thia (Ex.1015) also builds upon the Jacobson BSD header prediction
algorithm and applies its teachings to derive an OSI protocol offload with the fast
path implemented in hardware.
70. The header prediction code in the FreeBSD release is also discussed
in the Alacritech 1997 Provisional application:
The base for the receive processing done by the INIC on an existing
context is the fast-path or “header prediction” code in the FreeBSD
release.
Ex.1031, Alacritech 1997 Provisional Application at .057.
71. Thus, the Jacobson header prediction code forms the basis of what
Alacritech offloads to its intelligent network interface card (INIC).
F. Offload Implementation
72. Offloading the transport layer to an interface card was discussed in
Tanenbaum96:
The hardware and/or software within the transport layer that does the
work is called the transport entity. The transport entity can be in the
operating system kernel, in a separate user process, in a library
package bound into network applications, or on the network interface
card.
Ex.1006, Tanenbaum96 at .498 (emphasis added).
73. Others have disclosed more details of offload hardware including
implementations based on multiprocessors, microprocessors, custom processors
and custom logic.
1. Multiprocessor Offload
74. Several groups proposed or built systems in which protocol
processing is offloaded from the application processor to one or more dedicated
processors in a multiprocessor configuration.
75. The Nectar system:
The Nectar communication processor together with its host can be
viewed as a (heterogeneous) shared-memory multiprocessor.
Dedicating one processor of a multiprocessor host to communication
tasks can achieve some of the benefits of the Nectar approach, but this
constrains the choice of host operating system and hardware. In
contrast, the Nectar communication processor has been used with a
variety of hosts and host operating systems.
Ex.1022, Cooper at .006.
76. The Parallel Protocol Engine:
In this paper our goal is to demonstrate that a careful implementation
of a standard transport protocol stack on a general purpose
multiprocessor architecture allows efficient use of the bandwidth
available in today’s high-speed networks. As an example, we chose
to implement the TCP/IP protocol suite on our 4-processor prototype
of the PPE.
Ex.1017, Rütsche92 at .009.
77. Rütsche also designed a Gb/s Multimedia Protocol Adapter based on
the PPE:
In this paper we present a new multiprocessor communication
subsystem architecture, the Multimedia Protocol Adapter (MPA),
which is based on the experience with the Parallel Protocol Engine
(PPE) [Kaiserswerth 92] and is designed to connect to a 622 Mb/s
ATM network. The MPA architecture exploits the inherent
parallelism between the transmitter and receiver parts of a protocol
and provides support for the handling of new multimedia protocols.
Ex.1018, Rütsche93 at .001.
2. Offload Adapters based on Microprocessors
78. Protocol offloading may be implemented by executing code in one or
more microprocessors on an intelligent network interface card or on a network
accelerator board used in conjunction with a standard NIC (network interface
card).
79. Kanakia describes a network adapter board with a microprocessor and
other support chips:
The prototype Network Adapter Board (NAB) has been designed
using Motorola’s MC68020 as the on-board processor, running at 16
Mhz clock rate; it uses about 200 standard MSI and LSI
components. The current version is designed for connecting two VMP
multiprocessor systems with a 100 megabit/sec point-to-point
connection.
Ex.1025, Kanakia at .010.
80. MacLean describes microprocessor-based protocol accelerators
residing on a VME card:
The internal functions and data flows of the protocol accelerator are
shown in Figure 2. We use a dual CPU approach to protocol
processing, with one CPU subsystem dedicated to the transmission,
and the other to the reception. The transmit and receive CPUs are both
68020 (25 MHz) based, each with its own private resources: ROM,
parallel I/O, interrupt circuitry and 128 kilobytes of random access
memory (RAM). In addition there is 128 kilobytes of RAM shared by
both CPUs which is also accessible to the two host busses, VME and
VSB.
Ex.1029, MacLean at .004.
81. Rütsche describes a multimedia protocol adapter (MPA) using a pair
of “transputer” microprocessors:
The selection of the inmos2 T9000 [inmos 91] is based on our good
experience with the transputer family of processors in the PPE. The
most significant improvements of the T9000 over the T425 for
protocol processing are faster programmable link interfaces, a faster
memory interface, and a cache.
Ex.1018, Rütsche93 at .003.
3. Offload Adapters based on Custom Processors or Custom Logic
82. Other designers have proposed custom processors and/or custom logic
for protocol offload. Chesson describes a Protocol Engine chipset for real-time
protocol processing. Depending on the amount of protocol offload desired, an
adapter can be built with or without the custom control processor (CP):
The Protocol Engine® chipset offers real-time protocol processing for
high-speed networks. A wide range of cost-performance subsystem
solutions are available through various configurations based on the PE
Chipset. The chipset (shown in Figure 1) consists of four chips:
MPORT, HPORT, BCTL, and CP. A basic configuration consists of
MPORT, HPORT, and BCTL.
Ex.1024, Chesson at .006.
83. The optional Chesson Control processor is a custom processor
designed for fast protocol processing:
Control Processor (CP) of the Protocol Engine® chipset is a 32-bit,
multi-thread execution unit that provides high speed protocol
processing.
Id. at .039.
84. Thia also discloses the design of a custom VLSI chip for protocol
offload:
The chip design based on bypassing is called ROPE, for Reduced
Operation Protocol Engine. The contribution of this paper is to define
the host/chip interface and the chip operation, and to report on a
VHDL-based feasibility study of the chip design. It appears to be
feasible to support an end-system single-connection data rate
approaching 1 Gbps.
Ex.1015, Thia at .002.
85. Culler describes the Berkeley Network of Workstations (NOW) in
which the Active Messages protocol is offloaded to intelligent NICs built with
Myricom LANai chips:
The hardware configuration of the Berkeley NOW system consists of
one hundred and five Sun Ultra 170 workstations, connected by a
large Myricom network[Bode95], and packaged into 19-inch racks.
Each workstation contains a 167 MHz Ultra1 microprocessor with
512 KB level-2 cache, 128 MB of memory, two 2.3 GB disks,
ethernet, and a Myricom “Lanai” network interface card (NIC) on the
SBus. The NIC has a 37.5 MHz embedded processor and three DMA
engines, which compete for bandwidth to 256 KB of embedded
SRAM. The node architecture is shown in Figure 1.
Ex.1032, Culler at .001.
Id. at .003.
86. Alteon describes their third generation intelligent Ethernet adapter that
includes performance improvements from protocol offload, reduction in memory
copies and reduction of interrupts.
Using an intelligent adapter with an onboard RISC-based processor
specially designed for embedded application processing, Alteon’s
Gigabit Ethernet technology not only reduces the number of times
data is copied among processing entities, it allows a single interrupt to
be issued for multiple data packets—radically altering the ratio of
interrupts to packets, and eliminating the scalability problems inherent
in older adapter designs.
Ex.1033, Alteon at .022.
87. HP discloses a custom chip called Tachyon that includes send offload,
receive offload, hardware checksum calculation, DMA, and headers/data splitting:
Ex.1034, Smith at .004.
G. Protocol Offload Summary
88. The preceding paragraphs have shown many offload implementations
foreshadowed by RFC 929 described above. These implementations include many
variations along the three dimensions of network protocol offload: 1) the set of
protocols to be offloaded, 2) the portions of the protocol that are offloaded, and 3)
the offload implementation. The citations show that each of the individual
concepts was well known and that many different combinations along the three
dimensions were successfully implemented by practitioners. It would have been
obvious to alter these implementations along one or more of the dimensions for a
new implementation that would have produced predictable results.
H. Additional Background Technology
89. Protocol offload adapters have incorporated many well-known design
techniques originally developed for general purpose processors. Some of these
concepts, such as DMA and virtual memory, are briefly described below. More
information is available from textbooks on Computer Architecture. See e.g., David
A. Patterson and John L. Hennessy, Computer Architecture: A Quantitative
Approach, Morgan Kaufmann Publishers Inc., San Mateo, CA, USA., 1990.
(Ex.1035, Patterson).
1. DMA
90. DMA (Direct Memory Access) is a hardware-based technique for
transferring data between memory systems or between a host memory and an I/O
device.
Since I/O events so often involve block transfers, direct memory
access (DMA) hardware is added to many computer systems to allow
transfers of numbers of words without intervention by the CPU.
Ex.1035, Patterson at .151.
91. Before DMA was common, processors used I/O (input/output)
instructions to transfer data to I/O devices. A benefit of using DMA is that fewer
processor cycles are required to transfer the data. With DMA, the DMA engine is
loaded with an address and count of data to be moved, then the data movement
proceeds while the processor is doing other tasks. In some implementations, DMA
engines are under the control of a host processor, while in others a DMA engine is
controlled by an intelligent controller on an I/O adapter. The DMA engine itself
may be located either in the host or on an I/O adapter.
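The address-and-count programming model described above can be sketched with a toy model; the interface below is a generic illustration, not any particular adapter's:

```python
class DMAEngine:
    """Toy model of a DMA engine: loaded with an address and a count,
    it moves the block while the CPU is free to do other work."""

    def __init__(self, memory):
        self.memory = memory                 # shared byte-addressable memory

    def transfer(self, src, dst, count):
        # One block move, with no per-word CPU involvement
        self.memory[dst:dst + count] = self.memory[src:src + count]

mem = bytearray(1024)
mem[0:4] = b"data"
dma = DMAEngine(mem)
dma.transfer(src=0, dst=512, count=4)   # load address and count, then go
# mem[512:516] now holds a copy of the block
```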
92. DMA may be used either to read from host memory or to write to host
memory. In some implementations, there are separate send and receive DMA
engines and in others, a common DMA engine can be programmed to transfer to or
from host memory:
Outbound Block Mover. The outbound block mover block’s function
is to transfer outbound data from host memory to the outbound
sequence manager via DMA. It takes as input an address/length pair
from the outbound sequence manager block, initiates the Tachyon
system interface bus ownerships, and performs the most efficient
number and size of transactions on the Tachyon system interface bus
to pull in the data requested.
…
Inbound Block Mover. The inbound block mover is responsible for
DMA transfers of inbound data into buffers specified by the
multiframe sequence buffer queue, the single-frame sequence buffer
queue, the inbound message queue, or the SCSI buffer manager. The
inbound block mover accepts an address from the inbound data
manager, then accepts the subsequent data stream and places the data
into the location specified by the address.
Ex.1034, Smith at .007, .009.
Movement of data across the host bus interface are minimized by
using an on-chip DMA for fast block data transfer to/from the host
system memory.
Ex.1015, Thia at .007.
Bus Controller (BC): The BC is a programmable busmaster DMA
controller. It provides a small FIFO and a table for DMA requests.
The FIFO contains a pointer to the linked list of source data and a
connection identifier. The BC determines the destination memory
address through the connection identifier in the table. The list format
is the same for the BC and the DMAU. In the transmit BC the host
writes to the FIFO and the protocol processor to the table. In the
receive BC the protocol processor writes to the FIFO and the host to
the table.
Ex.1018, Rütsche93 at .004-.005.
2. Virtual and Physical Memory Addresses
93. I/O adapters that transfer data directly to or from memory need to be
provided with the memory addresses of the buffers. Many processors use virtual
addressing in which large buffers appear to the processor as single contiguous
memory space even though the addressed pages may not be contiguous in physical
memory. To translate from virtual to physical memory addresses, the processor
uses page tables that store the appropriate mappings from virtual to physical pages.
With virtual memory, the CPU produces virtual addresses that are
translated by a combination of hardware and software to physical
addresses, which can be used to access main memory. This process is
called memory mapping or address translation.
Ex.1035, Patterson at .050 (emphasis in original).
94. In order for an I/O device to access the main memory buffers, either
the physical address may be supplied for each page, or a translation table may be
maintained on the I/O controller to allow it to operate on virtual addresses.
Erickson has a “physical address buffer map” in the adapter memory and discusses
some options for handling the translation:
The vtophys( ) function performs a translation of the user-provided
virtual address into a physical address usable by the adapter. In all
likelihood, the adapter would have a very limited knowledge of the
user process’ virtual address space, probably only knowing how to
map virtual-to-physical for a very limited range, maybe as small as a
single page. Pages in the user process’ virtual address space for such
buffers would need to be fixed. The udpscript procedure would need
to be enhanced if the user data were allowed to span page boundaries.
Ex.1005, Erickson at 8:14-24.
VI. OVERVIEW OF 036 PATENT
95. The 036 Patent relates to offloading TCP protocol processing from a
host onto a network interface card (NIC). Ex.1001, 036 Patent at Abstract. See
Section V.C.-G. above for a description of prior art offloading. The specification
of the 036 Patent refers to the disclosed NIC, which performs offloading, as an
“intelligent network interface card (INIC)”. See id. at Abstract.
96. The INIC of the 036 Patent permits two modes of operation: a “fast
path” in which protocol processing from the physical layer through the TCP layer
is performed on the INIC, and a “slow path” in which network frames are handed
to the host at the MAC layer and passed up through the host protocol stack
conventionally. The concept is illustrated in Fig. 24, shown below:
The answer shown in FIG.24 is to use two modes of operation: One
in which the network frames are processed on the INIC through TCP
and one in which the card operates like a typical dumb NIC. We call
these two modes fast-path, and slow-path. In the slow-path case,
network frames are handed to the system at the MAC layer and passed
up through the host protocol stack like any other network frame. In
the fast path case, network data is given to the host after the headers
have been processed and stripped.
The transmit case works in much the same fashion. In slow-path mode
the packets are given to the INIC with all of the headers attached. The
INIC simply sends these packets out as if it were a dumb NIC. In fast-
path mode, the host gives raw data to the INIC which it must carve
into MSS sized segments, add headers to the data, perform checksums
on the segment, and then send it out on the wire.
Ex.1001, 036 Patent at 39:10-27, Fig. 24.
97. The INIC uses a “connection context” to determine which “path”
should be used for a received packet:
The IP source address of the IP header, the IP destination address of
the IP header, the TCP source address of the TCP header, and the TCP
destination address of the TCP header together uniquely define a
single connection context (TCB) with which the packet is associated.
Processor 470 examines these addresses of the TCP and IP headers
and determines the connection context of the packet. Processor 470
then checks a list of connection contexts that are under the control of
INIC card 200 and determines whether the packet is associated with a
connection context (TCB) under the control of INIC card 200.
If the connection context is not in the list, then the “fast-path
candidate” packet is determined not to be a “fast-path packet.” In such
a case, the entire packet (headers 20 and data) is transferred to a buffer
in host 20 for “slow-path” processing by the protocol stack of host 20.
Ex.1001, 036 Patent at 31:7-22 (emphasis added).
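The quoted check could be sketched, for illustration, as a lookup keyed on the four header addresses (a minimal sketch; the names and addresses are hypothetical, not drawn from the patent):

```python
# Hypothetical set of connection contexts (TCBs) under INIC control,
# keyed by the 4-tuple that uniquely defines a connection.
inic_contexts = {
    ("10.0.0.1", "10.0.0.2", 1024, 80): {"state": "ESTABLISHED"},
}

def classify_packet(ip_src, ip_dst, tcp_src, tcp_dst):
    """Return 'fast-path' if the packet's connection context is in the
    list under INIC control; otherwise 'slow-path', in which case the
    entire packet (headers and data) goes to the host protocol stack."""
    context = inic_contexts.get((ip_src, ip_dst, tcp_src, tcp_dst))
    return "slow-path" if context is None else "fast-path"
```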
98. The “context” for each connection “summariz[es] various features of
the connection.” Id. at 7:62-66, 8:2-15, 10:19-22. The host may create the context
by processing an initial request packet, e.g., as part of opening a connection. Id. at
10:19-22.
VII. 036 PATENT PROSECUTION HISTORY
99. I have reviewed the prosecution history of the 036 Patent. I present a
brief summary of the prosecution with respect to claims 1-7 (which correspond to
claims 1-7 in the file history).
100. On November 16, 2005, claims 1-7 were rejected as being anticipated
by U.S. Pat. No. 6,122,670 (“Bennett”). Ex.1002, 036 File History at .259-.263.
On March 31, 2006, Applicant amended claim 1 to include the “context”
limitation. Id. at .273. Applicant also made amendments to claims 3-4 and 6-7.
Id. at .273-.279. Applicant then attempted to distinguish Bennett, arguing that it
does not disclose the context being employed to transfer data, updating state
information, or that the context is updated. Id. at .280.
101. On July 5, 2006, the Examiner rejected claim 1 as being obvious over
Bennett in view of U.S. Pat. No. 6,195,739 (“Wright”). See id. at .289-.293. The
Examiner stated that Bennett did not disclose “running instructions to process a
message packet such that the context is employed to transfer data contained in said
packet to the first apparatus memory and the state information is updated by said
second processor,” but stated that a different reference, Wright, did. See id. at
.308-.309.
102. On October 10, 2006, Applicant amended claim 1 to specify that the
updated state information is “TCP state information.” Id. at .302-.303. Applicant
then argued that Bennett was not enabled, id. at .309-.310, and that Wright was
filed after the effective filing date of the 036 Patent (alleging that the effective
filing date of the 036 Patent is October 14, 1997), id. at .310. Next, Applicant
argued that Wright’s disclosures relate to operation underneath the TCP layer, and
thus it does not disclose “the TCP state information is updated” or “that the context
is employed to transfer data contained in said packet to the first apparatus
memory.” Id. at .311. Finally, Applicant argued that one of ordinary skill in the
art would not have combined Bennett and Wright, and that even if they were
combined, there would still be nonobvious differences over the combination. Id. at
.311-.312.
103. On February 7, 2007, the Examiner issued a notice of allowance, but
it is not clear from this notice what the Examiner’s basis for the allowance was.
VIII. CLAIM CONSTRUCTIONS
A. Legal Standard
104. I understand that in deciding whether to institute inter partes review,
“[a] claim in an unexpired patent shall be given its broadest reasonable
construction in light of the specification of the patent in which it appears.” 37
C.F.R. § 42.100(b). I further understand that “the broader standard serves to
identify ambiguities in the claims that can then be clarified through claim
amendments.” Final Rule, 77 Fed. Reg. 48680, 48699 (Aug. 14, 2012).
105. In forming my opinions as set forth in this declaration, I have
accorded all claim terms in claims 1-7 in the 036 Patent their broadest reasonable
interpretation, as would be understood by a person of ordinary skill in the art at the
time of the alleged invention of the 036 Patent.
106. I was also asked to provide my opinion on how a POSA would have
understood the terms “context” and “prepend” under the broadest reasonable
interpretation standard.
B. “context”
107. The term “context” appears in claim 1 of the 036 Patent. I understand that in
the copending district court litigation, Alacritech takes the position that
“context” means “data regarding an active connection,” while Petitioner has
taken the position that it is indefinite. For my analysis in this Declaration, I
have been asked to use Alacritech’s construction.
C. “prepend”
108. The term “prepend” appears in claim 4 of the 036 Patent. Under the
broadest reasonable construction standard, this term in light of the specification
would have been understood by a POSA to mean “adds to the front.” The
specification defines “prepends” in this manner: “Once the packet control
sequencer 176 detects that all of the packet has been processed by the fly-by
sequencer 178, the packet control sequencer 176 … prepends (adds to the front)
that status information to the packet …” Id. at 14:5-12. This is consistent with
how a POSA would have understood “prepend” in the context of the 036 Patent.
That is, the claimed header is “prepended,” or added to the front, of the data
portion of the packet.
IX. THE PRIOR ART
A. Tanenbaum96: A. Tanenbaum, Computer Networks, 3rd ed. (1996)6
109. Tanenbaum96, “Computer Networks,” is a 700+ page textbook
covering network hardware, software, protocols and standards. It is a third edition
of the 1981 Tanenbaum book. The 1996 edition is cited and incorporated by
reference in the 036 Patent.
110. Tanenbaum96 describes both TCP and UDP protocols. Note that
UDP, unlike TCP, is connectionless and thus does not require setting up a connection:
The Internet has two main protocols in the transport layer, a
connection oriented protocol and a connectionless one. In the
following sections we will study both of them. The connection-
oriented protocol is TCP. The connectionless protocol is UDP.
Ex.1006, Tanenbaum96 at .539.
6 Tanenbaum96 was a well-known resource to a POSA. I understand that it is prior
art because it was published before October 14, 1997, the date to which Alacritech
claims priority. See Ex. 1006, Tanenbaum96.
Id. at .055, Fig. 1-19.
111. Tanenbaum96 recognizes that an “obstacle to fast networking is
protocol software,” and teaches “fast path” processing for TCP as a solution.
Ex.1006 at .583-.585. This “fast path” solution is based on “header prediction.”
See Section V.E.4 above for a description of the development of “header
prediction” and the state of the art with respect to “header prediction.”
112. Tanenbaum96 teaches fast path transmissions using a prototype
header stored in the transport entity, because in the normal case of an established
TCP connection, only a few fields of the header change in consecutive packets.
Compare Section V.B.5.a. (describing complexity of opening a connection, i.e., a
“socket”). In other words, the transport entity only needs to change a few fields to
send subsequent packets:
The first thing the transport entity does is make a test to see if this is
the normal case: the state is ESTABLISHED, neither side is trying to
close the connection, a regular (i.e., not an out-of-band) full TPDU
[Transport Protocol Data Unit, i.e. packet] is being sent, and there is
enough window space available at the receiver. If all conditions are
met, no further tests are needed and the fast path through the sending
transport entity can be taken.
In the normal case, the headers of consecutive data TPDUs are almost
the same. To take advantage of this fact, a prototype header is stored
within the transport entity. At the start of the fast path, it is copied as
fast as possible to a scratch buffer, word by word. Those fields that
change from TPDU to TPDU are then overwritten in the buffer.
Id. at .583 (emphasis added).
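The prototype-header technique in the quoted passage could be sketched, for illustration, as follows (assuming a 20-byte IP header followed by a 20-byte TCP header in network byte order; the function name and the zeroed prototype are hypothetical):

```python
import struct

# Hypothetical 40-byte prototype header (20-byte IP + 20-byte TCP),
# pre-filled once per connection; zeroed here for simplicity.
prototype = bytes(40)

def fast_path_header(seq, ack, checksum):
    """Copy the prototype into a scratch buffer, then overwrite only
    the fields that change from TPDU to TPDU."""
    scratch = bytearray(prototype)                 # fast copy of the prototype
    struct.pack_into("!I", scratch, 24, seq)       # TCP sequence number
    struct.pack_into("!I", scratch, 28, ack)       # TCP acknowledgement number
    struct.pack_into("!H", scratch, 36, checksum)  # TCP checksum
    return scratch
```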
113. Tanenbaum96 teaches that the transport entity can be implemented by
the host operating system, or can be offloaded to the NIC (e.g., as a processor on
the NIC):
The hardware and/or software within the transport layer that does the
work is called the transport entity. The transport entity can be in the
operating system kernel, in a separate user process, in a library
package bound into network applications, or on the network interface
card.
Id. at .498 (underlining added, bold in original).
114. Tanenbaum96 discloses that the TCP transport entity divides data
streams into TCP segments for subsequent transmission (i.e., to make the data the
correct size for the data payload part of the packet). See Section V.B.8.
(segmentation description). The receiving TCP transport entity reconstructs the
byte stream from the received TCP segments.
Each machine supporting TCP has a TCP transport entity, either a
user process or part of the kernel that manages TCP streams and
interfaces to the IP layer. A TCP entity accepts user data streams from
local processes, breaks them up into pieces not exceeding 64K bytes
(in practice, usually about 1500 bytes), and sends each piece as a
separate IP datagram. When IP datagrams containing TCP data arrive
at a machine, they are given to the TCP entity, which reconstructs the
original byte streams.
Id. at .540.
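The segmentation and reassembly just described could be sketched as follows (the 1460-byte MSS is illustrative, corresponding to the roughly 1500-byte pieces Tanenbaum96 mentions):

```python
def segment(byte_stream, mss=1460):
    """Break a user byte stream into pieces no larger than the MSS;
    each piece would be sent as a separate IP datagram."""
    return [byte_stream[i:i + mss] for i in range(0, len(byte_stream), mss)]

def reassemble(segments):
    """Receiving side: reconstruct the original byte stream from
    in-order TCP segments."""
    return b"".join(segments)
```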
115. Tanenbaum96 goes on to describe a TCP prototype header (i.e., a
header template that is used to create additional headers for sending packets) and
offloading protocol processing by the transport entity in detail:
Id. at .584 (emphasis added).
116. Tanenbaum96 also teaches TCP fast path receiving by looking up a
TCP connection record based on the IP source address, TCP source port, IP
destination address and TCP destination address, checking to see if the packet is
a normal one in the ESTABLISHED state, and then putting the data into user
memory. In other words, Tanenbaum96 is teaching that the transport entity
performs this check to determine whether the packet is suitable for fast path
processing. See Section V.E.4. (header prediction offload). Note that there may be
multiple connections on a single computer, and thus when a packet comes in, it
must be checked against the connection records that may represent multiple
connections:
Now let us look at fast path processing on the receiving side…. For
TCP, the connection record can be stored in a hash table for which
some simple function of the two IP addresses and two ports is the key.
Once the connection record has been located, both addresses and both
ports must be compared to verify that the correct record has been
found….
[T]he TPDU [Transport Protocol Data Unit, i.e. packet] is then
checked to see if it is a normal one: the state is ESTABLISHED,
neither side is trying to close the connection, the TPDU is a full one,
no special flags are set, and the sequence number is the one expected.
These tests take just a handful of instructions. If all conditions are
met, a special fast path TCP procedure is called.
The fast path updates the connection record and copies the data to the
user. While it is copying, it also computes the checksum, eliminating
an extra pass over the data. If the checksum is correct, the connection
record is updated and an acknowledgement is sent back. The general
scheme of first making a quick check to see if the header is what is
expected, and having a special procedure to handle that case, is called
header prediction. Many TCP implementations use it.
Ex.1006, Tanenbaum96 at .584-.585 (underlining added, bold in original).
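The receive-side header prediction in the quoted passage could be sketched as follows (a minimal sketch with hypothetical names and values; Python's dict stands in for the hash table keyed on the two addresses and two ports, and the flag and checksum handling are simplified):

```python
# Hypothetical connection-record table, hashed on the 4-tuple key.
records = {
    ("10.0.0.2", "10.0.0.1", 80, 1024): {
        "state": "ESTABLISHED",
        "rcv_nxt": 5000,          # sequence number expected next
        "user_data": bytearray(),
    },
}

def receive_tpdu(src_ip, dst_ip, src_port, dst_port, seq, flags, payload):
    """Quick check that the header is the expected normal case; if so,
    take the fast path (update the record, copy data to the user)."""
    rec = records.get((src_ip, dst_ip, src_port, dst_port))
    if (rec is None or rec["state"] != "ESTABLISHED"
            or flags != "ACK" or seq != rec["rcv_nxt"]):
        return "slow-path"        # fall back to full protocol processing
    rec["user_data"] += payload   # copy to user (checksum computed during copy)
    rec["rcv_nxt"] += len(payload)
    return "fast-path"
```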
117. The “connection record” disclosed in Tanenbaum96 is used to
maintain TCP state:
When an application on the client machine issues a CONNECT
request, the local TCP entity creates a connection record, marks it as
being in the SYN SENT state, and sends a SYN segment. Note that
many connections may be open (or being opened) at the same time on
behalf of multiple applications, so the state is per connection and
recorded in the connection record.
Id. at .549 (emphasis added).
118. The “connection record” is the same as the “Transmission Control
Block (TCB)” described in RFC 793, the TCP protocol specification:
Before we can discuss very much about the operation of the TCP we
need to introduce some detailed terminology. The maintenance of a
TCP connection requires the remembering of several variables. We
conceive of these variables being stored in a connection record called
a Transmission Control Block or TCB.
Ex.1007, RFC 793 at .024 (emphasis added).
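A minimal sketch of such a connection record, holding a small illustrative subset of the variables RFC 793 says must be remembered (the field names are mine, not RFC 793's exact notation):

```python
from dataclasses import dataclass

@dataclass
class TCB:
    """Transmission Control Block: per-connection state remembered
    by the TCP entity (small illustrative subset of variables)."""
    local_ip: str
    local_port: int
    remote_ip: str
    remote_port: int
    state: str = "CLOSED"  # e.g. SYN_SENT, ESTABLISHED, CLOSE_WAIT
    snd_nxt: int = 0       # next sequence number to send
    rcv_nxt: int = 0       # next sequence number expected

def connect(tcb):
    """On a CONNECT request: mark the record SYN_SENT and (conceptually)
    send a SYN segment, as the Tanenbaum96 passage above describes."""
    tcb.state = "SYN_SENT"
    return tcb
```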
119. I describe a TCB and RFC 793 in Section V.B.5.
120. Tanenbaum96 teaches that “[f]or TCP, the connection record can be
stored in a hash table for which some simple function of the two IP addresses and
two ports is the key.” Ex.1006, Tanenbaum96 at .585. Tanenbaum96 thus teaches
the “connection context” as described in the 036 Patent:
IP source address of the IP header, the IP destination address of the IP
header, the TCP source address of the TCP header, and the TCP
destination address of the TCP header [that] together uniquely define
a single connection context (TCB) with which the packet is
associated.
Ex.1001, 036 Patent 31:7-12.
121. Again, there may be multiple connections, and Tanenbaum96 is
teaching a technique to quickly lookup the connection record that corresponds to
the received packet.
122. Like the context of claim 1 of the 036 Patent, the connection record of
Tanenbaum96 is used to transfer data to host memory. Ex.1006, Tanenbaum96 at
.584-.585. Note that Tanenbaum96’s transport entity, when on the NIC,
corresponds to the I/O adapter device of Erickson (see below).
B. U.S. Patent No. 5,768,618 (“Erickson”)7
123. Erickson discloses IO Adapter 314 for protocol offload of fast and
slow applications as shown in Fig. 3:
7 I understand that Erickson is prior art to the 036 Patent because it was filed years
before October 14, 1997, the date to which Alacritech claims priority. See Ex.
1005.
Ex.1005, Erickson at Fig. 3 (annotated).
FIG. 3 is a flow diagram describing the system data flow of fast and
slow applications 302, 304, and 306 compatible with the present
invention. A traditional slow application 306 uses normal streams
processing 308 to send information to a pass-through driver 310. The
pass-through driver 310 initializes the physical hardware registers 320
of the I/O device adapter 314 to subsequently transfer the information
through the I/O device adapter 314 to the commodity interface 322.
With the present invention, fast user applications 302 and 304 directly
use a setup driver 312 to initialize the physical hardware registers 320,
then send the information directly through the I/O device adapter 314
to the commodity interface 322 via virtual hardware 316 and 318.
Thus, the overhead of the normal streams processing 308 and pass-
through driver 310 are eliminated with the use of the virtual hardware
316 and 318 of the present invention, and fast applications 302 and
304 are able to send and receive information more quickly than slow
application 306.
Ex.1005, Erickson at 4:53-5:3 (emphasis added).
124. The IO adapter runs scripts that offload protocol processing. Because it
runs scripts (program code), the I/O adapter includes a processor. The adapter
accesses application data (to transmit over the network) via programmed I/O or
DMA. Control information (to direct the communication) is communicated by
snooping the host memory bus. Specifically, user processes that wish to
communicate over the network open a device driver, and specify the details of the
desired communication mode. The device driver sets up a protocol script and
protocol specific endpoint data for the connection:
Each user process that has access to the virtual hardware is typically
assigned a page-sized area of physical memory on the I/O device
adapter, which is then mapped into the virtual address space of the
user process. The I/O device adapter typically is implemented with
snooping logic to detect accesses within the page-sized range of
memory on the I/O device adapter. If the I/O device adapter detects
access to the physical memory page, a predefined script is then
executed by the I/O device adapter in order to direct the data as
appropriate.
Id. at 5:31-40.
Typically, when a user process opens a device driver, the process
specifies its type, which may include, but is not limited to, a UDP
datagram, source port number, or register address. The user process
also specifies either a synchronous or asynchronous connection. The
device driver sets up the registers 508 and 504, endpoint table 514,
and endpoint protocol data 518. The protocol script 516 is typically
based upon the endpoint data type, and the endpoint protocol data 518
depends on protocol specific data.
Id. at 6:1-9.
Instead, the adapter would most likely retrieve the needed user data
from the user process’ virtual address space using direct memory
access (DMA) into the main memory over the bus and retrieving the
user data into some portion of the adapter’s memory, where it could
be referenced more efficiently. The programming steps performed in
the udpscript( ) procedure above might need to be changed to reflect
that.
Id. at 8:30-37.
125. The endpoint data is stored on the I/O device adapter. The adapter
uses the endpoint data to move data from the adapter to user memory, i.e., when
receiving data packets and performing fast path processing on I/O device adapter:
The I/O device adapter implementation includes a software register
508 and a physical address buffer map 510 in the adapter's memory
512. An endpoint table 514 in the memory 512 is used to organize
multiple memory pages for individual user processes. Each entry
within the endpoint table 514 points to various protocol data 518 in
the memory 512 in order to accommodate multiple communication
protocols, as well as previously defined protocol scripts 516 in the
memory 512, which indicate how data or information is to be
transferred from the memory 512 of the I/O device adapter to the
portions of main memory 502 associated with a user process.
Id. at 5:56-67.
126. Erickson discloses that scripts may be written for a variety of
protocols including TCP/IP:
Each type of protocol will have its own script. Types of protocols
include, but are not limited to, TCP/IP, UDP/IP, BYNET lightweight
datagrams, deliberate shared memory, active message handler, SCSI,
and [Fibre] Channel.
Id. at 5:47-51.
127. Erickson discloses sample user code (running on the host) for
triggering the UDP fast path offload, and also discloses a script that runs in the
adapter:
Id. at 7:19-33.
Id. at 7:50-63.
The script that executes the above function provides the
USERDATA_ ADDRESS and USERDATA_LENGTH which the
user process programmed into the adapter's memory. This information
quite likely varies from datagram 602 to datagram 602. The script is
also passed the appropriate datagram 702 template based on the
specific software register (508 in FIG. 5 or 316 in FIG. 3). There are
different scripts for different types of datagrams 702 (e.g., UDP or
TCP).
Id. at 7:65- 8:6.
128. Before the senduserdatagram is executed in the host, and the udpscript
is run in the adapter, protocol header information (the template) is transferred to
the interface device (IO device adapter) (see also “pre-negotiated” discussion
below). The header information includes the initial values of the checksums and
Datagram ID, the IP Addresses, and MAC addresses:
A user process typically causes a script to execute by using four
virtual registers, which include STARTINGADDRESS, LENGTH,
GO, and STATUS. The user process preferably first writes
information into memory at the locations specified by the values in
the STARTINGADDRESS and LENGTH virtual registers. Next, the
process then accesses the GO virtual register to commence execution
of the script. Finally, the user process accesses or polls the STATUS
virtual register to determine information about the operation or
completion of this I/O request.
Id. at 6:12-21.
Within the udpscript procedure described above, the
nextid() function provides a monotonically increasing 16-
bit counter required by the IP protocol.
Id. at 8:10-12.
Id. at Fig. 7.
129. The dotted fields in Fig. 7 are those that may change during the
transfer. Each new packet may change the remaining Total Length, the Datagram
ID of the next packet, the IP checksum, the UDP length and the UDP Checksum.
The data follows the completed header as shown in Figure 6.
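The per-datagram patching of those dotted fields could be sketched, for illustration, as follows (offsets follow the standard IPv4 header layout; the checksum helper is the RFC 1071 Internet checksum; the function names are hypothetical, standing in for Erickson's udpscript and nextid()):

```python
import struct

def inet_checksum(data):
    """RFC 1071 Internet checksum: one's-complement sum of 16-bit words."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

_datagram_id = 0

def nextid():
    """Monotonically increasing 16-bit counter required by the IP protocol."""
    global _datagram_id
    _datagram_id = (_datagram_id + 1) & 0xFFFF
    return _datagram_id

def fill_ip_template(template, udp_payload_len):
    """Patch the per-datagram fields of a pre-negotiated 20-byte IP
    header template: Total Length, Datagram ID, and IP checksum."""
    hdr = bytearray(template)
    struct.pack_into("!H", hdr, 2, 20 + 8 + udp_payload_len)  # IP Total Length
    struct.pack_into("!H", hdr, 4, nextid())                  # Datagram ID
    struct.pack_into("!H", hdr, 10, 0)                        # clear checksum field
    struct.pack_into("!H", hdr, 10, inet_checksum(bytes(hdr)))
    return hdr
```

A correctly filled header checksums to zero, the usual receive-side validity check.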
130. Erickson discloses that many fields of the header are “pre-negotiated,”
including the source and target IP addresses and source and target MAC (Ethernet)
addresses. The pre-negotiated fields are provided by the host to the I/O adapter8:
In this example, the user process and the device driver has pre-
negotiated the following fields from FIG. 6: (1) Ethernet Header 604
(Target Ethernet Address, Source Ethernet Address, and Protocol
Type); (2) IP Header 606 (Version, IP header Length, Service Type,
Flag, Fragment Offset, Time_to_Live, IP Protocol, IP Address of
Source, and IP Address of Destination); and (3) UDP Header 608
(Source Port and Destination Port). Only the shaded fields in FIG. 6,
and the user data 610, need to be changed on a per-datagram basis.
Ex.1005, Erickson at 6:63-7:4.
131. Specifically, Erickson discloses an exemplary pre-negotiation of
transport-layer UDP/IP/MAC protocol information:
Each user process has basically pre-negotiated almost everything
about the datagram 602, except the actual user data 610. This means
most of the fields in the three header areas 604, 606, and 608 are
predetermined.
In this example, the user process and the device driver has pre-
negotiated the following fields from FIG. 6: (1) Ethernet Header 604
(Target Ethernet Address, Source Ethernet Address, and Protocol
8 See Section V.B.2. for Ethernet (MAC layer) description.
Type); (2) IP Header 606 (Version, IP header Length, Service Type,
Flag, Fragment Offset, Time_to_Live, IP Protocol, IP Address of
Source, and IP Address of Destination); and (3) UDP Header 608
(Source Port and Destination Port). Only the shaded fields in FIG. 6,
and the user data 610, need to be changed on a per-datagram basis.
Id. at 6:57-7:4, see also Figs. 6 and 7.
132. Erickson discloses that after the pre-negotiation, the I/O device
adapter runs protocol scripts to process outgoing and incoming data packets,
thereby offloading the protocol processing onto the I/O device adapter. Id. at 4:18-
23. The scripts are used to locate an application endpoint and to generate packet
headers from a pre-negotiated template header:
Protocol scripts typically serve two functions. The first function is to
describe the protocol the software application is using. This includes
but is not limited to how to locate an application endpoint, and how to
fill in a protocol header template from the application specific data
buffer. The second function is to define a particular set of instructions
to be performed based upon the protocol type. Each type of protocol
will have its own script. Types of protocols include, but are not
limited to, TCP/IP, UDP/IP, BYNET lightweight datagrams,
deliberate shared memory, active message handler, SCSI, and [Fibre]
Channel.
Id. at 5:41-51 (emphasis added).
133. Here, the user process identifies a block of raw data to be transmitted
and “spanks” (i.e. sets to 1) a GO register to trigger the adapter to take the raw
data, encapsulate it into a packet with UDP, IP and MAC headers, and transmit it.
See id. at 7:39-47.
134. In other words, Erickson’s network interface device creates headers
for packets to be transmitted using the pre-negotiated UDP, IP and MAC header
information. A user program (senduserdatagram at 7:22) identifies raw data in
host memory to be transmitted (by providing a USERDATA_ADDRESS and
USERDATA_LENGTH) and then triggers the network interface device (by
“spanking” the GO register) as shown at id. at 7:18-33. In response, the network
interface device executes a UDP protocol script (udpscript at 7:51) that creates
headers from the pre-negotiated context by populating UDP/IP/MAC datagram
template headers as shown in Fig. 7 with appropriate values for IP Length, IP
Datagram ID, IP Checksum, UDP Length and UDP Checksum. The network
interface device then encapsulates the data with the headers, and sends the
completed packet. Id. at 7:39-64.
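The host/adapter handshake described in this paragraph and in the register passage quoted above could be sketched as follows (the class and method names are hypothetical; in Erickson the "registers" are snooped memory locations, modeled here as plain attributes, and run_script() stands in for the adapter's protocol script):

```python
class Adapter:
    """Stand-in for the I/O device adapter; run_script() models the
    protocol script (e.g., udpscript) that builds headers and sends."""
    def __init__(self):
        self.sent = []

    def run_script(self, addr, length):
        self.sent.append((addr, length))  # headers + data would go on the wire


class VirtualRegisters:
    """The four virtual registers a user process uses to drive a script."""
    def __init__(self, adapter):
        self._adapter = adapter
        self.STARTINGADDRESS = 0
        self.LENGTH = 0
        self.STATUS = "IDLE"

    def spank_go(self):
        """Accessing GO commences script execution on the adapter."""
        self.STATUS = "BUSY"
        self._adapter.run_script(self.STARTINGADDRESS, self.LENGTH)
        self.STATUS = "DONE"


def send_user_datagram(regs, user_data_address, user_data_length):
    """Host-side sequence: write address and length, spank GO, poll STATUS."""
    regs.STARTINGADDRESS = user_data_address
    regs.LENGTH = user_data_length
    regs.spank_go()
    while regs.STATUS not in ("DONE", "ERROR"):
        pass  # poll STATUS for completion
    return regs.STATUS
```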
135. Erickson discloses fast-path receiving of data packets by directly
writing the data to the host memory space corresponding to the user process (i.e.,
the fast application), bypassing the protocol stack on the host. Id. at 5:53-67. The
transfer of data directly to the host memory (and from the host memory to the I/O
device adapter) occurs via a Direct Application Interface (DAI):
FIG. 4 is a block diagram describing a direct application interface
(DAI) and routing of data between processes and an external data
connection which is compatible with the present invention. Processes
402 and 404 transmit and receive information directly to and from an
interconnect 410 (e.g., I/O device adapter) through the DAI interface
408. The information coming from the interconnect 410 is routed
directly to a process 402 or 404 by use of virtual hardware and
registers, rather than using a traditional operating system interface
406.
Id. at 5:5-5:14.
136. Erickson refers to a variety of scripts including TCP, but does not
include a sample TCP script. It would have been within the skills of a POSA to
adapt the UDP script for TCP. That adaptation would have been obvious based on
a POSA's knowledge of common implementations of TCP/IP, or based on common
reference texts on TCP/IP such as Tanenbaum96. See Section X (motivations to
combine).
137. Note that the scripts as disclosed in Erickson are simplified and do not
spell out all of the details provided by conventional UDP implementations,
including IP fragmentation for frame lengths exceeding the maximum Ethernet
frame length. A POSA would have understood that the standard functionality of
UDP would be included in the adapter script, and that functionality was within
the ordinary level of knowledge of a POSA well before October 1997. A POSA
would also understand
that analogous code for segmentation would also be required for TCP. Such code
would be within the skills of a POSA (and part of the ordinary knowledge of a
POSA). See Section X (motivations to combine).
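For illustration, the IP fragmentation such an adapter script would need could be sketched as follows (the MTU value is illustrative; fragment offsets are carried in 8-byte units, which is why every fragment but the last carries a multiple of 8 data bytes):

```python
def fragment(payload, mtu=1500, ip_header_len=20):
    """Split a datagram payload into fragments that fit the link MTU.
    Each fragment records its offset (in 8-byte units) and a
    More Fragments (MF) flag."""
    max_data = (mtu - ip_header_len) // 8 * 8  # largest multiple of 8 that fits
    frags = []
    for off in range(0, len(payload), max_data):
        frags.append({
            "offset": off // 8,                   # in 8-byte units
            "MF": off + max_data < len(payload),  # more fragments follow?
            "data": payload[off:off + max_data],
        })
    return frags
```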
X. OBVIOUSNESS COMBINATIONS – MOTIVATIONS TO COMBINE
A. Erickson in Combination with Tanenbaum96
138. Erickson incorporates Tanenbaum81 by reference:
A discussion of the form and structure of TCP sockets and packets,
which are well-known within the art, may be found in many
references, including Computer Networks by Andrew S. Tanenbaum,
Prentice-Hall, New Jersey, 1981, pp. 326-327, 373-377, which is
herein incorporated by reference.
Id. at 4:38-43.
139. The third edition of the 1981 Tanenbaum book was published in
March of 1996, more than one year before the claimed priority date of the 036
Patent. A POSA implementing a TCP script as suggested by Erickson would have
naturally turned to the most recent edition of the Tanenbaum book, Tanenbaum96,
for more details about TCP.
140. In 1996, the Internet and World Wide Web, using TCP/IP, were
growing extremely popular. See generally Section V.A.-B. Erickson expressly
references TCP/IP scripts. Ex.1005, Erickson at 5:41-51. Given this, a POSA at
this time would have been motivated to implement the TCP/IP fast path protocol
processing described by Erickson, using Erickson’s Ethernet I/O device adapter. A
POSA would have further been motivated to consult a reference book on TCP/IP,
such as Tanenbaum96, to do so. At the time, there were a finite number of
networking protocols, particularly ones as popular as TCP/IP, and thus it
would have further been obvious to try to implement TCP/IP using Erickson’s I/O
adapter. See generally Section V.A.-B.
141. As I have described in Section V.A.2. and V.B., a POSA would have
understood TCP/IP well and standards for TCP/IP are set forth in well-known
Request for Comments (RFCs). Accordingly, a POSA would have had a high
expectation of success in implementing TCP/IP on Erickson’s I/O device adapter.
Specifically, the “prototype headers” in Tanenbaum96 are the TCP/IP equivalent
of the UDP/IP header shown in Fig. 7 of Erickson. Ex.1006, Tanenbaum96 at
.584; Ex.1005, Erickson at Fig. 7. The unshaded fields in Tanenbaum96 Fig. 6-50
are those that may change during the TCP/IP transfer, and the dotted fields in
Erickson Fig. 7 are those that may change during the UDP/IP transfer. Id. A
POSA, when adapting Erickson’s UDP script to TCP, would understand that rather
than filling in the UDP Length and Checksum shown in Erickson (for UDP), the
script needs to fill in the TCP Sequence number and Checksum (for TCP). Id. For
a multi-segment TCP send, the initial sequence number is determined by the host
stack, and the sequence number is adjusted by the adapter each time it sends a
packet. See generally Section V.B.4. (discussing TCP sequence numbers). As
noted above, the scripts as disclosed in Erickson are simplified and do not spell out
all of the details provided by conventional UDP implementations, including IP
fragmentation for frame lengths exceeding the maximum Ethernet frame length. A
POSA would understand that the standard functionality of UDP would be included
in the adapter script. A POSA would also understand that analogous code for
segmentation would also be required for TCP. Such code would be within the
skills of a POSA. See Section V.A.-B.
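For illustration only, the field substitution described in this paragraph can be sketched in C. This is my own simplified sketch, not code from Erickson or Tanenbaum96: the struct layouts and function names are hypothetical, omit most real header fields, and are meant only to show that a prototype ("template") header leaves the pre-negotiated fields untouched while a script fills in the few per-packet fields (Length and Checksum for UDP; Sequence number and Checksum for TCP).

```c
/* Illustrative sketch only: hypothetical, simplified header layouts
 * (not the exact layouts of Erickson Fig. 7 or Tanenbaum96 Fig. 6-50). */
#include <stdint.h>

struct udp_proto_hdr {
    uint16_t src_port, dst_port;  /* pre-negotiated; never change      */
    uint16_t length, checksum;    /* filled in per datagram            */
};

struct tcp_proto_hdr {
    uint16_t src_port, dst_port;  /* pre-negotiated; never change      */
    uint32_t seq;                 /* advanced per segment              */
    uint32_t ack;
    uint16_t window;
    uint16_t checksum;            /* recomputed per segment            */
};

/* Fill the per-datagram UDP fields in a copy of the template. */
static struct udp_proto_hdr udp_next(const struct udp_proto_hdr *tmpl,
                                     uint16_t payload_len, uint16_t csum)
{
    struct udp_proto_hdr h = *tmpl;           /* copy prototype header */
    h.length   = (uint16_t)(8 + payload_len); /* 8-byte UDP header     */
    h.checksum = csum;
    return h;
}

/* Fill the per-segment TCP fields: the sequence number advances by the
 * number of payload bytes already sent on this connection. */
static struct tcp_proto_hdr tcp_next(const struct tcp_proto_hdr *tmpl,
                                     uint32_t initial_seq,
                                     uint32_t bytes_sent, uint16_t csum)
{
    struct tcp_proto_hdr h = *tmpl;
    h.seq      = initial_seq + bytes_sent;
    h.checksum = csum;
    return h;
}
```

The only difference between the UDP and TCP cases in this sketch is which few fields are rewritten per packet, which is the point of the comparison above.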
142. Note that both Erickson and Tanenbaum96 disclose an IP prototype
header. Each new packet changes the Identification (the Datagram Id in Erickson
Fig. 7) and Header checksum (IP header checksum in Erickson Fig. 7) in the same
way. Ex.1006, Tanenbaum96 at .584; Ex.1005, Erickson at Fig. 7. This further
illustrates the similarity between the approaches and the easy adaptation of Erickson
to TCP/IP.
143. Tanenbaum96’s teachings of connection records correspond to, for
example, Erickson’s endpoint information, protocol scripts, and pre-negotiated
protocol information. The records and pre-negotiated information include
information about the connection (e.g., sender and receiver addresses) and how to
transfer data for received packets to the host. See above at ¶¶116-20
(Tanenbaum96); ¶¶124-25, 128-30, 132 (Erickson). Accordingly, it would have
been routine to adapt Erickson using Tanenbaum96’s TCP/IP teachings that use
connection records.
144. Similarly, Tanenbaum96’s teachings of fast path TCP processing
using a prototype header and header prediction correspond to, and could be used to
modify, Erickson’s endpoint information, pre-negotiated protocol information,
template header and UDP script to perform TCP protocol processing. Both
Tanenbaum96 and Erickson have a slow and fast path. See above at ¶¶111-12
(Tanenbaum96); ¶123 (Erickson). Both use prototype headers. See above at
¶¶112, 115 (Tanenbaum96); ¶¶128-29 (Erickson). Both include a transport entity
or an I/O adapter to perform the offloaded protocol processing. See above at
¶¶113-14 (Tanenbaum96); ¶124 (Erickson). Accordingly, it would have been
routine to adapt Erickson using Tanenbaum96’s TCP/IP teachings of a prototype
header and header prediction. Moreover, these techniques were well known at this
time. See Section V.C.-G. An exemplary TCP script for Erickson in view of
Tanenbaum96’s transport entity and fast path teachings is as follows. The TCP
script may transfer an entire block (via DMA) to the adapter memory in one large
transfer. The script, knowing the maximum segment size (MSS), sends one MSS
sized block of data at a time. The I/O adapter updates the TCP sequence number in
the connection record on the network device for each segment and any other state
information. This requires only one “spank” of the GO register for a multi-
segment send. The adapter would then repeatedly extract one segment of data at a
time from the transferred block, encapsulate it in a packet, and transmit. The
segmentation code is within the skills of a POSA in light of the disclosures by
Tanenbaum96.
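The multi-segment send described in the preceding paragraph can be sketched in C. Again, this is my own illustrative sketch under stated assumptions, not Erickson's or Tanenbaum96's actual interface: the structure, field names, and function names are hypothetical. It shows only the control flow — one block transferred to the adapter, the adapter cutting off at most one MSS of data per packet and advancing the sequence number in the connection record itself.

```c
/* Illustrative sketch of the multi-segment TCP send described above.
 * All names and structures here are hypothetical. */
#include <stdint.h>
#include <stddef.h>

struct conn_record {
    uint32_t snd_seq;   /* next sequence number to use */
};

/* Stand-in for building and transmitting one segment; the adapter
 * updates the TCP state itself. Returns bytes consumed. */
static size_t transmit_segment(struct conn_record *cr,
                               const uint8_t *data, size_t len)
{
    cr->snd_seq += (uint32_t)len;
    (void)data;          /* real code would encapsulate and send */
    return len;
}

/* One signal from the host covers the whole block: the adapter loops,
 * sending at most one MSS of payload per packet. Returns the number
 * of segments transmitted. */
static unsigned adapter_send_block(struct conn_record *cr,
                                   const uint8_t *block, size_t total,
                                   size_t mss)
{
    unsigned segments = 0;
    size_t off = 0;
    while (off < total) {
        size_t chunk = total - off;
        if (chunk > mss)
            chunk = mss;
        off += transmit_segment(cr, block + off, chunk);
        segments++;
    }
    return segments;
}
```

For example, a 3500-byte block with a 1460-byte MSS would go out as three segments, with the connection record's sequence number advanced by 3500 in total.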
145. Given that Erickson does not detail a bypass test for selecting fast or
slow path, a POSA would be motivated to consider Tanenbaum96’s teachings,
which were well known and proven, see Section V.E.4. (“header prediction”), and
which further reduce the complexity of the I/O device (see below at ¶148).
146. As to receiving packets, Erickson discloses an endpoint table and
protocol scripts which store protocol state information and indicate how data is to
be transferred from the network interface device to portions of main memory
associated with a user process. Ex.1005, Erickson at 5:59-67. A POSA would
understand that Erickson’s endpoint table, pre-negotiated protocol information and
protocol scripts correspond to Tanenbaum96’s connection records, and that
Erickson’s looking up endpoint protocol information in the endpoint table
corresponds to Tanenbaum96’s looking up a connection record to copy data to the
user after a quick check that the packet is what is expected (header prediction).
Ex.1006, Tanenbaum96 at .584-.585. A POSA would therefore be motivated to
use the Tanenbaum96 teachings of header prediction to provide Erickson’s fast
path receive processing for TCP. That is, both work effectively the same:
Tanenbaum96 and Erickson receive data, strip off the headers, and copy the data to
memory. See above at ¶116 (Tanenbaum96); ¶135 (Erickson). A POSA would
have been motivated to apply Tanenbaum96’s teaching for TCP/IP receiving to
Erickson, and would have had a high expectation of success, given that both effectively
accomplish the receiving and copying to memory in the same way despite being different
transport protocols.
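The header-prediction check discussed in this paragraph can be sketched in C. This is my own simplified illustration of the style of bypass test Tanenbaum96 describes, with hypothetical field names: a handful of comparisons decide whether a received segment takes the fast path (copy data to the user, update the connection record) or falls back to the slow path.

```c
/* Illustrative sketch of a header-prediction bypass test in the style
 * Tanenbaum96 describes. Field names are hypothetical. */
#include <stdint.h>

enum tcp_state { STATE_OTHER = 0, STATE_ESTABLISHED = 1 };

struct conn_rec {
    enum tcp_state state;
    int      closing;       /* nonzero if either side is closing */
    uint32_t rcv_next;      /* next expected sequence number     */
};

struct seg_hdr {
    uint32_t seq;
    uint8_t  flags;         /* nonzero = SYN/FIN/RST/URG etc.    */
};

/* Returns 1 if the segment may take the fast path; these few
 * comparisons are the "handful of instructions" quoted above. */
static int bypass_test(const struct conn_rec *cr, const struct seg_hdr *h)
{
    return cr->state == STATE_ESTABLISHED
        && !cr->closing
        && h->flags == 0            /* a "normal" data segment   */
        && h->seq == cr->rcv_next;  /* exactly the expected TPDU */
}
```

Any segment failing the test would be handed to the full (slow-path) protocol stack for processing.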
147. Combining Tanenbaum96’s TCP/IP and header prediction with
Erickson would have been understood as combining known methods to yield
predictable results. For example, TCP/IP was well known. See Section V.B.
Header prediction was well known. See Section V.E.4. Offloading protocol
processing was also generally well known. See Section V.C.-G.
148. A POSA would have been motivated to implement Tanenbaum96’s
header prediction teachings on the Erickson I/O adapter to reduce the complexity
and expense of the I/O adapter. I explain in Section V.C.-F. that various levels of
offloading are possible. Only offloading packets that are for data transfer, not for
setting up a connection, reduces the complexity of the offloading processing.
Ex.1006, Tanenbaum96 at .583 (“The key to fast TPDU processing is to separate
out the normal case (one-way data transfer) and handle it specially. Although a
sequence of special TPDUs are needed to get into the ESTABLISHED state, once
there, TPDU processing is straightforward until one side starts to close the
connection.”). This is because, for example, opening a connection requires several
different types of control packet transmission and receptions. See Section V.B.5.
149. Note that as part of its header prediction teachings, Tanenbaum96
specifically teaches that connection records (corresponding to Erickson’s endpoint
table, pre-negotiated protocol information and protocol scripts) can be stored in a
“hash table for which some simple function of the two IP addresses and two ports
is the key.” Ex.1006, Tanenbaum96 at .584-.585. That is, Tanenbaum96 details a
lookup technique using a “simple function” to implement the bypass test. This
“simple function” could be used to look up the corresponding TCP connection in
the Erickson I/O adapter.
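The hash-table lookup quoted in the preceding paragraph can be sketched in C. The hash function and table size below are arbitrary examples of my own, not anything disclosed by Erickson or Tanenbaum96; the sketch shows only the quoted idea — connection records keyed by "some simple function of the two IP addresses and two ports," with both addresses and both ports then compared to verify the record.

```c
/* Illustrative sketch of connection records in a hash table keyed by
 * a simple function of the two IP addresses and two ports. The hash
 * and table size are arbitrary examples. */
#include <stdint.h>
#include <string.h>

#define TABLE_SIZE 256

struct conn_key {
    uint32_t src_ip, dst_ip;
    uint16_t src_port, dst_port;
};

struct conn_entry {
    int             in_use;
    struct conn_key key;
    uint32_t        snd_seq;    /* example piece of TCP state */
};

static struct conn_entry table[TABLE_SIZE];

/* A deliberately simple mixing function, per the quoted passage. */
static unsigned simple_hash(const struct conn_key *k)
{
    uint32_t h = k->src_ip ^ k->dst_ip
               ^ ((uint32_t)k->src_port << 16) ^ k->dst_port;
    return (h ^ (h >> 8) ^ (h >> 16) ^ (h >> 24)) % TABLE_SIZE;
}

/* After hashing, both addresses and both ports are compared to verify
 * that the correct record was found (no chaining in this sketch). */
static struct conn_entry *lookup(const struct conn_key *k)
{
    struct conn_entry *e = &table[simple_hash(k)];
    if (e->in_use && memcmp(&e->key, k, sizeof *k) == 0)
        return e;
    return NULL;
}

static struct conn_entry *record_insert(const struct conn_key *k,
                                        uint32_t seq)
{
    struct conn_entry *e = &table[simple_hash(k)];
    e->in_use = 1;
    e->key = *k;
    e->snd_seq = seq;
    return e;
}
```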
XI. GROUNDS OF INVALIDITY
150. I detail how the prior art invalidates the claims at issue in the
Appendix A claim chart. In summary, my opinion is that claims 1-7 of the 036
Patent are invalid over Erickson in view of Tanenbaum96.
151. I declare that all statements made herein on my own knowledge are
true and that all statements made on information and belief are believed to be true,
and further, that these statements were made with the knowledge that willful false
statements and the like so made are punishable by fine or imprisonment, or both,
under Section 1001 of Title 18 of the United States Code.
Respectfully submitted,
[signature]
Robert Horst, Ph.D.
Date: April 17, 2017
APPENDIX A
TABLE OF CONTENTS
[1.P.1] A device for use with a first apparatus that is connectable to a second apparatus
[1.P.2] the first apparatus containing a memory and a first processor
[1.P.3] [a first processor] operating a stack of protocol processing layers that create a context for communication, the context including a media access control (MAC) layer address, an Internet Protocol (IP) address and Transmission Control Protocol (TCP) state information, the device comprising:
[1.1] a communication processing mechanism connected to the first processor,
[1.2] said communication processing mechanism containing a second processor
[1.3] [second processor] running instructions to process a message packet such that the context is employed to transfer data contained in said packet to the first apparatus memory and
[1.4] [second processor running instructions to process a message packet such that] the TCP state information is updated by said second processor.
[2.1] The device of claim 1, wherein said communication processing mechanism includes a receive sequencer with directions to classify said packet, wherein said packet contains control information corresponding to the stack of protocol layers.
[3.1] The device of claim 1, wherein said communication processing mechanism includes a receive sequencer with directions to generate a summary of a second message packet received from the network, said second packet containing control information corresponding to the stack of protocol layers, and said instructions including an instruction to compare said summary with said context.
[4.1] The device of claim 1, wherein said instructions include a first instruction to create a header corresponding to said context and having control information corresponding to several of the protocol processing layers, and
[4.2] said instructions include a second instruction to prepend said header to second data for transmission of a second packet.
[5.1] The device of claim 1, wherein said communication processing mechanism has a direct memory access unit to send, based upon said context, said data from said communication processing mechanism to the first apparatus memory,
[5.2] without a header accompanying said data.
[6.1] The device of claim 1, wherein said context includes a receive window of space in the memory that is available to store application data, and said communication processing mechanism advertises said receive window.
[7.1] The device of claim 1, wherein said context includes TCP ports of said first and said second apparatuses.
U.S. Pat. No. 5,768,618 (“Erickson”) in view of Tanenbaum96
[1.P.1] A device for use with a first apparatus that is connectable to a second apparatus To the extent that the preamble is limiting, Erickson discloses a device for use with a first apparatus that is connectable to a second apparatus. Specifically, Erickson discloses an “I/O device adapter” (a device) that is connected to, and for use with, the host “computer” (a first apparatus) and a “receiver” (a second apparatus) that are connectable over a network:
To overcome the limitations in the prior art described above, and to overcome other limitations that will become apparent upon reading and understanding the present specification, the present invention discloses a method of controlling an input/output (I/O) device connected to a computer to facilitate fast I/O data transfers.
Ex.1005, Erickson at 1:63-67.⁹ Erickson also refers to the “computer” as a “sender”:
FIG. 1 is a flow diagram illustrating a conventional I/O data flow between a sender and a receiver. At 102, a sender application sends information across the memory bus to a user buffer 104, which in turn is then read back across the memory bus by protocol modules 110. The information is subsequently buffered through the operating system kernel 108 before it is sent out through conventional network interface 114 to the network media access control (MAC) 116. It will be noted that in this system model, the data makes at least three trips across the memory bus at S2, S3 and S5. For the receiving application, the steps are reversed from those of the sender application, and once again the data makes at least three trips across the memory
⁹ Emphasis added unless otherwise noted.
bus at R1, R4, and R5.
Id. at 3:23-36.
Maintaining security between multiple software processes is important when sharing a single I/O device adapter. If the I/O device adapter controls a network interface, such as an Ethernet device, then the access rights granted to the user process by the operating system could be analogous to a Transmission Control Protocol (TCP) address or socket.
Id. at 4:28-33. Note that Erickson discloses that the sender, i.e., the host computer, of its invention includes an I/O device with a fast path for such network connections. See id. at 1:63-67; 4:53-5:5. Otherwise, it connects (is connectable) to a second apparatus in the same way relative to the “conventional” disclosure above. I annotate these components below:
Id. at Fig. 3. Accordingly, Erickson in view of Tanenbaum96 discloses a device (I/O device) for use with a first apparatus (host computer, i.e., sender) that is connectable to a second apparatus (second computer, i.e., receiver).
U.S. Pat. No. 5,768,618 (“Erickson”) in view of Tanenbaum96 [1.P.2] the first apparatus containing a memory and a first processor
To the extent that the preamble is limiting, Erickson discloses that the first apparatus contains a memory and a first processor. It is well known by those of ordinary skill in the art, and certainly obvious, that a “computer” (which Erickson discloses) includes memory (e.g., main memory) and a processor (containing a memory and a first processor). Id. at 1:63-67; see also id. at 9:48 (“memory of computer”) and Fig. 5 (“main memory”). For example, the computer includes user processes that open a device driver, which means that the processor of the host computer executes software for the user process instance and for opening of the device driver, and which further means that the host computer is utilizing its memory to both store and execute the user process and device driver. See also id. at 2:54-58 (describing the “user processes in a single computer node,” i.e., the user processes run on the host computers).
Typically, when a user process opens a device driver, the process specifies its type, which may include, but is not limited to, a UDP datagram, source port number, or register address.
Id. at 6:1-4. Accordingly, Erickson in view of Tanenbaum96 discloses that the first apparatus (host computer) contains a memory (e.g., main memory of host computer) and a first processor (its CPU).
U.S. Pat. No. 5,768,618 (“Erickson”) in view of Tanenbaum96 [1.P.3] [a first processor] operating a stack of protocol processing layers that create a context for communication, the context including a media access control (MAC) layer address, an Internet Protocol (IP) address and Transmission Control Protocol (TCP) state information, the device comprising: To the extent that the preamble is limiting, Erickson in view of Tanenbaum96 discloses that the first processor of the host computer operates a stack of protocol processing layers that create a context for communication, the context including a media access control (MAC) layer address, an Internet Protocol (IP) address and Transmission Control Protocol (TCP) state information. The host computer of Erickson operates a stack of protocol layers for “normal streams processing” for “slow applications”:
Id. at Fig. 3.
FIG. 3 is a flow diagram describing the system data flow of fast and slow applications 302, 304, and 306 compatible with the present invention. A traditional slow application 306 uses normal streams processing 308 to send information to a pass-through driver 310. The pass-through driver 310 initializes the physical hardware registers 320 of the I/O device adapter 314 to
subsequently transfer the information through the I/O device adapter 314 to the commodity interface 322.
Id. at 4:52-61. As shown above, Erickson also discloses a fast-path (for “fast applications”), wherein the I/O device performs some of the protocol processing:
With the present invention, fast user applications 302 and 304 directly use a setup driver 312 to initialize the physical hardware registers 320, then send the information directly through the I/O device adapter 314 to the commodity interface 322 via virtual hardware 316 and 318. Thus, the overhead of the normal streams processing 308 and pass-through driver 310 are eliminated with the use of the virtual hardware 316 and 318 of the present invention, and fast applications 302 and 304 are able to send and receive information more quickly than slow application 306. As a result, the present invention provides higher bandwidth, less latency, less system overhead, and shorter path lengths.
Id. at 4:61-5:5. Erickson discloses that a user process on the host computer creates a context for communication by (1) opening a device driver and specifying the protocol type (e.g. UDP or TCP), source port number or address, whether the connection is synchronous or asynchronous, and setting up memory mapped registers, an endpoint table and endpoint protocol data used for a protocol-specific script, and (2) pre-negotiating connection details including a template header.
Typically, when a user process opens a device driver, the process specifies its type, which may include, but is not limited
to, a UDP datagram, source port number, or register address. The user process also specifies either a synchronous or asynchronous connection. The device driver sets up the registers 508 and 504, endpoint table 514, and endpoint protocol data 518. The protocol script 516 is typically based upon the endpoint data type, and the endpoint protocol data 518 depends on protocol specific data.
Id. 6:1-9.
In the present application, the access privileges given to the user processes are very narrow. Each user process has basically pre-negotiated almost everything about the datagram 602, except the actual user data 610. This means most of the fields in the three header areas 604, 606, and 608 are predetermined. In this example, the user process and the device driver has pre-negotiated the following fields from FIG. 6: (1) Ethernet Header 604 (Target Ethernet Address, Source Ethernet Address, and Protocol Type); (2) IP Header 606 (Version, IP header Length, Service Type, Flag, Fragment Offset, Time_to_Live, IP Protocol, IP Address of Source, and IP Address of Destination); and (3) UDP Header 608 (Source Port and Destination Port). Only the shaded fields in FIG. 6, and the user data 610, need to be changed on a per-datagram basis.
Id. at 6:57-7:4.
The script is also passed the appropriate datagram 702 template based on the specific software register (508 in FIG. 5 or 316 in FIG. 3). There are different scripts for different types of datagrams 702 (e.g., UDP or TCP). Also, the script would most likely make a copy of the datagram 702 template (not shown
here), so that multiple datagrams 602 for the same user could be simultaneously in transit.
Id. at 8:2-9; see also id. at Figs. 6-7.
Ex.1005, Erickson at Fig. 7 (as shown above, this pre-negotiated information includes the Ethernet (MAC) address and IP addresses); see also Sections V.G.2-3,5 (describing MAC and IP addresses and opening a socket with these addresses). As shown above, Erickson discloses that this context includes “almost everything” concerning a UDP datagram “except the actual user data.” “[A]lmost everything” refers to, for example, the protocol information for the headers of the protocol layers. “[U]ser data” refers to the data payload that is part of the packets (the non-header parts of the packets). This “almost everything,” as described above and shown in Figs. 6 and 7, is Erickson’s pre-negotiated context, and includes a MAC layer address, IP address and UDP address. See id. at Figs. 6-7. Accordingly, Erickson teaches that “the context including a media access control (MAC) layer address, an Internet Protocol (IP) address.” As to the TCP state information, the above Erickson exemplary context is UDP over IP. UDP is a connectionless protocol. That is, single packets are sent without establishing a “connection.” See above at ¶109. However, Erickson also discloses protocol scripts for the other protocols including TCP/IP:
Protocol scripts typically serve two functions. The first function is to describe the protocol the software application is using. This includes but is not limited to how to locate an application endpoint, and how to fill in a protocol header template from the application specific data buffer. The second function is to define a particular set of instructions to be performed based upon the protocol type. Each type of protocol will have its own script. Types of protocols include, but are not limited to, TCP/IP, UDP/IP, BYNET lightweight datagrams, deliberate
shared memory, active message handler, SCSI, and File Channel.
Id. at 5:41-51. A person having ordinary skill in the art (POSA) would have been motivated to consider Tanenbaum96’s teachings to implement the TCP/IP connection on Erickson’s I/O device. See Section X (motivations to combine). Unlike UDP, TCP requires establishing a connection before sending a packet. See Section V.B.4. (describing TCP layer). Tanenbaum96 teaches that for TCP, only connections in the ESTABLISHED state should be processed on the fast path.
The key to fast TPDU [i.e. packet] processing is to separate out the normal case (one-way data transfer) and handle it specially. Although a sequence of special TPDUs are needed to get into the ESTABLISHED state, once there, TPDU processing is straightforward until one side starts to close the connection. Let us begin by examining the sending side in the ESTABLISHED state when there are data to be transmitted. … The first thing the transport entity does is make a test to see if this is the normal case: the state is ESTABLISHED, neither side is trying to close the connection, a regular (i.e., not an out-of-band) full TPDU is being sent, and there is enough window space available at the receiver. If all conditions are met, no further tests are needed and the fast path through the sending transport entity can be taken.
Ex.1006, Tanenbaum96 at .583. To enter the ESTABLISHED state, a series of control packets are sent back
and forth between the sender and receiver. See Sections V.B.4-5 (explaining TCP/IP connections and opening a connection). Tanenbaum96 is teaching that this series of control packets are sent and received on the slow path, i.e., is operating the stack of protocol processing layers to open the connection. As I describe above at ¶¶33-35, 146, establishing the connection on the slow path reduces the complexity of the offloading device. This slow path corresponds to Erickson’s “normal stream processing” for the slow applications which operates the protocol processing layers. Accordingly, a POSA would understand that, in view of Tanenbaum96, Erickson’s host operates the stack of protocol processing layers to create a TCP connection using Erickson’s slow path. Once the connection is in the ESTABLISHED state, the host uses the fast path for TCP communication. It would have been routine to modify Erickson’s UDP/IP fast path context to support TCP/IP based on the TCP/IP prototype header disclosed in Tanenbaum96:
Ex.1006, Tanenbaum96 at .584. As the above teaches, the TCP headers on a series of TCP/IP packets require only changing a few fields, such as the sequence number. As Tanenbaum96 teaches above, it was simple to change a few fields to create the new headers, thereby offloading the protocol processing. It further would have been routine to modify Erickson’s UDP/IP fast path context to include the TCP connection records described in Tanenbaum96:
Now let us look at fast path processing on the receiving side…. For TCP, the connection record can be stored in a hash table for which some simple function of the two IP addresses and two ports is the key. Once the connection record has been located,
both addresses and both ports must be compared to verify that the correct record has been found…. the TPDU [Transport Protocol Data Unit, i.e. packet] is then checked to see if it is a normal one: the state is ESTABLISHED, neither side is trying to close the connection, the TPDU is a full one, no special flags are set, and the sequence number is the one expected. These tests take just a handful of instructions. If all conditions are met, a special fast path TCP procedure is called.
The fast path updates the connection record and copies the data to the user. While it is copying, it also computes the checksum, eliminating an extra pass over the data. If the checksum is correct, the connection record is updated and an acknowledgement is sent back. The general scheme of first making a quick check to see if the header is what is expected, and having a special procedure to handle that case, is called header prediction. Many TCP implementations use it.
Ex.1006, Tanenbaum96 at .584-.585 (underlining added, bold in original). The “connection records” disclosed in Tanenbaum96 are used to maintain TCP state:
When an application on the client machine issues a CONNECT request, the local TCP entity creates a connection record, marks it as being in the SYN SENT state, and sends a SYN segment. Note that many connections may be open (or being opened) at the same time on behalf of multiple applications, so the state is per connection and recorded in the connection record.
Ex.1006, Tanenbaum96 at .549 (underlining added). This state information includes, for example, the TCP sequence number and
window size (see above Tanenbaum96 quote at .584). See, e.g., above at ¶¶ 31, 39-41. Tanenbaum96, for example, notes that the sequence number is changed between packets. The connection record stores this sequence number and is used for sending and receiving packets corresponding to the respective connection. See Ex.1006, Tanenbaum96 at .583-.584. Erickson in view of Tanenbaum96 thus teaches “the context including a media access control (MAC) layer address, an Internet Protocol (IP) address and Transmission Control Protocol (TCP) state information.” Note that the IP and MAC header information is the same, and thus Erickson’s disclosures discussed above apply. Tanenbaum96 teaches the TCP state information as part of the connection record. To the extent that Erickson does not expressly disclose the host operating the stack of protocol layers to pre-negotiate the connection record, it is also my opinion that it would be obvious in view of Tanenbaum96. Tanenbaum96 teaches a bypass test that separates processing between a fast and slow path. In Tanenbaum96’s bypass test, a TCP connection must be in the ESTABLISHED state for fast path processing. Id. at .584-.585. Tanenbaum96 teaches that checking for established connections requires only a handful of instructions, and that packet processing for established connections is straightforward. Id. at .583.
This is because checking a packet against a connection record to determine whether it is in an ESTABLISHED state requires merely checking a header against entries in a table (see claims 2-3 below), and processing a data transfer packet is also straightforward, e.g., creating the packets with headers and data portions (see claim 4). On the other hand, handling control packets to, for example, open or close a connection requires much more processing. See Section V.B.5. (describing opening a connection). Accordingly, to reduce the complexity of the offloading device (e.g., Erickson’s I/O device), it was known (as Tanenbaum96 teaches) to handle only ESTABLISHED connections on the fast path. See id. at .583-.584. Thus, a POSA would have been motivated to apply these teachings of Tanenbaum96 to Erickson. See
Section X.A. (motivations to combine). Specifically, a POSA would have been motivated to have the host of Erickson operate the stack of protocol processing layers to establish the connection (its “Slow Applications” path), which creates the context (the connection record), and then process subsequent packets on the Erickson I/O device. Accordingly, Erickson in view of Tanenbaum96 discloses that the first processor of the host computer operates a stack of protocol processing layers (its slow path stack for normal stream processing to set up a connection) that create a context for communication (registers 508 and 504, endpoint table 514, and endpoint protocol data 518, TCP protocol script and the pre-negotiated protocol information), the context including a media access control (MAC) layer address, an Internet Protocol (IP) address and Transmission Control Protocol (TCP) state information (address information to send the TCP/IP packet over the network that Erickson pre-negotiates, as well as TCP fields such as the sequence number and window size).
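To illustrate the claimed context, the connection record described above can be sketched as a simple data structure. This is my own illustrative Python sketch; the field names and example values are assumptions, not code from Erickson or Tanenbaum96:

```python
from dataclasses import dataclass

@dataclass
class ConnectionRecord:
    # Pre-negotiated header information (cf. Erickson's registers and
    # endpoint protocol data).
    mac_addr: bytes      # MAC-layer address for the Ethernet header
    ip_addr: str         # IP address for the IP header
    # TCP state information (cf. Tanenbaum96's connection record).
    state: str           # e.g. "SYN_SENT", "ESTABLISHED"
    seq_num: int         # sequence number, updated between packets
    window: int          # advertised window size

# Hypothetical example values, for illustration only.
ctx = ConnectionRecord(b"\x00\x11\x22\x33\x44\x55", "10.0.0.2",
                       "ESTABLISHED", 100, 8192)
```

The sketch shows how a single record can hold the MAC address, IP address, and per-connection TCP state together, as the claim element recites.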
U.S. Pat. No. 5,768,618 (“Erickson”) in view of Tanenbaum96
[1.1] a communication processing mechanism connected to the first processor,
Erickson discloses a communication processing mechanism connected to the first processor. Specifically, the I/O device of Erickson (the “device”) includes a communication processing mechanism as shown in the below annotated figure, which includes a second processor (see below element [1.2]) with scripts to send data (1) to the second apparatus (via a commodity interface that handles the physical connection), and (2) when receiving data from the second apparatus, to host endpoint applications.
FIG. 3 is a flow diagram describing the system data flow of fast and slow applications 302, 304, and 306 compatible with the present invention. A traditional slow application 306 uses normal streams processing 308 to send information to a pass-through driver 310. The pass-through driver 310 initializes the physical hardware registers 320 of the I/O device adapter 314 to subsequently transfer the information through the I/O device adapter 314 to the commodity interface 322. With the present invention, fast user applications 302 and 304 directly use a setup driver 312 to initialize the physical hardware registers 320, then send the information directly through the I/O device adapter 314 to the commodity interface 322 via virtual hardware 316 and 318. Thus, the overhead of the normal streams processing 308 and pass-through driver 310 are eliminated with the use of the virtual hardware 316 and 318 of the present invention, and fast applications 302 and 304 are able to send and receive information more quickly than slow application 306. As a result, the present invention provides higher bandwidth, less latency, less system overhead, and shorter path lengths.
FIG. 4 is a block diagram describing a direct application interface (DAI) and routing of data between processes and an external data connection which is compatible with the present invention. Processes 402 and 404 transmit and receive information directly to and from an interconnect 410 (e.g., I/O
device adapter) through the DAI interface 408. The information coming from the interconnect 410 is routed directly to a process 402 or 404 by use of virtual hardware and registers, rather than using a traditional operating system interface 406.
Ex.1005, Erickson at 4:53-5:14, see also id. at 4:18-23 (running scripts). The communication processing mechanism of the I/O device is connected to the first processor through standard device buses:
FIG. 2 is a block diagram illustrating a virtual hardware memory organization compatible with the present invention. I/O device adapters on standard I/O buses, such as ISA, EISA, MCA, or PCI buses, frequently have some amount of memory and memory-mapped registers which are addressable from a device driver in the operating system.
Id. at 3:36-42. Accordingly, Erickson discloses a communication processing mechanism (the processor of the I/O device) connected to the first processor (via buses and address mapping).
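The bus-addressable, memory-mapped register access described above can be sketched as a toy model. This is illustrative only; the base address, register offsets, and the GO-register trigger are hypothetical, loosely modeled on Erickson’s description, not actual hardware definitions:

```python
# Toy model of memory-mapped I/O device registers (illustrative only).
# On real hardware these would be bus addresses mapped into the
# driver's or user process's address space.

class IODeviceRegisters:
    BASE = 0xFEB00000               # hypothetical bus address of register window
    GO, ADDR, LEN = 0x0, 0x4, 0x8   # hypothetical register offsets

    def __init__(self):
        self._regs = {}
        self.script_ran = False

    def write32(self, offset, value):
        self._regs[offset] = value & 0xFFFFFFFF
        if offset == self.GO:       # writing GO triggers the device's script
            self.run_script()

    def read32(self, offset):
        return self._regs.get(offset, 0)

    def run_script(self):
        self.script_ran = True      # stand-in for executing a protocol script
```

The point of the sketch is that the host sets up a transfer by writing registers over the bus, then triggers the device, consistent with Erickson’s “spanking” of a GO register discussed later.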
U.S. Pat. No. 5,768,618 (“Erickson”) in view of Tanenbaum96
[1.2] said communication processing mechanism containing a second processor
Erickson discloses that the communication processing mechanism contains a second processor. Specifically, the I/O device includes memory and runs scripts:
An I/O device adapter typically can have an arbitrary amount of random access memory (RAM) ranging from several hundred kilobytes to several megabytes, which may be used for mapping several user processes in a single communications node.
Id. at 5:27-31.
A script is prepared by the operating system for the I/O device adapter to execute each time the specific user process programs its specific virtual hardware. The user process is given a virtual address in the user process' address space that allows the user process very specific access capabilities to the I/O device adapter.
Id. at 4:18-23; see also id. at 7:48-8:26 (example script). I annotate these on Figure 5 of Erickson below:
Id. at Fig. 5 (annotated). It would have been understood, and certainly obvious, that the I/O device of Erickson includes a processor because it executes scripts in a high-level language, which requires a processor. See, e.g., id. at 7:48-8:26 (example high-level script). Further, Erickson discloses that the I/O device computes the checksum via a function call, which would be understood as using the CPU to perform arithmetic functions to compute this value (i.e., using a processor). See id. Note that the scripts are in an uninterpreted language, meaning that a processor must first compile the scripts into an instruction set for the processor and then execute them. See, e.g., id. at 7:48-8:26 (example script). Note also that a processor is, by definition, a device that interprets and executes commands, i.e., the scripts of Erickson. Ex.1037, Computer Dictionary, Microsoft (1994) at .010, .011. Accordingly, Erickson discloses that the communication processing
mechanism contains a second processor (the processor of the I/O device that executes the scripts).
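The manner in which a processor interprets and executes such a per-protocol script can be sketched as a toy interpreter. This is illustrative only; the operation names and the byte-sum placeholder checksum are my own assumptions and are not Erickson’s actual script language:

```python
# Toy interpreter: a "script" is a list of operations that the device's
# processor executes in order (operation names are hypothetical).

def run_script(script, packet):
    for op, arg in script:
        if op == "prepend":
            # Add a (pre-built) header in front of the payload.
            packet = arg + packet
        elif op == "checksum":
            # Placeholder checksum: append a single byte-sum (NOT the
            # real Internet checksum; illustration only).
            packet += bytes([sum(packet) % 256])
    return packet

# A hypothetical UDP-style script: prepend a header, then checksum.
udp_script = [("prepend", b"UDP"), ("checksum", None)]
```

The sketch reflects the point made above: executing even a simple script of this kind requires a processor that interprets and executes commands.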
U.S. Pat. No. 5,768,618 (“Erickson”) in view of Tanenbaum96
[1.3] [second processor] running instructions to process a message packet such that the context is employed to transfer data contained in said packet to the first apparatus memory and Erickson in view of Tanenbaum96 discloses running instructions to process a message packet such that the context is employed to transfer data contained in said packet to the first apparatus memory. Specifically, Erickson discloses running scripts, i.e., instructions:
A script is prepared by the operating system for the I/O device adapter to execute each time the specific user process programs its specific virtual hardware. The user process is given a virtual address in the user process' address space that allows the user process very specific access capabilities to the I/O device adapter.
Id. at 4:18-23; see also id. at 7:48-8:26 (example script). The user process invokes a script for the protocol to be used for the connection. The particular set of instructions for that script is part of the context.
The second function is to define a particular set of instructions to be performed based upon the protocol type. Each type of protocol will have its own script. Types of protocols include, but are not limited to, TCP/IP, UDP/IP, BYNET lightweight datagrams, deliberate shared memory, active message handler, SCSI, and File Channel.
Id. at 5:45-51. These scripts include processing incoming data and transferring that data to the memory of the first apparatus (the memory of the host computer that corresponds to the user process that is ultimately receiving the user data) by employing fields of the context (present in, e.g., registers 504 and 508, endpoint table 514, and endpoint protocol data 518):
FIG. 5 is a block diagram illustrating the system organization between a main memory and an I/O device adapter memory which is compatible with the present invention. The main memory 502 implementation includes a hardware register 504 and a buffer pool 506. The I/O device adapter implementation includes a software register 508 and a physical address buffer map 510 in the adapter's memory 512. An endpoint table 514 in the memory 512 is used to organize multiple memory pages for individual user processes. Each entry within the endpoint table 514 points to various protocol data 518 in the memory 512 in order to accommodate multiple communication protocols, as well as previously defined protocol scripts 516 in the memory 512, which indicate how data or information is to be transferred from the memory 512 of the I/O device adapter to the portions of main memory 502 associated with a user process.
Id. at 5:53-67. Recall that the user process, via the device driver, sets up registers 504 and 508, endpoint table 514, and endpoint protocol data 518 to create parts of the context. Id. at 6:1-9; see also id. at 6:57-7:4 (pre-negotiating, i.e., providing to the I/O device, header information for the context). Erickson’s protocol scripts plus other context information (present in, e.g., registers 504 and 508, endpoint table 514, endpoint protocol data 518, and the protocol script) include instructions to process a message packet such that the context is employed to transfer data contained in said packet to the first apparatus memory, i.e., to transfer incoming data “from the memory 512 of the I/O device adapter to the portions of main memory 502 associated with a process.” Id. at 5:53-67. Erickson further details the transfer to host memory, depicting the I/O device receiving data (adapter of I/O device 410) and directly providing it to a user process (via memory of the host computer) in Figure 4:
FIG. 4 is a block diagram describing a direct application interface (DAI) and routing of data between processes and an external data connection which is compatible with the present invention. Processes 402 and 404 transmit and receive information directly to and from an interconnect 410 (e.g., I/O device adapter) through the DAI interface 408. The information coming from the interconnect 410 is routed directly to a process 402 or 404 by use of virtual hardware and registers, rather than using a traditional operating system interface 406.
Ex.1005, Erickson at 5:6-5:14; see also 4:53-5:5 and Fig. 3 (illustrating that I/O device 314 sends data to applications 302 and 304 that reside within the memory of the host computer). Accordingly, Erickson in view of Tanenbaum96 discloses running instructions (specified by the scripts) to process a message packet such that the context (including the script, registers 508 and 504, endpoint table 514, endpoint protocol data 518, pre-negotiated information, and pointer to main memory) is employed (to identify the relevant protocol script to run, and further to identify where to write the received data) to transfer data contained in said packet to the first apparatus memory (memory of the host computer).
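The context-driven transfer of received data into host memory can be sketched as follows. This is an illustrative Python sketch; the endpoint-table layout and process names are hypothetical, modeled loosely on Erickson’s Figure 5, and are not Erickson’s actual structures:

```python
# Illustrative routing of received data to a process's host memory via an
# endpoint table (the "context"). Names and layout are hypothetical.

host_memory = {}            # process name -> list of delivered buffers

endpoint_table = {          # context: endpoint -> destination process
    ("10.0.0.2", 80): {"process": "fast_app_1"},
}

def deliver(packet):
    entry = endpoint_table.get((packet["dst_ip"], packet["dst_port"]))
    if entry is None:
        return False        # no context available: fall back to slow path
    # Employ the context to copy the payload into the process's memory.
    host_memory.setdefault(entry["process"], []).append(packet["data"])
    return True
```

The sketch shows the context being “employed” in the claimed sense: it identifies where in host memory the packet’s data should land.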
U.S. Pat. No. 5,768,618 (“Erickson”) in view of Tanenbaum96
[1.4] [second processor running instructions to process a message packet such that] the TCP state information is updated by said second processor.
Erickson in view of Tanenbaum96 discloses the second processor running instructions to process a message packet such that the TCP state information is updated by the second processor. Specifically, Tanenbaum96 discloses:
…
Ex.1006, Tanenbaum96 at .584-.585. The “connection records” disclosed in Tanenbaum96 are used to maintain TCP state:
When an application on the client machine issues a CONNECT request, the local TCP entity creates a connection record, marks it as being in the SYN SENT state, and sends a SYN segment. Note that many connections may be open (or being opened) at the same time on behalf of multiple applications, so the state is per connection and recorded in the connection record.
Id. at .549. This connection record includes the connection information for the connection, which is often referred to as the TCB (transmission control block). See Section V.B.7.
Id. at .584. Erickson has an analogous disclosure (to, e.g., updating the TCP sequence number) in which it updates a 16-bit IP counter between packets.
Within the udpscript procedure described above, the nextid( ) function provides a monotonically increasing 16-bit counter required by the IP protocol.
Ex.1005, Erickson at 8:10-12. Accordingly, Erickson in view of Tanenbaum96 teaches the TCP state information is updated by said second processor (the connection record, which includes TCP state information, e.g., sequence number).
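The behavior of Erickson’s nextid() function — a monotonically increasing 16-bit counter — can be sketched as follows. This is my own illustration of such a counter with 16-bit wraparound; Erickson does not show the implementation in this form:

```python
# Illustrative sketch of a monotonically increasing 16-bit counter of the
# kind Erickson's nextid() provides for the IP protocol. The wraparound
# behavior at 2**16 is an assumption implied by the 16-bit width.

_ip_id = 0

def nextid():
    global _ip_id
    _ip_id = (_ip_id + 1) & 0xFFFF   # increment, constrained to 16 bits
    return _ip_id
```

Updating such a per-connection counter between packets is the same kind of state update that the second processor performs on the TCP sequence number.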
U.S. Pat. No. 5,768,618 (“Erickson”) in view of Tanenbaum96
[2.1] The device of claim 1, wherein said communication processing mechanism includes a receive sequencer with directions to classify said packet, wherein said packet contains control information corresponding to the stack of protocol layers.
Erickson in view of Tanenbaum96 discloses wherein said communication processing mechanism includes a receive sequencer with directions to classify said packet, wherein said packet contains control information corresponding to the stack of protocol layers. As an initial matter, note that Erickson discloses a slow path (Slow Application) and fast path (Fast Application) and thus teaches the concept of classifying packets for each path. See id. at Fig. 3. Erickson also teaches classifying packets by protocol type because each requires a different script. See id. at 5:41-51. In view of these disclosures, it would have been obvious to distinguish between fast and slow path processing using Tanenbaum96’s teachings, namely, its “header prediction.” See Section X.A. (discussing motivations to combine). First, Tanenbaum96 discloses a receive sequencer (as part of its “transport entity,” which is a processor executing instructions and that receives a sequence of packets for respective connections). Note that the “transport entity” of Tanenbaum96, consistent with the I/O device of Erickson, may reside on the network interface card. Ex.1006, Tanenbaum96 at .515-.516. Including this hardware in Erickson would, as Tanenbaum96 teaches below, perform the bypass test by, e.g., checking the sequence number as part of its “header prediction,” checking the connection record, and classifying the packet according to fast or slow path (Erickson Fast Application or Slow Application):
Ex.1006, Tanenbaum96 at .585. Note the “transport entity” (a processor executing instructions) of Tanenbaum96 performs the testing:
Id. at .583. Second, as shown above, Tanenbaum96’s transport entity uses “instructions,” that is, it has “directions to classify said packet.” This is consistent with Erickson’s teachings of scripts. Third, the TCP/IP packet contains control information corresponding to the stack of protocol layers:
Id. at .584. For example, control information in the TCP header includes a “sequence
number” that controls the packet’s placement within the application data and “port” fields that control the communication flow. Control information in the IP header includes a VER field (version of IP packet that controls its processing) and address fields that also control the communication flow. Accordingly, Erickson in view of Tanenbaum96 discloses that the communication processing mechanism includes a receive sequencer (Erickson’s I/O adapter using Tanenbaum96’s header prediction to classify packets for fast path processing) with directions (instructions) to classify said packet (fast versus slow path), wherein said packet contains control information corresponding to the stack of protocol layers (control information in TCP/IP headers).
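The fast-versus-slow classification described above can be sketched as follows. This is an illustrative Python sketch of Tanenbaum96-style header prediction; the field and state names are my own assumptions, not actual code from either reference:

```python
# Illustrative header-prediction bypass test: a handful of checks decides
# whether a packet takes the fast path or falls back to the slow path.

ESTABLISHED = "ESTABLISHED"
connection_records = {}   # keyed by (src ip, dst ip, src port, dst port)

def classify(pkt):
    rec = connection_records.get(
        (pkt["src_ip"], pkt["dst_ip"], pkt["src_port"], pkt["dst_port"]))
    if (rec is not None
            and rec["state"] == ESTABLISHED        # connection established
            and not pkt["syn"] and not pkt["fin"]  # no special flags set
            and pkt["seq"] == rec["expected_seq"]):  # expected sequence no.
        # Fast path: update the connection record's state.
        rec["expected_seq"] = (rec["expected_seq"] + len(pkt["data"])) & 0xFFFFFFFF
        return "fast"
    return "slow"   # full protocol processing
```

As Tanenbaum96 observes, these checks take just a handful of instructions, which is what makes the classification cheap enough to perform on every received packet.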
U.S. Pat. No. 5,768,618 (“Erickson”) in view of Tanenbaum96
[3.1] The device of claim 1, wherein said communication processing mechanism includes a receive sequencer with directions to generate a summary of a second message packet received from the network, said second packet containing control information corresponding to the stack of protocol layers, and said instructions including an instruction to compare said summary with said context.
Erickson in view of Tanenbaum96 discloses wherein said communication processing mechanism includes a receive sequencer with directions to generate a summary of a second message packet received from the network, said second packet containing control information corresponding to the stack of protocol layers, and said instructions including an instruction to compare said summary with said context. First, Erickson in view of Tanenbaum96 discloses that the communication processing mechanism includes a receive sequencer with directions to generate a summary of a second message packet received from the network. As noted in claim 2, it would have been obvious to combine Erickson with Tanenbaum96’s header prediction teachings. See Section X.A. (discussing motivations to combine). As noted in claim 2, Erickson in view of Tanenbaum96 discloses a receive sequencer with directions. Further, Tanenbaum96 discloses that this receive sequencer (the “transport entity” hardware and header prediction) may produce a summary (the IP addresses and port portions of the headers) of the incoming packets (and thus a “second packet”) and use a hash of the IP addresses to look up the context; the summary is then compared against the context to “verify that the correct record has been found”:
Ex.1006, Tanenbaum96 at .584-.585. Second, Erickson in view of Tanenbaum96 discloses that the second packet contains control information corresponding to the stack of protocol layers:
Id. at .584. For example, control information in the TCP header includes a “sequence number” that controls the packet’s placement within the application data and “port” fields that control the communication flow. Control information in the IP header includes a VER field (version of IP packet that controls its processing) and address fields that also control the communication flow. Third, Erickson in view of Tanenbaum96 discloses that the instructions include an instruction to compare said summary with said context. As shown above, Tanenbaum96 discloses using the summary (the IP addresses and ports) to compare against the context (connection record) to verify the correct record is found. This determines whether to take the fast or slow path. A POSA would have understood that the process of using the hash to fetch the connection record and comparing with the addresses and ports is performed by instructions (Tanenbaum refers to “instructions” above). A POSA would also understand that comparing a summary to a context, as required by this claim element, must involve steps of extracting the relevant
fields and performing operations equivalent to the steps disclosed by Tanenbaum96. Such a comparison would involve multiple operations (e.g., fetch, mask, compare, conditional branch) and would not typically be done by a single computer instruction. However, a POSA would have understood that the whole process could be performed by a single macro instruction that indicates failure or success and thus whether the packet is a proper candidate for fast path processing (for the I/O device protocol processing when applying this teaching to Erickson). Performing these operations with a sequence of instructions or a single macro instruction is a simple design choice that could be performed by a POSA with predictable results. Hence, Tanenbaum96 discloses an instruction to compare said summary with said context. Accordingly, Erickson in view of Tanenbaum96 discloses that the communication processing mechanism includes a receive sequencer (Erickson’s I/O adapter with Tanenbaum96’s header prediction) with directions (instructions) to generate a summary of a second message packet received from the network (hash of addresses), said second packet containing control information corresponding to the stack of protocol layers (TCP/IP control information in headers), and said instructions including an instruction to compare said summary with said context (the instruction indicating fast or slow path based on using the hash to fetch the connection record and perform the address and port comparisons).
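The summary-and-compare step described above can be sketched as follows. This is illustrative only; the table size and hash function are arbitrary choices standing in for Tanenbaum96’s “simple function” of the two IP addresses and two ports:

```python
# Illustrative summary generation and context comparison: hash the
# address/port summary to fetch a candidate record, then compare all
# four fields to verify the correct record has been found.

TABLE_SIZE = 64
table = [None] * TABLE_SIZE   # the hash table of connection records

def summary(pkt):
    # The "summary": both IP addresses and both ports from the headers.
    return (pkt["src_ip"], pkt["dst_ip"], pkt["src_port"], pkt["dst_port"])

def lookup_and_compare(pkt):
    s = summary(pkt)
    rec = table[hash(s) % TABLE_SIZE]   # "some simple function" as the key
    # The compare: all addresses and ports must match the stored context.
    if rec is not None and rec["key"] == s:
        return "fast"
    return "slow"
```

Whether this is implemented as the few discrete operations sketched here or as a single macro instruction is, as discussed above, a design choice with predictable results.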
U.S. Pat. No. 5,768,618 (“Erickson”) in view of Tanenbaum96
[4.1] The device of claim 1, wherein said instructions include a first instruction to create a header corresponding to said context and having control information corresponding to several of the protocol processing layers, and
Erickson in view of Tanenbaum96 discloses that the instructions include a first instruction to create a header corresponding to said context and having control information corresponding to several of the protocol processing layers. Specifically, Erickson discloses that the I/O device uses the scripts (instructions) to create a header:
Protocol scripts typically serve two functions. The first function is to describe the protocol the software application is using. This includes but is not limited to how to locate an application endpoint, and how to fill in a protocol header template from the application specific data buffer. The second function is to define a particular set of instructions to be performed based upon the protocol type. Each type of protocol will have its own script. Types of protocols include, but are not limited to, TCP/IP, UDP/IP, BYNET lightweight datagrams, deliberate shared memory, active message handler, SCSI, and File Channel.
Ex.1005, Erickson at 5:41-51.
FIG. 7 is a block diagram illustrating a UDP datagram template 702 (without a user data area) residing in the I/O device adapter's memory. The user process provides the starting address and the length for the user data in its virtual address space, and then "spanks" a GO register to trigger the I/O device adapter's execution of a predetermined script. The I/O device adapter stores the user data provided by the user process in the I/O device adapter's memory, and then transmits the completed UDP datagram 702 over the media.
Id. at 7:39-47.
As discussed in element [1.4], Erickson discloses updating control information, which, in the TCP/IP context, would include the sequence number. In light of these disclosures, it would have been obvious to create a TCP/IP header with, for example, an updated sequence number in view of Tanenbaum96’s teachings. See Section X.A. (discussing motivations to combine). Tanenbaum96 discloses creating such TCP/IP headers:
Ex.1006, Tanenbaum96 at .584. Note that the header includes control information:
Id. at .584. For example, control information in the TCP header includes a “sequence number” that controls the packet’s placement within the application data and “port” fields that control the communication flow. Control information in the IP header includes a VER field (version of IP packet that controls its processing) and address fields that also control the communication flow. A POSA would understand that the header creation could be done by a sequence of instructions or a single macro instruction (a first instruction). As noted above, Tanenbaum refers to “instructions.” Accordingly, Erickson in view of Tanenbaum96 discloses that the instructions (scripts) include a first instruction to create a header corresponding to said context (using header templates) and having control information corresponding to several of the protocol processing layers (control information in the TCP/IP headers).
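The header-creation step discussed above can be illustrated with a short sketch. The 20-byte field layout follows the standard TCP header (RFC 793, no options), and the fields populated here, ports, sequence number, and window, are the control information identified in the analysis; the function name, the fixed ACK flag, and the zero checksum are simplifying assumptions of the sketch:

```python
import struct

def build_tcp_header(src_port, dst_port, seq, ack, window):
    """Build a minimal 20-byte TCP header (RFC 793 layout, no options).
    The ports, sequence number, and window are the control information
    discussed above; a real stack would also compute the checksum over
    a pseudo-header, which is omitted here for brevity."""
    offset_flags = (5 << 12) | 0x10   # data offset = 5 words; ACK flag set
    checksum = 0                      # placeholder; see comment above
    urgent = 0
    return struct.pack("!HHIIHHHH", src_port, dst_port, seq, ack,
                       offset_flags, window, checksum, urgent)
```

Whether this is implemented as one macro instruction or a short sequence of instructions is an implementation choice, consistent with the point made above.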
U.S. Pat. No. 5,768,618 (“Erickson”) in view of Tanenbaum96

[4.2] said instructions include a second instruction to prepend said header to second data for transmission of a second packet.

Erickson in view of Tanenbaum96 discloses that the instructions include a second instruction to prepend said header to second data for transmission of a second packet. Under the broadest reasonable construction standard, “prepend” would have been understood to mean “adds to the front.” See Section VIII.C. First, Erickson discloses a script for filling in a protocol header template:
Protocol scripts typically serve two functions. The first function is to describe the protocol the software application is using. This includes but is not limited to how to locate an application endpoint, and how to fill in a protocol header template from the application specific data buffer. The second function is to define a particular set of instructions to be performed based upon the protocol type. Each type of protocol will have its own script. Types of protocols include, but are not limited to, TCP/IP, UDP/IP, BYNET lightweight datagrams, deliberate shared memory, active message handler, SCSI, and File Channel10.
Ex.1005, Erickson at 5:41-51.
FIG. 7 is a block diagram illustrating a UDP datagram template 702 (without a user data area) residing in the I/O device adapter's memory. The user process provides the starting address and the length for the user data in its virtual address space, and then "spanks" a GO register to trigger the I/O device adapter's execution of a predetermined script. The I/O device adapter stores the user data provided by the user process in the I/O device adapter's memory, and then transmits the completed UDP datagram 702 over the media.

10 This is most likely a typo in Erickson and should have said “Fibre Channel,” an industry-standard storage network.
Id. at 7:39-47. As noted above, it would have been obvious to create a TCP/IP header with, for example, an updated sequence number in view of Tanenbaum96’s teachings. See Section X.A. (discussing motivations to combine). Applying these teachings, the I/O device would prepend the TCP/IP header by placing the header at the front of buffer memory and filling in the application data behind it:
Ex.1006, Tanenbaum96 at .584.
It would have been obvious to add the headers to the front of data for transmission. There are at least two obvious approaches: prepending headers to data, or appending data to headers. Each approach is predictable and easy to implement; each simply requires adding either the data portion or the header portion to the other. A POSA would have been motivated to prepend the header because the data may already be residing in the I/O device, while the header requires calculating, for example, the next sequence number. Accordingly, after such calculations are performed, the headers can then be prepended onto the data portion. Moreover, the header portions are of a defined size, so it would have been understood as a simple implementation to reserve buffer space for the headers (in front of the data), and then prepend the headers onto the data in that buffer space. Finally, prepending was standard. See Section V.B.6. A POSA would also understand that the prepending could be done by a sequence of instructions or a single macro instruction (a second instruction). Accordingly, Erickson in view of Tanenbaum96 discloses that the instructions include a second instruction to prepend said header (via the Erickson header template) to second data for transmission of a second packet (the I/O device sending the second packet onto the network).
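The buffer-headroom approach described above, reserving space for the headers in front of the data and prepending the finished headers once per-packet fields are calculated, can be sketched as follows. The 54-byte reservation (Ethernet 14 + IP 20 + TCP 20 bytes, without options) and all names are illustrative assumptions for the sketch:

```python
def reserve_and_prepend(headers, data, header_room=54):
    """Place the data in a buffer that reserves a fixed-size region in
    front of it, then write the finished headers into that reserved
    space once per-packet values (e.g., the next sequence number) are
    known. Returns the completed frame: headers followed by data."""
    assert len(headers) <= header_room
    buf = bytearray(header_room + len(data))
    buf[header_room:] = data                 # data placed first
    start = header_room - len(headers)
    buf[start:header_room] = headers         # headers prepended afterward
    return bytes(buf[start:])                # completed frame
```

Note that the data is never copied again after placement; only the header bytes are written in front of it, which is the efficiency rationale given above.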
U.S. Pat. No. 5,768,618 (“Erickson”) in view of Tanenbaum96

[5.1] The device of claim 1, wherein said communication processing mechanism has a direct memory access unit to send, based upon said context, said data from said communication processing mechanism to the first apparatus memory,

Erickson in view of Tanenbaum96 discloses that the communication processing mechanism has a direct memory access unit to send, based upon said context, said data from said communication processing mechanism to the first apparatus memory, without a header accompanying said data. Erickson’s protocol scripts plus other context information (resident in registers 504 and 508, endpoint table 514, and endpoint protocol data 518) transfer data to the memory of the first apparatus (the memory of the host computer):
FIG. 5 is a block diagram illustrating the system organization between a main memory and an I/O device adapter memory which is compatible with the present invention. The main memory 502 implementation includes a hardware register 504 and a buffer pool 506. The I/O device adapter implementation includes a software register 508 and a physical address buffer map 510 in the adapter's memory 512. An endpoint table 514 in the memory 512 is used to organize multiple memory pages for individual user processes. Each entry within the endpoint table 514 points to various protocol data 518 in the memory 512 in order to accommodate multiple communication protocols, as well as previously defined protocol scripts 516 in the memory 512, which indicate how data or information is to be transferred from the memory 512 of the I/O device adapter to the portions of main memory 502 associated with a user process.
Ex.1005, Erickson at 5:53-67. Note that the user process, via the device driver, sets up registers 504 and 508, endpoint table 514, and endpoint protocol data 518 to create the context:
Typically, when a user process opens a device driver, the process specifies its type, which may include, but is not limited to, a UDP datagram, source port number, or register address. The user process also specifies either a synchronous or asynchronous connection. The device driver sets up the registers 508 and 504, endpoint table 514, and endpoint protocol data 518. The protocol script 516 is typically based upon the endpoint data type, and the endpoint protocol data 518 depends on protocol specific data.
Id. at 6:1-9. Erickson further depicts the I/O device receiving data and directly providing it to an application (via memory of the host computer) in Figure 4:
FIG. 4 is a block diagram describing a direct application interface (DAI) and routing of data between processes and an external data connection which is compatible with the present invention. Processes 402 and 404 transmit and receive information directly to and from an interconnect 410 (e.g., I/O device adapter) through the DAI interface 408. The information coming from the interconnect 410 is routed directly to a process 402 or 404 by use of virtual hardware and registers, rather than using a traditional operating system interface 406.
Id. at 5:6-14; see also id. at 4:53-5:5 and Fig. 3 (illustrating that I/O device 314 sends data to applications 302 and 304 that reside within the memory of the host computer). Similarly, Tanenbaum96 discloses that the “fast path … copies the data to the user.” Ex.1006, Tanenbaum96 at .585. The “user” refers to the application running on the host, i.e., the application using the protocol stack for communication. Erickson specifically describes a direct memory access (DMA) unit of the I/O device, which a POSA would have understood, or at least found obvious, to perform the function of directly sending data from the I/O device to the memory of the host computer:
The adapter would not want to be forced to access the user data twice over the I/O bus, once for the calculation performed by the udpchecksum() function, and a second time for transmission over the media. Instead, the adapter would most likely retrieve the needed user data from the user process' virtual address space using direct memory access (DMA) into the main memory over the bus and retrieving the user data into some portion of the adapter's memory, where it could be referenced more efficiently. The programming steps performed in the udpscript() procedure above might need to be changed to reflect that.
Id. at 8:27-37. Erickson also depicts, as it would be understood by a person having ordinary skill, using DMA to directly write from the I/O device to the host memory in Figure 5:
Id. at Fig. 5 (annotated). DMA (Direct Memory Access) is a hardware-based technique for transferring data between memory systems or between a host memory and an I/O device. See Section V.H.1. (explaining DMA). DMA enables hardware to access memory directly, without requiring processor involvement during the read or write process. See, e.g., Ex.1012, U.S. Pat. No. 4,831,523, at 9:2-7. Erickson discloses DMA, but only describes its use for transferring data from main memory to the adapter:
…the adapter would most likely retrieve the needed user data from the user process’ virtual address space using direct memory access (DMA) into the main memory over the bus and retrieving the user data into some portion of the adapter's memory, where it could be referenced more efficiently.
Ex.1005, Erickson at 8:30-35. A POSA would understand that typical DMA engines can be used for both reading and writing data, and that it would also be beneficial to use DMA to send data from the adapter memory to main memory. The use of DMA would allow data movement without consuming processor cycles on either the application processor or the adapter processor running scripts. Thus, Erickson, along with the knowledge of a POSA, discloses a direct memory access unit to send… data from said communication processing mechanism (the adapter) to the first apparatus memory (main memory). See also Section V.H.1. Further, Erickson employs context to receive packets, process the packets, and transfer the data to the host computer memory:
FIG. 5 is a block diagram illustrating the system organization between a main memory and an I/O device adapter memory which is compatible with the present invention. The main memory 502 implementation includes a hardware register 504 and a buffer pool 506. The I/O device adapter implementation includes a software register 508 and a physical address buffer map 510 in the adapter's memory 512. An endpoint table 514 in the memory 512 is used to organize multiple memory pages for individual user processes. Each entry within the endpoint table 514 points to various protocol data 518 in the memory 512 in order to accommodate multiple communication protocols, as well as previously defined protocol scripts 516 in the memory 512, which indicate how data or information is to be transferred from the memory 512 of the I/O device adapter to the portions of main memory 502 associated with a user process.
Id. at 5:53-67.
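The bidirectional use of DMA discussed above, reading user data from main memory for transmission and, in reverse, writing received data into main memory, can be illustrated with a toy model. The class and method names are my own illustrative assumptions; actual DMA hardware operates on physical addresses and descriptor rings rather than Python objects, and the point of the model is only that the processor does not move the bytes itself:

```python
class DmaEngine:
    """Toy model of a bidirectional DMA unit. The processor only supplies
    a descriptor (buffer, offset, length); the engine moves the bytes,
    illustrating why no processor cycles are consumed per transferred byte."""
    def __init__(self, host_memory: bytearray):
        self.host_memory = host_memory   # stands in for main memory 502

    def read_from_host(self, offset: int, length: int) -> bytes:
        # Transmit direction (disclosed in Erickson): pull user data
        # from main memory into the adapter.
        return bytes(self.host_memory[offset:offset + length])

    def write_to_host(self, data: bytes, offset: int) -> None:
        # Receive direction (obvious to a POSA): place received payload
        # directly into main memory at the designated address.
        self.host_memory[offset:offset + len(data)] = data
```

The symmetry between the two methods reflects the observation above that typical DMA engines read and write with the same mechanism.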
As to where exactly Erickson writes the data to the host memory: with respect to the I/O device sending data, Erickson discloses using pointers to indicate where data is in the host memory so that the I/O device can directly retrieve the data from the host memory, add headers, and send the data. A POSA would understand that, in reverse (when receiving data), the I/O device would use a pointer to instruct the DMA to directly write the data to the host memory at the pointer’s address. See id. at 6:1-41 (describing pointer STARTADDRESS, which the I/O device stores in its memory, as the pointer to the data for the I/O device to retrieve, packetize, and transmit). Accordingly, Erickson in view of Tanenbaum96 discloses that the communication processing mechanism has a direct memory access unit (Erickson’s DMA) to send, based upon said context (e.g., information in registers 508, endpoint table 514, protocol data 518, and obvious pointers to user process memory space), said data (received packets) from said communication processing mechanism to the first apparatus memory (main memory).
U.S. Pat. No. 5,768,618 (“Erickson”) in view of Tanenbaum96 [5.2] without a header accompanying said data.
Erickson in view of Tanenbaum96 teaches without a header accompanying said data. Note that because the I/O device is performing the TCP/IP processing, it strips the TCP, IP, and MAC headers from the data before transferring the data to main memory at the virtual address of the user process. See Section V.B. (describing the TCP/IP encapsulation process; upon receiving data, the layer processing works in reverse). Accordingly, to the extent that Erickson does not expressly disclose stripping off these headers, it would be obvious, as the entire point of offloading this processing is so that the host does not perform these functions (moreover, the user application is expecting only data, not data with headers, because it only receives data after protocol processing). Removing the headers before sending the data to host memory would also be obvious in view of Tanenbaum96. Tanenbaum96 describes the fast path “cop[ying] the data to the user,” i.e., the “data” and not the header (the data portion of the packets). Ex.1006, Tanenbaum96 at .567. Recall that the transport entity is performing these functions. Id. at .565-.567. And recall that the “transport entity” of Tanenbaum96, consistent with the I/O device of Erickson, may reside on the network interface card. Id. at .497-.498. Accordingly, in view of Tanenbaum96, the Erickson I/O device would copy “the data to the user,” i.e., would copy only the data to the host memory. Here, the I/O device is offloading this function. Accordingly, Erickson in view of Tanenbaum96 teaches providing the data to the host without a header accompanying said data.
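The receive-side stripping described above, discarding the MAC (Ethernet), IP, and TCP headers so that only application data reaches the user's buffer, can be sketched as follows. The fixed header sizes (14 + 20 + 20 bytes, i.e., no IP or TCP options) and the function name are illustrative assumptions of the sketch:

```python
# Fixed header sizes assumed for the sketch (no IP or TCP options).
ETH_HLEN, IP_HLEN, TCP_HLEN = 14, 20, 20

def strip_headers(frame):
    """Receive-side counterpart of encapsulation: discard the Ethernet,
    IP, and TCP headers of an incoming frame and return only the
    application payload, which is what would be written to the user's
    buffer in host memory."""
    return frame[ETH_HLEN + IP_HLEN + TCP_HLEN:]
```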
U.S. Pat. No. 5,768,618 (“Erickson”) in view of Tanenbaum96

[6.1] The device of claim 1, wherein said context includes a receive window of space in the memory that is available to store application data, and said communication processing mechanism advertises said receive window.

Erickson in view of Tanenbaum96 discloses that the context includes a receive window of space in the memory that is available to store application data, and that the communication processing mechanism advertises said receive window. As noted, it would be obvious to combine Erickson with Tanenbaum96’s TCP/IP teachings to effectuate a TCP/IP connection with Erickson’s I/O device. See Section X.A. (describing motivations to combine). TCP inherently includes interfaces in which a communication processing mechanism advertises said receive window because the use of the receive window is required for systems to communicate using TCP. See Section V.B.9. (describing advertising a receive window). The TCP/IP headers, as Tanenbaum96 teaches, include the “Window size” field. This field is part of the context because it is in the header template used by the I/O device to create headers. The Window size field is part of the context that Erickson pre-negotiates in view of Tanenbaum96:
Ex.1006, Tanenbaum96 at .584 (annotated).
U.S. Pat. No. 5,768,618 (“Erickson”) in view of Tanenbaum96 [6.1] The device of claim 1, wherein said context includes a receive window of space in the memory that is available to store application data, and said communication processing mechanism advertises said receive window. The window size is how much space in the memory (e.g., a buffer) that is available to store application data (i.e., incoming application data), and said communication processing mechanism advertises the receive window by including it (and dynamically adjusting it) in each packet:
Id. at .554-.555. The receive window of space in the memory that is available to store application data is the receiver’s buffer, as described above. Combining Tanenbaum96 with Erickson, the receive buffer would be located in Erickson’s main memory.
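The dynamic window advertisement described above can be sketched as follows. The computation (free receive-buffer space, clamped to what fits in the 16-bit Window size field) reflects standard TCP behavior; the function and parameter names are illustrative assumptions:

```python
def advertised_window(buffer_size, bytes_queued):
    """Compute the value to advertise in the TCP Window size field:
    the free space remaining in the receive buffer, so the sender
    never overruns the receiver. Clamped to the 16-bit field width
    (window scaling options are ignored in this sketch)."""
    free = buffer_size - bytes_queued
    return max(0, min(free, 0xFFFF))
```

As the receiving application consumes queued data, the free space grows and each subsequent packet advertises a larger window, which is the dynamic adjustment noted above.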
Accordingly, Erickson in view of Tanenbaum96 discloses wherein said context includes a receive window of space in the memory that is available to store application data (window size field of the TCP/IP packet), and said communication processing mechanism advertises said receive window (via sending the packets).
U.S. Pat. No. 5,768,618 (“Erickson”) in view of Tanenbaum96

[7.1] The device of claim 1, wherein said context includes TCP ports of said first and said second apparatuses.

Erickson in view of Tanenbaum96 discloses that the context includes TCP ports of said first and said second apparatuses. Specifically, Erickson discloses pre-negotiating ports for datagrams to the I/O device as part of the connection setup:
In this example, the user process and the device driver has pre-negotiated the following fields from FIG. 6: (1) Ethernet Header 604 (Target Ethernet Address, Source Ethernet Address, and Protocol Type); (2) IP Header 606 (Version, IP header Length, Service Type, Flag, Fragment Offset, Time_to_Live, IP Protocol, IP Address of Source, and IP Address of Destination); and (3) UDP Header 608 (Source Port and Destination Port). Only the shaded fields in FIG. 6, and the user data 610, need to be changed on a per-datagram basis.
Ex.1005, Erickson at 6:63-7:4; see also id. at Fig. 6. As noted, it would be obvious to combine Erickson with Tanenbaum96’s teachings for a TCP/IP connection. See Section X.A. (describing motivations to combine). A TCP packet includes a TCP source port and destination port number, and thus Erickson’s pre-negotiation for a TCP/IP connection, in view of Tanenbaum96, would include creating these values as part of the context (as the I/O device must use them to create headers):
Ex.1006, Tanenbaum96 at .544. Accordingly, Erickson in view of Tanenbaum96 discloses wherein said context includes TCP ports of said first and said second apparatuses.
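The pre-negotiated context discussed above, extended from Erickson's UDP example to a TCP/IP connection, can be sketched as a per-connection record in which the connection-identifying fields (including the TCP ports of both apparatuses) are fixed at setup and only per-packet fields change afterward. The structure and field names are illustrative assumptions, not Erickson's data layout:

```python
from dataclasses import dataclass

@dataclass
class TcpContext:
    """Hypothetical per-connection context: the four-tuple identifying
    the TCP connection is pre-negotiated at setup; only per-packet
    state (e.g., the next sequence number) is updated thereafter."""
    src_ip: str
    dst_ip: str
    src_port: int      # TCP port of the first apparatus
    dst_port: int      # TCP port of the second apparatus
    next_seq: int = 0  # per-packet state, updated on each send
```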