UNITED STATES PATENT AND TRADEMARK OFFICE
______________________
BEFORE THE PATENT TRIAL AND APPEAL BOARD
______________________
INTEL CORPORATION, Petitioner
v.
ALACRITECH, INC., Patent Owner
________________________
Case IPR No. Unassigned
U.S. Patent No. 7,237,036
Title: FAST-PATH APPARATUS FOR RECEIVING DATA CORRESPONDING A TCP CONNECTION
________________________
Declaration of Robert Horst, Ph.D. in Support of Petition for Inter Partes Review
of U.S. Patent No. 7,237,036
INTEL Ex.1003.001
Petition for Inter Partes Review of 7,237,036 Ex. 1003 (“Horst Decl.”)
TABLE OF CONTENTS Page
I. INTRODUCTION AND QUALIFICATIONS .......................................... 1
II. MATERIALS RELIED ON IN FORMING MY OPINION ..................... 3
III. UNDERSTANDING OF THE GOVERNING LAW ................................ 4
A. Invalidity by Anticipation ..................................................................... 4
B. Invalidity by Obviousness ..................................................................... 5
IV. LEVEL OF ORDINARY SKILL IN THE ART ........................................ 6
V. STATE OF THE ART AND OVERVIEW OF TECHNOLOGY AT ISSUE ................................................................................................... 7
A. Layered Network Protocols ................................................................... 8
1. OSI Layers .................................................................................. 8
2. TCP/IP Layers ............................................................................. 8
B. TCP/IP ................................................................................................. 10
1. Encapsulation ............................................................................ 11
2. Ethernet Header ......................................................................... 12
3. IP Header ................................................................................... 14
4. TCP header ................................................................................ 15
5. RFC 793 – TCP Specification................................................... 16
6. Prepending Headers .................................................................. 19
7. TCP Control Block (TCB) ........................................................ 20
8. Segmentation ............................................................................. 23
9. Advertising a Receive Window ................................................ 24
C. Protocol Offload .................................................................................. 25
1. RFC 647 – Front-Ending .......................................................... 25
2. RFC 929 – Outboard Processing .............................................. 26
3. Mediation Levels ....................................................................... 28
D. Offloaded Protocols ............................................................................. 31
1. OSI Protocol Offload ................................................................ 31
2. TCP/IP Protocol Offload ........................................................... 31
3. VMTP and XTP Protocol Offload ............................................ 31
4. Multi-Protocol Offload ............................................................. 31
E. Portions of the Protocol Offloaded ..................................................... 32
1. Checksum Offload .................................................................... 32
2. Full Offload ............................................................................... 33
3. Multi-Level Offload .................................................................. 34
4. Header Prediction ...................................................................... 34
F. Offload Implementation ...................................................................... 37
1. Multiprocessor Offload ............................................................. 37
2. Offload Adapters based on Microprocessors ............................ 39
3. Offload Adapters based on Custom Processors or Custom Logic ......................................................................................... 40
G. Protocol Offload Summary ................................................................. 43
H. Additional Background Technology ................................................... 44
1. DMA ......................................................................................... 44
2. Virtual and Physical Memory Addresses .................................. 46
VI. OVERVIEW OF 036 PATENT ............................................................... 47
VII. 036 PATENT PROSECUTION HISTORY ............................................. 50
VIII. CLAIM CONSTRUCTIONS ................................................................... 52
A. Legal Standard ..................................................................................... 52
B. “context” .............................................................................................. 52
C. “prepend” ............................................................................................. 53
IX. THE PRIOR ART ..................................................................................... 54
A. Tanenbaum96: A. Tanenbaum, Computer Networks, 3rd ed. (1996) ...................................................................................... 54
B. U.S. Patent No. 5,768,618 (“Erickson”) ............................................. 61
X. OBVIOUSNESS COMBINATIONS – MOTIVATIONS TO COMBINE ......................... 73
A. Erickson in Combination with Tanenbaum96 .................................... 73
XI. GROUNDS OF INVALIDITY ................................................................ 80
I, Robert Horst, hereby declare as follows:
I. INTRODUCTION AND QUALIFICATIONS
1. My name is Robert Horst. I have been retained on behalf of Petitioner
Intel Corporation (“Intel”) to provide this Declaration concerning technical subject
matter relevant to the petition for inter partes review (“Petition”) concerning U.S.
Patent No. 7,237,036 (Ex.1001, the “036 Patent”). I reserve the right to
supplement this Declaration in response to additional evidence that may come to
light.
2. I am over 18 years of age. I have personal knowledge of the facts
stated in this Declaration and could testify competently to them if asked to do so.
3. My compensation is not based on the resolution of this matter. My
findings are based on my education, experience, and background in the fields
discussed below.
4. I am an independent consultant with more than 30 years of expertise
in the design and architecture of computer systems. My current curriculum vitae is
submitted as Exhibit 1004 and some highlights follow.
5. Currently, I am an independent consultant at HT Consulting where my
work includes consulting on technology and intellectual property. I have testified
as an expert witness and consultant in patent and intellectual property litigation as
well as inter partes reviews and re-examination proceedings.
6. I earned my M.S. (1978) in electrical engineering and Ph.D. (1991) in
computer science from the University of Illinois at Urbana-Champaign after
earning my B.S. (1975) in electrical engineering from Bradley University. During
my master’s program, I designed, constructed and debugged a shared memory
parallel microprocessor system. During my doctoral program, I designed and
simulated a massively parallel, multi-threaded task flow computer.
7. After receiving my bachelor’s degree and while pursuing my master’s
degree, I worked for Hewlett-Packard Co. While at Hewlett-Packard, I designed
the micro-sequencer and cache of the HP3000 Series 64 processor. From 1980 to
1999, I worked at Tandem Computers, which was acquired by Compaq Computers
in 1997. While at Tandem, I was a designer and architect of several generations of
fault-tolerant computer systems and was the principal architect of the NonStop
Cyclone superscalar processor. The system development work at Tandem also
included development of the ServerNet System Area Network and applications of
this network to fault tolerant systems and clusters of database servers.
8. Since leaving Compaq in 1999, I have worked with several
technology companies, including 3Ware, Network Appliance, Tibion, and AlterG
in the areas of network-attached storage and biomedical devices. From 2012 to
2015, I was Chief Technology Officer of Robotics at AlterG, Inc., where I worked
on the design of anti-gravity treadmills and battery-powered orthotic devices to
assist those with impaired mobility.
9. In 2001, I was elected an IEEE Fellow “for contributions to the
architecture and design of fault tolerant systems and networks.” I have authored
over 30 publications, have worked with patent attorneys on numerous patent
applications, and I am a named inventor on 80 issued U.S. patents.
10. My patents include those directed to networks (e.g., U.S. Pat. No.
6,157,967: Method of data communication flow control in a data processing
system using busy/ready commands), storage (e.g., U.S. Pat. No. 6,549,977: Use of
deferred write completion interrupts to increase the performance of disk
operations), and multi-processor systems (e.g., U.S. Pat. No. 5,751,932: Fail-fast,
fail-functional, fault-tolerant multiprocessor system). My publications include a
conference paper that examined the performance and efficacy of protocol offload
engines. Ex.1004.
11. My Curriculum Vitae, which is filed as a separate Exhibit (Ex.1004),
contains further details on my education, experience, publications, and other
qualifications to render this opinion as an expert.
II. MATERIALS RELIED ON IN FORMING MY OPINION
12. In addition to reviewing U.S. Patent No. 7,237,036 (Ex.1001), I also
reviewed and considered the prosecution history of the 036 Patent (Ex.1002). I
also reviewed U.S. Pat. No. 5,768,618, to Erickson (Ex.1005), and A. Tanenbaum,
Computer Networks, 3rd ed. (1996) (Ex.1006). I also considered the background
materials cited herein.
III. UNDERSTANDING OF THE GOVERNING LAW
13. I understand that a patent claim is invalid if it is anticipated or
rendered obvious in view of the prior art. I further understand that invalidity of a
patent claim requires that the claim be anticipated or obvious from the perspective
of a person of ordinary skill in the relevant art at the time the invention was made.
A. Invalidity by Anticipation
14. I have been informed that a patent claim is invalid as anticipated
under 35 U.S.C. § 102 if each and every element of a claim, as properly construed,
is found either explicitly or inherently in a single prior art reference.
15. I have been informed that a claim is invalid under 35 U.S.C. § 102(a)
if the claimed invention was patented or published anywhere before the applicant's
invention. I further have been informed that a claim is invalid under 35 U.S.C. §
102(b) if the invention was patented or published anywhere more than one year
prior to the first effective filing date of the patent application (critical date). I
further have been informed that a claim is invalid under 35 U.S.C. § 102(e) if an
invention described by that claim was disclosed in a U.S. patent granted on an
application for a patent by another that was filed in the U.S. before the date of
invention for such a claim.
B. Invalidity by Obviousness
16. I have been informed that a patent claim is invalid as obvious under
35 U.S.C. § 103 if it would have been obvious to a person of ordinary skill in the
art, taking into account (1) the scope and content of the prior art, (2) the differences
between the prior art and the claims, (3) the level of ordinary skill in the art, and
(4) any so-called “secondary considerations” of non-obviousness, which include:
(i) “long felt need” for the claimed invention, (ii) commercial success attributable
to the claimed invention, (iii) unexpected results of the claimed invention, and (iv)
“copying” of the claimed invention by others. I further understand that it is
improper to rely on hindsight in making the obviousness determination. I have
been informed that Alacritech claims a filing priority date no later than October 14,
1997 for claims 1-7 of the 036 Patent. Accordingly, my analysis of the prior art for
the claims of the 036 Patent is based on the prior art and knowledge of a person
having ordinary skill in the art (“POSA”) as of October 14, 1997.
17. I have been informed that a claim can be obvious in light of a single
prior art reference or multiple prior art references. I further understand that
exemplary rationales that may support a conclusion of obviousness include:
(A) Combining prior art elements according to known methods to yield
predictable results;
(B) Simple substitution of one known element for another to obtain
predictable results;
(C) Use of known technique to improve similar devices (methods, or
products) in the same way;
(D) Applying a known technique to a known device (method, or product)
ready for improvement to yield predictable results;
(E) “Obvious to try” - choosing from a finite number of identified,
predictable solutions, with a reasonable expectation of success;
(F) Known work in one field of endeavor may prompt variations of it for use
in either the same field or a different one based on design incentives or other
market forces if the variations are predictable to one of ordinary skill in the
art;
(G) Some teaching, suggestion, or motivation in the prior art that would
have led one of ordinary skill to modify the prior art reference or to combine
prior art reference teachings to arrive at the claimed invention.
IV. LEVEL OF ORDINARY SKILL IN THE ART
18. I have been informed that factors that may be considered in
determining the level of ordinary skill in the art may include: (A) “type of
problems encountered in the art;” (B) “prior art solutions to those problems;” (C)
“rapidity with which innovations are made;” (D) “sophistication of the
technology;” and (E) “educational level of active workers in the field.” I also
understand that every factor may not be present in a given case, and one or more
factors may predominate. Here, the 036 Patent is directed to an apparatus and
methods for network protocol offload. In my experience, systems such as those
capable of protocol offload are not designed by a single person but instead require
a design team with wide-ranging skills and experience, including computer
architecture, network design, software development, and hardware development.
Moreover, the design team typically would have comprised individuals with
advanced degrees and some industry experience, or significant industry experience.
19. Accordingly, and while it would be rare to find all of these skills in a
single individual, it is my opinion that a person of ordinary skill in the art
(“POSA”) is a person with at least the equivalent of a B.S. degree in computer
science, computer engineering or electrical engineering with at least five years of
industry experience including experience in computer architecture, network design,
network protocols, software development, and hardware development.
20. The statements that I make in this declaration when I refer to a POSA
are from the perspective of October 14, 1997.
V. STATE OF THE ART AND OVERVIEW OF TECHNOLOGY AT ISSUE
21. In this section, I provide an overview of the technology at issue and
illustrate the state of the art.
A. Layered Network Protocols
22. The primary goal of computer networking is to provide fast, reliable
data communications between computer systems. Interoperability has been
accomplished through adherence to standards, and performance has steadily
increased through new technology and optimizations of hardware and software.
1. OSI Layers
23. Computer networking standards provide inter-system communications
across a wide range of hardware and software implementations. The seven-layer
OSI model describes a logical layering including physical, data link, network,
transport, session, presentation and application as illustrated below.
2. TCP/IP Layers
24. The TCP/IP layering is slightly different and corresponds more
closely to the way the networking code is typically partitioned in some popular
Unix variants. TCP/IP layers include physical (e.g. 100baseT, 1000baseT), data
link1 (e.g. IEEE 802 Ethernet, ATM, Token Ring), Internet (e.g. IPv4, IPv6),
transport (e.g. TCP, UDP, VMTP, XTP), and Application (e.g. FTP, SMTP,
Telnet, HTTP). The following figure shows the relationship between the OSI and
TCP/IP layering.
Available at http://mitigationlog.com/how-tcpip-and-reference-osi-model-works/.2
25. At a conceptual level, each layer is responsible only for its respective
functions. This enables, for example, hiding the complexity of the physical data
connection (that is, actually transmitting the data onto the physical wires) from
the layers above the physical, data link, and network layers. Likewise, the lower
layers must transmit the data on the physical wires, but need not worry about
what application the data belongs to or how the user data has been partitioned into
individual packets.
1 References on TCP/IP use different terminology to describe the layer under IP.
The data link layer is also called the “host-to-network layer” in Tanenbaum96 and
the “interface layer” in Stevens2 (see below for description of these references).
Some Alacritech patents use “data link layer,” “link layer” and “MAC layer.” Prior
art references use many of these terms and also sometimes use the name of a
specific implementation (e.g. Ethernet, ATM).
2 It appears that this diagram was made in 2012. It is being used for illustrative
purposes only.
B. TCP/IP
26. By the mid 1990s, TCP/IP was a firmly entrenched standard and was
a widespread networking protocol to, for example, access the Internet and World
Wide Web. By that time, detailed descriptions of the protocols and open-source
implementations were widely available from books, technical papers, and code
repositories. Standard reference books on TCP/IP included Stevens1 (Ex.1008),
Stevens2 (Ex.1013), and Tanenbaum96 (Ex.1006), all of which were widely cited
and relied upon.3 A series of technical memos called RFCs (Requests for Comments)
document the progression of design concepts of the Internet. A few of the key
RFCs are quoted below to establish when certain concepts were proposed and
documented.
3 These books were well known resources to a POSA. Consistent with that,
Alacritech patents cite editions of the Tanenbaum and Stevens books.
1. Encapsulation
27. Network layering corresponds to the encapsulation of higher levels by
lower levels. The following figure shows an example with application data
accompanied by an application header. The application header-data combination
becomes the application data of a TCP segment. The TCP segment containing the
application header-data combination along with the IP header forms an IP
datagram. The IP datagram along with an appropriate MAC (media access control)
layer header forms the frame that is sent over the physical interconnect. The
diagram below shows an example of such encapsulation where the MAC layer is
Ethernet. Some software implementations implement the layers separately with
data, or pointers to data, passed between the software modules for each layer. In
this case, one module creates the user data and application header, another module
then encapsulates that with a TCP header, etc. The processing occurs sequentially,
from top to bottom, as shown below.
Ex.1008, Stevens1 at .034.
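The top-to-bottom encapsulation in the figure can be sketched in code. The Python fragment below is a simplified illustration rather than a working protocol stack: it prepends placeholder TCP, IP, and Ethernet headers to application data in the same order, and all field values (ports, addresses, checksums) are assumptions chosen for demonstration only.

```python
import struct

def encapsulate(app_data: bytes) -> bytes:
    """Illustrative sketch: wrap application data in TCP, IP, and
    Ethernet headers, mirroring the layering in the figure above."""
    tcp_header = struct.pack("!HHIIHHHH",
                             1024, 80,      # source port, destination port
                             0, 0,          # sequence, acknowledgment numbers
                             5 << 12,       # data offset (5 words, no options)
                             8192,          # advertised window
                             0, 0)          # checksum (unfilled), urgent pointer
    segment = tcp_header + app_data                       # TCP segment
    ip_header = struct.pack("!BBHHHBBH4s4s",
                            0x45, 0, 20 + len(segment),   # version/IHL, TOS, total length
                            0, 0, 64, 6, 0,               # id, flags/frag, TTL, proto=TCP, checksum
                            bytes(4), bytes(4))           # source, destination IP (placeholders)
    datagram = ip_header + segment                        # IP datagram
    eth_header = bytes(6) + bytes(6) + struct.pack("!H", 0x0800)  # dst MAC, src MAC, type=IPv4
    return eth_header + datagram                          # Ethernet frame

frame = encapsulate(b"GET / HTTP/1.0\r\n\r\n")
```

Note that each layer's module touches only its own header, consistent with the sequential, top-to-bottom processing described above.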
2. Ethernet Header
28. The 14-byte Ethernet header includes 48-bit (6 byte) source and
destination MAC (media access control) addresses for uniquely identifying the
network adapters at each end of the link.
Ex.1013, Stevens2 at .125.
29. The MAC address can be determined by a routing table in the
protocol stack. In an Ethernet-based network, the 48-bit MAC address corresponds
to a physical interface, such as a network interface card (NIC) or WiFi modem in a
server or router. The MAC address field of the destination in the Ethernet header
determines the next hop along the route to the destination. At each router along the
path, the MAC address field is changed to the MAC address of the next router. The
final router changes the MAC address field to the MAC address of the destination.
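Pulling the fields out of the 14-byte Ethernet header described above can be sketched in a few lines of Python; the frame bytes and the source address below are made up purely for demonstration.

```python
import struct

def parse_ethernet_header(frame: bytes):
    """Return (destination MAC, source MAC, EtherType) from a frame."""
    dst_mac, src_mac = frame[0:6], frame[6:12]
    (ethertype,) = struct.unpack("!H", frame[12:14])
    as_hex = lambda mac: ":".join(f"{b:02x}" for b in mac)
    return as_hex(dst_mac), as_hex(src_mac), ethertype

# broadcast destination, a made-up source address, EtherType 0x0800 (IPv4)
frame = bytes.fromhex("ffffffffffff" "001122334455" "0800")
dst, src, ethertype = parse_ethernet_header(frame)
```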
3. IP Header
Ex.1008, Stevens1 at .058.
30. An IP header is illustrated by the figure above from Stevens1. The IP
header includes source and destination IP addresses for identifying the end points
of the connection. The 32-bit IPv4 addresses are usually expressed in dotted
decimal notation. For example, an IP address of Google.com is 216.58.216.46.
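The dotted decimal rendering of a 32-bit address can be illustrated with Python's standard struct and socket modules, using the Google.com address from the example above.

```python
import socket
import struct

# Pack the four octets of 216.58.216.46 into one 32-bit big-endian value.
packed = struct.pack("!I", (216 << 24) | (58 << 16) | (216 << 8) | 46)
dotted = socket.inet_ntoa(packed)
# equivalently, each of the four bytes printed as a decimal number:
manual = ".".join(str(b) for b in packed)
```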
4. TCP header
Ex.1008, Stevens1 at .249.
31. A TCP header is illustrated by the figure above from Stevens1. The
TCP header includes 16-bit source and destination port numbers for identifying the
processes that are communicating. TCP is used to establish connections between
processes at IP addresses across the network and the TCP port numbers identify
which processes are communicating. For instance, Email may use SMTP (simple
mail transfer protocol) on port 25 (SMTP’s well-known port number) while a web
server is using HTTP on port 80 (HTTP’s well-known port number). The TCP
layer performs several important functions such as ensuring that the segments are
assembled in the proper order. As shown above, a “sequence number” is included
for several reasons such as identifying segments and performing reassembly. For
more information on TCP, see Stevens1 (Ex.1008) Chapter 17, “TCP:
Transmission Control Protocol,” pp. 223-228. A sender or receiver maintains the
“sequence number” as a variable for these purposes. Accordingly, routing packets
between source and destination processes over Ethernet is based on the MAC
addresses, IP addresses and TCP ports.
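Extracting the demultiplexing fields (ports and sequence number) from a TCP header of the form shown in the Stevens1 figure can be sketched as follows; the port and sequence values here are hypothetical.

```python
import struct

def parse_tcp_header(header: bytes):
    """Return (source port, destination port, sequence, acknowledgment)
    from the first 12 bytes of a TCP header."""
    src_port, dst_port, seq, ack = struct.unpack("!HHII", header[:12])
    return src_port, dst_port, seq, ack

# an ephemeral client port (49152) talking to a web server on port 80
header = struct.pack("!HHIIHHHH", 49152, 80, 1000, 0, 5 << 12, 8192, 0, 0)
src_port, dst_port, seq, ack = parse_tcp_header(header)
```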
5. RFC 793 – TCP Specification
32. The original TCP specification was published in RFC 793 (Ex.1007)
in September 1981. RFC 793 is a full specification for TCP and shows, among
many other things, that identifying a TCP connection by its source and destination
IP addresses and TCP ports were known more than 15 years before the earliest
priority dates of the Alacritech patents.
a) Sockets
33. The combination of an IP address and a port number is sometimes
called a “socket.” A TCP connection is formed by a pair of sockets which includes
a source IP address and TCP port number and a destination IP address and TCP
port number. IP addresses and TCP ports can be specified by the application. For
instance, a browser accessing Google.com may open a socket to IP address
216.58.216.46 and port 80:
The combination of an IP address and a port number is sometimes
called a socket. This term appeared in the original TCP specification
(RFC 793), and later it also became used as the name of the Berkeley-
derived programming interface (Section 1.15). It is the socket pair (the
4-tuple consisting of the client IP address, client port number, server
IP address, and server port number) that specifies the two end points
that uniquely identifies each TCP connection in an internet.
Ex.1008, Stevens1 at .250.
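The role of the socket pair can be illustrated with a toy connection table keyed by the 4-tuple that, as the passage above states, uniquely identifies each TCP connection; the addresses and state value below are hypothetical.

```python
# Connection table keyed by (client IP, client port, server IP, server port).
connections = {}

def lookup(client_ip, client_port, server_ip, server_port):
    """Demultiplex: find the connection matching this socket pair, if any."""
    return connections.get((client_ip, client_port, server_ip, server_port))

# a hypothetical established connection from a client to port 80 at Google.com
connections[("10.0.0.5", 49152, "216.58.216.46", 80)] = "ESTABLISHED"
```

A segment whose 4-tuple matches no table entry belongs to no known connection, which is exactly the lookup a receiver performs on each arriving segment.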
34. Much of the software for network communications leverages standard
application programming interfaces (APIs) and libraries. An early standard is
Berkeley Sockets, also known as BSD Sockets or just “sockets.” Tanenbaum96
offers this overview of sockets:
Let us now briefly inspect another set of transport primitives, the
socket primitives used in Berkeley UNIX for TCP. They are listed in
Fig. 6-6. Roughly speaking, they follow the model of our first
example but offer more features and flexibility. We will not look at
the corresponding TPDUs here. That discussion will have to wait until
we study TCP later in this chapter.
The first four primitives in the list [SOCKET, BIND, LISTEN,
ACCEPT] are executed in that order by servers. The SOCKET
primitive creates a new end point and allocates table space for it
within the transport entity. The parameters of the call specify the
addressing format to be used, the type of service desired (e.g., reliable
byte stream), and the protocol.
Ex.1006, Tanenbaum96 at .504-.505.
Now let us look at the client side. Here, too, a socket must first be
created using the SOCKET primitive, but BIND is not required since
the address used does not matter to the server. The CONNECT
primitive blocks the caller and actively starts the connection process.
When it completes (i.e., when the appropriate TPDU is received from
the server), the client process is unblocked and the connection is
established. Both sides can now use SEND and RECEIVE to transmit
and receive data over the full-duplex connection.
Id. at .505.
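The primitive sequence Tanenbaum96 describes maps directly onto the Berkeley sockets API. A minimal loopback sketch, using Python's standard socket module as a stand-in for the C primitives (the echo behavior and port choice are assumptions for illustration):

```python
import socket
import threading

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)  # SOCKET
server.bind(("127.0.0.1", 0))                               # BIND (any free port)
server.listen(1)                                            # LISTEN
port = server.getsockname()[1]

def serve():
    conn, _ = server.accept()        # ACCEPT (blocks until a client connects)
    conn.sendall(conn.recv(1024))    # RECEIVE then SEND (echo the data back)
    conn.close()

t = threading.Thread(target=serve)
t.start()

client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)  # SOCKET
client.connect(("127.0.0.1", port))  # CONNECT (blocks until ESTABLISHED)
client.sendall(b"hello")             # SEND
reply = client.recv(1024)            # RECEIVE
client.close()
t.join()
server.close()
```

As in the quoted passage, the server executes SOCKET, BIND, LISTEN, ACCEPT in that order, while the client needs only SOCKET and CONNECT before both sides exchange data over the full-duplex connection.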
35. Establishing a connection over TCP is sometimes called “opening a
socket.” As described above, after the server has executed its sequence of
primitives SOCKET, BIND, LISTEN, ACCEPT, the client executes the SOCKET
and CONNECT primitives, then both sides can communicate using SEND and
RECEIVE. These primitives involve control packet transmissions (versus simply
sending a data packet that includes application data). Opening a socket
(establishing a connection) thus requires both the sender and receiver exchanging a
series of control messages, interpreting the messages, and in response to certain
control messages, responding with the appropriate message. Accordingly, it is a
more complex process to open a connection (and enter an ESTABLISHED state)
than simply one side sending a single data packet transmission. As described
below, this is why it was known in the art for the host to open the connection (the
more complex aspect of communication) and to offload only the sending and
receiving of data packets to a separate device.
6. Prepending Headers
36. When a socket is opened and after the connection is established,
application data is sent and received by constructing packets that encapsulate the
data. Standard UDP/IP and TCP/IP implementations, such as BSD 4.4-Lite, copy
headers and data into linked list structures called mbufs. Stevens2 describes how
headers are prepended to the data in the mbuf chain:
Ex.1013, Stevens2 at .043.
37. Figure 1.8 above shows an Mbuf chain for UDP, but Stevens2 later
broadens the discussion to include TCP and shows diagrams of TCP and UDP
mbuf chains.
Figure 2.2 shows an example of two packets on a queue. It is a
modification of Figure 1.8. We have placed the UDP datagram onto
the interface output queue (showing that the 14-byte Ethernet header
has been prepended to the IP header in the first mbuf on the chain)
and have added a second packet to the queue: a TCP segment
containing 1460 bytes of user data. The TCP data is contained in a
cluster and an mbuf has been prepended to contain its Ethernet, IP,
and TCP headers.
Ex.1013, Stevens2 at .060.
38. Note that the outgoing frames include all three headers – MAC (e.g.
Ethernet), IP and TCP. The first hop MAC address is determined based on the
route to the destination. Once the destination MAC address is determined, it is
stored and accessed when constructing the outgoing frames. Accordingly, the
construction of packets, whether on the host or network interface card (see
offloading discussion below), requires adding the TCP, IP, and MAC headers.
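The mbuf-style prepending described above can be sketched with a toy linked-list structure; this is an illustration of the idea, not the BSD implementation.

```python
class Mbuf:
    """Toy buffer node: a chunk of bytes plus a link to the next node."""
    def __init__(self, data: bytes, next_buf=None):
        self.data, self.next = data, next_buf

def prepend(chain, header: bytes):
    """Add a header at the front of the chain without copying the
    existing buffers -- the key efficiency of the mbuf scheme."""
    return Mbuf(header, chain)

def flatten(chain) -> bytes:
    """Walk the chain to produce the wire-order frame bytes."""
    out = b""
    while chain:
        out, chain = out + chain.data, chain.next
    return out

chain = Mbuf(b"user data")            # payload in its own buffer
chain = prepend(chain, b"<tcp>")      # TCP layer prepends its header
chain = prepend(chain, b"<ip>")       # then the IP layer
chain = prepend(chain, b"<eth>")      # then the MAC (Ethernet) layer
frame = flatten(chain)
```

The frame ends up with the Ethernet, IP, and TCP headers ahead of the user data, matching the outgoing-frame layout discussed above.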
7. TCP Control Block (TCB)
39. Established connections need to maintain certain state information.
For example, the state of a TCP connection is used to track acknowledgements
(ACKs) with the connection that requested the data in order to later retransmit the
segment if required. The TCP state is held in a structure called the TCB (TCP or
transmission control block).
The maintenance of a TCP connection requires the remembering of
several variables. We conceive of these variables being stored in a
connection record called a Transmission Control Block or TCB.
Among the variables stored in the TCB are the local and remote
socket numbers, the security and precedence of the connection,
pointers to the user’s send and receive buffers, pointers to the
retransmit queue and to the current segment. In addition several
variables relating to the send and receive sequence numbers are stored
in the TCB.
Ex.1007, RFC 793 at .024.
40. TCBs maintain the state at each end of a TCP connection:
Protocol control blocks (PCBs) are used at the protocol layer to hold
the various pieces of information required for each UDP or TCP
socket. The Internet protocols maintain Internet protocol control
blocks and TCP control blocks.
Ex.1013, Stevens2 at .739.
41. RFC 2140 shows a list of the information contained in a TCB:
The TCP Control Block (TCB)
A TCB is associated with each connection, i.e., with each association of a pair of applications across the network. The TCB can be summarized as containing [9]:
Local process state
pointers to send and receive buffers
pointers to retransmission queue and current segment
pointers to Internet Protocol (IP) PCB
Per-connection shared state
macro-state
connection state
timers
flags
local and remote host numbers and ports
micro-state
send and receive window state (size*, current number)
round-trip time and variance
cong. window size*
cong. window size threshold*
max windows seen*
MSS#
round-trip time and variance#
Ex.1014, RFC2140 at .002.
As part of the TCP layer, the sequence number must be kept. For example, when
sending subsequent packets, the TCP layer must increment the sequence variable,
placing this new number into the next packet. Ex.1006, Tanenbaum96 at .584. As
will be shown below, this function can either be performed by the host or
offloaded from the host to a separate device.
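A toy TCB illustrating a small subset of this state can be written as a simple record holding the socket pair and the send/receive sequence variables; the field names loosely follow RFC 793 conventions, but the specific fields and values chosen here are assumptions for illustration.

```python
from dataclasses import dataclass

@dataclass
class TCB:
    """Sketch of a transmission control block: the socket pair plus a
    subset of the per-connection micro-state listed in RFC 2140."""
    local_ip: str
    local_port: int
    remote_ip: str
    remote_port: int
    state: str = "ESTABLISHED"
    snd_nxt: int = 0      # next sequence number to send
    rcv_nxt: int = 0      # next sequence number expected from the peer

    def on_send(self, payload_len: int) -> int:
        seq = self.snd_nxt            # sequence number placed in this segment
        self.snd_nxt += payload_len   # advance the variable for the next one
        return seq

tcb = TCB("10.0.0.5", 49152, "216.58.216.46", 80, snd_nxt=1000)
first = tcb.on_send(536)    # this segment carries sequence number 1000
second = tcb.on_send(536)   # the next carries 1536
```

Whichever entity maintains this structure, the host stack or an offload device, must perform the same increment of the sequence variable for each segment sent.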
8. Segmentation
42. TCP sends data as a sequence of segments:
The sending and receiving TCP entities exchange data in the form of
segments. A segment consists of a fixed 20-byte header (plus an
optional part) followed by zero or more data bytes. The TCP software
decides how big segments should be. It can accumulate data from
several writes into one segment or split data from one write over
multiple segments. Two limits restrict the segment size. First, each
segment, including the TCP header, must fit in the 65,535 byte IP
payload. Second, each network has a maximum transfer unit or MTU,
and each segment must fit in the MTU.
Ex.1006, Tanenbaum96 at .543.
43. The application programs are generally unaware of the way TCP data
is segmented, buffered and copied by the operating system. Application programs
send or receive a stream of bytes through the TCP connection:
A stream of 8-bit bytes is exchanged across the TCP connection
between the two applications. There are no record markers
automatically inserted by TCP. This is what we called a byte stream
service. If the application on one end writes 10 bytes, followed by a write of 20
bytes, followed by a write of 50 bytes, the application at the other end
of the connection cannot tell what size the individual writes were. The
other end may read the 80 bytes in four reads of 20 bytes at a time. One
end puts a stream of bytes into TCP and the same, identical stream of
bytes appears at the other end.
Ex.1008, Stevens1 at .248.
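Stevens' 10/20/50-byte example can be modeled with a toy buffer in which writes are simply appended and reads simply drain bytes, so the write boundaries disappear; this is an illustrative sketch, not a socket API:

```python
class ByteStream:
    """Toy model of TCP's byte-stream service: no record boundaries."""

    def __init__(self):
        self.buf = bytearray()

    def write(self, data):
        self.buf += data            # write sizes leave no trace

    def read(self, n):
        out, self.buf = bytes(self.buf[:n]), self.buf[n:]
        return out

s = ByteStream()
s.write(b"a" * 10)   # writes of 10, 20, and 50 bytes...
s.write(b"b" * 20)
s.write(b"c" * 50)
reads = [s.read(20) for _ in range(4)]   # ...read as four 20-byte reads
# the receiver cannot tell what size the individual writes were
```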
44. A large stream of data is sent as a series of segments, generally with
all but the last segment sent as an MSS (maximum segment size) TCP segment.
A “TCP Segment” is defined as: “The unit of data exchanged between TCP
modules (including the TCP header).” Ex.1036, RFC 791 at .034. Segments and
segmentation are commonly discussed in reference to the transport layer of TCP
and ATM networks.
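The segmentation described above can be sketched as splitting a byte stream into payload pieces of at most one MSS, with all but the last piece full-sized; the 1460-byte value below is the common Ethernet-derived MSS and is used only for illustration:

```python
def segment(data, mss):
    """Split a stream of bytes into TCP segment payloads of at most
    `mss` bytes; all but the last segment are full-sized."""
    return [data[i:i + mss] for i in range(0, len(data), mss)] or [b""]

segs = segment(b"x" * 3000, 1460)   # -> payloads of 1460, 1460, and 80 bytes
```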
45. As with all of these protocol processing functions, as will be shown
below, either the host or an offloading device that offloads protocol processing
can perform the function.
9. Advertising a Receive Window
46. In TCP, the amount of data a sender is allowed to send is based on an
advertised window size sent from the receiver:
Window management in TCP is not directly tied to
acknowledgements as it is in most data link protocols. For example,
suppose the receiver has a 4096-byte buffer as shown in Fig. 6-29. If
the sender transmits a 2048-byte segment that is correctly received, the
receiver will acknowledge the segment. However, since it now has
only 2048 of buffer space (until the application removes some data
from the buffer), it will advertise a window of 2048 starting at the
next byte expected.
Now the sender transmits another 2048 bytes, which are
acknowledged, but the advertised window is 0. The sender must stop
until the application process on the receiving host has removed some
data from the buffer, at which time TCP can advertise a larger
window.
Ex.1006, Tanenbaum96 at .551-.552.
47. This effectively allows the receiver to ensure that the sender does not
overflow its buffer with data. The receiver “advertises” this value by including it
in the TCP header. The sender adjusts the amount of data that it sends in view of
this value.
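Tanenbaum's 4096-byte-buffer example can be sketched as follows; the advertised window is simply the free space remaining in the receive buffer, and the class is an illustrative toy, not any cited implementation:

```python
class Receiver:
    """Toy receiver that advertises its buffer free space as the window."""

    def __init__(self, buf_size):
        self.buf_size = buf_size
        self.used = 0

    def receive(self, nbytes):
        assert nbytes <= self.window()   # sender must respect the window
        self.used += nbytes

    def window(self):
        return self.buf_size - self.used  # advertised in the TCP header

    def app_read(self, nbytes):           # application drains the buffer
        self.used -= min(nbytes, self.used)

r = Receiver(4096)
r.receive(2048)   # first 2048-byte segment: window shrinks to 2048
r.receive(2048)   # second segment: window is now 0, sender must stop
r.app_read(1024)  # application removes data: window grows to 1024
```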
C. Protocol Offload
48. To increase performance, designers have employed different
techniques such as parallel processing, improved hardware, memory copy
reduction via hardware and/or software, and hardware to offload all or part of the
protocol stack.
1. RFC 647 – Front-Ending
49. As early as 1974, front-end protocol offload was already being
considered for standardization as described in request-for-comments RFC 647. At
that time, NCP (Network Control Protocol) was the protocol used in ARPANET,
the predecessor to the modern Internet.
“FRONT-ENDING”
In what might be thought of as the greater network community, the
consensus is so broad that the front-ending is desirable that the topic
needs almost no discussion here. Basically, a small machine (a PDP-
11 is widely held to be most suitable) is interposed between the IMP
and the host in order to shield the host from the complexities of the
NCP.
Ex.1019, RFC 647 at .002.
50. RFC 647 goes on to discuss rigid and flexible front-end (FE)
alternatives and includes a high-level discussion of a protocol for interfacing
between the host and FE.
2. RFC 929 – Outboard Processing
51. In 1984, RFC 929 was distributed to begin work on a possible
standard for interfacing between a host and an OPE (Outboard Processing
Environment)4:
4 Other names have been used to describe the OPE concept. Names for protocol
offload implementations included Front-End Processor, Network Front-End,
Protocol Processor, Protocol Engine, Protocol Accelerator, Hardware Bypass,
Smart Network Interface, SMART NIC, Smart Adapter, Protocol Processing
Engine, IO Adapter, Intelligent I/O Processor and intelligent Network Interface
Card.
There are two fundamental motivations for doing outboard
processing. One is to conserve the Hosts' resources (CPU cycles and
memory) in a resource sharing intercomputer network, by offloading
as much of the required networking software from the Hosts to
Outboard Processing Environments (or "Network Front-Ends") as
possible. The other is to facilitate procurement of implementations of
the various intercomputer networking protocols for the several types
of Host in play in a typical heterogeneous intercomputer network, by
employing common implementations in the OPE.
Ex.1009, RFC 929 at .002.
The interaction between the Host and the OPE must be capable of
providing a suitable interface between processes (or protocol
interpreters) in the Host and the off-loaded protocol interpreters in the
OPE. This interaction must not, however, burden the Host more
heavily than would have resulted from supporting the protocols
inboard, lest the advantage of using an OPE be overridden.
Id. at .003.
52. RFC 929 includes a “protocol parameter” for selecting the protocol to
be offloaded. TCP, UDP and IP were among the protocols to be offloaded:
Id. at .013.
3. Mediation Levels
53. The 1984 proposal to standardize offload implementations in RFC
929 is evidence that there was already much activity in offload implementations at
that time. The authors of RFC 929 anticipated different types of outboard
processors and recognized that the amount of work to be done by the outboard
processor might vary from none to partial to full offload. To handle this range, a
“mediation level” parameter was proposed.
The mediation level parameter is an indication of the role the Host
wishes the OPE to play in the operation of the protocol. The extreme
ranges of this mediation would be the case where the Host wished to
remain completely uninvolved, and the case where the Host wished to
make every possible decision. The specific interpretation of this
parameter is dependent upon the particular off-loaded protocol.
The concept of mediation level can best be clarified by means of
example. A full inboard implementation of the Telnet protocol places
several responsibilities on the Host. These responsibilities include
negotiation and provision of protocol options, translation between
local and network character codes and formats, and monitoring the
well-known socket for incoming connection requests. The mediation
level indicates whether these responsibilities are assigned to the Host
or to the OPE when the Telnet implementation is outboard. If no OPE
mediation is selected, the Host is involved with all negotiation of the
Telnet options, and all format conversions.
With full OPE mediation, all option negotiation and all format
conversions are performed by the OPE. An intermediate level of
mediation might have ordinary option negotiation, format conversion,
and socket monitoring done in the OPE, while options not known to
the OPE are handled by the Host.
The parameter is represented with a single ASCII digit. The value 9
represents full OPE mediation, and the value 0 represents no OPE
mediation. Other values may be defined for some protocols (e.g., the
intermediate mediation level discussed above for Telnet). The default
value for this parameter is 9.
Id. at .015-.016.
54. More than a decade passed between the publication of RFC 929 and
the priority date of the earliest Alacritech provisional application. During that
time, protocol offload was the subject of many papers and systems across the range
anticipated by RFC 929. These implementations can be categorized based on the
three principal dimensions of protocol offload: 1) The set of protocols to be
offloaded (e.g. TCP/IP, VMTP, OSI), 2) the portions of the protocol that are
offloaded (e.g. full offload, partial offload, fast path offload, no offload), 3) the
offload implementation (e.g. parallel processor, standard microprocessor, custom
processor, custom hardware). The cited references below include many different
combinations of these three dimensions, but it should be noted that each cited
combination was primarily a design decision among a small, finite number of
choices. It would have been obvious to alter these implementations along one or
more of the dimensions for a new implementation that would have produced
predictable results. In other words, it was well recognized that depending on the
application, it was desirable to vary the extent of offloading. The simplest example
is that while offloading the entire protocol may seem on the surface advantageous,
it was expensive because handling every type of data packet requires a complex
offloading device. For example, it was well known that setting up a connection
and entering the ESTABLISHED state was much more complex than simply
receiving and sending data packets. Ex.1006, Tanenbaum96 at .583 (“The key to
fast TPDU processing is to separate out the normal case (one-way data transfer)
and handle it specially. Although a sequence of special TPDUs are needed to get
into the ESTABLISHED state, once there, TPDU processing is straightforward until
one side starts to close the connection.”).
D. Offloaded Protocols
55. By the mid-1990s, TCP/IP was becoming a predominant network
standard, but many other networks were still in use and new network protocols
were being investigated.
1. OSI Protocol Offload
56. OSI protocol offload engines were built and tested by Thia and
Woodside. Ex.1015, Thia; Ex.1038, Woodside.
2. TCP/IP Protocol Offload
57. TCP/IP offload engines were built or described by many in the field
including Bach, Erickson, Morris, Cooper, Kung, Rütsche and Chesson. Ex.1020,
Bach; Ex.1005, Erickson; Ex.1021, Morris; Ex.1022, Cooper; Ex.1023, Kung;
Ex.1017, Rütsche92; Ex.1018, Rütsche93; Ex.1024, Chesson.
3. VMTP and XTP Protocol Offload
58. VMTP and XTP were proposed as alternatives to TCP. A VMTP
offload engine was described by Kanakia, and an XTP protocol accelerator was
described by Chesson. Ex.1025, Kanakia; Ex.1024, Chesson.
4. Multi-Protocol Offload
59. General-purpose offload engines were also proposed. Erickson
discloses a range of protocol scripts for offloading different protocols.
Each type of protocol will have its own script. Types of protocols
include, but are not limited to, TCP/IP, UDP/IP, BYNET lightweight
datagrams, deliberate shared memory, active message handler, SCSI,
and [Fibre] Channel.
Ex.1005, Erickson at 5:47-51.
60. Kung and Cooper describe the Nectar network-based multicomputer
system in which the processors communicate via Communications Acceleration
Boards (CABs) that can run different protocols.
The CAB runtime system currently supports several transport
protocols with different reliability/overhead tradeoffs [10]. They
include the standard TCP/IP protocol suite besides a number of
Nectar-specific protocols.
Ex.1026, Kung and Cooper at .003.
E. Portions of the Protocol Offloaded
61. The portion of the protocol offloaded (called “mediation level” in
RFC 929) falls into several types that range from partial offload to full offload.
That is, either part of the protocol processing can be offloaded (partial offload) or
the entire protocol processing can be offloaded (full offload).
1. Checksum Offload
62. One of the first parts of protocol processing to be offloaded was the
checksum calculation (a partial offload). An adapter doing only checksum offload
is less complex because it does not require the adapter to maintain the connection
state.
63. Dalton describes the HP Afterburner card with optional hardware for
checksum calculation:
To support the use of the on-card memory as clusters, we have written
a small number of functions. The most important is a special copy
routine, functionally equivalent to the BSD function bcopy. It is
optimized for moving data over the I/O bus, and also optionally uses
the card's built-in unit to calculate the IP checksum of the data it
moves. Another function converts a single-copy cluster into a chain of
normal clusters and mbufs; it also calculates the checksum.
Ex.1027, Dalton at .011 (emphasis added).
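The IP checksum being offloaded here is the standard Internet one's-complement checksum (RFC 1071). A plain software version, of the kind such hardware replaces, can be sketched as:

```python
def internet_checksum(data: bytes) -> int:
    """Compute the 16-bit one's-complement Internet checksum (RFC 1071).

    This is the calculation that checksum-offload hardware performs on
    the adapter instead of the host CPU.
    """
    if len(data) % 2:
        data += b"\x00"                              # pad odd-length input
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]        # sum 16-bit words
        total = (total & 0xFFFF) + (total >> 16)     # fold carries back in
    return ~total & 0xFFFF                           # one's complement
```

For example, the eight-byte sequence from the worked example in RFC 1071 (00 01 f2 03 f4 f5 f6 f7) checksums to 0x220D.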
2. Full Offload
64. Exemplary full offload papers and systems include Murphy, Bach,
MacLean, Cooper and Rütsche.5 Ex.1028, Murphy; Ex.1020, Bach; Ex.1029,
MacLean; Ex.1022, Cooper; Ex.1017, Rütsche92; Ex.1018, Rütsche93.
5 In a “full offload,” the adapter does not typically initiate connections on its own.
The host initiates the connection by opening a socket to an IP address and TCP
port. The host establishes the connection and directs the stack of protocol layers to
create the connection. Yet those of skill in the art often still refer to such systems
as “full offload.”
3. Multi-Level Offload
65. Chesson describes a protocol chip plus an optional control processor
that can do a range of offloads from partial (checksum, sequence numbers, etc.) to
full offload. Ex.1024, Chesson.
4. Header Prediction
66. In 1988, Van Jacobson proposed a header prediction algorithm for
improving the performance of TCP/IP implementations. This “header prediction”
teaching led to various types of partial offload. The code, which uses header
templates, is partitioned into one module for the commonly executed path (the fast
path) and another module to handle the more complex cases and exception
handling (the slow path).
67. Code to implement the header prediction algorithm was incorporated
in the BSD 4.4-Lite distribution.
Most IP packets carry no options. Of the 20-byte header, 14 of the
bytes will be the same for all IP packets sent by a particular TCP
connection. The IP length, ID, and checksum fields (6 bytes total) will
probably be different for each packet. Also, if a packet carries any
options, all packets for that TCP connection will be likely to carry the
same options.
The Berkeley implementation of UNIX makes some use of this
observation, associating with each connection a template of the IP and
TCP headers with a few of the fixed fields filled in. To get better
performance, we designed an IP layer that created a template with all
the constant fields filled in. When TCP wished to send a packet on
that connection, it would call IP and pass it the template and the
length of the packet. Then IP would block-copy the template into the
space for the IP header, fill in the length field, fill in the unique ID
field, and calculate the IP header checksum.
This idea can also be used with TCP, as was demonstrated in an
earlier, very simple TCP implemented by some of us at MIT [6]. In
that TCP, which was designed to support remote login, the entire state
of the output side, including the unsent data, was stored as a
preformatted output packet. This reduced the cost of sending a packet
to a few lines of code.
A more sophisticated example of header prediction involves applying
the idea to the input side. In the most recent version of TCP for
Berkeley UNIX, one of us (Jacobson) and Mike Karels have added
code to precompute what values should be found in the next incoming
packet header for the connection. If the packets arrive in order, a few
simple comparisons suffice to complete header processing.
Ex.1030, Clark at .003.
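The template technique Clark describes can be sketched as follows: the constant IPv4 header fields are filled in once per connection, and the per-packet code block-copies the template and patches only the varying fields. The field layout is the standard IPv4 header, but the functions and their simplifications (no options, checksum omitted) are mine:

```python
import struct

def make_ip_template(src, dst, proto=6):
    """Build a 20-byte IPv4 header once per connection with the constant
    fields filled in; length, ID, and checksum stay zero until send time
    (illustrative sketch: no options, checksum calculation omitted)."""
    return bytearray(struct.pack(
        "!BBHHHBBH4s4s",
        0x45, 0,      # version/IHL, TOS: constant for the connection
        0,            # total length: per packet
        0, 0,         # ID, flags/fragment offset: ID is per packet
        64, proto,    # TTL, protocol: constant
        0,            # header checksum: per packet
        src, dst))    # addresses: constant for the connection

def fill_packet(template, length, ident):
    """Per-packet work: block-copy the template and patch the few
    varying fields, as in the Berkeley implementation Clark describes."""
    hdr = bytearray(template)             # block-copy the template
    struct.pack_into("!H", hdr, 2, length)  # fill in the length field
    struct.pack_into("!H", hdr, 4, ident)   # fill in the unique ID field
    return hdr

tmpl = make_ip_template(b"\x0a\x00\x00\x01", b"\x0a\x00\x00\x02")
pkt = fill_packet(tmpl, 576, 42)
```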
68. The 1995 book (Stevens2) walks through the Jacobson BSD header
prediction code including the conditions for selecting the fast or slow path. In order
to take the fast receive path, six conditions must be met, including:
1. The connection must be established.
2. The following four control flags must not be on: SYN, FIN,
RST, or URG. The ACK flag must be on.
3.-6. [Conditions to assure that the received segments are in-order]
Ex.1013, Stevens2 at .962-.963.
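These conditions can be summarized as a single predicate; the flag bit values below follow the standard TCP header assignments, and conditions 3-6 are reduced to the in-order sequence check for this sketch:

```python
# Standard TCP header flag bits
FIN, SYN, RST, PSH, ACK, URG = 0x01, 0x02, 0x04, 0x08, 0x10, 0x20

def header_prediction_hit(state, flags, seq, rcv_nxt):
    """Take the fast receive path only for an in-order segment with ACK
    set and none of SYN/FIN/RST/URG, on an established connection
    (conditions 3-6 simplified to the in-order sequence check)."""
    return (state == "ESTABLISHED"                    # condition 1
            and not flags & (SYN | FIN | RST | URG)   # condition 2
            and bool(flags & ACK)                     # ACK must be on
            and seq == rcv_nxt)                       # in-order arrival

hit = header_prediction_hit("ESTABLISHED", ACK | PSH, 5000, 5000)   # fast path
miss = header_prediction_hit("ESTABLISHED", ACK | FIN, 5000, 5000)  # slow path
```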
a) Partial Offload with Header Prediction
69. The fast and slow paths described by Stevens gave a natural division
for protocol offload implementations. Building on the Jacobson BSD header
prediction code, Biersack (Ex.1016) describes TCP protocol offload with fast and
slow paths. Thia (Ex.1015) also builds upon the Jacobson BSD header prediction
algorithm and applies its teachings to derive an OSI protocol offload with the fast
path implemented in hardware.
70. The header prediction code in the FreeBSD release is also discussed
in the Alacritech 1997 Provisional application:
The base for the receive processing done by the INIC on an existing
context is the fast-path or “header prediction” code in the FreeBSD
release.
Ex.1031, Alacritech 1997 Provisional Application at .057.
71. Thus, the Jacobson header prediction code forms the basis of what
Alacritech offloads to its intelligent network interface card (INIC).
F. Offload Implementation
72. Offloading the transport layer to an interface card was discussed in
Tanenbaum96:
The hardware and/or software within the transport layer that does the
work is called the transport entity. The transport entity can be in the
operating system kernel, in a separate user process, in a library
package bound into network applications, or on the network interface
card.
Ex.1006, Tanenbaum96 at .498 (emphasis added).
73. Others have disclosed more details of offload hardware including
implementations based on multiprocessors, microprocessors, custom processors
and custom logic.
1. Multiprocessor Offload
74. Several groups proposed or built systems in which protocol
processing is offloaded from the application processor to one or more dedicated
processors in a multiprocessor configuration.
75. The Nectar system:
The Nectar communication processor together with its host can be
viewed as a (heterogeneous) shared-memory multiprocessor.
Dedicating one processor of a multiprocessor host to communication
tasks can achieve some of the benefits of the Nectar approach, but this
constrains the choice of host operating system and hardware. In
contrast, the Nectar communication processor has been used with a
variety of hosts and host operating systems.
Ex.1022, Cooper at .006.
76. The Parallel Protocol Engine:
In this paper our goal is to demonstrate that a careful implementation
of a standard transport protocol stack on a general purpose
multiprocessor architecture allows efficient use of the bandwidth
available in today’s high-speed networks. As an example, we chose
to implement the TCP/IP protocol suite on our 4-processor prototype
of the PPE.
Ex.1017, Rütsche92 at .009.
77. Rütsche also designed a Gb/s Multimedia Protocol Adapter based on
the PPE:
In this paper we present a new multiprocessor communication
subsystem architecture, the Multimedia Protocol Adapter (MPA),
which is based on the experience with the Parallel Protocol Engine
(PPE) [Kaiserswerth 92] and is designed to connect to a 622 Mb/s
ATM network. The MPA architecture exploits the inherent
parallelism between the transmitter and receiver parts of a protocol
and provides support for the handling of new multimedia protocols.
Ex.1018, Rütsche93 at .001.
2. Offload Adapters based on Microprocessors
78. Protocol offloading may be implemented by executing code in one or
more microprocessors on an intelligent network interface card or on a network
accelerator board used in conjunction with a standard NIC (network interface
card).
79. Kanakia describes a network adapter board with a microprocessor and
other support chips:
The prototype Network Adapter Board (NAB) has been designed
using Motorola’s MC68020 as the on-board processor, running at 16
Mhz clock rate; it uses about 200 standard MSI and LSI
components. The current version is designed for connecting two VMP
multiprocessor systems with a 100 megabit/sec point-to-point
connection.
Ex.1025, Kanakia at .010.
80. MacLean describes microprocessor-based protocol accelerators
residing on a VME card:
The internal functions and data flows of the protocol accelerator are
shown in Figure 2. We use a dual CPU approach to protocol
processing, with one CPU subsystem dedicated to the transmission,
and the other to the reception. The transmit and receive CPUs are both
68020 (25 MHz) based, each with its own private resources: ROM,
parallel I/O, interrupt circuitry and 128 kilobytes of random access
memory (RAM). In addition there is 128 kilobytes of RAM shared by
both CPUs which is also accessible to the two host busses, VME and
VSB.
Ex.1029, MacLean at .004.
81. Rütsche describes a multimedia protocol adapter (MPA) using a pair
of “transputer” microprocessors:
The selection of the inmos2 T9000 [inmos 91] is based on our good
experience with the transputer family of processors in the PPE. The
most significant improvements of the T9000 over the T425 for
protocol processing are faster programmable link interfaces, a faster
memory interface, and a cache.
Ex.1018, Rütsche93 at .003.
3. Offload Adapters based on Custom Processors or Custom Logic
82. Other designers have proposed custom processors and/or custom logic
for protocol offload. Chesson describes a Protocol Engine chipset for real-time
protocol processing. Depending on the amount of protocol offload desired, an
adapter can be built with or without the custom control processor (CP):
The Protocol Engine® chipset offers real-time protocol processing for
high-speed networks. A wide range of cost-performance subsystem
solutions are available through various configurations based on the PE
Chipset. The chipset (shown in Figure 1) consists of four chips:
MPORT, HPORT, BCTL, and CP. A basic configuration consists of
MPORT, HPORT, and BCTL.
Ex.1024, Chesson at .006.
83. The optional Chesson Control processor is a custom processor
designed for fast protocol processing:
Control Processor (CP) of the Protocol Engine® chipset is a 32-bit,
multi-thread execution unit that provides high speed protocol
processing.
Id. at .039.
84. Thia also discloses the design of a custom VLSI chip for protocol
offload:
The chip design based on bypassing is called ROPE, for Reduced
Operation Protocol Engine. The contribution of this paper is to define
the host/chip interface and the chip operation, and to report on a
VHDL-based feasibility study of the chip design. It appears to be
feasible to support an end-system single-connection data rate
approaching 1 Gbps.
Ex.1015, Thia at .002.
85. Culler describes the Berkeley Network of Workstations (NOW) in
which the Active Messages protocol is offloaded to intelligent NICs built with
Myricom LANai chips:
The hardware configuration of the Berkeley NOW system consists of
one hundred and five Sun Ultra 170 workstations, connected by a
large Myricom network[Bode95], and packaged into 19-inch racks.
Each workstation contains a 167 MHz Ultra1 microprocessor with
512 KB level-2 cache, 128 MB of memory, two 2.3 GB disks,
ethernet, and a Myricom “Lanai” network interface card (NIC) on the
SBus. The NIC has a 37.5 MHz embedded processor and three DMA
engines, which compete for bandwidth to 256 KB of embedded
SRAM. The node architecture is shown in Figure 1.
Ex.1032, Culler at .001.
Id. at .003.
86. Alteon describes their third generation intelligent Ethernet adapter that
includes performance improvements from protocol offload, reduction in memory
copies and reduction of interrupts.
Using an intelligent adapter with an onboard RISC-based processor
specially designed for embedded application processing, Alteon’s
Gigabit Ethernet technology not only reduces the number of times
data is copied among processing entities, it allows a single interrupt to
be issued for multiple data packets—radically altering the ratio of
interrupts to packets, and eliminating the scalability problems inherent
in older adapter designs.
Ex.1033, Alteon at .022.
87. HP discloses a custom chip called Tachyon that includes send offload,
receive offload, hardware checksum calculation, DMA, and headers/data splitting:
Ex.1034, Smith at .004.
G. Protocol Offload Summary
88. The preceding paragraphs have shown many offload implementations
foreshadowed by RFC 929 described above. These implementations include many
variations along the three dimensions of network protocol offload: 1) the set of
protocols to be offloaded, 2) the portions of the protocol that are offloaded, and 3)
the offload implementation. The citations show that each of the individual
concepts was well known and that many different combinations along the three
dimensions were successfully implemented by practitioners. It would have been
obvious to alter these implementations along one or more of the dimensions for a
new implementation that would have produced predictable results.
H. Additional Background Technology
89. Protocol offload adapters have incorporated many well-known design
techniques originally developed for general purpose processors. Some of these
concepts, such as DMA and virtual memory, are briefly described below. More
information is available from textbooks on Computer Architecture. See e.g., David
A. Patterson and John L. Hennessy, Computer Architecture: A Quantitative
Approach, Morgan Kaufmann Publishers Inc., San Mateo, CA, USA., 1990.
(Ex.1035, Patterson).
1. DMA
90. DMA (Direct Memory Access) is a hardware-based technique for
transferring data between memory systems or between a host memory and an I/O
device.
Since I/O events so often involve block transfers, direct memory
access (DMA) hardware is added to many computer systems to allow
transfers of numbers of words without intervention by the CPU.
Ex.1035, Patterson at .151.
91. Before DMA was common, processors used I/O (input/output)
instructions to transfer data to I/O devices. A benefit of using DMA is that fewer
processor cycles are required to transfer the data. With DMA, the DMA engine is
loaded with an address and count of data to be moved, then the data movement
proceeds while the processor is doing other tasks. In some implementations, DMA
engines are under the control of a host processor, while in others a DMA engine is
controlled by an intelligent controller on an I/O adapter. The DMA engine itself
may be located either in the host or on an I/O adapter.
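The address-and-count programming model described above can be sketched with a toy model; the interface below is a generic illustration, not any particular adapter's:

```python
class DMAEngine:
    """Toy model of a DMA engine: loaded with an address and a count,
    it moves the block while the CPU is free to do other work."""

    def __init__(self, memory):
        self.memory = memory                 # shared byte-addressable memory

    def transfer(self, src, dst, count):
        # One block move, with no per-word CPU involvement
        self.memory[dst:dst + count] = self.memory[src:src + count]

mem = bytearray(1024)
mem[0:4] = b"data"
dma = DMAEngine(mem)
dma.transfer(src=0, dst=512, count=4)   # load address and count, then go
# mem[512:516] now holds a copy of the block
```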
92. DMA may be used either to read from host memory or to write to host
memory. In some implementations, there are separate send and receive DMA
engines and in others, a common DMA engine can be programmed to transfer to or
from host memory:
Outbound Block Mover. The outbound block mover block’s function
is to transfer outbound data from host memory to the outbound
sequence manager via DMA. It takes as input an address/length pair
from the outbound sequence manager block, initiates the Tachyon
system interface bus ownerships, and performs the most efficient
number and size of transactions on the Tachyon system interface bus
to pull in the data requested.
…
Inbound Block Mover. The inbound block mover is responsible for
DMA transfers of inbound data into buffers specified by the
multiframe sequence buffer queue, the single-frame sequence buffer
queue, the inbound message queue, or the SCSI buffer manager. The
inbound block mover accepts an address from the inbound data
manager, then accepts the subsequent data stream and places the data
into the location specified by the address.
Ex.1034, Smith at .007, .009.
Movement of data across the host bus interface are minimized by
using an on-chip DMA for fast block data transfer to/from the host
system memory.
Ex.1015, Thia at .007.
Bus Controller (BC): The BC is a programmable busmaster DMA
controller. It provides a small FIFO and a table for DMA requests.
The FIFO contains a pointer to the linked list of source data and a
connection identifier. The BC determines the destination memory
address through the connection identifier in the table. The list format
is the same for the BC and the DMAU. In the transmit BC the host
writes to the FIFO and the protocol processor to the table. In the
receive BC the protocol processor writes to the FIFO and the host to
the table.
Ex.1018, Rütsche93 at .004-.005.
2. Virtual and Physical Memory Addresses
93. I/O adapters that transfer data directly to or from memory need to be
provided with the memory addresses of the buffers. Many processors use virtual
addressing in which large buffers appear to the processor as single contiguous
memory space even though the addressed pages may not be contiguous in physical
memory. To translate from virtual to physical memory addresses, the processor
uses page tables that store the appropriate mappings from virtual to physical pages.
With virtual memory, the CPU produces virtual addresses that are
translated by a combination of hardware and software to physical
addresses, which can be used to access main memory. This process is
called memory mapping or address translation.
Ex.1035, Patterson at .050 (emphasis in original).
94. In order for an I/O device to access the main memory buffers, either
the physical address may be supplied for each page, or a translation table may be
maintained on the I/O controller to allow it to operate on virtual addresses.
Erickson has a “physical address buffer map” in the adapter memory and discusses
some options for handling the translation:
The vtophys( ) function performs a translation of the user-provided
virtual address into a physical address usable by the adapter. In all
likelihood, the adapter would have a very limited knowledge of the
user process’ virtual address space, probably only knowing how to
map virtual-to-physical for a very limited range, maybe as small as a
single page. Pages in the user process’ virtual address space for such
buffers would need to be fixed. The udpscript procedure would need
to be enhanced if the user data were allowed to span page boundaries.
Ex.1005, Erickson at 8:14-24.
VI. OVERVIEW OF 036 PATENT
95. The 036 Patent relates to offloading TCP protocol processing from a
host onto a network interface card (NIC). Ex.1001, 036 Patent at Abstract. See
Section V.C.-G. above for a description of prior art offloading. The specification
of the 036 Patent refers to the disclosed NIC, which performs offloading, as an
“intelligent network interface card (INIC)”. See id. at Abstract.
96. The INIC of the 036 Patent permits two modes of operation: a “fast
path” in which protocol processing from the physical layer through the TCP layer
is performed on the INIC, and a “slow path” in which network frames are handed
to the host at the MAC layer and passed up through the host protocol stack
conventionally. The concept is illustrated in Fig. 24, shown below:
The answer shown in FIG.24 is to use two modes of operation: One
in which the network frames are processed on the INIC through TCP
and one in which the card operates like a typical dumb NIC. We call
these two modes fast-path, and slow-path. In the slow-path case,
network frames are handed to the system at the MAC layer and passed
up through the host protocol stack like any other network frame. In
the fast path case, network data is given to the host after the headers
have been processed and stripped.
The transmit case works in much the same fashion. In slow-path mode
the packets are given to the INIC with all of the headers attached. The
INIC simply sends these packets out as if it were a dumb NIC. In fast-
path mode, the host gives raw data to the INIC which it must carve
into MSS sized segments, add headers to the data, perform checksums
on the segment, and then send it out on the wire.
Ex.1001, 036 Patent at 39:10-27, Fig. 24.
97. The INIC uses a “connection context” to determine which “path”
should be used for a received packet:
The IP source address of the IP header, the IP destination address of
the IP header, the TCP source address of the TCP header, and the TCP
destination address of the TCP header together uniquely define a
single connection context (TCB) with which the packet is associated.
Processor 470 examines these addresses of the TCP and IP headers
and determines the connection context of the packet. Processor 470
then checks a list of connection contexts that are under the control of
INIC card 200 and determines whether the packet is associated with a
connection context (TCB) under the control of INIC card 200.
If the connection context is not in the list, then the “fast-path
candidate” packet is determined not to be a “fast-path packet.” In such
a case, the entire packet (headers 20 and data) is transferred to a buffer
in host 20 for “slow-path” processing by the protocol stack of host 20.
Ex.1001, 036 Patent at 31:7-22 (emphasis added).
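The quoted check could be sketched, for illustration, as a lookup keyed on the four header addresses (a minimal sketch; the names and addresses are hypothetical, not drawn from the patent):

```python
# Hypothetical set of connection contexts (TCBs) under INIC control,
# keyed by the 4-tuple that uniquely defines a connection.
inic_contexts = {
    ("10.0.0.1", "10.0.0.2", 1024, 80): {"state": "ESTABLISHED"},
}

def classify_packet(ip_src, ip_dst, tcp_src, tcp_dst):
    """Return 'fast-path' if the packet's connection context is in the
    list under INIC control; otherwise 'slow-path', in which case the
    entire packet (headers and data) goes to the host protocol stack."""
    context = inic_contexts.get((ip_src, ip_dst, tcp_src, tcp_dst))
    return "slow-path" if context is None else "fast-path"
```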
98. The “context” for each connection “summariz[es] various features of
the connection.” Id. at 7:62-66, 8:2-15, 10:19-22. The host may create the context
by processing an initial request packet, e.g., as part of opening a connection. Id. at
10:19-22.
VII. 036 PATENT PROSECUTION HISTORY
99. I have reviewed the prosecution history of the 036 Patent. I present a
brief summary of the prosecution with respect to claims 1-7 (which correspond to
claims 1-7 in the file history).
100. On November 16, 2005, claims 1-7 were rejected as being anticipated
by U.S. Pat. No. 6,122,670 (“Bennett”). Ex.1002, 036 File History at .259-.263.
On March 31, 2006, Applicant amended claim 1 to include the “context”
limitation. Id. at .273. Applicant also made amendments to claims 3-4 and 6-7.
Id. at .273-.279. Applicant then attempted to distinguish Bennett, arguing that it
does not disclose the context being employed to transfer data, updating state
information, or that the context is updated. Id. at .280.
101. On July 5, 2006, the Examiner rejected claim 1 as being obvious over
Bennett in view of U.S. Pat. No. 6,195,739 (“Wright”). See id. at .289-.293. The
Examiner stated that Bennett did not disclose “running instructions to process a
message packet such that the context is employed to transfer data contained in said
packet to the first apparatus memory and the state information is updated by said
second processor,” but stated that a different reference, Wright, did. See id. at
.308-.309.
102. On October 10, 2006, Applicant amended claim 1 to specify that the
updated state information is “TCP state information.” Id. at .302-.303. Applicant
then argued that Bennett was not enabled, id. at .309-.310, and that Wright was
filed after the effective filing date of the 036 Patent (alleging that the effective
filing date of the 036 Patent is October 14, 1997), id. at .310. Next, Applicant
argued that Wright’s disclosures relate to operation underneath the TCP layer, and
thus it does not disclose “the TCP state information is updated” or “that the context
is employed to transfer data contained in said packet to the first apparatus
memory.” Id. at .311. Finally, Applicant argued that one of ordinary skill in the
art would not have combined Bennett and Wright, and that even if they were
combined, there would still be nonobvious differences over the combination. Id. at
.311-.312.
103. On February 7, 2007, the Examiner issued a notice of allowance, but
it is not clear from this notice what the Examiner’s basis for the allowance was.
VIII. CLAIM CONSTRUCTIONS
A. Legal Standard
104. I understand that in deciding whether to institute inter partes review,
“[a] claim in an unexpired patent shall be given its broadest reasonable
construction in light of the specification of the patent in which it appears.” 37
C.F.R. § 42.100(b). I further understand that “the broader standard serves to
identify ambiguities in the claims that can then be clarified through claim
amendments.” Final Rule, 77 Fed. Reg. 48680, 48699 (Aug. 14, 2012).
105. In forming my opinions as set forth in this declaration, I have
accorded all claim terms in claims 1-7 in the 036 Patent their broadest reasonable
interpretation, as would be understood by a person of ordinary skill in the art at the
time of the alleged invention of the 036 Patent.
106. I was also asked to provide my opinion on how a POSA would have
understood the terms “context” and “prepend” under the broadest reasonable
interpretation standard.
B. “context”
107. The term “context” appears in claim 1 of the 036 Patent. I understand that in
the copending district court litigation, Alacritech takes the position that
“context” means “data regarding an active connection,” while Petitioner has
taken the position that it is indefinite. For my analysis in this Declaration, I
have been asked to use Alacritech’s construction.
C. “prepend”
108. The term “prepend” appears in claim 4 of the 036 Patent. Under the
broadest reasonable construction standard, this term in light of the specification
would have been understood by a POSA to mean “adds to the front.” The
specification defines “prepends” in this manner: “Once the packet control
sequencer 176 detects that all of the packet has been processed by the fly-by
sequencer 178, the packet control sequencer 176 … prepends (adds to the front)
that status information to the packet …” Id. at 14:5-12. This is consistent with
how a POSA would have understood “prepend” in the context of the 036 Patent.
That is, the claimed header is “prepended,” or added to the front, of the data
portion of the packet.
IX. THE PRIOR ART
A. Tanenbaum96: A. Tanenbaum, Computer Networks, 3rd ed. (1996)6
109. Tanenbaum96, “Computer Networks,” is a 700+ page textbook
covering network hardware, software, protocols and standards. It is a third edition
of the 1981 Tanenbaum book. The 1996 edition is cited and incorporated by
reference in the 036 Patent.
110. Tanenbaum96 describes both TCP and UDP protocols. Note that
UDP, unlike TCP, is connectionless and thus does not require setting up a connection:
The Internet has two main protocols in the transport layer, a
connection oriented protocol and a connectionless one. In the
following sections we will study both of them. The connection-
oriented protocol is TCP. The connectionless protocol is UDP.
Ex.1006, Tanenbaum96 at .539.
6 Tanenbaum96 was a well-known resource to a POSA. I understand that it is prior
art because it was published before October 14, 1997, the date to which Alacritech
claims priority. See Ex. 1006, Tanenbaum96.
Id. at .055, Fig. 1-19.
111. Tanenbaum96 recognizes that an “obstacle to fast networking is
protocol software,” and teaches “fast path” processing for TCP as a solution.
Ex.1006 at .583-.585. This “fast path” solution is based on “header prediction.”
See Section V.E.4 above for a description of the development of “header
prediction” and the state of the art with respect to “header prediction.”
112. Tanenbaum96 teaches fast path transmissions using a prototype
header stored in the transport entity, because in the normal case of an established
TCP connection, only a few fields of the header change in consecutive packets.
Compare Section V.B.5.a. (describing complexity of opening a connection, i.e., a
“socket”). In other words, the transport entity only needs to change a few fields to
send subsequent packets:
The first thing the transport entity does is make a test to see if this is
the normal case: the state is ESTABLISHED, neither side is trying to
close the connection, a regular (i.e., not an out-of-band) full TPDU
[Transport Protocol Data Unit, i.e. packet] is being sent, and there is
enough window space available at the receiver. If all conditions are
met, no further tests are needed and the fast path through the sending
transport entity can be taken.
In the normal case, the headers of consecutive data TPDUs are almost
the same. To take advantage of this fact, a prototype header is stored
within the transport entity. At the start of the fast path, it is copied as
fast as possible to a scratch buffer, word by word. Those fields that
change from TPDU to TPDU are then overwritten in the buffer.
Id. at .583 (emphasis added).
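The prototype-header technique in the quoted passage could be sketched, for illustration, as follows (assuming a 20-byte IP header followed by a 20-byte TCP header in network byte order; the function name and the zeroed prototype are hypothetical):

```python
import struct

# Hypothetical 40-byte prototype header (20-byte IP + 20-byte TCP),
# pre-filled once per connection; zeroed here for simplicity.
prototype = bytes(40)

def fast_path_header(seq, ack, checksum):
    """Copy the prototype into a scratch buffer, then overwrite only
    the fields that change from TPDU to TPDU."""
    scratch = bytearray(prototype)                 # fast copy of the prototype
    struct.pack_into("!I", scratch, 24, seq)       # TCP sequence number
    struct.pack_into("!I", scratch, 28, ack)       # TCP acknowledgement number
    struct.pack_into("!H", scratch, 36, checksum)  # TCP checksum
    return scratch
```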
113. Tanenbaum96 teaches that the transport entity can be implemented by
the host operating system, or can be offloaded to the NIC (e.g., as a processor on
the NIC):
The hardware and/or software within the transport layer that does the
work is called the transport entity. The transport entity can be in the
operating system kernel, in a separate user process, in a library
package bound into network applications, or on the network interface
card.
Id. at .498 (underlining added, bold in original).
114. Tanenbaum96 discloses that the TCP transport entity divides data
streams into TCP segments for subsequent transmission (i.e., to make the data the
correct size for the data payload part of the packet). See Section V.B.8.
(segmentation description). The receiving TCP transport entity reconstructs the
byte stream from the received TCP segments.
Each machine supporting TCP has a TCP transport entity, either a
user process or part of the kernel that manages TCP streams and
interfaces to the IP layer. A TCP entity accepts user data streams from
local processes, breaks them up into pieces not exceeding 64K bytes
(in practice, usually about 1500 bytes), and sends each piece as a
separate IP datagram. When IP datagrams containing TCP data arrive
at a machine, they are given to the TCP entity, which reconstructs the
original byte streams.
Id. at .540.
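The segmentation and reassembly just described could be sketched as follows (the 1460-byte MSS is illustrative, corresponding to the roughly 1500-byte pieces Tanenbaum96 mentions):

```python
def segment(byte_stream, mss=1460):
    """Break a user byte stream into pieces no larger than the MSS;
    each piece would be sent as a separate IP datagram."""
    return [byte_stream[i:i + mss] for i in range(0, len(byte_stream), mss)]

def reassemble(segments):
    """Receiving side: reconstruct the original byte stream from
    in-order TCP segments."""
    return b"".join(segments)
```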
115. Tanenbaum96 goes on to describe a TCP prototype header (i.e., a
header template that is used to create additional headers for sending packets) and
offloading protocol processing by the transport entity in detail:
Id. at .584 (emphasis added).
116. Tanenbaum96 also teaches TCP fast path receiving by looking up a
TCP connection record based on the IP source address, TCP source port, IP
destination address and TCP destination address, checking to see if the packet is
a normal one in the ESTABLISHED state, and then putting the data into user
memory. In other words, Tanenbaum96 is teaching that the transport entity
performs this check to determine whether the packet is suitable for fast path
processing. See Section V.E.4. (header prediction offload). Note that there may be
multiple connections on a single computer, and thus when a packet comes in, it
must be checked against the connection records that may represent multiple
connections:
Now let us look at fast path processing on the receiving side…. For
TCP, the connection record can be stored in a hash table for which
some simple function of the two IP addresses and two ports is the key.
Once the connection record has been located, both addresses and both
ports must be compared to verify that the correct record has been
found….
[T]he TPDU [Transport Protocol Data Unit, i.e. packet] is then
checked to see if it is a normal one: the state is ESTABLISHED,
neither side is trying to close the connection, the TPDU is a full one,
no special flags are set, and the sequence number is the one expected.
These tests take just a handful of instructions. If all conditions are
met, a special fast path TCP procedure is called.
The fast path updates the connection record and copies the data to the
user. While it is copying, it also computes the checksum, eliminating
an extra pass over the data. If the checksum is correct, the connection
record is updated and an acknowledgement is sent back. The general
scheme of first making a quick check to see if the header is what is
expected, and having a special procedure to handle that case, is called
header prediction. Many TCP implementations use it.
Ex.1006, Tanenbaum96 at .584-.585 (underlining added, bold in original).
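The receive-side header prediction in the quoted passage could be sketched as follows (a minimal sketch with hypothetical names and values; Python's dict stands in for the hash table keyed on the two addresses and two ports, and the flag and checksum handling are simplified):

```python
# Hypothetical connection-record table, hashed on the 4-tuple key.
records = {
    ("10.0.0.2", "10.0.0.1", 80, 1024): {
        "state": "ESTABLISHED",
        "rcv_nxt": 5000,          # sequence number expected next
        "user_data": bytearray(),
    },
}

def receive_tpdu(src_ip, dst_ip, src_port, dst_port, seq, flags, payload):
    """Quick check that the header is the expected normal case; if so,
    take the fast path (update the record, copy data to the user)."""
    rec = records.get((src_ip, dst_ip, src_port, dst_port))
    if (rec is None or rec["state"] != "ESTABLISHED"
            or flags != "ACK" or seq != rec["rcv_nxt"]):
        return "slow-path"        # fall back to full protocol processing
    rec["user_data"] += payload   # copy to user (checksum computed during copy)
    rec["rcv_nxt"] += len(payload)
    return "fast-path"
```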
117. The “connection record” disclosed in Tanenbaum96 is used to
maintain TCP state:
When an application on the client machine issues a CONNECT
request, the local TCP entity creates a connection record, marks it as
being in the SYN SENT state, and sends a SYN segment. Note that
many connections may be open (or being opened) at the same time on
behalf of multiple applications, so the state is per connection and
recorded in the connection record.
Id. at .549 (emphasis added).
118. The “connection record” is the same as the “Transmission Control
Block (TCB)” described in RFC 793, the TCP protocol specification:
Before we can discuss very much about the operation of the TCP we
need to introduce some detailed terminology. The maintenance of a
TCP connection requires the remembering of several variables. We
conceive of these variables being stored in a connection record called
a Transmission Control Block or TCB.
Ex.1007, RFC 793 at .024 (emphasis added).
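A minimal sketch of such a connection record, holding a small illustrative subset of the variables RFC 793 says must be remembered (the field names are mine, not RFC 793's exact notation):

```python
from dataclasses import dataclass

@dataclass
class TCB:
    """Transmission Control Block: per-connection state remembered
    by the TCP entity (small illustrative subset of variables)."""
    local_ip: str
    local_port: int
    remote_ip: str
    remote_port: int
    state: str = "CLOSED"  # e.g. SYN_SENT, ESTABLISHED, CLOSE_WAIT
    snd_nxt: int = 0       # next sequence number to send
    rcv_nxt: int = 0       # next sequence number expected

def connect(tcb):
    """On a CONNECT request: mark the record SYN_SENT and (conceptually)
    send a SYN segment, as the Tanenbaum96 passage above describes."""
    tcb.state = "SYN_SENT"
    return tcb
```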
119. I describe a TCB and RFC 793 in Section V.B.5.
120. Tanenbaum96 teaches that “[f]or TCP, the connection record can be
stored in a hash table for which some simple function of the two IP addresses and
two ports is the key.” Ex.1006, Tanenbaum96 at .585. Tanenbaum96 thus teaches
the “connection context” as described in the 036 Patent:
IP source address of the IP header, the IP destination address of the IP
header, the TCP source address of the TCP header, and the TCP
destination address of the TCP header [that] together uniquely define
a single connection context (TCB) with which the packet is
associated.
Ex.1001, 036 Patent 31:7-12.
121. Again, there may be multiple connections, and Tanenbaum96 is
teaching a technique to quickly lookup the connection record that corresponds to
the received packet.
122. Like the context of claim 1 of the 036 Patent, the connection record of
Tanenbaum96 is used to transfer data to host memory. Ex.1006, Tanenbaum96 at
.584-.585. Note that Tanenbaum96’s transport entity, when on the NIC,
corresponds to the I/O adapter device of Erickson (see below).
B. U.S. Patent No. 5,768,618 (“Erickson”)7
123. Erickson discloses IO Adapter 314 for protocol offload of fast and
slow applications as shown in Fig. 3:
7 I understand that Erickson is prior art to the 036 Patent because it was filed years
before October 14, 1997, the date to which Alacritech claims priority. See Ex.
1005.
Ex.1005, Erickson at Fig. 3 (annotated).
FIG. 3 is a flow diagram describing the system data flow of fast and
slow applications 302, 304, and 306 compatible with the present
invention. A traditional slow application 306 uses normal streams
processing 308 to send information to a pass-through driver 310. The
pass-through driver 310 initializes the physical hardware registers 320
of the I/O device adapter 314 to subsequently transfer the information
through the I/O device adapter 314 to the commodity interface 322.
With the present invention, fast user applications 302 and 304 directly
use a setup driver 312 to initialize the physical hardware registers 320,
then send the information directly through the I/O device adapter 314
to the commodity interface 322 via virtual hardware 316 and 318.
Thus, the overhead of the normal streams processing 308 and pass-
through driver 310 are eliminated with the use of the virtual hardware
316 and 318 of the present invention, and fast applications 302 and
304 are able to send and receive information more quickly than slow
application 306.
Ex.1005, Erickson at 4:53-5:3 (emphasis added).
124. The IO adapter runs scripts that offload protocol processing. Because it
runs scripts (program code), the I/O adapter includes a processor. The adapter
accesses application data (to transmit over the network) via programmed I/O or
DMA. Control information (to direct the communication) is communicated by
snooping the host memory bus. Specifically, user processes that wish to
communicate over the network open a device driver, and specify the details of the
desired communication mode. The device driver sets up a protocol script and
protocol specific endpoint data for the connection:
Each user process that has access to the virtual hardware is typically
assigned a page-sized area of physical memory on the I/O device
adapter, which is then mapped into the virtual address space of the
user process. The I/O device adapter typically is implemented with
snooping logic to detect accesses within the page-sized range of
memory on the I/O device adapter. If the I/O device adapter detects
access to the physical memory page, a predefined script is then
executed by the I/O device adapter in order to direct the data as
appropriate.
Id. at 5:31-40.
Typically, when a user process opens a device driver, the process
specifies its type, which may include, but is not limited to, a UDP
datagram, source port number, or register address. The user process
also specifies either a synchronous or asynchronous connection. The
device driver sets up the registers 508 and 504, endpoint table 514,
and endpoint protocol data 518. The protocol script 516 is typically
based upon the endpoint data type, and the endpoint protocol data 518
depends on protocol specific data.
Id. at 6:1-9.
Instead, the adapter would most likely retrieve the needed user data
from the user process’ virtual address space using direct memory
access (DMA) into the main memory over the bus and retrieving the
user data into some portion of the adapter’s memory, where it could
be referenced more efficiently. The programming steps performed in
the udpscript( ) procedure above might need to be changed to reflect
that.
Id. at 8:30-37.
125. The endpoint data is stored on the I/O device adapter. The adapter
uses the endpoint data to move data from the adapter to user memory, i.e., when
receiving data packets and performing fast path processing on I/O device adapter:
The I/O device adapter implementation includes a software register
508 and a physical address buffer map 510 in the adapter's memory
512. An endpoint table 514 in the memory 512 is used to organize
multiple memory pages for individual user processes. Each entry
within the endpoint table 514 points to various protocol data 518 in
the memory 512 in order to accommodate multiple communication
protocols, as well as previously defined protocol scripts 516 in the
memory 512, which indicate how data or information is to be
transferred from the memory 512 of the I/O device adapter to the
portions of main memory 502 associated with a user process.
Id. at 5:56-67.
126. Erickson discloses that scripts may be written for a variety of
protocols including TCP/IP:
Each type of protocol will have its own script. Types of protocols
include, but are not limited to, TCP/IP, UDP/IP, BYNET lightweight
datagrams, deliberate shared memory, active message handler, SCSI,
and [Fibre] Channel.
Id. at 5:47-51.
127. Erickson discloses sample user code (running on the host) for
triggering the UDP fast path offload, and also discloses a script that runs in the
adapter:
Id. at 7:19-33.
Id. at 7:50-63.
The script that executes the above function provides the
USERDATA_ ADDRESS and USERDATA_LENGTH which the
user process programmed into the adapter's memory. This information
quite likely varies from datagram 602 to datagram 602. The script is
also passed the appropriate datagram 702 template based on the
specific software register (508 in FIG. 5 or 316 in FIG. 3). There are
different scripts for different types of datagrams 702 (e.g., UDP or
TCP).
Id. at 7:65- 8:6.
128. Before the senduserdatagram is executed in the host, and the udpscript
is run in the adapter, protocol header information (the template) is transferred to
the interface device (IO device adapter) (see also “pre-negotiated” discussion
below). The header information includes the initial values of the checksums and
Datagram ID, the IP Addresses, and MAC addresses:
A user process typically causes a script to execute by using four
virtual registers, which include STARTINGADDRESS, LENGTH,
GO, and STATUS. The user process preferably first writes
information into memory at the locations specified by the values in
the STARTINGADDRESS and LENGTH virtual registers. Next, the
process then accesses the GO virtual register to commence execution
of the script. Finally, the user process accesses or polls the STATUS
virtual register to determine information about the operation or
completion of this I/O request.
Id. at 6:12-21.
Within the udpscript procedure described above, the
nextid() function provides a monotonically increasing 16-
bit counter required by the IP protocol.
Id. at 8:10-12.
Id. at Fig. 7.
129. The dotted fields in Fig. 7 are those that may change during the
transfer. Each new packet may change the remaining Total Length, the Datagram
ID of the next packet, the IP checksum, the UDP length and the UDP Checksum.
The data follows the completed header as shown in Figure 6.
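The per-datagram patching of those dotted fields could be sketched, for illustration, as follows (offsets follow the standard IPv4 header layout; the checksum helper is the RFC 1071 Internet checksum; the function names are hypothetical, standing in for Erickson's udpscript and nextid()):

```python
import struct

def inet_checksum(data):
    """RFC 1071 Internet checksum: one's-complement sum of 16-bit words."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

_datagram_id = 0

def nextid():
    """Monotonically increasing 16-bit counter required by the IP protocol."""
    global _datagram_id
    _datagram_id = (_datagram_id + 1) & 0xFFFF
    return _datagram_id

def fill_ip_template(template, udp_payload_len):
    """Patch the per-datagram fields of a pre-negotiated 20-byte IP
    header template: Total Length, Datagram ID, and IP checksum."""
    hdr = bytearray(template)
    struct.pack_into("!H", hdr, 2, 20 + 8 + udp_payload_len)  # IP Total Length
    struct.pack_into("!H", hdr, 4, nextid())                  # Datagram ID
    struct.pack_into("!H", hdr, 10, 0)                        # clear checksum field
    struct.pack_into("!H", hdr, 10, inet_checksum(bytes(hdr)))
    return hdr
```

A correctly filled header checksums to zero, the usual receive-side validity check.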
130. Erickson discloses that many fields of the header are “pre-negotiated,”
including the source and target IP addresses and source and target MAC (Ethernet)
addresses. The pre-negotiated fields are provided by the host to the I/O adapter8:
In this example, the user process and the device driver has pre-
negotiated the following fields from FIG. 6: (1) Ethernet Header 604
(Target Ethernet Address, Source Ethernet Address, and Protocol
Type); (2) IP Header 606 (Version, IP header Length, Service Type,
Flag, Fragment Offset, Time_to_Live, IP Protocol, IP Address of
Source, and IP Address of Destination); and (3) UDP Header 608
(Source Port and Destination Port). Only the shaded fields in FIG. 6,
and the user data 610, need to be changed on a per-datagram basis.
Ex.1005, Erickson at 6:63-7:4.
131. Specifically, Erickson discloses an exemplary pre-negotiation of
transport-layer UDP/IP/MAC protocol information:
Each user process has basically pre-negotiated almost everything
about the datagram 602, except the actual user data 610. This means
most of the fields in the three header areas 604, 606, and 608 are
predetermined.
In this example, the user process and the device driver has pre-
negotiated the following fields from FIG. 6: (1) Ethernet Header 604
(Target Ethernet Address, Source Ethernet Address, and Protocol
8 See Section V.B.2. for Ethernet (MAC layer) description.
Type); (2) IP Header 606 (Version, IP header Length, Service Type,
Flag, Fragment Offset, Time_to_Live, IP Protocol, IP Address of
Source, and IP Address of Destination); and (3) UDP Header 608
(Source Port and Destination Port). Only the shaded fields in FIG. 6,
and the user data 610, need to be changed on a per-datagram basis.
Id. at 6:57-7:4, see also Figs. 6 and 7.
132. Erickson discloses that after the pre-negotiation, the I/O device
adapter runs protocol scripts to process outgoing and incoming data packets,
thereby offloading the protocol processing onto the I/O device adapter. Id. at 4:18-
23. The scripts are used to locate an application endpoint and to generate packet
headers from a pre-negotiated template header:
Protocol scripts typically serve two functions. The first function is to
describe the protocol the software application is using. This includes
but is not limited to how to locate an application endpoint, and how to
fill in a protocol header template from the application specific data
buffer. The second function is to define a particular set of instructions
to be performed based upon the protocol type. Each type of protocol
will have its own script. Types of protocols include, but are not
limited to, TCP/IP, UDP/IP, BYNET lightweight datagrams,
deliberate shared memory, active message handler, SCSI, and [Fibre]
Channel.
Id. at 5:41-51 (emphasis added).
133. Here, the user process identifies a block of raw data to be transmitted
and “spanks” (i.e. sets to 1) a GO register to trigger the adapter to take the raw
data, encapsulate it into a packet with UDP, IP and MAC headers, and transmit it.
See id. at 7:39-47.
134. In other words, Erickson’s network interface device creates headers
for packets to be transmitted using the pre-negotiated UDP, IP and MAC header
information. A user program (senduserdatagram at 7:22) identifies raw data in
host memory to be transmitted (by providing a USERDATA_ADDRESS and
USERDATA_LENGTH) and then triggers the network interface device (by
“spanking” the GO register) as shown at id. at 7:18-33. In response, the network
interface device executes a UDP protocol script (udpscript at 7:51) that creates
headers from the pre-negotiated context by populating UDP/IP/MAC datagram
template headers as shown in Fig. 7 with appropriate values for IP Length, IP
Datagram ID, IP Checksum, UDP Length and UDP Checksum. The network
interface device then encapsulates the data with the headers, and sends the
completed packet. Id. at 7:39-64.
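The host/adapter handshake described in this paragraph and in the register passage quoted above could be sketched as follows (the class and method names are hypothetical; in Erickson the "registers" are snooped memory locations, modeled here as plain attributes, and run_script() stands in for the adapter's protocol script):

```python
class Adapter:
    """Stand-in for the I/O device adapter; run_script() models the
    protocol script (e.g., udpscript) that builds headers and sends."""
    def __init__(self):
        self.sent = []

    def run_script(self, addr, length):
        self.sent.append((addr, length))  # headers + data would go on the wire


class VirtualRegisters:
    """The four virtual registers a user process uses to drive a script."""
    def __init__(self, adapter):
        self._adapter = adapter
        self.STARTINGADDRESS = 0
        self.LENGTH = 0
        self.STATUS = "IDLE"

    def spank_go(self):
        """Accessing GO commences script execution on the adapter."""
        self.STATUS = "BUSY"
        self._adapter.run_script(self.STARTINGADDRESS, self.LENGTH)
        self.STATUS = "DONE"


def send_user_datagram(regs, user_data_address, user_data_length):
    """Host-side sequence: write address and length, spank GO, poll STATUS."""
    regs.STARTINGADDRESS = user_data_address
    regs.LENGTH = user_data_length
    regs.spank_go()
    while regs.STATUS not in ("DONE", "ERROR"):
        pass  # poll STATUS for completion
    return regs.STATUS
```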
135. Erickson discloses fast-path receiving of data packets by directly
writing the data to the host memory space corresponding to the user process (i.e.,
the fast application), bypassing the protocol stack on the host. Id. at 5:53-67. The
transfer of data directly to the host memory (and from the host memory to the I/O
device adapter) occurs via a Direct Application Interface (DAI):
FIG. 4 is a block diagram describing a direct application interface
(DAI) and routing of data between processes and an external data
connection which is compatible with the present invention. Processes
402 and 404 transmit and receive information directly to and from an
interconnect 410 (e.g., I/O device adapter) through the DAI interface
408. The information coming from the interconnect 410 is routed
directly to a process 402 or 404 by use of virtual hardware and
registers, rather than using a traditional operating system interface
406.
Id. at 5:5-5:14.
136. Erickson refers to a variety of scripts including TCP, but does not
include a sample TCP script. It would have been within the skills of a POSA to
adapt the UDP script for TCP. That adaptation would have been obvious based on
a POSA's knowledge of common implementations of TCP/IP, or based on common
reference texts on TCP/IP such as Tanenbaum96. See Section X (motivations to
combine).
137. Note that the scripts as disclosed in Erickson are simplified and do not
spell out all of the details provided by conventional UDP implementations,
including IP fragmentation for frame lengths exceeding the maximum Ethernet
frame length. A POSA would have understood that the standard functionality of
UDP would be included in the adapter script, and that functionality was within
the ordinary level of knowledge of a POSA well before October 1997. A POSA
would also understand
that analogous code for segmentation would also be required for TCP. Such code
would be within the skills of a POSA (and part of the ordinary knowledge of a
POSA). See Section X (motivations to combine).
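For illustration, the IP fragmentation such an adapter script would need could be sketched as follows (the MTU value is illustrative; fragment offsets are carried in 8-byte units, which is why every fragment but the last carries a multiple of 8 data bytes):

```python
def fragment(payload, mtu=1500, ip_header_len=20):
    """Split a datagram payload into fragments that fit the link MTU.
    Each fragment records its offset (in 8-byte units) and a
    More Fragments (MF) flag."""
    max_data = (mtu - ip_header_len) // 8 * 8  # largest multiple of 8 that fits
    frags = []
    for off in range(0, len(payload), max_data):
        frags.append({
            "offset": off // 8,                   # in 8-byte units
            "MF": off + max_data < len(payload),  # more fragments follow?
            "data": payload[off:off + max_data],
        })
    return frags
```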
X. OBVIOUSNESS COMBINATIONS – MOTIVATIONS TO COMBINE
A. Erickson in Combination with Tanenbaum96
138. Erickson incorporates Tanenbaum81 by reference:
A discussion of the form and structure of TCP sockets and packets,
which are well-known within the art, may be found in many
references, including Computer Networks by Andrew S. Tanenbaum,
Prentice-Hall, New Jersey, 1981, pp. 326-327, 373-377, which is
herein incorporated by reference.
Id. at 4:38-43.
139. The third edition of the 1981 Tanenbaum book was published in
March of 1996, more than one year before the claimed priority date of the 036
Patent. A POSA implementing a TCP script as suggested by Erickson would have
naturally turned to the most recent edition of the Tanenbaum book, Tanenbaum96,
for more details about TCP.
140. In 1996, the Internet and World Wide Web, using TCP/IP, were
growing extremely popular. See generally Section V.A.-B. Erickson expressly
references TCP/IP scripts. Ex.1005, Erickson at 5:41-51. Given this, a POSA at
this time would have been motivated to implement the TCP/IP fast path protocol
processing described by Erickson, using Erickson’s Ethernet I/O device adapter. A
POSA would have further been motivated to consult a reference book on TCP/IP,
such as Tanenbaum96, to do so. At the time, there were a finite number of
networking protocols, particularly ones as popular as TCP/IP, and thus it
would have further been obvious to try to implement TCP/IP using Erickson’s I/O
adapter. See generally Section V.A.-B.
141. As I have described in Section V.A.2. and V.B., a POSA would have
understood TCP/IP well and standards for TCP/IP are set forth in well-known
Request for Comments (RFCs). Accordingly, a POSA would have had a high
expectation of success in implementing TCP/IP on Erickson’s I/O device adapter.
Specifically, the “prototype headers” in Tanenbaum96 are the TCP/IP equivalent
of the UDP/IP header shown in Fig. 7 of Erickson. Ex.1006, Tanenbaum96 at
.584; Ex.1005, Erickson at Fig. 7. The unshaded fields in Tanenbaum96 Fig. 6-50
are those that may change during the TCP/IP transfer, and the dotted fields in
Erickson Fig. 7 are those that may change during the UDP/IP transfer. Id. A
POSA, when adapting Erickson’s UDP script to TCP, would understand that rather
than filling in the UDP Length and Checksum shown in Erickson (for UDP), the
script needs to fill in the TCP Sequence number and Checksum (for TCP). Id. For
a multi-segment TCP send, the initial sequence number is determined by the host
stack, and the sequence number is adjusted by the adapter each time it sends a
packet. See generally Section V.B.4. (discussing TCP sequence numbers). As
noted above, the scripts as disclosed in Erickson are simplified and do not spell out
all of the details provided by conventional UDP implementations, including IP
fragmentation for frame lengths exceeding the maximum Ethernet frame length. A
POSA would understand that the standard functionality of UDP would be included
in the adapter script. A POSA would also understand that analogous code for
segmentation would also be required for TCP. Such code would be within the
skills of a POSA. See Section V.A.-B.
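For illustration only, the field substitution described in this paragraph can be sketched in C. This is my own simplified sketch, not code from Erickson or Tanenbaum96: the struct layouts and function names are hypothetical, omit most real header fields, and are meant only to show that a prototype ("template") header leaves the pre-negotiated fields untouched while a script fills in the few per-packet fields (Length and Checksum for UDP; Sequence number and Checksum for TCP).

```c
/* Illustrative sketch only: hypothetical, simplified header layouts
 * (not the exact layouts of Erickson Fig. 7 or Tanenbaum96 Fig. 6-50). */
#include <stdint.h>

struct udp_proto_hdr {
    uint16_t src_port, dst_port;  /* pre-negotiated; never change      */
    uint16_t length, checksum;    /* filled in per datagram            */
};

struct tcp_proto_hdr {
    uint16_t src_port, dst_port;  /* pre-negotiated; never change      */
    uint32_t seq;                 /* advanced per segment              */
    uint32_t ack;
    uint16_t window;
    uint16_t checksum;            /* recomputed per segment            */
};

/* Fill the per-datagram UDP fields in a copy of the template. */
static struct udp_proto_hdr udp_next(const struct udp_proto_hdr *tmpl,
                                     uint16_t payload_len, uint16_t csum)
{
    struct udp_proto_hdr h = *tmpl;           /* copy prototype header */
    h.length   = (uint16_t)(8 + payload_len); /* 8-byte UDP header     */
    h.checksum = csum;
    return h;
}

/* Fill the per-segment TCP fields: the sequence number advances by the
 * number of payload bytes already sent on this connection. */
static struct tcp_proto_hdr tcp_next(const struct tcp_proto_hdr *tmpl,
                                     uint32_t initial_seq,
                                     uint32_t bytes_sent, uint16_t csum)
{
    struct tcp_proto_hdr h = *tmpl;
    h.seq      = initial_seq + bytes_sent;
    h.checksum = csum;
    return h;
}
```

The only difference between the UDP and TCP cases in this sketch is which few fields are rewritten per packet, which is the point of the comparison above.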
142. Note that both Erickson and Tanenbaum96 disclose an IP prototype
header. Each new packet changes the Identification (the Datagram Id in Erickson
Fig. 7) and Header checksum (IP header checksum in Erickson Fig. 7) in the same
way. Ex.1006, Tanenbaum96 at .584; Ex.1005, Erickson at Fig. 7. This further
illustrates the similarity between the approaches and the easy adaptation of Erickson
to TCP/IP.
143. Tanenbaum96’s teachings of connection records correspond to, for
example, Erickson’s endpoint information, protocol scripts, and pre-negotiated
protocol information. The records and pre-negotiated information include
information about the connection (e.g., sender and receiver addresses) and how to
transfer data for received packets to the host. See above at ¶¶116-20
(Tanenbaum96); ¶¶124-25, 128-30, 132 (Erickson). Accordingly, it would have
been routine to adapt Erickson using Tanenbaum96’s TCP/IP teachings that use
connection records.
144. Similarly, Tanenbaum96’s teachings of fast path TCP processing
using a prototype header and header prediction correspond to, and could be used to
modify, Erickson’s endpoint information, pre-negotiated protocol information,
template header and UDP script to perform TCP protocol processing. Both
Tanenbaum96 and Erickson have a slow and fast path. See above at ¶¶111-12
(Tanenbaum96); ¶123 (Erickson). Both use prototype headers. See above at
¶¶112, 115 (Tanenbaum96); ¶¶128-29 (Erickson). Both include a transport entity
or an I/O adapter to perform the offloaded protocol processing. See above at
¶¶113-14 (Tanenbaum96); ¶124 (Erickson). Accordingly, it would have been
routine to adapt Erickson using Tanenbaum96’s TCP/IP teachings of a prototype
header and header prediction. Moreover, these techniques were well known at this
time. See Section V.C.-G. An exemplary TCP script for Erickson in view of
Tanenbaum96’s transport entity and fast path teachings is as follows. The TCP
script may transfer an entire block (via DMA) to the adapter memory in one large
transfer. The script, knowing the maximum segment size (MSS), sends one MSS
sized block of data at a time. The I/O adapter updates the TCP sequence number in
the connection record on the network device for each segment and any other state
information. This requires only one “spank” of the GO register for a multi-
segment send. The adapter would then repeatedly extract one segment of data at a
time from the transferred block, encapsulate it in a packet, and transmit. The
segmentation code is within the skills of a POSA in light of the disclosures by
Tanenbaum96.
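The multi-segment send described in the preceding paragraph can be sketched in C. Again, this is my own illustrative sketch under stated assumptions, not Erickson's or Tanenbaum96's actual interface: the structure, field names, and function names are hypothetical. It shows only the control flow — one block transferred to the adapter, the adapter cutting off at most one MSS of data per packet and advancing the sequence number in the connection record itself.

```c
/* Illustrative sketch of the multi-segment TCP send described above.
 * All names and structures here are hypothetical. */
#include <stdint.h>
#include <stddef.h>

struct conn_record {
    uint32_t snd_seq;   /* next sequence number to use */
};

/* Stand-in for building and transmitting one segment; the adapter
 * updates the TCP state itself. Returns bytes consumed. */
static size_t transmit_segment(struct conn_record *cr,
                               const uint8_t *data, size_t len)
{
    cr->snd_seq += (uint32_t)len;
    (void)data;          /* real code would encapsulate and send */
    return len;
}

/* One signal from the host covers the whole block: the adapter loops,
 * sending at most one MSS of payload per packet. Returns the number
 * of segments transmitted. */
static unsigned adapter_send_block(struct conn_record *cr,
                                   const uint8_t *block, size_t total,
                                   size_t mss)
{
    unsigned segments = 0;
    size_t off = 0;
    while (off < total) {
        size_t chunk = total - off;
        if (chunk > mss)
            chunk = mss;
        off += transmit_segment(cr, block + off, chunk);
        segments++;
    }
    return segments;
}
```

For example, a 3500-byte block with a 1460-byte MSS would go out as three segments, with the connection record's sequence number advanced by 3500 in total.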
145. Given that Erickson does not detail a bypass test for selecting fast or
slow path, a POSA would be motivated to consider Tanenbaum96’s teachings,
which were well known and proven, see Section V.E.4. (“header prediction”), and
which further reduce the complexity of the I/O device (see below at ¶148).
146. As to receiving packets, Erickson discloses an endpoint table and
protocol scripts which store protocol state information and indicate how data is to
be transferred from the network interface device to portions of main memory
associated with a user process. Ex.1005, Erickson at 5:59-67. A POSA would
understand that Erickson’s endpoint table, pre-negotiated protocol information and
protocol scripts correspond to Tanenbaum96’s connection records, and that
Erickson’s looking up endpoint protocol information in the endpoint table
corresponds to Tanenbaum96’s looking up a connection record to copy data to the
user after a quick check that the packet is what is expected (header prediction).
Ex.1006, Tanenbaum96 at .584-.585. A POSA would therefore be motivated to
use the Tanenbaum96 teachings of header prediction to provide Erickson’s fast
path receive processing for TCP. That is, both work effectively the same:
Tanenbaum96 and Erickson receive data, strip off the headers, and copy the data to
memory. See above at ¶116 (Tanenbaum96); ¶135 (Erickson). A POSA would
have been motivated to apply Tanenbaum96’s teaching for TCP/IP receiving to
Erickson, and would have had a high expectation of success, given that both effectively
accomplish the receiving and copying to memory in the same way despite being different
transport protocols.
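The header-prediction check discussed in this paragraph can be sketched in C. This is my own simplified illustration of the style of bypass test Tanenbaum96 describes, with hypothetical field names: a handful of comparisons decide whether a received segment takes the fast path (copy data to the user, update the connection record) or falls back to the slow path.

```c
/* Illustrative sketch of a header-prediction bypass test in the style
 * Tanenbaum96 describes. Field names are hypothetical. */
#include <stdint.h>

enum tcp_state { STATE_OTHER = 0, STATE_ESTABLISHED = 1 };

struct conn_rec {
    enum tcp_state state;
    int      closing;       /* nonzero if either side is closing */
    uint32_t rcv_next;      /* next expected sequence number     */
};

struct seg_hdr {
    uint32_t seq;
    uint8_t  flags;         /* nonzero = SYN/FIN/RST/URG etc.    */
};

/* Returns 1 if the segment may take the fast path; these few
 * comparisons are the "handful of instructions" quoted above. */
static int bypass_test(const struct conn_rec *cr, const struct seg_hdr *h)
{
    return cr->state == STATE_ESTABLISHED
        && !cr->closing
        && h->flags == 0            /* a "normal" data segment   */
        && h->seq == cr->rcv_next;  /* exactly the expected TPDU */
}
```

Any segment failing the test would be handed to the full (slow-path) protocol stack for processing.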
147. Combining Tanenbaum96’s TCP/IP and header prediction with
Erickson would have been understood as combining known methods to yield
predictable results. For example, TCP/IP was well known. See Section V.B.
Header prediction was well known. See Section V.E.4. Offloading protocol
processing was also generally well known. See Section V.C.-G.
148. A POSA would have been motivated to implement Tanenbaum96’s
header prediction teachings on the Erickson I/O adapter to reduce the complexity
and expense of the I/O adapter. I explain in Section V.C.-F. that various levels of
offloading are possible. Only offloading packets that are for data transfer, not for
setting up a connection, reduces the complexity of the offloading processing.
Ex.1006, Tanenbaum96 at .583 (“The key to fast TPDU processing is to separate
out the normal case (one-way data transfer) and handle it specially. Although a
sequence of special TPDUs are needed to get into the ESTABLISHED state, once
there, TPDU processing is straightforward until one side starts to close the
connection.”). This is because, for example, opening a connection requires several
different types of control packet transmission and receptions. See Section V.B.5.
149. Note that as part of its header prediction teachings, Tanenbaum96
specifically teaches that connection records (corresponding to Erickson’s endpoint
table, pre-negotiated protocol information and protocol scripts) can be stored in a
“hash table for which some simple function of the two IP addresses and two ports
is the key.” Ex.1006, Tanenbaum96 at .584-.585. That is, Tanenbaum96 details a
lookup technique using a “simple function” to implement the bypass test. This
“simple function” could be used to look up the corresponding TCP connection in
the Erickson I/O adapter.
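The hash-table lookup quoted in the preceding paragraph can be sketched in C. The hash function and table size below are arbitrary examples of my own, not anything disclosed by Erickson or Tanenbaum96; the sketch shows only the quoted idea — connection records keyed by "some simple function of the two IP addresses and two ports," with both addresses and both ports then compared to verify the record.

```c
/* Illustrative sketch of connection records in a hash table keyed by
 * a simple function of the two IP addresses and two ports. The hash
 * and table size are arbitrary examples. */
#include <stdint.h>
#include <string.h>

#define TABLE_SIZE 256

struct conn_key {
    uint32_t src_ip, dst_ip;
    uint16_t src_port, dst_port;
};

struct conn_entry {
    int             in_use;
    struct conn_key key;
    uint32_t        snd_seq;    /* example piece of TCP state */
};

static struct conn_entry table[TABLE_SIZE];

/* A deliberately simple mixing function, per the quoted passage. */
static unsigned simple_hash(const struct conn_key *k)
{
    uint32_t h = k->src_ip ^ k->dst_ip
               ^ ((uint32_t)k->src_port << 16) ^ k->dst_port;
    return (h ^ (h >> 8) ^ (h >> 16) ^ (h >> 24)) % TABLE_SIZE;
}

/* After hashing, both addresses and both ports are compared to verify
 * that the correct record was found (no chaining in this sketch). */
static struct conn_entry *lookup(const struct conn_key *k)
{
    struct conn_entry *e = &table[simple_hash(k)];
    if (e->in_use && memcmp(&e->key, k, sizeof *k) == 0)
        return e;
    return NULL;
}

static struct conn_entry *record_insert(const struct conn_key *k,
                                        uint32_t seq)
{
    struct conn_entry *e = &table[simple_hash(k)];
    e->in_use = 1;
    e->key = *k;
    e->snd_seq = seq;
    return e;
}
```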
XI. GROUNDS OF INVALIDITY
150. I detail how the prior art invalidates the claims at issue in the
Appendix A claim chart. In summary, my opinion is that claims 1-7 of the 036
Patent are invalid over Erickson in view of Tanenbaum96.
151. I declare that all statements made herein on my own knowledge are
true and that all statements made on information and belief are believed to be true,
and further, that these statements were made with the knowledge that willful false
statements and the like so made are punishable by fine or imprisonment, or both,
under Section 1001 of Title 18 of the United States Code.
Respectfully submitted,
[signature]
Robert Horst, Ph.D.
Date: April 17, 2017
APPENDIX A
TABLE OF CONTENTS
[1.P.1] A device for use with a first apparatus that is connectable to a second apparatus
[1.P.2] the first apparatus containing a memory and a first processor
[1.P.3] [a first processor] operating a stack of protocol processing layers that create a context for communication, the context including a media access control (MAC) layer address, an Internet Protocol (IP) address and Transmission Control Protocol (TCP) state information, the device comprising:
[1.1] a communication processing mechanism connected to the first processor,
[1.2] said communication processing mechanism containing a second processor
[1.3] [second processor] running instructions to process a message packet such that the context is employed to transfer data contained in said packet to the first apparatus memory and
[1.4] [second processor running instructions to process a message packet such that] the TCP state information is updated by said second processor.
[2.1] The device of claim 1, wherein said communication processing mechanism includes a receive sequencer with directions to classify said packet, wherein said packet contains control information corresponding to the stack of protocol layers.
[3.1] The device of claim 1, wherein said communication processing mechanism includes a receive sequencer with directions to generate a summary of a second message packet received from the network, said second packet containing control information corresponding to the stack of protocol layers, and said instructions including an instruction to compare said summary with said context.
[4.1] The device of claim 1, wherein said instructions include a first instruction to create a header corresponding to said context and having control information corresponding to several of the protocol processing layers, and
[4.2] said instructions include a second instruction to prepend said header to second data for transmission of a second packet.
[5.1] The device of claim 1, wherein said communication processing mechanism has a direct memory access unit to send, based upon said context, said data from said communication processing mechanism to the first apparatus memory,
[5.2] without a header accompanying said data.
[6.1] The device of claim 1, wherein said context includes a receive window of space in the memory that is available to store application data, and said communication processing mechanism advertises said receive window.
[7.1] The device of claim 1, wherein said context includes TCP ports of said first and said second apparatuses.
U.S. Pat. No. 5,768,618 (“Erickson”) in view of Tanenbaum96
[1.P.1] A device for use with a first apparatus that is connectable to a second apparatus To the extent that the preamble is limiting, Erickson discloses a device for use with a first apparatus that is connectable to a second apparatus. Specifically, Erickson discloses an “I/O device adapter” (a device) that is connected to, and for use with, the host “computer” (a first apparatus) and a “receiver” (a second apparatus) that are connectable over a network:
To overcome the limitations in the prior art described above, and to overcome other limitations that will become apparent upon reading and understanding the present specification, the present invention discloses a method of controlling an input/output (I/O) device connected to a computer to facilitate fast I/O data transfers.
Ex.1005, Erickson at 1:63-67.⁹ Erickson also refers to the “computer” as a “sender”:
FIG. 1 is a flow diagram illustrating a conventional I/O data flow between a sender and a receiver. At 102, a sender application sends information across the memory bus to a user buffer 104, which in turn is then read back across the memory bus by protocol modules 110. The information is subsequently buffered through the operating system kernel 108 before it is sent out through conventional network interface 114 to the network media access control (MAC) 116. It will be noted that in this system model, the data makes at least three trips across the memory bus at S2, S3 and S5. For the receiving application, the steps are reversed from those of the sender application, and once again the data makes at least three trips across the memory
⁹ Emphasis added unless otherwise noted.
bus at R1, R4, and R5.
Id. at 3:23-36.
Maintaining security between multiple software processes is important when sharing a single I/O device adapter. If the I/O device adapter controls a network interface, such as an Ethernet device, then the access rights granted to the user process by the operating system could be analogous to a Transmission Control Protocol (TCP) address or socket.
Id. at 4:28-33. Note that Erickson discloses that the sender, i.e., the host computer, of its invention includes an I/O device with a fast path for such network connections. See id. at 1:63-67; 4:53-5:5. Otherwise, it connects (is connectable) to a second apparatus in the same way relative to the “conventional” disclosure above. I annotate these components below:
Id. at Fig. 3. Accordingly, Erickson in view of Tanenbaum96 discloses a device (I/O device) for use with a first apparatus (host computer, i.e., sender) that is connectable to a second apparatus (second computer, i.e., receiver).
U.S. Pat. No. 5,768,618 (“Erickson”) in view of Tanenbaum96 [1.P.2] the first apparatus containing a memory and a first processor
To the extent that the preamble is limiting, Erickson discloses that the first apparatus contains a memory and a first processor. It is well known by those of ordinary skill in the art, and certainly obvious, that a “computer” (which Erickson discloses) includes memory (e.g., main memory) and a processor (containing a memory and a first processor). Id. at 1:63-67; see also id. at 9:48 (“memory of computer”) and Fig. 5 (“main memory”). For example, the computer includes user processes that open a device driver, which means that the processor of the host computer executes software for the user process instance and for opening of the device driver, and which further means that the host computer is utilizing its memory to both store and execute the user process and device driver. See also id. at 2:54-58 (describing the “user processes in a single computer node,” i.e., the user processes run on the host computers).
Typically, when a user process opens a device driver, the process specifies its type, which may include, but is not limited to, a UDP datagram, source port number, or register address.
Id. at 6:1-4. Accordingly, Erickson in view of Tanenbaum96 discloses that the first apparatus (host computer) contains a memory (e.g., main memory of host computer) and a first processor (its CPU).
U.S. Pat. No. 5,768,618 (“Erickson”) in view of Tanenbaum96 [1.P.3] [a first processor] operating a stack of protocol processing layers that create a context for communication, the context including a media access control (MAC) layer address, an Internet Protocol (IP) address and Transmission Control Protocol (TCP) state information, the device comprising: To the extent that the preamble is limiting, Erickson in view of Tanenbaum96 discloses that the first processor of the host computer operates a stack of protocol processing layers that create a context for communication, the context including a media access control (MAC) layer address, an Internet Protocol (IP) address and Transmission Control Protocol (TCP) state information. The host computer of Erickson operates a stack of protocol layers for “normal streams processing” for “slow applications”:
Id. at Fig. 3.
FIG. 3 is a flow diagram describing the system data flow of fast and slow applications 302, 304, and 306 compatible with the present invention. A traditional slow application 306 uses normal streams processing 308 to send information to a pass-through driver 310. The pass-through driver 310 initializes the physical hardware registers 320 of the I/O device adapter 314 to
subsequently transfer the information through the I/O device adapter 314 to the commodity interface 322.
Id. at 4:52-61. As shown above, Erickson also discloses a fast-path (for “fast applications”), wherein the I/O device performs some of the protocol processing:
With the present invention, fast user applications 302 and 304 directly use a setup driver 312 to initialize the physical hardware registers 320, then send the information directly through the I/O device adapter 314 to the commodity interface 322 via virtual hardware 316 and 318. Thus, the overhead of the normal streams processing 308 and pass-through driver 310 are eliminated with the use of the virtual hardware 316 and 318 of the present invention, and fast applications 302 and 304 are able to send and receive information more quickly than slow application 306. As a result, the present invention provides higher bandwidth, less latency, less system overhead, and shorter path lengths.
Id. at 4:61-5:5. Erickson discloses that a user process on the host computer creates a context for communication by (1) opening a device driver and specifying the protocol type (e.g. UDP or TCP), source port number or address, whether the connection is synchronous or asynchronous, and setting up memory mapped registers, an endpoint table and endpoint protocol data used for a protocol-specific script, and (2) pre-negotiating connection details including a template header.
Typically, when a user process opens a device driver, the process specifies its type, which may include, but is not limited
to, a UDP datagram, source port number, or register address. The user process also specifies either a synchronous or asynchronous connection. The device driver sets up the registers 508 and 504, endpoint table 514, and endpoint protocol data 518. The protocol script 516 is typically based upon the endpoint data type, and the endpoint protocol data 518 depends on protocol specific data.
Id. 6:1-9.
In the present application, the access privileges given to the user processes are very narrow. Each user process has basically pre-negotiated almost everything about the datagram 602, except the actual user data 610. This means most of the fields in the three header areas 604, 606, and 608 are predetermined. In this example, the user process and the device driver has pre-negotiated the following fields from FIG. 6: (1) Ethernet Header 604 (Target Ethernet Address, Source Ethernet Address, and Protocol Type); (2) IP Header 606 (Version, IP header Length, Service Type, Flag, Fragment Offset, Time_to_Live, IP Protocol, IP Address of Source, and IP Address of Destination); and (3) UDP Header 608 (Source Port and Destination Port). Only the shaded fields in FIG. 6, and the user data 610, need to be changed on a per-datagram basis.
Id. at 6:57-7:4.
The script is also passed the appropriate datagram 702 template based on the specific software register (508 in FIG. 5 or 316 in FIG. 3). There are different scripts for different types of datagrams 702 (e.g., UDP or TCP). Also, the script would most likely make a copy of the datagram 702 template (not shown
here), so that multiple datagrams 602 for the same user could be simultaneously in transit.
Id. at 8:2-9; see also id. at Figs. 6-7.
Ex.1005, Erickson at Fig. 7 (as shown above, this pre-negotiated information includes the Ethernet (MAC) address and IP addresses); see also Sections V.G.2-3,5 (describing MAC and IP addresses and opening a socket with these addresses). As shown above, Erickson discloses that this context includes “almost everything” concerning a UDP datagram “except the actual user data.” “[A]lmost everything” refers to, for example, the protocol information for the headers of the protocol layers. “[U]ser data” refers to the data payload that is part of the packets (the non-header parts of the packets). This “almost everything,” as described above and shown in Figs. 6 and 7, is Erickson’s pre-negotiated context, and includes a MAC layer address, IP address and UDP address. See id. at Figs. 6-7. Accordingly, Erickson teaches that “the context including a media access control (MAC) layer address, an Internet Protocol (IP) address.” As to the TCP state information, the above Erickson exemplary context is UDP over IP. UDP is a connectionless protocol. That is, single packets are sent without establishing a “connection.” See above at ¶109. However, Erickson also discloses protocol scripts for the other protocols including TCP/IP:
Protocol scripts typically serve two functions. The first function is to describe the protocol the software application is using. This includes but is not limited to how to locate an application endpoint, and how to fill in a protocol header template from the application specific data buffer. The second function is to define a particular set of instructions to be performed based upon the protocol type. Each type of protocol will have its own script. Types of protocols include, but are not limited to, TCP/IP, UDP/IP, BYNET lightweight datagrams, deliberate
shared memory, active message handler, SCSI, and File Channel.
Id. at 5:41-51. A person having ordinary skill in the art (POSA) would have been motivated to consider Tanenbaum96’s teachings to implement the TCP/IP connection on Erickson’s I/O device. See Section X (motivations to combine). Unlike UDP, TCP requires establishing a connection before sending a packet. See Section V.B.4. (describing TCP layer). Tanenbaum96 teaches that for TCP, only connections in the ESTABLISHED state should be processed on the fast path.
The key to fast TPDU [i.e. packet] processing is to separate out the normal case (one-way data transfer) and handle it specially. Although a sequence of special TPDUs are needed to get into the ESTABLISHED state, once there, TPDU processing is straightforward until one side starts to close the connection. Let us begin by examining the sending side in the ESTABLISHED state when there are data to be transmitted. … The first thing the transport entity does is make a test to see if this is the normal case: the state is ESTABLISHED, neither side is trying to close the connection, a regular (i.e., not an out-of-band) full TPDU is being sent, and there is enough window space available at the receiver. If all conditions are met, no further tests are needed and the fast path through the sending transport entity can be taken.
Ex.1006, Tanenbaum96 at .583. To enter the ESTABLISHED state, a series of control packets are sent back
and forth between the sender and receiver. See Sections V.B.4-5 (explaining TCP/IP connections and opening a connection). Tanenbaum96 is teaching that this series of control packets are sent and received on the slow path, i.e., is operating the stack of protocol processing layers to open the connection. As I describe above at ¶¶33-35, 146, establishing the connection on the slow path reduces the complexity of the offloading device. This slow path corresponds to Erickson’s “normal stream processing” for the slow applications which operates the protocol processing layers. Accordingly, a POSA would understand that, in view of Tanenbaum96, Erickson’s host operates the stack of protocol processing layers to create a TCP connection using Erickson’s slow path. Once the connection is in the ESTABLISHED state, the host uses the fast path for TCP communication. It would have been routine to modify Erickson’s UDP/IP fast path context to support TCP/IP based on the TCP/IP prototype header disclosed in Tanenbaum96:
Ex.1006, Tanenbaum96 at .584. As the above teaches, the TCP headers on a series of TCP/IP packets require only changing a few fields, such as the sequence number. As Tanenbaum96 teaches above, it was simple to change a few fields to create the new headers, thereby offloading the protocol processing. It further would have been routine to modify Erickson’s UDP/IP fast path context to include the TCP connection records described in Tanenbaum96:
Now let us look at fast path processing on the receiving side…. For TCP, the connection record can be stored in a hash table for which some simple function of the two IP addresses and two ports is the key. Once the connection record has been located,
both addresses and both ports must be compared to verify that the correct record has been found…. the TPDU [Transport Protocol Data Unit, i.e. packet] is then checked to see if it is a normal one: the state is ESTABLISHED, neither side is trying to close the connection, the TPDU is a full one, no special flags are set, and the sequence number is the one expected. These tests take just a handful of instructions. If all conditions are met, a special fast path TCP procedure is called.
The fast path updates the connection record and copies the data to the user. While it is copying, it also computes the checksum, eliminating an extra pass over the data. If the checksum is correct, the connection record is updated and an acknowledgement is sent back. The general scheme of first making a quick check to see if the header is what is expected, and having a special procedure to handle that case, is called header prediction. Many TCP implementations use it.
Ex.1006, Tanenbaum96 at .584-.585 (underlining added, bold in original). The “connection records” disclosed in Tanenbaum96 are used to maintain TCP state:
When an application on the client machine issues a CONNECT request, the local TCP entity creates a connection record, marks it as being in the SYN SENT state, and sends a SYN segment. Note that many connections may be open (or being opened) at the same time on behalf of multiple applications, so the state is per connection and recorded in the connection record.
Ex.1006, Tanenbaum96 at .549 (underlining added). This state information includes, for example, the TCP sequence number and
window size (see above Tanenbaum96 quote at .584). See, e.g., above at ¶¶ 31, 39-41. Tanenbaum96, for example, notes that the sequence number is changed between packets. The connection record stores this sequence number and is used for sending and receiving packets corresponding to the respective connection. See Ex.1006, Tanenbaum96 at .583-.584. Erickson in view of Tanenbaum96 thus teaches “the context including a media access control (MAC) layer address, an Internet Protocol (IP) address and Transmission Control Protocol (TCP) state information.” Note that the IP and MAC header information is the same, and thus Erickson’s disclosures discussed above apply. Tanenbaum96 teaches the TCP state information as part of the connection record. To the extent that Erickson does not expressly disclose the host operating the stack of protocol layers to pre-negotiate the connection record, it is also my opinion that it would be obvious in view of Tanenbaum96. Tanenbaum96 teaches a bypass test that separates processing between a fast and slow path. In Tanenbaum96’s bypass test, a TCP connection must be in the ESTABLISHED state for fast path processing. Id. at .584-.585. Tanenbaum96 teaches that checking for established connections requires only a handful of instructions, and that packet processing for established connections is straightforward. Id. at .583.
This is because checking a packet against a connection record to determine whether it is in an ESTABLISHED state requires merely checking a header against entries in a table (see claims 2-3 below), and processing a data transfer packet is also straightforward, e.g., creating the packets with headers and data portions (see claim 4). On the other hand, handling control packets to, for example, open or close a connection requires much more processing. See Section V.B.5. (describing opening a connection). Accordingly, to reduce the complexity of the offloading device (e.g., Erickson’s I/O device), it was known (as Tanenbaum96 teaches) to handle only ESTABLISHED connections on the fast path. See id. at .583-.584. Thus, a POSA would have been motivated to apply these teachings of Tanenbaum96 to Erickson. See
Section X.A. (motivations to combine). Specifically, a POSA would have been motivated to have the host of Erickson operate the stack of protocol processing layers to establish the connection (its “Slow Applications” path), which creates the context (the connection record), and then process subsequent packets on the Erickson I/O device. Accordingly, Erickson in view of Tanenbaum96 discloses that the first processor of the host computer operates a stack of protocol processing layers (its slow path stack for normal stream processing to set up a connection) that create a context for communication (registers 508 and 504, endpoint table 514, and endpoint protocol data 518, TCP protocol script and the pre-negotiated protocol information), the context including a media access control (MAC) layer address, an Internet Protocol (IP) address and Transmission Control Protocol (TCP) state information (address information to send the TCP/IP packet over the network that Erickson pre-negotiates, as well as TCP fields such as the sequence number and window size).
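To illustrate the claimed context, the connection record described above can be sketched as a simple data structure. This is my own illustrative Python sketch; the field names and example values are assumptions, not code from Erickson or Tanenbaum96:

```python
from dataclasses import dataclass

@dataclass
class ConnectionRecord:
    # Pre-negotiated header information (cf. Erickson's registers and
    # endpoint protocol data).
    mac_addr: bytes      # MAC-layer address for the Ethernet header
    ip_addr: str         # IP address for the IP header
    # TCP state information (cf. Tanenbaum96's connection record).
    state: str           # e.g. "SYN_SENT", "ESTABLISHED"
    seq_num: int         # sequence number, updated between packets
    window: int          # advertised window size

# Hypothetical example values, for illustration only.
ctx = ConnectionRecord(b"\x00\x11\x22\x33\x44\x55", "10.0.0.2",
                       "ESTABLISHED", 100, 8192)
```

The sketch shows how a single record can hold the MAC address, IP address, and per-connection TCP state together, as the claim element recites.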
U.S. Pat. No. 5,768,618 (“Erickson”) in view of Tanenbaum96
[1.1] a communication processing mechanism connected to the first processor,
Erickson discloses a communication processing mechanism connected to the first processor. Specifically, the I/O device of Erickson (the “device”) includes a communication processing mechanism as shown in the below annotated figure, which includes a second processor (see below element [1.2]) with scripts to send data (1) to the second apparatus (via a commodity interface that handles the physical connection), and (2) when receiving data from the second apparatus, to host endpoint applications.
FIG. 3 is a flow diagram describing the system data flow of fast and slow applications 302, 304, and 306 compatible with the present invention. A traditional slow application 306 uses normal streams processing 308 to send information to a pass-through driver 310. The pass-through driver 310 initializes the physical hardware registers 320 of the I/O device adapter 314 to subsequently transfer the information through the I/O device adapter 314 to the commodity interface 322. With the present invention, fast user applications 302 and 304 directly use a setup driver 312 to initialize the physical hardware registers 320, then send the information directly through the I/O device adapter 314 to the commodity interface 322 via virtual hardware 316 and 318. Thus, the overhead of the normal streams processing 308 and pass-through driver 310 are eliminated with the use of the virtual hardware 316 and 318 of the present invention, and fast applications 302 and 304 are able to send and receive information more quickly than slow application 306. As a result, the present invention provides higher bandwidth, less latency, less system overhead, and shorter path lengths.
FIG. 4 is a block diagram describing a direct application interface (DAI) and routing of data between processes and an external data connection which is compatible with the present invention. Processes 402 and 404 transmit and receive information directly to and from an interconnect 410 (e.g., I/O
device adapter) through the DAI interface 408. The information coming from the interconnect 410 is routed directly to a process 402 or 404 by use of virtual hardware and registers, rather than using a traditional operating system interface 406.
Ex.1005, Erickson at 4:53-5:14, see also id. at 4:18-23 (running scripts). The communication processing mechanism of the I/O device is connected to the first processor through standard device buses:
FIG. 2 is a block diagram illustrating a virtual hardware memory organization compatible with the present invention. I/O device adapters on standard I/O buses, such as ISA, EISA, MCA, or PCI buses, frequently have some amount of memory and memory-mapped registers which are addressable from a device driver in the operating system.
Id. at 3:36-42. Accordingly, Erickson discloses a communication processing mechanism (the processor of the I/O device) connected to the first processor (via buses and address mapping).
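The bus-addressable, memory-mapped register access described above can be sketched as a toy model. This is illustrative only; the base address, register offsets, and the GO-register trigger are hypothetical, loosely modeled on Erickson’s description, not actual hardware definitions:

```python
# Toy model of memory-mapped I/O device registers (illustrative only).
# On real hardware these would be bus addresses mapped into the
# driver's or user process's address space.

class IODeviceRegisters:
    BASE = 0xFEB00000               # hypothetical bus address of register window
    GO, ADDR, LEN = 0x0, 0x4, 0x8   # hypothetical register offsets

    def __init__(self):
        self._regs = {}
        self.script_ran = False

    def write32(self, offset, value):
        self._regs[offset] = value & 0xFFFFFFFF
        if offset == self.GO:       # writing GO triggers the device's script
            self.run_script()

    def read32(self, offset):
        return self._regs.get(offset, 0)

    def run_script(self):
        self.script_ran = True      # stand-in for executing a protocol script
```

The point of the sketch is that the host sets up a transfer by writing registers over the bus, then triggers the device, consistent with Erickson’s “spanking” of a GO register discussed later.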
U.S. Pat. No. 5,768,618 (“Erickson”) in view of Tanenbaum96
[1.2] said communication processing mechanism containing a second processor
Erickson discloses that the communication processing mechanism contains a second processor. Specifically, the I/O device includes memory and runs scripts:
An I/O device adapter typically can have an arbitrary amount of random access memory (RAM) ranging from several hundred kilobytes to several megabytes, which may be used for mapping several user processes in a single communications node.
Id. at 5:27-31.
A script is prepared by the operating system for the I/O device adapter to execute each time the specific user process programs its specific virtual hardware. The user process is given a virtual address in the user process' address space that allows the user process very specific access capabilities to the I/O device adapter.
Id. at 4:18-23; see also id. at 7:48-8:26 (example script). I annotate these on Figure 5 of Erickson below:
Id. at Fig. 5 (annotated). It would have been understood, and certainly obvious, that the I/O device of Erickson includes a processor because it executes scripts in a high-level language, which requires a processor. See, e.g., id. at 7:48-8:26 (example high-level script). Further, Erickson discloses that the I/O device computes the checksum via a function call, which would be understood as using the CPU to perform arithmetic functions to compute this value (i.e., using a processor). See id. Note that the scripts are in an uninterpreted language, meaning that a processor must first compile the scripts into an instruction set for the processor and then execute them. See, e.g., id. at 7:48-8:26 (example script). Note also that a processor is, by definition, a device that interprets and executes commands, i.e., the scripts of Erickson. Ex.1037, Computer Dictionary, Microsoft (1994) at .010, .011. Accordingly, Erickson discloses that the communication processing
mechanism contains a second processor (the processor of the I/O device that executes the scripts).
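The manner in which a processor interprets and executes such a per-protocol script can be sketched as a toy interpreter. This is illustrative only; the operation names and the byte-sum placeholder checksum are my own assumptions and are not Erickson’s actual script language:

```python
# Toy interpreter: a "script" is a list of operations that the device's
# processor executes in order (operation names are hypothetical).

def run_script(script, packet):
    for op, arg in script:
        if op == "prepend":
            # Add a (pre-built) header in front of the payload.
            packet = arg + packet
        elif op == "checksum":
            # Placeholder checksum: append a single byte-sum (NOT the
            # real Internet checksum; illustration only).
            packet += bytes([sum(packet) % 256])
    return packet

# A hypothetical UDP-style script: prepend a header, then checksum.
udp_script = [("prepend", b"UDP"), ("checksum", None)]
```

The sketch reflects the point made above: executing even a simple script of this kind requires a processor that interprets and executes commands.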
U.S. Pat. No. 5,768,618 (“Erickson”) in view of Tanenbaum96
[1.3] [second processor] running instructions to process a message packet such that the context is employed to transfer data contained in said packet to the first apparatus memory and Erickson in view of Tanenbaum96 discloses running instructions to process a message packet such that the context is employed to transfer data contained in said packet to the first apparatus memory. Specifically, Erickson discloses running scripts, i.e., instructions:
A script is prepared by the operating system for the I/O device adapter to execute each time the specific user process programs its specific virtual hardware. The user process is given a virtual address in the user process' address space that allows the user process very specific access capabilities to the I/O device adapter.
Id. at 4:18-23; see also id. at 7:48-8:26 (example script). The user process invokes a script for the protocol to be used for the connection. The particular set of instructions for that script is part of the context.
The second function is to define a particular set of instructions to be performed based upon the protocol type. Each type of protocol will have its own script. Types of protocols include, but are not limited to, TCP/IP, UDP/IP, BYNET lightweight datagrams, deliberate shared memory, active message handler, SCSI, and File Channel.
Id. at 5:45-51. These scripts include processing incoming data and transferring that data to the memory of the first apparatus (the memory of the host computer that corresponds to the user process that is ultimately receiving the user data) by employing fields of the context (present in, e.g., registers 504 and 508, endpoint table 514, and endpoint protocol data 518):
FIG. 5 is a block diagram illustrating the system organization between a main memory and an I/O device adapter memory which is compatible with the present invention. The main memory 502 implementation includes a hardware register 504 and a buffer pool 506. The I/O device adapter implementation includes a software register 508 and a physical address buffer map 510 in the adapter's memory 512. An endpoint table 514 in the memory 512 is used to organize multiple memory pages for individual user processes. Each entry within the endpoint table 514 points to various protocol data 518 in the memory 512 in order to accommodate multiple communication protocols, as well as previously defined protocol scripts 516 in the memory 512, which indicate how data or information is to be transferred from the memory 512 of the I/O device adapter to the portions of main memory 502 associated with a user process.
Id. at 5:53-67. Recall that the user process, via the device driver, sets up registers 504 and 508, endpoint table 514, and endpoint protocol data 518 to create parts of the context. Id. at 6:1-9; see also id. at 6:57-7:4 (pre-negotiating, i.e., providing to the I/O device, header information for the context). Erickson’s protocol scripts plus other context information (present in, e.g., registers 504 and 508, endpoint table 514, endpoint protocol data 518, and the protocol script) include instructions to process a message packet such that the context is employed to transfer data contained in said packet to the first apparatus memory, i.e., to transfer incoming data “from the memory 512 of the I/O device adapter to the portions of main memory 502 associated with a process.” Id. at 5:53-67. Erickson further details the transfer to host memory, depicting the I/O device receiving data (adapter of I/O device 410) and directly providing it to a user process (via memory of the host computer) in Figure 4:
FIG. 4 is a block diagram describing a direct application interface (DAI) and routing of data between processes and an external data connection which is compatible with the present invention. Processes 402 and 404 transmit and receive information directly to and from an interconnect 410 (e.g., I/O device adapter) through the DAI interface 408. The information coming from the interconnect 410 is routed directly to a process 402 or 404 by use of virtual hardware and registers, rather than using a traditional operating system interface 406.
Ex.1005, Erickson at 5:6-5:14; see also 4:53-5:5 and Fig. 3 (illustrating that I/O device 314 sends data to applications 302 and 304 that reside within the memory of the host computer). Accordingly, Erickson in view of Tanenbaum96 discloses running instructions (specified by the scripts) to process a message packet such that the context (including the script, registers 508 and 504, endpoint table 514, endpoint protocol data 518, pre-negotiated information, and pointer to main memory) is employed (to identify the relevant protocol script to run, and further to identify where to write the received data) to transfer data contained in said packet to the first apparatus memory (memory of the host computer).
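The context-driven transfer of received data into host memory can be sketched as follows. This is an illustrative Python sketch; the endpoint-table layout and process names are hypothetical, modeled loosely on Erickson’s Figure 5, and are not Erickson’s actual structures:

```python
# Illustrative routing of received data to a process's host memory via an
# endpoint table (the "context"). Names and layout are hypothetical.

host_memory = {}            # process name -> list of delivered buffers

endpoint_table = {          # context: endpoint -> destination process
    ("10.0.0.2", 80): {"process": "fast_app_1"},
}

def deliver(packet):
    entry = endpoint_table.get((packet["dst_ip"], packet["dst_port"]))
    if entry is None:
        return False        # no context available: fall back to slow path
    # Employ the context to copy the payload into the process's memory.
    host_memory.setdefault(entry["process"], []).append(packet["data"])
    return True
```

The sketch shows the context being “employed” in the claimed sense: it identifies where in host memory the packet’s data should land.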
U.S. Pat. No. 5,768,618 (“Erickson”) in view of Tanenbaum96
[1.4] [second processor running instructions to process a message packet such that] the TCP state information is updated by said second processor.
Erickson in view of Tanenbaum96 discloses the second processor running instructions to process a message packet such that the TCP state information is updated by the second processor. Specifically, Tanenbaum96 discloses:
…
Ex.1006, Tanenbaum96 at .584-.585. The “connection records” disclosed in Tanenbaum96 are used to maintain TCP state:
When an application on the client machine issues a CONNECT request, the local TCP entity creates a connection record, marks it as being in the SYN SENT state, and sends a SYN segment. Note that many connections may be open (or being opened) at the same time on behalf of multiple applications, so the state is per connection and recorded in the connection record.
Id. at .549. This connection record includes the connection information for the connection, which is often referred to as the TCB (transmission control block). See Section V.B.7.
Id. at .584. Erickson has an analogous disclosure (to, e.g., updating the TCP sequence number) in which it updates a 16-bit IP counter between packets.
Within the udpscript procedure described above, the nextid( ) function provides a monotonically increasing 16-bit counter required by the IP protocol.
Ex.1005, Erickson at 8:10-12. Accordingly, Erickson in view of Tanenbaum96 teaches the TCP state information is updated by said second processor (the connection record, which includes TCP state information, e.g., sequence number).
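The behavior of Erickson’s nextid() function — a monotonically increasing 16-bit counter — can be sketched as follows. This is my own illustration of such a counter with 16-bit wraparound; Erickson does not show the implementation in this form:

```python
# Illustrative sketch of a monotonically increasing 16-bit counter of the
# kind Erickson's nextid() provides for the IP protocol. The wraparound
# behavior at 2**16 is an assumption implied by the 16-bit width.

_ip_id = 0

def nextid():
    global _ip_id
    _ip_id = (_ip_id + 1) & 0xFFFF   # increment, constrained to 16 bits
    return _ip_id
```

Updating such a per-connection counter between packets is the same kind of state update that the second processor performs on the TCP sequence number.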
U.S. Pat. No. 5,768,618 (“Erickson”) in view of Tanenbaum96
[2.1] The device of claim 1, wherein said communication processing mechanism includes a receive sequencer with directions to classify said packet, wherein said packet contains control information corresponding to the stack of protocol layers.
Erickson in view of Tanenbaum96 discloses wherein said communication processing mechanism includes a receive sequencer with directions to classify said packet, wherein said packet contains control information corresponding to the stack of protocol layers. As an initial matter, note that Erickson discloses a slow path (Slow Application) and fast path (Fast Application) and thus teaches the concept of classifying packets for each path. See id. at Fig. 3. Erickson also teaches classifying packets by protocol type because each requires a different script. See id. at 5:41-51. In view of these disclosures, it would have been obvious to distinguish between fast and slow path processing using Tanenbaum96’s teachings, namely, its “header prediction.” See Section X.A. (discussing motivations to combine). First, Tanenbaum96 discloses a receive sequencer (as part of its “transport entity,” which is a processor executing instructions and that receives a sequence of packets for respective connections). Note that the “transport entity” of Tanenbaum96, consistent with the I/O device of Erickson, may reside on the network interface card. Ex.1006, Tanenbaum96 at .515-.516. Including this hardware in Erickson would, as Tanenbaum96 teaches below, perform the bypass test by, e.g., checking the sequence number as part of its “header prediction,” checking the connection record, and classifying the packet according to fast or slow path (Erickson Fast Application or Slow Application):
Ex.1006, Tanenbaum96 at .585. Note the “transport entity” (a processor executing instructions) of Tanenbaum96 performs the testing:
Id. at .583. Second, as shown above, Tanenbaum96’s transport entity uses “instructions,” that is, it has “directions to classify said packet.” This is consistent with Erickson’s teachings of scripts. Third, the TCP/IP packet contains control information corresponding to the stack of protocol layers:
Id. at .584. For example, control information in the TCP header includes a “sequence
number” that controls the packet’s placement within the application data and “port” fields that control the communication flow. Control information in the IP header includes a VER field (version of IP packet that controls its processing) and address fields that also control the communication flow. Accordingly, Erickson in view of Tanenbaum96 discloses that the communication processing mechanism includes a receive sequencer (Erickson’s I/O adapter using Tanenbaum96’s header prediction to classify packets for fast path processing) with directions (instructions) to classify said packet (fast versus slow path), wherein said packet contains control information corresponding to the stack of protocol layers (control information in TCP/IP headers).
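The fast-versus-slow classification described above can be sketched as follows. This is an illustrative Python sketch of Tanenbaum96-style header prediction; the field and state names are my own assumptions, not actual code from either reference:

```python
# Illustrative header-prediction bypass test: a handful of checks decides
# whether a packet takes the fast path or falls back to the slow path.

ESTABLISHED = "ESTABLISHED"
connection_records = {}   # keyed by (src ip, dst ip, src port, dst port)

def classify(pkt):
    rec = connection_records.get(
        (pkt["src_ip"], pkt["dst_ip"], pkt["src_port"], pkt["dst_port"]))
    if (rec is not None
            and rec["state"] == ESTABLISHED        # connection established
            and not pkt["syn"] and not pkt["fin"]  # no special flags set
            and pkt["seq"] == rec["expected_seq"]):  # expected sequence no.
        # Fast path: update the connection record's state.
        rec["expected_seq"] = (rec["expected_seq"] + len(pkt["data"])) & 0xFFFFFFFF
        return "fast"
    return "slow"   # full protocol processing
```

As Tanenbaum96 observes, these checks take just a handful of instructions, which is what makes the classification cheap enough to perform on every received packet.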
U.S. Pat. No. 5,768,618 (“Erickson”) in view of Tanenbaum96
[3.1] The device of claim 1, wherein said communication processing mechanism includes a receive sequencer with directions to generate a summary of a second message packet received from the network, said second packet containing control information corresponding to the stack of protocol layers, and said instructions including an instruction to compare said summary with said context.
Erickson in view of Tanenbaum96 discloses wherein said communication processing mechanism includes a receive sequencer with directions to generate a summary of a second message packet received from the network, said second packet containing control information corresponding to the stack of protocol layers, and said instructions including an instruction to compare said summary with said context. First, Erickson in view of Tanenbaum96 discloses that the communication processing mechanism includes a receive sequencer with directions to generate a summary of a second message packet received from the network. As noted in claim 2, it would have been obvious to combine Erickson with Tanenbaum96’s header prediction teachings. See Section X.A. (discussing motivations to combine). As noted in claim 2, Erickson in view of Tanenbaum96 discloses a receive sequencer with directions. Further, Tanenbaum96 discloses that this receive sequencer (the “transport entity” hardware and header prediction) may produce a summary (the IP addresses and port portions of the headers) of the incoming packets (and thus a “second packet”) and use a hash of the IP addresses to look up the context; the summary is then compared against the context to “verify that the correct record has been found”:
Ex.1006, Tanenbaum96 at .584-.585. Second, Erickson in view of Tanenbaum96 discloses that the second packet contains control information corresponding to the stack of protocol layers:
Id. at .584. For example, control information in the TCP header includes a “sequence number” that controls the packet’s placement within the application data and “port” fields that control the communication flow. Control information in the IP header includes a VER field (version of IP packet that controls its processing) and address fields that also control the communication flow. Third, Erickson in view of Tanenbaum96 discloses that the instructions include an instruction to compare said summary with said context. As shown above, Tanenbaum96 discloses using the summary (the IP addresses and ports) to compare against the context (connection record) to verify the correct record is found. This determines whether to take the fast or slow path. A POSA would have understood that the process of using the hash to fetch the connection record and comparing with the addresses and ports is performed by instructions (Tanenbaum refers to “instructions” above). A POSA would also understand that comparing a summary to a context, as required by this claim element, must involve steps of extracting the relevant
fields and performing operations equivalent to the steps disclosed by Tanenbaum96. Such a comparison would involve multiple operations (e.g., fetch, mask, compare, conditional branch) and would not typically be done by a single computer instruction. However, a POSA would have understood that the whole process could be performed by a single macro instruction that indicates failure or success and thus whether the packet is a proper candidate for fast path processing (for the I/O device protocol processing when applying this teaching to Erickson). Performing these operations with a sequence of instructions or a single macro instruction is a simple design choice that could be performed by a POSA with predictable results. Hence, Tanenbaum96 discloses an instruction to compare said summary with said context. Accordingly, Erickson in view of Tanenbaum96 discloses that the communication processing mechanism includes a receive sequencer (Erickson’s I/O adapter with Tanenbaum96’s header prediction) with directions (instructions) to generate a summary of a second message packet received from the network (hash of addresses), said second packet containing control information corresponding to the stack of protocol layers (TCP/IP control information in headers), and said instructions including an instruction to compare said summary with said context (the instruction indicating fast or slow path based on using the hash to fetch the connection record and perform the address and port comparisons).
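The summary-and-compare step described above can be sketched as follows. This is illustrative only; the table size and hash function are arbitrary choices standing in for Tanenbaum96’s “simple function” of the two IP addresses and two ports:

```python
# Illustrative summary generation and context comparison: hash the
# address/port summary to fetch a candidate record, then compare all
# four fields to verify the correct record has been found.

TABLE_SIZE = 64
table = [None] * TABLE_SIZE   # the hash table of connection records

def summary(pkt):
    # The "summary": both IP addresses and both ports from the headers.
    return (pkt["src_ip"], pkt["dst_ip"], pkt["src_port"], pkt["dst_port"])

def lookup_and_compare(pkt):
    s = summary(pkt)
    rec = table[hash(s) % TABLE_SIZE]   # "some simple function" as the key
    # The compare: all addresses and ports must match the stored context.
    if rec is not None and rec["key"] == s:
        return "fast"
    return "slow"
```

Whether this is implemented as the few discrete operations sketched here or as a single macro instruction is, as discussed above, a design choice with predictable results.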
U.S. Pat. No. 5,768,618 (“Erickson”) in view of Tanenbaum96
[4.1] The device of claim 1, wherein said instructions include a first instruction to create a header corresponding to said context and having control information corresponding to several of the protocol processing layers, and
Erickson in view of Tanenbaum96 discloses that the instructions include a first instruction to create a header corresponding to said context and having control information corresponding to several of the protocol processing layers. Specifically, Erickson discloses that the I/O device uses the scripts (instructions) to create a header:
Protocol scripts typically serve two functions. The first function is to describe the protocol the software application is using. This includes but is not limited to how to locate an application endpoint, and how to fill in a protocol header template from the application specific data buffer. The second function is to define a particular set of instructions to be performed based upon the protocol type. Each type of protocol will have its own script. Types of protocols include, but are not limited to, TCP/IP, UDP/IP, BYNET lightweight datagrams, deliberate shared memory, active message handler, SCSI, and File Channel.
Ex.1005, Erickson at 5:41-51.
FIG. 7 is a block diagram illustrating a UDP datagram template 702 (without a user data area) residing in the I/O device adapter's memory. The user process provides the starting address and the length for the user data in its virtual address space, and then "spanks" a GO register to trigger the I/O device adapter's execution of a predetermined script. The I/O device adapter stores the user data provided by the user process in the I/O device adapter's memory, and then transmits the completed UDP datagram 702 over the media.
Id. at 7:39-47.
As discussed in element [1.4], Erickson discloses updating control information, which, in the TCP/IP context, would include the sequence number. In light of these disclosures, it would have been obvious to create a TCP/IP header with, for example, an updated sequence number in view of Tanenbaum96’s teachings. See Section X.A. (discussing motivations to combine). Tanenbaum96 discloses creating such TCP/IP headers:
Ex.1006, Tanenbaum96 at .584. Note that the header includes control information:
Id. at .584. For example, control information in the TCP header includes a “sequence number” that controls the packet’s placement within the application data and “port” fields that control the communication flow. Control information in the IP header includes a VER field (version of IP packet that controls its processing) and address fields that also control the communication flow. A POSA would understand that the header creation could be done by a sequence of instructions or a single macro instruction (a first instruction). As noted above, Tanenbaum refers to “instructions.” Accordingly, Erickson in view of Tanenbaum96 discloses that the instructions (scripts) include a first instruction to create a header corresponding to said context (using header templates) and having control information corresponding to several of the protocol processing layers (control information in the TCP/IP headers).
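The header-creation step discussed above can be illustrated with a short sketch. The 20-byte field layout follows the standard TCP header (RFC 793, no options), and the fields populated here, ports, sequence number, and window, are the control information identified in the analysis; the function name, the fixed ACK flag, and the zero checksum are simplifying assumptions of the sketch:

```python
import struct

def build_tcp_header(src_port, dst_port, seq, ack, window):
    """Build a minimal 20-byte TCP header (RFC 793 layout, no options).
    The ports, sequence number, and window are the control information
    discussed above; a real stack would also compute the checksum over
    a pseudo-header, which is omitted here for brevity."""
    offset_flags = (5 << 12) | 0x10   # data offset = 5 words; ACK flag set
    checksum = 0                      # placeholder; see comment above
    urgent = 0
    return struct.pack("!HHIIHHHH", src_port, dst_port, seq, ack,
                       offset_flags, window, checksum, urgent)
```

Whether this is implemented as one macro instruction or a short sequence of instructions is an implementation choice, consistent with the point made above.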
U.S. Pat. No. 5,768,618 (“Erickson”) in view of Tanenbaum96

[4.2] said instructions include a second instruction to prepend said header to second data for transmission of a second packet.

Erickson in view of Tanenbaum96 discloses that the instructions include a second instruction to prepend said header to second data for transmission of a second packet. Under the broadest reasonable construction standard, “prepend” would have been understood to mean “adds to the front.” See Section VIII.C. First, Erickson discloses a script for filling in a protocol header template:
Protocol scripts typically serve two functions. The first function is to describe the protocol the software application is using. This includes but is not limited to how to locate an application endpoint, and how to fill in a protocol header template from the application specific data buffer. The second function is to define a particular set of instructions to be performed based upon the protocol type. Each type of protocol will have its own script. Types of protocols include, but are not limited to, TCP/IP, UDP/IP, BYNET lightweight datagrams, deliberate shared memory, active message handler, SCSI, and File Channel10.
Ex.1005, Erickson at 5:41-51.
FIG. 7 is a block diagram illustrating a UDP datagram template 702 (without a user data area) residing in the I/O device adapter's memory. The user process provides the starting address and the length for the user data in its virtual address space, and then "spanks" a GO register to trigger the I/O device adapter's execution of a predetermined script. The I/O device adapter stores the user data provided by the user process in the I/O device adapter's memory, and then transmits the completed UDP datagram 702 over the media.

10 This is most likely a typo in Erickson and should have said “Fibre Channel,” an industry-standard storage network.
Id. at 7:39-47. As noted above, it would have been obvious to create a TCP/IP header with, for example, an updated sequence number in view of Tanenbaum96’s teachings. See Section X.A. (discussing motivations to combine). Applying these teachings, the I/O device would prepend the TCP/IP header by placing the header at the front of buffer memory and filling in the application data behind it:
Ex.1006, Tanenbaum96 at .584.
It would have been obvious to add the headers to the front of data for transmission. There are at least two obvious approaches: prepending headers to data, or appending data to headers. Each approach is predictable and easy to implement; each simply requires adding either the data portion or the header portion to the other. A POSA would have been motivated to prepend the header because the data may already be residing in the I/O device, while the header requires calculating, for example, the next sequence number. Accordingly, after such calculations are performed, the headers can then be prepended onto the data portion. Moreover, the header portions are of a defined size, so it would have been understood as a simple implementation to reserve buffer space for the headers (in front of the data), and then prepend the headers onto the data in that buffer space. Finally, prepending was standard. See Section V.B.6. A POSA would also understand that the prepending could be done by a sequence of instructions or a single macro instruction (a second instruction). Accordingly, Erickson in view of Tanenbaum96 discloses that the instructions include a second instruction to prepend said header (via the Erickson header template) to second data for transmission of a second packet (the I/O device sending the second packet onto the network).
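The buffer-headroom approach described above, reserving space for the headers in front of the data and prepending the finished headers once per-packet fields are calculated, can be sketched as follows. The 54-byte reservation (Ethernet 14 + IP 20 + TCP 20 bytes, without options) and all names are illustrative assumptions for the sketch:

```python
def reserve_and_prepend(headers, data, header_room=54):
    """Place the data in a buffer that reserves a fixed-size region in
    front of it, then write the finished headers into that reserved
    space once per-packet values (e.g., the next sequence number) are
    known. Returns the completed frame: headers followed by data."""
    assert len(headers) <= header_room
    buf = bytearray(header_room + len(data))
    buf[header_room:] = data                 # data placed first
    start = header_room - len(headers)
    buf[start:header_room] = headers         # headers prepended afterward
    return bytes(buf[start:])                # completed frame
```

Note that the data is never copied again after placement; only the header bytes are written in front of it, which is the efficiency rationale given above.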
U.S. Pat. No. 5,768,618 (“Erickson”) in view of Tanenbaum96

[5.1] The device of claim 1, wherein said communication processing mechanism has a direct memory access unit to send, based upon said context, said data from said communication processing mechanism to the first apparatus memory,

Erickson in view of Tanenbaum96 discloses that the communication processing mechanism has a direct memory access unit to send, based upon said context, said data from said communication processing mechanism to the first apparatus memory, without a header accompanying said data. Erickson’s protocol scripts plus other context information (resident in registers 504 and 508, endpoint table 514, and endpoint protocol data 518) transfer data to the memory of the first apparatus (the memory of the host computer):
FIG. 5 is a block diagram illustrating the system organization between a main memory and an I/O device adapter memory which is compatible with the present invention. The main memory 502 implementation includes a hardware register 504 and a buffer pool 506. The I/O device adapter implementation includes a software register 508 and a physical address buffer map 510 in the adapter's memory 512. An endpoint table 514 in the memory 512 is used to organize multiple memory pages for individual user processes. Each entry within the endpoint table 514 points to various protocol data 518 in the memory 512 in order to accommodate multiple communication protocols, as well as previously defined protocol scripts 516 in the memory 512, which indicate how data or information is to be transferred from the memory 512 of the I/O device adapter to the portions of main memory 502 associated with a user process.
Ex.1005, Erickson at 5:53-67. Note that the user process, via the device driver, sets up registers 504 and 508, endpoint table 514, and endpoint protocol data 518 to create the context:
Typically, when a user process opens a device driver, the process specifies its type, which may include, but is not limited to, a UDP datagram, source port number, or register address. The user process also specifies either a synchronous or asynchronous connection. The device driver sets up the registers 508 and 504, endpoint table 514, and endpoint protocol data 518. The protocol script 516 is typically based upon the endpoint data type, and the endpoint protocol data 518 depends on protocol specific data.
Id. at 6:1-9. Erickson further depicts the I/O device receiving data and directly providing it to an application (via memory of the host computer) in Figure 4:
FIG. 4 is a block diagram describing a direct application interface (DAI) and routing of data between processes and an external data connection which is compatible with the present invention. Processes 402 and 404 transmit and receive information directly to and from an interconnect 410 (e.g., I/O device adapter) through the DAI interface 408. The information coming from the interconnect 410 is routed directly to a process 402 or 404 by use of virtual hardware and registers, rather than using a traditional operating system interface 406.
Id. at 5:6-14; see also id. at 4:53-5:5 and Fig. 3 (illustrating that I/O device 314 sends data to applications 302 and 304 that reside within the memory of the host computer). Similarly, Tanenbaum96 discloses that the “fast path … copies the data to the user.” Ex.1006, Tanenbaum96 at .585. The “user” refers to the application running on the host, i.e., the application using the protocol stack for communication. Erickson specifically describes a direct memory access (DMA) unit of the I/O device, which a POSA would have understood, or at least found obvious, to perform the function of directly sending data from the I/O device to the memory of the host computer:
The adapter would not want to be forced to access the user data twice over the I/O bus, once for the calculation performed by the udpchecksum() function, and a second time for transmission over the media. Instead, the adapter would most likely retrieve the needed user data from the user process' virtual address space using direct memory access (DMA) into the main memory over the bus and retrieving the user data into some portion of the adapter's memory, where it could be referenced more efficiently. The programming steps performed in the udpscript() procedure above might need to be changed to reflect that.
Id. at 8:27-37. Erickson also depicts, as it would be understood by a person having ordinary skill, using DMA to directly write from the I/O device to the host memory in Figure 5:
Id. at Fig. 5 (annotated). DMA (Direct Memory Access) is a hardware-based technique for transferring data between memory systems or between a host memory and an I/O device. See Section V.H.1. (explaining DMA). DMA enables hardware to access memory directly, without requiring processor involvement during the read or write process. See, e.g., Ex.1012, U.S. Pat. No. 4,831,523, at 9:2-7. Erickson discloses DMA, but only describes its use for transferring data from main memory to the adapter:
…the adapter would most likely retrieve the needed user data from the user process’ virtual address space using direct memory access (DMA) into the main memory over the bus and retrieving the user data into some portion of the adapter's memory, where it could be referenced more efficiently.
Ex.1005, Erickson at 8:30-35. A POSA would understand that typical DMA engines can be used for both reading and writing data, and that it would also be beneficial to use DMA to send data from the adapter memory to main memory. The use of DMA would allow data movement without consuming processor cycles on either the application processor or the adapter processor running scripts. Thus, Erickson, along with the knowledge of a POSA, discloses a direct memory access unit to send… data from said communication processing mechanism (the adapter) to the first apparatus memory (main memory). See also Section V.H.1. Further, Erickson employs context to receive packets, process the packets, and transfer the data to the host computer memory:
FIG. 5 is a block diagram illustrating the system organization between a main memory and an I/O device adapter memory which is compatible with the present invention. The main memory 502 implementation includes a hardware register 504 and a buffer pool 506. The I/O device adapter implementation includes a software register 508 and a physical address buffer map 510 in the adapter's memory 512. An endpoint table 514 in the memory 512 is used to organize multiple memory pages for individual user processes. Each entry within the endpoint table 514 points to various protocol data 518 in the memory 512 in order to accommodate multiple communication protocols, as well as previously defined protocol scripts 516 in the memory 512, which indicate how data or information is to be transferred from the memory 512 of the I/O device adapter to the portions of main memory 502 associated with a user process.
Id. at 5:53-67.
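The bidirectional use of DMA discussed above, reading user data from main memory for transmission and, in reverse, writing received data into main memory, can be illustrated with a toy model. The class and method names are my own illustrative assumptions; actual DMA hardware operates on physical addresses and descriptor rings rather than Python objects, and the point of the model is only that the processor does not move the bytes itself:

```python
class DmaEngine:
    """Toy model of a bidirectional DMA unit. The processor only supplies
    a descriptor (buffer, offset, length); the engine moves the bytes,
    illustrating why no processor cycles are consumed per transferred byte."""
    def __init__(self, host_memory: bytearray):
        self.host_memory = host_memory   # stands in for main memory 502

    def read_from_host(self, offset: int, length: int) -> bytes:
        # Transmit direction (disclosed in Erickson): pull user data
        # from main memory into the adapter.
        return bytes(self.host_memory[offset:offset + length])

    def write_to_host(self, data: bytes, offset: int) -> None:
        # Receive direction (obvious to a POSA): place received payload
        # directly into main memory at the designated address.
        self.host_memory[offset:offset + len(data)] = data
```

The symmetry between the two methods reflects the observation above that typical DMA engines read and write with the same mechanism.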
As to where exactly Erickson writes the data to the host memory: with respect to the I/O device sending data, Erickson discloses using pointers to indicate where data is in the host memory so that the I/O device can directly retrieve the data from the host memory, add headers, and send the data. A POSA would understand that, in reverse (when receiving data), the I/O device would use a pointer to instruct the DMA to directly write the data to the host memory at the pointer’s address. See id. at 6:1-41 (describing pointer STARTADDRESS, which the I/O device stores in its memory, as the pointer to the data for the I/O device to retrieve, packetize, and transmit). Accordingly, Erickson in view of Tanenbaum96 discloses that the communication processing mechanism has a direct memory access unit (Erickson’s DMA) to send, based upon said context (e.g., information in registers 508, endpoint table 514, protocol data 518, and obvious pointers to user process memory space), said data (received packets) from said communication processing mechanism to the first apparatus memory (main memory).
U.S. Pat. No. 5,768,618 (“Erickson”) in view of Tanenbaum96 [5.2] without a header accompanying said data.
Erickson in view of Tanenbaum96 teaches without a header accompanying said data. Note that because the I/O device is performing the TCP/IP processing, it strips the TCP, IP, and MAC headers from the data before transferring the data to main memory at the virtual address of the user process. See Section V.B. (describing the TCP/IP encapsulation process; upon receiving data, the layer processing works in reverse). Accordingly, to the extent that Erickson does not expressly disclose stripping off these headers, it would be obvious, as the entire point of offloading this processing is so that the host does not perform these functions (moreover, the user application is expecting only data, not data with headers, because it only receives data after protocol processing). Removing the headers before sending the data to host memory would also be obvious in view of Tanenbaum96. Tanenbaum96 describes the fast path “cop[ying] the data to the user,” i.e., the “data” and not the header (the data portion of the packets). Ex.1006, Tanenbaum96 at .567. Recall that the transport entity is performing these functions. Id. at .565-.567. And recall that the “transport entity” of Tanenbaum96, consistent with the I/O device of Erickson, may reside on the network interface card. Id. at .497-.498. Accordingly, in view of Tanenbaum96, the Erickson I/O device would copy “the data to the user,” i.e., would copy only the data to the host memory. Here, the I/O device is offloading this function. Accordingly, Erickson in view of Tanenbaum96 teaches providing the data to the host without a header accompanying said data.
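The receive-side stripping described above, discarding the MAC (Ethernet), IP, and TCP headers so that only application data reaches the user's buffer, can be sketched as follows. The fixed header sizes (14 + 20 + 20 bytes, i.e., no IP or TCP options) and the function name are illustrative assumptions of the sketch:

```python
# Fixed header sizes assumed for the sketch (no IP or TCP options).
ETH_HLEN, IP_HLEN, TCP_HLEN = 14, 20, 20

def strip_headers(frame):
    """Receive-side counterpart of encapsulation: discard the Ethernet,
    IP, and TCP headers of an incoming frame and return only the
    application payload, which is what would be written to the user's
    buffer in host memory."""
    return frame[ETH_HLEN + IP_HLEN + TCP_HLEN:]
```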
U.S. Pat. No. 5,768,618 (“Erickson”) in view of Tanenbaum96

[6.1] The device of claim 1, wherein said context includes a receive window of space in the memory that is available to store application data, and said communication processing mechanism advertises said receive window.

Erickson in view of Tanenbaum96 discloses that the context includes a receive window of space in the memory that is available to store application data, and that the communication processing mechanism advertises said receive window. As noted, it would be obvious to combine Erickson with Tanenbaum96’s TCP/IP teachings to effectuate a TCP/IP connection with Erickson’s I/O device. See Section X.A. (describing motivations to combine). TCP inherently includes interfaces in which a communication processing mechanism advertises said receive window because the use of the receive window is required for systems to communicate using TCP. See Section V.B.9. (describing advertising a receive window). The TCP/IP headers, as Tanenbaum96 teaches, include the “Window size” field. This field is part of the context because it is in the header template used by the I/O device to create headers. The Window size field is part of the context that Erickson pre-negotiates in view of Tanenbaum96:
Ex.1006, Tanenbaum96 at .584 (annotated).
U.S. Pat. No. 5,768,618 (“Erickson”) in view of Tanenbaum96 [6.1] The device of claim 1, wherein said context includes a receive window of space in the memory that is available to store application data, and said communication processing mechanism advertises said receive window. The window size is how much space in the memory (e.g., a buffer) that is available to store application data (i.e., incoming application data), and said communication processing mechanism advertises the receive window by including it (and dynamically adjusting it) in each packet:
Id. at .554-.555. The receive window of space in the memory that is available to store application data is the receiver’s buffer, as described above. Combining Tanenbaum96 with Erickson, the receive buffer would be located in Erickson’s main memory.
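The dynamic window advertisement described above can be sketched as follows. The computation (free receive-buffer space, clamped to what fits in the 16-bit Window size field) reflects standard TCP behavior; the function and parameter names are illustrative assumptions:

```python
def advertised_window(buffer_size, bytes_queued):
    """Compute the value to advertise in the TCP Window size field:
    the free space remaining in the receive buffer, so the sender
    never overruns the receiver. Clamped to the 16-bit field width
    (window scaling options are ignored in this sketch)."""
    free = buffer_size - bytes_queued
    return max(0, min(free, 0xFFFF))
```

As the receiving application consumes queued data, the free space grows and each subsequent packet advertises a larger window, which is the dynamic adjustment noted above.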
Accordingly, Erickson in view of Tanenbaum96 discloses wherein said context includes a receive window of space in the memory that is available to store application data (window size field of the TCP/IP packet), and said communication processing mechanism advertises said receive window (via sending the packets).
U.S. Pat. No. 5,768,618 (“Erickson”) in view of Tanenbaum96

[7.1] The device of claim 1, wherein said context includes TCP ports of said first and said second apparatuses.

Erickson in view of Tanenbaum96 discloses that the context includes TCP ports of said first and said second apparatuses. Specifically, Erickson discloses pre-negotiating ports for datagrams to the I/O device as part of the connection setup:
In this example, the user process and the device driver has pre-negotiated the following fields from FIG. 6: (1) Ethernet Header 604 (Target Ethernet Address, Source Ethernet Address, and Protocol Type); (2) IP Header 606 (Version, IP header Length, Service Type, Flag, Fragment Offset, Time_to_Live, IP Protocol, IP Address of Source, and IP Address of Destination); and (3) UDP Header 608 (Source Port and Destination Port). Only the shaded fields in FIG. 6, and the user data 610, need to be changed on a per-datagram basis.
Ex.1005, Erickson at 6:63-7:4; see also id. at Fig. 6. As noted, it would be obvious to combine Erickson with Tanenbaum96’s teachings for a TCP/IP connection. See Section X.A. (describing motivations to combine). A TCP packet includes a TCP source port and destination port number, and thus Erickson’s pre-negotiation for a TCP/IP connection, in view of Tanenbaum96, would include creating these values as part of the context (as the I/O device must use them to create headers):
Ex.1006, Tanenbaum96 at .544. Accordingly, Erickson in view of Tanenbaum96 discloses wherein said context includes TCP ports of said first and said second apparatuses.
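The pre-negotiated context discussed above, extended from Erickson's UDP example to a TCP/IP connection, can be sketched as a per-connection record in which the connection-identifying fields (including the TCP ports of both apparatuses) are fixed at setup and only per-packet fields change afterward. The structure and field names are illustrative assumptions, not Erickson's data layout:

```python
from dataclasses import dataclass

@dataclass
class TcpContext:
    """Hypothetical per-connection context: the four-tuple identifying
    the TCP connection is pre-negotiated at setup; only per-packet
    state (e.g., the next sequence number) is updated thereafter."""
    src_ip: str
    dst_ip: str
    src_port: int      # TCP port of the first apparatus
    dst_port: int      # TCP port of the second apparatus
    next_seq: int = 0  # per-packet state, updated on each send
```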