
An Overview of Systems and Networking Research at Microsoft Research

Michael B. Jones

Systems and Networking Research Group, Microsoft Research

April 1999

Microsoft Research: A quick primer

Founded in 1991
Goal: Pursue strategic technologies for Microsoft
Original research groups:
  Natural Language Processing
  Operating Systems
  Programming Languages

Microsoft Research

Over 300 researchers in 27 areas
  Speech, Decision Theory, Graphics, Databases, to Statistical Physics
Research lab locations: Redmond, San Francisco, Cambridge (UK), Beijing
Internationally recognized research teams
  Hundreds of publications, presentations
  Leadership roles in professional societies, journals, conferences

Fastest Growing CS Research Organization In The World

Grew by factor of four from ’94 to ’97
Decided in ’97 to grow by a factor of three in three years
  200 in FY ’97 => 600 in FY ’00, primarily in Redmond
Major impact on Microsoft products
  Virtually all MS products shipped today use technology from Microsoft Research

Systems and Networking Research Group

One of the original three research groups at Microsoft Research in Redmond
  Formerly called the “Operating Systems Research Group”
  Name changed in 1998 to explicitly include networking
Group presently 15 members
Working in four areas

Past Projects

Tiger
  Scalable, fault-tolerant multimedia file system using commodity hardware
Rialto
  Real-time kernel enabling predictable concurrent execution of independent real-time programs
Both were used in Microsoft's Interactive TV trial in 1996-1997 with NTT in Yokosuka, Japan

Current Research Areas

Networking
Distributed Computing
Operating Systems
Real-Time Systems

Group Members and Current Research Areas

Victor Bahl – Net
Bill Bolosky – OS
Gerald Cermak – Dist. Sys.
Scott Cutshall – OS
Rich Draves – Net
John Douceur – OS
Alessandro (Sandro) Forin – Net
Johannes Helander – OS
Galen Hunt – Dist. Sys.
Mike Jones – Real-Time Sys.
Steve Levi – Dist. Sys.
Venkat Padmanabhan – Net
Marvin Theimer – OS
Yi-Min Wang – Dist. Sys.
Brian Zill – Net

Networking Projects

Location Aware Systems and Services
Hardware Adapter for Light-Weight Mobile Networking
IPv6
Automatic Network Configuration
High Performance & Sys. Area Networking
DCOM over SAN
TCP Fast Start, Network Performance Improvement
Multicast-based Data Dissemination

Distributed Computing Projects

Millennium
  Distributed, Fault-Tolerant Applications
  Automatic Application Partitioning
  Distributed Java Virtual Machine

Operating Systems Projects

Componentized System Architecture
Single-Instance Store Filesystem
Unobtrusive Background Computation
Transactional Filesystem

Real-Time Systems Projects

Real-Time Scheduling
Real-Time Latency Measurement

Current Projects

Grouped by Research Areas

Networking Projects

Location Aware Systems and Services

In-building location-aware system
Wireless mobile nodes precisely compute their geographic location
Enable new class of mobile applications
  E.g., use nearest printer, etc.

Victor Bahl, Venkat Padmanabhan, Turner Whitted, Josh Broch (CMU)
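The "use nearest printer" example can be sketched in a few lines once a node knows its own position. This is only an illustration under stated assumptions: the function name, the printer names, and the flat (x, y) floor-plan coordinates in metres are all hypothetical, and the slide does not say how the project itself computes or uses locations.

```python
import math

def nearest_resource(node_xy, resources):
    """Return the name of the resource closest to node_xy.

    node_xy   -- (x, y) position of the mobile node (assumed floor-plan metres)
    resources -- dict mapping resource name -> (x, y) position
    Uses straight-line (Euclidean) distance; walls and floors are ignored.
    """
    return min(resources, key=lambda name: math.dist(node_xy, resources[name]))

# Hypothetical printers on one floor of a building.
printers = {
    "printer-east": (40.0, 12.0),
    "printer-west": (5.0, 10.0),
}
choice = nearest_resource((8.0, 11.0), printers)
```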

Hardware Adapter for Light-Weight Mobile Networking

MCoM (Mobile Communicator) Project
Light-weight devices network in both ad-hoc and controlled manner
Investigates protocol and systems issues:
  Energy conservation
  Multi-hop routing
  In presence of link failures, mobility
Victor Bahl, Turner Whitted

IPv6

Internet Protocol Version 6 (IPv6) implementation for Windows NT
  Freely downloadable
  Numerous v6 utilities also available
Multi-homing issues
Rich Draves, Brian Zill, ISI (Allison Mankin, etc.)
Published in ’98 USENIX NT

Automatic Network Configuration

Algorithms for auto-configuring IP networks

Address and subnet assignments that optimize the network’s efficiency

Rich Draves, Chris King (Northeastern), Cheenu Venkatachary (WUSTL)

Published in InfoCom ’99

High Performance & System Area Networking

High-performance networking under NT
  VIA-like and memory-like interconnects
It’s WinSock!
  No need to rewrite apps
  No loss of performance
  Easily extensible (RDMA, registration, …)
Gigabit Ethernet
  Jumbo Frames
TCP Switch
  Layered WSP over SAN vendor’s WSP
Sandro Forin, Johannes Helander, NT
Published at DARPA NT Workshop

Hybrid SAN-TCP/IP Architecture

[Figure: user-mode and kernel-mode networking stack diagram. Component labels: Winsock App, TDI App, Winsock, Switch, MsAfd, AFD, TCP/IP, SAN WS Provider, SANWS, Driver, SAN TDI Provider, SAN NDIS MiniPort, SAN MiniPort, SAN NIC. Interface labels: Winsock SPI, TDI, NDIS.]

DCOM Over SAN

Millennium Falcon project
Implement high-performance distributed object systems
  For clusters of servers
  Connected by SANs
Take full advantage of user-mode nets
Current implementation based on DCOM and VIA
Yi-Min Wang

TCP Fast Start, Network Performance Improvement

Reuse information learned in past
  Rather than rediscover it each time
  E.g., TCP congestion window

Venkat Padmanabhan, Randy Katz (Berkeley)

Published at Globecom ’98 Internet Mini-Conference
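The idea of reusing a previously learned congestion window, rather than rediscovering it on every connection, might be sketched as a small per-destination cache. This is a toy model and not the published design: the class name, the expiry policy, the 10-minute default, and the one-segment fallback are all assumptions made for illustration.

```python
import time

INITIAL_CWND = 1  # segments; assumed fallback when nothing has been learned

class CwndCache:
    """Remember the last congestion window seen for each destination."""

    def __init__(self, max_age_s=600.0, clock=time.monotonic):
        self._entries = {}          # destination -> (cwnd, time recorded)
        self._max_age_s = max_age_s
        self._clock = clock

    def remember(self, dest, cwnd):
        """Record the window reached on a connection to dest."""
        self._entries[dest] = (cwnd, self._clock())

    def initial_cwnd(self, dest):
        """Window to start a new connection with: cached if fresh, else default."""
        entry = self._entries.get(dest)
        if entry is None:
            return INITIAL_CWND
        cwnd, recorded_at = entry
        if self._clock() - recorded_at > self._max_age_s:
            # Stale information is discarded (assumed policy).
            del self._entries[dest]
            return INITIAL_CWND
        return cwnd
```

A new connection would call `initial_cwnd(dest)` when it opens and `remember(dest, cwnd)` when it closes. How stale information should be treated is a design question the slide does not address; the expiry here is only one possible answer.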

Multicast-based Data Dissemination

Quantify potential benefits of multicast for information dissemination Based on HTTP logs

Evaluate algorithms/heuristics for deciding which data should be multicast

Venkat Padmanabhan

Distributed Computing Projects

Distributed, Fault-Tolerant Applications

Millennium Project
  Unifying vision behind several individual prototype projects

Galen Hunt, Yi-Min Wang, Gerald Cermak, Johannes Helander, Rick Rashid

Initial position paper published at HotOS-VI, 1997

Millennium

Problem
  Building distributed, fault tolerant applications is too hard, costs too much
Goal
  Raise the level of abstraction provided by the operating system
  Individual computers, file systems, networks unimportant to component builders

[Figure: diagram with labels App, COM+, NT]

Millennium: Raise the Level of Abstraction

Maintain single system image.

Transparent invocation, migration, and recovery.

Individual computers, file systems, and networks become unimportant to application developers.

[Figure: diagram with labels Millennium, Application]

Automatic Application Partitioning

Millennium Coign Project
Galen Hunt
Published in OSDI ’99

[Figure: panels labeled Before and After]

Coign: Automatic Distributed Partitioning

Converts local COM applications into distributed client-server applications without source code.

The Plan:

1. Find Components in Application Binaries

2. Identify Interfaces and Measure Communication

3. Partition and Distribute Components
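Steps 2 and 3 can be pictured as a graph problem: components are nodes, measured communication is the edge weight, and a partition is judged by how much traffic crosses the client/server boundary. The sketch below is a brute-force stand-in meant only to show the objective, not the algorithm Coign itself uses (the slides do not describe it). The component names, traffic figures, and the notion of "pinned" components are hypothetical.

```python
from itertools import product

def best_partition(components, traffic, pinned):
    """Exhaustively find the client/server split with least cross-machine traffic.

    components -- list of component names
    traffic    -- dict mapping frozenset({a, b}) -> measured bytes exchanged
    pinned     -- dict mapping name -> "client" or "server" for components
                  that must stay on one side (e.g. UI or storage)
    Exponential in the number of unpinned components; for illustration only.
    """
    free = [c for c in components if c not in pinned]
    best_cost, best_assign = None, None
    for sides in product(("client", "server"), repeat=len(free)):
        assign = dict(pinned)
        assign.update(zip(free, sides))
        # Only traffic between components on different machines counts.
        cost = sum(weight for pair, weight in traffic.items()
                   if len({assign[c] for c in pair}) == 2)
        if best_cost is None or cost < best_cost:
            best_cost, best_assign = cost, assign
    return best_cost, best_assign

# Hypothetical application with two pinned and two movable components.
components = ["ui", "cache", "parser", "storage"]
pinned = {"ui": "client", "storage": "server"}
traffic = {
    frozenset({"ui", "cache"}): 900,
    frozenset({"cache", "parser"}): 50,
    frozenset({"parser", "storage"}): 700,
    frozenset({"ui", "parser"}): 10,
}
cost, placement = best_partition(components, traffic, pinned)
```

In this example the chatty "ui"/"cache" pair stays on the client and the chatty "parser"/"storage" pair stays on the server, so only the two light links cross the network.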

COP: Component Object Proxy

Transparently remote Win32 API calls
  Factor Win32 interface
  Automatically create DCOM interfaces
  Transparently insert proxy objects
Galen Hunt, Gerald Cermak

Millennium Continuum

Provides single system image for Windows API
Automatic object placement and migration at run-time
Language neutral
  At least Visual Basic, C, C++, Java
Based on COM+
Galen Hunt, Gerald Cermak, Rick Rashid

Distributed Java Virtual Machine

Millennium Borg project
Makes multiple JVMs appear to be one
Unmodified Java programs may run as distributed applications
Transparent distribution, migration
Johannes Helander

Operating Systems Projects

Componentized System Architecture

MMLite Project
Kernel object architecture stressing adaptability, minimalism, reusability
Many normally “built-in” components selectable, loadable
  E.g., Virtual Memory, IPC
Johannes Helander, Sandro Forin
Published at ’98 SigOps European Workshop

Single-Instance Store Filesystem

Enables single on-disk instance of files with multiple logical copies

Sharing transparent to applications
Replicas found in background, coalesced

Bill Bolosky, Scott Cutshall, John Douceur, NT filesystem group

Planned to ship with Windows 2000
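The behaviour described above (many logical copies, one stored instance, duplicates found and merged by a background pass) can be modelled with a small in-memory toy. It is a sketch under stated assumptions, not the NT implementation: the class and method names are hypothetical, SHA-256 is an assumed way of spotting candidate duplicates, and real file system concerns such as locking and crash safety are ignored.

```python
import hashlib

class ToySingleInstanceStore:
    """In-memory model: paths are logical copies, blobs are stored instances."""

    def __init__(self):
        self._next_id = 0
        self._blobs = {}   # blob id -> bytes actually stored
        self._links = {}   # logical path -> blob id

    def write(self, path, data):
        """Writes always create a private blob, so sharing stays transparent."""
        blob_id = self._next_id
        self._next_id += 1
        self._blobs[blob_id] = bytes(data)
        self._links[path] = blob_id

    def read(self, path):
        return self._blobs[self._links[path]]

    def blob_count(self):
        return len(self._blobs)

    def coalesce(self):
        """Background pass: point identical files at one shared blob."""
        canonical = {}  # content digest -> blob id chosen to survive
        for path, blob_id in list(self._links.items()):
            data = self._blobs[blob_id]
            digest = hashlib.sha256(data).digest()
            keep = canonical.setdefault(digest, blob_id)
            # Compare bytes as well, so a hash collision never merges files.
            if keep != blob_id and self._blobs[keep] == data:
                self._links[path] = keep
        # Drop blobs that no logical path refers to any more.
        live = set(self._links.values())
        self._blobs = {i: b for i, b in self._blobs.items() if i in live}
```

Because `write` never modifies a shared blob in place, changing one logical copy after coalescing leaves the other copies untouched, which is the property that keeps sharing invisible to applications.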

Unobtrusive Background Computation

“How to be Really Nice”
Background processes that don’t interfere with foreground work
  Even if neither CPU-bound
Based on progress metrics
  Back off when statistically significant slowdown observed

John Douceur, Bill Bolosky
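A minimal sketch of "back off when statistically significant slowdown observed" is given below. The slide does not name the statistical test or the back-off policy, so both are assumptions here: a simple comparison of mean progress rates in units of standard error, and a doubling delay capped at one minute. All function names, thresholds, and sample values are hypothetical.

```python
import statistics

def significant_slowdown(baseline, recent, z_threshold=2.0):
    """True if recent progress rates are clearly below the baseline rates.

    baseline, recent -- lists of progress-rate samples (e.g. operations/second)
    The difference in means must exceed z_threshold standard errors.
    """
    if len(baseline) < 2 or len(recent) < 2:
        return False  # not enough data to call anything significant
    mean_b = statistics.mean(baseline)
    mean_r = statistics.mean(recent)
    std_err = (statistics.variance(baseline) / len(baseline)
               + statistics.variance(recent) / len(recent)) ** 0.5
    if std_err == 0:
        return mean_r < mean_b
    return (mean_b - mean_r) / std_err > z_threshold

def next_delay(current_delay_s, slowed, max_delay_s=60.0):
    """Double the pause (starting at 1 s) while slowed; resume at once otherwise."""
    if slowed:
        return min(max(current_delay_s * 2, 1.0), max_delay_s)
    return 0.0

# Hypothetical samples: progress drops sharply when contention appears.
baseline_rates = [100, 102, 98, 101, 99]
contended_rates = [60, 62, 58, 61]
quiet_rates = [99, 101, 100, 98]
```

Judging by a progress rate rather than CPU usage is what lets the approach cover work that is not CPU-bound, as the slide notes.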

Transactional Filesystem

Research version of NTFS with transactional semantics

Marvin Theimer

Real-Time Systems Projects

Real-Time Scheduling

Scheduling abstractions enabling predictable concurrent execution of independent real-time programs

Mike Jones, John Regehr (Virginia), formerly Daniela Roşu (GA Tech), Marcel Roşu (GA Tech), George Candea (MIT)

Published in ’96 SigOps, ’97 SOSP, ’98 & ’99 USENIX Windows NT

Real-Time Latency Measurement

Understand, fix sources of long thread scheduling latencies in NT

Mike Jones, John Regehr (Virginia)
Published in ’98 NOSSDAV & ’99 HotOS

Problem: “Unimportant” Background Work

DEC dc21x4 PCI Fast 10/100 Ethernet
  6ms periodic DPC every 5s
  Autosense processing
Most of 6ms in five 0.88ms calls to routine that reads device register that:
  Writes a HW register – 1.5µs
  Stalls for 5µs
  Writes HW register again – 1.5µs
  Stalls for 5µs
  Reads a HW register – 1.5µs
  Stalls for 5µs

And does this 16 times! (once per bit)
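The figures on this slide can be combined with a little arithmetic. The sketch below uses only the numbers quoted above (6 ms DPC, 5 s period, five calls of 0.88 ms); the function and key names are hypothetical.

```python
def dpc_summary(dpc_ms, period_s, calls, call_ms):
    """Summarize a periodic DPC from its quoted duration, period, and inner calls."""
    time_in_calls_ms = calls * call_ms
    return {
        # Total time spent inside the repeated routine, in milliseconds.
        "time_in_calls_ms": time_in_calls_ms,
        # Fraction of the DPC's duration spent in those calls.
        "share_of_dpc": time_in_calls_ms / dpc_ms,
        # Fraction of all CPU time consumed by the DPC on average.
        "cpu_duty_cycle": (dpc_ms / 1000.0) / period_s,
    }

summary = dpc_summary(dpc_ms=6.0, period_s=5.0, calls=5, call_ms=0.88)
```

With these inputs the five calls account for about 4.4 ms of the 6 ms, while the DPC as a whole uses only about 0.12% of the CPU on average. That small average is one way to read the "unimportant" label: the cost shows up as a 6 ms burst, not as load.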

Another Long DPC: Intel EE 16

Intel EtherExpress 16 ISA Ethernet
  17ms DPC every 10s
  Card reset for no received packets
Amusing Observation
  Unplugging Ethernet makes latency worse!
  Despite conventional wisdom to the contrary

Even Worse: Video Cards

Video cards and drivers conspire to hog the PCI bus
  Dragging large window locks out interrupts for up to 30ms
  Obliterates sound I/O, for instance
Can set registry key to ask drivers to behave, but not default
  No problem when set correctly
Manufacturers’ motivation: WinBench
  ~ 5% improvement

Video Card Misbehavior Details

Don’t check if card FIFO full before write
  Eliminates one PCI read
Stalls PCI bus if full to prevent overflow
  Uses “PCI disconnect” feature

For More Information

Systems and Networking Research Group web pages: http://research.microsoft.com/sn/
