Under the Hood: Storage and Advanced Application Development
DAT406
Brian Dewey, Group Program Manager, Microsoft Corporation

Advanced Storage Applications

Trustworthy
  Platform: System Protection Points; Transactions
  Solution: Windows Backup; System Restore; "Previous versions"

Cross-Platform
  Platform: Symbolic links
  Solution: NFS

Distributed
  Platform: Remote Differential Compression
  Solution: DFS; PC-PC Sync; Offline files

What Is RDC?

- An algorithm that copies only the differences between two files over a network
- A library and SDK for Windows that let you add differential copy to your application
- An effective way to reduce the bandwidth used by distributed applications

RDC In Action

Overview Of The Core Algorithm

- The files are divided into variable-length chunks based on their contents
  - A fingerprint function is computed over a trailing window of the file's contents (H3 or Rabin)
  - Modifications affect only a small number of chunks
- Hashes (MD4) are computed for each chunk, on both the server and the client
  - A strong signature for the entire file deals with collisions
- The server transmits the list of strong hashes to the client
  - For large files, the algorithm is applied recursively at this point
- The client assembles the file by using existing chunks from the old file and requesting the missing chunks
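The chunking and hashing steps above can be sketched in Python. This is a minimal illustration, not the RDC library: a simple polynomial rolling hash over a trailing window stands in for the H3 or Rabin fingerprint, SHA-256 stands in for MD4 (which `hashlib` may not provide), and `WINDOW`, `MASK`, and the function names are assumed values chosen only for the example.

```python
import hashlib

WINDOW = 16            # trailing window size in bytes (assumed value)
MASK = (1 << 6) - 1    # cut where the low 6 bits are zero: ~64-byte chunks on average (assumed)
BASE = 257
MOD = (1 << 61) - 1


def chunk(data: bytes) -> list[bytes]:
    """Split data at content-defined boundaries using a rolling hash."""
    chunks, start, h = [], 0, 0
    top = pow(BASE, WINDOW - 1, MOD)  # weight of the byte about to leave the window
    for i, b in enumerate(data):
        if i >= WINDOW:
            h = (h - data[i - WINDOW] * top) % MOD  # drop the oldest byte
        h = (h * BASE + b) % MOD                     # shift in the newest byte
        # Apart from the minimum-size check, a cut depends only on the last
        # WINDOW bytes, so an edit mostly disturbs the boundaries near it.
        if i + 1 - start >= WINDOW and (h & MASK) == 0:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])
    return chunks


def signatures(chunks: list[bytes]) -> list[str]:
    """Per-chunk strong hashes (SHA-256 here; RDC uses MD4)."""
    return [hashlib.sha256(c).hexdigest() for c in chunks]
```

Because boundaries follow the content rather than fixed offsets, inserting a few bytes into the middle of a file shifts the data without moving most cut points, so most chunk signatures stay the same and only the chunks around the edit differ.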

[Figure: RDC example. Original file on the client: "The quick fox jumped / over the lazy brown dog. / The dog was so lazy that he didn't notice / the fox jumping over him." (chunk hashes MD4_11 to MD4_14). Updated file on the server: "The quick fox jumped / over the lazy brown dog. / The brown dog was / so lazy that he didn't notice / the fox jumping over him." (chunk hashes MD4_21 to MD4_25). The client requests the file, receives the server's hash list (using recursion), copies the unchanged chunks from its original file, and fetches new chunks 3, 4.]
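The client side of the quick-fox example can be sketched as follows, assuming both versions have already been split into chunks. The function `rebuild` is hypothetical, SHA-256 stands in for MD4, and the network request for missing chunks is simulated by indexing the server's chunk list.

```python
import hashlib


def sig(chunk: bytes) -> str:
    # SHA-256 stands in for the per-chunk MD4 hashes used by RDC.
    return hashlib.sha256(chunk).hexdigest()


def rebuild(server_chunks: list[bytes], client_chunks: list[bytes]):
    """Return (indexes of chunks to fetch, rebuilt copy of the server's file)."""
    server_sigs = [sig(c) for c in server_chunks]   # the list the server transmits
    local = {sig(c): c for c in client_chunks}      # chunks already in the client's old file
    needs = [i for i, s in enumerate(server_sigs) if s not in local]
    fetched = {i: server_chunks[i] for i in needs}  # simulated request for missing chunks
    rebuilt = b"".join(fetched[i] if i in fetched else local[s]
                       for i, s in enumerate(server_sigs))
    # A strong hash over the whole file (sent by the server in practice) catches a
    # chunk-hash collision that would otherwise splice wrong data into the result.
    expected = hashlib.sha256(b"".join(server_chunks)).digest()
    if hashlib.sha256(rebuilt).digest() != expected:
        raise ValueError("rebuilt file failed whole-file verification")
    return needs, rebuilt


original = [b"The quick fox jumped ", b"over the lazy brown dog. ",
            b"The dog was so lazy that he didn't notice ", b"the fox jumping over him."]
updated = [b"The quick fox jumped ", b"over the lazy brown dog. ", b"The brown dog was ",
           b"so lazy that he didn't notice ", b"the fox jumping over him."]
needs, text = rebuild(updated, original)
print(needs)  # [2, 3]: the 3rd and 4th chunks, i.e. "new chunks 3, 4"
```

Only the two changed chunks cross the network; the other three are copied from the file the client already has.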

RDC Sample

[Figure: RdcSdkTestClient.exe (client) and RdcSdkTestServer.dll (server) communicate over DCOM; both sides use Msrdc.dll. Files shown: Source, Seed, Target.]

- Simplest application of RDC, with no advanced tricks
- Will be in the Beta 2 SDK (not today's)
  - Contact [email protected] to get started early

Tips For Using RDC

- Verify the target file using a strong hash
- Apply regular dictionary compression to the traffic
- Cache signatures
- Use multiple seed files
- Use recursion
- Batch needs
- Transfer multiple files in parallel
- Tune, tune, tune
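The slide does not spell out what "Batch needs" involves; one reading is that the ranges the client must fetch should be coalesced into fewer, larger requests. A minimal sketch under that assumption, where needs are modeled as hypothetical `(offset, length)` pairs and `max_gap` and `max_request` are assumed tuning knobs:

```python
def batch_needs(needs, max_gap=0, max_request=1 << 20):
    """Coalesce (offset, length) needs into fewer, larger requests.

    Needs that overlap, touch, or sit within max_gap bytes of each other are
    merged, as long as a merged request stays within max_request bytes.
    """
    batched = []
    for offset, length in sorted(needs):
        if batched:
            start, size = batched[-1]
            end = start + size
            new_end = max(end, offset + length)
            if offset - end <= max_gap and new_end - start <= max_request:
                batched[-1] = (start, new_end - start)  # extend the current request
                continue
        batched.append((offset, length))
    return batched
```

Raising `max_gap` trades a few redundant bytes for fewer round trips, which is the kind of trade-off the "Tune, tune, tune" tip points at.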

Symbolic Links

- Symbolic links let one file or directory transparently redirect to another file or directory by name
- Example: C:\public\link can refer to:
  - C:\Users\BillG\Documents ("absolute" link)
  - ..\..\Users\BillG\Documents ("relative" link)
  - \\machine\Users\BillG\Documents ("remote" link)

Symbolic Links In Action

Under the Hood: Symbolic Links

C:\Users\BillG\Documents\link (stored as an NTFS reparse point): link -> C:\Users\SteveB\Documents\Shared
C:\Users\BillG\Documents\link\file.txt resolves to C:\Users\SteveB\Documents\Shared\file.txt

Under the Hood: Symbolic Links

C:\Users\BillG\Documents\link (relative link): link -> ..\..\public\documents
C:\Users\BillG\Documents\link\file.txt becomes C:\Users\BillG\Documents\..\..\public\documents\file.txt, which resolves to C:\Users\public\documents\file.txt
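The resolution steps on these slides can be mimicked with Python's `ntpath` module, which applies Windows path rules on any platform. This is a sketch of the idea, not how NTFS processes reparse points: the `links` dictionary stands in for reparse points, and `resolve` and its hop limit are hypothetical.

```python
import ntpath


def resolve(path, links, max_hops=32):
    """Expand symbolic links in a Windows-style path, left to right.

    links maps a link's full path to its stored target (a stand-in for the
    reparse point). Lookups are exact-match here; NTFS is case-insensitive.
    """
    for _ in range(max_hops):  # arbitrary cap so link cycles terminate
        parts = ntpath.normpath(path).split("\\")
        for i in range(1, len(parts) + 1):
            prefix = "\\".join(parts[:i])
            if prefix in links:
                target = links[prefix]
                if not ntpath.isabs(target):
                    # A relative target is interpreted from the link's own directory.
                    target = ntpath.join(ntpath.dirname(prefix), target)
                path = ntpath.join(target, *parts[i:])
                break
        else:
            return ntpath.normpath(path)  # no link left in the path
    raise OSError("too many levels of symbolic links")
```

With `{r"C:\Users\BillG\Documents\link": r"..\..\public\documents"}`, resolving `C:\Users\BillG\Documents\link\file.txt` yields `C:\Users\public\documents\file.txt`; swapping in the absolute target `C:\Users\SteveB\Documents\Shared` yields `C:\Users\SteveB\Documents\Shared\file.txt`.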

Symbolic Link Tips

- Know which functions work on the link and which work on the target
  - General rule: "data" operations work on the target, "metadata" operations work on the link
  - Data: open, copy, modify
  - Metadata: delete, rename
  - Currently documented on MSDN
- Use relative symbolic links when you copy subtrees
- Remember: links are evaluated on the client
- Be very careful with name parsing
  - \\machine\public\directory\file.txt might not be a file on \\machine...
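The data-versus-metadata rule can be observed with Python's `os` module. This sketch assumes a platform where `os.symlink` is permitted (on Windows it may require administrator rights or Developer Mode); the function name `demo_link_semantics` is hypothetical.

```python
import os
import tempfile


def demo_link_semantics():
    """Show which operations follow a symbolic link and which act on the link."""
    with tempfile.TemporaryDirectory() as d:
        target = os.path.join(d, "target.txt")
        link = os.path.join(d, "link.txt")
        with open(target, "w") as f:
            f.write("original")
        os.symlink(target, link)

        # Data operation: writing through the link modifies the target's contents.
        with open(link, "a") as f:
            f.write(" + appended via link")
        with open(target) as f:
            target_text = f.read()

        # Metadata operation: renaming the link renames only the link.
        renamed = os.path.join(d, "renamed-link.txt")
        os.rename(link, renamed)
        still_a_link = os.path.islink(renamed)

        # Metadata operation: deleting the link leaves the target in place.
        os.remove(renamed)
        target_survives = os.path.exists(target)

        return target_text, still_a_link, target_survives
```

Calling `demo_link_semantics()` returns `("original + appended via link", True, True)`: the write landed in the target, while the rename and delete touched only the link.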

System Protection Points

- Extension of "shadow copy" technology from Windows Server 2003
  - Shadow copies are copies of all data on disk, frozen at a point in time
  - Copy-on-write ensures minimal physical disk space consumption
- A single disk shadow copy with multiple uses:
  - System Restore
  - "Safe documents"
  - "Safe system"
  - Single document recovery
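Copy-on-write is what keeps shadow copies cheap: a block is copied aside only when it is first overwritten after the snapshot. The toy model below illustrates that idea; it is a conceptual sketch with hypothetical names (`Volume`, `create_shadow_copy`, `read_shadow`), not how the Volume Shadow Copy Service stores its data.

```python
class Volume:
    """Toy block volume with copy-on-write, point-in-time shadow copies."""

    def __init__(self, blocks):
        self.blocks = list(blocks)  # live data
        self.shadows = []           # per shadow copy: block index -> preserved old contents

    def create_shadow_copy(self):
        self.shadows.append({})
        return len(self.shadows) - 1  # shadow copy id

    def write(self, index, data):
        # Copy-on-write: before a block is first overwritten, preserve its old
        # contents for every shadow copy that has not saved it yet.
        for saved in self.shadows:
            if index not in saved:
                saved[index] = self.blocks[index]
        self.blocks[index] = data

    def read_shadow(self, shadow_id, index):
        # Unchanged blocks are shared with the live volume, so they cost no extra space.
        # There is deliberately no method to write a shadow copy: they are read-only.
        return self.shadows[shadow_id].get(index, self.blocks[index])
```

Writing the same block twice after a snapshot copies it aside only once, so the extra space grows with the number of changed blocks, not with the size of the volume.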

System Protection Points In Action

System Protection Point Tips

- Shadow copies may be reclaimed at any time
  - If you find an old version of data that is important, you must copy it to the "live" volume if you want to keep it!
- Shadow copies are read-only

Transactions In Longhorn

- A simple way to add data reliability to your application
- Can transact updates to the file system and the registry
- Transactions can be coordinated with databases and with other machines
- File system transactions work remotely
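The toy model below illustrates the all-or-nothing behavior described above, plus the whole-file locking and commit finality covered in the tips that follow. It is an in-memory sketch with hypothetical class names (`TxStore`, `Transaction`), not the Transactional NTFS API.

```python
class TxStore:
    """Toy file store with all-or-nothing transactions (an illustration, not the TxF API)."""

    def __init__(self):
        self.files = {}  # committed contents: name -> data
        self.locks = {}  # name -> the transaction that has written it

    def begin(self):
        return Transaction(self)


class Transaction:
    def __init__(self, store):
        self.store = store
        self.changes = {}  # uncommitted writes, invisible outside this transaction
        self.active = True

    def _check_active(self):
        if not self.active:
            raise RuntimeError("transaction already committed or rolled back")

    def write(self, name, data):
        self._check_active()
        owner = self.store.locks.get(name)
        if owner is not None and owner is not self:
            raise PermissionError(f"{name} is locked by another transaction")
        self.store.locks[name] = self  # the whole file stays locked until the end
        self.changes[name] = data

    def read(self, name):
        # This transaction sees its own writes; others see only committed data.
        return self.changes.get(name, self.store.files.get(name))

    def commit(self):
        self._check_active()
        self.store.files.update(self.changes)  # all changes become visible together
        self._finish()

    def rollback(self):
        self._check_active()
        self._finish()  # discard every change

    def _finish(self):
        for name in self.changes:
            del self.store.locks[name]
        self.changes = {}
        self.active = False
```

Until `commit()` runs, other readers of `store.files` see none of the transaction's writes; after it runs, they see all of them, and a later `rollback()` raises because the transaction is already finished.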

Transactions In Action

Transactions Under The Hood

[Figure: Transactions architecture. Components shown: OLE <-> Win32; Win32; DTC & COM+; Win32 I/O APIs; Transactional NTFS; IRP Based; Kernel Mode Trans Mgr; Kernel Mode TM APIs; Common Log Manager; Win32 Log APIs; Kernel Mode Log APIs.]

Transactional NTFS Tips

- Transactions lock the entire file
  - Keep the transaction lifespan short
- Transactions are all-or-nothing
  - Consider intermediate checkpoints for operations like bulk file copies
- You can't roll back once you commit
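The checkpoint suggestion can be sketched as a loop that commits one short transaction per batch, so a failure loses only the current batch while earlier work stays committed. `copy_with_checkpoints` and its `copy_batch` callback are hypothetical; the callback is assumed to begin a transaction, copy the batch, and commit, raising `OSError` (and rolling back) on failure.

```python
def copy_with_checkpoints(files, batch_size, copy_batch):
    """Copy files in batches, each batch committed as its own short transaction.

    Returns the list of files whose batches were committed before any failure.
    """
    committed = []
    for start in range(0, len(files), batch_size):
        batch = files[start:start + batch_size]
        try:
            copy_batch(batch)  # assumed: begin, copy every file, commit
        except OSError:
            # Earlier batches are already committed and cannot be rolled back;
            # a retry can resume from the first uncommitted file.
            break
        committed.extend(batch)
    return committed
```

Smaller batches also keep each transaction's file locks short-lived, in line with the first tip; the cost is more commits and a result that can be partially complete.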

Where To Learn More

- Transactions: FUN320 (Thursday, 5:15)
- Sync Center: DAT317 (Thursday, 10:00)
- RDC: [email protected]

Call to Action

                 Platform Feature                  Availability
Distributed      Remote Differential Compression   Beta 2
Cross-Platform   Symbolic links                    Today
Trustworthy      System Protection Points          Today
                 Transactions                      Today

Fill out your evaluation!

© 2005 Microsoft Corporation. All rights reserved.This presentation is for informational purposes only. Microsoft makes no warranties, express or implied, in this summary.