
Wikipedia: Amazon Web Services (abbreviated AWS) is a collection of remote computing services (also called web services) that together make up a cloud computing platform, offered over the Internet by Amazon.com. The most central and well-known of these services are Amazon EC2 and Amazon S3. The service is advertised as providing large computing capacity (potentially many servers) much faster and cheaper than building a physical server farm.[2]

AWS is located in 8 geographical "regions": US East (Northern Virginia), where the majority of AWS servers are based,[3] US West (northern California), US West (Oregon), Brazil (São Paulo), Europe (Ireland), South Asia (Singapore), East Asia (Tokyo), and Australia (Sydney).

List of products

Compute
Amazon Elastic Compute Cloud (EC2) provides scalable virtual private servers using Xen.
Amazon Elastic MapReduce (EMR) allows developers and others to easily and cheaply process big data, using hosted Hadoop on EC2 and Amazon S3.

Networking
Amazon Route 53 provides a highly available and scalable Domain Name System (DNS) web service.
Amazon Virtual Private Cloud (VPC) creates a logically isolated set of Amazon EC2 instances which can be connected to an existing network using a VPN.
AWS Direct Connect provides dedicated network connections into AWS data centers, providing faster and cheaper data throughput.

Content delivery
Amazon CloudFront, a content delivery network (CDN) for distributing objects to so-called "edge locations" near the requester.

Storage and content delivery
Amazon Simple Storage Service (S3) provides web-service-based storage.
Amazon Glacier provides a low-cost, long-term storage option (compared to S3): high redundancy and availability, but infrequent access; ideal for archiving.
AWS Storage Gateway, an iSCSI block storage virtual appliance with cloud-based backup.
Amazon Elastic Block Store (EBS) provides persistent block-level storage volumes for EC2.
AWS Import/Export accelerates moving large amounts of data into and out of AWS using portable storage devices for transport.

Database
Amazon DynamoDB provides a scalable, low-latency NoSQL online database service backed by SSDs.
Amazon ElastiCache provides in-memory caching for web applications. This is Amazon's implementation of Memcached and Redis.
Amazon Relational Database Service (RDS) provides a scalable database server with MySQL, Informix,[20] Oracle, SQL Server, and PostgreSQL support.[21]
Amazon Redshift provides petabyte-scale data warehousing with column-based storage and multi-node compute.
Amazon SimpleDB allows developers to run queries on structured data. It operates in concert with EC2 and S3 to provide "the core functionality of a database".
AWS Data Pipeline provides a reliable service for data transfer between different AWS compute and storage services.
Amazon Kinesis streams data in real time, with the ability to process thousands of data streams on a per-second basis.

Deployment
Amazon CloudFormation provides a file-based interface for provisioning other AWS resources.
AWS Elastic Beanstalk provides quick deployment and management of applications in the cloud.
AWS OpsWorks provides configuration of EC2 services using Chef.

Management
Amazon Identity and Access Management (IAM), an implicit service: the authentication infrastructure used to authenticate access to services.
Amazon CloudWatch provides monitoring for AWS cloud resources and applications, starting with EC2.
AWS Management Console: a web-based point-and-click interface to manage and monitor EC2, EBS, S3, SQS, Amazon Elastic MapReduce, and Amazon CloudFront.

Application services
Amazon CloudSearch provides basic full-text search and indexing of textual content.
Amazon DevPay, currently in limited beta, is a billing and account management system for applications that developers have built atop Amazon Web Services.
Amazon Elastic Transcoder (ETS) provides video transcoding of S3-hosted videos, marketed primarily as a way to convert source files into mobile-ready versions.
Amazon Flexible Payments Service (FPS) provides an interface for micropayments.
Amazon Simple Email Service (SES) provides bulk and transactional email sending.
Amazon Simple Queue Service (SQS) provides a hosted message queue for web applications.
Amazon Simple Notification Service (SNS) provides hosted multi-protocol "push" messaging for applications.
Amazon Simple Workflow (SWF) is a workflow service for building scalable, resilient applications.

Wikipedia: Amazon S3 (Simple Storage Service) is an online file storage web service offered by Amazon Web Services. It provides storage through web service interfaces (REST, SOAP, and BitTorrent).

Amazon S3 is reported to store more than 2 trillion objects as of April 2013.

Details of S3's design are not made public by Amazon, though it clearly manages data with an object storage architecture. According to Amazon, S3's design aims to provide scalability, high availability, and low latency at commodity costs.

S3 stores arbitrary objects (computer files) up to 5 terabytes in size, each accompanied by up to 2 kilobytes of metadata. Objects are organized into buckets (each owned by an Amazon Web Services account) and identified within each bucket by a unique, user-assigned key.

Buckets and objects can be created, listed, and retrieved using either a REST-style HTTP interface or a SOAP interface. Requests are authorized using an access control list associated with each bucket and object.

Bucket names and keys are chosen so that objects are addressable using HTTP URLs:
http://s3.amazonaws.com/bucket/key
http://bucket.s3.amazonaws.com/key

http://bucket/key (where bucket is a DNS CNAME record pointing to bucket.s3.amazonaws.com)
Because objects are accessible by unmodified HTTP clients, S3 can be used to replace significant existing (static) web hosting infrastructure.[16] Someone can construct a URL that can be handed off to a third party for access for a period such as the next 30 minutes, or the next 24 hours.
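Because a publicly readable object is just an HTTP resource, any unmodified HTTP client can fetch it. Below is a minimal C++ sketch (not part of the original notes) that GETs an object through the virtual-hosted-style URL using libcurl; the bucket "example-bucket" and key "docs/readme.txt" are hypothetical placeholders, and a private object would instead need a signed, time-limited URL of the kind described above.

// Minimal sketch: fetch a publicly readable S3 object over plain HTTP with libcurl.
// "example-bucket" and "docs/readme.txt" are hypothetical placeholders.
#include <curl/curl.h>
#include <iostream>
#include <string>

// libcurl write callback: append received bytes to a std::string.
static size_t write_cb(char* ptr, size_t size, size_t nmemb, void* userdata) {
    static_cast<std::string*>(userdata)->append(ptr, size * nmemb);
    return size * nmemb;
}

int main() {
    curl_global_init(CURL_GLOBAL_DEFAULT);
    CURL* curl = curl_easy_init();
    if (!curl) return 1;

    std::string body;
    // Virtual-hosted-style URL: http://bucket.s3.amazonaws.com/key
    curl_easy_setopt(curl, CURLOPT_URL,
                     "http://example-bucket.s3.amazonaws.com/docs/readme.txt");
    curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, write_cb);
    curl_easy_setopt(curl, CURLOPT_WRITEDATA, &body);

    CURLcode rc = curl_easy_perform(curl);
    if (rc == CURLE_OK)
        std::cout << body << std::endl;              // object contents
    else
        std::cerr << curl_easy_strerror(rc) << std::endl;

    curl_easy_cleanup(curl);
    curl_global_cleanup();
    return rc == CURLE_OK ? 0 : 1;
}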

Every item in a bucket can also be served up as a BitTorrent feed: the S3 store can act as a seed host for a torrent, and any BitTorrent client can retrieve the file. This drastically reduces the bandwidth costs for the download of popular objects. Amazon S3 provides options to host static websites with index document support and error document support.[19]

The photo hosting service SmugMug has used S3 since April 2006. They experienced a number of initial outages and slowdowns, but after one year they described it as being "considerably more reliable than our own internal storage" and claimed to have saved almost $1 million in storage costs.[21]

There are various Filesystem in Userspace (FUSE)-based file systems for Unix-like operating systems (Linux, etc.) that can be used to mount an S3 bucket as a file system. Note that because the semantics of the S3 file system are not those of a POSIX file system, the file system may not behave entirely as expected.[22]

Apache Hadoop file systems can be hosted on S3, as its requirements of a file system are met by S3. As a result, Hadoop can be used to run MapReduce algorithms on EC2 servers, reading data and writing results back to S3.

Dropbox,[23] Bitcasa,[24] StoreGrid, Jumpshare, SyncBlaze,[25] Tahoe-LAFS-on-S3,[26] Zmanda, Ubuntu One,[27] and Fiabee are some of the many online backup and synchronization services that use S3 as their storage and transfer facility. Minecraft hosts game updates and player skins on the S3 servers.[28] Tumblr, Formspring, Pinterest, and Posterous images are hosted on the S3 servers. Alfresco, the open-source Enterprise Content Management provider, hosts data for the Alfresco in the Cloud service on S3. LogicalDOC, the open-source Document Management System, provides a tool for disaster recovery based on S3. S3 was used in the past by some enterprises as a long-term archiving solution, until Amazon Glacier was released.

Examples of competing S3-compliant storage implementations include: Google Cloud Storage; Cloud.com's CloudStack;[31] Cloudian, Inc.,[32] an S3-compatible object storage software package; Connectria Cloud Storage,[33][34] which in 2011 became the first US cloud storage service provider based on the Scality RING organic storage technology;[35][36] Eucalyptus; Nimbula (acquired by Oracle); Riak CS,[37] which implements a subset of the S3 API including REST and ACLs on objects and buckets; Ceph with RADOS gateway;[38] and Caringo, Inc., which implements the S3 API and extends Bucket Policies to include the addition of Domain Policies.

From: Arjun Roy

I did some tests to compare C/C++ vs C# on some basic operations to see if compilers handle cases differently. I executed Dr. Wettstein's code in the libPree library on Linux Ubuntu 12.04, and my C# code on Windows 7, with 4 GB RAM.

Operation        Size             Time (sec)     Time (millisec)
OR               1 million        1.1            0.1013
OR               10 million       23.33          1.016
OR               100 million      266.38         10.4494
AND              1 million        0.99           0.0989
AND              10 million       23.29          1.0235
AND              100 million      279.92         10.6166
Number of 1's    1 million        0.49           0.9647
Number of 1's    10 million       5.11           6.6821
Number of 1's    100 million      55.73          57.8274

3-run replicate of a 100M-bit PTree OR op done with 32-bit code (gcc 4.3.4) with -O2 optimization:

PTree OR test: bitcount/iteration = 100000000/1  Count time: 60000 ticks [0.060000 secs.], Elapsed: 0
real    0m0.076s
user    0m0.048s
sys     0m0.028s

PTree OR test: bitcount/iteration = 100000000/1  Count time: 60000 ticks [0.060000 secs.], Elapsed: 0
real    0m0.076s
user    0m0.048s
sys     0m0.024s

PTree OR test: bitcount/iteration = 100000000/1  Count time: 70000 ticks [0.070000 secs.], Elapsed: 0
real    0m0.076s
user    0m0.040s
sys     0m0.032s
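For context, the kind of measurement being reported here can be reproduced with a generic sketch like the one below (this is not Dr. Wettstein's libPree code): pack two 100-million-bit vectors into 64-bit words, OR them word by word, and time the loop. The bandwidth figure printed counts both operand reads and the result write, which is only one of several reasonable accountings.

// Generic sketch of a 100M-bit OR benchmark (not the libPree implementation).
// Build with optimization, e.g. -O2, for a meaningful timing.
#include <chrono>
#include <cstdint>
#include <cstdio>
#include <vector>

int main() {
    const std::size_t nbits  = 100000000;
    const std::size_t nwords = (nbits + 63) / 64;

    std::vector<std::uint64_t> a(nwords, 0x5555555555555555ULL);
    std::vector<std::uint64_t> b(nwords, 0xAAAAAAAAAAAAAAAAULL);
    std::vector<std::uint64_t> r(nwords);

    auto t0 = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < nwords; ++i)
        r[i] = a[i] | b[i];                       // word-wise OR
    auto t1 = std::chrono::steady_clock::now();

    double ms = std::chrono::duration<double, std::milli>(t1 - t0).count();
    // Bytes touched: two operand reads plus one result write.
    double mb = (3.0 * nwords * sizeof(std::uint64_t)) / (1024.0 * 1024.0);
    std::printf("OR of %zu bits: %.3f ms (~%.1f MB/s touched)\n",
                nbits, ms, mb / (ms / 1000.0));
    return 0;
}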

So a 100 million bit OR op on my workstation, which is somewhat long in the tooth now, consistently yields about a 76 ms operation time. Doing the math suggests an effective application memory bandwidth of 178.5 megabytes/second. Put in another frame of reference, this suggests that my workstation could easily saturate a 1 gigabit/second LAN connection with a streaming PTree computation. I also ran a 3-tuple of a count test just to make sure that it was consistent:

PTree count test: bitcount/iteration/population 100000000/1/500000  Count time: 110000 ticks [0.110000 secs.], Elapsed: 0
real    0m0.122s
user    0m0.112s
sys     0m0.008s

PTree count test: bitcount/iteration/population 100000000/1/500000  Count time: 120000 ticks [0.120000 secs.], Elapsed: 0
real    0m0.123s
user    0m0.120s
sys     0m0.000s

PTree count test: bitcount/iteration/population 100000000/1/500000  Count time: 110000 ticks [0.110000 secs.], Elapsed: 0
real    0m0.121s
user    0m0.112s
sys     0m0.008s

The above test reflects the amount of time required to root-count a PTree with 100 million entries with 500,000 randomly populated one bits. This test suggests a count time of 120 milliseconds, for an effective application memory bandwidth rate of 109.2 megabytes/second, just under what it would take to saturate a LAN connection.

This test brings up an additional question that I would have with respect to the validity of the testing environment. This test uses the default PTree counting implementation, which uses word-by-word byte scanning with a 256-entry bitcount lookup table. This means we impose an overhead of 12,500,000 memory references for the table lookup and an additional 9,375,000 unaligned memory references for the three byte-pointer lookups per 32-bit word.
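A byte-scanning count of that kind looks roughly like the following sketch (again generic, not the libPree source): a 256-entry table gives the number of set bits in each byte value, and the counter walks the backing array one byte at a time, four lookups per 32-bit word.

// Generic sketch of byte-wise population count with a 256-entry lookup table.
#include <cstddef>
#include <cstdint>

// Number of one-bits in a single byte value.
static unsigned char bits_in_byte(unsigned v) {
    unsigned c = 0;
    for (; v; v >>= 1) c += v & 1u;
    return static_cast<unsigned char>(c);
}

static unsigned char BITCOUNT[256];

// Call once at startup: BITCOUNT[i] = number of one-bits in byte value i.
static void init_table() {
    for (unsigned i = 0; i < 256; ++i) BITCOUNT[i] = bits_in_byte(i);
}

// Count one-bits in a packed bit vector of 'nwords' 32-bit words,
// scanning it byte by byte (four table lookups per word).
std::size_t popcount_bytescan(const std::uint32_t* words, std::size_t nwords) {
    const unsigned char* bytes = reinterpret_cast<const unsigned char*>(words);
    std::size_t total = 0;
    for (std::size_t i = 0; i < nwords * sizeof(std::uint32_t); ++i)
        total += BITCOUNT[bytes[i]];
    return total;
}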

PTree operations are by definition cache-pessimal, as we have discussed previously. Lookup tables are also notorious for their cache-pressure characteristics. It is always difficult to predict how the hardware prefetch logic handles this, but at a minimum the lookup table is going to bounce across two L1 cache lines.

Given this, it is somewhat unusual in their test results that the '1 counting' was five times faster than the basic AND/OR ops. There are multiple '1 counting' implementations in the source code, but they would have had to select those. Included in those counting implementations were reduction counting and a variant which unrolls the backing array onto the MMX vector registers. Those strategies provided incremental performance improvements, but nothing on the order of a 5-fold increase in counting throughput.

With respect to memory-optimizing PTree operations, I experimented with non-temporal loads and prefetch hinting and was never able to demonstrate significant performance increases. As we have discussed many times, PTree technology, and data mining in general, is ultimately constrained by memory latency.
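The experiments described would look something like this hypothetical sketch using the x86 intrinsics in <immintrin.h>: software-prefetch ahead of the scan with a non-temporal hint, and write the result stream with non-temporal stores so it does not displace the operands in cache.

// Hypothetical sketch of the prefetch / non-temporal experiments described above
// (x86 only, requires SSE2). Correctness of the OR is unchanged; only the cache
// behaviour differs, and as the text notes the gains may be negligible.
// Assumes: 'out' is 16-byte aligned and 'nwords' is even.
#include <immintrin.h>
#include <cstddef>
#include <cstdint>

void or_streams(const std::uint64_t* a, const std::uint64_t* b,
                std::uint64_t* out, std::size_t nwords) {
    for (std::size_t i = 0; i < nwords; i += 2) {
        // Hint the prefetcher several cache lines ahead (non-temporal hint).
        // Prefetch instructions do not fault, so running past the end is benign.
        _mm_prefetch(reinterpret_cast<const char*>(a + i + 64), _MM_HINT_NTA);
        _mm_prefetch(reinterpret_cast<const char*>(b + i + 64), _MM_HINT_NTA);

        __m128i va = _mm_loadu_si128(reinterpret_cast<const __m128i*>(a + i));
        __m128i vb = _mm_loadu_si128(reinterpret_cast<const __m128i*>(b + i));
        __m128i vr = _mm_or_si128(va, vb);

        // Non-temporal store: bypass the cache for the result stream.
        _mm_stream_si128(reinterpret_cast<__m128i*>(out + i), vr);
    }
    _mm_sfence();  // make the streaming stores globally visible
}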

So that would be my spin on the results. The C# test results are approximately consistent with expected memory bandwidth speeds. I took a few minutes and ran the 100 million count test on one of the nodes in our 'chicken cruncher' and I am seeing around 20 milliseconds, which is about twice as fast as their C# implementation; that is reasonable given hardware/OS/compiler-interpreter differences.

Client Data Mining: How It Works: PRISM, XKeyscore, and plenty more classified info in the client's vast surveillance program has come to light. How much data is there? How does the government sort through it? What are they learning about you? Here's a guide.

Some Interesting Articles: We should know as much as we can about what "the client" is doing in order to best help them do it. From Popular Mechanics:

Most people were introduced to the arcane world of data mining by a security breach revealing that the government gathers billions of pieces of data—phone calls, emails, photos, and videos—from Google, Facebook, Microsoft, and other communications giants, then combs through the information for leads on national security threats. Here's a guide to big-data mining, NSA-style.

The Information Landscape: How much data do we produce? An IBM study estimates 2.5 quintillion bytes (2.5 x 10^18) of data every day. (If these data bytes were pennies laid out flat, they would blanket the earth five times.) That total includes stored information—photos, videos, social-media posts, word-processing files, phone-call records, financial records, and results from science experiments—and data that normally exists for mere moments, such as phone-call content and Skype chats.

Veins of Useful Information: Digital info can be analyzed to establish connections between people, and these links can generate investigative leads. But in order to examine data, it has to be collected—from everyone. As the data-mining saying goes: to find a needle in a haystack, you first need to build a haystack.

Data Has to Be Tagged Before It's Bagged: Data mining relies on metadata tags that enable algorithms to identify connections. Metadata is data about data—e.g., the names and sizes of files. The label placed on data is called a tag. Tagging data enables analysts to classify and organize the info so it can be searched and processed. Tagging also enables analysts to parse the info without examining the contents. This is an important legal point because the communications of U.S. citizens and lawful permanent resident aliens cannot be examined without a warrant; metadata can be.

Finding Patterns in the Noise: The data-analysis firm IDC estimates that only 3 percent of the info in the digital universe is tagged when it's created, so the client has a sophisticated software program that puts billions of metadata markers on the info it collects. These tags are the backbone of any system that makes links among different kinds of data—such as video, documents, and phone records. For example, data mining could call attention to a suspect on a watch list who downloads terrorist propaganda, visits bomb-making websites, and buys a pressure cooker. (This pattern matches behavior of the Tsarnaev brothers, who are accused of planting bombs at the Boston Marathon.) This tactic assumes terrorists have well-defined data profiles—something many security experts doubt.

Open Source and Top Secret: Accumulo was designed precisely for tagging billions of pieces of unorganized, disparate data. The custom tool is based on Google programming and is open source. Sqrrl commercialized it and hopes the healthcare and finance industries will use it to manage their own big-data sets.

The Miners: Who Does What: The client is authorized to snoop on foreign communications and also collects a vast amount of data—trillions of pieces of communication generated by people across the globe. It does not chase the crooks, terrorists, and spies it identifies; it sifts info on behalf of other government players such as the Pentagon, CIA, and FBI. Here are the basic steps:

1. A judge on the secret Foreign Intelligence Surveillance (FISA) Court gets an application from an agency to authorize a search of data collected by the client.
2. Once authorized, the request goes to the FBI's Electronic Communications Surveillance Unit (ECSU), a legal safeguard: they review it to ensure no US citizens are targeted.
3. The ECSU passes appropriate requests to the FBI Data Intercept Technology Unit, which obtains the info from Internet company servers and then passes it to the client to be examined. (Many companies have denied they open their servers. As of press time, it's not clear who is correct.)
4. The client then passes relevant information to the government agency that requested it.

What Is the Client Up To? Phone-Metadata Mining Dragged Into the Light: The controversy began when it was revealed that the U.S. government was collecting the phone-metadata records of every Verizon customer—including millions of Americans. At the request of the FBI, FISA Court judge Roger Vinson issued an order compelling the company to hand over its phone records. The content of the calls was not collected, but national security officials call it "an early warning system" for detecting terror plots.

PRISM Goes Public: Every collection platform or source of raw intelligence is given a name, called a Signals Intelligence Activity Designator (SIGAD), and a code name. SIGAD US-984XN is better known by its code name: PRISM.

PRISM involves the collection of digital photos, stored data, file transfers, emails, chats, videos, and video conferencing from nine Internet companies. U.S. officials say this tactic helped snare Khalid Ouazzani, a naturalized U.S. citizen who the FBI claimed was plotting to blow up the New York Stock Exchange.

Mining Data as It's Created: Our client also operates real-time surveillance tools. Analysts can receive "real-time notification of an email event such as a login or sent message" and "real-time notification of a chat login". Whether real-time info can stop unprecedented attacks is subject to debate. Alerting a credit-card holder of sketchy purchases in real time is easy; building a reliable model of an impending attack in real time is infinitely harder.

XKeyscore is software that can search hundreds of databases for leads. It enables low-level analysts to access communications without oversight, circumventing the checks and balances of the FISA court. The client vehemently denies this, and the documents don't indicate any misuse. It seems to be a powerful tool that allows analysts to find hidden links inside troves of info. "My target speaks German but is in Pakistan—how can I find him?" "My target uses Google Maps to scope target locations—can I use this info to determine his email address?" This program enables analysts to submit one query to search 700 servers around the world at once, combing disparate sources to find the answers to these questions.

How Far Can the Data Stretch? Oops—False Positives: Bomb-sniffing dogs sometimes bark at explosives that are not there. This kind of mistake is called a false positive. In data mining, the equivalent is a computer program sniffing around a data set and coming up with the wrong conclusion. This is when having a massive data set may be a liability. When a program examines trillions of connections between potential targets, even a very small false-positive rate means 10K dead-end leads that agents must chase down—not to mention the unneeded incursions into innocent people's lives.

Analytics to See the Future: Ever wonder where those Netflix recommendations or suggested reading lists on Amazon come from? Your previous interests directed an algorithm to pitch those products to you. Big companies believe more of this kind of targeted marketing will boost sales and reduce costs. For example, this year Walmart bought a predictive analytics startup called Inkiru. They make software that crunches data to help retailers develop marketing campaigns that target shoppers when they are most likely to buy certain products.

Pattern Recognition or Prophecy? In 2011 British researchers created a game that simulated a van-bomb plot, and 60% of "terrorist" players were spotted by a program called DScent, based on their "purchases" and "visits" to the target site. The ability of a computer to automatically match security-camera footage with records of purchases may seem like a dream to law-enforcement agents trying to save lives, but it's the kind of ubiquitous tracking that alarms civil libertarians.

Client's Red Team: Secret Operations with the Government's Top Hackers. By Glenn Derene. When it comes to the U.S. government's computer security, we in the tech press have a habit of reporting only the bad news—for instance, last year's hacks into Oak Ridge and Los Alamos National Labs, a break-in to an e-mail server used by Defense Secretary Robert Gates ... the list goes on and on. Frankly, that's because the good news is usually a bunch of nonevents: "Hackers deterred by diligent software patching at Army Corps of Engineers." Not too exciting.

So, in the world of IT security, it must seem that the villains outnumber the heroes—but there are some good-guy celebrities in the world of cyber security. In my years of reporting on the subject, I've often heard the red team referred to with a sense of breathless awe by security pros. These guys are purported to be just about the stealthiest, most skilled firewall-crackers in the game. Recently, I called up the secretive government agency and asked if it could offer up a top red teamer for an interview, and, surprisingly, the answer came back, "Yes".

What are red teams? They're sort of like the special forces units of the security industry—highly skilled teams that clients pay to break into the clients' own networks. These guys find the security flaws so they can be patched before someone with more nefarious plans sneaks in. The client has made plenty of news in the past few years for warrantless wiretapping and massive data-mining enterprises of questionable legality, but one of the agency's primary functions is the protection of the military's secure computer networks, and that's where the red team comes in.

In exchange for the interview, I agreed not to publish my source's name. When I asked what I should call him, the best option I was offered was: "An official within the client's Vulnerability Analysis and Operations Group." So I'm just going to call him OWCVAOG for short. And I'll try not to reveal any identifying details about the man whom I interviewed, except to say that his disciplined, military demeanor shares little in common with the popular conception of the flippant geek-for-hire familiar to all too many movie fans (Dr. McKittrick in WarGames) and code geeks (n00b script-kiddie h4x0r in leetspeak).

So what exactly does the red team actually do? They provide "adversarial network services to the rest of the DOD," says OWCVAOG. That means that "customers" from the many branches of the Pentagon invite OWCVAOG and his crew to act like our country's shadowy enemies (from the living-in-his-mother's-basement code tinkerer to a "well-funded hacker who has time and money to invest in the effort"), attempting to slip in unannounced and gain unauthorized access.

These guys must conduct their work without doing damage to or otherwise compromising the security of the networks they are tasked to analyze—that means no denial-of-service attacks, malicious Trojans, or viruses. "The first rule," says OWCVAOG, "is 'do no harm.'"

So the majority of their work consists of probing their customers' networks, gaining user-level access and demonstrating just how compromised the network can be. Sometimes, the red team will leave an innocuous file on a secure part of a customer's network as a calling card, as if to say, "This is your friendly red team. We danced past the comical precautionary measures you call security hours ago. This file isn't doing anything, but if we were anywhere near as evil as the hackers we're simulating, it might just be deleting the very government secrets you were supposed to be protecting. Have a nice day!"

I'd heard from one of the Department of Defense clients who had previously worked with the red team that OWCVAOG and his team had a success rate of close to 100 percent. "We don't keep statistics on that," OWCVAOG insisted when I pressed him on an internal measuring stick. "We do get into most of the networks we target. That's because every network has some residual vulnerability. It is up to us, given the time and the resources, to find the vulnerability that allows us to access it."

It may seem unsettling to you—it did at first to me—to think that the digital locks protecting our government's most sensitive info are picked so constantly and seemingly with such ease. But I've been assured that these guys are only making it look easy because they're the best, and that we all should take comfort, because they're on our side. The fact that they catch security flaws early means that, hopefully, we can patch up the holes before the black hats get to them.

And like any good geek at a desk talking to a guy with a really cool job, I wondered where the client finds the members of its superhacker squad. "The bulk is military personnel, civilian government employees and a small cadre of contractors," OWCVAOG says. The military guys mainly conduct the ops (the actual breaking-and-entering stuff), while the civilians and contractors mainly write code to support their endeavors. For those of you looking for a gig in the ultrasecret world of red teaming, this top hacker says the ideal profile is someone with "technical skills, an adversarial mind-set, perseverance and imagination."

Speaking of high-level, top-secret security jobs, this much I now know: the world's most difficult IT department to work for is most certainly lodged within the Pentagon. Network admins at the Defense Department have to constantly fend off foreign governments, criminals and wannabes trying to crack their security wall—and worry about ace hackers with the same DOD stamp on their paychecks.

Security is an all-important issue for the corporate world, too, but in that environment there is an acceptable level of risk that can be built into the business model. And while banks build in fraud as part of the cost of doing business, there's no such thing as an acceptable loss when it comes to national security. I spoke about this topic recently with Mark Morrison, chief information assurance officer of the Defense Intelligence Agency.

"We meet with the financial community because there are a lot of parallels between what the intelligence community needs to protect and what the "We meet with the financial community because there are a lot of parallels between what the intelligence community needs to protect and what the financial community needs," Morrison said. "They, surprisingly, have staggeringly high acceptance levels for how much money they're financial community needs," Morrison said. "They, surprisingly, have staggeringly high acceptance levels for how much money they're willing to lose. We can't afford to have acceptable loss. So our risk profiles tend to be different, but in the long run, we end up accepting willing to lose. We can't afford to have acceptable loss. So our risk profiles tend to be different, but in the long run, we end up accepting similar levels of risk because we have to be able to provide actionable intelligence to the war fighter similar levels of risk because we have to be able to provide actionable intelligence to the war fighter

OWCVAOG agrees that military networks should be held to higher standards of security, but perfectly secure computers are perfectly unusable. "There is a perfectly secure network," he said. "It's one that's shut off. We used to keep our info in safes. We knew that those safes were good, but they were not impenetrable, and were rated on the number of hours it took for people to break into them. This is a similar equation."

A Clusterer (an unsupervised analytic) analyzes data objects without consulting a known class label (usually none are present). Objects are clustered (grouped) based on maximizing intra-class similarity and minimizing inter-class similarity.

FAUST

FAUST Count Change (FCC) Clusterer
1. Choose a nextD recursion plan, e.g.:
a. Initially use D = the diagonal with maximum standard deviation (STD), or maximum STD/Spread.
b. Always use AM (Avg-Median).
c. Always use AFFA (Avg-FurthestFromAverage).
d. Always use FFAFFF (FurthestFromAvg-FurthestFromFurthest).
e. Cycle through the diagonals: e1, ..., en, e1e2, ...
f. Cycle through AM, AFFA, FFAFFF; ...

Choose a DensityThreshold (DT), a DensityUniformityThreshold (DUT), and a Precipitous Count Change Definition (PCCD).
2. If DT (and/or DUT) is not exceeded at C (a cluster), partition C by cutting at each gap and PCC in C o D, using the next D in the recursion plan.
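As a rough illustration of step 2 (a simplified sketch of the general idea, not the production FAUST code), the projection values C o D can be sorted and the cluster cut wherever consecutive projection values differ by at least the gap threshold:

// Rough sketch of one FCC-style pass: project a cluster onto a vector D,
// sort the projection values, and cut at every gap >= gapThreshold.
// (Illustrative only; DT/DUT and the PCC test are simplified away.)
#include <algorithm>
#include <cstddef>
#include <vector>

using Point = std::vector<double>;

static double dot(const Point& x, const Point& d) {
    double s = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) s += x[i] * d[i];
    return s;
}

// Returns the projection values at which the cluster should be cut.
std::vector<double> gap_cuts(const std::vector<Point>& cluster,
                             const Point& D, double gapThreshold) {
    std::vector<double> proj;
    proj.reserve(cluster.size());
    for (const Point& x : cluster) proj.push_back(dot(x, D));
    std::sort(proj.begin(), proj.end());

    std::vector<double> cuts;
    for (std::size_t i = 1; i < proj.size(); ++i)
        if (proj[i] - proj[i - 1] >= gapThreshold)
            cuts.push_back(0.5 * (proj[i] + proj[i - 1]));  // cut mid-gap
    return cuts;
}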

FAUST Anomaly Detector (FAD) (an outlier-detection analytic) identifies objects that do not comply with the data model. In most cases, outliers are discarded as noise or exceptions, but in some applications, e.g., fraud detection, the rare events are the interesting events. Outliers can be detected with a. Statistics (statistical tests assume a distribution/probability model), b. Distance, c. Density, or d. Deviation (deviation uses a dissimilarity measure: reduce overall dissimilarity by removing "deviation outliers"). Outlier mining can mean:
1. Given a set of n objects and a k, find the top k objects in terms of dissimilarity from the rest of the objects.
2. Given a Training Set, identify outlier objects within each class (correctly classified but noticeably dissimilar to their fellow class members).
3. Determine "fuzzy" clusters, i.e., assign a weight for each (object, cluster) pair. (Does a dendrogram do that?)
We believe that the FAUST Count Change clusterer is the best Anomaly Detector we have at this time. It seems important to be able to identify and remove large clusters in order to find small ones (anomalies).
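For sense 1 above, a naive distance-based sketch (an illustration, not the FAUST approach): rank each object by its mean distance to all the others and return the top k.

import numpy as np

def top_k_outliers(X, k):
    """Indices of the k objects most dissimilar from the rest,
    scored by mean Euclidean distance to all other objects."""
    X = np.asarray(X, dtype=float)
    diff = X[:, None, :] - X[None, :, :]            # pairwise differences
    dist = np.sqrt((diff ** 2).sum(axis=2))         # pairwise distances
    score = dist.sum(axis=1) / (len(X) - 1)         # mean distance to the others
    return np.argsort(score)[-k:][::-1]             # most dissimilar first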

For each D, compute the midpoint, or the low and high, of x∘D over each class, then build a polygonal hull: tightly for Linear and loosely for Medoid. For a loose hull, examine for "none of the above" separately, at great expense. If we always build a tight polygonal hull, we end up with just one method:

FAUST Polygon classifier: Take the above series of D vectors. (Add additional Ds if you wish - the more the merrier - but watch out for the cost of computing {Ck∘D}, k=1..K; e.g., add CAFFAs (Class Avg-FurthestFromAvg) and CFFAFFFs (Class FurthestFromAvg-FurthestFromFurthest).) For each D in the D-series, let l(D,k) ≡ min(Ck∘D) (or the 1st PCI) and let h(D,k) ≡ max(Ck∘D) (or the last PCD).

y isa Ck iff y ∈ Hk, where Hk = {z ∈ Space | l(D,k) ≤ D∘z ≤ h(D,k) for every D in the series}.

If y is in the hull of more than one class, say y ∈ Hi1 ∩ ... ∩ Hih, then y can be fuzzy classified by weight(y,k) = OneCount{PCk & PHi1 & ... & PHih}, and, if we wish, we can declare y isa Ck where weight(y,k) is a maximum weight.

Or we can let Sphere(y,r) = {z | (y-z)∘(y-z) < r²} vote (but this requires the construction of Sphere(y,r)).
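A plain-array sketch of the polygon-hull classifier just described. The original computes weight(y,k) with pTree OneCounts; here that weight is approximated by directly counting the class-k training points that lie in every hull containing the query point. All function names are mine, not from the slides.

import numpy as np

def fit_hulls(X, y, D_series):
    """For each class k and each D, record l(D,k) = min(Ck o D) and h(D,k) = max(Ck o D)."""
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    bounds = {}
    for k in np.unique(y):
        Ck = X[y == k]
        lows  = [float((Ck @ D).min()) for D in D_series]
        highs = [float((Ck @ D).max()) for D in D_series]
        bounds[k] = (lows, highs)
    return bounds

def in_hull(z, bounds_k, D_series):
    lows, highs = bounds_k
    return all(lo <= z @ D <= hi for D, lo, hi in zip(D_series, lows, highs))

def classify(z, X, y, bounds, D_series):
    """z isa Ck iff z is in Hk; if z falls in several hulls, break the tie with a
    fuzzy weight: the count of class-k training points lying in every hull containing z."""
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    hulls = [k for k in bounds if in_hull(z, bounds[k], D_series)]
    if not hulls:
        return None                                  # "none of the above"
    if len(hulls) == 1:
        return hulls[0]
    in_all = np.array([all(in_hull(x, bounds[h], D_series) for h in hulls) for x in X])
    return max(hulls, key=lambda k: int(((y == k) & in_all).sum()))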

FCC on IRIS150. DT=1. PCCD: PCCs must involve a hi ≥ 5 and a ≥ 60% change from that high (≥ 2 if the high = 5). Gaps must be ≥ 3.
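One possible reading of those cut rules in Python (an assumption-laden sketch: projection values are binned to integers, and a PCC is taken to be any sufficiently large change between adjacent counts).

from collections import Counter

def find_cuts(values, gap_min=3, pcc_high=5, pcc_frac=0.60):
    """Cut points from gaps and precipitous count changes (PCCs) in a projection.

    Gap cut: consecutive occurring values differ by >= gap_min.
    PCC cut: the higher of two adjacent counts is >= pcc_high and the change
    between them is >= 60% of that high (>= 2 when the high is exactly 5)."""
    counts = Counter(int(v) for v in values)
    xs = sorted(counts)
    cuts = []
    for a, b in zip(xs[:-1], xs[1:]):
        if b - a >= gap_min:                        # gap cut
            cuts.append((a + b) / 2)
            continue
        hi = max(counts[a], counts[b])
        change = abs(counts[a] - counts[b])
        need = 2 if hi == pcc_high else pcc_frac * hi
        if hi >= pcc_high and change >= need:       # PCC cut
            cuts.append((a + b) / 2)
    return cuts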

FCC on IRIS150: 1st round, D=1011, F(x) = (x-p)∘D, randomly selected p = (15 6 2 5). 91.3% accurate after the 1st round.

The FCC clusterer algorithm: Set Dk to 1 for each column with NSTD > NSTDT = 0.2, so D=1011. Make Gap and PCC cuts on Y∘D. If Density < DT at a dendrogram node C (a cluster), partition C at each gap and PCC in C∘D using the next D in the recursion plan.

NSTD per column:  1: .22   2: .18   3: .29   4: .31

Is there a better measure of gap potential than STD? How about normalized STD, NSTD ≡ Std(Xk - min Xk)/Spread Xk?
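Computed directly (note that with the ordinary standard deviation the shift by min Xk changes nothing, so NSTD is simply STD over spread); with the IRIS values quoted above (.22 .18 .29 .31) and NSTDT = 0.2 this yields D = 1011, as used in the run above.

import numpy as np

def nstd(X):
    """NSTD per column: Std(Xk - min Xk) / Spread(Xk), spread = max - min."""
    X = np.asarray(X, dtype=float)
    shifted = X - X.min(axis=0)          # mirrors the formula; Std is shift-invariant
    return shifted.std(axis=0) / (X.max(axis=0) - X.min(axis=0))

def choose_D(X, nstdt=0.2):
    """Dk = 1 for each column whose NSTD exceeds the threshold NSTDT."""
    return (nstd(X) > nstdt).astype(int)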

We can develop a fusion method by looking for non-gaps in projections onto the vector connecting the medians of cluster pairs.
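A sketch of that fusion test under assumed details: project both clusters onto the (normalized) vector joining their medians and fuse them when no gap of at least gap_min separates consecutive projected values.

import numpy as np

def should_fuse(A, B, gap_min=3):
    """Fuse clusters A and B (row arrays) if the projection onto the vector
    connecting their medians shows no gap of at least gap_min."""
    d = np.median(B, axis=0) - np.median(A, axis=0)
    d = d / np.linalg.norm(d)
    proj = np.sort(np.concatenate([A @ d, B @ d]))
    return np.diff(proj).max() < gap_min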

F   0 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 37 38
Ct  3 1 2 1 1 2 3 1  6  5  2  5  4  4  3  2  3  1  1  1  1  2  1
Gp  3 1 1 1 1 1 1 1  1  1  1  1  1  1  1  1  1  1  1  1 15  1  2
    (3 0 0)  (11 0 0)  (11 0 0)  (2 0 0)  (25 0 0)

F   40 41 47 49 50 51 53 54 55 56 57 59 60 61 63 65 66 67 69 70 71 72 73 74
Ct   1  1  2  1  2  1  2  2  3  5  1  2  1  2  1  4  2  2  2  5  2  3  5  2
Gp   1  6  2  1  1  2  1  1  1  1  2  1  1  2  2  1  1  2  1  1  1  1  1  1
    C1(0 4 1)  C2(0 17 1)  (0 17 0)  C3(0 3 2)

F   75 76 77 78 79 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 99 103 106 108 109 111 114
Ct   2  2  1  2  1  2  3  1  1  1  1  2  3  2  2  1  1  1  2  2  1  2   1   1   2   1   1   1
Gp   1  1  1  1  2  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  3  4   3   2   1   2   3
    C4(0 9 37)  (0 0 3)  (0 0 6)

FCC on IRIS150: 2nd round, D=0100, p=(15 6 2 5)

C1:
F   37 38 40 41
Ct   2  1  1  1
Gp   1  2  1
    (0 4 0)  (0 0 1)

C2:
F   2 3 4 5 6 7 8 9 10
Ct  1 1 2 3 3 4 1 1  2
Gp  1 1 1 1 1 1 1 1
    (0 4 0)  (0 2 1)  (0 11 0)

C3:
F   2 8 9 11 12
Ct  1 1 1  1  1
Gp  6 1 2  1
    (0 0 2)  (0 3 0)

C4:
F   71 72 73 74 75 76 77 78 79 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95
Ct   2  3  5  2  2  2  1  2  1  2  3  1  1  1  1  2  3  2  2  1  1  1  2  2
Gp   1  1  1  1  1  1  1  1  2  1  1  1  1  1  1  1  1  1  1  1  1  1  1
    (0 4 1)  (0 0 5)  C41(0 5 5)  (0 0 26)

95.3% accurate after 2nd round

FCC on IRIS150: 3rd round, D=1000, p=(15 6 2 5)

C41:
F   13 14 16 17 18 23
Ct   3  3  1  1  1  1
Gp   1  2  1  1  5
    (0 4 2)  (0 1 2)  (0 0 1)

97% accurate after 3rd round

FCC on IRIS150. DT=1. PCCD: PCCs must involve a hi ≥ 5 and be at least a 60% change from that high (≥ 2 if the high = 5). Gaps must be ≥ 3.

FCC on IRIS150: 1st round, D=1010 (highest STD)

F   0 2 3 4 6 7 8 9 10 11 12 13 14 15 16 17 18 20 24 27 28 29 31 37 38 39 40 41 43 44 45 46 47 48 51 53 54 55 56 57 58 59 60 62 63 64 65
Ct  1 1 2 2 3 1 3 3  7  3  7  5  3  3  2  2  1  1  1  1  1  1  1  1  3  1  1  3  2  3  5  1  3  1  4  6  3  3  6  3  3  1  2  3  6  1  5
Gp  2 1 1 2 1 1 1 1  1  1  1  1  1  1  1  1  2  4  3  1  1  2  6  1  1  1  1  2  1  1  1  1  1  3  2  1  1  1  1  1  1  1  2  1  1  1  1
    ----(50 0 0)----  (0 0 1)  (0 4 0)  C1(0 18 1)  (0 5 0)  C2(0 22 18)  (0 1 0)  (0 0 8)

F   66 69 70 71 72 73 76 78 79 81 82 84 88 89 90 92
Ct   3  5  2  1  1  1  2  1  1  1  1  1  1  1  2  1
Gp   3  1  1  1  1  3  2  1  2  1  2  4  1  1  2
    (0 0 5)  (0 0 5)  (0 0 7)  (0 0 5)

87.3% accurate after 1st round

C2, 2nd round, D=0001:
F   11 12 13 14 15 16 17 18 19 22 23
Ct   1  4  7  8  3  1  5  4  4  2  1
Gp   1  1  1  1  1  1  1  1  3  1
    C21(0 17 3)  (0 4 0)  C22(0 1 12)  (0 0 3)

C1, D=0001:
F   9 10 11 12 13 14 16
Ct  3  2  4  7  1  1  1
Gp  1  1  1  1  1  2
    (0 18 0)  (0 0 1)

97.3% accurate after 2nd round

FCC on IRIS150. DT=1. PCCD: PCCs must involve a hi ≥ 5 and be at least a 60% change from that high (≥ 2 if the high = 5). Gaps must be ≥ 3.

FCC on IRIS150: 1st round, D=1-111 (highest STD/spread)

F   0 1 2 3 4  5  6 7 8 9 22 23 24 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 45 46 47 48 49 50 51 52 53 54 55 57 60
Ct  1 1 2 3 7 14 12 4 5 1  2  1  1  1  2  2  4  5  5  6  2  3  5  2  3  5  6  6  5  2  6  5  4  4  3  1  1  2  1  1  1  1  1  1
Gp  1 1 1 1 1  1  1 1 1 13 1  1  2  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  2  1  1  1  1  1  1  1  1  1  1  2  3
    ----(50 0 0)----  C1(0 25 2)  (0 9 0)  C2(0 16 11)  (0 0 37)

91.3% accurate after 1st round

2nd round, D=1-1-1-1 (the highest STD/spread of those orthogonal to 1-111: 111-1, 11-11, 1-1-1-1)

C2:
F   14 15 16 17 19 21 22 23 24 25 26 28 29 31 32 33 35 38
Ct   1  1  3  2  1  1  1  1  2  1  1  1  3  1  1  4  1  1
Gp   1  1  1  2  2  1  1  1  1  1  2  1  2  1  1  2  3
    C21(0 3 10)  (0 2 1)  (0 11 0)

C1:
F   17 19 21 27 28 29 30 32 33 34 35 36 37 39 40 41 49
Ct   1  1  1  3  3  1  1  1  1  1  4  1  2  2  2  1  1
Gp   2  2  6  1  1  1  2  1  1  1  1  1  2  1  1  8
    (0 0 1)  (0 2 0)  (0 23 0)  (0 0 1)

97.3% accurate after 2nd round

FCC on SEED150. DT=1. PCCs must involve a high of at least 5 and be at least a 60% change from that high. Gaps must be ≥ 3.

Using UPCC with D=1111. First round only:

The Ultimate PCC clusterer algorithm? Set Dk to 1 for each column with NSTD > ThresholdNSTD (NSTDT = 0.25). Shift X column values as in Gap Preservation above, giving the shifted table Y. Make Gap and PCC cuts on Y∘D. If Density < DT at a dendrogram node C (a cluster), partition C at each gap and PCC in C∘D using the next D in the recursion plan.

NSTD per column:  1: 0.28   2: 0.28   3: 0.22   4: 0.47
NSTD ≡ Std(Xk - min Xk)/Spread Xk

F   0 1  2  3  4  5  6 7 8 9 10 11 12 13 14 15 16 17
Ct  1 6 10 22 25 16 12 1 9 6 14 12  7  3  1  3  1  1
GP  1 1  1  1  1  1  1 1 1 1  1  1  1  1  1  1  1
    C1 (44L, 1M, 47H)   C3 (3L, 9M, 3H)   C4 (2L, 31M, 0H)   C5 (0L, 9M, 0H)

Using UPCC with D=1001. First round only:

F    0  1  2  3  4 5 6 7 8  9 10 11
Ct  18 25 18 18 14 6 9 7 8 21  2  4
GP   1  1  1  1  1 1 1 1 1  1  1
    C1 (42L, 1M, 50H)   C2 (0L, 22M, 0H)   C3 (8L, 21M, 0H)   C4 (0L, 6M, 0H)

errs = 53 45 0 6 2 0    spread = 10 3 1 3 2 1

C2 (1L, 0M, 0H)

errs = 51 43 0 8 0    spread = 6 2 1 2 1

D=1101, 1st rnd:
F    0  1  2 3  4  5 6 7 8 9 10 11 12 13
Ct  18 25 16 9 13 12 6 9 7 7 21  1  3  3
Gp   1  1  1 1  1  1 1 1 1 1  1  1  1
    C1 (50L, 22M, 50H)   C2 (0L, 21M, 0H)   C3 (0L, 7M, 0H)

errs = 72 72 0 0
spread = 5 3 1 1

2nd round, D=AFFA:
F   22 25 27 29 31 33 34 35 37 39 42 43 44 45 46 48 49 50 51 52 53 54 55
Ct   1  1  2  6  4  1  1  2  3  3  1  1  1  5  4  7  1  3  2  5  1  3  3
Gp   3  2  2  2  2  1  1  2  2  3  1  1  1  1  2  1  1  1  1  1  1  1  1

F   56 57 58 59 60 61 62 63 64 65 68 69 70 71 72
Ct   4  3  4  1  3  1 13  1 10  6  3  2  7  1  2
Gp   1  1  1  1  1  1  1  1  1  3  1  1  1  1

C1,1 (0L, 4M, 0H)   C1,2 (6L, 17M, 0H)   C1,3 (13L, 1M, 2H)   C1,4 (5L, 0M, 0H)
C1,5 (5L, 0M, 0H)   C1,6 (16L, 0M, 6H)   C1,7 (2L, 0M, 12H)   C1,8 (3L, 0M, 28H)

85% accurate

C1,2 3rd rnd, D=0010:
F   0  1
Ct  4 19
GP  1
    C1,2,1 (4L, 0M, 0H)   C1,2,2 (2L, 17M)

3rd round, D=0010, C1,6:
F   0 1 2 3 4 5 6 7
Ct  2 5 6 5 1 1 1 1
    C1,6,1 (2L, 0M, 0H)   C1,6,2 (14L, 0M, 2H)   C1,6,3 (0L, 0M, 4H)

3rd round, D=0010, C1,8:
F   0 1 2  3 4 5
Ct  3 2 2 17 3 4
    C1,8,1 (3L, 0M, 0H)   C1,8,2 (0L, 0M, 28H)

3rd round, D=0010, C1,3:
F   0 1 2 3 6 7
Ct  3 5 4 1 1 2
    C1,3,1 (3L, 0M, 0H)   C1,3,2 (9L, 0M, 0H)   C1,3,2 (0L, 1M, 0H)   C1,3,3 (0L, 0M, 2H)
    C1.5.5 (0L, 0M, 1H)   C1.4.5 (0L, 0M, 1H)

3rd round with D=0010: 97.3% accurate

FCC on CONC150. CONC counts are L=43, M=52, H=55. DT=1. PCCs: hi ≥ 5, ≥ 60% change from that high. Gaps: ≥ 3.

NSTD per column:  1: 0.25   2: 0.25   3: 0.24   4: 0.22
NSTD ≡ Std(Xk - min Xk)/Spread Xk

D=1111, 1st rnd:

F   0 1 6 7 8 10 14 15 16 17 18 20 21 22 23 24 25 26 28 29 31 33 34 36
Ct  1 1 1 1 1  1  1  2  1  1  2  2  2  2  1  2  2  4  1  3  1  1  2  1
Gp  1 5 1 1 2  4  1  1  1  1  2  1  1  1  1  1  1  2  1  2  2  1  2  1
    (2 0 0)  C2(2 2 0)

F   37 38 39 40 41 42 43 44 45 46 48 49 50 51 52 53 54 55 56 58 59 61 62
Ct   1  2  1  1  2  1  2  1  5  4  2  6  1  1  4  1  1  7  1  3  2  1  5
Gp   1  1  1  1  1  1  1  1  1  2  1  1  1  1  1  1  1  1  2  1  2  1  1
    C3(24 13 5)  C4(6 4 1)  C5(2 0 4)  C6(5 6 11)

F   63 64 65 66 67 68 69 71 72 73 74 76 77 78 80 81 82 85 86 87 91 92 94
Ct   3  3  1  2  1  2  1  1  2  1  1  1  1  1  1  5  3  5  2  2  5  1  1
Gp   1  1  1  1  1  1  2  1  1  1  2  1  1  2  1  1  3  1  1  4  1  2  3
    C9(0 11 16)  C10(1 5 2)  (0 0 5)  C11(1 4 4)  (0 0 2)

F    97 98 100 101 102 104 110 119
Ct    1  3   2   2   1   1   1   1
Gap   1  2   1   1   2   6   9
    C15(0 5 5)  (0 2 0)

1st rnd: 53% accurate

D=1100, 2nd rnd, C3 (24 13 5):
(3 1 0)  (6 1 0)  (0 0 2)  (1 0 2)  (1 2 0)
F   21 26 31 32 36 51 53 64 68 71 100 102 109 112 121 122
Ct   2  3  4  3  1  4  3  2  7  2   2   2   3   1   1   2
GP   5  5  1  4 15  2 11  4  3 29   2   7   3   9   1
    (5 0 0)  (5 2 0)  (1 0 0)  (0 5 0)  (1 0 1)  (0 2 0)  (1 0 0)

D=1100, 2nd rnd, C9 (0 11 16):
(0 1 0)  (0 0 2)  (0 1 1)
F   16 20 24 29 37 48 49 51 52 54 57 63 64 67 73 82 83
Ct   1  1  1  1  2  4  1  1  3  1  3  1  1  2  2  1  1
GP   4  4  5  8 11  1  2  1  2  3  6  1  3  6  9  1
    (0 4 0)  (0 0 2)  (0 2 8)  (0 1 2)  (0 2 0)  (0 1 1)

D=1100, 2nd rnd, C15 (0 5 5):
(0 1 0)
F   20 42 58 64 77 83
Ct   1  2  3  1  2  1
GP  22 16  6 13  6
    (0 1 0)  (0 0 5)  (0 4 0)

D=1100, 2nd rnd, C4 (6 4 1):
(2 0 0)
F   14 16 24 28 48 49 51 64
Ct   1  1  2  2  2  1  1  1
GP   2  8  4 20  1  2 13
    (2 0 0)  (1 0 1)  (1 4 0)

D=1100, 2nd rnd, C6 (5 6 11):
(1 0 0)  (0 0 9)  (0 1 0)
F   8 16 24 27 28 29 37 41 42 45 48 49 51 67 73
Ct  1  1  1  2  2  1  1  1  1  1  4  2  2  1  1
GP  8  8  3  1  1  8  4  1  3  3  1  2 16  6
    (0 2 0)  (2 2 1)  (1 1 0)  (0 0 1)

D=1100, 2nd rnd, C2 (2 2 0):
F   31 37 41 49
Ct   1  1  1  1
GP   6  4  8
    (2 0 0)  (0 2 0)

D=1100, 2nd rnd, C5 (2 0 4):
F   71 74 119
Ct   1  1   4
GP   3 45
    (2 0 0)  (0 0 4)

D=1100, 2nd rnd, C10 (1 5 2):
(0 3 0)
F   51 57 58 63 64 83
Ct   1  1  3  1  1  1
GP   6  1  5  1 19
    (1 0 0)  (0 0 1)  (0 1 1)  (0 1 0)

D=1100, 2nd rnd, C11 (1 4 4):
(0 1 0)  (0 0 1)
F   8 42 54 58 63 73
Ct  1  2  1  3  1  1
GP 34 12  4  5 10
    (0 1 0)  (1 1 0)  (0 0 3)  (0 1 0)

2nd round: 90% accurate

FCC on WINE150. WINE counts are L=57, M=75, H=18. DT=1. PCCs: hi ≥ 5, ≥ 60% change from that high.

COLUMN    1     2     3     4
STD      .017  .005  .001  .037
SPREAD   10    39    112   5.6
STD/SPR  .18   .23   .21   2.38

NSTD ≡ Std(Xk - min Xk)/Spread Xk ?

D=0001, 1st rnd, Gp ≥ 1. 1st round 64% accurate.

(1 0 0)  C3(14 17 1)  C5(1 20 4)  (0 0 2)
F   0  1  2  3  4  5 6
Ct  1 27 32 53 25 10 2
    C2(22 4 1)  C4(18 28 7)  C6(1 6 3)

C2, D=0100, 2nd rnd (22 4 1), Gp ≥ 2:
F   1 2 3 4 6 7 8 9 10 11 13 14 17 19 23 24 29
Ct  2 1 2 1 3 3 3 2  1  2  1  1  1  1  1  1  1
GP  1 1 1 2 1 1 1 1  1  2  1  3  2  4  1  5
    C21(5 0 1)  C22(10 4 0)  (7 0 0)

C4, D=0010, 2nd rnd (18 28 7):
(0 3 0)
F   0 2 3 4 5 6 7 8 9 10 11 12 13 17 18 20 21 22 24 25 27 32 37
Ct  1 3 4 2 3 2 4 1 2  1  1  2  2  4  1  2  2  3  2  1  1  1  2
GP  2 1 1 1 1 1 1 1 1  1  1  1  4  1  2  1  1  2  1  2  5  5  3
    C31(10 8 1)  C32(1 5 3)  C33(1 2 1)  C34(4 7 1)  C35(4 8 1)

(0 2 0)  (0 1 0)
F   40 52 60 67 93 106
Ct   1  1  1  1  1   1
GP  12  8  7 26 13
    (1 0 0)  (0 0 1)  (1 0 0)

F   10 12 18 20 35 58 79 81 82
Ct   1  1  1  1  1  1  1  2  1
GP   2  6  2 15 23 21  2  1
    (0 0 2)  (0 3 0)  (1 0 0)  (0 3 1)

C6, D=0010, 2nd rnd (18 28 7)