Index Provisioning for ALM Search - My Presentation


Page 1: Index Provisioning for ALM Search - My Presentation


Index Provisioning and Scale Unit Design for ES for ALM Search

Sunita Shrivastava (SunitaS)
Satish Kumar (Saku)
Mohammed Imran (Imran.Siddique)

01/22/2022

Page 2: Index Provisioning for ALM Search - My Presentation


Index Provisioning
• Code Search query performance against ES depends on:
  • The performance of ES itself
  • The performance of our plugins
  • How well the index is provisioned
• Focus: a plan for ensuring that ES indices are well provisioned

Page 3: Index Provisioning for ALM Search - My Presentation


Agenda
• Review Design Objectives
• Discuss Key ES Experiments and Learnings
• Describe the Proposed Scale Unit Model for ES Data Nodes
• Onboarding Strategy
• Other Alternatives Considered
• Further Planned Follow-ups
• Not on the Agenda Today:
  • Improving the performance of the analyzer and plugin
  • ALM Search Scale Unit Capacity Planning
  • Relationship of ALM Search SUs to TFS SUs

Page 4: Index Provisioning for ALM Search - My Presentation


Key Design Objectives
• Provide a resilient method that meets the performance SLAs and can deal with fast growth (or lack of growth) without requiring immediate action from devops
• Provide a cost-effective index provisioning plan where indexes are hosted on units that can grow with account growth while keeping performance acceptable
• Provide reasonable isolation characteristics
• Simplify or aid testability for scale; add predictability
• Ease of monitoring and troubleshooting will be guiding principles
• Support change
• Last but not least, keep it simple

Biju Venugopal
While the account grows, not all of the data will be "actively searched"... How do we use the fact that a major part of the code base will eventually go "dormant" over a period of time, so as to keep the "active code base" in the shards that get loaded?
Page 5: Index Provisioning for ALM Search - My Presentation


Background: Elasticsearch
• Shards: ES allows partitioning of an index into different shards
  • However, once you create an index, the number of shards cannot be changed
  • Shards are a unit and may not be divided; shards in turn are made of segments, which are the files into which data is indexed
  • For a query against a shard, all segments need to be searched
  • Many shards may be hosted by a single node; quite a few resources (all the caches) that consume JVM memory are shared across shards
  • Shards may be moved to other nodes; this is called rebalancing
• Aliasing
  • ES does allow you to create an alias over multiple indices, to handle growth of shards beyond the limits of a single node
  • Using aliasing, we can flip the alias for an account to a new ES index built for that account and retire the old ES index
• Replicas: each ES shard can have multiple replicas, which provide HA and read scale-out (3 is the recommended minimum)
• Routing: ES allows you to route documents to shards based on some other entity; this can help collocate documents belonging to an account/project/repo in a single shard
  • A search query that requires multiple shards to be searched will invoke aggregation logic at the query node
  • The cost of aggregation vs. the benefit of parallelizing the search across multiple shards determines the overall benefit of sharding versus routing
  • By default (no routing), ES routes based on the hash of the document ID, and a query against an index is routed to all shards belonging to that index
  • The flip side of routing is that all documents for the same route go into the same shard, so shards may grow to uneven sizes
• The shard allocation policy seems to work fairly well; the following data from prod shows that though shards were unequal in size, the overall size of shards on a node was roughly equal
  • Here is a link to some data from Prod, where currently we are doing repo-level routing: PROD Shards (12/08) (Web view)
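To make the alias flip concrete, here is a minimal sketch against the ES REST _aliases API using Python's requests; the endpoint and index names are illustrative, not our deployment's:

```python
import requests

ES = "http://localhost:9200"  # assumed ES endpoint

def flip_alias(alias, old_index, new_index):
    """Atomically point `alias` at `new_index` and detach `old_index`.

    Both actions run in a single _aliases call, so queries against the
    alias never see a window with no backing index.
    """
    body = {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }
    resp = requests.post(f"{ES}/_aliases", json=body)
    resp.raise_for_status()
    return resp.json()

# Example: retire the old per-account index after re-indexing completes.
# flip_alias("contoso", old_index="contoso_v1", new_index="contoso_v2")
```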

Page 6: Index Provisioning for ALM Search - My Presentation


Key ES Experiments (1)
• Rebalancing vs. Reindexing
  • Re-indexing is pretty CPU/memory/IO intensive, not only for ES but for the indexing pipeline as well; rebalancing is IO intensive
  • Shards Rebalance (Web view)
  • To the extent possible, we want to avoid cold-start style re-indexing and rely on rebalancing of shards to support growth
• Optimal Shard Size Experiments
  • Optimal shard size for a given node capacity is defined as the biggest shard that can be hosted on a node with reasonable or optimal (under ~500 ms) query performance at a query rate of 4 to 6 queries per second, while keeping resource usage on that node under control
  • Optimal shard experiments for A6: 10-12 GB - Optimal shard A6 (Web view)
  • Note that beyond the optimal shard size, query performance is affected really adversely; mostly the nodes were found to be memory constrained beyond this size
  • For A5: 5-6 GB - Optimal shard A5 (Web view)
  • Optimal shard size experiments on A6 yielded a size twice as big as A5 (see Appendix for details on the experiments)
  • Query performance tests yielded a response time of ~250 ms (ES time) for a search rate of around 4 to 6 requests per second [two threads send queries]
• Optimal shard size experiments tell us two things:
  • How much we might want to initially fill the shards on nodes while leaving capacity for new data growth
  • When we need to start worrying about the performance of a query against a shard

Shard Size | Average Query Time | 90% Max | Max
14G | 271ms | 519ms | 28s
12G | 84ms | 257ms | 14s
8G | 28ms | 243ms | 22s

Page 7: Index Provisioning for ALM Search - My Presentation


Key ES Experiments (2)
• Routing
  • The shard allocation policy seems to work fairly well; the following data from prod shows that though shards were unequal in size, the overall size of shards on a node was roughly equal
  • Here is a link to some data from Prod, where currently we are doing repo-level routing: PROD Shards (12/08) (Web view)
  • For very small accounts with few repositories, account-level routing seemed to perform better - Shards Routing Tests (Web view)
    • For repositories of 25 MB, going from 2 to 9 routes degraded query performance by 30%
  • As shards are increased, additional nodes proved to be helpful - Shards Tests (Web view)
    • For 128 shards, adding more nodes was helpful for large accounts (with more than 400 repos)
    • On 3 nodes, having a large number of shards (120 as opposed to 6) didn't help performance
• Bottom line: we are implementing routing to be configurable (a routing sketch follows the table below)
  • Current thinking is that account-level routing might be a good fit for small accounts, and project/repo-level routing beneficial for larger accounts
  • We intend to tune and test the impact of routing in Prod

Total Primary Size | Node
15020 | D01
15730 | D03
4923 | D02
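For illustration, a minimal sketch of what configurable routing looks like at the ES API level, using Python's requests; the endpoint, index name, document type, and routing-key scheme are assumptions for the example, not our production plugin code:

```python
import requests

ES = "http://localhost:9200"   # assumed endpoint
INDEX = "codesearch-shared-0"  # illustrative shared-index name

def index_file(doc_id, doc, route):
    # All documents sharing `route` (e.g. an account id, or an
    # "account/repo" key for repo-level routing) land in the same shard.
    requests.put(f"{ES}/{INDEX}/file/{doc_id}",
                 params={"routing": route}, json=doc).raise_for_status()

def search(route, text):
    # Passing the same routing key narrows the query to the owning shard
    # instead of fanning out to every shard of the index.
    body = {"query": {"match": {"content": text}}}
    r = requests.post(f"{ES}/{INDEX}/file/_search",
                      params={"routing": route}, json=body)
    r.raise_for_status()
    return r.json()["hits"]["hits"]

index_file(1, {"account": "contoso", "repo": "webapp", "content": "parse tree"},
           route="contoso")  # account-level routing for a small account
print(search("contoso", "parse"))
```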

Page 8: Index Provisioning for ALM Search - My Presentation


Challenges
• Determining the size of the account data that needs indexing
  • For cold-start indexing, we would like the indexes to largely land in a good place and not require re-indexing, or cause so much rebalancing that performance suffers for too many other accounts
  • We certainly need to avoid domino situations where rebalancing, and in the worst case re-indexing, kick in while we are cold-indexing the data in the accounts
• As we optimize our overheads, ES time can be a significant portion of the end-to-end time; it is important that it is not compromised by bad placement or provisioning decisions
• Ideally, if we knew the size of data that needs to be indexed for an account and had some sense of how it has been growing, we could minimize disruptions in the form of rebalancing/re-indexing
  • However, even if the former is possible, knowing the latter (growth patterns) is not feasible
  • Also, how a new account grows is anybody's guess
• Our approach is as follows:
  1. We will assume large accounts are well known
  2. We will follow a batched approach to cold-start indexing while onboarding customers
  3. We would like an index provisioning scheme that is robust to any misclassification of account size

Page 9: Index Provisioning for ALM Search - My Presentation


Scale Unit Model for ALM Search ES Cluster
• We want to support the notion of scale units for the data nodes of ES, to:
  • Provide hardware isolation
  • Host data for a set of accounts with provision for growth
  • Be configured to yield reasonable utilization
  • Define telemetry and alerts around a unit of deployment
  • Finally, by hosting multiple indices within a scale unit, allow for a finer level of isolation
    • For instance, an index corruption or non-availability issue will impact a set of customers, not all customers
• Read scale-out requirements
  • For read scale, given the search rate we are witnessing, 2 replicas seem like overkill
    • The search rate on mseng is more like .02 requests per sec; this is very low!
  • However, 3 replicas are required for HA with tolerance for one fault occurring concurrently with updates across update domains
  • For these replicas to be spread across 3 nodes, we would like to start a unit with 3 nodes
• Definition of an ES scale unit
  • We define an ALM Search ES scale unit as a set of 3 data nodes of the same kind (A5 or A6?)
    • With A5, we notice that cold-start indexing is strained [currently we do not have continuous indexing, so this puts undue strain]
    • It remains to be seen whether A5 is taxed by continuous indexing rates as well
  • The scale unit will always start with 3 nodes
  • A scale unit can grow to a maximum of Ni * S * R nodes, where Ni is the number of indexes, S the number of shards, and R the number of replicas (including the primary)
  • Data nodes may be added to or removed from an ES SU depending on the activity within the accounts hosted in that SU
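As a worked example of the growth bound, a small sketch (the function name is ours, not from the deck; the numbers come from the 6X shared-index example later in the deck):

```python
def max_su_nodes(num_indices: int, shards_per_index: int, replicas: int) -> int:
    """Upper bound on data nodes for an ES scale unit: one node per shard copy.

    `replicas` counts all copies including the primary, matching the
    deck's Ni * S * R definition.
    """
    return num_indices * shards_per_index * replicas

# Shared-index SU from the 6X example: 2 indices, 12 shards, 3 copies each.
assert max_su_nodes(2, 12, 3) == 72
```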

Page 10: Index Provisioning for ALM Search - My Presentation


Provisioning Strategy
• We will have a dedicated index for account data that we classify as large
  • Note it takes a fair amount of time to index from cold: 40 GB on prod is taking roughly 4 hours on 3 A6 JAs (job agents)
    • PROD Run (12/15) - Day Time (Web view)
    • Currently the feed rate into ES is the bottleneck
  • Also, for some FPAs we can expect significant growth over time and potentially some very large repositories
• We define two types of indices
  • Dedicated index: contains data only for a single account
  • Shared index: contains data for many accounts
• We will maintain a list of accounts that will be provisioned a dedicated index

Page 11: Index Provisioning for ALM Search - My Presentation


Initial ES SU and Index Configuration
• ES doesn't limit the size of shards, but we know that once the shard size exceeds the optimal limit, query performance can degrade precipitously
• We need to decide how much we should load a scale unit before provisioning a new one
  • In our experiments, we used multiples of the optimal shard size to test resource utilization and performance under this initial load
  • For a scale unit of 3 nodes, we try a factor of 3X, which means the initial load is 3 times the optimal shard size
    • For A6, this implies an initial load of 36 GB
  • We refer to this factor as the ES Scale Unit Initial Size Multiplier; if it is 3X, then one third of the data indexed to a node can be loaded in memory for efficient query performance right after cold-start indexing
  • We then adjust the size of the primary shards so that at least 3 or 4 of them fit comfortably in memory
• As an example, ES SUs with a multiplier of 3X can look as follows
  • Shared index scale unit (multiplier of 3X optimal shard size)
    • Consider 2 indices; each index has 6 primary shards and each shard has 2 replicas, for a total of 18 shards per index
    • Initial load: 36 GB (4 primary shards of 3 GB per node)
    • Max capacity (real data): 144 GB (12 primaries * 12 GB); overall data capacity including replicas: 432 GB (36 shards * 12 GB)
    • Max number of nodes: 36
  • Dedicated index scale unit (multiplier of 3X optimal shard size)
    • Each account has its own index in such scale units
    • 1 index, 12 shards, 2 replicas
    • Initial load: 36 GB (4 primary shards of 3 GB per node)
    • Max capacity (real data): 144 GB; overall data capacity including replicas: 432 GB
    • Max number of nodes: 36
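The sizing arithmetic behind these examples can be captured in a few lines; a sketch assuming the A6 optimal shard size of 12 GB measured earlier (function and constant names are ours):

```python
OPTIMAL_SHARD_GB = 12  # measured optimal shard size for an A6 data node

def initial_sizing(multiplier: int, num_indices: int, primaries_per_index: int):
    """Derive cold-start load and per-shard size for a 3-node SU.

    The multiplier scales the optimal shard size to give the total data
    indexed into the SU before a new SU is provisioned.
    """
    initial_load_gb = multiplier * OPTIMAL_SHARD_GB
    total_primaries = num_indices * primaries_per_index
    shard_gb = initial_load_gb / total_primaries
    max_capacity_gb = total_primaries * OPTIMAL_SHARD_GB
    return initial_load_gb, shard_gb, max_capacity_gb

# 3X shared-index example: 2 indices x 6 primaries -> 36 GB initial load,
# 3 GB shards, 144 GB max real-data capacity, matching the slide.
print(initial_sizing(3, 2, 6))  # (36, 3.0, 144)
```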

Page 12: Index Provisioning for ALM Search - My Presentation


Proposed Scale Unit Definitions
• Can we have a higher ES SU Initial Size Multiplier?
  • Some accounts may be dormant as far as search is concerned
  • As long as the data required to serve requests at the expected query rate fits within the optimal shard size, we should be OK
  • Using shards and routing, we tried a multiplier of 6X, which puts the initial load at 72 GB
    • Only 3 to 4 of the shards need to be in memory to serve the query requests
    • Capacity planning 2 index 12 shards (Web view)
    • Ninety percent of queries completed within 1.4 sec
    • This was the worst case, because each query was fired at a different account, so there were no MRU (most recently used) characteristics that could be leveraged
• Why would we want a higher multiplier?
  • It helps us keep the COGS in control; we want good utilization from initial onboarding
  • Define good utilization as: "serves 90% of queries within 500 ms at a request rate of 3 to 4 per sec while continuous indexing is in progress, while hovering around 70 to 85 percent of resource (CPU/memory) usage"
• Example of a shared index scale unit (multiplier of 6X)
  • 2 indices, 12 shards, 3 replicas (including primary)
  • Initial load: 72 GB (say, 8 primary shards of 3 GB per node)
  • Max capacity (real data): 288 GB; overall data capacity including replicas: 864 GB
  • Max number of nodes: 72 [we don't expect it to grow over 36 nodes!]
• Dedicated index scale unit (multiplier of 6X)
  • 2 indices, 12 shards, 3 replicas (including primary)
  • Same as above

Page 13: Index Provisioning for ALM Search - My Presentation


Proposed Scale Unit Definitions (2)
• Why not an even higher multiplier?
• SUs with a higher multiplier and a high shard count could look like this:
  • Shared index scale unit (multiplier of 30X optimal shard size)
    • 2 indices, 90 shards, 3 replicas (including primary)
    • Initial load: 1080 GB (60 primary shards of 3 GB per node)
    • Max capacity (real data): 2160 GB (~2.2 TB); overall data capacity including replicas: 6480 GB (~6.5 TB)
    • Max number of nodes: 540
  • 4 of 60 shards could serve requests; if searches are well distributed across accounts, more swaps are likely to happen
  • For our needs, this will hardly be an SU, and it provides us with zero isolation
  • Also, shards don't come for free
    • Certain per-shard resources, like segment buffers and the node filter cache, get an equal share of what a node is allowed to consume for these resources from the JVM heap
• To put this into perspective:
  • Current index for mseng: 16 GB; index for all the other 10 accounts: 24 GB; we expect the data for the next 10,000 accounts to be around 132 GB, or not more than 150 GB
  • The reSearch index for Windows was about 70 GB. "I was just looking at the HPC cluster we have. Currently we have four HPC nodes, with the first two dedicated to the current environment and the last two dedicated to the legacy environment. Each pair of HPC nodes is configured to expose 20 cores, which I believe was done as the DB was bottlenecked. The first four jobs, which have a darker color, would run on all nodes dedicated to the respective environments, while the merge and publish would run on the first node of each environment."
  • GitHub: 2 billion docs, 5 requests per sec, hosts on SSD

Page 14: Index Provisioning for ALM Search - My Presentation


ES Scale Unit Initial Size Multiplier
• Higher multiplier vs. lower multiplier
  • The ES Scale Unit Initial Size Multiplier tells us how much we will load an ES SU before we provision a new SU
  • A higher factor implies a higher initial data storage load for the SU at the time of provisioning
    • When an SU is provisioned initially with 3 nodes, a multiplier of 6X will index 72 GB, as opposed to 36 GB with a multiplier of 3X
  • The eventual capacity of the SU depends on the index configuration (shards and replicas)
  • We tested the worst-case performance with the 6X factor and it seemed acceptable
• Currently we intend to go with a 12X factor: an SU with 3 nodes will be filled to 144 GB before the next SU needs to be provisioned
• Note that we are not trying to prevent swapping completely
  • Some swapping is going to happen
  • We are trying to ensure that query performance doesn't suffer due to resource contention
  • We are trying to make queries for multiple accounts feasible without degrading query performance

Page 15: Index Provisioning for ALM Search - My Presentation


ALM Search SU to TFS SU Mapping

• A single ALM Search Service SU can potentially serve multiple TFS Service SUs in the same data center


[Diagram: ALM Search Service SU0-SU3 mapped to TFS SUA-SUD across Data Center 1, Data Center 2, and Data Center 3; a single ALM Search Service SU serves multiple TFS SUs within a data center]

Page 16: Index Provisioning for ALM Search - My Presentation


Proposed Model - A Peek

• Each ALM Search Service SU internally, within its ES deployment, will contain multiple ES Data Node Scale Units
• Currently, we don't isolate query and ingestion nodes, but we do separate out master nodes
• Potentially, we can separate out query/ingestion nodes by artefact type as well

[Diagram: an ALM Search Service SU X containing an AT, job agents, query+ingestion nodes, master nodes, and multiple ES data node scale units, e.g. ES-SU0 (Type: code-shared, NodeType: A6), ES-SU1 (Type: code-dedicated, NodeType: A6), ES-SU2 (Type: code-shared, NodeType: A6), and SU10 (Type: project-shared, NodeType: A2), each hosting indexes]

Page 17: Index Provisioning for ALM Search - My Presentation


Onboarding Strategy
• Tune the configuration for the ES scale unit in the INT environment
• Onboarding logic
  • If an account is on the dedicated list, it will get its own dedicated index
    • Accounts over 15 GB will be on the dedicated index list; there might be other reasons for an account to be on the list
    • This seems pretty liberal, but it is driven by the fact that it currently takes around X hours to index 20 GB of data
      • PROD Run (12/15) - Day Time (Web view)
      • So ideally we would really like to avoid cold-start re-indexing of this data
      • Throughput for each of Clone, Crawl, Parse and Feed (Web view)
    • Even with routing, we expect some potentially large repos in such accounts; these will be safer in a single-index scale unit
    • Can we get the repo size data?
  • Do we determine this dynamically?
    • No, we will largely have this list upfront, and we will tune the large-index factor with our 10,000-account experiment
• Shared index provisioning (see the sketch below)
  • For a shared index SU, 2 indices are maintained; we try to fill them up equally
  • When the SU reaches the configured initial load limit, a new shared index SU will be provisioned
  • Repo-level routing will be used for dedicated indices and account-level routing for shared indices
  • We will onboard in batches
    • This will help us tune the configuration further if necessary
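A minimal sketch of this onboarding decision, with an illustrative dedicated list and the 15 GB threshold from the bullets above (the helper and its signature are ours, not production code):

```python
DEDICATED_SIZE_GB = 15           # threshold from the slide
dedicated_list = {"mseng"}       # illustrative; the real list is curated upfront

def pick_index(account: str, estimated_size_gb: float, shared_indices: list) -> str:
    """Route an account to a dedicated or shared index per the onboarding rules.

    Dedicated accounts get their own index (repo-level routing); everything
    else goes to the emptier of the two shared indices in the active SU
    (account-level routing), so the two fill up roughly equally.
    """
    if account in dedicated_list or estimated_size_gb > DEDICATED_SIZE_GB:
        return f"dedicated-{account}"
    # shared_indices: [(index_name, current_size_gb), ...]
    name, _ = min(shared_indices, key=lambda x: x[1])
    return name

print(pick_index("contoso", 2.5, [("shared-0", 20.0), ("shared-1", 17.5)]))  # shared-1
```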

• Next Phase : Define all the telemetry and monitoring required for Index Management

Mohammad Imran Siddique
https://microsoft.sharepoint.com/teams/DPT/_layouts/OneNote.aspx?id=%2Fteams%2FDPT%2FShared%20Documents%2FCode%20Search%2FDocuments%2FCodeSearchWiki&wd=target%28ES%20vTeam.one%7C75085BFE-5DCD-4D83-83FA-7DACB235BE76%2FPROD%20Run%20%2812%5C%2F15%5C%29%20-%20Day%20Time%7C451EA41F-A9CC-4EDE-8986-D5228E6528E4%2F%29
onenote:https://microsoft.sharepoint.com/teams/DPT/Shared%20Documents/Code%20Search/Documents/CodeSearchWiki/ES%20vTeam.one#PROD%20Run%20(12/15)%20-%20Day%20Time&section-id={75085BFE-5DCD-4D83-83FA-7DACB235BE76}&page-id={451EA41F-A9CC-4EDE-8986-D5228E6528E4}&end
Page 18: Index Provisioning for ALM Search - My Presentation


Approach So Far
• The design focus is a low-cost scale-out model based on a scale unit approach, with an additional level of isolation from having multiple indices
• Note that ES compute is currently the largest component of our cost
• We haven't done the investigations into Azure D-series VMs and Premium Storage
  • Since ES performance is memory intensive, a move to SSD will help diminish the overheads of swapping
  • Potentially cost effective overall -> will need more analysis

Page 19: Index Provisioning for ALM Search - My Presentation


Alternative Approaches

Approach | Pros | Cons
Single Index (large number of shards, no routing) | Simple, set and go! | Zero isolation; poor performance for small accounts; large numbers of shards are not free
Single Index (large number of shards, repo-level routing) - GitHub uses this | Works for GitHub! | Doesn't work well for large variations in account size; zero isolation (corruption can cause downtime for all); poor project-level search query performance when too many repos exist
Dedicated Index per Account (repo-level routing) | Good isolation | Results in index explosion; too many small shards on a single node
Single Index with Controlled Routing | Potential for limiting shard overheads for small accounts | Monitoring intensive; depends on re-indexing more than on rebalancing; didn't work for really large accounts
Fixed SU approach (with different SUs configured for different shard density) | - | Uses re-indexing vs. rebalancing
Dynamic SU (proposed) | SU starts with 3 nodes but can grow; reasonable isolation characteristics can be achieved | -

Page 20: Index Provisioning for ALM Search - My Presentation


Understanding the COST Model

ES SU Initial Size Multiplier | Initial Load | Number of Indices | Number of Shards | Total ES SU Capacity | Initial Shard Size after Cold Start | Initial Pricing ($/mo) | Price per GB per Month (internal)
3X | 36 GB | 2 | 6 | 144 GB | 3 GB | $762/mo | $21.16
6X | 72 GB | 2 | 12 | 288 GB | 3 GB | $762/mo | $10.58
12X | 144 GB | 2 | 12 | 288 GB | 6 GB | $762/mo | $5.29
24X | 288 GB | 4 | 12 | 576 GB | 6 GB | $762/mo | $2.65
24X | 288 GB | 6 | 12 | 864 GB | 4 GB | $762/mo | $2.65
48X | 576 GB | 8 | 12 | 1152 GB | 6 GB | $762/mo | $1.32
48X | 576 GB | 12 | 12 | 1728 GB | 4 GB | $762/mo | $1.32
96X | 1152 GB | 24 | 12 | 3456 GB | 4 GB | $762/mo | $0.66

• Assuming a shard capacity of 12 GB for A6 nodes, the initial shard size reflects the remaining capacity in the ES DN SU
• As a frame of reference, SQL Premium P3 costs ~$163/day for up to a max of 500 GB ($4,890 per month, ~$9.78 per GB per month)
• The price of an A6 is around $254/mo
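The price-per-GB column follows directly from three A6 nodes amortized over the initial load; a quick sketch reproducing the table (prices are the approximate figures above, so the results match up to rounding):

```python
A6_PRICE_PER_MONTH = 254.0   # approximate A6 node price from the slide
NODES_PER_SU = 3             # an SU always starts with 3 data nodes

def price_per_gb_month(initial_load_gb: float) -> float:
    """Monthly SU price amortized over the data cold-started into it."""
    return NODES_PER_SU * A6_PRICE_PER_MONTH / initial_load_gb

for mult, load in [(3, 36), (6, 72), (12, 144), (24, 288), (48, 576), (96, 1152)]:
    print(f"{mult}X: ${price_per_gb_month(load):.2f}/GB/mo")
# 3X: $21.17/GB/mo ... 96X: $0.66/GB/mo (the table truncates 21.17 to 21.16)
```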

Page 21: Index Provisioning for ALM Search - My Presentation


COGS Finalization and Scale Testing
• Scale testing telemetry
  • ~40 TB of source code - I don't know how much of it is in active accounts; I assume a significant fraction is, as most inactive accounts don't have any data in them
  • ~75K active accounts; about 22% of all service activity is on MSENG
  • Accounts using source control: probably ~50K of them are "engaged" (meaning reasonably frequent use)
  • Peak version control usage in an hour is ~20K accounts (~30K users)
• Drive scale testing based on these (see the estimate sketch below):
  • Query rate: if we have the query rate for mseng, the number of its users, and the number of users for all the accounts, and if we assume that other users will be as active as mseng users, then we can estimate a query rate to test with
    • Alternately, we have the query rate, and if we assume that the search activity pattern is aligned to service activity in terms of load, then we know that other users will contribute the remaining 78%
  • Number of accounts for a 500 GB ES DN SU: (75K/40)/2 ≈ 940, call it ~1000
  • Peak usage: if we assume that for source control, peak activity is spread across 20K accounts, and that it would be the same for search, then 20K accounts must be searched in an hour interval; for an SU with 1000 accounts, that maps to about 270 accounts searched in an hour (4.5 accounts a minute)
  • Account context switches: right now, we don't have enough telemetry for this
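The estimates above reduce to simple arithmetic; a sketch using the telemetry figures quoted (the slide rounds accounts-per-SU up to ~1000 before deriving the peak rate):

```python
TOTAL_SOURCE_GB = 40_000        # ~40 TB of source code
ACTIVE_ACCOUNTS = 75_000
SU_CAPACITY_GB = 500
PEAK_ACCOUNTS_PER_HOUR = 20_000

avg_account_gb = TOTAL_SOURCE_GB / ACTIVE_ACCOUNTS      # ~0.53 GB per account
accounts_per_su = SU_CAPACITY_GB / avg_account_gb       # ~940, slide rounds to ~1000
searched_per_hour = 1000 * PEAK_ACCOUNTS_PER_HOUR / ACTIVE_ACCOUNTS  # ~267 ("about 270")
print(round(accounts_per_su), round(searched_per_hour),
      round(searched_per_hour / 60, 1))                 # ~4.5 accounts a minute
```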

Page 22: Index Provisioning for ALM Search - My Presentation


Other Scale-Out Evaluations Planned
• Verify the proposed ES Scale Unit Initial Size Multiplier over the optimal buffer size
• Improved query benchmark tests that are multi-account enabled
  • We will need to use the query distribution telemetry from FPA accounts to better understand usage patterns
• Cold-start indexing performance
  • Adding nodes during cold start
  • Adding replicas post cold start
• Improve index metrics in prod
  • Segment count, shard swapping behavior
• Custom analyzer performance investigations and improvements
• Improving feed rate
  • Currently the overall indexing rate is limited by the feed rate into ES; here is some data from prod
    • Throughput for each of Clone, Crawl, Parse and Feed (Web view)
  • Increase the overall feed rate using the SU approach
    • This will allow better performance for queries
• Scale-out for query/ingestion nodes
  • What telemetry do we watch apart from basic CPU/memory/IO usage?
  • When are these resource constrained?

Page 24: Index Provisioning for ALM Search - My Presentation


Appendix

Mohammad Imran Siddique
Talk about the plan for moving from one index provisioning model to another.
Mohammad Imran Siddique
Talk about moving from the current model to SSD and how the proposal can still hold.
Page 25: Index Provisioning for ALM Search - My Presentation


Other Tunings for the Optimal Shard Size Experiment
• Tuning
  • Translog size
    • Low translog sizes can force frequent commits during bulk indexing, affecting bulk indexing performance
    • Translog investigations (Web view)
  • index.translog.flush_threshold_ops
    • No experiments as yet
  • Lucene buffer size
    • Increasing the buffer size should result in fewer segments and less merging??
    • Java Settings investigation (Web view)
  • Refresh rate
    • Set to -1 for bulk indexing (see the sketch below)
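For reference, a sketch of toggling the refresh interval around a bulk load via the ES index settings API; the endpoint and index name are illustrative:

```python
import requests

ES = "http://localhost:9200"   # assumed endpoint
INDEX = "codesearch-shared-0"  # illustrative index name

# Before a cold-start bulk load: disable refresh so segments aren't
# published on every interval, cutting commit/merge churn during the feed.
requests.put(f"{ES}/{INDEX}/_settings",
             json={"index": {"refresh_interval": "-1"}}).raise_for_status()

# ... run the bulk feed ...

# After the load: restore the default refresh so newly indexed documents
# become searchable again.
requests.put(f"{ES}/{INDEX}/_settings",
             json={"index": {"refresh_interval": "1s"}}).raise_for_status()
```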

Page 26: Index Provisioning for ALM Search - My Presentation


Routing Analysis
• Here is an evaluation of routing; we will implement this to be configurable

Routing Scheme | Pros | Cons
Repo Level | Allows us to spread large-repository data across shards; useful for large accounts with large repositories. Useful if repo-level searches are optimized for this routing | Will not ensure that multiple large repositories are not mapped to the same shard
Project Level | Most searches are project level; by having project-level shards you can tune the query to hit only the relevant shards | Very large projects may result in big shards [TBD: max project size]. Will not ensure that multiple large projects are not mapped to the same shard
Account Level | Seems like a reasonable approach for small accounts indexed into the same shared index. Useful for small accounts, where account-level search can be targeted only at the specific shards | Will not ensure that multiple larger accounts in a shared index do not get mapped to the same shard. Will not ensure that dormant and active account data are not mixed into the same shard
No Routing | More likely to result in equal-sized shards, though ES seems to do a good job of balancing total shard sizes across nodes | When shards are small (or under optimal), aggregation costs will override the benefits. With a large number of shards all hosted on the same machine, this can result in more contention for resources between the shards

Page 27: Index Provisioning for ALM Search - My Presentation


Key Provisioning Actions (work in progress)
• When do we add a new node to an SU? (see the sketch after this list)
  • When the total shard size on the scale unit is 90%(?) of (number of nodes in the SU * optimal shard size)
  • Or when CPU/memory usage reaches over 90% across all the data nodes in the SU
• When do we add a new shared scale unit?
  • When the size of an index in the SU is around 90% of (optimal shard size for the SU * Scale Unit Initial Size Multiplier)
• When do we add a new single-index scale unit?
  • These accounts are known upfront, so we can plan for them ahead of time
• When do we remove a node from an SU?
  • When resource utilization in steady state is less than 50% over a period of time, we can consider removing a node from the SU
• When do we move an index out of the shared-index scale unit?
  • If shard sizes grow over the optimal sizes, we might need to move an account to its own index
• What do we do if we find the nodes in a scale unit to be unbalanced?
• What do we do if an account in a shared-index scale unit grows such that it deserves its own index?
  • Monitor the shards for that account, and when they come close to 70% of optimal size, put it on a watch list
  • Leave it until it is really a problem for the scale unit?
  • Move the smaller data out of the scale unit?
  • If we do account-level routing, this may cause one shard to go over the optimal size; if we do repo-level routing and all the growth comes from one repo, the same can happen
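A sketch of the two 90% triggers above as code, assuming the A6 optimal shard size of 12 GB (the threshold constant and helper names are ours):

```python
OPTIMAL_SHARD_GB = 12
FILL_THRESHOLD = 0.9          # the 90% trigger from the bullets above

def needs_new_node(total_shard_gb: float, nodes_in_su: int) -> bool:
    """Add a node when the SU's data approaches its per-node optimal budget."""
    return total_shard_gb >= FILL_THRESHOLD * nodes_in_su * OPTIMAL_SHARD_GB

def needs_new_shared_su(index_gb: float, multiplier: int) -> bool:
    """Provision a new shared SU when an index nears the initial-load limit."""
    return index_gb >= FILL_THRESHOLD * OPTIMAL_SHARD_GB * multiplier

print(needs_new_node(33.0, 3))          # True: 33 GB vs 90% of 36 GB
print(needs_new_shared_su(130.0, 12))   # True: 130 GB vs 90% of 144 GB
```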

Mohammad Imran Siddique
Some more:
Mohammad Imran Siddique
What if one index's size grows to more than half of the optimal index size for the scale unit?
Mohammad Imran Siddique
What happens to shard size when deleted document counts grow due to continuous indexing?
Babu Krishnaswamy
Again - is size the only factor in adding a new node? Not the activity on the shards?
Sunita Shrivastava
We believe that activity against the same shard will not require adding a new node. However, increased activity across all shards and a higher request rate could warrant adding a new node so that the replicas can be spread across more nodes.
Page 28: Index Provisioning for ALM Search - My Presentation


Constraints
• ALM Service indexing constraints
  • The indexing pipeline processes multiple repositories in parallel, with throttling in place
  • The size of the entire account's data is not known when the data for the first repository in the account is ready for cold-start indexing
  • The number of files that need to be indexed (in the case of code search, across all the repositories of the account) is not known when the index needs to be provisioned during cold-start indexing
• We looked at a couple of options, mostly heuristics, to see if we could estimate the account index size:
  1. The number of repos per account is known - extrapolate
  2. Use the size of the packed file data
     • It contains history
     • May be invalid if there is a lot of binary content
     • In case we err on the small side, it is not a big issue
  3. Get the overall repo size from Prod through a one-time query to TFS that runs across all repos
  4. Determine the size of the accounts upfront and use that
• We will be indexing the top 10,000 active Git accounts using the current "index per account" scheme
  • This will yield interesting telemetry
• Other considerations:
  • Many small accounts; relatively few large accounts
  • The number of accounts with little Git content but lots of TFVC content is unknown for now

Biju Venugopal
It seems we have telemetry for TFVC from the CodeLens team.
Babu Krishnaswamy
What is the purpose of this slide?
Babu Krishnaswamy
It seems to talk about some challenges and some facts. What do we want to convey with this one?
Sunita Shrivastava
It is mostly for the challenges... but I agree the slide needs to be cleaned up.
Page 29: Index Provisioning for ALM Search - My Presentation


Some observations from the Prod deployment

Size of Mseng: 16 GB primary
Time to Index Mseng:
Combined Size of other Accounts: ~24 GB primary data
Search Rate: .2 per sec (max), .05 per sec (avg)

health index pri rep docs.count docs.deleted store.size pri.store.size
green fcsamerica 3 2 90371 42219 9gb 3.2gb
green capservice 3 2 559611 143413 7.6gb 2.3gb
green monacotools 3 2 41164 6870 1.4gb 461.7mb
green jet-tfs 3 2 12623 6992 1.7gb 699.9mb
green biazure 3 2 69956 9884 2.6gb 840.6mb
green domoreexp 3 2 51840 15723 2.9gb 1gb
green isipl 3 2 38669 8600 1.4gb 496.1mb
green rmtest 3 2 1458 0 1.1mb 399.5kb
green icfiwsoco 3 2 6044 3406 192.8mb 57.4mb
green mygps 3 2 62338 9472 1.2gb 409.8mb
green gkfx 3 2 22228 9115 1gb 401.9mb
green dlwteam 3 2 36016 11388 3.8gb 1.2gb
green mseng 3 2 1123206 371121 49.8gb 15.8gb
green keller-org2 3 2 8011 2736 334.1mb 132.9mb
green seducsp 3 2 103417 41952 2.6gb 1gb
green olsapps 3 2 9974 1782 891.3mb 280.8mb
green almsearchprod 3 2 500 13 4.2mb 1.3mb
green .marvel-kibana 5 1 1 0 6.5kb 3.3kb
green sysinternals 3 2 7010 1618 386.6mb 133.1mb
green onedrive 3 2 19835 7769 803.6mb 261.7mb
green microsoft 3 2 283733 64322 21.9gb 6.9gb
green mpsit 3 2 121904 46031 5.9gb 2.2gb
green tescomobile 3 2 19480 6703 1.1gb 389.1mb
green jci-be-eng 3 2 11231 4307 1.3gb 464.7mb

Satish Kumar
I would say let us take a day average. This data is not right. In general, if I take a 24-hour window, the query rate is around 0.2.
Page 30: Index Provisioning for ALM Search - My Presentation


What If (work in progress)

Question | Impact | Chance of Occurrence
What if the shard size grows beyond the optimal size? | Medium | High
What if one index's size grows beyond half of the optimal index size for the scale unit? | Low | High
What if the query rate to one of the data nodes (set of shards) goes high? | High | High
What if the query rate goes very low and CPU utilization on the nodes in an SU is not very high? | Low | Medium
What if a large account shrinks below the minimum size for a large account? | Low | Medium
What happens to shard size when deleted document counts grow due to continuous indexing? | Medium | Medium
What if a large/shared index grows beyond the max index size we can support? | High | Low
What if around 12 small accounts in one shared index each grow to 90% of the min_large_account size? | High | Low
What if one of the shards in both active shared indexes (in the active SU, where new accounts go) gets corrupted? | High | Low
What if an active account is mixed in with dormant accounts in the same shard? | |

Babu Krishnaswamy
This is great. We should have clear alerts and resolution TSGs for each of these, I guess.
Page 31: Index Provisioning for ALM Search - My Presentation


"What If" Responses (in progress)
• When the shard size for a shard becomes more than the optimal size, it can be handled as follows:
  • Upgrade the nodes?
  • Determine the accounts that contribute data to this shard, and whether any of them fits the definition of a large account
    • Re-index the data for that account to a single-account index
• When one of the nodes of the scale unit is hotter than the others
• When a large index becomes small?
  • Drop the number of nodes down to the required minimum
  • We will not move it unnecessarily
  • We can add a new index to the SU?
• When continuous indexing results in the deleted document count going up