Supporting Android-Based Platform Development at Samsung
DESCRIPTION
Samsung drives, and succeeds in, the smartphone world. Perforce is its platform of choice for Continuous Delivery across its worldwide development teams because of its scalability and distributed setup. Learn all about the software engineering environment behind Samsung's cutting-edge smartphone products.
TRANSCRIPT
#
Kanggil Lee
Senior Software Engineer
Samsung Electronics
Perforce Deployment for all Mobile Projects in Samsung
#
Kanggil Lee
Senior S/W Engineer
Samsung Electronics
Kanggil Lee is a Senior S/W Engineer at Samsung Electronics. He administers ALM systems, especially the Perforce servers in the Mobile Communications Business.
He is in charge of deploying Perforce to Samsung's globally distributed R&D centers as well as HQ. He configured the world's largest transactional, always-on (24/7) Perforce server. Before joining the Perforce team, he worked as an administrator of IBM Rational products at Samsung.
#
• Perforce at Samsung
• Optimizing Perforce Replication
• Lockless Reads
• Monitoring and Maintenance
• Future
OVERVIEW
#
Perforce at Samsung
#
• All mobile projects are in Perforce, and most are Android platform projects.
• We run 15 master servers that provide 19 Perforce services.
• ~30 overseas R&D centers.
• Most of our users (> 80%) use P4V.
Perforce at Samsung
#
• Primary Server (Android 4.4.x ~)
• 4 CPUs (32 physical cores), 1.5TB Memory, Linux (RHEL)
• Metadata, Journal and Logs on Flash Arrays
• Depot on Spinning disk (15k, RAID1+0, DAS)
• > 7TB metadata (up to 500GB of space reclaimed per week)
• > 10,000 users
• ~ 2.6 million submitted changelists since Nov. 2013
• ~ 8 million commands/day
• ~ 6.5k commits/day
Perforce at Samsung
#
• Other Perforce Servers (~ Android 4.3.x, etc.)
• ~ 14 smaller servers, with 10GB to 3TB of metadata each
• Metadata, Journal and Logs on Flash Arrays or Spinning disk (15k, RAID1+0, DAS)
• Depot on Spinning disk (15k, RAID1+0, DAS)
• 5 Read-Only Replicas
• > 30 Build-Farm Replicas worldwide
• 90 proxies worldwide
Perforce at Samsung
#
Optimizing Perforce Replication
#
• ~ 5 hours replication lag on overseas build replicas
– ~ 25GB journal / hour on our primary server
– ~ 500ms latency (HQ <-> Brazil)
• What we have done
– Filter db.have by using the -T flag
  • db.have is over 90% of each journal
– Add QoS rules for Perforce traffic
– Expand network bandwidth
– Set net.keep.x variables
Optimizing Perforce Replication
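To make the figures on this slide concrete, the short Python sketch below converts the quoted journal rate into a sustained network rate and shows how much journal is left once db.have is filtered out. The helper names are illustrative, decimal units (1 GB = 10^9 bytes) are an assumption, and 90% is used as a lower bound for the "over 90%" share stated on the slide.

```python
# Back-of-the-envelope arithmetic for the replication numbers on the slide.
# Helper names are illustrative only; units are decimal (1 GB = 1e9 bytes).

JOURNAL_GB_PER_HOUR = 25.0  # journal volume on the primary server (from the slide)
DB_HAVE_SHARE = 0.90        # slide says "over 90%"; 0.90 is a conservative bound


def sustained_mbit_per_s(gb_per_hour: float) -> float:
    """Convert a journal rate in GB/hour to megabits per second."""
    bits_per_hour = gb_per_hour * 1e9 * 8
    return bits_per_hour / 3600 / 1e6


def remaining_after_filter(gb_per_hour: float, filtered_share: float) -> float:
    """Journal volume (GB/hour) still replicated after filtering one table."""
    return gb_per_hour * (1.0 - filtered_share)


if __name__ == "__main__":
    full_rate = sustained_mbit_per_s(JOURNAL_GB_PER_HOUR)
    filtered_gb = remaining_after_filter(JOURNAL_GB_PER_HOUR, DB_HAVE_SHARE)
    filtered_rate = sustained_mbit_per_s(filtered_gb)
    print(f"unfiltered: {full_rate:.1f} Mbit/s sustained")
    print(f"db.have filtered: at most {filtered_gb:.1f} GB/hour, "
          f"{filtered_rate:.1f} Mbit/s sustained")
```

Under these assumptions the unfiltered journal needs roughly 56 Mbit/s sustained, which is hard to hold over a ~500ms link, while filtering db.have brings that down by about an order of magnitude.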
#
• ~ 35 mins replication lag on all build replicas
– ~ 4GB journal for each p4 populate command
– Breaks builds (Proof/Release)
• What we have done
– Set rpl=2 in order to profile database locking activity
– Filter db.integed by using the -T flag
  • Most lock time is due to holding the write lock on db.integed
  • Replication lag is now < 30s (70x faster)
Optimizing Perforce Replication
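The kind of lock profiling described on this slide can be approximated by summing write-lock times per table from the server log. The Python sketch below is a minimal illustration, not the tooling used at Samsung: it assumes the log contains p4d-style tracking lines shaped like the sample in the code (a `--- db.<table>` line followed by a `total lock wait+held read/write` line), and the exact layout may differ between server versions. The function name and the sample data are hypothetical.

```python
import re
from collections import defaultdict

# Assumed shape of the tracking lines (may vary by server version):
#   --- db.integed
#   ---   total lock wait+held read/write 0ms+0ms/12ms+650ms
TABLE_RE = re.compile(r"^--- (db\.\w+)\s*$")
LOCK_RE = re.compile(
    r"^---\s+total lock wait\+held read/write "
    r"(\d+)ms\+(\d+)ms/(\d+)ms\+(\d+)ms"
)


def write_lock_totals(log_text):
    """Sum write-wait and write-held milliseconds per db table."""
    totals = defaultdict(lambda: {"write_wait_ms": 0, "write_held_ms": 0})
    table = None
    for raw in log_text.splitlines():
        line = raw.strip()
        table_match = TABLE_RE.match(line)
        if table_match:
            table = table_match.group(1)
            continue
        lock_match = LOCK_RE.match(line)
        if lock_match and table is not None:
            # Groups are read-wait, read-held, write-wait, write-held.
            _rw, _rh, write_wait, write_held = map(int, lock_match.groups())
            totals[table]["write_wait_ms"] += write_wait
            totals[table]["write_held_ms"] += write_held
    return dict(totals)


# Made-up sample data, for illustration only.
SAMPLE = """\
--- db.integed
---   total lock wait+held read/write 0ms+0ms/12ms+650ms
--- db.have
---   total lock wait+held read/write 0ms+40ms/0ms+0ms
--- db.integed
---   total lock wait+held read/write 0ms+0ms/3ms+50ms
"""

if __name__ == "__main__":
    totals = write_lock_totals(SAMPLE)
    worst = max(totals, key=lambda t: totals[t]["write_held_ms"])
    print(worst, totals[worst])
```

Ranking tables by total write-held time is one simple way to spot a table such as db.integed dominating lock time.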
#
Lockless Reads
#
• H/W upgrade is essential.
– Flash Arrays, Fast CPU, Huge Memory, 10G NIC, etc.
• However, it cannot guarantee the best performance all the time because of lock contention.
– Sync commands block major write commands
  • dm-CommitSubmit, shelve, unshelve, submit, edit, populate
– Integ commands block more write commands
  • dm-SubmitChange, revert, shelve, dm-CommitSubmit, change, submit, edit, resolve, reopen, add, sync, delete
Lockless Reads
#
• Upgraded to 2013.3 and set db.peeking=3
– Upgraded all replica servers in April.
  • Reduced replication lag
– Upgraded our primary server on Jun. 29th
  • Eliminated lock contention caused by sync and integ commands.
  • Showed significant performance improvements
Lockless Reads
#
Lockless Reads
[Chart: "Commands delayed". Series: Feb., April, July. Axis labels: "Commands affected" and "(sec)". Tick ranges: 1 to 512 and 0 to 130,000.]
#
• Top 20 commands that blocked other commands on the primary server over the last 30 days.
Lockless Reads
#
Lockless Reads
[Chart: "P4 DB Write lock Activity". Series: write-wait, write-held. x-axis: Feb., JUN, JULY. y-axis: time (ms), 0 to 700.]
[Chart: "Command ratio (lapse < 1s)". Series: submit. x-axis: Feb., JUN, JULY. y-axis: %, 50 to 100.]
#
Lockless Reads
[Chart: "Avg. Lapse". Series: submit, shelve, unshelve, edit, revert. x-axis: Feb., JUN, JULY. y-axis: time (sec), 0 to 13.]
#
Monitoring and Maintenance
#
• Run P4HealthCheck every 60 seconds
– Check connection, p4d status, p4broker status
– Send email and SMS if status is not OK.
• Run Replica Gap checker every 10 minutes
– Check replication lag
– Send email if gap is over 10 minutes.
Monitoring (QuickBuild)
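The decision logic behind these two checks can be sketched in a few lines of Python. This is a hypothetical illustration, not the actual P4HealthCheck or Replica Gap checker: the function names are invented, and how the master and replica timestamps or the three status flags are collected is not described on the slide, so they are passed in as plain arguments.

```python
from datetime import datetime, timedelta

# Threshold from the slide: alert when the gap is over 10 minutes.
GAP_THRESHOLD = timedelta(minutes=10)


def replication_gap(master_time, replica_time):
    """Return how far the replica is behind the master (never negative)."""
    return max(master_time - replica_time, timedelta(0))


def should_alert(master_time, replica_time, threshold=GAP_THRESHOLD):
    """True when the replication gap is strictly over the threshold."""
    return replication_gap(master_time, replica_time) > threshold


def health_actions(connection_ok, p4d_ok, p4broker_ok):
    """Return (notifications to send, names of the failed checks)."""
    checks = (
        ("connection", connection_ok),
        ("p4d", p4d_ok),
        ("p4broker", p4broker_ok),
    )
    failed = [name for name, ok in checks if not ok]
    # The slide says both email and SMS go out when status is not OK.
    notifications = ["email", "sms"] if failed else []
    return notifications, failed


if __name__ == "__main__":
    master = datetime(2014, 7, 1, 12, 11)
    replica = datetime(2014, 7, 1, 12, 0)
    print(should_alert(master, replica))
    print(health_actions(True, False, True))
```

In practice a scheduler would call these every 60 seconds and every 10 minutes respectively and hand the result to whatever mail or SMS gateway is in place.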
#
• Monitoring Perforce Servers
Monitoring (Splunk)
#
• Monitoring Perforce DB Lock
Monitoring (Splunk)
#
• Profiling locking activities– run a query and select any fields you are interested in
Monitoring (Splunk)
#
• Monitoring Perforce Commands
Monitoring (Splunk)
#
• Monitoring Perforce Sync command
Monitoring (Splunk)
#
• Monitoring Errors (max-values)
Monitoring (Splunk)
#
• Maintain Perforce db.* files, journal and logs
– Create and restore checkpoint files weekly
– Rebuild the db weekly
– Unload clients and labels daily
– Rotate and replay journals (logs) hourly
– Recover the database
• Set up replica servers
• Verify depots daily
Maintenance (QuickBuild)
#
The Future
#
• Upgrade our primary server in October
– 4 CPUs (40 physical cores), 3.0TB memory
• P4d/P4broker upgrades (r13.3 -> r14.x)
• Shared Archive or Depot on Flash Arrays (dedup)
• Server consolidation
– 14 other servers -> 7 other servers
– Set up read-only replicas for all Perforce services
The Future