1
MAGENTO Import-export
Performance challenges and victories we got at open source ecommerce
AUGUST 4, 2015
2
What is Magento?
Data by Hivemind, a technology research and insight company
Magento is an open-source content management system for e-commerce websites. Since 2011 it has been part of eBay Inc.
3
Magento customers
4
Why is this functionality so important to customers?
USE CASES:
• Populate an empty shop with products
• Daily update of inventory and prices
• Move products from one store to another
• Integration with CRM systems using CSV format
• Simple mechanism for mass editing using Excel
CONCLUSION:
• The mechanism should be clear to the customer
• It should work fast
5
Goals of project
1. Improve import/export functionality for products and customers
2. Implement new functionality to import/export prices
3. Replace the obsolete file format used for import/export
4. Optimize import/export performance
5. Cover all functionality with tests and conform to Magento coding standards
6
Acceptance criteria
The import procedure should be a linear process for the Magento framework, and the number of records in a single file should not increase processing time exponentially until the bottleneck is the MySQL server itself.
Run #1: 100k products, 30 min
• simple_products = 60000
• configurable_products = 20000 (each configurable has 3 simple products as options)
• bundle_products = 10000 (each bundle product has 3 simple products as options)
• grouped_products = 10000 (each grouped product has 3 simple products as options)
• categories = 1000
• categories_nesting_level = 3
• Each product has 2 images attached using local storage only
• Number of product attribute sets = 100
• Number of attributes per product = 10
• Total number of attributes = 1000
Run #2: 200k products, 1 hour
• simple_products = 120000
• configurable_products = 40000 (each configurable has 3 simple products as options)
• bundle_products = 20000 (each bundle product has 3 simple products as options)
• grouped_products = 20000 (each grouped product has 3 simple products as options)
• categories = 1000
• categories_nesting_level = 3
• Each product has 2 images attached using local storage only
• Number of product attribute sets (product templates) = 100
• Number of attributes per product = 10
• Total number of attributes = 1000
The import process should not increase frontend load time by more than 20% of the average page load time, as measured with JMeter.
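The two criteria above can be expressed as simple checks. This is a minimal sketch: the function names, the 10% tolerance and the 500 ms baseline are illustrative assumptions, not part of Magento or of the project's test suite.

```php
<?php
// Hypothetical helpers; names and tolerance are assumptions made for illustration.

/** True when import time grows no faster than the record count (within a tolerance). */
function scalesLinearly(int $recordsA, float $minutesA, int $recordsB, float $minutesB, float $tolerance = 0.10): bool
{
    $recordRatio = $recordsB / $recordsA;
    $timeRatio = $minutesB / $minutesA;
    return $timeRatio <= $recordRatio * (1 + $tolerance);
}

/** True when page load during import stays within 20% of the baseline average. */
function frontendWithinBudget(float $baselineMs, float $duringImportMs, float $maxIncrease = 0.20): bool
{
    return $duringImportMs <= $baselineMs * (1 + $maxIncrease);
}

// Targets from Run #1 (100k in 30 min) and Run #2 (200k in 1 hour).
var_dump(scalesLinearly(100000, 30.0, 200000, 60.0));
// The 500 ms baseline and 590 ms reading are made-up figures.
var_dump(frontendWithinBudget(500.0, 590.0));
```

Both calls return true for the figures shown; doubling the records while quadrupling the time would fail the first check.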
7
System configuration
8
Why isn’t it so simple?
• Product is the key entity in eCommerce
• The DB uses the EAV model for data storage
• A product has many linked entities
Entities linked to a product:
• Media images
• Categories
• Links to other products
• Taxes
• Custom options
• Custom attributes
• Complex product attributes
Product types:
• Simple
• Virtual
• Configurable
• Bundle
• Grouped
• Gift cards (EE only)
9
Import file sample
COLUMNS
sku,website_code,store_view_code,attribute_set_code,product_type,name,description,short_description,weight,product_online,visibility,product_websites,categories,price,special_price,special_price_from_date,special_price_to_date,tax_class_name,url_key,meta_title,meta_keywords,meta_description,base_image,base_image_label,small_image,small_image_label,thumbnail_image,thumbnail_image_label,additional_images,additional_image_labels,configurable_variation_prices,configurable_variation_labels,configurable_variations,bundle_price_type,bundle_price_view,bundle_sku_type,bundle_weight_type,bundle_values,downloadble_samples,downloadble_links,associated_skus,related_skus,crosssell_skus,upsell_skus,custom_options,additional_attributes,manage_stock,is_in_stock,qty,out_of_stock_qty,is_qty_decimal,allow_backorders,min_cart_qty,max_cart_qty,notify_on_stock_below,qty_increments,enable_qty_increments,is_decimal_divided,new_from_date,new_to_date,gift_message_available,giftcard_type,giftcard_amount,giftcard_allow_open_amount,giftcard_open_amount_min,giftcard_open_amount_max,giftcard_lifetime,giftcard_allow_message,giftcard_email_template,created_at,updated_at,custom_design,custom_design_from,custom_design_to,custom_layout_update,page_layout,product_options_container,msrp_price,msrp_display_actual_price_type,map_enabled
PRODUCT DATA
simplesku00,,,Default,simple,"simple Product 00","simple Product 00 Description","simple Product 00 Short Description",33.14,1,"catalog, search",base,Section3/S3Category4/SubCategory10|Section9/S9Category2/SubCategory1,3193.50,89.9900,02-03-15,02-03-15,"Taxable Goods",simple00urlkey,"simple Product 00 Meta Title","simple, product","simple Product 00 Meta Description",/mediaimport/image1.png,"Base Image Label",/mediaimport/image2.png,"Small Image Label",/mediaimport/image3.png,"Thumbnail Image Label","/mediaimport/image4.png, /mediaimport/image5.png","Label 1, Label 1a",,,,,,,,,,,,simplesku0,,simplesku0,,"set9_attribute1_code = value8,set9_attribute2_code = value6,set9_attribute3_code = value2,set9_attribute4_code = value1,set9_attribute5_code = value4,set9_attribute6_code = value8,set9_attribute7_code = value7,set9_attribute8_code = value6,set9_attribute9_code = value2,set9_attribute10_code = value1,size=0",1,1,1000,2,0,1,1,1000,1,0,0,0,02-03-15,02-03-15,0,,,,,,,,,02-03-15,02-03-15,"Magento Blank",02-03-15,02-04-15,,"3 columns","Product Info Column",9,"On Gesture",1
10
One of the concepts for import optimization
Standard saving process: Get imported data → Append data to model → Prepare data for insert → Query to DB
Multi-insert process: Get imported data → Retrieve data ready to insert → Create multi-insert query → Query to DB
11
How it actually works
Standard saving process: Get imported data → Append data to model → Prepare data for insert → Query to DB
Multi-insert process: Get imported data → Prepare data for insert → Create multi-insert query → Query to DB
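The "Create multi-insert query" step could look roughly like the sketch below. The helper `buildMultiInsert` and the table name `product_entity` are hypothetical and chosen for illustration; this is not the code used in Magento's import module.

```php
<?php
// Hypothetical helper, not a Magento API: builds one parameterized multi-row INSERT.
function buildMultiInsert(string $table, array $rows): array
{
    if ($rows === []) {
        throw new InvalidArgumentException('No rows to insert');
    }
    $columns = array_keys($rows[0]);
    // One "(?, ?, ...)" group per row.
    $rowPlaceholder = '(' . implode(', ', array_fill(0, count($columns), '?')) . ')';
    $sql = sprintf(
        'INSERT INTO %s (%s) VALUES %s',
        $table,
        implode(', ', $columns),
        implode(', ', array_fill(0, count($rows), $rowPlaceholder))
    );
    $params = [];
    foreach ($rows as $row) {
        foreach ($columns as $column) {
            $params[] = $row[$column]; // keep values in column order
        }
    }
    return [$sql, $params];
}

// Illustrative table name and rows.
[$sql, $params] = buildMultiInsert('product_entity', [
    ['sku' => 'simplesku00', 'type_id' => 'simple'],
    ['sku' => 'simplesku01', 'type_id' => 'simple'],
]);
echo $sql, PHP_EOL;
```

The statement and its flat parameter list can then be passed to a prepared statement, so a whole batch costs one round trip to the DB instead of one per product.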
12
Bunch import idea
1. Sort products from simple to complex
2. Split the full pack into bunches of 50 products each
3. Import a full bunch of products in one query
4. Retrieve the IDs of inserted/updated products
5. Import connected entities one by one
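The sorting and splitting steps can be sketched as follows. The ordering of product types from simple to complex is an assumption (the slides do not give it), and `typeRank` and `toBunches` are hypothetical names.

```php
<?php
// Assumed complexity order of product types; unknown types sort last.
function typeRank(string $type): int
{
    $order = ['simple' => 0, 'virtual' => 1, 'configurable' => 2, 'bundle' => 3, 'grouped' => 4];
    return $order[$type] ?? count($order);
}

// Sort from simple to complex, then split into bunches of the given size.
function toBunches(array $products, int $bunchSize = 50): array
{
    usort($products, function (array $a, array $b): int {
        return typeRank($a['product_type']) <=> typeRank($b['product_type']);
    });
    return array_chunk($products, $bunchSize);
}

// 120 illustrative products, alternating simple and configurable.
$products = [];
for ($i = 0; $i < 120; $i++) {
    $products[] = ['sku' => "sku$i", 'product_type' => $i % 2 === 0 ? 'simple' : 'configurable'];
}
$bunches = toBunches($products);
echo count($bunches), PHP_EOL; // 3 bunches: 50, 50 and 20 products
```

Sorting first means the simple products that complex products refer to are already imported, with known IDs, by the time the complex ones are processed.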
13
Before the optimizations took place
• Importing 500k products on a cluster: about 4-5h
• Creating URL rewrites for them: about 12h
• Total time: about 17h
• Needs to be less than 2.5h
14
T - Technology
DESCRIPTION
XHProf is a function-level hierarchical profiler for PHP with a simple HTML-based navigational interface. The raw data collection component is implemented in C (as a PHP extension). The reporting/UI layer is all in PHP. It is capable of reporting function-level inclusive and exclusive wall times, memory usage, CPU times and number of calls for each function. Additionally, it supports the ability to compare two runs (hierarchical DIFF reports) or aggregate results from multiple runs.
MAIN ABILITIES
• More lightweight and faster than xDebug
• Hierarchical reports showing memory and CPU usage
• Ability to create a call-graph image based on a report
• Ability to create a summary report based on several runs
15
How to implement XHProf
<?php
// Initialize XHProf and collect CPU and memory metrics
xhprof_enable(XHPROF_FLAGS_CPU | XHPROF_FLAGS_MEMORY);

// Run the code to be profiled
run();

// Stop the profiler and retrieve the profiling data
$xhprof_data = xhprof_disable();

// Generate the report
include_once "/var/www/xhprof-0.9.4/xhprof_lib/utils/xhprof_lib.php";
include_once "/var/www/xhprof-0.9.4/xhprof_lib/utils/xhprof_runs.php";
$xhprof_runs = new XHProfRuns_Default();
$run_id = $xhprof_runs->save_run($xhprof_data, "test");
16
What reports look like
17
Call-graph visualization
Very bad
Not so bad
Seems normal, but…
18
What it looks like in Magento
19
Bottlenecks, classification
• Static (one-time):
– Mostly affect small imports
– Hard to find when importing a large pack of products
• Linear:
– Hard to detect on small imports because static bottlenecks dominate
– Take almost the same percentage of time on medium and big packs
• Exponential:
– Hard to find on small and medium import packs
– Can be detected on a big pack of products
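One way to apply this classification is to compare the time of the same function in two profiling runs with different import sizes. The sketch below is an assumption about how that could be automated: `classifyBottleneck` and the 25% tolerance are hypothetical, and "exponential" is used in the slides' sense of growing faster than the record count.

```php
<?php
// Hypothetical classifier based on two measurements of the same function.
function classifyBottleneck(int $smallN, float $smallSec, int $bigN, float $bigSec, float $tolerance = 0.25): string
{
    $sizeRatio = $bigN / $smallN;     // how much bigger the second import is
    $timeRatio = $bigSec / $smallSec; // how much longer the function took

    if ($timeRatio <= 1 + $tolerance) {
        return 'static';      // time barely depends on import size
    }
    if ($timeRatio <= $sizeRatio * (1 + $tolerance)) {
        return 'linear';      // time grows roughly with import size
    }
    return 'exponential';     // time grows faster than import size
}

// Made-up timings for 1k and 10k products.
echo classifyBottleneck(1000, 2.0, 10000, 2.1), PHP_EOL;
echo classifyBottleneck(1000, 2.0, 10000, 19.0), PHP_EOL;
echo classifyBottleneck(1000, 2.0, 10000, 90.0), PHP_EOL;
```

With these made-up timings the three calls yield static, linear and exponential respectively.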
20
Approaches to optimization
Implement multi-processing
― Generate queue
― Create number of workers
― Pray that it won’t affect frontend loading time
Pros:
• We could use several processor cores to increase data processing speed
Cons:
• Trouble with thread/system functions disabled for security reasons
• Potential risks to frontend loading time tests
• Quite a complex mechanism to implement
• Potential risk of row/table lock delays due to parallel reads and writes to a single DB
Find and fix bottlenecks
― Change attribute load process
― Change URL Rewrites save process
― Implement effective plugin cache
― Other small optimizations
Pros:
• We could deliver in iterations
• Less hacky code
Cons:
• We don’t know how much of a performance increase such fixes can deliver
• These changes could affect tests and core processes
21
Difficulties in optimization
• Time or quality: which should we prefer on really dirty code?
• Import/export functionality is a part of MTF (Magento Testing Framework), so changing it breaks tests
• Results are affected by the size of the import file
• Results vary with different DB data, and we didn’t have a reference DB
• It takes a long time to get a report
• To detect exponential bottlenecks we have to compare reports for different import files
• How do we import related entities if we don’t have a unique key?
• Memory usage vs. queries to DB
• How do we compare an elephant and a fly if we don’t know the real server configuration?
• XHProf lies: we cannot be sure of its results and should use it only as a guideline
22
It’s a lie!
23
Interceptors idea
Main class: Method 1, Method 2, Method 3
Interceptor-covered class (extends Main class): Method 1, Method 2, Method 3
Call to Method 1: Before plugin call → Around plugin call → After plugin call
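The call order can be illustrated with a hand-written interceptor. This is a simplified sketch with a single plugin; Magento generates its interceptor classes and supports several plugins per method, and the class names here are hypothetical. The before/around/after method prefixes follow Magento 2's plugin naming convention.

```php
<?php
class MainClass
{
    public function methodOne(string $value): string
    {
        return "main($value)";
    }
}

// Hypothetical plugin with one hook of each kind.
class ExamplePlugin
{
    public function beforeMethodOne(MainClass $subject, string $value): array
    {
        return [strtoupper($value)]; // may replace the arguments
    }

    public function aroundMethodOne(MainClass $subject, callable $proceed, string $value): string
    {
        return '[' . $proceed($value) . ']'; // wraps the original call
    }

    public function afterMethodOne(MainClass $subject, string $result): string
    {
        return $result . '!'; // may replace the result
    }
}

// Hand-written stand-in for a generated interceptor class.
class MainClassInterceptor extends MainClass
{
    private $plugin;

    public function __construct(ExamplePlugin $plugin)
    {
        $this->plugin = $plugin;
    }

    public function methodOne(string $value): string
    {
        [$value] = $this->plugin->beforeMethodOne($this, $value);
        $proceed = function (string $v): string {
            return parent::methodOne($v);
        };
        $result = $this->plugin->aroundMethodOne($this, $proceed, $value);
        return $this->plugin->afterMethodOne($this, $result);
    }
}

$output = (new MainClassInterceptor(new ExamplePlugin()))->methodOne('sku');
echo $output, PHP_EOL; // [main(SKU)]!
```

Every intercepted call pays for the extra hops, which is why the plugin system shows up in the benchmark when a method is called many times during import.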
24
Interceptors benchmark results
Before the optimizations took place
Plugin system performance issue
25
Interceptors benchmark results
After plugin system optimization
Instead of old calls
26
Example of optimizing static bottleneck
Before:
1. Start init
2. Load list of product types
3. Product left for init? If no, end init
4. Get next product type
5. Load attribute entities for the product
6. Load data for the attribute
7. Return to step 3
After:
1. Start init
2. Load list of product types
3. Product left for init? If no, end init
4. Get next product type
5. Load attribute IDs by product type
6. Is every attribute in cache? If yes, go to step 9
7. Load absent attribute entities by ID and load data for the attributes
8. Add the attributes to cache
9. Get the attributes from cache by ID
10. Return to step 3
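The "after" flow amounts to loading only the attributes that are not cached yet. A minimal sketch, assuming a loader callback that stands in for the DB query; `AttributeCache` is a hypothetical class, not Magento's implementation.

```php
<?php
// Hypothetical cache: loads only the attribute IDs it has not seen before.
class AttributeCache
{
    private $cache = [];
    private $loader;

    public function __construct(callable $loader)
    {
        $this->loader = $loader; // stand-in for "load attribute entities by ID"
    }

    public function getMany(array $ids): array
    {
        $missing = array_values(array_diff($ids, array_keys($this->cache)));
        if ($missing !== []) {
            foreach (($this->loader)($missing) as $id => $attribute) {
                $this->cache[$id] = $attribute;
            }
        }
        return array_intersect_key($this->cache, array_flip($ids));
    }
}

$requested = []; // records which IDs actually reached the loader
$cache = new AttributeCache(function (array $ids) use (&$requested): array {
    $requested[] = $ids;
    $result = [];
    foreach ($ids as $id) {
        $result[$id] = "attribute_$id";
    }
    return $result;
});

$cache->getMany([1, 2, 3]); // first product type: loads 1, 2, 3
$cache->getMany([2, 3, 4]); // second product type: loads only 4
print_r($requested);
```

Product types that share attributes no longer trigger repeated loads of the same attribute data during init.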
27
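Cache reusability on URL Rewrite example
Creating categories:
1. Start creating category
2. Get category from DB
3. Category exists? If no, create category
4. Any categories left? If yes, return to step 2
5. End creating category
Creating URL rewrites:
1. Start creating URL rewrites
2. Get category from DB
3. Create URL rewrite
4. Any categories left? If yes, return to step 2
5. End creating URL rewrites
With the cache:
• Get all categories and place to cache
• Get category from cache (instead of "Get category from DB" when creating categories)
• Place to cache (after "Create category")
• Get category from cache (instead of "Get category from DB" when creating URL rewrites)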
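A sketch of one cache shared by both steps. `CategoryCache`, the path-to-ID map and the `.html` request paths are illustrative assumptions rather than Magento's actual classes or URL format.

```php
<?php
// Hypothetical cache keyed by category path, filled once from the DB.
class CategoryCache
{
    private $byPath;

    public function __construct(array $allCategories)
    {
        $this->byPath = $allCategories; // path => ID
    }

    public function getOrCreate(string $path, callable $create): int
    {
        if (!isset($this->byPath[$path])) {
            $this->byPath[$path] = $create($path); // create, then place to cache
        }
        return $this->byPath[$path];
    }

    public function get(string $path): ?int
    {
        return $this->byPath[$path] ?? null;
    }
}

$nextId = 11;
$create = function (string $path) use (&$nextId): int {
    return $nextId++; // stand-in for inserting a category and returning its ID
};
$paths = ['Section3/S3Category4', 'Section9/S9Category2'];

// Step 1: creating categories, with one category already present.
$cache = new CategoryCache(['Section3/S3Category4' => 10]);
foreach ($paths as $path) {
    $cache->getOrCreate($path, $create);
}

// Step 2: creating URL rewrites reuses the same cache, with no further DB reads.
$rewrites = [];
foreach ($paths as $path) {
    $rewrites[$cache->get($path)] = strtolower($path) . '.html';
}
print_r($rewrites);
```

The second step finds every category in memory, including the one created a moment earlier.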
28
Global URL Rewrite optimization
Before:
1. Start creating URL Rewrites
2. Get IDs of all inserted/updated products
3. Products exist? If no, end creating URL Rewrites
4. Load next product by ID
5. Load product attributes
6. Load categories for the product
7. Load categories attributes
8. Generate URLs for the product
9. Generate URLs for the categories
10. Generate URLs for the websites
11. Insert URLs for current product
12. Return to step 3
After:
1. Start creating URL Rewrites
2. Get array of products from bunch
3. Products exist? If no, go to step 10
4. Populate data for one product to object from array
5. Get categories from the cache
6. Generate URL for the product
7. Generate URLs for the categories
8. Generate URLs for the websites
9. Store URLs in temporary cache, then return to step 3
10. Multi-insert URLs from the cache
11. End creating URL Rewrites
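The key change in the "after" flow is that URLs are collected in memory and written once per bunch. A minimal sketch, assuming a writer callback in place of the real multi-insert query; `UrlRewriteBuffer` and the request/target path formats are illustrative.

```php
<?php
// Hypothetical buffer: collects URL rewrite rows and hands them over in one batch.
class UrlRewriteBuffer
{
    private $rows = [];
    private $writer;

    public function __construct(callable $writer)
    {
        $this->writer = $writer; // e.g. a function that runs one multi-insert query
    }

    public function add(string $requestPath, string $targetPath): void
    {
        $this->rows[] = ['request_path' => $requestPath, 'target_path' => $targetPath];
    }

    public function flush(): int
    {
        $count = count($this->rows);
        if ($count > 0) {
            ($this->writer)($this->rows);
            $this->rows = [];
        }
        return $count;
    }
}

$writes = [];
$buffer = new UrlRewriteBuffer(function (array $rows) use (&$writes): void {
    $writes[] = $rows; // stand-in for a single multi-insert query
});

// Two illustrative products taken from the bunch array, not loaded from the DB.
$bunch = [
    ['id' => 1, 'url_key' => 'simple00urlkey', 'categories' => ['section3/s3category4']],
    ['id' => 2, 'url_key' => 'simple01urlkey', 'categories' => ['section3/s3category4', 'section9/s9category2']],
];
foreach ($bunch as $product) {
    $target = 'catalog/product/view/id/' . $product['id'];
    $buffer->add($product['url_key'] . '.html', $target);
    foreach ($product['categories'] as $categoryPath) {
        $buffer->add($categoryPath . '/' . $product['url_key'] . '.html', $target);
    }
}
echo $buffer->flush(), ' rows in ', count($writes), ' write', PHP_EOL; // 5 rows in 1 write
```

Compared with inserting URLs per product, the number of write queries drops from one per product to one per bunch.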
29
How to compare performance?
First config
• CPU: 4 physical cores, 3.5 GHz (2 for VM)
• L2 cache: 1 MB
• L3 cache: 6 MB
• RAM: 16 GB
• SATA3 HDD (64 MB buffer)
Second config
• CPU: 2 physical cores with Hyper-Threading, 3.2 GHz (2 for VM)
• L2 cache: 512 KB
• L3 cache: 4 MB
• RAM: 8 GB
• SATA1 HDD (16 MB buffer)
Total time: ~50m
Total time: ~16.5m
30
PROJECT RESULTS
200 000 products
Start: 12:12:16
End: 12:48:05
Total: ~36 minutes
100 000 products
Start: 13:34:49
End: 13:51:08
Total: ~16.5 minutes
31
Magento 2 Merchant Beta Release
We are tremendously excited to announce that today we reached another significant development milestone with the release of the Magento 2 Merchant Beta. This release brings us to the last stage before the general availability (GA) of Magento 2 in Q4 2015.
…
• The Enterprise Edition module includes updates to merchant features like import/export functionality, configurable swatches, transactional emails and more.
• It demonstrates significant performance improvements for both the Magento Community Edition and Enterprise Edition with holistic updates to both server-side and client-side architecture. Server-side updates include out of box Varnish 4, full page caching, and support for HHVM3.6. Client-side updates include static content caching in browser, image compression, use of jQuery, and RequireJS for better management of JavaScript and bundling to reduce file download counts.
News link: http://magento.com/blog/technical/magento-2-merchant-beta-release
Our changes go into the release!
32
Performance challenges and victories we got at open source ecommerce
PROJECT OVERVIEW BY ULADZIMIR KALASHNIKAU