performance comparison between apache hive and oracle sql for big data
TRANSCRIPT
![Page 1: Performance Comparison between Apache Hive and Oracle SQL for Big Data](https://reader035.vdocuments.site/reader035/viewer/2022062306/587e5c041a28ab2a538b4c95/html5/thumbnails/1.jpg)
Performance Comparison Between Apache Hive and Oracle SQL for Big Data
AnalysisPresented by:- Santosh Kumar Dash |M.Tech CSE, Utkal University
ASE| Tata Consultancy Services Limited
8th International Conference on Soft Computing and Pattern Recognition (SoCPaR 2016)
![Page 2: Performance Comparison between Apache Hive and Oracle SQL for Big Data](https://reader035.vdocuments.site/reader035/viewer/2022062306/587e5c041a28ab2a538b4c95/html5/thumbnails/2.jpg)
Contents• Introduction• Motivation and Objective• Proposed Methodology• Data Set Description• Experimental Results• Conclusion and Future Scope• References
![Page 3: Performance Comparison between Apache Hive and Oracle SQL for Big Data](https://reader035.vdocuments.site/reader035/viewer/2022062306/587e5c041a28ab2a538b4c95/html5/thumbnails/3.jpg)
Introduction
![Page 4: Performance Comparison between Apache Hive and Oracle SQL for Big Data](https://reader035.vdocuments.site/reader035/viewer/2022062306/587e5c041a28ab2a538b4c95/html5/thumbnails/4.jpg)
What is Bigdata?
![Page 5: Performance Comparison between Apache Hive and Oracle SQL for Big Data](https://reader035.vdocuments.site/reader035/viewer/2022062306/587e5c041a28ab2a538b4c95/html5/thumbnails/5.jpg)
Bigdata Applications
![Page 6: Performance Comparison between Apache Hive and Oracle SQL for Big Data](https://reader035.vdocuments.site/reader035/viewer/2022062306/587e5c041a28ab2a538b4c95/html5/thumbnails/6.jpg)
What is ORACLE SQL?
![Page 7: Performance Comparison between Apache Hive and Oracle SQL for Big Data](https://reader035.vdocuments.site/reader035/viewer/2022062306/587e5c041a28ab2a538b4c95/html5/thumbnails/7.jpg)
What is Apache Hive?
![Page 8: Performance Comparison between Apache Hive and Oracle SQL for Big Data](https://reader035.vdocuments.site/reader035/viewer/2022062306/587e5c041a28ab2a538b4c95/html5/thumbnails/8.jpg)
Motivation• Many researchers have tried to perform Big Data
Analytics using traditional methods which resulted in poor performances due to memory constraints. • Hence, we are motivated to explore the suitability of
Apache Hive as a distributed database for faster retrieval in comparison to popular Oracle SQL approach.
![Page 9: Performance Comparison between Apache Hive and Oracle SQL for Big Data](https://reader035.vdocuments.site/reader035/viewer/2022062306/587e5c041a28ab2a538b4c95/html5/thumbnails/9.jpg)
Objective• Performance Analysis of Oracle SQL w.r.t. time• Performance Analysis of Apache Hive w.r.t. time.• Performance Analysis of Mean processing time between
Apache Hive and Oracle SQL.
![Page 10: Performance Comparison between Apache Hive and Oracle SQL for Big Data](https://reader035.vdocuments.site/reader035/viewer/2022062306/587e5c041a28ab2a538b4c95/html5/thumbnails/10.jpg)
Proposed MethodologyThe diagram represents the Proposed Model for the experiment.
![Page 11: Performance Comparison between Apache Hive and Oracle SQL for Big Data](https://reader035.vdocuments.site/reader035/viewer/2022062306/587e5c041a28ab2a538b4c95/html5/thumbnails/11.jpg)
Data Set Description• Online Video
Characteristics and Transcoding Time • Record Linkage
Comparison Patterns • 3D Road Network (North
Jutland, Denmark)• Rate (Health-Insurance-
Market place)
Data Set Name Rows Columns
Video 168286 11
Record 5749132 12
Road 434874 4
Rate 13,000000 23
![Page 12: Performance Comparison between Apache Hive and Oracle SQL for Big Data](https://reader035.vdocuments.site/reader035/viewer/2022062306/587e5c041a28ab2a538b4c95/html5/thumbnails/12.jpg)
Column Names of DatasetsVideo:- YouTube video id, duration, bitrate (total in Kbits), bitrate(video bitrate in Kbits), height(in pixels), width(in pixels), frame rate, estimated framerate, codec, category, and direct video link.
Record:- id_1, id_2, cmp_fname_c1, cmp_fname_c2,cmolname_c1, cmp_lname_c2, cmp_sex, cmp_bd, cmp_bm, cmp_by, cmp_plz and is_match
![Page 13: Performance Comparison between Apache Hive and Oracle SQL for Big Data](https://reader035.vdocuments.site/reader035/viewer/2022062306/587e5c041a28ab2a538b4c95/html5/thumbnails/13.jpg)
Column Names of DatasetsRoad:- OSM_ID, LONGITUDE, LATITUDE, and ALTITUDE
Rate:- BusinessYear, StateCode, IssuerId,Source_Name,Version_Num,ImportDat,Issuer_Id2,FederalTIN,RateEffectiveDate,RateExpiraon Date, PlanId , Rating AreaId, Tobacco, Age, Individual Rate, Individual Tobacco Rate, Couple, Primary Subscriber And One Dependent, Primary Subscriber And Two Dependents, Primary Subscriber And Three Or More Dependents, Couple And One Dependent, Couple And Two Dependents, couple and Three Or More Dependents, and RowNumber
![Page 14: Performance Comparison between Apache Hive and Oracle SQL for Big Data](https://reader035.vdocuments.site/reader035/viewer/2022062306/587e5c041a28ab2a538b4c95/html5/thumbnails/14.jpg)
Experimental Results
![Page 15: Performance Comparison between Apache Hive and Oracle SQL for Big Data](https://reader035.vdocuments.site/reader035/viewer/2022062306/587e5c041a28ab2a538b4c95/html5/thumbnails/15.jpg)
Queries Description StatementsQuery 1 Retrieving Unique column using
DISTINCTRetrieving unique Output
RecordsQuery 2 Retrieving Records from a given
dataset using ORDER BY for general Sorting
Sorting
Query 3 Retrieving Records Using ORDER BY and DESC for Backward Sorting
Sorting Backward
Query 4 Using COUNT and GROUP BY for Retrieving records and their count.
Grouping with Counting
Query 5 Using MAX aggregate function for retrieving MAXIMUM value from a record
Maximum Value
![Page 16: Performance Comparison between Apache Hive and Oracle SQL for Big Data](https://reader035.vdocuments.site/reader035/viewer/2022062306/587e5c041a28ab2a538b4c95/html5/thumbnails/16.jpg)
Performance Comparison of Video dataset
Query 1 Query 2 Query 3 Query 4 Query 50
10
20
30
40
50
60
70
80
Hive Oracle SQL
![Page 17: Performance Comparison between Apache Hive and Oracle SQL for Big Data](https://reader035.vdocuments.site/reader035/viewer/2022062306/587e5c041a28ab2a538b4c95/html5/thumbnails/17.jpg)
Performance Comparison of Record dataset
Query 1 Query 2 Query 3 Query 4 Query 50
10
20
30
40
50
60
70
Hive Oracle SQL
![Page 18: Performance Comparison between Apache Hive and Oracle SQL for Big Data](https://reader035.vdocuments.site/reader035/viewer/2022062306/587e5c041a28ab2a538b4c95/html5/thumbnails/18.jpg)
Performance Comparison of Road dataset
Query 1 Query 2 Query 3 Query 4 Query 50
5
10
15
20
25
30
Hive Oracle SQL
![Page 19: Performance Comparison between Apache Hive and Oracle SQL for Big Data](https://reader035.vdocuments.site/reader035/viewer/2022062306/587e5c041a28ab2a538b4c95/html5/thumbnails/19.jpg)
Performance Comparison of Road dataset
Query 1 Query 2 Query 3 Query 4 Query 50
200
400
600
800
1000
1200
1400
1600
Hive Oracle SQL
![Page 20: Performance Comparison between Apache Hive and Oracle SQL for Big Data](https://reader035.vdocuments.site/reader035/viewer/2022062306/587e5c041a28ab2a538b4c95/html5/thumbnails/20.jpg)
Mean Processing Time of All Datasets
Video Road Record Rate0
100
200
300
400
500
600
700
800
900
Hive Oracle SQL
![Page 21: Performance Comparison between Apache Hive and Oracle SQL for Big Data](https://reader035.vdocuments.site/reader035/viewer/2022062306/587e5c041a28ab2a538b4c95/html5/thumbnails/21.jpg)
Conclusion APACHE HIVE
• In large data sets, Apache hive is very efficient.• Queries with Group By,
Order By and aggregate function are taking more time as compared to a retrieving entire dataset.• average time is more if the
number of rows is less
ORACLE SQL
• In Small scale datasets, Oracle SQL performs better. • Queries with Group By,
Order By and aggregate function are taking less time as compared to a retrieving entire dataset.• average time is less if the
number of rows is less
![Page 22: Performance Comparison between Apache Hive and Oracle SQL for Big Data](https://reader035.vdocuments.site/reader035/viewer/2022062306/587e5c041a28ab2a538b4c95/html5/thumbnails/22.jpg)
Future work• In future, we will take large-scale (In TB) datasets and
do analysis on both Apache Hive and Oracle SQL for the performance test.
![Page 23: Performance Comparison between Apache Hive and Oracle SQL for Big Data](https://reader035.vdocuments.site/reader035/viewer/2022062306/587e5c041a28ab2a538b4c95/html5/thumbnails/23.jpg)
References1. Chawda RK , Big Data and Advanced Analytics
Tools.Symposium on Colossal Data Analysis and Networking (CDAN).(2016)
2. Garg V, Optimization of Multiple Queries for Big Data with Apache Hadoop/Hive. 2015 Int Conf Comput Intell Commun Networks, pp. 938–941.(2015)
3. Gruenheid A, Omiecinski E, Mark L Query Optimization Using Column Statistics in Hive Categories and Subject Descriptors.(2016)
4. Haryono GP, Zhou Y (2016) Profiling apache HIVE query from runtime logs. Int Conf Big Data Smart Comput BigComp pp. 61–68.(2016)
![Page 24: Performance Comparison between Apache Hive and Oracle SQL for Big Data](https://reader035.vdocuments.site/reader035/viewer/2022062306/587e5c041a28ab2a538b4c95/html5/thumbnails/24.jpg)
References5. Kaisler S, Armour F, Espinosa JA, Money W, Big Data:
Issues and Challenges Moving Forward. 2013 46th Hawaii Int Conf Syst Sci.pp. 995–1004. (2013)
6. Rotsnarani Sethy, Mrutyunjaya Panda, Big Data Analysis Using Hadoop: A survey. IJARCSSE pp.1153–1157.(2015)
7. Thusoo A, Sarma J Sen, Jain N, Shao Z, Chakka P, Zhang N, Antony S, Liu H, Murthy R ,Hive - A petabyte scale data warehouse using Hadoop. In: Proc. - Int. Conf. Data Eng. pp 996–1005.(2010)
8. Loshin D, Chapter 7 - Big Data Tools and Techniques. pp 61–72 (2013)
![Page 25: Performance Comparison between Apache Hive and Oracle SQL for Big Data](https://reader035.vdocuments.site/reader035/viewer/2022062306/587e5c041a28ab2a538b4c95/html5/thumbnails/25.jpg)
References9. Hive Architecture.
https://cwiki.apache.org/confluence/display/Hive/Design.
10.Introduction to Oracle Database. https://docs.oracle.com/database/121/CNCPT/intro.htm#CNCPT001.
11.Online Video Characteristics and Transcoding Time Dataset Data Set. https://archive.ics.uci.edu/ml/datasets.html.(2015)
12.Record Linkage Comparison Patterns Data Set. https://archive.ics.uci.edu/ml/datasets.html.(2011)
![Page 26: Performance Comparison between Apache Hive and Oracle SQL for Big Data](https://reader035.vdocuments.site/reader035/viewer/2022062306/587e5c041a28ab2a538b4c95/html5/thumbnails/26.jpg)
References13.3D Road Network (North Jutland, Denmark) Data Set.
https://archive.ics.uci.edu/ml/datasets.html.(2013) 14.Rate Data Set https://www.kaggle.com/hhsgov/health-
insurance-marketplace.(2015)15.5v’s of Big data http://bigdata.black/featured/what-is-
big-data/16.Big data applications http://image.slidesharecdn.com/
![Page 27: Performance Comparison between Apache Hive and Oracle SQL for Big Data](https://reader035.vdocuments.site/reader035/viewer/2022062306/587e5c041a28ab2a538b4c95/html5/thumbnails/27.jpg)