Problem description
In telecommunications, a CDR (Call Detail Record, also Call Data Record) is the detailed data record of a single call, a single message sent, or a single mobile internet data session. A CDR typically includes the originating number (calling), the receiving number (called), the time the call started (start_time), the length of the call (duration), and so on. CDRs can also be extended to cover other record types, such as detailed call charging data and detailed records of changes to a subscriber's status, tariff plan, or account (card top-ups, credit additions, account closing/opening).
CDRs are emitted by switches or other telecom systems, usually in binary form, and are then typically converted to txt format for storage. These txt files are called CDR files. Each CDR in a txt file consists of fields separated by a delimiter character. CDR files are usually kept only for a short period in relational databases, for lookup when an incident or a customer complaint occurs.
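The delimited layout described above can be sketched as a small parser. This is only an illustration: the text does not state which delimiter is used, the field order, or the timestamp format, so the `|` separator, the four-field order, the `%Y-%m-%d %H:%M:%S` format, and the names `CDR` and `parse_cdr` are all assumptions.

```python
from dataclasses import dataclass
from datetime import datetime

DELIMITER = "|"  # assumption: the source does not name the separator
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumption: timestamp layout is not specified
FIELD_COUNT = 4  # only the four fields the text names; real CDRs carry more


@dataclass
class CDR:
    calling: str          # number that originated the call
    called: str           # number that received the call
    start_time: datetime  # when the call started
    duration: int         # call length, assumed to be in seconds


def parse_cdr(line: str) -> CDR:
    """Split one text line into a CDR; any extra trailing fields are ignored."""
    parts = line.rstrip("\n").split(DELIMITER)
    if len(parts) < FIELD_COUNT:
        raise ValueError(f"expected at least {FIELD_COUNT} fields, got {len(parts)}")
    calling, called, start_time, duration = parts[:FIELD_COUNT]
    return CDR(
        calling=calling,
        called=called,
        start_time=datetime.strptime(start_time, TIME_FORMAT),
        duration=int(duration),
    )


# Hypothetical sample line, not taken from any real CDR file.
record = parse_cdr("84981234567|84979876543|2015-03-01 08:15:30|125")
print(record.called, record.duration)
```

A fixed field order keeps the parser trivial; a production loader would more likely read the layout from a per-source schema, since each switch or system may emit a different set of fields.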
Once the power of analytics was applied to telecommunications, a range of new products emerged, such as Revenue Assurance (RA), Fraud Management (FM), Business Intelligence (BI), and Customer Relationship Management (CRM). By aggregating and analyzing the information in CDR files, these products deliver substantial value to telecom operators, for example:
Counting total transactions (calls, messages, and so on) per number prefix, or total charges billed per subscriber, and comparing the results against actual figures can reveal gaps that cause revenue leakage in data processing and billing systems. This is used in revenue assurance systems.
Counting the number of account operations per day, or the total amount topped up per account, can reveal operations that exceed permitted limits. This is used in fraud detection systems.
Measuring total subscriber usage by hour of day, day of month, or geographic region while a promotion is running makes it possible to assess how customer usage is distributed and how effective the promotion is, and then to adjust it accordingly. This is used in business intelligence and campaign management systems.
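The three statistics above can be sketched on a handful of in-memory records. Everything specific here is an assumption made for illustration: the record layout `(calling, event_type, start_time, amount)`, the sample values, the four-digit prefix length, and the top-up threshold are hypothetical and not taken from any operator's system.

```python
from collections import Counter, defaultdict
from datetime import datetime

# Hypothetical records: (calling, event_type, start_time, amount).
# amount is seconds for calls and currency units for top-ups.
RECORDS = [
    ("84981111111", "call", "2015-03-01 08:05:00", 60),
    ("84981111111", "call", "2015-03-01 08:40:00", 120),
    ("84972222222", "sms", "2015-03-01 09:10:00", 0),
    ("84972222222", "topup", "2015-03-01 09:15:00", 50000),
    ("84972222222", "topup", "2015-03-01 10:00:00", 50000),
    ("84972222222", "topup", "2015-03-01 11:30:00", 50000),
]

PREFIX_LEN = 4          # assumed length of a number prefix
MAX_TOPUPS_PER_DAY = 2  # assumed limit on daily top-ups per account


def parse_time(text):
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S")


def transactions_per_prefix(records):
    """Revenue assurance: count calls and messages per number prefix."""
    return Counter(
        calling[:PREFIX_LEN]
        for calling, kind, _, _ in records
        if kind in ("call", "sms")
    )


def suspicious_topups(records):
    """Fraud detection: accounts whose daily top-up count exceeds the limit."""
    per_day = Counter()
    for calling, kind, start, _ in records:
        if kind == "topup":
            per_day[(calling, parse_time(start).date())] += 1
    return {key: n for key, n in per_day.items() if n > MAX_TOPUPS_PER_DAY}


def call_seconds_per_hour(records):
    """Business intelligence: total call seconds for each hour of the day."""
    usage = defaultdict(int)
    for _, kind, start, amount in records:
        if kind == "call":
            usage[parse_time(start).hour] += amount
    return dict(usage)


print(transactions_per_prefix(RECORDS))
print(suspicious_topups(RECORDS))
print(call_seconds_per_hour(RECORDS))
```

Each function is a single pass over the records that groups by a key and accumulates, which is why this kind of workload maps naturally onto distributed group-and-aggregate frameworks once the data no longer fits on one machine.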
Although the data in CDR files is very valuable, storing and processing it is far from simple because of its sheer size. Take Viettel as an example: the average CDR is 450 bytes (20 fields, ASCII encoded), there are about 60 million subscribers, and each subscriber generates about 20 CDRs per day on average. That means roughly 500 GB of data must be stored and processed every day (450 bytes × 20 CDRs × 60 million subscribers ≈ 540 GB), a considerable amount, especially for systems with limited hardware. Processing this much data raises the problem of high investment costs for computing infrastructure and for operational resources. Even after investing in high-end servers to handle large data, scaling out as the business grows remains very hard. Security, availability, and confidentiality are major concerns as well. As these products evolve, the number of case studies involving data aggregation and analysis keeps growing, so data processing requirements increase not only in number but also in variety. On top of that, as the operator's customer base keeps expanding, the volume of CDR data to process grows sharply with it, making the data processing problem ever more complex.
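The daily volume estimate can be checked with a few lines of arithmetic. The three input figures come from the text; the constant names and the choice to report both decimal gigabytes and binary gibibytes are assumptions made for this sketch.

```python
# Figures quoted in the text for Viettel.
CDR_SIZE_BYTES = 450
SUBSCRIBERS = 60_000_000
CDRS_PER_SUBSCRIBER_PER_DAY = 20

daily_bytes = CDR_SIZE_BYTES * SUBSCRIBERS * CDRS_PER_SUBSCRIBER_PER_DAY

daily_gb = daily_bytes / 10**9   # decimal gigabytes
daily_gib = daily_bytes / 2**30  # binary gibibytes

print(f"{daily_bytes:,} bytes/day")
print(f"{daily_gb:.0f} GB/day, {daily_gib:.0f} GiB/day")
# 540 GB/day, 503 GiB/day
```

The "about 500" figure matches the binary reading (about 503 GiB); in decimal units the same volume is 540 GB. Either way this is raw, uncompressed text before any replication.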
A common solution that units adopt for their products is Hadoop, together with the subprojects that let each unit use Hadoop for its own needs, such as Hive, Pig, and Spark. Built on a distributed file system (HDFS, the Hadoop Distributed File System) and the MapReduce algorithm, Hadoop can in principle meet most of the units' large-scale data processing needs. In practice, however, the way data processing is currently operated and deployed across the units shows many limitations:
For the servers that store the source data (the switches or telecom systems that hold the CDR files):
Overload (when many clients use FTP to fetch data).
FTP connections from many units are hard to manage.
Weak security, making the servers easy to attack.
A high risk of data leakage.
For the units that exploit and process the data: even when the data type and the processing method are identical, each unit has to build its own processing pipeline. That means requesting a connection to the source data server, building and installing the processes that fetch and preprocess the data, designing and installing a Hadoop server cluster, then loading the data into Hadoop and processing it there. From the telecom operator's point of view, this results in:
Wasted manpower and wasted effort on development, deployment, and maintenance.
Wasted processing and storage resources.
Reduced specialization (besides their unit's core business work, staff must take on many other tasks).
In theory, data already analyzed by one unit can be reused by another. In practice, because each unit processes data separately, processing can fall out of sync across units, leading to inconsistent data between them.
When several units share one module (a preprocessing module or a data processing module) supplied by another unit, rolling out an upgraded version to every unit that uses it is difficult and labor-intensive.
At any given time, one unit may be short of resources while another has a surplus, yet the spare capacity is not put to use. Instead, more resources are bought for the unit that requests them.
Given these problems, a telecom operator can consider a centralized data storage and processing model shared by all units. Combining Cloud Computing and Big Data Processing can make that idea a reality.