![Page 1: Big Data: Big Challenges for Computer Science Henri Bal Vrije Universiteit Amsterdam](https://reader035.vdocuments.site/reader035/viewer/2022062404/551c5bfb550346b1458b52ac/html5/thumbnails/1.jpg)
Big Data: Big Challenges for Computer Science
Henri Bal, Vrije Universiteit Amsterdam
Multiple types of data explosions
● High-volume data
  ● 10-100× global internet traffic per year (by 2018)
● Complex data
Graphics Processing Units (GPUs)
Differences between CPUs and GPUs
● CPU: minimize the latency of one activity (thread)
  ● Must be good at everything
  ● Big on-chip caches
  ● Sophisticated control logic
● GPU: maximize the throughput of all threads using large-scale parallelism
[Figure: chip layout diagram with Control, ALU, and Cache blocks]
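The throughput-oriented GPU model can be illustrated with a small CPU-side emulation. This is a hypothetical sketch, not real GPU code: `saxpy_kernel`, `launch`, and the `block_idx`/`block_dim`/`thread_idx` parameters only mimic the CUDA convention in which each logical thread computes one output element from its global index. On a real GPU the nested loop in `launch` would be replaced by hardware running many threads at the same time.

```python
# Hypothetical CPU-side emulation of the GPU "one thread per element" model.
# On a real GPU the logical threads run concurrently; here they run in a loop.


def saxpy_kernel(block_idx, block_dim, thread_idx, a, x, y, out):
    """Body executed by one logical thread: computes a single output element."""
    i = block_idx * block_dim + thread_idx  # global thread index
    if i < len(x):  # guard: the last block may contain threads with no element
        out[i] = a * x[i] + y[i]


def launch(kernel, grid_dim, block_dim, *args):
    """Run the kernel once for every (block, thread) pair in the grid."""
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            kernel(block_idx, block_dim, thread_idx, *args)


def saxpy(a, x, y, block_dim=256):
    """Compute a*x + y element-wise by launching one logical thread per element."""
    out = [0.0] * len(x)
    grid_dim = (len(x) + block_dim - 1) // block_dim  # round up to whole blocks
    launch(saxpy_kernel, grid_dim, block_dim, a, x, y, out)
    return out
```

For example, `saxpy(2.0, [1.0, 2.0, 3.0], [10.0, 20.0, 30.0])` returns `[12.0, 24.0, 36.0]`. The bounds check in the kernel matters because the grid is rounded up to whole blocks, so the last block usually contains idle threads.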
Example: NVIDIA Maxwell
● 16 independent streaming multiprocessors
● 2048 compute cores
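Dividing the two figures on the slide gives the number of compute cores each independent streaming multiprocessor (SM) contains:

```latex
\frac{2048\ \text{cores}}{16\ \text{SMs}} = 128\ \text{cores per SM}
```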
Ongoing GPU work at VU
● Applications
  ● Multimedia data
  ● Digital forensics data
  ● Climate modelling
  ● Radio astronomy data
● Methodologies
  ● Hadoop on accelerators
  ● Programming methods for accelerators
● Teaching GPUs (with UvA)
● National ICT research infrastructure
COMMIT/
Complex data
● Still smaller in volume than, e.g., astronomy data
● Much more complicated, semantically rich data
● Growing fast …
Semantic web
● Make the Web smarter by injecting meaning so that machines can reason about it
● Initial idea by Tim Berners-Lee in 2001
● Has now attracted the interest of big IT companies
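To make "machines can reason about it" concrete, here is a minimal Python sketch of forward-chaining reasoning over RDF-style triples. It is an illustration under stated assumptions, not a real reasoner: triples are plain string tuples, the `ex:` names are hypothetical, and only two RDFS entailment rules are implemented (rdfs9: instances inherit superclasses; rdfs11: `rdfs:subClassOf` is transitive). The rules are applied until a fixpoint is reached, which is what "materialization" means.

```python
# Minimal sketch: materializing two RDFS rules over (subject, predicate, object) tuples.
# The vocabulary strings mirror RDF/RDFS names; the data model is a simplification.

TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"


def apply_rules(triples):
    """Return all triples derivable from `triples` in one rule application step."""
    derived = set()
    subclass = [(s, o) for (s, p, o) in triples if p == SUBCLASS]
    for (s, p, o) in triples:
        for (c, d) in subclass:
            # rdfs11: C subClassOf D, D subClassOf E  =>  C subClassOf E
            if p == SUBCLASS and o == c:
                derived.add((s, SUBCLASS, d))
            # rdfs9: x type C, C subClassOf D  =>  x type D
            if p == TYPE and o == c:
                derived.add((s, TYPE, d))
    return derived


def materialize(triples):
    """Apply the rules repeatedly until no new triple appears (a fixpoint)."""
    closure = set(triples)
    while True:
        new = apply_rules(closure) - closure
        if not new:
            return closure
        closure |= new
```

Starting from `ex:Amsterdam rdf:type ex:City`, `ex:City rdfs:subClassOf ex:Settlement`, and `ex:Settlement rdfs:subClassOf ex:Place`, the closure adds three implicit facts, including `ex:Amsterdam rdf:type ex:Place`. This naive version compares every pair of triples in every round, which is exactly what becomes infeasible at Web scale.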
WebPIE: a Web-scale Parallel Inference Engine
● Web-scale parallel reasoner doing full materialization (deriving all implicit statements up front)
● Orders of magnitude faster than previous work by using smart parallel algorithms
● Jacopo Urbani + Frank van Harmelen (VU)
● Urbani's PhD thesis received a Christiaan Huygens nomination
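One generic way to parallelize rule application is a MapReduce-style join: the map step keys each triple on the term the rule joins on, and the reduce step combines all triples that share a key. The Python sketch below does this for the RDFS rule rdfs9 (x type C, C subClassOf D gives x type D). It is a hypothetical illustration of the technique, not WebPIE's actual algorithm; the function names and `ex:` identifiers are assumptions, and the "cluster" is simulated sequentially.

```python
from collections import defaultdict

# Hypothetical sketch of one MapReduce-style job applying rdfs9.
TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"


def map_phase(triple):
    """Emit (join key, tagged value) pairs; the join key is the class C."""
    s, p, o = triple
    if p == TYPE:
        yield o, ("instance", s)  # x type C        -> key C
    elif p == SUBCLASS:
        yield s, ("superclass", o)  # C subClassOf D -> key C


def reduce_phase(key, values):
    """Join every instance of class `key` with every superclass of `key`."""
    instances = [v for tag, v in values if tag == "instance"]
    supers = [v for tag, v in values if tag == "superclass"]
    for x in instances:
        for d in supers:
            yield (x, TYPE, d)


def run_job(triples):
    """Simulate one job: map, shuffle (group by key), then reduce each group."""
    groups = defaultdict(list)
    for t in triples:
        for key, value in map_phase(t):
            groups[key].append(value)
    # On a cluster each key group could go to a different reducer and be
    # processed in parallel; here the groups are processed one after another.
    derived = set()
    for key, values in groups.items():
        derived.update(reduce_phase(key, values))
    return derived
```

Because each reducer only sees triples that share one class, no node needs the whole data set. A full materialization would run such jobs for every rule and repeat them until no new triples appear.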
Reasoning on changing data
● WebPIE must recompute everything if the data changes
  ● Takes on the order of 1 day on a 64-node compute cluster
● Challenge: real-time incremental reasoning, combining new (streaming) data and historic data
  ● Nanopublications (http://nanopub.org)
  ● Handling 2 million news articles per day (Piek Vossen, VU)
  ● Data streams from (health) sensors and smartphones
● Exploit massively parallel computing and GPUs
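At 2 million news articles per day, a system must absorb on average about 23 articles per second (2,000,000 / 86,400 s), so re-running a day-long full materialization after each update is not an option. Incremental reasoning relies on one observation: if the existing data is already fully materialized, only inferences that use at least one new triple can be new. The Python sketch below applies that idea (semi-naive evaluation) to the RDFS rules rdfs9 and rdfs11, which share the shape (a p b), (b subClassOf c) gives (a p c). It is a hypothetical illustration; `derive`, `add_incrementally`, and the `ex:` names are assumptions, and it handles insertions only.

```python
# Hypothetical sketch of incremental (semi-naive) materialization for insertions.
TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"


def derive(delta, closure):
    """One-step consequences of rdfs9/rdfs11 where at least one premise is in `delta`.

    Both rules have the form (a p b), (b subClassOf c) => (a p c), with p being
    rdf:type (rdfs9) or rdfs:subClassOf (rdfs11).
    """
    out = set()
    for (s, p, o) in delta:
        for (s2, p2, o2) in closure:
            # New triple as left premise (a p b), old or new (b subClassOf c).
            if p in (TYPE, SUBCLASS) and p2 == SUBCLASS and s2 == o:
                out.add((s, p, o2))
            # New triple as right premise (b subClassOf c), old or new (a p b).
            if p == SUBCLASS and p2 in (TYPE, SUBCLASS) and o2 == s:
                out.add((s2, p2, o))
    return out


def add_incrementally(closure, new_triples):
    """Extend an already materialized `closure` with `new_triples`."""
    closure = set(closure)  # copy so the caller's set is not modified
    delta = set(new_triples) - closure
    while delta:
        closure |= delta
        # Only pairs involving the latest delta are examined in each round.
        delta = derive(delta, closure) - closure
    return closure
```

Each round costs time proportional to the size of the new data times the size of the closure, rather than the square of the whole closure. Deletions are harder, since a removed triple may invalidate earlier inferences; they need extra bookkeeping such as the Delete-and-Rederive (DRed) approach.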
Other work on complex data
● Use the semantic web to describe and reason about computer infrastructure (Cees de Laat, UvA)
● Machine learning using GPUs (Hadoop)
  ● Joint work with Max Welling (UvA)
● Business applications
  ● With Frans Feldberg (VU, Economy)
Discussion
● We can process petascale (10¹⁵, LHC) simple data with cluster and grid technology
● Exascale (10¹⁸, SKA) may be feasible with GPUs, but requires new parallel programming methodologies
● Processing complex data is vastly more complicated, even at smaller scales
● Complex data is also escalating in size
● Dynamic (streaming) data will be next
● Processing exascale dynamic complex data?