Quantum Natural Language Processing



Post on 10-Oct-2020


Intel® QS Scaling

References

1. William Zeng and Bob Coecke, “Quantum Algorithms for Compositional Natural Language Processing”, EPTCS 221, 2016.

2. Joachim Lambek, “From word to sentence”, Polimetrica, Milan, 2008.

3. Nathan Wiebe, Ashish Kapoor, and Krysta M. Svore, “Quantum Algorithms for Nearest-Neighbor Methods for Supervised and Unsupervised Learning”, Quantum Information and Computation, 2014.

4. Mikhail Smelyanskiy, Nicolas P. D. Sawaya, Alán Aspuru-Guzik, “qHiPSTER: The Quantum High Performance Software Testing Environment”, arXiv:1601.07195, 2016.

Intel® Quantum Simulator

Intel® Quantum Simulator [4]
– Quantum High Performance Software Testing Environment.
– Distributed high-performance implementation of a quantum simulator on a classical computer.
– Out-of-the-box simulation of general single-qubit and two-qubit (controlled) gates.
  • Rotation, Hadamard, Pauli, Square Root of Pauli, Toffoli, SWAP, Square Root of SWAP.
– Single- and double-precision for qubit registers.

Implementation
– Single-node and multi-node implementations of qubit gate operations.
– The multi-node implementation distributes the state vector across nodes so that each part fits within per-node memory, with memory usage optimised to improve communication.

Optimisations
– Vectorisation, multi-threading, gate fusion.
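As a rough illustration of what state-vector simulation of a single-qubit gate involves, the sketch below applies a 2x2 gate to one qubit of a small register in plain Python. It is not the Intel® Quantum Simulator API: the function name `apply_single_qubit_gate`, the list-based state vector and the convention that qubit 0 is the least significant bit are all assumptions made for this example.

```python
import math

def apply_single_qubit_gate(state, gate, target):
    """Apply a 2x2 `gate` to qubit `target` of `state` (a list of 2**n amplitudes).

    Assumed convention for this sketch: qubit 0 is the least significant bit
    of the basis-state index.
    """
    new_state = list(state)
    stride = 1 << target
    for base in range(len(state)):
        if base & stride:
            continue  # each amplitude pair is handled once, from its |0> member
        a0, a1 = state[base], state[base | stride]
        new_state[base] = gate[0][0] * a0 + gate[0][1] * a1
        new_state[base | stride] = gate[1][0] * a0 + gate[1][1] * a1
    return new_state

S = 1 / math.sqrt(2)
H = [[S, S], [S, -S]]   # Hadamard gate
X = [[0, 1], [1, 0]]    # Pauli-X gate

# Start in |00> and put qubit 0 into an even superposition.
state = apply_single_qubit_gate([1 + 0j, 0j, 0j, 0j], H, 0)
probabilities = [abs(a) ** 2 for a in state]
```

Every gate touches all 2^n amplitudes, which is why distributing the state vector and fusing gates matter at larger qubit counts.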

Methodology

Quantum Natural Language Processing

Lee James O’Riordan¹, Myles Doyle¹, Venkatesh Kannan¹, Fabio Baruffa²

¹ Irish Centre for High-End Computing, Ireland
² Intel Deutschland GmbH

Background

Natural Language Processing
– Sentiment analysis, relationship extraction, word sense disambiguation, automatic summary generation.
– Traditional “bag of words” approach
  • Lacks information about grammatical rules of language.
  • Increase in problem complexity reduces quality of results.
– “Distributed compositional semantics” approach [1]
  • Grammatically informed algorithms compute sentence meanings.
  • Implementation requires large classical computational resources.

Quantum Computing
– Potential to offer dramatic speedup to algorithms which can exploit quantum parallelism.
– Requires quantum versions of classical algorithms.
– Application development
  • Limited by scale, reliability and coherence of quantum devices.
  • Software quantum simulators allow proof-of-concept applications.

Project Objective
– Implement quantum versions of distributed compositional semantics algorithms to analyse sentence meaning.
– Develop and evaluate the solution on the Intel® Quantum Simulator [4] deployed on the Irish national supercomputer “Kay”.

Partnership
– Irish Centre for High-End Computing & Intel® Corporation.
– Co-funded by Enterprise Ireland & Intel® Ireland.

Project Execution
– January 2019 to March 2020.

Distributed Compositional Semantics

Quantum Advantage

Contact Information

Distributional Model
– Algorithms based on “bag of words” approach.
– Meaning of words represented by frequencies of “nearby” words in a corpus.
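The distributional idea can be sketched as counting how often chosen basis words occur within a window around a target word. The corpus, the basis words, the window size and the function name `cooccurrence_vector` below are all made up for illustration and are not taken from the project.

```python
from collections import Counter

def cooccurrence_vector(tokens, target, basis, window=2):
    """Count basis words appearing within `window` positions of each `target` occurrence."""
    counts = Counter()
    for i, token in enumerate(tokens):
        if token != target:
            continue
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i and tokens[j] in basis:
                counts[tokens[j]] += 1
    return [counts[b] for b in basis]

# Toy corpus, invented for this example.
corpus = "john rests inside and mary walks outside while john walks inside".split()
basis = ["rests", "walks", "inside", "outside"]

john_vector = cooccurrence_vector(corpus, "john", basis)
mary_vector = cooccurrence_vector(corpus, "mary", basis)
```

Each word then has a vector of counts over the same basis, which is the kind of "meaning vector" the compositional step builds on.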

Compositional Model [2]
– Algorithms derive meaning of sentences or phrases from known meanings of component words.
– Embeds types of words and grammatical structure.

Unified DisCo Model [1]
– Combines both approaches to introduce grammatical form to the composition of word meanings.
– Allows computing the meanings of two sentences and deciding whether their meanings match.
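One simple way to decide whether two sentence meanings "match", once each sentence has been reduced to a vector in the same space, is the normalised inner product. This is a sketch under that assumption; the function names and the threshold value are hypothetical, and the poster does not state which similarity measure is used.

```python
import math

def overlap(u, v):
    """Normalised inner product of two real-valued meaning vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def meanings_match(u, v, threshold=0.9):
    """Hypothetical decision rule: meanings match if the overlap exceeds `threshold`."""
    return overlap(u, v) >= threshold

same = overlap([1.0, 0.0], [1.0, 0.0])        # identical directions
orthogonal = overlap([1.0, 0.0], [0.0, 1.0])  # unrelated directions
partial = overlap([1.0, 1.0], [1.0, 0.0])     # 45 degrees apart
```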


Irish Centre for High-End Computing (ICHEC)
– Dr. Lee James O’Riordan, [email protected]
– Mr. Myles Doyle, [email protected]
– Dr. Venkatesh Kannan, [email protected]

Intel Deutschland GmbH
– Dr. Fabio Baruffa, [email protected]

Classical vs. Quantum Storage Requirements

             1 transitive verb    10K transitive verbs
Classical    1 GB                 10 TB
Quantum      33 qubits            47 qubits
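The qubit counts follow from the fact that n qubits span 2^n basis states, so N tensor components need about ceil(log2 N) qubits. The sketch below reproduces the table under one illustrative assumption that is not stated on the poster: a noun space of dimension 2000, so that one transitive verb has 2000^3 = 8 x 10^9 components.

```python
def qubits_needed(num_components: int) -> int:
    """Smallest n with 2**n >= num_components, i.e. ceil(log2(num_components))."""
    if num_components < 1:
        raise ValueError("need at least one component")
    return (num_components - 1).bit_length()

NOUN_DIM = 2000                       # assumed dimension, chosen for illustration
one_verb = NOUN_DIM ** 3              # components of one transitive verb tensor
ten_k_verbs = 10_000 * one_verb       # components of 10K such verbs

qubits_one_verb = qubits_needed(one_verb)
qubits_ten_k_verbs = qubits_needed(ten_k_verbs)
```

Multiplying the classical storage by 10,000 adds only 14 qubits, since log2(10,000) is roughly 13.3.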

[Figure: DisCo model diagram. Labels: Monoidal categories (type specification and reduction; meaning representation and composition); Meaning vector spaces; Pregroup for types; Language; objects; meaning; axioms; type logic.]

Problem Mapping
1. Represent category theoretic data structures and NLP operations using Dirac notation.
2. Define quantum versions of DisCo algorithms from literature using Dirac notation.
3. Map elements from Dirac notation to quantum circuit notation (gates/registers/circuits).
4. Map elements from quantum circuit notation to the Intel® Quantum Simulator paradigm.

Results: Comparing Sentences

Example: DisCo model sentence meaning, with Mary (subject), likes (verb) and John (object).

[Figure: the example shown in category theoretic graphical notation, quantum Dirac representation, and quantum circuit notation.]

Quantum Dirac representation:

$$|\text{likes}\rangle = \sum_{i,j,k} c_{i,j,k}\,|i\rangle|j\rangle|k\rangle$$

$$|\text{Mary}\rangle \otimes \left(\sum_{i,j,k} c_{i,j,k}\,|i\rangle|j\rangle|k\rangle\right) \otimes |\text{John}\rangle \;\rightarrow\; \sum_{i,j,k} c_{i,j,k}\,\langle\text{Mary}|i\rangle\,|j\rangle\,\langle\text{John}|k\rangle \;\rightarrow\; \sum_{j} d_j\,|j\rangle$$

Quantum circuit notation:

$$|\cdot\rangle = f(|\cdot\rangle)$$
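Reading the reduction above term by term, the sentence coefficients are d_j = sum over i,k of c_{i,j,k} <Mary|i> <John|k>. A minimal sketch of that contraction follows, assuming real-valued vectors (so that <Mary|i> is simply the i-th component of the Mary vector). The numbers and the function name `sentence_meaning` are invented for the example.

```python
def sentence_meaning(subject, verb, obj):
    """Contract a verb tensor c[i][j][k] with subject[i] and obj[k]; returns the list d[j]."""
    n_i, n_j, n_k = len(verb), len(verb[0]), len(verb[0][0])
    return [
        sum(verb[i][j][k] * subject[i] * obj[k]
            for i in range(n_i) for k in range(n_k))
        for j in range(n_j)
    ]

# Toy two-dimensional spaces; all values are made up.
mary = [1.0, 0.0]                      # components <Mary|i>
john = [0.0, 1.0]                      # components <John|k>
likes = [                              # coefficients c[i][j][k]
    [[0.1, 0.9], [0.2, 0.3]],
    [[0.5, 0.5], [0.4, 0.6]],
]

d = sentence_meaning(mary, likes, john)  # picks out c[0][j][1] for each j
```

The subject and object indices are summed away, leaving a vector in the sentence space only.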

Intel® Quantum Simulator

[Figure: quantum circuit diagram. Recoverable labels: X[-1], X[0], X[1] and R.]

Two-tier implementation
1. Intel-QS bindings and quantum encoding implemented in C++ layer (red).
2. Pre-processing, analysis and plotting implemented in Python (blue).
3. External dependencies for respective layers indicated by colour gradients (green/red => C++, green/blue => Python).

Quantum computing applications can be written entirely in C++ or entirely in Python. The Python layer allows for a single-ended application development environment, or for standard cluster submission with the respective Python scripts.

Intel® Quantum Simulator on Kay

Kay
– 336-node cluster.
– 13,440 CPU cores, 63 TB distributed memory.
– Per node: dual-socket 20-core Intel® Xeon Gold (Skylake) 6148 at 2.4 GHz with 192 GB memory.
– 400 GB local SSD scratch.
– 100 Gbit/s Intel® Omni-Path network.
– Additional partitions
  • Dual NVIDIA Tesla V100.
  • Intel Xeon Phi (Knights Landing architecture).
  • High-memory 1.5 TB RAM with 1 TB local SSD scratch.

Simulation on Kay
– Up to 33 qubits on single-node executions.
– Up to 41 qubits on Kay’s main partition.
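These limits are consistent with a simple memory estimate: a full state vector of n qubits holds 2^n complex amplitudes. The sketch below assumes 16 bytes per amplitude (double-precision complex) and ignores communication buffers and other overhead, so it is an upper-bound estimate rather than a description of how the simulator allocates memory.

```python
def max_qubits(memory_bytes: int, bytes_per_amplitude: int = 16) -> int:
    """Largest n such that 2**n amplitudes fit into `memory_bytes`."""
    amplitudes = memory_bytes // bytes_per_amplitude
    if amplitudes < 1:
        raise ValueError("not enough memory for a single amplitude")
    return amplitudes.bit_length() - 1   # floor(log2(amplitudes))

GB = 10 ** 9
TB = 10 ** 12

single_node = max_qubits(192 * GB)      # one node with 192 GB
main_partition = max_qubits(63 * TB)    # 63 TB of distributed memory
```

Each additional qubit doubles the memory requirement, so going from 33 to 41 qubits needs roughly 256 times the memory of a single-node run.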

Dataset    Token      Binary
ns         adult      00
ns         child      01
ns         smith      10
ns         surgeon    11
v          stand      00
v          move       01
v          sit        10
v          sleep      11
no         inside     0
no         outside    1

Table 1: Basis data

Dataset    Token      State
ns         John       (|00⟩ + |10⟩)/√2
ns         Mary       (|01⟩ + |11⟩)/√2
v          walk       (|00⟩ + |01⟩)/√2
v          rest       (|10⟩ + |11⟩)/√2
no         inside     |0⟩
no         outside    |1⟩

Table 2: Sentence data encoding using basis from Table 1.
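The basis encoding can be sketched as a lookup from tokens to bit patterns, concatenated in the order subject, verb, object. That ordering is inferred from the five-bit labels in the results figure (for example adult,sit,inside as |00100⟩); the dictionary layout and the function name `encode` are assumptions for this example.

```python
# Token-to-bit assignments for the three datasets (ns, v, no).
BASIS = {
    "ns": {"adult": "00", "child": "01", "smith": "10", "surgeon": "11"},
    "v": {"stand": "00", "move": "01", "sit": "10", "sleep": "11"},
    "no": {"inside": "0", "outside": "1"},
}

def encode(subject: str, verb: str, obj: str) -> str:
    """Return the 5-bit basis-state label for a (subject, verb, object) triple."""
    return BASIS["ns"][subject] + BASIS["v"][verb] + BASIS["no"][obj]

label = encode("adult", "sit", "inside")
```

Four subjects, four verbs and two objects give 32 possible triples, which fit exactly into five qubits.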

[Figure: two bar charts, P(adult, stand, inside) and P(smith, stand, outside), over the encoded states adult,sit,inside |00100⟩; smith,sit,inside |10100⟩; adult,sleep,inside |00110⟩; smith,sleep,inside |10110⟩; child,stand,outside |01001⟩; surgeon,stand,outside |11001⟩; child,move,outside |01011⟩; surgeon,move,outside |11011⟩. Probability axis from 0.00 to 0.20. Legend: 1/√n; even superposition measurement; post-selection measurement.]

Basis

Data

“John rests outside, and Mary walks inside”

Data encoding and computation
• Sentence data is mapped onto the chosen basis, defining the quantum state for encoding.
• Quantum state representing meaning is encoded onto quantum simulator.
• Hamming distance calculated between test and encoded data.
• State amplitudes adjusted to reflect Hamming distance.
• Bit-string outcome is measured.
• Simulation is repeated to build state-distribution knowledge.
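The steps above can be emulated classically for the eight encoded bit strings shown in the results figure. The sketch computes the Hamming distance from a test string to each encoded string and turns the distances into a probability distribution. The cos²(πd/2n) weighting is an assumption borrowed from Hamming-distance-based quantum classifiers and may differ from the amplitude adjustment actually used; all function names are hypothetical.

```python
import math

# Encoded basis states, taken from the labels in the results figure.
ENCODED = ["00100", "10100", "00110", "10110",
           "01001", "11001", "01011", "11011"]

def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length bit strings differ."""
    if len(a) != len(b):
        raise ValueError("bit strings must have equal length")
    return sum(x != y for x, y in zip(a, b))

def weighted_probabilities(test: str, encoded):
    """Assumed weighting: probability proportional to cos^2(pi * d / (2 * n))."""
    n = len(test)
    weights = {s: math.cos(math.pi * hamming(test, s) / (2 * n)) ** 2
               for s in encoded}
    total = sum(weights.values())
    if total == 0:
        raise ValueError("all encoded states are at maximal distance")
    return {s: w / total for s, w in weights.items()}

def closest(test: str, encoded) -> str:
    """Encoded string with the smallest Hamming distance to `test`."""
    return min(encoded, key=lambda s: hamming(test, s))

probs_left = weighted_probabilities("00000", ENCODED)   # adult, stand, inside
probs_right = weighted_probabilities("10001", ENCODED)  # smith, stand, outside
```

On a simulator the distribution is estimated by repeated measurement rather than computed directly; the classical version is only useful as a cross-check for small registers.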

$$\text{“Mary walks inside”} \;\rightarrow\; \tfrac{1}{2}\left(|01001\rangle + |01011\rangle + |11001\rangle + |11011\rangle\right)$$

$$\text{“John rests outside”} \;\rightarrow\; \tfrac{1}{2}\left(|00100\rangle + |00110\rangle + |10100\rangle + |10110\rangle\right)$$

Preliminary results

• (Left) For the test data “adult(s) stand inside”, the closest encoded quantum-state vector, i.e. the one with the smallest calculated Hamming distance, is “adult(s) sit inside”.

• (Right) Similarly, the closest state for “smith(s) stand outside” gives “surgeon(s) stand outside”.

The results of this computation are entirely context-dependent, and by choosing different corpora one may obtain different relative meanings between queried test data.

Strong Scaling
– Approximately linear decrease in runtime, which slows as the number of processes becomes large: MPI communication overhead dominates as the work per process decreases.

Weak Scaling
– Non-linear increase in runtime caused by the increase in MPI communication as the workload increases.
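Both trends are commonly summarised as parallel efficiencies. The sketch below uses the standard definitions; the timings are made-up placeholders, not measurements from Kay, and the function names are chosen for this example.

```python
def strong_scaling_efficiency(t_ref: float, p_ref: int, t_p: float, p: int) -> float:
    """Fixed total problem size: ideal runtime falls as 1/p, giving efficiency 1.0."""
    return (t_ref * p_ref) / (t_p * p)

def weak_scaling_efficiency(t_ref: float, t_p: float) -> float:
    """Fixed work per process: ideal runtime stays constant, giving efficiency 1.0."""
    return t_ref / t_p

# Hypothetical timings in seconds, for illustration only.
strong = strong_scaling_efficiency(t_ref=100.0, p_ref=1, t_p=30.0, p=4)
weak = weak_scaling_efficiency(t_ref=100.0, t_p=125.0)
```

An efficiency below 1.0 in either case indicates time lost to communication or load imbalance rather than computation.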

Implementation