the graphing of difference: numerical mediation & the case of google’s knowledge graph (draft)

The Graphing of Difference: Numerical Mediation & the Case of Google’s Knowledge Graph

Abstract

This article offers a critical examination of contemporary graph databases, such as Google’s Knowledge Graph, from the perspective of media theory, philosophy of difference, and epistemology. It argues that the fundamental data structure of the ‘triple,’ in essence a subject-predicate-object statement, constitutes a problem immanent to the database itself. The article begins with a brief meditation on numerical mediation before examining the emergence of Knowledge Graph through Google’s research publications. It then demonstrates that a logic of representation underlies all graph databases and operates similarly to Aristotle’s theory of perception and categorization. Drawing on Gilles Deleuze’s criticism of Aristotle, it argues that graph databases fall into similar traps of identity and representation and are unable to understand difference in itself. In closing, it offers an initial diagnosis of the limitations of graph databases, including an unbridgeable distance from the discovery and invention of the new.

KEYWORDS: Google; Knowledge Graph; Graph Databases; Gilles Deleuze & Difference; Aristotle & Categories

A Meditation on Number: Enumeration and Nomination

PROMETHEUS. [...] Number too, supreme among skills, I invented for them, and letters in combination, the record of all things, the mother and crafter of poetry.

Aeschylus, 2008, 457-461, p. 113

[N]umeracy almost always, perhaps always, precedes literacy.

Hacking, 1982, p. 287

The traditional story of Prometheus begins with the fault of Epimetheus, his brother Titan, who was charged with distributing attributes to animals at their origin. Lacking foresight, Epimetheus ran out of attributes at the last animal, the human. At this point, Prometheus moved to correct his brother’s error, stealing fire and the arts of civilization from Zeus and gifting them to humankind, a crime for which he was chained for all eternity to have his liver eaten from his body each day by an eagle. This story runs from Hesiod to Plato’s Protagoras, where it was embedded in the history of Western philosophy. From Plato to Jacques Derrida, one could argue that philosophy’s most frequent interpretations of the story are fundamentally rooted in the gift of logos, the word, language, reason, and its technical supplementation through prosthetic supports like writing. As Aeschylus makes clear in Prometheus Bound, however, this is only half the story, for Prometheus also gave humans number, which he considers ‘supreme among skills’. Perhaps, in hindsight, this half of the story was so often overlooked because of a problematic translation of the Greek concept of number: arithmos.

As Martha Nussbaum (1979) has demonstrated in her analysis of Philolaus’ response to Parmenides and his Eleatic Conventionalism, arithmos was intimately bound up with the capacity to apprehend, recognize, and even name an object. As Nussbaum notes, the translation of arithmos as ‘number’ is problematic because a contemporary understanding of number implies abstraction, whereas for the ancient Greek, arithmos more often conveyed the sense of that which gets counted, in contrast to the mechanism by which we count (pp. 89-90). She writes,

The most general sense of arithmos in ordinary Greek of the fifth century would be that of an ordered plurality of its members, a countable system or its countable parts. The notion of arithmos is always very closely connected with the operation of counting. To be an arithmos , something must be such as to be counted — which usually means that it must either have discrete and ordered parts or be a discrete part of a larger whole. To give the arithmos of something in the world is to answer “how many” about it. (p. 90)

This notion of number is that which runs from Homer to Aristotle, and it is one in which the very act of perception is dependent upon the skill of number, the ability to bound or delimit a thing such that it can be recognized as a discrete object. It is only atop this skill that humans are able to recognize ordered pluralities or subdivisions of objects, and thus it is the bedrock of all counting, classifying, and measuring. Number is thus also the bedrock of apperception.

The primary story of Prometheus’ gift describes only language, and is rooted in poetry and a metaphysics of presence, most clearly exemplified by the work of Martin Heidegger. One must look to Alain Badiou, a mathematician become philosopher, for the contrasting narrative. He writes: “The Greeks did not invent the poem. Rather, they interrupted the poem with the matheme” (Badiou, 2006, p. 126). In Badiou’s mathematical ontology, existence is tied to nomination. In order for something to be, it must have a (proper) name. And for Badiou, all nomination requires enumeration, save one exception, nothing (pp. 66-67). Here we can see the alternate history of Western thought, in which the very act of signification is afforded only by enumeration, by arithmos in the Greek sense of rendering something discretely, either as an organized plurality or a part of one. In a strong interpretation, Being is literally preceded by the function of enumeration and nomination, but a lighter one might instead argue that ontology is epistemologically limited by this function.

What I mean to take from all of this is that number may be the first medium, if not in ontology, then certainly in epistemology. Language, be it written or spoken, is operative across time. It is mnemotechnical; it engages in processes of hypomnesis and anamnesis that require the presumption of at least two moments. Where language operates in the interstice between moments, number can operate at a single point. Number operates in the thickness of the present moment – in what Alfred North Whitehead would term ‘presentational immediacy’, or what William James would call ‘the specious present’ – by rendering discretion and affording the immediate perception of differentiated objects prior to their recognition and representation, their naming, classification and categorization. In so doing, it is enumeration that affords nomination, and thus it is number that is at the bedrock of our distinctions between what is in being and non-being, what exists and what does not. Number is the primary medium by which chaos communes with order. As Hacking notes, “Enumeration demands kinds of things or people to count. Counting is hungry for categories” (1982, p. 280).

§1: Things, Not Strings

To exist is to be indexed by a search engine…

Introna & Nissenbaum, 2000, p. 171

In 2012, Google made company-wide changes to reorient its focus from ‘search’ to ‘knowledge,’ all of which were based on the introduction of the Knowledge Graph, a graph database that had been in the works for years and that, via an in-house implementation of a C++ API, would now automatically populate content boxes with the most relevant information related to specific queries alongside standard search results (see Figure 1). Announcing its launch, Amit Singhal – Google’s Senior Vice President, software engineer, and head of its core ranking team – argued that the introduction of the Knowledge Graph signaled a critical first step towards a new generation of search capable of parsing semantics, of knowing exactly what you meant by your search terms. In the past, Google had been unable to do this. Singhal writes, “It’s why we’ve been working on an intelligent model–in geek-speak, a “graph”–that understands real-world entities and their relationships to one another: things, not strings” (2012). No longer would Google be parsing arbitrary data values in table strings; instead, Google’s algorithms would now have the power to engage with things: real-world people, places, and things.

By analyzing the content of the internet at scale and monitoring our search practices, Google claimed it would be able to “[tap] into the collective intelligence of the web and [understand] the world a bit more like people do” (Singhal, 2012). Graph databases operate only atop artificial intelligence and machine learning algorithms designed to extract machine-readable information from human semantics. These extractions occur by quickly parsing web corpuses at the scale of billions of documents, and producing iterative ontologies and schemas that constantly adapt themselves to be able to classify and extract as much of the data they come across in those corpuses as possible. At its launch, Singhal noted that the Knowledge Graph already contained more than 500 million objects and more than 3.5 billion facts about and relationships between them, largely based on its analysis of Freebase, Wikipedia, and the CIA World Factbook. The end result is a responsive database that knows which things your query actually corresponds to, that can summarize the most relevant information about them, and that can facilitate discovery and “sometimes help answer your next question before you’ve asked it” (ibid.). It is worth noting that in the same year, Google’s Search Quality Team was redubbed the Knowledge Team, which reflects their commitment to realizing this future expansion of search into the epistemic grounds of the real.

At the first World Wide Web Conference in 1994, Tim Berners-Lee called for the expansion of the Web to include machine-readable information and relationship values for links (1994). For years it was thought that this expansion would allow machines to comprehend semantic web corpuses (Berners-Lee, Hendler, & Lassila, 2001), and it is precisely this understanding of semantics as produced by metadata and typed links that is at the core of Google’s new knowledge infrastructure. At its simplest, the entire apparatus behind the Knowledge Graph can be boiled down to two interrelated processes: first, the automated production of machine-readable information and metadata about the things on the Web and their interrelationships, and second, the learning and iterative production of an entire ontology and schema that describe what those things can be and/or how they can be related.

For years it has been Google’s position that the Web at large can be understood as a huge body of human knowledge in the form of (classes of and named) objects or entities, and facts constituted by their interconnecting relations (Pasça, 2007, p. 101). These relations between entities are understood as “‘hidden’ arguments” about the world, and underlie implicit typologies of relations (p. 107). In essence, Google researchers look at the Web as a gigantic databank of expository statements about the world, and it is precisely as a collection of parsable expository statements that they understand knowledge. When represented in machine-readable form, these expository statements take the form of a ‘triple’, which essentially is the computer equivalent of a subject-predicate-object statement.[1] A graph database is a large aggregate of these triples, populated with ‘nodes’ representing things or entities and ‘edges’ representing relations (see Figure 2). Any given thing or entity represented by a node in the graph can serve as either the subject or the object of any given triple; which of the two it is depends on the direction of the edge. All edges are directional and establish subject/object relations by their directionality. While this may seem complicated, it is rather simple to understand in application. It is rarely the case that the subject and object of any expository statement can be inverted without producing nonsense. For example, the fact that the subject [Djuna Barnes] has the relation [was born on] to the object [June 12, 1892] makes sense, but it is nonsensical to invert it and say that June 12, 1892 was born on Djuna Barnes.

[1] For a more technical overview of this aspect of the Knowledge Graph, please see my forthcoming article in Computational Culture.
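The directional triple described above can be made concrete with a minimal sketch. It is an illustration only and not Google’s implementation: the names Triple, GRAPH, edges_from, edges_into, and invert are hypothetical, and the second triple is added purely to give the example node more than one edge.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One subject-predicate-object statement (field names are illustrative)."""
    subject: str
    predicate: str
    obj: str


# A hypothetical two-edge graph; the first triple is the example from the text.
GRAPH = [
    Triple("Djuna Barnes", "was born on", "June 12, 1892"),
    Triple("Djuna Barnes", "wrote", "Nightwood"),
]


def edges_from(graph, node):
    """Edges that leave `node`, i.e. triples in which it is the subject."""
    return [t for t in graph if t.subject == node]


def edges_into(graph, node):
    """Edges that arrive at `node`, i.e. triples in which it is the object."""
    return [t for t in graph if t.obj == node]


def invert(triple):
    """Reverse the direction of the edge by swapping subject and object."""
    return Triple(triple.obj, triple.predicate, triple.subject)


if __name__ == "__main__":
    print(edges_from(GRAPH, "Djuna Barnes"))
    print(edges_into(GRAPH, "Nightwood"))
    # The inverted statement is a well-formed triple, but it is not in the graph.
    print(invert(GRAPH[0]) in GRAPH)
```

The data structure itself does nothing to rule out the inverted statement. Whether a node counts as subject or object is carried entirely by the direction of the edge, which is why the nonsensical inversion has to be excluded elsewhere, by the schema and ontology.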

From these large repositories of triples – constative claims or arguments reduced to the form of subject-predicate-object statements of ‘facts’ – Google is able to learn and iteratively produce an entire ontology and schema for knowledge.[2] The schema it produces is a typology of relations or links, one of the key components Berners-Lee called for decades earlier. This schema ensures that across the graph all represented relations are subsumable under a general typology, and that all relations of the same type are numerically equivalent (i.e., the same). The ontology it produces is a specification of what things or entities, and which relations, can exist; it is a pattern that determines what the graph database contains, what form search queries need to take in order for them to be parsed, and what the appropriate knowledge is in relation to any particular query.[3] And here we come full circle to our opening meditation on number, where we can see that existence, at least in terms of epistemological, if not phenomenal, availability, is literally determined by enumerability. What exists for Google is strictly that which can be statistically extracted through machine learning algorithms, and that which can be abstracted from its context into the numerical form of a triple. In a state of ‘information explosion’ and constant dissemination of communications via digital media, what we are able to engage is strictly and necessarily limited to that which can be bounded, epistemologically differentiated, and it is precisely graph databases like Google’s that are at the forefront of performing that labor for us. The sheer volume of web documents fills its channel, like an old DTMF telephone whose buttons are all pressed simultaneously, and becomes sheer noise. We rely on data indexes and parsing algorithms to enumerate those documents once again for us, an impractical task for humans because of its scale, and it is only after that has occurred that we can once again engage with them on the level of human language and nomination.
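The way a schema and ontology determine what can exist for the graph can be sketched in a few lines. This is a toy illustration under stated assumptions, not a description of the Knowledge Graph’s internals: SCHEMA, ENTITY_TYPES, and is_representable are hypothetical names, and the types and relations are invented for the example.

```python
# Hypothetical schema: each relation type fixes the kinds of things it may connect,
# written as (type of subject, type of object).
SCHEMA = {
    "was born on": ("Person", "Date"),
    "wrote": ("Person", "Book"),
}

# Hypothetical ontology: the entities the graph recognizes, and their types.
ENTITY_TYPES = {
    "Djuna Barnes": "Person",
    "June 12, 1892": "Date",
    "Nightwood": "Book",
}


def is_representable(subject, predicate, obj):
    """Return True only if the statement fits the schema and the ontology.

    Unknown relations, unknown entities, and inverted statements all fail
    the same check: for this toy graph they simply do not exist.
    """
    signature = SCHEMA.get(predicate)
    if signature is None:
        return False
    return (ENTITY_TYPES.get(subject), ENTITY_TYPES.get(obj)) == signature


if __name__ == "__main__":
    candidates = [
        ("Djuna Barnes", "was born on", "June 12, 1892"),
        ("June 12, 1892", "was born on", "Djuna Barnes"),
        ("Djuna Barnes", "was haunted by", "Nightwood"),
    ]
    for statement in candidates:
        print(statement, is_representable(*statement))
```

In this sketch only the first candidate is admitted. The inverted statement and the statement with an unlisted relation are not rejected as false; they are unavailable to the graph altogether, which is the sense in which what ‘exists’ is what can be represented.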

The problem with graph databases like Google’s is not obvious, and were these databases not rapidly becoming the future of enumeration and identified with knowability per se, there would be no problem. Through what is called a ‘join function’, multiple nodes can be linked by their edges to form complex paths, even isolating clusters and regions that form sub- or regional graphs, and these operations allow for the performance of complex computations at an unimaginable speed and the production of knowledge that was never before available – at least in this form – to humans.[4] In these large-scale processes of enumeration, the human is afforded an expanded capacity of nomination, or, more simply, humans can know new things, relations, and aggregates of the two. There are many problems with this that seem to me, despite their utmost importance, to be outside the data structure, such as environmental damage caused by electricity consumption in server farms, unequal access to and blackboxing of the machines capable of performing these operations, and malicious use of the knowledge generated by these systems.[5] The problem that is instead immanent to the data structure itself is caused by the basic building block of the triple, which, despite some clever new maneuverings, still risks falling into the traps of representationalism that have plagued philosophical debates since Plato popularized them and Aristotle formalized them. As such, we will now turn to an examination of the relevant aspects of Aristotle, and look to Gilles Deleuze’s criticism of Aristotle to locate the source of the problem immanent to graph databases themselves.

[2] It is important to note here that this endeavor is not something new in Google’s research trajectory. Even before publishing his work on the PageRank algorithm that launched Google search, Sergey Brin was working on extracting binary relations from large corpuses via machine learning (1998). This tradition has been reflected across a large swath of Google’s research publications ever since. For a more detailed history, see my forthcoming article in Computational Culture.

[3] Ontology here is a rough analog to its philosophical equivalent. For computer science, as Thomas Gruber writes, “A body of formally represented knowledge is based on a conceptualization: the objects, concepts, and other entities that are presumed to exist in some area of interest and the relationships that hold among them… An ontology is an explicit specification of a conceptualization… For knowledge-based systems, what ‘exists’ is exactly that which can be represented” (1993, p. 1).

[4] In 2010, Google was able to use its pilot graph database to implement the famous PageRank algorithm with only 15 lines of code (Malewicz et al., 2010, p. 140).

§2: Graph Probabilities and the Potentiality of Difference

In very general terms, we claim that there are two ways to appeal to ‘necessary destructions’: that of the poet, who speaks in the name of a creative power, capable of overturning orders and representations in order to affirm Difference in the state of permanent revolution which characterizes eternal return; and that of the politician, who is above all concerned to deny that which ‘differs’, so as to conserve or prolong an established historical order, or to establish a historical order which already calls forth in the world the forms of its representations. (53)

Gilles Deleuze, 1990, p. 53

For Aristotle, the question of aisthesis and the faculty for sense perception can only be answered within his well-known theory of actuality (energeia) and potentiality (dynamis). The theory of energeia/dynamis is largely the result of Aristotle’s efforts to explain becoming (genesis) in light of Eleatic Conventionalism – a school of thought initiated by Parmenides that essentially argued that nothing can come-into-being from non-being, and upon this assertion, erected a unified and undifferentiated universal One. Aristotle argued that Parmenides’ position had forced successive philosophers to reduce genesis to either qualitative change or the rearranging of basic elements, or the stoicheia (Physics I, 187a; De generatione et corruptione I, 1-2). He instead argued that the basic elements (stoicheia) function as binary opposites (enantia), and each is always capable of changing into its opposite (De generatione et corruptione II, 331a, 337a). He is able to explain genesis without need of non-being by positing three principles: an immanent form (eidos), elsewhere referred to as a species; an undefined substratum (hypokeimenon) to the world, which persists throughout change; and privation (sterēsis), which the substratum houses and which is the potential (dynamis) for any of the basic elements (stoicheia) to change into its opposite (enantion) (Physics I, 190a-b). The substratum’s essence (ousia) is purely to serve as the ground for the genesis of other things (Metaphysics 1028b-1029a), though it has a material existence and, along with eidos, serves as a co-principle of Being (on) (Physics I, 190b). It is important to note that this substratum performs this function by serving as a reserve of potential (dynamis) for change, but this potential for change is limited to privation (sterēsis), which Aristotle defines as “the negation of something within a defined class” (Metaphysics 1011b). Here this means that change is brought about by an actual (energeia) lack in each basic element (stoicheion) of its opposite (enantion), together with a corollary potential (dynamis) for its passage into the opposite (enantion) that it lacks. The possibility of genesis is created by this substratum’s reserve of privation (sterēsis) as potential (dynamis) for the basic elements (stoicheia) to shift between their opposites (enantia) (Physics I, 190a-192b; De generatione et corruptione II, 324a, 328b-331a).

[5] For more on this variety of problem, see Andrejevic, 2013 for a strong start.

In De anima, Aristotle argues that perception requires two things: (1) that the perceiver has the capacity to perceive, regardless of whether he or she is actually perceiving anything, and (2) that he or she actually perceives something. The perceiving faculty of the soul must exist as a latent capacity or potential corollary to what the perceived thing is in actuality (II, 417a-418a). In terms of the soul, the perceptive faculty perceives an idea (eidos) without its materiality (see Peters 1969, p. 12). When described physically, the perceptive faculty is made possible by a balancing of opposite forces,[6] where they exist in the mean or in a proportional state in which the perceiver’s faculty exists as “actually neither, but potentially both” (De anima II, 423b-424a), and the actual perception comes about during their actualized adjustment to the perceived object. It is thus that ‘like can know unlike,’ and the subject can (potentially) become like the object known in order to perceive it (ibid., II, 417a-418a). Aristotle’s theory thus rests upon his concept of privation (sterēsis), wherein the faculty for sensation holds itself in reserve, always containing the potential to actually change into what it is not. Thus, at the level of actuality, things can be unlike one another, while at the same time they can maintain the potentiality of becoming alike through privation.

While this may seem esoteric in terms of our topic, we can here see the source of the first firmly grounded representationalism in philosophy. The capacity for being represented is the necessary foundation for any perception or knowledge, and thus it precedes both aisthesis and noesis for Aristotle. In fact, anything that can be perceived or known is already present, but lying in potential, and is made actual by the senses and the mind becoming identical with that object’s sensible and intelligible form (De anima III, 429a, 431a-b). Here the mode of representation determines what gets represented, and this comes to bear on the Knowledge Graph in the same sense that its basic form of the triple and its given ontology and schema precisely determine what is perceivable and knowable to the graph. Further, this capacity to shift from potential to actual, for like to know unlike through its capacity to become the unlike, is grounded on the play of opposites in the basic elements. While Aristotle’s basic elements are no longer applicable, we can see the operation of similar elements in the Knowledge Graph. As I’ve shown, Google researchers understand knowledge as a large set of constative claims or ‘facts’ that can all be formalized into subject-predicate-object statements, or aggregates thereof. While the form of the triple is certainly more flexible than the original binary relations that Sergey Brin was analyzing back in 1998, it still boils down to basic elements that can be analyzed in terms of oppositions between subject and object, or node and edge.[7] What this means for us is that even those things that are not yet included in the Knowledge Graph can only ever be included through their entrance into this play of ‘basic elements’ that constitute the graph’s capacity to know what it is ‘unlike’.

[6] I.e., the opposing elements (stoicheia), which Aristotle reduces to two pairs of opposites (enantia): hot-cold and dry-moist; see De generatione et corruptione II, 329b-330a.

[7] This point might be pushed further to include the process by which these relations are extracted, where in a natural language sentence each element of the triple is isolated as an ‘infix’ in relation to filler words that function as ‘prefixes’ and ‘postfixes’. For more on that process see my work on Google’s TextRunner in my forthcoming article in Computational Culture.

In his reading of Plato’s Sophist, Deleuze explains that the sophist leads Platonism into a confrontation with non-Being where its ‘common sense’ begins to fail, and yet, Platonism is unable to reduce that with which it is confronted to the negative, to non-Being (1990, p. 256). Deleuze instead argues that “‘non’ in the expression ‘non-being’ expresses something other than the negative,” and it is this aporia that Deleuze will write as “(non)-being,” or better, “?-being” (1994, pp. 63, 64). Deleuze repeatedly warns that representational philosophy, invested as such in identity, will always present one with the false dichotomy of either a fully determinate and positive Being with no difference, or a Being with differences produced by non-Being, negation, and the negative (e.g., Deleuze 1990, pp. 106-107; 1994, pp. 52, 268-9). Instead, Deleuze looks for a purely positive and affirmative articulation of difference (p. 277), and looks to ?-being for an opening. For Deleuze, ?-being is the source of affirmative differentiation, and thus is the first principle of all genesis. Negation is only the shadow of affirmative differentiation, and to confuse the two is always to allow the illusion of contradiction to slip into our understanding of genesis. There is always an affirmative differentiation behind the appearance of contradiction and the shadow of negation. Deleuze writes, “Beyond contradiction, difference – beyond non-being, (non)-being; beyond the negative, problems and questions” (p. 64).

Aristotle formalized and remained ensnared in this Platonic logic of negativity, as evidenced by his philosophy requiring the logic of representation and identity to function. For Aristotle, the essence of any thing is its position in an analytical taxonomy, its species and genus.[8] Thus, its essential difference is not its singular identity, but instead its membership in a species sharing a homogeneous differentiation. Individual members of the same species in their absolute specificity and singularity are only superficially different from one another. We can already see here that difference is being grounded and bracketed, only appearing in the interstice between two genera or species. Deleuze argues that an ontology such as this only understands difference through the logic of representation. It makes difference reflective, mediated by the analogies and oppositions between conceptual identities and their predicates (1994, p. 34). Aristotle fully subsumes and brackets difference within the logic of (classificatory) identity, wherein it can only manifest itself as relations between conceptual groups of individuals. For Deleuze, this ontology has overlooked difference in itself, both at the level of species, within which individual members constitute an irreducible diversity of singular identities, and at the level of genus, where difference is already bracketed into particular conceptual differentiae by some overarching analogical generality.[9] We might push this further and note that for Aristotle, the categories that taxonomize differentiae and allow for the classification of species and genus predicate Oneness (Metaphysics 1003a-b, 1053b). There is a particular enumerative operation that originates with categorization that affords individual differences, species, and genera their capacity to be recognized and represented in the world.

[8] For Aristotle, the appropriate answer (ti esti) to this new question (dia ti) is the result of an analytical definition (horos) that proceeds by separating, dividing, and making distinctions (diairesis) between members of a large, common group of things (genus) (Analytica Posteriora II, 96b-97b; Topics VIII, 153a). Each of these divisions (diaphora) serves to further isolate the essence (ousia) of the thing in question, and the process terminates by amassing the differences (diaphora) into an identification of a species, which is the answer (ti esti) and essence (ousia) of the thing at hand. Now, it is important to note that while Agamben always uses the word species to denote this identifier, Aristotle uses two synonymic terms: atomon eidos and infima species (Metaphysics 1038a, 1045b). He frequently uses the word eidos to denote species, and frequently argues that eidos has the best claim to being the individual primary substance (ousia) of a thing. His first argument treats eidos as the most succinct and accurate predicate for any given subject or object (Categories 2a-b). And Aristotle’s second treatment articulates eidos as the immanent formal cause in any (compound) being (Metaphysics 1029a, 1041a-b).

As I’ve already noted, this enumerative operation that is predicated by categorization is that of the triple, and it is only by the play of the ‘basic elements’ of the triple – subject, predicate, and object, or nodes and edges – that the representations are made available for categorization. This Aristotelian impulse to examine only lesser and greater abstractions for taxonomical purposes is precisely that behind Google’s research. Google’s credo is that of simple models fueled by a lot of data, and this is because Google researchers understand anomalous events to be collectively frequent at large enough scales (Halevy, Norvig, & Pereira, p. 9). Google’s taxonomy is more flexible because their simple models are meant to grow through iterative steps of machine learning, and thus the need for generative rules is alleviated. Yet, the modifications of schema and ontology across iterations are still limited by the ‘basic elements’ that are able to be perceived and known. The data structure of the graph is, by necessity, encoded at the level of hardware, data serializations, assembly and compiler code, and thus its potential to become ‘unlike’ is always bracketed to operations on triples.

Drawing on Henri Bergson, Deleuze tells us that the form of difference composed by negative relations between identities at varying levels between genus and species is always abstracted from the real world, and is too broad and general to be of any real use (1991, p. 44). He writes, “The combination of opposites tells us nothing; it forms a net so slack that everything slips through” (pp. 44-45). The differences between individuals of the same species, which Aristotle brackets because of their unsoundness for a philosophy of representation, are truly differences in kind, and thus elude an Aristotelian classificatory schema (p. 46). This becomes most important in the distinction between ‘possibility’ and ‘potentiality’. Deleuze argues that possibility has no reality before its exhaustion; it only exists in retrospect because it is only given to us ready-made, preformed, and preexistent by its real(ized) form. While one might assume that the real is some manifestation of a larger realm of possibilities, it is instead the case that that larger realm of possibilities is but a sterile duplicate abstracted a posteriori from the real (pp. 97-98). In this sense, the possible is always dependent on the real, exists secondarily, and is but a reflection of what already is or has been.

In contrast to this, Deleuze argues that a conception of potentiality is better understood by replacing the real/possible binary with that of the actual/virtual distinction. The virtual is always-already fully real, existing alongside and intertwined with the actual, and can be envisioned as a great plane populated with ‘non-numerical multiplicities,’ with singularities and forces intermingling with one another and allowing for an indefinite number of actualizations to arise (1991, pp. 96-7). What is precisely the point here is that these non-numerical multiplicities are not enumerable, they cannot be rendered into a stable identity, and despite the fact that multiple actualizations can arise from these same multiplicities, those actualizations are not numerically identical. Each actualization is different in kind, and provides an inflection of or perspective on the non-numerical multiplicity from which it arose without being able to enumerate it. The non-numerical multiplicity cannot be totalized for enumeration, does not even resemble its actualized counterpart, and thus cannot be represented. As such, the true commonality between different actual things is nonrepresentational in nature. They hang together only as affirmations of difference, as positive and creative lines of heterogenesis in an actual world of irreducible pluralism (pp. 101-104). For Deleuze, the virtual is difference in itself, and its actualization is the only productive understanding of differentiation.

[9] For a fuller explication of Deleuze’s problematization of Aristotle’s classification schema, see James, 2003, pp. 59-63.

Each actualized thing is the fruit of chance, a multiple phenomenon composed of “a plurality of irreducible forces” (Deleuze, 1983, pp. 39-40). Yet, in its actualization, any given thing is cut off from the virtual multiplicity from which it arose. Actualization is an arrest of difference, and thus the classification of species and genus based on already actualized differences is a diachronic, and somewhat arbitrary, endeavour that has little explanatory power for the actual rhythms of change in the world. In order to produce this taxonomy of differences, one must produce differentiations secondhand and abstractly; one must cut what is to be surveyed and classified off from difference in itself in order to obtain stable and representable identities. The realm of possibilities that one produces as a reflection of the real things being surveyed must then constantly be refreshed, as the potentiality of the virtual continues to allow for the actualization of the impossible.

§3: The Problem(s) with Graph Databases

You have the individuality of a day, a season, a year, a life (regardless of its duration) — a climate, a wind, a fog, a swarm, a pack (regardless of its regularity). Or at least you can have it, you can reach it. A cloud of locusts carried in by the wind at five in the evening; a vampire who goes out at night, a werewolf at the full moon.

Gilles Deleuze & Félix Guattari, 1987, p. 262

While Deleuze’s rather dense and technical explications of difference in itself might seem far from the subject of graph databases and machine learning, they demonstrate a mode of thought capable of highlighting the future limitations of such endeavors. What escapes any graph database is difference in itself, and in its place we have the stale differentiations of a diachronic slice of actual things and entities. While it certainly affords us some rather useful insights into the possibility and probability of what already is, it has little capacity to engage with the truly new. What Google understands any given entity to be is an abstracted set of relations, a participant in a certain isolatable graph region made up of subject-predicate-object triples, and this process of enumeration delimits the ontology, schema, and epistemology that are built atop it. There are certainly aspects of this problem that can be corrected for, and their manifestation is often comical, like the inclusion of William Shakespeare and Diana, Princess of Wales in the Knowledge Graph Carousel for ‘Famous Actors’ (see Figure 2). It can also have some rather bizarre results, like including The Book of Repulsive Women and Other Poems among Djuna Barnes’s famous works, only to provide the following description of the text at the time of my writing: “This updated guide covers everything readers need to know about electronic mail. New to this edition is: advice on choosing a service provider; an updated guide to service providers; more information on LAN-based email; and the latest developments on Windows 95.”

The problems that cannot be as easily corrected for are those that are engendered by the basic building blocks of the system, the ‘basic elements’ of the triple at the heart of the statistical abstraction of an entire schema and ontology. It is this numerical mediation that determines the pieces and form of information presented in the Knowledge Graph’s content box, which often include a selection of photos, a name and profession, an opening line from Wikipedia, a date and place of birth and death, as well as notable works and related people (see Figure 1 above). These are the facts it can enumerate about a person, and it is upon these facts that they are presented, named, and made available for us to experience and know. And yet a question immediately surfaces: if this is knowledge, how incomplete is it? The logic of representation undergirding it produces an infinite series of questions about just how representative this information really is of a life lived. Is being born in the same place a homogeneous relation across people? Certainly not, as this abstraction loses the entire process by which a person individuates themselves from and comports themselves towards a place. Not only that, but are the most significant relations a person has to a place really those of birth and death? Where is the relation to Paris as a ‘moveable feast’ that so marked a generation of American writers and expats like Djuna Barnes?

In discarding the absolute specificity of the individual in favor of statistically abstractable genera, the Knowledge Graph is cut off from difference in itself, the play of the actual and the virtual. And this is of great consequence for the future hopes of so many companies investing in graph databases and parsing mechanisms, because while they can feign a certain capacity and flexibility unavailable in previous Relational Database Management Systems (RDBMS), they still can’t quite grasp things like style, comportment, individuation, and similar aspects of difference. And this is precisely what they are angled to provide, as graph data is increasingly envisioned as the future of recommendation engines like Netflix or Amazon’s, knowledge engines like Google’s, and curatorial engines like Pandora or Spotify’s. Graph databases are similarity engines, working on an established ground of possibility that is easily exhausted, despite all of their efforts to simulate the serendipity of discovery, of offering you what you didn’t (yet) know that you wanted.

You cannot ask a graph database to present you with difference as a result of your query. You can simulate it by bracketing difference to particular subregions of the graph, to particular relations or entities, in which case you might get the different books Djuna Barnes wrote, or different horror movies that went straight to video. But graph data can’t seem to tell you who the next big undiscovered musical talent is, even though these are relations that it tracks. Graph data can’t tell you which actors were awkward or shy as children, which scientists write more clearly or speak engagingly. These are, at least, processes that could conceivably be simulated one day. What graph databases will never be able to do is present nonrepresentational experiences and knowledge. If Google’s Knowledge Graph does become the dominant mode of producing information out of the billions of documents on the Web, then these limitations take on serious political, epistemological, aesthetic, and ethical stakes. If nomination is indeed preceded by enumeration, then the very field of possible perceptions and knowledges is curtailed by the medium through which things and entities are enumerated. Existence will literally be found in the index of a graph database.
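The bracketing described above can be made concrete with a small sketch. What follows is a minimal illustration in Python under stated assumptions, not a description of how Google's Knowledge Graph is implemented: the `Triple` type, the `match` function, and the predicate names `wrote` and `lived_in` are all hypothetical, chosen only to show how a query over subject-predicate-object triples returns whatever shares an already enumerated relation.

```python
from typing import List, NamedTuple, Optional


class Triple(NamedTuple):
    """One subject-predicate-object statement (hypothetical representation)."""
    subject: str
    predicate: str
    obj: str


# A tiny illustrative graph; the predicate names are assumptions, not a real schema.
TRIPLES: List[Triple] = [
    Triple("Djuna Barnes", "wrote", "Nightwood"),
    Triple("Djuna Barnes", "wrote", "Ryder"),
    Triple("Djuna Barnes", "wrote", "The Book of Repulsive Women"),
    Triple("Djuna Barnes", "lived_in", "Paris"),
    Triple("Ernest Hemingway", "lived_in", "Paris"),
]


def match(
    triples: List[Triple],
    subject: Optional[str] = None,
    predicate: Optional[str] = None,
    obj: Optional[str] = None,
) -> List[Triple]:
    """Return the triples that agree with every field that is not None."""
    return [
        t
        for t in triples
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    ]


# "Different" books are different only as objects of one shared predicate.
books = [t.obj for t in match(TRIPLES, subject="Djuna Barnes", predicate="wrote")]

# Two lives in Paris collapse into the same relation to the same object.
in_paris = [t.subject for t in match(TRIPLES, predicate="lived_in", obj="Paris")]

# A relation that was never enumerated simply returns nothing.
style = match(TRIPLES, predicate="style")
```

In this sketch the query can only vary the subject or object against a fixed predicate, so every result is defined by what it has in common with the others. A relation that was never entered as a triple, such as the hypothetical `style`, is not a weak result but an empty one.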

The struggle to find and/or maintain alternate routes to perception and knowledge is outside the purview of this article, as it is not a problem immanent to graph databases themselves. In terms of the internal limitations of graphs, at best, one can hope that the future holds graph databases and information extraction mechanisms that are open to the public, so that graphs might be made that could mutate themselves, that could create new and unexpected schemas and relations based on the introduction of new ‘basic elements’ and combinations thereof. Rather than one graph, many graphs, a proliferant continuance, each engaging in the actualization of new forms of the virtual, and – like Cézanne and his apples – presenting new actualizations of a multiplicity rather than attempting to represent one in total. A new poetics that responds to the matheme, a poetics of data...

Reference List

Aeschylus. (2008). Persians and other plays (C. Collard, Trans.). New York, NY: Oxford World’s Classics.

Andrejevic, M. (2013). Infoglut: How too much information is changing the way we think and know. New York, NY: Routledge.

Badiou, A. (2006). Being and event (O. Feltham, Trans.). New York, NY: Continuum.

Berners-Lee, T. (1994). W3 future directions. Plenary talk. W3. Retrieved from http://www.w3.org/Talks/WWW94Tim/

Brin, S. (1998). Extracting patterns and relations from the World Wide Web. In Proceedings of the 6th International Conference on Extending Database Technology (EDBT98), Workshop on the Web and Databases: 172-183.

Deleuze, G. (1994). Difference & repetition (P. Patton, Trans.). New York, NY: Columbia University Press.

Deleuze, G. (1991). Bergsonism (H. Tomlinson & B. Habberjam, Trans.). New York, NY: Zone Books.

Deleuze, G. (1990). The logic of sense (C.V. Boundas, Trans.). New York, NY: Columbia University Press.

Deleuze, G. (1983). Nietzsche & philosophy (H. Tomlinson, Trans.). New York, NY: Columbia University Press.

Deleuze, G. and Guattari, F. (1987). Capitalism and schizophrenia: A thousand plateaus (B. Massumi, Trans.). Minneapolis, MN: University of Minnesota Press.

Graph databases for beginners: NoSQL databases. neo4j. Retrieved from http://neo4j.com/blog/why-nosql-databases/

Gruber, T. (1993). A translation approach to portable ontologies. Knowledge Acquisition, 5(2): 199-220.

Hacking, I. (1982). Biopower and the avalanche of printed numbers. Humanities in Society, 5: 179-195.

Halevy, A., Norvig, P. and Pereira, F. (2009). The unreasonable effectiveness of data. Intelligent Systems, IEEE 24(2): 8-12.

Introna, L.D. and Nissenbaum, H. (2000). Shaping the web: Why the politics of search matters. The Information Society 16: 169-185. p. 171.

James, W. (2003). Gilles Deleuze’s difference and repetition: A critical introduction and guide. Edinburgh, UK: Edinburgh University Press.

Malewicz, G., Austern, M. H., Bik, A. J., Dehnert, J. C., Horn, I., Leiser, N., & Czajkowski, G. (2010, June). Pregel: A system for large-scale graph processing. In Proceedings of the 2010 ACM SIGMOD International Conference on Management of data (pp. 135-146). ACM.

Monea, A. (Forthcoming). Graph force: Rhetorical machines and the n-arization of knowledge. Computational Culture.

Nussbaum, M.C. (1979). Eleatic Conventionalism and Philolaus on the conditions of thought. Harvard Studies in Classical Philology, 83, 63-108.

Paşca, M. (2007, May). Organizing and searching the World Wide Web of facts – step two: Harnessing the wisdom of the crowds. In Proceedings of the 16th international conference on World Wide Web (pp. 101-110). ACM.

Rancière, J. (1999). Disagreement: Politics and philosophy (J. Rose, Trans.). Minneapolis, MN: University of Minnesota Press.

Rancière, J. (2010). Dissensus: On politics and aesthetics (S. Corcoran, Trans.). New York, NY: Continuum.

Singhal, A. (2012, May 16). Introducing the Knowledge Graph: Things, not strings. Google: Official Blog. Retrieved from http://googleblog.blogspot.com/2012/05/introducing-knowledge-graph-things-not.html
