Tuesday, March 24, 2009

TNP: 20 Years On

In 1989 I was reaching the peak of my career. My PhD coursework was complete and behind me, I was gainfully employed (if underpaid) teaching logic and philosophy for the U of A and Athabasca University, I was elected to my first term as president of the Graduate Students' Association, and I was riding a wave of personal and political popularity.

More importantly for me, I was finally understanding the problems that had drawn me to formal learning in the first place. Though I had started university simply because it was a requirement for advancement in the work world, over the years I had been drawn increasingly to political activism and philosophical exploration. In 1989, the pieces came together. I watched the rise of 'people power' around the world. I had seen Francisco Varela speak on AIDS and immunology at the University of Alberta hospital. I began to see how networks, whether of individuals or of cells, could take shape, form patterns, act with purpose. And how this would reshape how we understood the world.

In 1990 I attended, along with a number of the other graduate students at the University of Alberta, the Connectionism conference at Simon Fraser (downtown), combining it with a National Graduate Council meeting and a week-long vacation in New Westminster I spent reinterpreting the Tao Te Ching. That summer I sat at the very top of the hill at the Edmonton Folk Festival and in a frenzy of writing, completed the first draft of what would eventually become The Network Phenomenon: Empiricism and the New Connectionism. In the fall of that year, I presented it to my doctoral committee as a proposal for the work I wished to do to complete my PhD.

It has been almost 20 years, and I thought I had put it behind me, but recently I see that my former supervisor, and chair of that committee, is now one of the people blogging on a philosophy website.

Now you can read the proposal for yourself - that's why I put it online. Having just retyped it (I'll use OCR for my other work, but I wanted to revisit this paper personally) I can see that it is an overly ambitious work covering a wide swath of theory and evidence. As a proposal, it also lacks a lot of the depth and research that one would want of a completed dissertation. Yet, still, 20 years later, the paper strikes me as genuine, original, and important. A dissertation based on this work - or even just a chapter of this work, which might have been more appropriate - would have been a worthwhile contribution to the field.

The committee didn't see it that way. Led by the chair, they engaged in an attack on the basic premises of the work, of the idea of associationist forms of reasoning and connectionist models of cognition. The idea that cognition could be non-propositional, the idea that proof would proceed by metaphor and similarity, rather than form and validity, they rejected as ridiculous. For good measure they offered the opinion that even if the work were worthwhile, it would be well beyond my capability. The committee felt that my PhD would be better spent in an investigation of mental content - something I had denied in the paper even existed! - rather than this fool's errand.

I submitted a dissertation proposal based on mental content a couple of weeks later, a 30-page overview of the field they were quite enthused about. But my heart had fallen out of the project. I wrote for myself a long diatribe attacking a book the committee was enthusiastically recommending, Jerry Fodor's Psychosemantics, called "Trash Fodor" (when I find it, I'll post it). I thought the book represented the epitome of the inanity of the cognitivist approach. I gradually turned my back on the program and on philosophy in general. I retreated to my little cabin in northern Alberta, taught logic, and worked on my computer.

I have never forgotten - or stopped believing - the work I presented in that paper. About five years later I began writing again - you can see it as the beginning of the work on Stephen's Web - and began rebuilding my understanding of learning, inference and discovery. My work continued to be informed by my understanding of connectionism, people power, the Tao, and related concepts. The structure of content networks, the organization of metadata, and my description of connective knowledge, all are based on this basic foundation.

I struggle every day with the question of whether my work is genuine, original and important, whether, indeed, it is even academically and scientifically sound. I look at the work of others - like Varela's, for example - and I am daunted and humbled. But such work, too, is rare. And what I leave behind is so different in format and method, in style and structure, that a comparison is probably impossible. The best I can do is to work as honestly and as openly as possible, consistent in my practice and my principles.

So when I realized how angry I was, even these many years later, I concluded that the best - and only - response would be to put the material into the open, and let people decide for themselves. Because there's a certain sense in which I feel I have missed out. And I'm sure some people will find it trivial and others obscure, some will find it too dense and others too simplistic, some will see in it a naive foray into amateur epistemology while others will see it as part of a wider discipline. Some will think I should have been able to complete my PhD, while others will question whether I have any academic merit at all. And I - well, I will see it as mine. As me.

Being angry was cathartic, because it made me see what I've had to come through, and I'm over it now.

You know, in life, you have certain kinds of regrets. One kind of regret revolves around the opportunities you never had - what if I had had better schools, better teachers, better jobs, better finances. What if I had been treated fairly here, rewarded justly there, shown this in that place. Things I could never be, places I could never go. These are regrets over things I cannot control. But the other kind of regret - ah. The regret of a man who was not true to himself, who did not give his all, who held himself back or conformed for the sake of advancement, of the man who stopped seeking because he was told what to believe: these are the regrets I could not bear to feel.

I guess I had a choice, back in 1990, about which kind of regret I would feel 20 years later. I do not, for an instant, think I made the wrong choice.

Monday, March 23, 2009

TNP 11. Projects and Investigations

The Network Phenomenon: Empiricism and the New Connectionism
Stephen Downes, 1990
(The whole document in MS-Word)


XI Projects and Investigations

A. Computational Difficulties

Let me now conclude this paper by outlining a number of areas of further investigation which ought to be pursued in order to give the theory its fullest and most useful presentation. These areas divide into two distinct categories: conceptual difficulties and computational difficulties. Let me outline some computational difficulties first.

By computational difficulties I mean aspects of the implementation of connectionist theories on computers. Viewing connectionism within a philosophical framework raises a number of concerns, and some additional features are required. What I would like to do here is actually build a connectionist system, written in the C programming language and intended for application on an IBM XT compatible or clone. The large number of variations, for example, different learning rules, will be incorporated as options in my own system. This system will fill a void in the market: an easy-to-use connectionist system which costs less than $1,000.

Having developed a connectionist system (which I'll call SDPDP) I want first to look at network variability in a connectionist system. First of all, I want to construct nets in which different options may be employed in different parts of the same net at the same time. For example, in a PDP net [83] either every unit employs a stochastic on-off activation or every unit is activated in degrees. But in some systems, we want to be able to have units of both varieties. In addition to variable structure, I want to incorporate some mechanisms of network plasticity. For in human systems, not only the connections, but the units themselves grow in response to input, especially in early life. Finally, I want to consider what I call "dimensions". For we want it to be the case that such things as religious conversions and scientific revolutions are possible. This requires that a network be able to construct [alternative] pairs of stable representations at the same time, which may alternate in priority. [84] Each of these alternative representations I call a "dimension".
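The first of these ideas, units with different activation rules coexisting in one net, can be pictured in a small sketch. (This is an illustrative sketch in Python, not the proposed SDPDP system, which was to be written in C; the unit types, weights and inputs are all assumptions made for the example.)

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

class Unit:
    """A network unit whose activation rule is chosen per unit,
    so stochastic and graded units can coexist in the same net."""
    def __init__(self, kind, rng=None):
        assert kind in ("stochastic", "graded")
        self.kind = kind
        self.rng = rng or random.Random(0)
        self.activation = 0.0

    def update(self, net_input):
        p = sigmoid(net_input)
        if self.kind == "stochastic":
            # binary on/off, with probability given by the logistic function
            self.activation = 1.0 if self.rng.random() < p else 0.0
        else:
            # graded: activation varies continuously in (0, 1)
            self.activation = p
        return self.activation

# a tiny mixed net: two graded input units feeding one stochastic unit
inputs = [Unit("graded"), Unit("graded")]
output = Unit("stochastic")
weights = [1.5, -0.5]

acts = [u.update(x) for u, x in zip(inputs, [2.0, -1.0])]
net = sum(w * a for w, a in zip(weights, acts))
out = output.update(net)
print(acts, out)
```

The point of the sketch is only that the choice of activation rule is a per-unit parameter rather than a property of the whole net.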

Another computational problem which I wish to consider concerns learning schedules and annealing. Currently, PDP systems employ a system which is very similar to that employed in physics. But, first, it is not clear that an annealing equation which is suitable for thermodynamics is suitable for human brains. I would like to investigate grounds for choosing one, rather than another, annealing equation. Second, it is clear to me that the annealing schedule employed is inadequate. In my view, temperature increases and decreases ought to be cyclic, as in, for example, the patterns of increased brain activity when we sleep. In addition, temperature ought to be sensitive to input, so that we can rapidly process conflicting input.
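A cyclic, input-sensitive schedule of the kind described might look like the following sketch. (The sinusoidal form, the additive conflict term, and all the parameter values are illustrative assumptions, not proposals from the paper.)

```python
import math

def temperature(t, base=1.0, amplitude=0.5, period=100.0, conflict=0.0):
    """Cyclic annealing schedule: temperature oscillates rather than
    decaying monotonically, and rises with the degree of conflict
    detected in the current input."""
    cycle = amplitude * math.sin(2 * math.pi * t / period)
    return max(0.01, base + cycle + conflict)

# temperature rises and falls over one cycle...
temps = [temperature(t) for t in range(0, 101, 25)]
# ...and conflicting input raises it further, allowing rapid re-settling
hot = temperature(25, conflict=1.0)
print(temps, hot)
```

Contrast this with the standard monotonically decreasing schedules borrowed from thermodynamics, which never reheat once the net has settled.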

Finally, there is the hardware itself to think about. Human hardware is much smaller and more complex [than] contemporary computer technology. Perhaps we will not be able to build actual neurons; however, it seems reasonable that, now that we know exactly what we are looking for, we can make some plausible suggestions regarding how to build a computer neuron. I think that it would be best if many of the features currently represented by parameters, for example, threshold or rest values, could be implemented physically.

B. Conceptual Questions

By conceptual questions, I mean investigations into some of the things which connectionism can tell us about epistemology and the philosophy of mind. For, if the arguments concerning rules and categories are sufficiently strong, then we will want to reevaluate such concepts as knowledge and belief. For example, I would like to say that an item of knowledge is a stable pattern of activation, a pattern which tends not to change given varying input. If this is the case, then I may want to say with Feldman that "you do not have a store of knowledge, you are your knowledge." [85] In such a case, then, it becomes necessary to explore what we are and what part of us it is which is our knowledge.
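The proposal that an item of knowledge is a stable pattern of activation can be pictured with a minimal Hopfield-style net, in which a stored pattern is an attractor that re-forms even under varying input. (The net, the pattern, and the update rule are illustrative assumptions; the paper itself does not construct this example.)

```python
def train(patterns):
    """Hebbian weights for a Hopfield-style net: each stored pattern
    becomes a stable state of the network's dynamics."""
    n = len(patterns[0])
    w = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    w[i][j] += p[i] * p[j]
    return w

def settle(w, state, steps=10):
    """Repeatedly update units until the activation pattern stops changing."""
    n = len(state)
    s = list(state)
    for _ in range(steps):
        for i in range(n):
            net = sum(w[i][j] * s[j] for j in range(n))
            s[i] = 1 if net >= 0 else -1
    return s

knowledge = [1, -1, 1, -1, 1, -1]   # a stored pattern of activation
w = train([knowledge])
noisy = [1, 1, 1, -1, 1, -1]        # varying input: one unit flipped
print(settle(w, noisy))              # settles back to the stored pattern
```

In this picture "knowing" is not possessing a stored sentence but being disposed to settle into this pattern, which is the sense of Feldman's remark quoted above.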

In addition, I want to consider questions concerning theoretical and physical parallelism. For example, throughout this paper I have used the terms "neuron" and "unit" roughly equivalently. I have also talked of the advisability of using this or that learning equation according to whether or not humans actually employ (or instantiate) the equation. We need to ask, first, whether or not we should design systems in parallel with human neural structure, and if so, what they would look like, and even further, how we would determine what they would look like.

As another investigation, I want to make some remarks about the nature of knowledge (as opposed to the definition of knowledge). For, if knowledge consists of stable patterns of activation, then we cannot think in terms of knowledge as being sentences which have a given propositional content. It is unclear whether we can assign propositional content to patterns of activation. If that difficulty does arise, then we may want to consider some other relation between that which would serve as content (for example, representations of events in the real world) and patterns of activation. Here, perhaps, one could follow Armstrong and Goldman and assert that there is a causal connection (and distinguish between appropriate and inappropriate causes). In order to successfully defend this approach, it is necessary to give a full account of how we learn about causes.

Yet another investigation concerns consciousness. I have suggested above that there are conscious and unconscious regions of the brain. My belief is that those regions which are conscious are those which correspond to the activation of sensory input areas. In other words, my hearing someone speak a sentence and my thinking in a sentence is an activation of the same set of neurons (or an overlapping set). This solves the problem of how we can have a "stream of consciousness" [86] in a non-linear network. But a much more detailed story is required here.

Finally, it is worth posing the question of whether or not connectionism is a type of scientific revolution, in the Kuhnian sense. Some philosophers, for example Stich and Johnson-Laird, have expressed the opinion that it is not. In my own view, since so many traditional concepts must be overturned, it is a scientific revolution. Having said that, however, I must ask whether or not we are working within an eliminativist paradigm, as suggested by, say, Churchland. In my view, there is still a role for words such as "knowledge" and "belief". If I believe this, then first I must explain this role, and then show how this role makes sense within the new paradigm.

C. Other Projects

When I began by asserting that connectionism vindicates empiricism, I embarked on a philosophical enterprise. What followed has been primarily technical and non-philosophical. I would like to return to a connectionist treatment of some philosophical issues.

For example, some contemporary philosophers [87] advocate a form of nominalism. While the philosophical debates concerning realism and nominalism are peripheral to this project, it is still the case that connectionism, if successful, should shed some light in this direction. I assume that it would support a form of nominalism, but this should be more fully explained.

Another project of a philosophical nature concerns the foundationalism-coherentism debate. If we employ relevant similarity instead of truth-preservation as a means of evaluating inference, then the traditional concept of justification, if it must not be abandoned altogether, must be radically altered. This sheds a completely new light on the traditional problem and is worth investigating.


[83] Rumelhart and McClelland, Explorations.

[84] For example, we may switch back and forth between views of a Necker Cube.

[85] J.A. Feldman, "A Connectionist Model of Visual Memory", in Hinton and Anderson (eds.), Parallel Models of Associative Memory, p. 51.

[86] See William James, The Principles of Psychology, p. 279.

[87] Like Nelson Goodman.

TNP 10. Summary

The Network Phenomenon: Empiricism and the New Connectionism
Stephen Downes, 1990
(The whole document in MS-Word)


Part X Summary

This concludes the presentation of the theory of learning and cognition which I wish to present. Before describing some of the further avenues of investigation I wish to pursue, let me summarize what I have asserted to this point in this paper.

I began by proposing a new theory of learning, connectionism, and described some prima facie objections to the theory. In order to respond to those objections, I argued that we need to reconsider some paradigms concerning rules and categories. Then I developed connectionism as an alternative theory of rules and categories. On this new theory, any concept is represented as a pattern of activations in a network of interconnected units. A category, on this view, is represented by a unit which can be activated, and the members of the category are the concepts whose connections activate the unit which represents the category. Connectionist networks not only store categories in this manner, they can learn them on their own. In order to develop this idea further, I examined a number of objections to the concept of distributed representation. To meet these objections, I described how patterns are developed from perceptual input and described and defended the "picture" theory of representation.
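The category mechanism summarized here can be pictured in a toy sketch: a single category unit whose activation by a concept's feature pattern marks membership. (The 'bird' example, the feature names, weights and threshold are all hypothetical values chosen for illustration.)

```python
def category_activation(weights, concept_pattern, threshold=0.5):
    """A category as a single unit: a concept 'belongs' to the
    category when its activation pattern drives the category unit
    past threshold."""
    net = sum(w * a for w, a in zip(weights, concept_pattern))
    return net >= threshold

# hypothetical feature weights for a category unit 'bird'
bird_weights = [0.6, 0.6, -0.8]   # has-feathers, flies, has-fur

robin   = [1.0, 1.0, 0.0]
penguin = [1.0, 0.0, 0.0]
dog     = [0.0, 0.0, 1.0]

print([category_activation(bird_weights, c) for c in [robin, penguin, dog]])
# → [True, True, False]
```

Note that the penguin, a non-prototypical member, still activates the unit; membership here is a matter of weighted connections, not of satisfying a rule.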

I then turned to considering detailed objections to associationism and connectionism. These divided into problems concerning distributed representation, problems concerning perception, and problems concerning associative mechanisms in general. In order to defend against problems of distributed representation, three types of patterns of connectivity were identified and the concept of similarity was defined in terms of activation vectors. In order to develop a theory of perception, I defined perception as input activations and two types of perception, conscious and real perception, were identified. This successfully explained theory-ladenness and the development of three-dimensional representations without the requirement of a priori or innate knowledge. Finally, a number of arguments against associationism were considered. In order to show that associationist and connectionist systems can perform higher level cognitive functions, I argued that a two-stage process is employed. First, prototypical representations are constructed, and then second, these are used to support inference by analogy or metaphor. This is in turn supported by the observation that such processes [can] be viewed as operations. Finally, I considered the problem of the evaluation of models and inferences in connectionist systems, and argued that we should employ the concept of relevant similarity.


Sunday, March 22, 2009

More on New Knowledge

Responding to Tony Bates, Bates and Downes on new knowledge: Round 3

You say > However, I don’t believe the distinction between ‘academic’ knowledge and ‘applied’ knowledge is particularly useful.

Here we agree.

You say > What is useful is a distinction between academic and non-academic knowledge, as measured by the values or propositions that underpin each kind of knowledge.

Here we disagree.

First, I'm not sure you can make the distinction stick.

Second, even if you make the distinction stick, then so much the worse for academic knowledge, because the values or propositions that underpin academic method are unsound.

You say academic method > AIMS for deep understanding, general principles, empirically-based theories, timelessness, etc

Yes. But it shouldn't. That's my point.

You say > Academic knowledge is not perfect, but does have value because of the standards it requires.

This is a statement deserving of more discussion, because I think that either academics have lost track of the standards, being devoted to process over rigor, or that the standards adhered to are in fact no guarantor of worthwhile results.

You say > I also agree with Stephen that knowledge is not just ’stuff’, as Jane Gilbert puts it, but is dynamic. However, I also believe that knowledge is also not just ‘flow’.

It is neither 'stuff' nor 'flow', in my view. I explicitly reject both views in my post and in the comment that follows.

As I wrote:

"The central tenet of emergence theory is that even if stuff flows from entity to entity, that stuff is not knowledge; knowledge, rather, is something that 'emerges' from the activity of the system as a whole.

"This network - and subnets within the network (aka 'patterns of connectivity') - may be depicted as knowledge...

"A second way of representing knowledge, and one that I embrace in addition to the first for a variety of reasons, is that patterns of connectivity can be recognized or interpreted as salient by a perceiver."

The reason why this depiction is important is that knowledge, on this view, is *not* "deep understanding, general principles, empirically-based theories, timelessness, etc."

So whatever it is that academic method is aiming for, it is not knowledge.

This is a key point of contention between us:

You write > at some point each person does settle, if only for a brief time, on what they think knowledge to be. At this point it does become ’stuff’ or content. I still contend then that ’stuff’ or content does matter, though recognising that what we do with the stuff is even more important.

This I disagree with.

I do describe (following others) 'settling mechanisms' in the brain. We can say that we 'settle'. We can hypothesize, at least, a (thermodynamically) stable state of connections and activations in the brain.

But the 'entities' in such a system (if we can call them that) that constitute 'knowledge' do NOT have the properties of 'stuff' or 'content'. This is the key and fundamental point of my argument:

Not 'stuff' - not discrete, not localized, not atomic
Not 'content' - not semantical, not propositional, not symbolic

And that's my problem with academic method. It seeks out specifically propositions - symbolic or semantical - that are discrete, localized and atomic. Things that are _candidates_ for deep understanding, general principles, empirically-based theories, timelessness.

I think that maybe if we can untangle the vocabulary we might come to agreement on this. After all,

You say > this is likely to result in a shift in knowledge that may be very important, and it is in this area where I think Stephen and I may have some agreement.

This encourages me.

Skipping ahead quite a bit...

You write > My concern about much of the discussion of the ‘new’ knowledge is that it seems to depend on what I might call majority voting - it is the number of hits that matter, not the quality of the content.

Quite so.

Voting - and counting generally - record only the mass of a thing. They require some sort of identity (in order to identify that which is being counted).

This is distinct from the type of knowledge I have been trying to describe, which depends not on the quantity of things assembled, but on the way those things are interconnected.

This is what I have tried to clarify with the distinction between 'groups' and 'networks'. http://www.downes.ca/post/42521

The properties found in the group are (to my way of seeing) just those embraced by what we have been calling the academic method. If you look at the diagram http://www.flickr.com/photos/stephen_downes/252157734/ you see typical academic values: unity (of purpose, of workers, of science), coordination, closed systems, distributive (expert-based) knowledge.

Knowledge based on networks is not based on counting - not on votes, on surveys, on mass, on category or type, etc. because knowledge is not the sort of thing that can be counted, not the sort of thing that can be generalized (as a mass).

The objection to voting *is* an objection to academic method.

The new knowledge is precisely *not* knowledge by counting, knowledge by popularity.

But it's not knowledge by experts either. Because if we say that knowledge is based on experts and expertise, then we are saying that knowledge is the 'stuff' that's in people's heads that goes from place to place. Which - again - it isn't.

Now it is reasonable to disagree with my position on knowledge, but it's important to recognize that 'network knowledge' isn't based on counting or popularity - no matter how much this is emphasized by the (popular) media.


> Lastly, Stephen was puzzled as to why I felt a blog was not the best way to discuss this issue. What I feel the topic needs is more space and time, and a critique from philosophers would also add to the discussion, I am sure, because I do not have specialist knowledge or training in epistemology. I would like to have had more time to review other writers on this topic, and more space to elaborate my views. I feel that I could do a better job that way.

Well - take all the time and space you need. Neither is in short supply on blogs.

Indeed - and this is one thing I like - you can go back over again, return to the same point again, attack it from various angles - a whole range of things you can't really strive for in any other forum.

> It was not because I needed the discussion to be academically reviewed in the way that journals are reviewed

Good. Because if we were restricted by reviewers, we could never be having this discussion. Which would be a pity.

TNP 9. Connectionism and Justification

The Network Phenomenon: Empiricism and the New Connectionism
Stephen Downes, 1990
(The whole document in MS-Word)


IX. Connectionism and Justification

A. When Some Connections Are Better Than Others

An objection exactly analogous to the objection to operationalism may be brought against connectionism in general. In connectionist systems, anything may be connected with anything else. However, it is clear that there must be some subset of the set of all possible connections such that the connections in this subset are better than the other connections. For example, among the types of connection which are possible, there is a subset of connections which corresponds to logical inference. [74] We want to distinguish these logical connections from those connections which are (for lack of a better term) merely accidental. However, there is no means, from within a strictly connectionist framework, of establishing this distinction. Therefore, connections must be evaluated according to constraints over and above any given connectionist system.

One weakness of the objection just stated is that there is no clear agreement regarding what constitutes the proper constraints for such an evaluation. Suppose, for example, we are attempting to parse a sentence in order to determine its meaning. According to some philosophers, for example, Fodor, this task may be accomplished with reference to grammar, that is, rules and structure. Others, for example Winograd, argue that semantical considerations sometimes need to be taken into account. It is also reasonable to argue that the meaning of a sentence can only be determined with respect to pragmatic, or context-dependent, constraints.

Similarly, in the philosophy of science, there is no clear agreement regarding what constitutes a good scientific theory. Some philosophers, for example van Fraassen, argue that theories ought to be evaluated according to their empirical adequacy. Others, such as Hooker, argue that "epistemic virtues" such as simplicity and coherence are what guide the evaluation of a theory. According to many philosophers, most prominent among them being Popper, a scientific theory ought to be testable, but this does not stop some theorists, for example von Däniken, from proposing untestable theories. And finally, some philosophers follow Feyerabend and assert that there are no standards of goodness for scientific theories.

These examples may appear to be out of place on the ground that, in the formal disciplines, there are clear standards for the evaluation of operations. In logic, we have the constraint of truth-preservation: specifically, an inference is valid if and only if it preserves truth, and is invalid otherwise. In mathematical equations, similarly, an operation is correct if and only if it preserves equivalence, and incorrect otherwise. Therefore, if a connectionist system cannot distinguish between, say, truth-preserving and non-truth-preserving operations, then the system must be guided by some set of constraints over and above itself, that is to say, specifically, it must be guided by innate constraints. There are several examples in the literature of this sort of consideration. Fodor [75] criticizes the "picture" theory of representation on similar grounds, and Holland (et al.) [76] build such constraints into their system of inductive inference.

The idea here is that in any representation, there will be representational content. Representational content may be more or less representative of what it represents. For example, if the representation is propositional in form, then the proposition will be either true or false according to whether whatever is asserted by the proposition is in fact the case. The criticism, therefore, of connectionist systems is that there is no means of evaluating connections such that it can be determined that their representational content corresponds, or does not correspond, to whatever happens to be the case. [77]

If I were to use the general response to objections outlined above, then if this were an item of knowledge, I would deny it, and if it were a skill or capacity, I would explain how it can be accomplished using an associationist (connectionist) mechanism. However, truth does not appear to fall under either category, and hence needs a special discussion of its own.

B. Truth

Let me examine the concept of truth more closely. The standard, naive definition of truth is correspondence with reality, for example, a proposition P is true if and only if P. This definition of truth is inadequate because there are many propositions which are true, for example, predictions and other subjunctive conditionals, or statements about possibility, to which by definition nothing in the world corresponds. A better definition of truth is provided by Tarski: P is true if and only if it corresponds with a model of the world.

But this is a different definition of truth than the definition of truth which is considered to apply in formal inferences, for in this case, we are talking about truth-preservation and not truth per se. A logical inference is valid strictly according to its form; the world is not a factor to be taken into consideration. Thus, the claim that logical inferences are truth-preserving by itself has nothing to do with the nature of the world or models of the world. An additional link - between truth-preservation and correspondence - must be established independently. For, without such a link, truth-preservation by itself is no virtue. It must be shown that truth-preservation is a good means of constructing inferences about the world or about models of the world.

For a certain set of inferences, we can concede that this is the case. Take, for example, an inference about points on a journey. If x arrived at A before B and x arrived at B before C, then the rules of truth-preservation tell us that x arrived at A before C. This inference is confirmed by observation. It is however by no means clear that the rules of truth-preservation always apply when we are talking about the world. First, there is no reason to believe that these laws actually apply to the real world or even to models of the real world (unless the models are governed by an a priori stipulation that they must adhere to such rules, in which case holding up the model as an example is a fancy way of begging the question). [78] And second, it is clear that we want to make many [other] inferences about the world or models of the world, for example, inductive inferences, for which the rule of truth-preservation [is] of little or no use. Therefore, in at least some cases, something other than the rule of truth-preservation must be employed in order to evaluate our inference.

This is an important criticism of the objection to connectionism and associationism. In response to the objection that connectionist systems cannot provide an evaluation of this or that representation, the response is that traditional systems fare no better, or at least, are only a very slight improvement.

C. Relevant Similarity

Opposed to the concept of truth as our standard of evaluation, I wish to propose the standard of "relevant similarity". This standard has a number of advantages. First, it works, in the sense that successful inferences can be distinguished from unsuccessful inferences using relevant similarity. Second, in order to employ relevant similarity, no innate or a priori constraints are required. We know this because systems which naturally employ relevant similarity, connectionist and associationist systems, require no innate or a priori constraints. And third, the standard of relevant similarity is extremely powerful. For example, inferences may be evaluated directly according to relevant similarity; for example, the sample of a generalization must be relevantly similar to the whole. Or at another level, an inference may be evaluated according to whether or not its form (that is, some abstraction of the inference) is relevantly similar to previously successful inferences. Let me sketch these in a bit more detail.

Consider the typical inductive inference. The premises consist of a set of instances of some phenomenon or state of affairs, for example, "A1 is a B", "A2 is a B", etc. The conclusion is either a generalization of these observations, for example, "All A are B", or a prediction about the next instance, for example, "An+1 is a B". Standard textbooks [79] list two major fallacies which can occur in such inferences: hasty generalization, in which too few instances are observed, and unrepresentative sample, in which observations are biased in some way. Both of these fallacies can be explained with reference to relevant similarity. An inductive argument works because the premises and the conclusion all describe similar phenomena; so, if the phenomena described are not sufficiently similar, the inference fails. An unrepresentative sample is significantly different from the sample described in the conclusion; thus, the inference fails. In a hasty generalization, we have not seen enough samples to be sure we have established similarity; hence, the inference fails.

Connectionist systems using relevant similarity for the evaluation of inductive inferences avoid many of the problems which plague standard work in induction. [80] For example, one may ask why we use one particular set of premises, and not some other set of premises. The answer is naturally provided by the clustering mechanism described above. Another problem is the question of how many instances are required before we are able to say we have sufficient grounds to draw a conclusion. This answer is given by the activation value of the abstraction from a given cluster. If that abstraction has a sufficiently high activation value compared to other evaluations, then the inference works. Otherwise, it does not. There is no clear-cut numerical answer to these questions: it will always be relative to the structure of the net as a whole. What connectionism provides, and what traditional theories do not provide, is a mechanism for determining the answer in particular cases, rather than a mechanism which determines one answer for all cases.
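This evaluation mechanism can be sketched in code. In the toy model below, an abstraction is the mean of a cluster of instance vectors, and a generalization is accepted only when its cluster activates the abstraction more strongly than a competing, scattered cluster does. The vectors, numbers, and the cosine measure are my own illustrative assumptions, not drawn from the text.

```python
# Toy sketch: evaluating an inductive inference by the activation value
# of an abstraction over a cluster of observed instances.
# All vectors and measures here are illustrative assumptions.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def norm(u):
    return dot(u, u) ** 0.5

def similarity(u, v):
    # cosine similarity as one possible measure of vector overlap
    return dot(u, v) / (norm(u) * norm(v))

def abstraction(instances):
    # the cluster's abstraction: the componentwise mean of its instances
    n = len(instances)
    return [sum(vec[i] for vec in instances) / n for i in range(len(instances[0]))]

def activation(instances):
    # how strongly the observed instances activate their own abstraction
    proto = abstraction(instances)
    return sum(similarity(v, proto) for v in instances) / len(instances)

# three observed instances of one phenomenon, as feature vectors
ravens = [[1.0, 0.9, 0.1], [0.9, 1.0, 0.2], [1.0, 0.8, 0.0]]
# a competing, scattered cluster that licenses no generalization
mixed = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

# Sufficiency is relative to the rest of the net: the tight cluster's
# activation exceeds the scattered one's, so only its inference "works".
assert activation(ravens) > activation(mixed)
```

Note that, as in the text, no fixed numerical threshold appears: acceptance is a comparison against the alternatives present in the net.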

I have already mentioned a few cases of the second sort of evaluation, that is, an evaluation which asserts that an inference is successful if its form is sufficiently similar to some previously correct or successful inference. So, for example, a person learns modus ponens by being shown examples similar to "If I am in Edmonton then I am in Alberta..." and learns not to deny the antecedent in the same way.

As I mentioned above, a connectionist system will attempt to employ relevant similarity on its own. It does this because such a system tends to adjust connection weights and unit activations until it reaches a stable or "rest" position. The exact nature of this rest position depends to some degree on how the system is constructed: change the learning rule and you change the rest position. However, in all cases, the settled state will be one in which all and only those units whose vectors are similar to the input activation will themselves be activated (or as nearly so as possible [81]). I have illustrated how we might develop a rule of transitivity which is useful on journeys from place to place. Similarly, Lakoff suggests that we develop the concept of cause by analogy with human actions.

TNP Part X Next Post

[74] These are described in Rumelhart and McClelland, Parallel Distributed Processing.

[75] In Ned Block, Imagery.

[76] In Holland, Holyoak, Nisbett and Thagard, Induction: Processes of Inference, Learning and Discovery.

[77] Here I am assuming a correspondence definition of truth. Other definitions are available; see, for example, Rescher.

[78] This is very similar to the point made about scientific theories, above, for if a scientific theory is a model of the world, then, as noted, there are innumerable possible ways of building such models.

[79] For example, Jerry Cedarblom and David Paulsen.

[80] See Henry Kyburg, "Recent Work in the Problem of Induction."

[81] See Rumelhart and McClelland on satisfying multiple simultaneous constraints in Explorations in Parallel Distributed Processing, ch. 3.

[82] Philip Kitcher, The Nature of Mathematical Knowledge.

What I Do

As you all know, I work for the Government of Canada, as part of the National Research Council's Institute for Information Technology, Learning and Collaborative Technologies group, a position I have held since November, 2001.

My thanks to the people of Canada for funding this work. As I do from time to time, in the spirit of open and transparent public service, I offer, directly from my yearly performance review, a statement of my 2008 work report and 2009 work objectives. I have reordered the document a bit and ensured that nothing personal or proprietary was included.

Career Aspirations

To be recognized as a leading voice in the field of learning and learning technology, to advance the state of knowledge in these fields, and in particular, to identify and describe new forms of knowledge and learning enabled by, and suggested by, network technologies.

To apply this knowledge to the service of Canadians and of people worldwide, thereby promoting the educational aspirations of all by supporting access to free learning resources and tools, and the fostering of skills and aptitudes that enable people to take advantage of these resources to the greatest degree possible.

Overall research objective

Development of key elements of learning networks infrastructure, including ongoing contributions to the SynergiC3 project, initiation and ramp-up of the PLE project, and integration of video support via the BVC project.

Note that the work in each of the areas below is intensively collaborative, involving close work with people in NRC, in the government of Canada, and in companies and universities around the world.

SynergiC3 (30 percent)

2008: Chaired R&D workgroup and led research in DDRM and Metadata research areas. Ongoing work involved continuing coordination of the research effort, including maintenance of the research plan, reporting on research activities, chairing of R&D WG meetings, and supervision of two co-op students. Significant support and guidance offered to NRC researchers participating in the R&D workgroup. Co-recipient of NRC National Award for this work.

2009 Plan: Chair, R&D workgroup and lead researcher in DDRM and Metadata research areas. Ongoing work will involve continuing coordination of the research effort, including maintenance of the research plan, reporting on research activities, conduct of R&D WG meetings, supervision of students, etc.

Learning Networks (5 percent)

2008: Released prototype gRSShopper software as an instance of a personal learning environment (PLE). This software played a major role in the Connectivism & Connective Knowledge course, offered in cooperation with the University of Manitoba. Received NRC-IIT Award for this work. While the learning networks foundational project will continue, obtained approval and funding for the spin-off Personal Learning Environment (PLE) project.

2009 Plan: This is foundational research. It includes the continuing development of the theory of network learning, as well as the instantiation of that theory in prototype software such as gRSShopper. Related to this work is the presentation of talks on e-learning, co-teaching of a course on Connectivism, etc. Will maintain active membership in IEEE-LTSC, IMS (Common Cartridge), SCC JTC1-SC36.

Personal Learning Environments (25 percent) New

2009 Plan: Project manager for a 2-year research and development project, including all areas of staffing and staff supervision, budget management, project management, planning and development. Staffing and project plan should be completed in calendar 2009, with significant work undertaken and possible release of initial prototypes.

E-Learning Cluster (10 percent)

2008: Offered general support to the E-Learning Center of Excellence concept in New Brunswick, including work supporting the Canadian Forces and its allies, presenting to the Canadian Forces in Cornwall and Fredericton and to U.S. Forces in Fairfax, as well as to the e-learning industry in eastern Canada in general. Provided support for AIF in the form of reviews.

2009 Plan: This involves general support to the E-Learning Center of Excellence concept in New Brunswick, and includes work supporting the Canadian Forces and its allies. Also includes support for local events and cluster-building activities, such as the innovation forum. Also includes support for AIF and other granting bodies.

OLDaily (15 percent)

2008: Continued daily newsletter on the field of online learning. Current subscriptions are 4000 (email), 4000 (RSS), 2000 (web).

2009 Plan: Production and publication of a daily newsletter on the field of online learning.

Broadband Visual Communication (15 percent)

2008: Continued involvement on the BVC Social Analysis committee, lending support to that activity, most especially through discussion and support of members' work. Learned about server-side video management and carried out ongoing research in video-conferencing and event recording, including 12 videoconference or webcast presentations over the course of the year.

2009 Plan: Continued involvement will include two aspects. First, continued membership on the BVC Social Analysis committee will lend support to that activity, most especially through discussion and support to members' work. Second, application of BVC tools, including video production tools to the personal learning environment (PLE) and related projects.

Learning and Collaborative Technologies Group

2008: Ongoing administrative support, including time recording, attending group meetings, attending Koffee Klatches (including a Koffee Klatch presentation), and occasional training.

2009 Plan: Not counted as part of the percentages above, but worth noting, is ongoing support administratively, including time recording, attending group meetings, attending Koffee Klatches, occasional training, including French training (noted below). Additionally, a key activity for 2009 will be the collection of papers and transcripts for the publication of a book or collection of important works.

Saturday, March 21, 2009

TNP 8. Associationism: Inferential Processes

The Network Phenomenon: Empiricism and the New Connectionism
Stephen Downes, 1990
(The whole document in MS-Word)

TNP Part VII Previous Post

VIII. Associationism: Inferential Processes

A. The Structure in Review

Before proceeding to a description of associationist inferential structure, I would like to draw together some of the conclusions from preceding sections in order to outline the structure in which associationist inference occurs.

The computational structure follows a connectionist model. The system consists of interconnected units which are activated or inactivated according to external input and input via connections to each other. Such systems, I have noted, automatically, via various learning mechanisms, perform various associative tasks, for example, generalization. I have suggested that the human brain is actually constructed according to connectionist principles, therefore, the computational structure is actually built into the brain.

At the data level, mental representations are distributed representations, that is, no one unit contains a given representation, but rather, the representation consists of the set of connections between a given unit and a set of other units. This set of connections can be represented by a vector which displays the pattern of connectivity which activates the unit in question. Various representations cluster according to the similarity of their respective vectors, producing abstractions and categories.

External input to the system is entered via the senses. This input consists in the activation of what I have called real input units. This input is processed unconsciously according to connectionist principles and at a certain point we become conscious of this processing. At this point, I describe the set of activations as conscious input. We produce abstractions by processing conscious input. Any input from any sensory modality will consist of a pattern of unit activation. These patterns of activation are the input patterns for the vectors referred to above.

At no point in the system described thus far is anything like a symbol or a sentence expected or required. Categorization and abstraction from external input occurs as a form of subsymbolic processing. The data from which we form categories and abstractions consists not of symbols or sentences, but rather of what may loosely be called pictures or mental images. Mental images, at least at the conscious level, are formed by a conjunction of external input and input from previously formed associations at higher levels.

In all of this processing, no formal rules of inference are expected or required. Abstractions, generalizations and categorizations are formed automatically. One way of describing the process is to say that units with similar vectors will tend to be clustered. The same process can be described in a more complex manner with reference directly and only to the connectionist principles outlined above.

B. Inference by Prototype

Let me now describe the process of inference with reference to an example. Suppose we have constructed a prototype bird (which looks pretty much like a robin). This prototype consists of a unit which is connected to a set of other units, some of which may themselves be prototypes. One of these prototypes, which happens to be strongly connected to the bird prototype, represents "flight".

Now for the inference part. Suppose we have a completely new experience, say, for example, an alien being walks off a spaceship. We see this, and this establishes a certain set of input patterns. The input patterns are such that a reasonable portion of the bird vector is activated (one might say, simplistically, that it looks like a bird). The activation of the bird unit in turn tends to activate all the units to which it is connected (that is, the activation of the bird unit consists in the activation of a partial vector for some other unit, which activates that unit, and which in the end results in the activation of the entire vector). Thus, in association with our perceiving an alien, the unit representing flight is activated. From our seeing an alien which looks like a bird, we have formed the expectation that it can fly.
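A minimal sketch of this prototype inference might look as follows; the units, weights, and threshold are invented for illustration. A partial match to the bird vector activates the bird unit, which in turn activates the connected "flight" unit.

```python
# Toy sketch of inference by prototype: partial activation of the "bird"
# unit spreads along a strong connection to activate "flight".
# Units, weights, and the threshold are all illustrative assumptions.

# connection weights into each unit, keyed by source unit
weights = {
    "bird":   {"feathers": 0.6, "wings": 0.5, "beak": 0.4},
    "flight": {"bird": 0.9},
}

def activate(inputs, threshold=0.5, rounds=2):
    # simple settling: repeatedly sum weighted input into each unit
    act = dict(inputs)
    for _ in range(rounds):
        for unit, incoming in weights.items():
            total = sum(w * act.get(src, 0.0) for src, w in incoming.items())
            act[unit] = 1.0 if total >= threshold else act.get(unit, 0.0)
    return act

# the alien activates part of the bird vector: wings and a beak, no feathers
result = activate({"wings": 1.0, "beak": 1.0})

# a partial match to the prototype is enough to activate "bird",
# and "bird" in turn activates the expectation "flight"
assert result["bird"] == 1.0 and result["flight"] == 1.0
```

If too little of the prototype's vector is matched (say, feathers alone at low strength), neither unit reaches threshold and no expectation is formed, which parallels the inhibition case discussed below.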

There is reasonable evidence that something like this actually occurs. One clear example is the manner in which we stereotype people according to their skin colour or their country of origin. What is happening here is just an instance of inductive inference: from similar phenomena, we expect similar effects. This is not a rule-governed process. The occurrence and reliability of a particular inductive inference depends on the repetition of similar phenomena in previous experience (we have to have seen birds fly) and the particular set of mental representations in a given observer. Some of our previous experiences may inhibit our comparison of the alien with a bird, in which case, we might not form the expectation that it will fly.

For any given experience, various units at various levels of abstraction will be activated or inactivated. These units will be affected not only by the input experience but also by each other. Initial expectations may be revised according to activations at higher levels of abstraction (for example, we may initially expect the alien to be able to fly, but then only later remember that aliens never fly). The process being described here is similar to Pollock's system of prima facie knowledge and defeaters. [62] The difference is, first, we are not working with propositions as our items of knowledge, and second, anything can count as an instance of prima facie knowledge or a defeater, depending on patterns of connectivity and activation values. (But with Pollock, I would say that prima facie knowledge tends to be that produced by sensory input, and that defeaters tend to be produced by abstracted general knowledge.)

When we say that from similar phenomena we expect similar effects, it should be pointed out that this sort of inference need not apply only to cases where "similarity" is conceived to be similar appearance (feel, sight, sound, etc.). We have a much more precise definition of similarity which can be employed here: two concept or representation units, each of which is associated with a particular vector, are similar if and only if their vectors overlap to a sufficient degree. Now I realize that "to a sufficient degree" is rather vague. In any given system, it can be predicted with mathematical certainty what input will activate what concept (that's what computer emulations do). However, there are no general principles which can describe "sufficiency", since there are innumerable ways two units can have similar vectors. See figure 18.

Figure 18. Similarity depends on the range of relevant alternatives.
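One way to make this point concrete is to judge sufficiency relative to the range of alternatives rather than by a fixed cutoff: a unit counts as "similar enough" when its vector overlaps the input more than any competing unit's vector does. The vectors and the overlap measure below are my own illustration.

```python
# Toy sketch: "similar to a sufficient degree" judged relative to the
# alternatives in the net, not by a fixed numerical threshold.
# The binary vectors are illustrative assumptions.

def overlap(u, v):
    # fraction of shared active connections between two binary vectors
    shared = sum(1 for a, b in zip(u, v) if a and b)
    active = sum(1 for a, b in zip(u, v) if a or b)
    return shared / active if active else 0.0

concepts = {
    "bird":  [1, 1, 1, 0, 0],
    "plane": [0, 1, 1, 1, 0],
    "fish":  [1, 0, 0, 0, 1],
}

def most_similar(x):
    # the winning concept is the one whose vector overlaps x the most;
    # "sufficiency" is fixed by the range of relevant alternatives
    return max(concepts, key=lambda c: overlap(concepts[c], x))

alien = [1, 1, 0, 0, 0]
assert most_similar(alien) == "bird"
```

Remove "bird" from the pool of alternatives and the same input would settle on a different concept, which is exactly why no context-free principle of sufficiency is available.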

C. Inferences About Abstract Entities

One of the major stumbling blocks for empirical and associationist theories is the problem of abstract entities. We talk about such unobserved abstracts as time, space, mass and even love, yet since there are no direct observations of such entities, there are no empirical bases for such inferences.

But now we are in a position to explain how humans reason about abstract or unobserved phenomena. Consider, for example, time. A number of linguists have pointed out that humans appear to talk about these entities, and thence, to reason about them, in terms of metaphors. [63] So, for example, we think of time as linear distance, for example, a road. Or we think of time as a resource to be bought, sold, stolen and the like (think of the term "time theft", which is currently in vogue in business journals). We draw conclusions about the nature of time by analogy with the metaphor. So, for example, we might argue that since a journey has a beginning, an end, and a 'line' between them, so does time.

An interesting observation is that these inferences vary from culture to culture. For example, there is no analogue in "undeveloped" cultures to the metaphor of time as a resource. Hence, it is not surprising to see people from such cultures treating time quite differently. Some cultures have never developed the analogy of time as a journey, but rather, identify points in time according to events (even we do this to some degree, for example, "1990 AD"). If our knowledge about time and space were, as Kant suggests, determined a priori, then we should not expect differences in our understanding and reasonings about time. Yet these differences are verified observationally. Therefore, it seems reasonable to conclude that our knowledge about space and time is not a priori knowledge. It must be learned from experience.

One question which arises is the question of why we would develop such concepts in the first place. In order to explain this, I must do a bit of borrowing from the arguments below, but let me sketch for now how this is done. I will proceed by means of an example.

Consider "mass". Mass is unobserved, and indeed, unobservable. There are no direct measurements of mass to be had. Yet mass is central to most of our scientific theories, and it is one of the central concepts not to be tossed aside by Einstein's revision of Newtonian physics. It appears, therefore, that Newton would have had to have intuitively or mystically 'discovered' mass. I think that we can allow that Newton observed such things as force and acceleration. Let me borrow from below, and say he could measure these. [64] By employing Mill's fourth method of induction, he would discover that force and acceleration are proportional. This suggests an equality, so he could borrow from previously established identities the idea that there might be a similar identity at work here. Because he was seeking to establish an identity, he invented a new term, mass, which converts the proportionality to an equation.

The idea is that Newton wanted his equations to 'look like' other successful equations such as those of Kepler and Galileo. In order to accomplish this, he needed to invent a new term. The question remains, of course, where did the invention come from? Computationally, if we compare the vector which represents the proportionality of force and acceleration and the vector which represents, say, some equation from Euclid, there will be a difference. This difference is itself a vector and is determinable by, say, XOR addition or the like. A unit which is activated by this vector becomes, in the first instance, the vector which represents mass. Later, of course, when our understanding of mass becomes enhanced by other experiments, other scientists represent mass with quite different vectors.

This last remark is an important point. There is no one vector which represents abstracts such as time, mass, love and the like. Rather, each individual human may represent these abstracts in quite different ways, depending on the metaphors available. If these concepts were innate, then we would not expect people to have such differing concepts. Whether or not people do have different understandings of time, space and the like is empirically measurable. Therefore, again, there is a means of confirming empirically this theory as compared to innateness theories.
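The vector-difference construction described above can be illustrated in code. For binary vectors, XOR isolates exactly the components on which two vectors disagree; the vectors below are invented for illustration and stand in for the proportionality and the exemplar equation.

```python
# Toy sketch: a new concept unit as the *difference* between two vectors.
# For binary vectors, XOR picks out the components on which an observed
# proportionality and a previously learned equation disagree.
# All vectors here are invented for illustration.

def xor(u, v):
    return [a ^ b for a, b in zip(u, v)]

# vector representing the observed proportionality of force and acceleration
proportionality = [1, 1, 0, 1, 0]
# vector representing a previously successful equation (say, one of Kepler's)
exemplar_equation = [1, 1, 1, 0, 0]

# the difference vector: what the proportionality lacks to "look like"
# a proper equation; a unit activated by it plays the role of "mass"
mass = xor(proportionality, exemplar_equation)
assert mass == [0, 0, 1, 1, 0]
```

Note that XOR is its own inverse: combining the new "mass" vector with the exemplar recovers the original proportionality, so the invented term genuinely converts the one into the other. Different exemplars would yield different "mass" vectors, which is the point made in the paragraph above.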

D. Grammar, Mathematics, and Formal Inference

The three systems of grammar, mathematics and formal inference have in common the fact that they are characterized according to a set of formal rules in which abstract terms stand for well-formed formulae, terms, and the like. Here I am thinking of such diverse examples as Bever, Fodor and Garrett's "mirror image" language, modus ponens, transformation rules (which require an abstract "trace" to keep track of transformations), x+y = y+x, and the like. Since all of these systems employ abstract terms, they pose a challenge for empirical and associationist theories.

It is possible to construct abstracts in a connectionist system, as I have shown above. These abstractions are useful when we want to describe formal systems. It is quite a different matter, however, to assert that we actually employ rules containing these abstract entities when we speak, reason or add. I suggest that we do not. Rather, a "correct" logical inference or a "grammatical" sentence, say, is a phenomenon which is sufficiently similar to some or another "exemplar" (as I call it) or prototype of such phenomena. Again, let me give an example in order to illustrate.

Suppose we want to teach people basic propositional logic. Either we show them a set of examples and say that inferences like these are good inferences, or we teach them the rules of inference and how to apply them. So if we want to teach, say, modus ponens, then either we give students a set of examples such as "If I am in Edmonton then I am in Alberta; I am in Edmonton; thus, I am in Alberta", or we give them the logical form "If A then B; A; thus B". According to what I suggest, we employ the former method, not the latter. The use of rules alone is insufficient to teach propositional logic; no logic text is or could be written without examples. Thus, the examples are used to teach propositional logic.
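On this exemplar view, recognizing a good inference is a matter of matching its abstracted form against previously seen good examples, not consulting a rule. A toy sketch follows; the encoding of sentences as tuples of atoms, and the treatment of negation as a distinct atom, are my own simplifying assumptions.

```python
# Toy sketch: a new inference is judged "good" if its abstracted form
# matches the form of a previously seen good example, rather than by
# applying an explicit rule. Encodings and examples are illustrative.

def form(premises, conclusion):
    # abstract the inference by replacing its atomic sentences with
    # placeholders, in order of first appearance
    names = {}
    def abstract(sentence):
        for atom in sentence:
            names.setdefault(atom, f"P{len(names)}")
        return tuple(names[a] for a in sentence)
    return tuple(abstract(p) for p in premises) + (abstract(conclusion),)

# a good exemplar: "If Edmonton then Alberta; Edmonton; thus Alberta"
exemplar = form([("edmonton", "alberta"), ("edmonton",)], ("alberta",))

# a new inference with the same shape is accepted by similarity of form
candidate = form([("rain", "wet"), ("rain",)], ("wet",))
assert candidate == exemplar

# denying the antecedent (negation crudely modelled as a distinct atom)
# abstracts to a different form, so it fails to match the exemplar
bad = form([("rain", "wet"), ("not_rain",)], ("not_wet",))
assert bad != exemplar
```

A fuller model would match by graded similarity of form rather than exact identity, in line with the relevant-similarity standard introduced earlier.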

I argue that a person learns grammar in a similar fashion. A person is shown instances of correct sentences. Then, when she attempts to construct sentences of her own, she attempts to emulate what she has been shown. A particular sentence is constructed by the activation of several types of units, in particular, units which represent exemplar sentences, and units which represent concepts to be represented in the new sentence. Such behaviour looks like rule-based performance, but that is because the new sentence will be similar to the old sentence.

This is a theory which can be tested empirically. In a population of students with similar skills, one group could be taught logic via rules and substitutions, while another could be taught by examples. If this theory is correct, then the group using examples should demonstrate better performance. Holland et al. describe a series of experiments in which rule-based learning is compared to example-based learning. [65] Their findings are that persons who are subjected to example-based learning do about as well as persons given only rules. The best results are obtained by a combination of the two methods. [66] In my opinion, their results are not conclusive. They use as subjects college students who (presumably) have been exposed to abstract reasoning. In such cases, the rules themselves can function as prototypes. This occurs in people who are used to working with symbolic notation, for example, students with a substantial computer science or mathematics background. Further experimentation with more neutral subjects would be useful.

Connectionist systems can be shown to learn by example. In one instance, a network was trained to predict the order of words in a sentence by having been given examples of correct sentences. [67] The idea here is that different types of words, for example, nouns, verbs, and so on, are used in different contexts. A given class of words, say, a noun, will be used in similar contexts. Words are clustered according to the similarity of the contexts in which they appear. Clustering is described above. When a similar context appears in the future, a pool of words is available for use. This pool consists of words which tend to be employed in similar contexts. Selection of the exact word may depend on broader constraints, for example, visual input.
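A crude version of this context-clustering can be sketched as follows. The tiny corpus and the one-word neighbourhood window are illustrative assumptions of mine, not Elman's actual recurrent network; the point is only that words used in similar contexts come to resemble one another.

```python
# Toy sketch of learning word classes by example: words are clustered
# by the contexts in which they occur in training sentences.
# The corpus and the one-word window are illustrative assumptions.

from collections import defaultdict

corpus = [
    "the dog chased the cat",
    "the cat chased the dog",
    "the dog saw the bird",
    "the bird saw the cat",
]

# build a context vector for each word: counts of neighbouring words
contexts = defaultdict(lambda: defaultdict(int))
for sentence in corpus:
    words = sentence.split()
    for i, w in enumerate(words):
        for j in (i - 1, i + 1):
            if 0 <= j < len(words):
                contexts[w][words[j]] += 1

def similarity(a, b):
    # overlap of context counts: how many neighbour occurrences are shared
    return sum(min(contexts[a][k], contexts[b][k]) for k in contexts[a])

# nouns occur in similar contexts, so they cluster together:
# "dog" resembles "cat" more than it resembles the verb "chased"
assert similarity("dog", "cat") > similarity("dog", "chased")
```

When a similar context appears later, the words in this cluster form the pool of candidates for the next position, as described in the paragraph above.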

Let me emphasize that while appropriate word selection may look like rule-based behaviour, it is not necessarily rule-based behaviour, and in connectionist machines it is certainly not rule-based behaviour. As Johnson-Laird writes, "what evidence is there for mental representation of explicit grammatical rules? The answer is: people produce and understand sentences that appear to follow such rules, and their judgments about sentences appear to be governed by them. But that is all. What is left open is the possibility that formal rules of grammar are not to be found inside the head, just as formal rules of logic are not to be found there." [68]

I would like to suggest at this point that the theory that people learn formal systems by exemplar provides a solution to the Bever-Fodor-Garrett problem described above. Recall that the problem was to explain how people can determine whether or not a given string of letters is a wff in a mirror-image language. The problem for the empirical or connectionist approach was that, in order to explain how this is done, it appeared necessary to postulate that people follow a set of rules containing abstract entities. Yet associationism (which, of course, is characteristic of empirical and connectionist systems) is constrained by the "terminal meta-postulate", which stipulates that no term not used in the description of the input can be used in the description of the rule.

It is possible merely to deny the postulate and construct a finite-state algorithm, as Anderson and Bower have done. [69] In such a case it would be necessary to construct abstracts from partial vectors as described above. However, it is much more natural and direct to use examples of mirror-image languages to teach a connectionist system. This would be an interesting test for connectionism (and if it worked, a conclusive refutation of the problem). But I do not believe it will be that simple.

Recall how Bever, Fodor and Garrett introduced the language: it is a mirror-image language. When they introduce the language in this way, they call to the reader's mind past recollections of mirrors and how they work. While the language does not, in a technical sense, preserve mirror images (the letters are not reversed), there is in a sense an analogy between the performance of mirrors and wffs in the language. In order to adequately test a connectionist system, this information would have to be provided. Clearly, this would be a complex problem. Let me suggest, however, in the absence of an experiment, that there is no a priori reason why a connectionist system, given the relevant information, could not solve this problem.
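For reference, the formal rule that a connectionist system would have to learn from examples can be stated directly. This reading of the mirror-image language, where a wff is a string followed by its own reversal, is my reconstruction of the Bever-Fodor-Garrett example, not their exact formulation.

```python
# Toy sketch of the formal rule behind the mirror-image language:
# a string is a wff just in case its second half mirrors its first.
# (This reading of the language is my own reconstruction.)

def is_wff(s):
    # even length, and the second half is the first half reversed
    if len(s) % 2 != 0:
        return False
    half = len(s) // 2
    return s[half:] == s[:half][::-1]

assert is_wff("abba")
assert is_wff("aabbaa")
assert not is_wff("abab")
```

The connectionist claim, of course, is that a system could come to classify such strings correctly from examples alone, without ever representing a rule of this explicit form.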

In my opinion, this is a problem common to many of the challenges to associationism. It is perhaps true that in a narrowly defined context, no associationist system can solve this or that problem. But humans do not work in narrowly defined contexts. In order to adequately test a connectionist system, it is necessary to provide the context.

E. Operationalism

The way to think of such diverse behaviours as riding a bicycle, speaking a sentence, or solving mathematical equations is to think of them as learned behaviours, learned from examples and by practice and correction. There is a wealth of literature in diverse areas which makes this same point. Kripke's account of Wittgenstein on rules is explicit about the need for practice and correction. Polanyi represents knowledge as a skill, like riding a bicycle, which can be practiced but not described. Dreyfus and Dreyfus talk about expert knowledge as being, in a manner of speaking, intuitive. Kuhn writes that learning science is not a matter of learning formulae; it's a matter of learning how to solve the problems at the back of the book. Educational and psychological literature standardly speaks of knowledge being "internalized".

What I am proposing here has its similarities to a movement in the philosophy of science called "Operationalism". First clearly formulated by Bridgman, [70] it was modified considerably by Carnap. [71] It is difficult to disentangle early operationalism from some of the Logical Positivist theses with which it is associated, for example, reductionism. In its first formulation, the idea of operationalism is to reduce all physical concepts and terms to operations. What I propose is a modification: all formal concepts and terms should be understood as operations. There are several contemporary versions of operationalism. For example, Kitcher, using Mill's axioms as a starting point, formalizes mathematical knowledge in terms of operations. [72] Similarly, Johnson-Laird describes what he calls a "procedural semantics". [73]

The key objection to operationalism - and indeed, to much of what I am proposing in this paper - was stated by L.J. Russell in his review of Bridgman. Russell noted that scientists often consider one type of operation to be better than another. Therefore, operations are evaluated according to something over and above themselves. A similar criticism could be made of Kitcher's axioms. Consider set theory. According to Kitcher, we define a set according to the operations of grouping or collecting. However, the objection runs, some groupings are better than others. For example, we prefer a grouping which collects ducks, robins and crows to one which collects typewriters, rocks and sheep. Therefore, something over and above any given operation of collecting is employed in order to evaluate that operation.

This is a very general objection to connectionism, and it deserves a section of its own.

TNP Part IX Next Post

[62] Pollock

[63] George Lakoff, Women, Fire and Dangerous Things, surveys these results. See also Lakoff's "Connectionist Semantics" from the Connectionism conference, Simon Fraser University, 1990.

[64] That is, I still need to explain counting.

[65] Induction, pp. 273-279. They cite Cheng, Holyoak, Nisbett, and Oliver (1986), "Pragmatic versus Syntactic Approaches to Training Deductive Reasoning," Cognitive Psychology 16.

[66] Induction, p.276.

[67] Jeff Elman, "Representation in Connectionist Models", Connectionism conference, Simon Fraser University, 1990.

[68] Philip Johnson-Laird, The Computer and the Mind, p. 326.

[69] John Anderson and Gordon Bower, Human Associative Memory, pp. 12-16.

[70] Percy Bridgman, The Logic of Modern Physics.

[71] "The Methodological Character of Theoretical Concepts".

[72] Philip Kitcher, The Nature of Mathematical Knowledge.

[73] The Computer and the Mind.

Friday, March 20, 2009

The New Nature of Knowledge

I have written on various occasions in the past that the nature of knowledge is changing, a premise that is directly addressed - and challenged - by Tony Bates in his blog post, Does technology change the nature of knowledge?

I want to go through his post more or less point by point, not to be annoying, but as necessary in order to unravel a thread of reasoning that, I would argue, leads him astray.

Because, right from the beginning, I think, Bates has an idea that there are different types of writing, and different types of knowledge. He writes, "I should warn you that this is probably not a particularly suitable topic for a blog - an academic paper might be more appropriate to do the subject full justice."

One must ask, right off the bat, what he can mean by that. Because certainly it is not the placement of the body of reasoning into a printed paper and journal-bound form that renders it more appropriate. No, there is a supposition that the type of writing in an "academic paper" is a different type of writing from what he is offering here.

In what way? This begins to be a bit more difficult to pin down. Certainly it is not a matter of references or scholarly ability: Bates's article is filled with both. He is current on the academic literature - much more so than I - and covers his subject with an easy facility. At most, one can suppose it is some matter of the process of academic writing, then? The matter of reviewing and editing? Ah, but no; Bates's blog post could easily fit unedited into almost any journal one cares to name, unless it is a point in principle (and this I have seen) that he reference a particular body of literature that he is not covering here.

To Bates's argument, therefore, I must pose this first challenge: that there is nothing in principle that distinguishes the content of a blog post from that of an academic article. The same content may very well be presented in either, and the difference lies only in how that content is treated: subject to secret review and editing in the one case, and open scrutiny in the other.

Ah - but then, one argues, his case is made: that there is no distinction between knowledge of the past and knowledge of today. No, this is not established: only that the distinction is not one between academic and non-academic writing. The barbarians are not at the gates; they arise from within as well as without.

Bates next captures very nicely the nature of the new sort of knowledge with some astute citations from relevant works in academia: Jane Gilbert, citing Manuel Castells, writes, "knowledge is not an object but a series of networks and flows…the new knowledge is a process not a product…it is produced not in the minds of individuals but in the interactions between people," and Jean-François Lyotard, "the traditional idea that acquiring knowledge trains the mind would become obsolete, as would the idea of knowledge as a set of universal truths. Instead, there will be many truths, many knowledges and many forms of reason."

We see the result, that "the boundaries between traditional disciplines are dissolving, traditional methods of representing knowledge (books, academic papers, and so on) are becoming less important, and the role of traditional academics or experts are undergoing major change," in the graphs that represent the state of knowledge today:


These are points that have been captured in a wide body of writings, from Gibson's depiction of Cyberspace to the perceptron of the 1950s and the connectionist literature of the 1980s to populist works such as Rushkoff's Cyberia and the widely popular Cluetrain Manifesto. It is hard to know where this account originates; everybody (including the academics) acts as though they have discovered it for the first time.

What is important is not who came up with the theory (because we know that what I will say is that the theory is emergent from the works of numerous writers) but rather what the salient points are of the theory. From the work just cited, we can identify three major points (and those who care to look will find those points repeated throughout my own writing):
  1. knowledge is not an object, but a series of flows; it is a process, not a product
  2. it is produced not in the minds of people but in the interactions between people
  3. the idea of acquiring knowledge, as a series of truths, is obsolete
These point to a conception of knowledge dramatically different from the Cartesian foundation or the Platonic form, a conception of knowledge that challenges even the Aristotelian category and the Newtonian law of nature. In particular, what seems to me to be relevant, is that the knowledge thus produced is:
  1. non-propositional, that is, not sharp, definite, precise, expressible in language
  2. non-discrete, that is, not located in any given place or instantiated in any particular form
  3. non-objective, that is, not independent of any given perspective, point of view, or experience
We can discuss - and many people, as varied as Wittgenstein and Derrida, have discussed - how such knowledge assembles (as in a cluster or probability space), flows, inhabits, propagates, and the rest. I will refer to salient features of this type of knowledge in what follows; let's leave the account of it for now.

Bates identifies a singular feature of knowledge as discussed by Gilbert, Castells and Lyotard: "All these authors agree that the ‘new’ knowledge in the knowledge society is about the commercialisation or commodification of knowledge."

We get to this conclusion through an odd route: "'it is defined not through what it is, but through what it can do.’ (Gilbert, p.35). ‘The capacity to own, buy and sell knowledge has contributed, in major ways, to the development of the new, knowledge-based societies.’ (p.39)"

This is an oblique reference to what might be called a functional definition of knowledge, one that has its roots in the philosophical school of functionalism - "what makes something a mental state of a particular type does not depend on its internal constitution, but rather on the way it functions, or the role it plays, in the system of which it is a part" - and this in turn is perhaps derived from the Wittgensteinian doctrine of "meaning as use".

But functionalism is very distinct from commercialism, and it is a great leap to infer from a 'definition' of knowledge based on "what you can do" to an assessment of knowledge as a "commodification" - a move, indeed, that turns the new definition of knowledge on its head, and returns it to the status of object, and in particular, a medium of exchange. The retreat from some account of functionalism, which is more or less accurate, to one of commercialism, is an unjustified turn, and one which should not be accepted without significant dispute.

What would explain it? I would suggest it is explained by the fact that networks of knowledge resemble networks of commerce, that there is a similarity between 'emergent knowledge' and 'the invisible hand of the marketplace', through to the overt endorsement of market logic we see in works such as Surowiecki's The Wisdom of Crowds. But one should not read into the advocacy of a network theory of knowledge (as we have been describing) anything like a market theory of economics, at least (crucially) not to the degree of mistaking a descriptive interpretation for a causal agent.

Return to the definition of knowledge above. It is not an object (or objective), it is not discrete, it is not a causal agent. It is emergent, which means that it exists only by virtue of a process of recognition, as a matter of subjective interpretation. Mistaking a perception of value for 'value' as an objective driver is a classic mistake of market economics (in my view) and certainly a significant misinterpretation of network theories of knowledge.

But Bates has taken that road wholeheartedly: "I have no argument with the point of view that knowledge is the driver of most modern economies, and that this represents a major shift from the ‘old’ industrial economy, where natural resources (coal, oil, iron), machinery and cheap manual labour were the predominant drivers. I do though challenge the idea that knowledge itself has undergone radical changes."

Let us be clear about the view of knowledge that Bates has explicitly endorsed: one in which knowledge has causal efficacy, one where it is a "driver", more similar to objects (like coal or iron) than ephemera (like attitudes and expectations).

Bates then sets up what we have to call, uncharitably but regretfully, the straw man. Skipping the story, we can read: "in education academic knowledge has always been more highly valued in education than ‘everyday’ knowledge. However, in the ‘real’ world, all kinds of knowledge are valued, depending on the context. Thus while values regarding what constitutes ‘important’ knowledge may be changing, this does not mean that knowledge itself is changing."

To be more charitable, what we have here (I would say) is Bates distinguishing between the two types of knowledge according to the different types of uses to which they are put. This has the merit of being consistent with a form of functionalism, and at the same time allowing two different 'types' of knowledge to be (essentially) the same, but applied in different endeavours.

This, though, nonetheless commits two errors:

- first of all, while endorsing a functionalist definition of knowledge, it assumes an as yet undefended essentialist definition of knowledge (because, if functionalism were true, then two items of knowledge which were put to different uses would in fact be two types of knowledge, since function defines typology).

- second, the depiction of knowledge that I have been calling the network account of knowledge is not simply a functionalist theory of knowledge; it has an entirely different ontology in which the former objects, however defined, no longer exist, and something that is non-discrete and non-localized and non-specific is postulated as performing the function we formerly ascribed (mistakenly) to some sort of discrete entity.

Anyhow, having made the distinction between 'academic' and 'commercial' knowledge, Bates will (with reference to Gilbert) expand on the definition of 'academic' knowledge as "‘authoritative, objective, and universal knowledge. It is abstract, rigorous, timeless - and difficult. It is knowledge that goes beyond the here and now knowledge of everyday experience to a higher plane of understanding…..In contrast, applied knowledge is practical knowledge that is produced by putting academic knowledge into practice. It is gained through experience, by trying things out until they work in real-world situations.’"

In fact, this conflates two distinct types of knowledge:
  1. knowledge that is academic, and
  2. knowledge that is abstract, rigorous, timeless
No doubt there are many academics who would will that academic knowledge be abstract, rigorous and timeless, but in fact the argument is that no knowledge has these properties - we thought it did in the past, but this has in fact changed, and is no longer believed to be the case.

This is an important distinction to make because, first, the properties of being abstract, rigorous and timeless characterize what might be called common, practical, or 'folk' knowledge as much as they ever did academic knowledge, and second, what constitutes 'academic' knowledge is (as we see from the diagram near the head of this post) less and less abstract, rigorous and timeless.

This is what makes it possible to claim that the definition of academic knowledge is "too narrow" - much of what is represented as academic knowledge - "engineering, medicine, law, business" - applies academic knowledge, and academic knowledge (at least when well formulated) is "built on experience, traditional crafts, trial-and-error, and quality improvement through continuous minor change built on front-line worker experience."

There was, in the past, no significant distinction between 'academic' knowledge and 'practical' knowledge except where it was applied: and we could see 'abstract, rigorous, timeless' knowledge equally well in the church service, the farmer's field, or the grandmother's advice on weather. Knowledge was, in all cases, timeless wisdom. Such knowledge was power whether applied to engineering feats or to winning at three card brag.

Bates next considers the applicability of academic knowledge. It's a bit difficult to work with the argument now, since we are at such a fundamental divide, but let's consider the proposition: "my other quibble is that ‘academic knowledge’ is implicitly seen in these arguments as not relevant to the knowledge society - it is only applied knowledge now that matters. However - and this is the critical point - it has been the explosion in academic knowledge that has formed the basis of the knowledge society."

This goes to the point that academic knowledge can be used in a practical - even commercial - context, and therefore must not be distinct even functionally. The purpose to which we formerly ascribed only practical knowledge is found to result from academic knowledge (almost to the point of exclusivity): "It was academic development in sciences, medicine and engineering that led to the development of the Internet, biotechnology, digital financial services, computer software and telecommunication, etc. Indeed, it is no co-incidence that those countries most advanced in knowledge-based industries were those that have the highest participation rates in university education."

Leaving aside the question of whether these advances were in fact developed in academia or through some process we might call the academic method, let me focus on the question of the nature of these advances. In all these developments - the internet, biotechnology, and the rest - did academia contribute abstract, rigorous and timeless knowledge? Certainly, there was some point at which it did. Newton's three laws were classical instances of such. The laws of thermodynamics equally so. And even in the last century, Einstein contributed to the paradigm with E=mc². But recently?

I would argue - and this is a matter for empirical investigation - that the research paradigm based on "abstract, rigorous, timeless" knowledge has stalled, and that what researchers have in fact been harvesting over the last few decades is something much more like network knowledge, as I have described it above. This is a distinct form of knowledge that is not based on simple causality, laws of nature, objective perspectives, and the rest. It is (in the words of Polanyi) tacit and ineffable.

The internet is a classic example. While there are protocols, no law governs how computers interact - this is strictly a matter of agreement and individual choice. In biotechnology, scientists are looking at systems and networks in everything from immunology to ecology. Financial services prove to be based on, well, Ponzi schemes rather than anything that might be called 'timeless'. And telecommunications are based on laws that have been known for decades, depending more and more on protocol and agreement, rather than natural law, for improvements.

Indeed, the sorts of knowledge that Bates identifies as important resemble more and more dynamic, interpretive, chaotic types of phenomena - our capacity to, as Rushkoff said, navigate or surf through a dynamic information field, as though it were a gigantic wave (or office block parking garage), rather than an attempt to capture and hold: "it is not just knowledge - both pure and applied - that is important," he says, "but also IT literacy, skills associated with lifelong learning, and attitudes/ethics and social behaviour." But the point is: these are types of knowledge - they are, indeed, the new literacy, 21st century literacy.

The problem is, Bates hasn't let go of the old account of knowledge, the one with abstract, rigorous and timeless truths, knowledge based on objects, the acquisition of content. He writes, "My point is that it is not sufficient just to teach academic content (applied or not)." No, it is not sufficient to teach this type of (old-style) knowledge. It is (arguably) not even necessary. Because what we want are the new skills, based on the new more formless type of knowledge, skills that allow people to get by when nothing is abstract, rigorous, timeless: "the ability to know how to find, analyse, organise and apply information/content within their professional and personal activities, to take responsibility for their own learning, and to be flexible and adaptable in developing new knowledge and skills."

But Bates doesn't admit of this; he explicitly rejects it. "These skills and attitudes may also be seen as knowledge, although I would prefer to distinguish between knowledge and education, and I would see these changes more as changes in education. What is changing then is not necessarily knowledge itself, but our views on what educators need to do to ‘deliver’ knowledge in ways that better serve the needs of society."

This may be the case if, as he suggests, we are simply facing an explosion of new knowledge. But while we are seeing an explosion of content, our stock of abstract, rigorous and timeless truths remains constant - indeed, arguably, it has been on the decline, as we realize more and more that the laws and principles of nature that we took for granted were at best approximations of reality and at worst projections of our own thoughts, values and beliefs on nature (how else does one explain an economic system based on the infinite expansion of capital?).

What we are experiencing is a proliferation of points of view, and with each iteration of points of view it becomes apparent that the former world in which there was only one (authoritative, lawlike and Catholic) point of view is more and more misrepresentative. The new form of knowledge is a recognition that the propositions in our content, no matter how apparently abstract, rigorous and timeless, are in fact not knowledge, but merely more sea through which we must navigate.

This is why we must change our educational system, indeed, even as Bates says, "moving away from a focus on teaching content, and instead on creating learning environments that enable learners to develop skills and networks within their area of study." Because, contra Bates, content is not still crucial (or, more accurately, no particular bit of content is crucial) and academic values that propel enquiry toward abstract, rigorous and timeless truths are not only obsolete, they are dangerous.

Indeed, I would argue even that what might (again) be called 'academic method' is itself under siege. Bates writes, "we need to sustain the elements of academic knowledge, such as rigor, abstraction and generalization, empirical evidence, and rationalism." But these very principles misconstrue what it means to reason - the practices of abstraction and generalization, for example, ought to be understood not as mechanisms for finding more truth (as the old inductivist interpretations made out) but are rather ad hoc means of creating less (but more manageable) truth.

The very forms of reason and enquiry employed in the classroom must change. Instead of seeking facts and underlying principles, students need to be able to recognize patterns and use things in novel ways. Instead of systematic methodical enquiry, such as might be characterized by Hempel's Deductive-Nomological method, students need to learn active and participative forms of enquiry. Instead of deference to authority, students need to embrace diversity and recognize (and live with) multiple perspectives and points of view.

I think that there is a new type of knowledge, that we recognize it - and are forced to recognize it - only because new technologies have enabled many perspectives, many points of view, to be expressed, to interact, to forge new realities, and that this form of knowledge emerges from our cooperative interactions with each other, and is not found in the doctrines or dictates of any one of us.

Tuesday, March 10, 2009

TNP 7. Associationism: Cognitive Structures

The Network Phenomenon: Empiricism and the New Connectionism
Stephen Downes, 1990
(The whole document in MS-Word)

TNP Part VI Previous Post

VII. Associationism: Cognitive Structures

A. Objections to Associationism

Above, I have outlined what I mean by associationism and sketched some objections. At the risk of repetition, I would now like to describe these objections in greater detail. By considering these objections, I will be able to describe a theory of associationist inference in more detail. This description depends to some extent on some of the conclusions already established regarding representations and perceptions, and will be employed below in a discussion of language and logical inference.

The general form of objections to associationism is as follows: people have the ability to know or do X, associationism is not sufficiently powerful to explain how people know or do X, therefore, people employ some means of knowing or doing X other than associationism. For example, "We know that the external world exists. However, empiricism (which depends on associationism) cannot prove that the external world exists. Hence, we must have some non-empirical means of knowing that the external world exists."

As an example of this form of argument, consider the following from Leibniz's New Essays. "The senses, although sufficient for all our actual knowledge, are not sufficient to give it all to us, since the senses never give us anything but examples, that is, individual or particular truths. Now all the examples which confirm a general truth, whatever their number, do not suffice to establish the universal necessity of that same truth.... Necessary truths... must have principles whose proof does not depend on examples, nor consequently on the testimony of the senses." [51]

As another example of the same sort of argument, consider Chomsky. He argues, correctly, that certain features of language use, for example, transformation, depend on knowledge of the structure of a given sentence in the language. Step-by-step inductive operations (that is, those which employ finite state devices) are inadequate to produce this knowledge. Therefore, we must have this knowledge independently of experience. It is innate, perhaps, or the product of evolution, and is not learned from experience. [52]

Bever, Fodor and Garrett also describe what they call a formal limit to associationism. [53] According to these authors, we are able to recognize that a certain string of characters is a well-formed formula (wff) in a language L only with respect to a set of rules which contain an abstract character. Since association is subject to what they call the "terminal meta-postulate", which asserts that associationist rules may be described only in terms which describe behaviour, no associationist principle may contain an abstract character. [54] Therefore it follows that on the basis of associationist principles alone we cannot determine whether or not a given string of letters is a wff in L.

These arguments are all valid arguments. Thus, in order to refute them, it is necessary to show that either the first premise is false or the second premise is false. Which of these two options we employ will vary according to circumstances. In general I take the following route. Those arguments which assert that we have this or that knowledge are refuted by a denial of the first premise; I argue that we have no such knowledge. Those arguments which assert that we have a demonstrated capacity I refute by a denial of the second premise; I argue that associationism can produce such a capacity.

B. Scepticism and Knowledge Claims

Let me consider only briefly instances of the first sort of refutation. Consider Leibniz's argument, stated above, that the "universal necessity" of some general truths must be known by some means other than the senses. One part of Leibniz's argument is certainly correct: we do not arrive at such knowledge from the senses. Further, it could be taken as arguable that we do not even know general principles, such as laws of nature, from the senses, nor can we even establish that one or another such principle is probably true. In my opinion, Popper's arguments on this point are conclusive. [55]

Contra Leibniz, I argue that we do not have any cognitive access to any such universal necessity, and therefore, do not in fact know that this or that principle is universal or necessary. Here is my argument.

Leibniz's own theory of necessity and possibility is very similar to the one we employ today: a proposition is necessarily true if and only if it is true in all possible worlds. Now either possible worlds are something which we create in our own minds or they are not. If they are, then while we may be certain that a given proposition is true or not true in all (conceived) possible worlds, there may be possible worlds which we have not thought of yet (alternatively: there are worlds which we cannot imagine), and so our knowledge that a proposition is true in all (conceived) possible worlds is insufficient for us to know that it is universally or necessarily true. If they are not, then whatever we know about possible worlds in our own minds is distinct from the possible worlds in question, and hence our knowledge about possible worlds might be incorrect. So even if a proposition is true in all (conceived) possible worlds, we cannot know it is true in all possible worlds. Therefore, we cannot know that any proposition is universally or necessarily true.

It is of course true that there are some things which we can know, for example, I know that I exist. What I am arguing here is that experience, for example, my experience of myself, is sufficient to establish those things which I do know. Scepticism serves as a good rough-and-ready means of distinguishing what I know from what I don't. In general, those things which it is claimed that we know and which associationism cannot prove (that is, for which we cannot construct associative processes for knowing) are those things that can be undermined by a sceptical argument.

There is an alternative approach for those people who don't like scepticism. Suppose it is claimed that we know some proposition, say, that the ground will not disappear under my next step. Instead of asking how we know (for which there is probably no answer, but this is the sceptical move to be avoided) we ask how we know that we know. In such cases, typically, it is necessary to argue that we behave as though we know (direct introspection tends to be unconvincing in such cases and is the only alternative answer). But now it is not necessary to explain the knowledge; it is only necessary to explain the behaviour. Connectionism allows that a person can behave in this or that way without ever knowing the principle which underlies the behaviour. Thus, we can respond to an apparent knowledge claim by saying not only that we can't know, but further, that we don't need to know. (Human beings managed to stay attached to the Earth without difficulty for centuries prior to the discovery of gravity.)

C. Association and Cognitive Capacities

In general (exceptions noted), scepticism can refute any knowledge claim. Thus, the only means of establishing that associationism is inadequate to explain human cognition is to establish that we have some demonstrated capacity which, in principle, could not have been produced employing associative mechanisms.

The "in principle" part of the argument is the tough part to establish. Above, I have sketched a new theory, connectionism, which employs associationist principles. Although the exact limits of this new theory are difficult to define, nonetheless, first, we know that it is a very powerful theory, and second, we know exactly how it works. Hence, we are now in a position to describe in detail associationist mechanisms for producing previously unexplainable behaviour (unexplainable, that is, except with reference to some innate knowledge or capacity).

At the core of my objection to the likes of Fodor and Chomsky is a related theory which I have sketched above, specifically the theory which asserts that cognition does not necessarily proceed according to rules and clear and distinct categories. Therefore, it will not do to argue that associationism must produce a principled mechanism for performing this or that cognitive feat. All that is necessary is that some mechanism be described, even if we allow that particular instantiations may vary, perhaps considerably. (This latter should be expected, for human capacities vary considerably.)

The theory I wish to propose in response to the Fodor-Chomsky argument has two parts. In the first part, during the course of experience, human beings detect repeated experiences of similar phenomena. From these, characteristic or prototype representations of those phenomena are constructed. Then, in the second part, these prototypes are employed to produce the cognitive behaviours various philosophers have argued cannot be created by association.
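The two-part process can be sketched in code. This is a minimal illustration only, assuming (my assumption, not part of the proposal) that experiences are encoded as binary feature vectors and that a feature belongs to the prototype when it recurs in most of the experiences; the threshold value is arbitrary.

```python
# Sketch: prototype formation from repeated similar experiences.
# Each experience is a binary feature vector; a feature joins the
# prototype when it is active in at least `threshold` of the experiences.

def build_prototype(experiences, threshold=0.75):
    """Return the prototype vector built from a list of binary vectors."""
    n = len(experiences)
    width = len(experiences[0])
    counts = [sum(e[i] for e in experiences) for i in range(width)]
    return [1 if c / n >= threshold else 0 for c in counts]

# Three encounters with similar phenomena: accidental features vary,
# but a stable core of features recurs across them.
encounters = [
    [1, 1, 0, 1, 0],
    [1, 1, 1, 1, 0],
    [1, 1, 0, 0, 0],
]
prototype = build_prototype(encounters)  # [1, 1, 0, 0, 0]
```

The second part of the theory would then use such a prototype, rather than any rule, to drive the cognitive behaviour in question.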

D. Essences and Accidents

It is to me a mystery why people argue that an abstract is something different from an experience. Let us examine how we developed a theory of abstractions in the first place. Its origin is Aristotelian, though it receives its clearest formulation in Medieval philosophy. In order to examine essences, let us consider, for example, the essence of something concrete, say, Socrates.

Medieval philosophers such as Ockham and Scotus agreed that Socrates was composed of two parts: his essence, and his accident. His essence is that attribute which Socrates must possess in order to be Socrates. His accident is that set of features which are not necessarily particular to Socrates. We might say that the essence is that which continues, unchanging, to be Socrates, and his accident is that which may change from time to time without changing the fact that Socrates is Socrates. For example, Socrates is essentially human, but only accidentally snub-nosed.

So, for example, Ockham characterizes Scotus's view as follows: "a nature is this by something added that is formally distinct (from the nature)". [56] The 'something added' is called a "contracting difference", which "contracts" it (the nature) to a "determinate individual". To 'contract', or in Latin, 'contrahere', is, for example, to apply the genus to some species, or some species to some individual. For example, 'Socrates contracts the species of humanity'. [57]

The point I wish to emphasize here is that Socrates, the single individual, is composed of two parts: the essence and the accident. If we take away the accident, then we have the essence. For any given experience, it is no difficult matter to take away that part of the experience, particularly if that experience consists, as I have suggested above, of a set of activations of neural cells. If only some of those cells activate a further set of cells, then we have succeeded in taking away some of the experience. So we can, via a connectionist process, construct something which could be the essence of Socrates. We do so by deleting from the representation one or another feature of Socrates, for example, his snub nose.

A key point: this essence just is what we mean by an abstract. The debate between Ockham and Scotus illustrates the contemporary debate concerning abstracts. According to Scotus, the essence of Socrates exists. [58] Socrates just happens to be a "contraction", or a particular instantiation, of that essence. Other human beings, for example, Aristotle, are different instantiations of that same essence. For after all, both Aristotle and Socrates are essentially human. Ockham's response to Scotus is well known in its outline. If Scotus is right, then we have two distinct types of entities: particular things, for example, Socrates, and essences, for example, humanness. However, as a methodological principle, it is better not to multiply entities beyond necessity. Since we do not have to postulate some independently existing essence, it follows that we should not.

Some philosophers, for example, Kripke, apparently still believe that there are independently existing essences. [59] Most philosophers do not. From my point of view, it does not matter whether essences have independent existence. The question is whether or not, by virtue of experience alone, we can detect them. I argue that we can, and I argue that the process just is as described above: we strip the accidental features from a given experience, and are left with a representation of the essence.

E. Evaluation of Essences

Where the real dispute lies, in my opinion, is whether there is one and only one set of permissible essences. For example, it is arguable that Socrates is essentially human. But it is also arguable that Socrates is essentially snub-nosed. There are several ways to pose this question. Must we identify one, rather than another, set of essences of things? Is some or another set of essences better? Or is the determination of essences ad hoc and random? In my opinion, some types of essences are better than others, but there is no one way that we must define the essences of things.

I believe that the essence of Socrates is the way that Socrates is similar to other things, and that the accident of Socrates is the way in which he is different. For example, Socrates is similar to Aristotle in that they are both human, yet they are different in that only Socrates is snub-nosed. The reason why humanness is a better essence than snub-nosedness is that snub-nosed and non-snub nosed people are otherwise very similar, while humans and non-humans tend to be quite different.

Another way of saying the same thing is as follows. Recall that a given representation, say, of Socrates, consists of a set of connections between a given unit and some set of units, and that this set of connections may be represented as a vector. See figure 14. These vectors may be more or less similar; for example, "10110" is more similar to "10010" than to "00001".
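The text does not specify a similarity measure for these vectors; a minimal sketch, assuming similarity is simply the count of agreeing positions between equal-length binary vectors (the complement of Hamming distance), might look like this:

```python
def similarity(v1, v2):
    """Count of positions where two equal-length binary vectors agree.
    An assumed measure; the text leaves the metric unspecified."""
    assert len(v1) == len(v2)
    return sum(a == b for a, b in zip(v1, v2))

print(similarity("10110", "10010"))  # 4 agreeing positions
print(similarity("10110", "00001"))  # 1 agreeing position
```

Under this measure, "10110" is indeed closer to "10010" (four positions agree) than to "00001" (one position agrees), and clustering by similarity can proceed on these counts.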

Now suppose that we have the following set of vectors:


These vectors can be clustered according to similarity.



It is by virtue of these clusterings, and only these clusterings, that one identification of an essence is to be preferred over another. [60] In the former case, we may have the essence:


and in the latter:


The "x"s in this example indicate that there is no connection between a given unit in the vector and the unit which represents the essence. These are partial vectors; see figure 15.

We can produce a measure of the 'betterness' of a given essence by considering, first, the number of "x"s in a given vector, and second, the number of instances of the given essence. Suppose there are n instances of "111xxx" and there are m "x"s (in this case, m=3). Then, to use a simple example, the betterness b of "111xxx" is b=f(n,m) where f is a betterness function.
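The text leaves the betterness function f unspecified. As a hypothetical sketch, one monotone choice, increasing in the number of instances n and decreasing in the number of "x"s m, weights frequency by the fraction of connected (non-"x") positions:

```python
def betterness(n, m, length=6):
    """Hypothetical betterness function b = f(n, m): the number of
    instances n, weighted by specificity, i.e. the fraction of the
    vector's positions that are connected (not 'x'). The text does
    not fix f; this is one plausible choice."""
    return n * (length - m) / length

# "111xxx": m = 3 'x's out of 6 positions, with, say, n = 10 instances
print(betterness(n=10, m=3))  # 5.0
```

Any function with these monotonicity properties would serve; the point is only that betterness rises with frequency of instantiation and falls with abstraction.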

It is worth noting that this system of betterness is exactly what we would expect from a connectionist system. Take any unit "i" which is connected to a set of other units. First, the fewer the x's, the greater the number of input units; hence, since input is summed, at any given time t an essence with fewer x's will (other things being equal) have greater activation than one with more x's. Second, if a given vector is activated frequently, then (other things being equal) a unit whose activation depends on the activation of that vector will be activated more frequently. Since in connectionist systems unit activation values tend to decay, the more frequently a unit is activated, the higher its activation value will be. The function f takes into account the decay rate and the rest position toward which the unit tends to decay.
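The decay dynamics described here can be sketched with a simple leaky unit. The update rule, decay rate, and rest value below are all assumptions for illustration, not a specification from the text:

```python
def step(activation, inputs, decay=0.1, rest=0.0):
    """One update of a leaky unit: summed input drives activation up,
    while decay pulls it back toward its resting value. The rule and
    rates are illustrative assumptions."""
    return activation + sum(inputs) - decay * (activation - rest)

a = 0.0
for _ in range(3):          # frequent input keeps activation high
    a = step(a, [0.5, 0.5])
print(a)                    # roughly 2.71: well above rest

b = step(0.0, [0.5, 0.5])   # a single, infrequent activation
print(b)                    # 1.0: decays back toward rest thereafter
```

The comparison illustrates both claims: more summed input yields higher activation, and frequent activation outruns decay, so the unit sits higher than one activated only rarely.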

F. Abstractions, Categories, and Prototypes

What I wish to point out immediately is that an essence, defined above as a vector with some "x"s, just is an abstraction. The more "x"s a given essence has, the more abstract it will be. Abstractions, by virtue of the fact that they have many "x"s, tend at first glance not to have very much betterness; they hardly correspond to any input activation (i.e., experience) at all. However, since they are so frequently activated, this initial weakness is overcome.

The definition of a category can proceed with reference to the essence or the abstract feature of the members of that category. A category just is the set of those instantiations which result in the activation of, say, vector "111xxx". This is a normal and standard type of definition of categorization: the necessary and sufficient conditions for membership in any given category will be the set of activations which correspond to "111xxx". But the story does not end there.

Suppose we have a given category, the essence of which is activated by "111xxx". However, since partial vectors can result in the activation of a given unit, the unit will also be activated by "110xxx". In this case, the activation will be only two thirds as strong as in a normal case. But since this is possible, no single unit will be a necessary condition for the activation of a given essence-unit. If the clustering is such that there is no other place to put an instance of "110xxx", then we will typically assign whatever corresponds to "110xxx" to the category defined by "111xxx". Note that we have not defined a new category "11xxxx", since the third spot on the vector remains connected. Rather, we have extended what counts as an instance of "111xxx". See figure 17.
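This graded membership can be sketched as follows, on the assumption that an essence-unit's activation is the fraction of its connected (non-"x") positions that the instance matches:

```python
def essence_activation(essence, instance):
    """Fraction of the essence's connected (non-'x') positions that
    the instance matches. 'x' marks no connection, so those positions
    are ignored. The fractional-activation rule is an assumption."""
    connected = [(e, i) for e, i in zip(essence, instance) if e != 'x']
    return sum(e == i for e, i in connected) / len(connected)

print(essence_activation("111xxx", "111010"))  # 1.0: a full match
print(essence_activation("111xxx", "110010"))  # 2/3: a partial match
```

The partial match still activates the essence-unit, just less strongly, which is why no single connected unit is a necessary condition for membership.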

To change the example slightly now in order to make the next point, suppose we have a category defined by "11111x". Any and all of the following will stimulate activation of that essence:


and so on. It is clear from this example that some sets of activation are better than others, that is, they result in a greater activation of the essence-unit. In this case, the activation of


will create the strongest activation. Whatever it is which corresponds with this vector constitutes a "prototype" of the category defined by "11111x". [61]
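Prototype selection can be sketched by ranking hypothetical instance vectors (the instances below are illustrative, not from the text) by how strongly they activate the essence-unit, again assuming activation is the fraction of connected positions matched:

```python
def essence_activation(essence, instance):
    """Fraction of the essence's connected (non-'x') positions matched."""
    connected = [(e, i) for e, i in zip(essence, instance) if e != 'x']
    return sum(e == i for e, i in connected) / len(connected)

essence = "11111x"
# Hypothetical instances of the category, for illustration only
instances = ["111110", "111010", "110110", "101111"]

prototype = max(instances, key=lambda v: essence_activation(essence, v))
print(prototype)  # "111110": it matches all five connected units
```

The instance matching every connected unit produces the strongest activation, and whatever corresponds to that vector is the category's prototype.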

Human beings actually do this. Consider, for example, the category "bird". Birds are grouped into a given category because they have some features in common, for example, they are warm-blooded, lay eggs, have wings, beaks and claws, fly, and the like. Some birds, such as robins, have all of those features. A robin is therefore a prototypical bird. Others, for example, penguins, have most but not all of these features (they don't fly). While they are still birds, we do not consider penguins to be prototypical birds.

Think about this. Imagine a "dog". Now - did you imagine a collie or German shepherd, or did you imagine a Mexican hairless?

G. Are There Real Essences?

The one objection I can think of to this sort of story is that there are "real" essences which, first, do not correspond to any given experience, and which, second, we must employ in order to construct our system of categorizations. This objection is first raised by Descartes and has its modern instantiation in Kripke.

In my opinion, whether or not there are real essences does not matter. Suppose they exist. Either we detect them or we do not. If we do not, then we have no means of employing them in order to construct categories. Therefore, if they are of any importance at all, then we must detect them. Suppose we detect them. Then we either detect them as they are, or we do not. If we detect them as they are, then whatever they are (according to connectionist theory) will be reflected in our actual system of categorizations. If we do not detect them as they are, then the way they are does not affect our categorization. Therefore, the only case in which real essences can affect our system of categorization is a case in which, first, they exist, and second, we detect them as they are.

Suppose they exist and we detect them as they are. Either we detect them through the senses or we do not. Suppose we believe, like Descartes, that we do not detect them through the senses. Then they must be, as Descartes suggests, innate. If they are innate, however, then there could be no disagreement regarding the best system of categorization (recall that we are detecting them as they are). However, there is such a disagreement, for I disagree. Therefore, they cannot be innate. Thus, we must detect them by experience.

If they are detected by experience, however, since what we experience is distinct from that which is experienced, then even if we detect them as they are, we cannot ever know that we detect them as they are. Therefore, whether or not we detect them as they are is irrelevant, for all we can work with is the experience. This is exactly what I am proposing.

Finally, let me propose the following challenge to those who hold that there are real essences and that we detect those essences via some non-empirical mechanism. The theory I have proposed gives an exact and clearly detailed mechanism for identifying and evaluating different schemes of categorization; let those who propose an alternative mechanism detail exactly how these categorizations are detected and how disputes concerning the relative merits of different systems of categorization are to be evaluated. There is only one condition to this challenge: the system cannot refer to experience in order to detect and evaluate systems of categorization. I propose that it cannot be done.

TNP Part VIII Next Post

[51] G.W. Leibniz, New Essays Concerning Human Understanding, pp. 42-44.

[52] Noam Chomsky, Syntactic Structures. Cited in P. Johnson-Laird, The Computer and the Mind, pp. 306-314.

[53] T.G. Bever, J.A. Fodor, M. Garrett, "A Formal Limit of Associationism", from Verbal Behaviour and General Behaviour Theory, T.R. Dixon and D.L. Horton, editors. Prentice-Hall, 1968.

[54] J.R. Anderson and G.H. Bower, Human Associative Memory; A Brief Edition, p. 15.

[55] Karl Popper, The Logic of Scientific Discovery and Postscripts, pp. 363-366.

[56] William of Ockham, Ordinatio, from Martin Tweedale (ed., trans.), "Selections from William of Ockham's Ordinatio Concerning Universals." Mss.

[57] Richard McKeon, Selections from Medieval Philosophers, Vol. 2, p. 441.

[58] Though not independently, that is, not in the absence of a contraction. John Duns Scotus, Opera Omnia, Vol. XVI, sec. 275. See also Martin Tweedale, "Does Scotus' Doctrine on Universals Make Any Sense?", p. 104.

[59] Saul Kripke, Naming and Necessity, p. 127. After asserting that gold is essentially element 79, he writes that "According to the view I advocate, then, terms for natural kinds are much closer to proper names than is ordinarily supposed."

[60] See Jeff Elman, "Representation in Connectionist Models", Connectionism conference, for an account of how clustering occurs according to word functionality. See also George Lakoff, Women, Fire and Dangerous Things, ch. 2.

[61] This is a simplified version of the theory proposed by Anderson and Mozer in "Categorization and Selective Neurons," in Anderson and Hinton, Parallel Models of Associative Memory, pp. 213-236.