Computation means many things to many people. Not unlike ‘Linguistics’. This is, probably, the source of many vitriolic diatribes and meaningless quackery in both disciplines. We are only concerned with two very specific senses of each.
Computation & Computationalism
Computation, in its purest sense, refers to any calculation performed by following a set of unambiguous instructions, or an algorithm. A system that so functions, then, is a computational system. Such systems require information that is stored in a systematic fashion, using variables. These variables form a core component of any computational system, and are represented in its memory to be worked upon. Hence, computational systems are also referred to as representational systems. Variables can be assigned specific values, depending on the specific requirements of a task, and rules of computation operate on these variables to combine and recombine them to yield new forms. The variables that go into this process are inputs to the system. The results attained from such operations are its outputs. A computational system is, thus, conceived of as an Input-Output (I-O) system. Computational rules are syntactic statements, meaning that they have a strict grammar that specifies how to treat input variables, how to manipulate them and in what order, and what the output is going to be. Computationalists typically assume what one might call the formal-syntactic conception of computation (FSCC). The intuitive idea is that computation manipulates symbols in virtue of their formal syntactic properties rather than their semantic properties.
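To make the idea of an I-O system with purely syntactic rules concrete, here is a minimal sketch in Python. The rule table, the category labels and the function names (RULES, combine, parse) are all hypothetical choices made for illustration; nothing here is meant as a claim about the actual rules of any grammar. The point is only that the rules look at the labels of the variables and never at their content.

```python
# Hypothetical rule table: each rule maps a pair of category labels to a new label.
RULES = {
    ("Det", "N"): "NP",
    ("V", "NP"): "VP",
    ("NP", "VP"): "S",
}

def combine(left, right):
    """Combine two labelled variables; the rule consults only their labels."""
    new_label = RULES.get((left[0], right[0]))
    if new_label is None:
        raise ValueError(f"no rule combines {left[0]} with {right[0]}")
    return (new_label, left, right)

def parse(det1, noun1, verb, det2, noun2):
    """Input: five labelled variables. Output: one structured object."""
    subject = combine(det1, noun1)
    predicate = combine(verb, combine(det2, noun2))
    return combine(subject, predicate)

sensible = parse(("Det", "the"), ("N", "child"), ("V", "hears"), ("Det", "a"), ("N", "sentence"))
nonsense = parse(("Det", "the"), ("N", "idea"), ("V", "drinks"), ("Det", "a"), ("N", "theorem"))
print(sensible[0], nonsense[0])  # both are labelled S
```

Both inputs receive the same structure because the rules are blind to what the words mean, which is the sense of "formal-syntactic" intended above.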
There is more than one way to approach a computational paradigm, however. We find David Marr’s approach to computationalism, elaborated in his famous book Vision (1982), to be the most applicable to an explanatory account of cognition. Marr recommends a level-based approach to figuring out solutions to a cognitive problem. Level-1, or the computational level, is a statement of the problem to be solved. Level-2, or the algorithmic level, spells out the data-structure of the required informational content, and the different ways such structuring could be achieved so as to make the solution possible (i.e. how many different algorithms can solve the problem). Level-3, or the implementational level, spells out the constraints arising from the substrate on which computations are to be performed. We feel that Marr’s insight is inherently applicable to the most complex issue in Cognitive Science - the triggering of natural languages in the human brain. A Marrist framework helps spell out the problem clearly and separates the question from the different answers that are candidates for a viable solution, while also being sensitive to the need for proper linking hypotheses to connect the computational and algorithmic levels to their implementational mechanisms.
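The separation between Level-1 and Level-2 can be sketched with a deliberately non-linguistic toy problem. In the Python sketch below, satisfies_spec is a hypothetical Level-1 statement (what counts as a correct output), while insertion_sort and merge_sort are two different Level-2 algorithms that both meet it. Sorting is our own choice of example, not Marr's; Level-3 is only gestured at in a comment.

```python
from collections import Counter

def satisfies_spec(inp, out):
    """Level 1: WHAT is computed. The output holds the input's items in ascending order."""
    same_items = Counter(inp) == Counter(out)
    ascending = all(a <= b for a, b in zip(out, out[1:]))
    return same_items and ascending

def insertion_sort(items):
    """Level 2, candidate A: build the output by inserting one item at a time."""
    out = []
    for x in items:
        i = len(out)
        while i > 0 and out[i - 1] > x:
            i -= 1
        out.insert(i, x)
    return out

def merge_sort(items):
    """Level 2, candidate B: split, solve the halves, merge the results."""
    if len(items) <= 1:
        return list(items)
    mid = len(items) // 2
    left, right = merge_sort(items[:mid]), merge_sort(items[mid:])
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    return merged + left[i:] + right[j:]

# Level 3 would ask which candidate the substrate can support (memory, time, wiring).
data = [3, 1, 2, 3, 0]
results = [algorithm(data) for algorithm in (insertion_sort, merge_sort)]
print(all(satisfies_spec(data, r) for r in results))
```

The same Level-1 statement is answered by two distinct procedures, which is why the question and its candidate answers need to be kept apart.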
Linguistics, not Languistics
What is Linguistics a study of? That question is likely to open a rather large can of worms, but it is important to note certain peculiarities that surround the term. A brief search online would lead to certain recurring bumper-sticker type statements regarding linguistics. Something like:
Linguistics is the scientific study of human languages.
Noam Chomsky is often considered Father of Modern Linguistics.
While we wholeheartedly agree with the second bumper-sticker, given that Chomsky’s works were primarily responsible for launching the Cognitive Revolution, we find the former statement utterly misleading.
The Cognitive Revolution, to which the birth of modern linguistics (a rather badly defined umbrella) is often traced, was the result of a number of developments. Around MIT, scientists like George Miller would soon discover that Shannon’s Information Theory, based on Markov processes, suffered from numerous limitations: oftentimes the probabilities measuring information transition appeared to be more interesting than their logarithmic values, and neither afforded much insight into their causal roots (i.e. the psychological processes responsible for the observed patterns).
Jerome Bruner’s early experiments also helped confirm what Jerry Fodor would later eloquently sum up thus: there is no learning without prior knowing. In other words, children do not construct hypotheses, they merely test pre-existing ones. In a similar vein, Fodor mounted a passionate defense of folk psychology within a physicalist framework, arguing that mental priors and innate knowledge are essential in accounting for higher cognition. Similar observations poured in from mainstream biology, especially from the works of such biologists as Jacob, Luria and Pollack.
Observing the obsession in psychologists of the day with recording a complete history of input stimulus and accompanying behavior, Chomsky writes:
“Putting it differently, anyone who sets himself the problem of analysing the causation of behavior will (in the absence of independent neurophysiological evidence) concern himself with the only data available, namely the record of inputs to the organism and the organism’s present response, and will try to describe the function specifying the response in terms of the history of inputs. This is nothing more than the definition of the problem. There are no possible grounds for argument here, if one accepts the problem as legitimate, though Skinner has often advanced and defended this definition of a problem as if it were a thesis which other investigators reject. The differences that arise between those who affirm and those who deny the importance of the specific “contribution of the organism” to learning and performance concern the particular character and complexity of this function and the kinds of observations and research necessary for arriving at a precise specification of it. If the contribution of the organism is complex, the only hope of predicting behavior even in a gross way will be through a very indirect program of research that begins by studying the detailed character of the behavior itself and the particular capacities of the organism involved.”
— Noam Chomsky, A Review of B.F. Skinner’s Verbal Behavior —
The introductory lines of Syntactic Structures, a book that grew out of his initial attacks on behaviorism, aptly summarise Chomsky’s intended alternative approach to the psychological basis of mental processes:
Syntactical investigation of a given language has as its goal the construction of a device for producing the sentences of the language under investigation.… The ultimate outcome of [such] investigations should be a theory of linguistic structures in which the descriptive devices utilised in particular grammars are presented and studied abstractly.… One function of this theory is to provide a general method for selecting a grammar for each language, given a corpus of this language.
— Noam Chomsky, Syntactic Structures —
Unlike that of the behaviorists, Chomsky’s rationalist approach is concerned less with attested patterns in behavior, and more with the open-ended ‘creativity’ with which children seem to manifest their linguistic abilities. This is reminiscent of Bertrand Russell’s inquiry into how it is that human beings, whose contact with nature is so brief and transient, nonetheless come to learn so much about it. The very idea of rejecting behavioral patterns in language use in favor of uncovering a causally explanatory account of what enables Language, and how, involves a shift in perspective that requires a complete break from earlier assumptions. Chomsky explains this in several writings and interviews:
The child, placed in a linguistic community, is presented with a set of sentences that is limited and often imperfect, fragmented, and so on. In spite of this, in a very short time he succeeds in "constructing," in internalizing the grammar of his language, developing knowledge that is very complex, ..
-- Noam Chomsky, Language and Responsibility --
So the obvious hypothesis is that our language is the result of the unfolding of a genetically determined program.
-- Noam Chomsky, KBS TV Kyoto, Japan --
Taken together, these bits and pieces lead us to the conclusion that the task at hand concerns investigating the underlying biological substrates that support Language. This is not unlike how Physics would study the source of gravity, as a natural phenomenon, rather than make records of all the things that have undergone free-fall under gravity (for a recent take on whether this issue has been properly addressed till now, see Bever, T. G. 2009a,b,c). This would seem to imply that our first bumper-sticker should be revised a little bit:
Linguistics is the scientific study of the human language faculty.
The objective of Linguistics should be, if we are to take our perspectives from the cognitive revolution, an account of the biological ability for Language. In his blog, The Faculty of Language, Norbert Hornstein contrasts this perspective with the idea that descriptive grammars of different languages, within some unifying framework, are what the theorist should pursue. This latter tendency he terms Languistics. Our interest in Linguistic Theory is motivated primarily by this important contrast made in Hornstein’s Lament. The tools and methods of generative grammar were conceived as instruments to probe a biological ability (Universal Grammar), a functional organ of the mind/brain, and as such they need to be amenable to biological and natural scientific inquiry. They were not, however, intended to be used for data-fitting language typology. To use them, and retro-fit them, to attain descriptive coverage over typology is to use cognitive tools to philological ends. This has, in some ways, become a twice-told tale, however. The following excerpt from Hornstein’s Lament aptly sums up our experience with mainstream linguistic publications (generative or otherwise):
The papers I’ve read have roughly the following kinds of structure:
1. Phenomenon X has been analyzed in two different ways; W1 and W2. In this paper I provide evidence that both W1 and W2 are required.
2. Principle P forbids structures like S. This paper argues that phenomenon X shows that P must be weakened to P’ and/or that an additional principle P’’ is required to handle the concomitant over-generation.
3. Principle P prohibits S. Language L exhibits S. To reconcile P with S in L we augment the features in L with F, which allows L to evade P wrt S.
In each instance, the imperative to cover the relevant data points has been paramount. The explanatory costs of doing so are largely unacknowledged. Let me put this a different way. All linguists agree that ceteris paribus simpler theories are preferable to more complex ones (i.e. Ockham shaves us all!). The question is: What makes things not equal? Here is where the trade-off between explanation and description plays out.
MP (minimalist program) considerations urge us to live with a few recalcitrant data points rather than weaken our explanatory standards. The descriptive impulse urges us to weaken our theory to “capture” the strays. There is no right or wrong move in these circumstances. Which way to jump is a matter of judgment. The calculus requires balancing our explanatory and descriptive urges. My reading of the literature is that right now, the latter almost completely dominate. In other words, much (most?) current research is always ready to sacrifice explanation in service of data coverage. Why?
— Norbert Hornstein, Hornstein’s Lament—
The point Hornstein makes is crucial, and elegant in its simplicity. Consider this metaphor: You are staring at a computer desktop with a blue triangle on the screen. You know, intuitively, that there must be a script somewhere that is drawing that triangle, and making it blue. How do you go about trying to understand the script, and its grammar? How many such triangles are you going to want to see? Or is this perhaps barking up the wrong tree? We are primarily interested in understanding the nature of the script, and the syntax that governs the language in which it is written. In linguistic terms, this translates into trying to understand the limits of natural language combinatorics, and the constraints that make those limits what they are. In this view Language is no different from any other biological organ/ability, with its structural and functional limitations. Some of those limitations are bound to stem from the very general principles of nature, particularly those of mathematics and physics, lending to the formal properties of linguistic processes. Yet others stem from Language being implemented on a biological substrate. The anatomical and functional properties of the brain, as well as its evolutionary history, substantially constrain the type of computations it can support, and lend to the peculiarities involved in processing of information encoded by linguistic structures. Linguistic Theory, in our opinion, should be concerned solely with spelling out the computational nature of Language, its biological implementation and appropriate linking hypotheses between them.
The term Biolinguistics was coined by Massimo Piattelli-Palmarini for a seminal meeting held in the early 1970s between Noam Chomsky and biologists such as Salvatore Luria. Chomsky explains the biolinguistic perspective:
The biolinguistic perspective views a person’s language in all of its aspects – sound, meaning, structure — as a state of some component of the mind, understanding “mind” in the sense of 18th century scientists who recognised that after Newton’s demolition of the “mechanical philosophy,” based on the intuitive concept of a material world, no coherent mind-body problem remains, and we can only regard aspects of the world “termed mental,” as the result of “such an organical structure as that of the brain,” as chemist-philosopher Joseph Priestley observed. Thought is a “little agitation of the brain,” David Hume remarked; and as Darwin commented a century later, there is no reason why “thought, being a secretion of the brain,” should be considered “more wonderful than gravity, a property of matter.” By then, the more tempered view of the goals of science that Newton introduced had become scientific common sense: Newton’s reluctant conclusion that we must be satisfied with the fact that universal gravity exists, even if we cannot explain it in terms of the self-evident “mechanical philosophy.” As many commentators have observed, this intellectual move “set forth a new view of science” in which the goal is “not to seek ultimate explanations” but to find the best theoretical account we can of the phenomena of experience and experiment (I. Bernard Cohen).
— Noam Chomsky, Biolinguistics and the Human Capacity —
By definition, biolinguistics = bio + linguistics. So, how do we ensure that a biolinguistic exploration is appropriately biological, capable of providing explanatory accounts of all aspects of natural language in a naturalist framework, and defining its structural and functional limits? Here referring to Marr’s levels allows us to properly categorise the problems of linguistic theory. At level-1, we have the computational statement of the problem to be solved: how do children attain so much from so little? Level-2 concerns the data-structure and informational complexity of language: what kind of informational content is used by the triggering child to narrow down to a target grammar, and how could it be encoded? Level-3 addresses the constraints arising from the evolutionary, structural and functional considerations of neurobiology: all our formalisms must be implementable on a biological substrate. We think that the most interesting challenge for biolinguistics lies not just in elaborating on the internal composition of the three levels, but more so in formulating proper linking hypotheses between them. This is also necessitated because linguistic primitives are not straightforwardly reducible to neurobiological ones. Nor do linguistic and neurobiological explanations use similar conceptual granularity.
For the linguistics part, we focus on two primary aspects of the theory: (a) properly elaborating the formal properties of linguistic computations, including finding an appropriate granularity-size for the primitives of computation, and (b) causally explaining why they are the way they are, as opposed to some other conceivable way that they could have been. To address the first question, we take our cue from Hornstein’s Lament, and following a compelling proposal put forth by Charles Reiss we argue that linguistic theory needs to focus on the right type of data set to test itself. Briefly, Reiss argues that a theory of UG should concern itself only with what is computable. The actual attested linguistic data is only a subset of this larger (computable) set, and as such many of the claims of ‘overgeneration’ are false alarms. A scientific theory can only be evaluated at the level at which it is constructed. Linguistic theory should not be limited by what is, or has been, attested, but only by what could never be attested. This requires not only refocusing the theory on the proper set of target data (the red oval in the diagram), but also dispensing with such notions as markedness, contrast, well-formedness etc., some of which have no biological interpretation (e.g. markedness) and others that are irrelevant for understanding specific types of computations (e.g. contrast is irrelevant for phonology, and phonological theory should make no reference to well-formedness). For a detailed elaboration on these issues see Reiss (2007), Martins (2017), Chomsky (2005), Dawkins (2015) etc.
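The asymmetry between "computable" and "attested" can be sketched with sets. Everything in the Python sketch below is an assumption made for illustration: the two-consonant, two-vowel inventory, the toy generator computable_forms and the three "attested" forms are invented, and the sketch is not Reiss's own formalism. It only shows which of the two set differences can count against a theory.

```python
from itertools import product

ONSETS = ("p", "t")   # hypothetical consonant inventory
NUCLEI = ("a", "i")   # hypothetical vowel inventory

def computable_forms(max_syllables):
    """Toy stand-in for 'what the system can compute': all CV strings up to a length bound."""
    syllables = [onset + nucleus for onset, nucleus in product(ONSETS, NUCLEI)]
    forms = set()
    for k in range(1, max_syllables + 1):
        for combo in product(syllables, repeat=k):
            forms.add("".join(combo))
    return forms

generated = computable_forms(2)
attested = {"pa", "tipa", "tata"}   # invented 'corpus'

# Computable but unattested: not evidence against the theory.
unattested_but_computable = generated - attested
# Attested but not computable: this would be a genuine counterexample.
attested_but_not_computable = attested - generated

print(len(generated), len(unattested_but_computable), len(attested_but_not_computable))
```

On this view, a non-empty first difference is expected and harmless, while only a non-empty second difference would falsify the toy theory.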
With regard to the second aspect mentioned above, we argue that linguistic theory needs to meet the demands of explanatory adequacy (see Appendix-I). To begin with, this requires asking questions of appropriate relevance. As Hornstein points out, the urge to cover every possible data point must be avoided. For instance, instead of asking why a certain sentence in language X does not seem to obey Principle-C, the relevant questions should look something like this:
Why are there three binding principles?
Why those specific three?
Why are two of them in complementary distribution?
Why the x-bar schema, as opposed to some other ones?
A central tenet of minimalism involves the assumption that any computation should be performed only if it is absolutely necessary, and then it must be performed in an optimal fashion. But if linguistic processes are optimal, what are they optimal with regard to? We argue that here Turing’s observation that “some physical processes are of very general occurrence” is of some importance. If linguistic computations are optimal processes, then linguistic theory should be able to explain exactly how computational optimality is attained. Some interesting research in this direction has already been undertaken by Uriagereka (2000), Medeiros, Bever and Piattelli-Palmarini, and Volenec and Reiss (2016, 2018). The recurrence of Fibonacci numbers throughout the informational structure of linguistic phenomena is well-documented. As it turns out, there are interesting reasons to believe that adhering to Fibonacci-ness (through projections in the x-bar schema) not only results in minimised computational costs (in the form of minimum search operations; cf. Medeiros, 2012), but also that such adherence elicits distinct behavioral responses. Fibonacci numbers are, of course, closely linked to the Golden Ratio, and this pattern recurs everywhere throughout nature.
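One way to see where Fibonacci numbers can come from in the x-bar schema is to count nodes level by level. The Python sketch below assumes, purely for illustration, a maximally expanded tree in which every XP rewrites as a specifier XP plus an X', and every X' rewrites as a head X0 plus a complement XP. The function name xbar_counts and this particular expansion are our assumptions; the sketch makes no claim about search costs or behavioral responses.

```python
def xbar_counts(depth):
    """Per-level counts (XP, X', X0) for a maximally expanded x-bar tree.

    Assumed rewrite rules: XP -> XP X'  (specifier, bar level)
                           X' -> X0 XP  (head, complement)
    """
    phrases, bars, heads = 1, 0, 0          # level 0: the root XP
    levels = [(phrases, bars, heads)]
    for _ in range(depth):
        # Each XP contributes one XP and one X'; each X' contributes one X0 and one XP.
        phrases, bars, heads = phrases + bars, phrases, bars
        levels.append((phrases, bars, heads))
    return levels

levels = xbar_counts(5)
xp_per_level = [xp for xp, _, _ in levels]
print(xp_per_level)  # [1, 1, 2, 3, 5, 8]
```

Under these assumptions the number of XPs at each level obeys the Fibonacci recurrence, so the ratio of successive counts approaches the Golden Ratio as the tree deepens.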
So, if linguistic phenomena are computational in nature, and linguistics is a natural science, then how (else) do the general laws of nature constrain linguistic computations? What is any particular schema (e.g. x-bar, or mora) good for? How do we take sound waves and translate them into a digital form of information that can be stored in algebraic variables and operated upon by FSCC-type processes? How do these variables translate back into neuromuscular commands that are readable by the sensory-motor systems? What is an appropriate way to characterise the primitives of linguistic computation such that they, both, adhere to general laws of nature as computational entities while also being implementable on a biological substrate?
For the bio part we are interested in ensuring that all biolinguistic explorations remain appropriately biological. To this end, we argue that no notion should be admissible in Linguistic Theory if it cannot have a proper biological interpretation. Consider markedness, for instance. A notion such as markedness is baggage from the descriptive obsession that Hornstein laments. It has no place in a biologically plausible theory of Language because it has no biological interpretation. For one thing, neither descent with modification nor a one-step mutation account of evolution would be able to explain why something like markedness would be engineered. Also, markedness makes little sense within a classical computationalist framework. A computational machine is only supposed to generate. It does not generate more or less well-formed structures, because by definition anything that is the output of a computational process has adhered to the syntax of the required computation, and hence it is well-formed. The statistical/typological observations that are grouped under notions such as markedness are in no way the concern of a theory of UG. UG is a biological theory, while markedness merely groups together a variety of grammar-external emergent phenomena that have no implications for computation.
A further concern of a biological nature involves finding the right granularity size for linguistic formalisms. While a lot of neurolinguistic literature of late has targeted such broad granularities as Principle C for imaging investigations, it is not at all clear that this is the right size to put under a microscope. Binding as a notion itself consists of subroutines, and instrumental investigations in linguistics should consider independent motivations for picking a specific granularity size to target.
Finally, we are also concerned about the ability of biological substrates to support classical Turing-style computations. We feel that the seminal argument put forth in the Gallistel-King Conjecture (cf. Gallistel and King, 2009), that the ability to store and retrieve numbers is the minimum requirement for computing higher cognition, is both formally promising and empirically validated (cf. Green et al. 2017). The ability of biological cells to function as classical logic gates opens the door for further biological investigations, including protein-folding as sentence-parsing. We believe that the time has never been better for Biolinguistics to come forth as an independent branch of natural science, and live up to Luria’s biolinguistic suggestions.
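The step from logic gates to storing, retrieving and computing over numbers can be sketched in a few lines. The Python sketch below builds addition out of a single NAND primitive and pairs it with a read/write store. The gate construction is textbook material; the dictionary standing in for memory and the names used are our assumptions, and the sketch does not model the ribocomputing devices of Green et al. or any neural mechanism.

```python
def nand(a, b):
    """The single primitive gate; inputs and output are 0 or 1."""
    return 1 - (a & b)

def xor(a, b):
    c = nand(a, b)
    return nand(nand(a, c), nand(b, c))

def and_(a, b):
    c = nand(a, b)
    return nand(c, c)

def or_(a, b):
    return nand(nand(a, a), nand(b, b))

def full_adder(a, b, carry_in):
    """Add three bits; return (sum_bit, carry_out)."""
    partial = xor(a, b)
    return xor(partial, carry_in), or_(and_(a, b), and_(partial, carry_in))

def add(x, y, width=8):
    """Ripple-carry addition of two non-negative integers, modulo 2**width."""
    result, carry = 0, 0
    for i in range(width):
        bit, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= bit << i
    return result

memory = {}                                   # hypothetical read/write store
memory["count"] = 5                           # write a number
memory["count"] = add(memory["count"], 3)     # read it, compute on it, write it back
print(memory["count"])  # 8
```

The design choice worth noting is that the arithmetic never leaves the gate level: once a substrate offers a universal gate and an addressable store, number manipulation follows from composition alone.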
So, Computationalist Biolinguistics…
Noam Chomsky, in his foundational paper Conditions on Transformations, highlights the importance of the acquisition of languages by children as a phenomenon by linking it to the technical specification of the intuitive notion of explanatory adequacy. Any analysis of linguistic phenomena is said to meet the requirements of explanatory adequacy if and only if it goes beyond mere description of WHAT the child is doing and is able to explain causally HOW the child is doing so, and WHY said process takes a particular trajectory of progress as opposed to several possible ‘others’. A central puzzle identified by Chomsky, and one that has defined linguistic research for the last half a century, concerns the notion of poverty of stimulus, which fatally undermines the erroneous worldview that adult knowledge can be achieved by inductively, and through analogy, organising experiences into a tabula which is initially rasa. Chomsky rightly points out that a mere description of language typology is prone to overfit UG to specific languages, unless we are mindful of the three factors in Language design (Chomsky, 2005): Language, in the abstract and universal sense, is a biological ability (like binocular vision), but its externalisation in the form of speech is contingent on other systems and processes not unique to the cognitive domain of Language. Thus speech, the mind-external linear form of language, manifests properties other than hierarchical linguistic structures, and attempts at back-calculating the cognitive mind from linear speech forms result in the theory of mental Grammar being burdened with effects of which it is not the cause.
The task of the computationalist biolinguist, then, is three-fold: first, she must identify and take seriously the formalisms about linguistic computation; then she must reject the notion of linguistic reality (of structures and dependencies) as real in some ‘other’ sense than the one of ‘psychological reality’ (they are all psychologically real, because they are computed in real-time); and finally she must identify possible circuits in the brain (also real in a real sense, or really real!) and then answer the question of how the aforementioned real formalisms are implemented in real-time by the biological reality of our neural architectures. It is important to note, however, as Poeppel and Embick do in their seminal works together, that in so doing one must avoid the pitfall of trying to reduce processes to processors.
Appendix-I: The Levels of Theoretical Adequacy
1. Observational adequacy
The theory achieves an exhaustive and discrete enumeration of the data points.
There is a pigeonhole for each observation.
2. Descriptive adequacy
The theory formally specifies rules accounting for all observed arrangements of the data.
The rules produce all and only the well-formed constructs (relations) of the protocol space.
...the grammar gives a correct account of the linguistic intuition of the native speaker, and specifies the observed data (in particular) in terms of significant generalizations that express underlying regularities in the language.
3. Explanatory adequacy
The theory provides a principled choice between competing descriptions.
It deals with the uttermost underlying structure.
It has predictive power.
A linguistic theory that aims for explanatory adequacy is concerned with the internal structure of the device [i.e. grammar]; that is, it aims to provide a principled basis, independent of any particular language, for the selection of the descriptively adequate grammar of each language.
Theories which do not achieve the third level of adequacy are said to "account for the observations", rather than to "explain the observations."
The second and third levels include the assumption of Ockhamist parsimony. This is related to the Minimalist requirement, which is elaborated as a corollary of the levels, but which is actually employed as an axiom.
Appendix-II: References
Bever, T. G. (2009b). Minimalist Behaviorism: the role of the individual in explaining language universals. In Christiansen, M., Collins, C. & Edelman, S. (Eds.) Language Universals. Oxford University Press. Pp. 270-298.
Bever, T. G. (2009c). Remarks on the individual basis for linguistic structures. In Piatelli, M. Of minds and language: the Basque country encounter with Noam Chomsky. Oxford University Press. Pp. 278-295.
Chomsky, N. (2005). Three Factors in Language Design. Linguistic Inquiry, 36(1), 1–22. https://doi.org/10.1162/0024389052993655
Dawkins, R. (2015). Brief candle in the dark: My life in science. Random House.
Gallistel, C. R., & King, A. P. (2009). Memory and the Computational Brain. https://doi.org/10.1002/9781444310498
Green, A. A., Kim, J., Ma, D., Silver, P. A., Collins, J. J., & Yin, P. (2017). Complex cellular logic computation using ribocomputing devices. Nature, 548(7665), 117.
Martins, P.T. (2017). There is no place for markedness in biologically-informed phonology, In Samuels, D.B. (ed.) Beyond Markedness in Formal Phonology, John Benjamins Publishing Company, 2017.
Medeiros, D. P. (2012). Economy of Command. PhD Dissertation, University of Arizona Linguistics Department
Medeiros, D.P., Thomas G. Bever & Massimo Piattelli-Palmarini (2016). “Many important language universals are not reducible to processing or cognition.” Reply to Christiansen, Morten H., and Nick Chater, “The Now-or-Never Bottleneck: A Fundamental Constraint on Language.” Behavioral and Brain Sciences 39 (2016) 42-43.
Medeiros, D.P. & Massimo Piattelli-Palmarini (2018). “The Golden Phrase: Steps to the Physics of Language.” In Gallego, Angel J. and Roger Martin, eds., Language, Syntax, and the Natural Sciences. Cambridge: Cambridge University Press, 333-350.
Reiss, C. (2007). Modularity in the “Sound” Domain: Implications for the Purview of Universal Grammar. The Oxford Handbook of Linguistic Interfaces. https://doi.org/10.1093/oxfordhb/9780199247455.013.0003
Uriagereka, J. (2000). Rhyme and Reason: An Introduction to Minimalist Syntax. MIT Press.