The Representation of Compounds and Phrases in the Mental Lexicon: Evidence from Chinese*

Yi-ching Su

University of Maryland

(received 11 August 1999, revised 10 November 1999)

Copyright Notice:
First published in Web Journal of Modern Language Linguistics. © 1999 Yi-ching Su.
The moral rights of the author(s) to be identified as author(s) of this work are asserted in accordance with §§.77 and 78 of the Copyright, Designs and Patents Act 1988. This work may be reproduced without the consent of the author, in part or in whole in any manner and in any medium subject only to the two following conditions:
(a) no charge shall be made for the copy containing the work or the excerpt,
(b) a copy of this notice shall precede the work or the excerpt.

0 Introduction
1 Review and Goals
2 Experiments
3 General Discussion and Conclusion
Bibliography

0 Introduction

In psycholinguistic research on morphology, the main issue has been how morphologically complex words are represented and processed in the mental lexicon. Since morphemes are traditionally defined in linguistic analysis as the smallest meaningful units, it is plausible, for the sake of economy of representation, to postulate the mental lexicon as a morpheme inventory, with morphologically complex words represented as combinations of morphemic units (the morpheme-based model, e.g. Taft & Forster 1975, 1976). However, problems arise when we consider irregularly inflected words, derived words and compounds. For irregularly inflected words, although the relationship between the various inflectional forms of a lexeme is semantically predictable, the mapping between the forms is opaque (e.g. go ~ went ~ gone; mouse ~ mice). For derived words and compounds, there are two problems. First, the existence of derived and compound words is not paradigmatically determined, i.e. they are not productive and thus cannot be reliably predicted. For example, there is prepare ~ preparation but no compare ~ *comparation; and there are compounds like snowman and junkman, but no *sleetman. Second, the meaning of a derived word or compound is often not derivable from the meanings of its morphemic constituents (e.g. recall does not mean "to call again"; junkman does not mean "man made of junk" in the way that snowman means "man made of snow"). Therefore, it seems necessary to posit whole-word representations in the mental lexicon, at least at some level of processing (the whole-word model, e.g. Butterworth 1983; Henderson 1985).¹

Among the three types of morphologically complex words (inflections, derivations and compounds), compounds are distinct from the other two in that "compounding consists in the combination of (two or more) existing words into a new word, while derivation (as well as inflection) consists in the application of a Word Formation Rule to a single existing word" (Anderson 1992:292). Therefore, in the hierarchy of morphological units shown in (1), compounds are larger than words but smaller than phrases (Di Sciullo & Williams 1987:14).

(1) morpheme > word > compound > phrase > sentence

In addition, some compounds have syntax-like structures that parallel syntactic phrases. For example, the compound blackbird is composed of an adjective and a noun, just like the noun phrase black bird, and the compound pickpocket contains a verb and a noun, just like the verb phrase "pick pocket" (Matthews 1991:85-88).² Given the phrase-like form of such compounds, it is not clear whether the whole-word model should distinguish between the representations of compounds and phrases.

The language investigated here with respect to the representation and recognition of compounds vs. phrases is Chinese. Compared with Indo-European languages, Chinese has very limited inflectional and derivational morphology (see Li & Thompson 1983, Chapter 3). Compounding, however, is comparatively productive in this language. According to research by the Chinese Knowledge Information Processing Group, 42,686 words (by type) and 5,666 characters appeared in a large corpus. Except for a very few bisyllabic monomorphemic words such as pu-tao ‘grape’, in which the individual characters have no meaning of their own, all characters are meaningful units (and are therefore morphemes or words) that are used repeatedly to form polymorphemic (especially bimorphemic) compounds.

1 Review and Goals

1.1 Review of Previous Studies

Previous studies on the representation of Chinese bimorphemic words (Zhang & Peng 1992; Liang 1992; Zhou & Marslen-Wilson 1994) showed that, in general, there was a frequency effect for bimorphemic words as a whole in a lexical decision task, i.e. when character frequency was held constant, bimorphemic words with higher frequency were responded to faster than those with lower frequency. As for whether there is a character frequency effect (that is, whether there are decompositional representations for the constituent morphemes), the results were inconsistent. Using auditory presentation for lexical decision, Zhou & Marslen-Wilson (1994) found no morpheme frequency effect at all, and thus concluded that Chinese bimorphemic words are represented as whole units in the mental lexicon. However, using visual presentation, and with somewhat different concerns, both Zhang & Peng (1992) and Liang (1992) demonstrated a morpheme frequency effect. Zhang & Peng (1992) found that for coordinative compound words (e.g. diu-qi (throw discard 'abandon')), the frequencies of both morphemes played a role in determining the latencies, while for modifier-head words (e.g. nao-zhong (noisy clock 'alarm clock')), only the frequency of the second character (the head) was crucial. Based on this, they proposed that Chinese words are represented as decomposed forms in the (orthographic input) lexicon, and that word frequency reflects the strength of associations between characters. Liang (1992) compared compositional (e.g. bi-kong (nose hole 'nostril')) and idiomatic (e.g. jiu-bao (wine protect 'bartender')) bimorphemic words. In addition to a word frequency effect, she also found a character frequency effect for compositional words, i.e. for both high and low word frequency sets, words made up of high frequency characters were responded to faster than words made up of low frequency characters.
For idiomatic words, the same character frequency effect was found for the high word frequency set, but not for the low word frequency set. She thus suggested that characters and idiomatic words are stored independently in the lexicon, but this account fails to explain why there was a word frequency effect for compositional words.³

It seems that the three studies exemplify three models of representation for Chinese bimorphemic words, ranging from Zhou & Marslen-Wilson's whole-word representation, to Liang's partly decompositional representation, to Zhang & Peng's completely decompositional representation. However, two factors may account for the discrepancy among the three studies: modality difference and semantic idiomaticity. Regarding modality, Zhou & Marslen-Wilson's study presented the stimuli auditorily, and since homophonic morphemes are pervasive in spoken Chinese, listeners need to process the two morphemes as a whole in order to disambiguate their meaning. This problem of ambiguity does not arise in written forms, as there is usually a one-to-one correspondence between a morpheme and a character. Therefore, word frequency may play a more important role in auditory presentation than in visual presentation; whether access representations and linguistic representations⁴ in Chinese differ when different modalities are used needs further investigation. As for semantic idiomaticity, it was not considered in Zhang & Peng's study. Since the stimuli they used were all semantically quite transparent, their results may not generalize to the representation of opaque compounds (cf. Zwitserlood 1994, who showed that fully opaque compounds are not connected with their constituents at the level of semantic representation). Although Liang's study distinguished compositional from idiomatic words, the idiomatic words included both slightly opaque items, e.g. guan-jia (control house ‘house-keeper’), and fully opaque ones, e.g. feng-li (phoenix pear ‘pineapple’).
It is possible that she failed to find a character frequency effect for the low word frequency idiomatic items because the low character frequency set contained more fully opaque words (in fact, the reaction time for this set was shorter than for the low word frequency, high character frequency set).

Another problem with these studies is the assumption that compounds clearly function as words in the mental lexicon if there is a word frequency effect, whereas the notion of "word" is not so clear in Chinese. In all three studies, the lexical decision stimuli were compound words for the Yes responses and meaningless bimorphemic pairs for the No responses. However, written Chinese marks the boundaries between characters with spaces, but not the boundaries between words. Unlike English compounds such as blackbird, in which the two constituent morphemes are written as a single unit,⁵ in written Chinese, compound words like bai-cai (white vegetable ‘cabbage’) and phrases like bai-bu (white cloth) cannot be distinguished orthographically by the spacing between the characters. Moreover, there is no reliable phonological (stress) distinction between the two types of complex forms. The distinction relies on semantic idiomaticity⁶ (especially for forms whose structures parallel the corresponding phrases, such as modifier-noun and verb-noun⁷). Therefore, as noted in Hoosain (1992), while the notion of "word" is relatively clear in English, it is not so in Chinese.⁸ In the two experiments reported in Hoosain (1992), subjects were asked to segment several simple sentences into "words"; there was substantial disagreement among subjects as to where the word boundaries were, and all disagreements involved extending word boundaries (i.e. to include phrases).

Osgood & Hoosain (1974) and van Jaarsveld & Rattink (1988) tackled the processing of (lexicalized) compounds versus phrases (or novel compounds). Osgood & Hoosain (1974) found that recognition thresholds were consistently lower for words than for morphemes or trigrams, and lower for word-like nominal compounds than for noun phrases or nonsense compounds. In addition, prior exposure to noun phrases, nonsense compounds and the constituent words of the nominal compounds significantly facilitated subsequent recognition of the single-word constituents, but prior exposure to nominal compounds had no effect on subsequent recognition of their single-word constituents. They thus concluded that because words and word-like nominal compounds are uniquely meaningful as wholes, they have special salience in the perception of language. In a series of experiments comparing lexicalized compounds, novel compounds and nonwords in Dutch, van Jaarsveld & Rattink (1988) showed that for lexicalized compounds, the familiarity of the compound had a significant effect on reaction times, but the frequency of the constituent nouns had no effect on lexical decision times. For novel compounds, by contrast, a frequency effect of the first constituent was consistently found across experiments. They therefore argued that compound words are preferably processed as single units, and that if no lexical entry is found (as with novel compounds), the string is segmented into its constituents (i.e. a "decomposition second" model).

Although the two studies above established that compound words are represented as whole units by comparing them with phrases or novel compounds, the frequency of the phrases was not matched with that of the compounds. This matters because, as indicated in a review by Sandra (1994), there is a possibility that

"Access is always attempted by two processing routines at the same time--one operating with the whole word form, the other with the morphemes--and that the past experience with the word form (i.e. its frequency) determines which route wins the access race" (p. 245).

According to this claim, a phrase with comparatively high frequency may also be eligible for the whole-word route. Evidence from English regularly inflected words is found in Taft (1979), in which lexical decision times were shorter for high-frequency inflected words than for low-frequency inflected words when stem frequency was controlled, and in Stemberger & MacWhinney (1986), in which experimentally induced speech errors occurred more often with low-frequency inflected words than with high-frequency ones. Evidence from language acquisition (MacWhinney 1982) further suggests that, because of frequent repetition in a constant context, a phrase may not be analyzed into its component words and may be used as a single item before other strategies such as analogy and rules develop. Evidence for such a "rote" analysis of certain phrases is based on the fact that larger units (e.g. my book) often appear in production before their component pieces emerge by analogy (e.g. my toy). Therefore, high-frequency phrases may also be stored in the lexicon during acquisition. However, it is not clear whether they are also stored as whole-word forms by adult speakers.
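The race idea quoted from Sandra (1994) can be illustrated with a toy simulation. This is a sketch under stated assumptions, not Sandra's model or any published implementation: the latency functions, the parameter values (`base_ms`, `slope_ms`, `floor_ms`, `decomp_ms`) and the use of log frequency are all illustrative choices. The whole-form route is assumed to get faster as the stored form becomes more frequent and to be unavailable for unattested forms; the decomposition route is assumed to take a roughly constant time; whichever finishes first wins.

```python
import math

def whole_form_latency(freq, base_ms=900.0, slope_ms=60.0, floor_ms=300.0):
    """Hypothetical whole-form route: latency falls with the log frequency of the form.
    An unattested form (freq == 0) has no stored entry, so this route never finishes."""
    if freq <= 0:
        return math.inf
    return max(floor_ms, base_ms - slope_ms * math.log(freq + 1))

def decomposition_latency(decomp_ms=780.0):
    """Hypothetical decomposition route: a roughly constant parse-and-combine cost."""
    return decomp_ms

def race(freq):
    """Return (winning route, latency in ms) for a form encountered `freq` times."""
    whole = whole_form_latency(freq)
    parse = decomposition_latency()
    if whole < parse:
        return "whole-form", whole
    return "decomposition", parse

for f in (0, 1, 5, 50, 500):
    route, ms = race(f)
    print(f"freq={f:>3}: {route:<13} {ms:6.1f} ms")
```

Note that the toy model has no term for idiomaticity: a phrase and a compound with the same frequency are treated identically, which is exactly the prediction the experiments below put to the test.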

To sum up, previous studies examining the processing and representation of compounds in the mental lexicon either did not compare compounds with phrases (as in the three studies on Chinese), or did not control the frequency of phrases and compounds (as in Osgood & Hoosain 1974; van Jaarsveld & Rattink 1988). Studies of idioms, meanwhile, suggest that semantic idiomaticity itself may affect processing. Swinney & Cutler (1979) and Gibbs (1986) found that the figurative meaning of an idiomatic expression is comprehended faster than its literal meaning. In Swinney & Cutler (1979), subjects were faster at judging that a string of words was meaningful when it had both a figurative and a literal meaning (e.g. break the ice) than when it had a literal meaning only (e.g. break the cup). In Gibbs (1986), each ambiguous idiomatic expression (e.g. he kept it under his hat) was embedded in a passage that biased either its figurative or its literal meaning. At the end of the passage, a target phrase was presented, which was either a paraphrase of the literal meaning of the idiom (e.g. it's beneath its cap), a paraphrase of the figurative meaning (e.g. he didn't tell anyone), or an unrelated control (e.g. it happened to Sally). The task was to judge whether the target phrase was meaningful. Regardless of the contextual bias, the idiomatic target phrases were responded to faster than either the literal or the unrelated phrases. These results suggest that idiomatic strings are represented and processed as lexical items.

1.2 The Goal of the Present Study

Based on these considerations, the present study investigates whether and how frequency and semantic idiomaticity determine the forms of representation for compounds and phrases in the mental lexicon. The rationale is as follows. When morpheme-pair frequency and character frequency are matched, if idiomaticity plays a role (as the studies by Swinney & Cutler 1979 and Gibbs 1986 suggest), there should be differences in reaction times or accuracy between phrases (semantically transparent and compositional) and compounds (semantically at least slightly idiomatic and conventional). If, on the other hand, frequency is the only factor determining the forms of representation, the two categories should show similar reaction times and error rates. Finally, if both frequency and idiomaticity determine the forms of representation, we should expect differences not only between phrases and compounds, but also between high and low frequency morpheme-pairs within each category.

The paper is organized as follows. Section 2 reports two experiments examining the roles played by frequency and idiomaticity in the forms of representation of morpheme-pairs in the mental lexicon. Section 3 discusses the implications of the results of the two experiments.

2 Experiments

Because word boundaries are fuzzy in Chinese, some items listed in the frequency count book (Wu & Liu 1987) may be better classified as phrases.⁹ This characteristic allows us to control phrase frequency and compound frequency in the experiments. In Experiment I, the morpheme-pair frequency and the character frequency of phrases (i.e. semantically transparent morpheme-pairs) and compounds (i.e. semantically idiomatic morpheme-pairs) were matched in order to examine the effect of semantic idiomaticity. In Experiment II, in addition to the manipulations made in the first experiment, the number of items and character frequency were further matched for both the high frequency set and the low frequency set in order to examine the effects of both morpheme-pair frequency and semantic idiomaticity.

2.1 Experiment I

2.1.1 Methods

Subjects. Twenty adult native speakers of Mandarin Chinese participated in this experiment. All of them had at least 13 years of Mandarin education in Taiwan.

Stimuli. There were 176 stimulus morpheme-pairs in this experiment. Eighty-eight were meaningful combinations of morphemes in Mandarin Chinese (for the Yes responses), and the other 88 were nonsense combinations used for the No responses,¹⁰ e.g. zhong-zu 'loyal foot', yue-qiang 'to date wall'. Half of the meaningful morpheme-pairs (44) were semantically fully transparent (i.e. phrase-like, e.g. bai-bu ‘white cloth’), and the other half were semantically slightly idiomatic (i.e. compound-like, e.g. bai-cai (white vegetable ‘cabbage’)). Two criteria were used to decide whether a pair was phrase-like or compound-like: (1) whether the meaning of the pair can be directly derived by combining the meanings of the constituent morphemes; and (2) whether each morpheme can be freely substituted by other morphemes. For transparent pairs, the meaning of the pair is directly derived from the combination of the constituents' meanings, and the morphemes can be freely substituted (e.g. alongside white cloth, there can be red cloth, blue cloth, white flower, white car, etc.). For slightly idiomatic pairs, the meaning of the pair cannot be directly derived from the meanings of the constituents (but is not completely opaque, as in hua-sheng (flower birth ‘peanut’)), and substituting either morpheme disrupts the idiomaticity. Both the fully transparent and the slightly idiomatic morpheme-pairs were further divided into high frequency morpheme-pairs (N=12) and low frequency morpheme-pairs (N=32).¹¹ Among the low frequency morpheme-pairs, 21 were Adjective+Noun and 11 were Verb+Noun. Examples of Adjective+Noun transparent morpheme-pairs are ye-niao 'wild bird', hei-ya 'black duck' and bai-qi 'white paint', and examples of Adjective+Noun idiomatic morpheme-pairs include xiao-fei (small expenses 'tip'), re-ku (hot pants 'shorts') and da-chang (big bowels 'large intestine').
Examples of Verb+Noun transparent morpheme-pairs are bao-shu 'to wrap books', xi-lian 'to wash face' and mai-jiu 'to sell wine', and examples of Verb+Noun idiomatic morpheme-pairs include xi-pai (wash cards 'to shuffle cards'), hui-hao (wield a writing brush 'to write calligraphy') and shao-bei (burn cups 'a flask'). In each set, the mean morpheme-pair frequency and the mean character frequency were matched for the idiomatic morpheme-pairs, their transparent counterparts and the nonsense morpheme-pairs, as shown in Table I and Table II.

Table I: Mean frequency counts for high frequency morpheme-pairs
	Pair	1st Character	2nd Character
Transparent Pairs (N=12)	26	1289	1235
Idiomatic Pairs (N=12)	26	1274	1252
Nonsense Pairs (N=12)	--	1282	1292

Table II: Mean frequency counts for low frequency morpheme-pairs
	Pair	1st Character	2nd Character
Transparent Pairs (N=32)	0.81	581	252
Idiomatic Pairs (N=32)	0.81	581	253
Nonsense Pairs (N=32)	--	580	255
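One way to make the matching claim concrete is to compute, for each character-frequency column, how far apart the three condition means lie. The sketch below uses only the condition means reported in Tables I and II (item-level counts are not reported here); the relative-spread measure and the 5% tolerance are illustrative assumptions, not the criterion actually used to select the stimuli. The pair-frequency column is left out because nonsense pairs have no pair frequency, and the transparent and idiomatic pair means are identical in both tables anyway.

```python
# Condition means copied from Table I (high frequency) and Table II (low frequency).
MEANS = {
    ("high", "1st character"): {"transparent": 1289, "idiomatic": 1274, "nonsense": 1282},
    ("high", "2nd character"): {"transparent": 1235, "idiomatic": 1252, "nonsense": 1292},
    ("low", "1st character"): {"transparent": 581, "idiomatic": 581, "nonsense": 580},
    ("low", "2nd character"): {"transparent": 252, "idiomatic": 253, "nonsense": 255},
}

TOLERANCE = 0.05  # assumed threshold: spread of at most 5% of the column mean

def relative_spread(condition_means):
    """(largest mean - smallest mean) divided by the average of the condition means.
    With equal N per condition, that average equals the grand mean."""
    values = list(condition_means.values())
    return (max(values) - min(values)) / (sum(values) / len(values))

for (band, column), means in MEANS.items():
    spread = relative_spread(means)
    status = "matched" if spread <= TOLERANCE else "check"
    print(f"{band:<4} {column:<13} spread={spread:.2%} {status}")
```

By this measure the largest discrepancy is in the second-character column of the high frequency set, which is still within the assumed tolerance.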

Procedures. Each subject was tested individually in a quiet room. Subjects were asked to judge whether the morpheme-pair appearing at the center of the computer screen was meaningful in Mandarin by pressing the key marked Yes or No. At the start of each trial, the word "Ready?" remained on the screen until the subject pressed a key to indicate readiness. A fixation sign "+" then appeared for 250 msec, followed by the stimulus morpheme-pair, displayed vertically for a maximum of 1500 msec. If the subject did not respond within this time limit, the stimulus disappeared and the next trial began; otherwise, the reaction time was recorded. Before the experiment, each subject completed eight practice trials to become familiar with the procedure. The order of items was randomized for each subject.
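The trial sequence just described can be summarized as a small scheduling and scoring routine. This is a hypothetical sketch, not the software used in the experiment: the function names are invented, `score_trial` receives a simulated key and latency (measured from stimulus onset) instead of reading a keyboard, and seeding each subject's random order with the subject ID is an assumption made for reproducibility.

```python
import random
from dataclasses import dataclass
from typing import Optional

FIXATION_MS = 250       # duration of the "+" fixation sign
MAX_STIMULUS_MS = 1500  # the pair stays on screen for at most this long

@dataclass
class TrialResult:
    pair: str
    response: Optional[str]  # "yes", "no", or None on a timeout
    rt_ms: Optional[float]   # recorded only when a response arrives in time

def stimulus_onset_ms(ready_keypress_ms: float) -> float:
    """Time from trial start at which the pair appears: the self-paced "Ready?"
    screen ends at the keypress, and the fixation sign is shown next."""
    return ready_keypress_ms + FIXATION_MS

def score_trial(pair: str, key: Optional[str], rt_ms: Optional[float]) -> TrialResult:
    """Score one trial. No key, or a key later than MAX_STIMULUS_MS, is a timeout:
    the stimulus is removed, nothing is recorded, and the next trial begins."""
    if key is None or rt_ms is None or rt_ms > MAX_STIMULUS_MS:
        return TrialResult(pair, None, None)
    return TrialResult(pair, key, rt_ms)

def trial_order(items, subject_id: int):
    """A separate random order for each subject (seeded by subject ID, an assumption)."""
    order = list(items)
    random.Random(subject_id).shuffle(order)
    return order

items = ["bai-bu", "bai-cai", "zhong-zu"]
print(trial_order(items, subject_id=7))
print(score_trial("bai-cai", "yes", 612.0))
print(score_trial("zhong-zu", "no", 1620.0))  # too slow: scored as a timeout
```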

2.1.2 Results and Discussion

The mean reaction times and error rates ¹² for all transparent and idiomatic morpheme-pairs are displayed in Table III.

Table III: Mean RT and error rate for transparent and idiomatic morpheme-pairs
	Transparent	Idiomatic
Mean RT (in msec.)	698	680	p < .05
Mean Error Rate	10%	3.4%	p < .001

The results showed that responses were faster and more accurate for idiomatic morpheme-pairs than for transparent morpheme-pairs. These reaction time and error rate effects were significant in the subject (but not the item) analyses (F(1, 79)= 4.39 for reaction times, and F(1, 79)= 16.02 for error rates in subject analysis). Since morpheme-pair frequency and character frequency were matched for transparent and idiomatic pairs, there should be no difference between the two categories if semantic idiomaticity played no role in determining the representations. The observed advantage for idiomatic pairs therefore suggests that idiomaticity does play a role.
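For readers less familiar with subject (F1) and item (F2) analyses, the sketch below shows the textbook computations for a two-condition comparison. It assumes that idiomaticity is a within-subject factor (each subject saw both transparent and idiomatic pairs), so that F1 is the squared paired t on per-subject condition means with df (1, n - 1), and a between-item factor (each item belongs to one condition), so that F2 is a one-way between-groups F with df (1, N - 2). These are standard formulas, not a reconstruction of the analysis actually run, and the numbers in the example are invented.

```python
from statistics import mean

def f1_within(cond_a, cond_b):
    """By-subject F for a two-level within-subject factor: the squared paired t, df (1, n-1).
    cond_a[i] and cond_b[i] are subject i's mean RT (or error rate) in each condition."""
    if len(cond_a) != len(cond_b) or len(cond_a) < 2:
        raise ValueError("need paired values for at least two subjects")
    diffs = [a - b for a, b in zip(cond_a, cond_b)]
    n = len(diffs)
    d_bar = mean(diffs)
    var_d = sum((d - d_bar) ** 2 for d in diffs) / (n - 1)
    if var_d == 0:
        raise ValueError("differences have zero variance; F is undefined")
    return d_bar ** 2 / (var_d / n), (1, n - 1)

def f2_between(items_a, items_b):
    """By-item F for two independent groups of items: one-way ANOVA, df (1, N-2)."""
    n_a, n_b = len(items_a), len(items_b)
    m_a, m_b = mean(items_a), mean(items_b)
    grand = mean(list(items_a) + list(items_b))
    ss_between = n_a * (m_a - grand) ** 2 + n_b * (m_b - grand) ** 2
    ss_within = sum((x - m_a) ** 2 for x in items_a) + sum((x - m_b) ** 2 for x in items_b)
    df_within = n_a + n_b - 2
    if ss_within == 0:
        raise ValueError("no within-group variance; F is undefined")
    return ss_between / (ss_within / df_within), (1, df_within)

# Invented per-subject means (msec.) for three subjects: transparent vs. idiomatic.
print(f1_within([700, 710, 690], [680, 700, 690]))
# Invented per-item means for two small groups of items.
print(f2_between([1, 2, 3], [4, 5, 6]))
```

The sketch only shows what each test compares; it is not expected to reproduce the degrees of freedom reported in the text.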

Another possibility regarding the role of frequency is that "different kinds of representations are used for high-frequency and low-frequency words" (Sandra 1994, p. 264). That is, once words reach a certain frequency in language use, the morphological structure that language users initially rely on to represent polymorphemic words starts to lose its primary role in the representations. The Augmented Addressed Morphology Model (Caramazza et al. 1985) is an example of this type of model. If this possibility is correct, we would expect the distinction between phrases (transparent morpheme-pairs) and compounds (idiomatic morpheme-pairs) to exist only for low frequency morpheme-pairs, since high frequency phrases may also have their own representations in the mental lexicon. The mean reaction times and error rates for high frequency and low frequency morpheme-pairs are shown in Table IV and Table V, respectively.

Table IV: Mean RT and error rate for high frequency morpheme-pairs
	Transparent	Idiomatic
Mean RT (in msec.)	647	615	p < .05
Mean Error Rate	3.3%	0.4%	p < .07

Table V: Mean RT and error rate for low frequency morpheme-pairs
	Transparent	Idiomatic
Mean RT (in msec.)	749	745	not sig.
Mean Error Rate	17%	6%	p < .05

As shown in Table IV, even for high frequency morpheme-pairs, responses were still faster and more accurate for idiomatic pairs than for transparent pairs. The reaction time and error rate effects were significant in both subject and item analyses (F(1, 39)= 10.31 for reaction times, and F(1, 39)= 4.38 for error rates in subject analysis; F(1, 23)= 4.35 for reaction times, and F(1, 23)= 3.88 for error rates in item analysis). Therefore, the alternative frequency-of-co-occurrence account cannot explain the results obtained here either, which suggests that the differences between the two categories depend on the idiomaticity of the morpheme-pairs.

Although the differences in error rates were significant for both high and low frequency morpheme-pairs (for error rates in low frequency morpheme-pairs, F(1, 39)= 16.3 in subject analysis, and F(1, 63)= 4.74 in item analysis), the difference in reaction times was significant only for high frequency morpheme-pairs (for reaction times in low frequency morpheme-pairs, F(1, 39)=0.01, p > .5 in subject analysis, and F(1, 63)= 0.04, p > .5 in item analysis). However, further inspection revealed that among the low frequency morpheme-pairs, the Adjective+Noun pairs patterned differently from the Verb+Noun pairs, as displayed in Table VI.

Table VI: Mean reaction time and error rate for subsets in low frequency morpheme-pairs
	Transparent	Idiomatic
Mean RT (in msec.), Adj+Noun	789	732	not sig.
Mean RT (in msec.), Verb+Noun	695	787	p < .05
Mean Error Rate, Adj+Noun	17%	5%	p < .05
Mean Error Rate, Verb+Noun	16%	8%	p < .05*

*The difference is significant only in the subject analysis, not in the item analysis.

As shown in Table VI, although the difference in reaction times was still not significant for the Adjective+Noun items (F(1, 39)= 2, p > .1 in subject analysis, and F(1, 41)= 2.71, p > .1 in item analysis), these items show the same pattern as the high frequency morpheme-pairs, namely, the idiomatic pairs have shorter reaction times and lower error rates than the transparent pairs (for error rates in Adj.+Noun items, F(1, 39)= 12.32 in subject analysis, and F(1, 41)= 5.86 in item analysis). The Verb+Noun items, however, show the reverse pattern for reaction times (F(1, 39)= 5.18 in subject analysis, and F(1, 21)= 4.82 in item analysis), but not for error rates (F(1, 39)= 8.39 in subject analysis, and F(1, 21)= 0.48, p > .5 in item analysis). Whether the results for the Verb+Noun items are due to a trade-off between reaction time and error rate, or to some idiosyncratic property of this type of compound (Zhou et al. 1993 suggest that Chinese Verb-Object compounds may behave differently from other types of compounds¹³), needs further investigation.

Another possible explanation for the Verb+Noun pattern is that among the Verb+Noun idiomatic pairs, five out of eleven items are nouns as a whole (i.e. [V-N]_N); the average response time for [V-N]_N pairs was 785 msec., compared with 740 msec. for [V-N]_V pairs.¹⁴ It is likely that when subjects process a verb followed by a noun, they first assign the category V to the whole pair (as it is natural to take the noun as the complement of the verb and thus form a VP); then, after the meaning of the pair is accessed, the pair has to be reanalyzed as a noun, which takes more processing time.

As for the comparison of high versus low frequency morpheme-pairs, a significant frequency effect was found in both subject and item analyses. However, since the high frequency morpheme-pairs in this experiment also had higher character frequencies than the low frequency morpheme-pairs, it is not clear whether the frequency effect came from character frequency or from pair frequency. This confound will be addressed through better matching in the next experiment.

Furthermore, although there seems to be a clear distinction between transparent pairs and idiomatic pairs in terms of reaction times and error rates for the high frequency pairs, these pairs were not as well matched as the low frequency items with respect to their internal structure.¹⁵ One problem was that ten of the twelve transparent pairs had a modifier-noun structure, while the idiomatic pairs were more diverse, and none of them had a modifier-noun structure. Another problem was that of the ten modifier-noun transparent pairs, only two had predicating modifiers, i.e. modifiers that can be used as predicates in a sentence (e.g. "short poem" vs. "The poem is short"), whereas all of the low frequency Adjective+Noun pairs had predicating modifiers. Murphy (1990) found that subjects took more time to interpret NPs with nonpredicating modifiers (e.g. "musical clock" vs. "*The clock is musical") than NPs with predicating modifiers. This factor will also be considered in Experiment II.

To sum up, Experiment I showed that semantic idiomaticity must play a role in distinguishing the representation and processing of transparent morpheme-pairs (phrase-like pairs) from those of slightly idiomatic pairs (compound-like pairs) in the mental lexicon. In the next experiment, some confounding factors are further controlled to better examine the effects of frequency and semantic idiomaticity.

2.2 Experiment II

Due to the above-mentioned problems in Experiment I, in this experiment, all relevant factors, such as number of items, character frequency for both high and low frequency sets, and the internal structure of the morpheme pairs, were better matched in order to compare the effect of the distinction between phrases versus compounds and the effect of frequency.

2.2.1 Methods

Subjects. Fifteen adult native speakers of Mandarin Chinese participated in this experiment, and all of them had at least 13 years of Mandarin education in Taiwan.

Stimuli. Each subject was presented with 280 stimulus morpheme-pairs, including 140 test pairs (for Yes responses) and 140 control pairs (for No responses). As in Experiment I, transparent pairs were compared with slightly idiomatic pairs, and both were further divided into high pair-frequency and low pair-frequency sets, but here character frequency was matched across all four sets, so that only pair frequency varied. In addition, each of the four sets included Adjective+Noun (N=15), Verb+Noun (N=10) and Noun+Noun (N=10) pairs.

For the Adjective+Noun and Noun+Noun pairs, one more criterion was considered in addition to whether the meaning of the pair can be derived from the meanings of the constituent morphemes and whether free substitution of the morphemes is possible. For the transparent morpheme-pairs (phrases), inserting the adjectival/possessive marker de does not disrupt the meaning: for Adjective+Noun pairs, zhi-xian 'straight line' means the same thing as zhi-de-xian, and fei-yang 'fat sheep' the same as fei-de-yang; for Noun+Noun pairs, ji-dan 'chicken egg' is also ji-de-dan, and shu-pi 'tree skin (or peel)' is also shu-de-pi.¹⁶ For the idiomatic morpheme-pairs (compounds), however, inserting de was either impossible or changed the meaning. For example, among Adjective+Noun compounds, xiang-yan (fragrant smoke 'cigarette') does not mean xiang-de-yan, and jing-mai (quiet blood vessel 'veins') is meaningful but *jing-de-mai is meaningless. Among Noun+Noun compounds, qi-pao (flag robe 'Chinese long gown for women') is meaningful but qi-de-pao is nonsense, and ma-xi (horse play 'circus') does not mean ma-de-xi.

For Verb+Noun items, additional criteria were applied in selecting the stimuli. First, the grammatical category of the pair was Verb for both transparent and idiomatic pairs. Second, with regard to separability, which is often used as a criterion to distinguish V-N compounds from V-N phrases, items that allowed more drastic transformations (e.g. passivization or ba object-fronting) were used as transparent pairs (phrases, as exemplified in (2) for he-jiu ‘drink wine’), and items that did not allow such transformations were used as idiomatic pairs (compounds).

(2) Passivization
      a. jiu    bei Lisi he-le
         wine by Lisi drink-ASP
         "The wine was drunk by Lisi"

      ba Object Fronting
      b. Lisi ba   jiu    he-le
          Lisi BA wine drink-ASP
         "Lisi drank the wine"

This criterion was added because most V-N compounds (as defined by other criteria) show a certain degree of separability, even the more semantically opaque ones. For example, dan-xin (carry heart ‘worry about’) can be used in dan-le-san-tian-xin (carry-ASP-three-day-heart ‘have worried for three days’). However, passivization and ba object-fronting are not possible for dan-xin, as shown in (3).

(3) Passivization
      a. *xin   bei Lisi dan-le
           heart by Lisi carry-ASP

      ba Object Fronting
      b. *Lisi ba   xin    dan-le
            Lisi BA heart carry-ASP

Examples of Verb+Noun phrases are mai-piao 'to buy tickets', ting-che 'to park cars' and shua-di 'to brush the floor'; examples of Verb+Noun compounds include fen-shou (to separate hands 'to break up'), re-huo (to provoke fire 'to make someone angry') and kua-kou (to exaggerate mouth 'to boast').
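The selection criteria above can be read as a small decision procedure. The Python sketch below is a hypothetical encoding, not the procedure actually used in the study: the judgement fields are our own names, and we assume that a pair counted as transparent only if every applicable criterion held, as idiomatic only if none held, and that mixed cases were left out. The paper does not say how borderline items were handled.

```python
from dataclasses import dataclass


@dataclass
class PairJudgements:
    """Hypothetical linguistic judgements for one morpheme pair."""
    structure: str                  # "Adj+Noun", "Noun+Noun" or "Verb+Noun"
    meaning_derivable: bool         # meaning follows from the two morphemes
    free_substitution: bool         # either morpheme can be freely replaced
    de_insertion_ok: bool = False   # X-de-Y keeps the meaning (Adj+Noun, Noun+Noun)
    transformable: bool = False     # passivization / ba-fronting possible (Verb+Noun)


def classify(p: PairJudgements) -> str:
    """Return 'transparent' if all criteria hold, 'idiomatic' if none do, else 'excluded'."""
    criteria = [p.meaning_derivable, p.free_substitution]
    if p.structure in ("Adj+Noun", "Noun+Noun"):
        criteria.append(p.de_insertion_ok)
    elif p.structure == "Verb+Noun":
        criteria.append(p.transformable)
    else:
        raise ValueError(f"unknown structure: {p.structure}")
    if all(criteria):
        return "transparent"   # phrase
    if not any(criteria):
        return "idiomatic"     # compound
    return "excluded"          # mixed evidence: not a clear case


# The judgements below are assumed for illustration only.
print(classify(PairJudgements("Noun+Noun", True, True, de_insertion_ok=True)))    # ji-dan
print(classify(PairJudgements("Noun+Noun", False, False, de_insertion_ok=False))) # ma-xi
print(classify(PairJudgements("Verb+Noun", False, False, transformable=False)))   # dan-xin
```

The structure-specific test (de insertion versus passivization/ba-fronting) is the only part that differs between construction types, so it is appended to a shared list of criteria.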

Table VII and Table VIII display the mean character frequency and pair frequency for transparent, idiomatic and nonsense morpheme-pairs.

Table VII: Mean frequency counts for high frequency morpheme-pairs
	Pair	1st Character	2nd Character
Adj-Noun
    Transparent Pairs (N=15)	5.7	520.4	265.6
    Idiomatic Pairs (N=15)	5.7	518.6	265.7
    Nonsense Pairs (N=15)	---	511.3	267.4
Verb-Noun
    Transparent Pairs (N=10)	5.2	320.0	733.8
    Idiomatic Pairs (N=10)	5.5	324.3	738.5
    Nonsense Pairs (N=10)	---	315.4	734.9
Noun-Noun
    Transparent Pairs (N=10)	5.7	344.6	240.8
    Idiomatic Pairs (N=10)	5.6	343.7	244.9
    Nonsense Pairs (N=10)	---	342.9	243.2

Table VIII: Mean frequency counts for low frequency morpheme-pairs
	Pair	1st Character	2nd Character
Adj-Noun
    Transparent Pairs (N=15)	1	520.1	268.6
    Idiomatic Pairs (N=15)	1	520.0	265.4
    Nonsense Pairs (N=15)	---	507.2	269.3
Verb-Noun
    Transparent Pairs (N=10)	1	321.1	737.1
    Idiomatic Pairs (N=10)	1	320.3	731.9
    Nonsense Pairs (N=10)	---	329.1	754.9
Noun-Noun
    Transparent Pairs (N=10)	1	345.5	246.3
    Idiomatic Pairs (N=10)	1	346.3	235.0
    Nonsense Pairs (N=10)	---	345.8	240.7

Procedures. The procedures were the same as in Experiment I.
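Footnote 12 notes that decisions not made within 1500 msec. were excluded from both the reaction-time and the error-rate calculations. The Python sketch below shows that filtering step under stated assumptions: the Trial record and its fields are hypothetical, "within 1500 msec." is read as at most 1500 msec., and mean RT is computed over correct responses only, which is common practice but is not stated in the paper.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Optional, Tuple

DEADLINE_MS = 1500  # response window from footnote 12 (assumed inclusive)


@dataclass
class Trial:
    rt_ms: Optional[float]  # None if no response was given
    correct: bool


def summarize(trials: List[Trial]) -> Tuple[float, float]:
    """Mean RT over correct in-time trials, and error rate over in-time trials.

    Trials with no response, or a response later than DEADLINE_MS, are dropped
    from both measures, as footnote 12 describes.
    """
    timely = [t for t in trials if t.rt_ms is not None and t.rt_ms <= DEADLINE_MS]
    if not timely:
        raise ValueError("no responses within the deadline")
    correct_rts = [t.rt_ms for t in timely if t.correct]
    mean_rt = mean(correct_rts) if correct_rts else float("nan")
    error_rate = sum(1 for t in timely if not t.correct) / len(timely)
    return mean_rt, error_rate


# Made-up trials: the 1600 msec. response and the missing response are dropped.
trials = [Trial(600.0, True), Trial(700.0, True), Trial(800.0, False),
          Trial(1600.0, True), Trial(None, False)]
mean_rt, error_rate = summarize(trials)
print(mean_rt, round(error_rate, 3))  # 650.0 0.333
```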

2.2.2 Results and Discussion

Table IX shows the mean reaction times and error rates for transparent and idiomatic morpheme pairs, and Table X shows the comparison of high frequency versus low frequency morpheme pairs.

Table IX: Mean RT and error rate for transparent and idiomatic morpheme pairs
	Transparent	Idiomatic
Mean RT (in msec.)	692	637	p < .001
Mean Error Rate	6.1%	2.4%	p < .05

Table X: Mean RT and error rate for high and low frequency morpheme pairs
	High Frequency	Low Frequency
Mean RT (in msec.)	637	692	p < .001
Mean Error Rate	2.9%	5.6%	p < .05*

*The difference in error rate was significant in the subject analysis, but not in the item analysis.

As shown in Table IX, idiomatic morpheme pairs in Experiment II were responded to faster and more accurately than transparent pairs. Both the reaction-time and the error-rate differences were significant in the subject and the item analyses (reaction times: F(1, 179) = 44.01 by subjects, F(1, 139) = 14.77 by items; error rates: F(1, 179) = 18.32 by subjects, F(1, 139) = 5.48 by items). In addition, as shown in Table X, there was a frequency effect: high frequency morpheme pairs were responded to significantly faster and more accurately than low frequency pairs (reaction times: F(1, 179) = 48.05 by subjects, F(1, 139) = 13.57 by items; error rates: F(1, 179) = 17.25 by subjects, but F(1, 139) = 2.48, p > .1 by items). Since character frequency was matched between the high and low frequency sets, the frequency effect can be attributed to the difference in morpheme-pair frequency.
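The claim that character frequency was matched can be inspected directly in Tables VII and VIII. As a rough sketch, the Python below compares the mean first- and second-character frequencies of each high frequency test set with those of its low frequency counterpart and reports the largest relative gap. The 5% tolerance is our own assumption; the paper does not state what matching criterion was used. On these figures the largest gap is about 4%, for the second character of the Noun-Noun idiomatic sets.

```python
# Mean character frequencies (1st, 2nd) for the test sets, from Table VII (high)
# and Table VIII (low). Nonsense control sets are left out.
HIGH = {
    ("Adj-Noun", "transparent"): (520.4, 265.6),
    ("Adj-Noun", "idiomatic"): (518.6, 265.7),
    ("Verb-Noun", "transparent"): (320.0, 733.8),
    ("Verb-Noun", "idiomatic"): (324.3, 738.5),
    ("Noun-Noun", "transparent"): (344.6, 240.8),
    ("Noun-Noun", "idiomatic"): (343.7, 244.9),
}
LOW = {
    ("Adj-Noun", "transparent"): (520.1, 268.6),
    ("Adj-Noun", "idiomatic"): (520.0, 265.4),
    ("Verb-Noun", "transparent"): (321.1, 737.1),
    ("Verb-Noun", "idiomatic"): (320.3, 731.9),
    ("Noun-Noun", "transparent"): (345.5, 246.3),
    ("Noun-Noun", "idiomatic"): (346.3, 235.0),
}
TOLERANCE = 0.05  # assumed threshold, not a criterion stated in the paper


def relative_difference(a: float, b: float) -> float:
    """Absolute difference as a proportion of the two values' mean."""
    return abs(a - b) / ((a + b) / 2)


# (gap, set, character position 1 or 2) for the largest high-vs-low gap.
worst = max(
    (relative_difference(HIGH[cell][pos], LOW[cell][pos]), cell, pos + 1)
    for cell in HIGH
    for pos in (0, 1)
)
print(f"largest gap: {worst[0]:.3f} at {worst[1]}, character {worst[2]}")
print("all within tolerance:", worst[0] <= TOLERANCE)
```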

The comparisons of reaction times and error rates between transparent and idiomatic morpheme pairs are shown in Table XI for high frequency pairs, and in Table XII for low frequency pairs.

Table XI: Mean RT and error rate for high frequency morpheme pairs
	Transparent	Idiomatic
Mean RT (in msec.)	652	622	p < .05
Mean Error Rate	4%	1.8%	not sig.

Table XII: Mean RT and error rate for low frequency morpheme pairs
	Transparent	Idiomatic
Mean RT (in msec.)	731	652	p < .005
Mean Error Rate	8.2%	3%	p < .08

Although the difference in error rate was not significant for high frequency pairs (F(1, 69) = 2.4, p > .1 in the item analysis), it was marginally significant for low frequency pairs (F(1, 69) = 3.16 in the item analysis). The differences in reaction times, however, were significant for both high and low frequency pairs (F(1, 69) = 5.46 for high frequency pairs and F(1, 69) = 9.42 for low frequency pairs). This is consistent with the general finding of Experiment I: idiomatic pairs were responded to faster than transparent pairs at both frequency levels.

More noteworthy, as shown in Table XIII, is that within transparent morpheme pairs, high frequency pairs were responded to significantly faster than low frequency pairs (F(1, 69) = 7.87 in the item analysis), as was also the case for idiomatic morpheme pairs (F(1, 69) = 6.93 in the item analysis).

Table XIII: Mean RT (in msec.) comparison for high versus low frequency morpheme pairs
	High Frequency	Low Frequency
Transparent	652	731	p < .01
Idiomatic	622	652	p < .05
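The marginal means in Tables IX and X can be recovered, to rounding, from the cell means in Tables XI and XII, since every transparency by frequency cell contained the same number of items (35). The Python sketch below averages the cells without weighting; this is an approximate reconstruction, since the published marginals were presumably computed from trial-level data. It also gives the frequency effect within each transparency level: 79 msec. for transparent pairs versus 30 msec. for idiomatic pairs. The paper reports no test of this interaction, so the difference is descriptive only.

```python
from typing import Dict, Tuple

Cell = Tuple[str, str]  # (transparency, pair frequency)

# Cell means from Tables XI and XII (RT in msec., error rate in %).
RT: Dict[Cell, float] = {
    ("transparent", "high"): 652, ("transparent", "low"): 731,
    ("idiomatic", "high"): 622, ("idiomatic", "low"): 652,
}
ERR: Dict[Cell, float] = {
    ("transparent", "high"): 4.0, ("transparent", "low"): 8.2,
    ("idiomatic", "high"): 1.8, ("idiomatic", "low"): 3.0,
}


def marginal(cells: Dict[Cell, float], factor: int, level: str) -> float:
    """Unweighted mean of the cells at one level (factor 0 = transparency, 1 = frequency)."""
    values = [v for key, v in cells.items() if key[factor] == level]
    return sum(values) / len(values)


def frequency_effect(cells: Dict[Cell, float], transparency: str) -> float:
    """Low minus high frequency mean, within one transparency level."""
    return cells[(transparency, "low")] - cells[(transparency, "high")]


for level in ("transparent", "idiomatic"):   # compare with Table IX
    print(level, marginal(RT, 0, level), round(marginal(ERR, 0, level), 1))
for level in ("high", "low"):                # compare with Table X
    print(level, marginal(RT, 1, level), round(marginal(ERR, 1, level), 1))
print(frequency_effect(RT, "transparent"), frequency_effect(RT, "idiomatic"))  # 79 30
```

The unweighted average is only justified because the cells have equal numbers of items; with unequal cells, each cell mean would need to be weighted by its item count.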

Since character frequency was held constant between the high and low frequency pairs, this effect can be attributed to the difference in pair frequency. Because the meanings of idiomatic pairs cannot be derived directly by combining the meanings of their constituents, these pairs are candidates for storage in the mental lexicon as "words" (compound words), and so they show a word frequency effect. For transparent pairs, whose meanings are combinations of the constituent morphemes' meanings, it seems superfluous, from the standpoint of the "economy" of representations, to store both the morphemes and the morpheme pairs (phrases). However, if transparent morpheme pairs had no representations of their own in the mental lexicon and were computed anew each time they were encountered, there should be no effect of pair frequency. Since the results did show a pair frequency effect for transparent morpheme pairs (phrases), there are two possible explanations. The first is that transparent pairs have no representations of their own in the mental lexicon, and the frequency effect reflects how efficiently the constituent morphemes' meanings are combined, which depends on how often those pairs are encountered. The second is that the representations of transparent and idiomatic morpheme pairs are not all-or-none but graded, with frequency and semantic idiomaticity as the two dimensions determining the strength of a representation.

3 General Discussion and Conclusion

In this study, two experiments were conducted to investigate how compounds and phrases are represented in the mental lexicon. In Experiment I, reaction times and error rates distinguished semantically transparent morpheme pairs (phrases) from slightly idiomatic morpheme pairs (compounds). This distinction indicates that semantic idiomaticity plays a role in determining the form of representation in the mental lexicon. In Experiment II, with the number of items, character frequency and the internal structure of the morpheme pairs matched between the high and low frequency sets, we found, in addition to the idiomaticity effect of Experiment I, an effect of morpheme-pair frequency for both transparent and idiomatic pairs. The pair frequency effect suggests either that the speed of computing the constituents' meanings for transparent morpheme pairs depends on how often the morphemes co-occur (while the pair itself is not stored in the mental lexicon), or that transparent morpheme pairs also have representations in the mental lexicon, with semantic idiomaticity and frequency as the two factors determining the strength of those representations.

To tease apart these two accounts of the frequency effect for transparent morpheme pairs, the effect of semantic idiomaticity on the representation of compound words needs to be explored further. The compounds (idiomatic morpheme pairs) used in Experiments I and II were not fully opaque semantically. Zwitserlood (1994) found that in a semantic priming task there was facilitation for the components of transparent and partially opaque compounds, but not for fully opaque compounds, which suggests that at the semantic level truly opaque compounds behave like monomorphemic words. Given this finding, a better understanding of how bimorphemic words are represented in the mental lexicon requires comparing slightly opaque and fully opaque compounds in a meaningfulness judgement task like the one used in the two experiments here. In addition, to determine whether fully opaque compounds really behave like monomorphemic words, fully opaque compounds should also be compared with bisyllabic monomorphemic words such as pu-tao ‘grape’. If responses to slightly opaque and fully opaque compounds are similar to responses to bisyllabic monomorphemic words, we would be justified in claiming that compounds and phrases are represented or processed differently: compounds are stored in the mental lexicon, while phrases are computed from their constituents' meanings, with the speed of computation determined by how often the phrases are encountered. If, however, slightly opaque and fully opaque compounds differ, it is more likely that the representations of compounds and phrases are determined by two interacting dimensions: frequency and semantic idiomaticity. Finally, if fully opaque compounds differ from bisyllabic monomorphemic words, this may suggest that the representations of fully opaque compounds remain distinct from those of monomorphemic words.

The results of this study indicate that compounds, which are semantically more idiomatic than phrases, have their own representations in the mental lexicon as "words". However, contrary to what the "economy" of representation would predict, morpheme-pair frequency also affected the processing of phrases. Since phrases are semantically transparent, i.e. the meaning of a phrase can be derived from the meanings of its constituent morphemes (or words), a separate lexical entry for each phrase in the mental lexicon would be redundant. Nor can the results be accounted for by postulating that high frequency phrases, but not low frequency ones, are stored in the mental lexicon as words. The idiomaticity effect found between high frequency phrases and compounds in both experiments rules this out, since even among high frequency morpheme pairs the difference between semantically transparent pairs (phrases) and slightly idiomatic pairs (compounds) persists. Whether or not phrases are stored in the mental lexicon, the results suggest that even for semantically transparent morpheme pairs (phrases), the frequency of morpheme co-occurrence must be encoded in one way or another in the mental lexicon.

Bibliography

Anderson, S. R. 1992. A-Morphous Morphology. Cambridge: Cambridge University Press.

Butterworth, B. 1983. Lexical Representation. In B. Butterworth (Ed.), Language Production, Vol. 2: Development, Writing, and Other Language Processes. London: Academic Press.

Caramazza, A., Miceli, G., Silveri, C., & Laudanna, A. 1985. Reading Mechanisms and the Organisation of the Lexicon: Evidence from Acquired Dyslexia. Cognitive Neuropsychology, 2.1: 81-114.

Chao, Y. R. 1968. A Grammar of Spoken Chinese. Berkeley: University of California Press.

Chi, T. R. 1985. A Lexical Analysis of Verb-Noun Compounds in Mandarin Chinese. Taipei: The Crane Publishing Co., Ltd.

Chinese Knowledge Information Processing Group. 1993. Corpus-based Frequency Count of Characters in Journal Chinese: Corpus-based Research Series no. 1. Taipei: Academia Sinica.

Chinese Knowledge Information Processing Group. 1993. Corpus-based Frequency Count of Words in Journal Chinese: Corpus-based Research Series no. 2. Taipei: Academia Sinica.

Di Sciullo, A. M. & Williams, E. 1987. On the Definition of Word. Cambridge, MA: The MIT Press.

Gibbs, R. W. 1986. Skating on Thin Ice: Literal Meaning and Understanding Idioms in Conversation. Discourse Processes, 9: 17-30.

Hoosain, R. 1992. Psychological Reality of the Word in Chinese. In H. C. Chen & O. J. L. Tzeng (Eds.), Language Processing in Chinese. Amsterdam: North-Holland.

Li, C. N., & Thompson, S. A. 1983. Mandarin Chinese: A Functional Reference Grammar. Taipei: The Crane Publishing Co., Ltd. (Chinese version)

Liang, M. Y. 1992. Recognition Processes of Compositional Words and Idiomatic Words. Unpublished Master's thesis, National Tsing Hua University, Hsinchu.

Liu, H. 1986. A Categorial Grammar Analysis of Chinese Separable Compounds and Phrases. Taipei: The Crane Publishing Co., Ltd.

Lu, Z.-W. 1957. Hanyu de Gouci Fa (Word Formation in Chinese). Beijing: The Science Publishing Co.

MacWhinney, B. 1982. Basic Syntactic Processes. In S. Kuczaj (Ed.), Language Development: Vol. 1. Syntax and Semantics. Hillsdale, NJ: Erlbaum.

Matthews, P. H. 1991. Morphology (second edition). Cambridge: Cambridge University Press.

Murphy, G. 1990. Noun Phrase Interpretation and Conceptual Combination. Journal of Memory and Language, 29: 259-288.

Osgood, C. E., & Hoosain, R. 1974. Salience of the Word as a Unit in the Perception of Language. Perception & Psychophysics, 15.1: 168-192.

Sandra, D. 1994. The Morphology of the Mental Lexicon: Internal Word Structure Viewed from a Psycholinguistic Perspective. Language and Cognitive Processes, 9.3: 227-269.

Stemberger, J. P., & MacWhinney, B. 1986. Frequency and the Storage of Regularly Inflected Forms. Memory & Cognition, 14: 17-26.

Swinney, D. & Cutler, A. 1979. The Access and Processing of Idiomatic Expressions. Journal of Verbal Learning and Verbal Behavior, 18: 523-534.

Taft, M. 1979. Recognition of Affixed Words and the Word Frequency Effect. Memory & Cognition, 7: 263-272.

Taft, M., & Forster, K. 1975. Lexical Storage and Retrieval of Prefixed Words. Journal of Verbal Learning and Verbal Behavior, 14: 638-647.

Taft, M., & Forster, K. 1976. Lexical Storage and Retrieval of Polymorphemic and Polysyllabic Words. Journal of Verbal Learning and Verbal Behavior, 15: 607-620.

Van Jaarsveld, H. J., & Rattink, G. E. 1988. Frequency Effects in the Processing of Lexicalized and Novel Nominal Compounds. Journal of Psycholinguistic Research, 17.6: 447-473.

Wu, J. T., & Liu, I. M. 1987. Exploring the Phonetic and Semantic Features of Chinese Words. Taipei: National Taiwan University.

Zhang, B., & Peng, D. 1992. Decomposed Storage in the Chinese Lexicon. In H. C. Chen & O. J. L. Tzeng (Eds.), Language Processing in Chinese. Amsterdam: North-Holland.

Zhou, X., Ostrin, R. K., & Tyler, L. K. 1993. The Noun-Verb Problem and Chinese Aphasia: Comments on Bates et al. (1991). Brain and Language, 45: 86-93.

Zhou, X., & Marslen-Wilson, W. 1994. Words, Morphemes and Syllables in the Chinese Mental Lexicon. Language and Cognitive Processes, 9.3: 393-422.

Zwitserlood, P. 1994. The Role of Semantic Transparency in the Processing and Representation of Dutch Compounds. Language and Cognitive Processes, 9.3: 341-368.

Footnotes

*. I would like to thank Luigi Burzio, Tim Clausner, Chin-fa Lien, Brenda Rapp, Dylan W.-T. Tsai for discussions at various points of this study, Trish van Zandt for her help on statistics, and Peggy Antonisse for carefully reading the earlier drafts and giving me valuable comments. This study was supported in part by a Fulbright Graduate Fellowship. All errors of any kind are my own.

1. There are variants of the whole-word model and the morpheme-based model, depending on which level of representation is concerned. For a more detailed review, see Sandra 1994.

2. Although spacing can be used to distinguish compounds from phrases in the English examples here, this kind of cue is not available in Chinese.

3. As the reviewer correctly pointed out, the conclusion drawn by Liang (1992) also could not explain why there should be any character frequency effect for high word frequency idiomatic words.

4. Access representation is the form aspect of morphemes, and linguistic representation is their semantic and syntactic aspect (Sandra 1994, p. 242; see Sandra 1994 for a detailed discussion of the distinction between the two levels of representation).

5. Although most compounds in English are written as a single unit, like blackbird or lighthouse, there are still orthographic puzzles, such as "when a compound should be hyphenated, fully joined or written as two separate words" (Henderson 1985, p. 37). In Chinese, however, there are never hyphenated or fully joined written forms: all polymorphemic words are written with separate characters.

6. Here semantic idiomaticity means that the meaning of the morpheme pair cannot be derived by combining the meanings of the two morphemes. In the case of the compound bai-cai (white vegetable 'cabbage'), which has a modifier-noun construction, the meaning of the morpheme pair is idiomatic, in contrast with the phrase bai-bu (white cloth), which is semantically transparent. A verb-noun construction case is the compound shao-bei (burn cup 'a flask'), which is semantically idiomatic, versus the phrase shao-zhi (burn paper), which is transparent. For the rest of the discussion on Chinese, compounds will be equivalent to semantically idiomatic morpheme pairs, and phrases are semantically transparent morpheme pairs.

7. There are also compounds that are transparent, e.g. most coordinative compounds, like huan-xi (joyful joyful 'joyful'). However, their usage is conventional. For example, le also means 'joyful', but there is no such word as le-xi.

8. As Hoosain (1992, p. 112) notes, two Chinese words, zi and ci, can correspond to the English "word", depending on the context. Chao (1968) described zi as "the sociological word", which "the general, nonlinguistic public is conscious of, talks about, has an everyday term for, and is practically concerned with in various ways" (p. 136); that is, zi refers to characters. Ci, on the other hand, is "the syntactic word and can be made up of one or more characters" (Hoosain 1992, p. 112). It is the identification of the boundaries of such words that is not straightforward.

9. Note that the frequency count book was not made to include all the phrases in the original corpus. It was because of the difficulty one might have in segmenting texts that the count happened to include some items which according to general linguistic criteria are more phrase-like. Due to the limited number of such phrase-like items, in the following experiments, only the frequency of the whole morpheme-pair varies, and the character frequency will be held constant between conditions to be compared (i.e. phrase vs. compound).

10. Although the nonsense morpheme pairs were used as controls (i.e. for the No responses), their internal structures were also matched with those of the meaningful morpheme pairs (i.e. there were the same number of Adjective+Noun pairs and Verb+Noun pairs as in the meaningful counterparts). This manipulation was used to ensure that subjects had to process the meanings of the items in order to make a judgement.

11. Due to the rarity of high frequency phrase-like morpheme pairs in the frequency count book, and the difficulty of matching morpheme-pair frequency and character frequency for the items, the numbers of items were not the same for the high frequency set and the low frequency set in Experiment I. This factor (i.e. number of items) was better matched in Experiment II.

12. In both experiments, if the decisions were not made in time (i.e. within 1500 msec.), they were not counted in the calculation of reaction times and error rates.

13. How to set criteria for distinguishing V-N compounds from V-N phrases is also controversial in the Chinese linguistic literature, e.g. Chi (1985), Liu (1986).

14. In the comparison of [V-N]_V versus [V-N]_N compounds here, one of the [V-N]_V pairs was not included in the calculation of the average RT because its first character was the infrequent variant of the form for the meaning 'eat', which made the RT for this item much longer than the others (1029.3 ms.).

15. We did not match the internal structure for high frequency items because, with no previous study to guide us in drawing a line between high and low frequency morpheme pairs, we made the frequency distinction as large as possible, and very few items would have been usable had we controlled the internal structure as we did for the low frequency items.

16. Whether the insertion of de between the two morphemes is possible or not was also used by Lu (1957) as a criterion to distinguish ci (compound) from ci-zu (phrase) in Adjective+Noun and Noun+Noun pairs. However, he treated Noun+Noun cases like yang-rou 'sheep meat' as a compound based on the reason that in spoken Chinese yang-de-rou is seldom used unless a comparison is made between the meat of sheep (or goat) and the meat of other animals. In this study, they were classified as phrases because the meaning of the pair can be directly derived from the meanings of the constituent morphemes, free substitution is allowed, and the insertion of de does not change the meaning of the pair, e.g. zhu-rou 'pig meat', yang-ru 'sheep milk', and ji-dan 'chicken egg'.

Return to The Issue 4-5 Contents Page | The WJMLL Home Page
