Keywords:
Language
Linguistics
Abstract:
This dissertation investigates the operation of two linguistic mechanisms, ellipsis and wa-marking, in a corpus of colloquial Japanese speech. Our data set is the CallHome Japanese (CHJ) corpus, a collection of transcripts and digitized speech data for 120 telephone conversations between native speakers of Japanese. In order to make the CHJ data useful for linguistic research, we annotated the original transcripts with a comprehensive set of acoustic, phonetic, syntactic, and semantic tags, as described in detail in Part I of the dissertation. Part II of the dissertation presents our results on ellipsis and wa-marking. In the case of ellipsis, we first demonstrate that Japanese conversation obeys certain principles of argument ellipsis that appear to be language universal: namely, the tendency to omit transitive and human subjects and the tendency to express at most one argument per clause. Next, we identify a set of syntactic and semantic factors that correlate significantly with the ellipsis of grammatical particles following a noun phrase. These factors include the grammatical construction type (question, idiom), length of the NP (in syllables), utterance length (in words), proximity of the NP to the predicate, and the animacy and definiteness of the NP. The animacy and definiteness constraints on Japanese particle ellipsis are of particular interest because these too seem to reflect language-universal principles. We then investigate the use and function of the topic-marking particle wa in Japanese. Analyzing the CHJ data, we identify a set of semantic and prosodic properties that tend to distinguish wa from the subject-marking particle ga. In terms of lexical semantics, we show that nouns marked by ga tend to be animate, while wa is strongly associated with references to locations and times. We also show that wa-phrases exhibit more prominent intonation, as measured by peak F0, than ga-phrases in the CHJ speech data, contradicting accounts that predict that ga-p