[Elsnet-list] 14 Ideas on Language, Information and Intelligence
Yao Ziyuan - (my silence means I'm busy or not interested)
yaoziyuan at gmail.com
Sun Sep 13 06:43:54 CEST 2009
An HTML version is at https://sites.google.com/site/yaoziyuan/ideas .
Foreign Language Learning
* Automatic Code-Switching (ACS) - The computer automatically
selects a few words in a user's native language communication (such as
a web page being viewed), and supplements or even replaces them with
their foreign language counterparts, thus naturally building up his
vocabulary. For example, if a Chinese sentence meaning "He is a good
student." appears in a Chinese person's Web browser, the computer can
insert student after 学生 (optionally with additional information about
student). After several such exposures, the computer can directly
replace future occurrences of 学生 with student.
An ambiguous word such as 看 (Chinese for "see", "look", "watch",
"read", etc.) in 他在电视前看书。 (Chinese for "He is reading a book
in front of the TV.") can also be handled automatically by listing all
translations that are possible in context:
他在电视前看 (阅读: read; 观看: watch) 书。
Practice is also possible:
他在电视前 [read? watch?] 书。
Because the computer would only teach and/or practice foreign
language elements at a small number of positions in the native
language article the user is viewing, the user wouldn't find it too
intrusive. Automatic code-switching can also teach grammatical
knowledge in similar ways.
* Progressive Word Acquisition (PWA) - In ACS, long words are
optionally split into small segments (usually two syllables long) and
taught progressively, and even practiced progressively. For example,
when 科罗拉多州 (Chinese for "Colorado") first appears in a Chinese
person's Web browser, the computer inserts Colo' after it (optionally
with additional information about Colo'). When 科罗拉多州 appears for
the second time, the computer may decide to test the user's memory of
Colo', so it replaces 科罗拉多州 with
Colo' (US state)
Note that a hint such as "US state" is necessary in order to
differentiate this Colo' from other words beginning with Colo. For the
third occurrence of 科罗拉多州, the computer teaches the full form,
Colorado, by inserting it after the Chinese occurrence.
At the fourth occurrence, the computer may replace 科罗拉多州 entirely
with Colorado.
Not only can the foreign language element (Colorado) emerge
gradually; the original native language element (科罗拉多州) can also
fade out gradually, either visually or semantically (e.g. 科罗拉多州 ->
美国某州 -> 地名 -> ∅). This prevents the learner from suddenly losing the
Chinese clue, while also engaging him in active recall of the
occurrence's complete meaning (科罗拉多州) with gradually reduced clues.
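The four-step schedule for 科罗拉多州 can be written down as a small function. This is a sketch under assumptions: the function name pwa_render, the segment list ["Colo", "rado"] and the exact display formats for the first and third occurrences are illustrative choices, since the post only shows the second-occurrence form.

```python
def pwa_render(native, segments, hint, occurrence):
    """Return what to display for the occurrence-th (1-based) appearance.

    segments is the foreign word split into pieces, e.g. ["Colo", "rado"].
    The stage order follows the Colorado example in the text.
    """
    partial = segments[0] + "'"   # first segment, marked as incomplete
    full = "".join(segments)
    if occurrence == 1:
        return f"{native} ({partial})"   # teach the first segment
    if occurrence == 2:
        return f"{partial} ({hint})"     # test it, with a disambiguating hint
    if occurrence == 3:
        return f"{native} ({full})"      # teach the full form
    return full                          # afterwards, replace entirely


for n in range(1, 5):
    print(pwa_render("科罗拉多州", ["Colo", "rado"], "US state", n))
```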
* Subword Familiarization (SWF) - Again in ACS, word roots (e.g.
pro-, scrib-) and meaningless word fragments (e.g. -ot) are optionally
treated as two special kinds of standalone words and taught and
practiced in the user's incoming native language information.
Meaningless fragments are considered abbreviations and acronyms
derived from real, meaningful words. Getting the learner familiar with
all these subword units can facilitate the acquisition of longer, real
words that contain them.
* Phonetics-Enhanced English (PEE) - The computer can add
non-intrusive diacritical marks (e.g. the mark in á) above normal
English words to better reflect their pronunciations. Unlike radical
spelling reform proposals, this always preserves a word's original
literal form. Unlike IPA transcriptions printed above words,
diacritical marks are closely integrated with letters so a learner can
"read once and learn both the literal and the phonetic form." When
typing English, the learner still uses the original literal form.
* Orthography-Enhanced English (OEE) - Sometimes spelling a word
based on its pronunciation can be hard, even for native speakers. For
example, is it Lawrence or Lawrance? Is it porridge or porrige? We can
slightly change a word's visual form to help recall its correct
spelling. For example, when the computer displays a word that has the
-ance suffix (e.g. instance), it can lower the letter a to some
degree, just like Intel has a trademark "intel" with a lowered e. Such
a new visual form can help people recall that the unclear letter in
inst?nce is a because a is always lowered in -ance while e is never
lowered in -ence. Similarly we can let the computer display porridge
in a new form by adding an arc (Unicode U+035C) below dg to indicate
this sound corresponds to two letters instead of one.
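The arc under dg can be produced with plain Unicode text. The sketch below assumes a hypothetical helper mark_digraph; it places U+035C, a double combining mark, after the first letter of the pair so that a capable renderer draws the arc under both letters. How well it displays depends on the font.

```python
DOUBLE_BREVE_BELOW = "\u035c"  # the arc (U+035C) mentioned in the text


def mark_digraph(word, digraph):
    """Insert the arc between the two letters of the first digraph found.

    A double combining mark follows the first base letter and visually
    spans that letter and the next one.
    """
    i = word.find(digraph)
    if len(digraph) != 2 or i == -1:
        return word  # nothing to mark
    return word[: i + 1] + DOUBLE_BREVE_BELOW + word[i + 1:]


print(mark_digraph("porridge", "dg"))
```

Removing every U+035C from the marked string gives back the original spelling, so the literal form is still preserved.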
Computer-Assisted Foreign Language Writing
* Input-Driven Syntax Aid (IDSA) - As a non-native English user
inputs a word, e.g. search, the computer prompts the syntactic patterns
in which the word forms sentences, e.g.
v. search: n. searcher search~ [n. search scope] [for n. search target]
so he can now write a syntactically valid sentence like "I'm
searching the room for the cat."
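One way to picture the syntax aid is a lookup that fires as the user types. The following is a rough sketch, not a description of any existing tool: the SYNTAX_FRAMES table and the prompt_syntax function are hypothetical, and the single frame is copied from the search example above.

```python
# Hypothetical frame table: headword -> list of syntax patterns.
SYNTAX_FRAMES = {
    "search": [
        "v. search: n. searcher search~ [n. search scope] [for n. search target]",
    ],
}


def prompt_syntax(typed, frames=SYNTAX_FRAMES):
    """Return the patterns of every headword that starts with the typed text."""
    key = typed.strip().lower()
    if not key:
        return []
    return [
        pattern
        for head in sorted(frames)
        if head.startswith(key)
        for pattern in frames[head]
    ]


for pattern in prompt_syntax("sear"):
    print(pattern)
```

Prefix matching is an assumption made so that prompts can appear before the word is fully typed; an exact-match lookup would work the same way for complete words.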
* Input-Driven Ontology Aid (IDOA) - As a non-native English user
inputs a word, e.g. badminton, things (entities) and relations that
normally co-exist with the word in the same scenario or domain are
prompted as a systematic ontology graph by the computer, e.g. entities
like racquet, shuttlecock and playing court, relations like alternate,
serve and strike, and even fully scripted composition templates like
template: a badminton game. The benefits of the ontology aid are
twofold. First, the ontology helps the user verify that the "seed
word", badminton, is a valid concept in the intended scenario (or
context); second, the ontology pre-emptively exposes other valid words
in this context to the user, preventing him from using a wrong word,
e.g. bat (instead of racquet), from the very beginning.
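The badminton example can be represented with a very small data structure. This is a minimal sketch under assumptions: the Scenario class, the ONTOLOGY table and the in_scenario check are hypothetical names, and the contents are just the entities and relations listed above rather than a real ontology.

```python
from dataclasses import dataclass, field


@dataclass
class Scenario:
    entities: set
    relations: set
    templates: list = field(default_factory=list)


# Hypothetical ontology holding only the example from the text.
ONTOLOGY = {
    "badminton": Scenario(
        entities={"racquet", "shuttlecock", "playing court"},
        relations={"alternate", "serve", "strike"},
        templates=["a badminton game"],
    ),
}


def prompt_ontology(seed):
    """Return the scenario for a seed word, or None if it is unknown."""
    return ONTOLOGY.get(seed.lower())


def in_scenario(word, seed):
    """Check whether word belongs to the vocabulary of the seed's scenario."""
    scenario = prompt_ontology(seed)
    if scenario is None:
        return False
    return word in scenario.entities or word in scenario.relations


print(in_scenario("racquet", "badminton"), in_scenario("bat", "badminton"))
```

A writer who reaches for bat would see that it is absent from the prompted vocabulary while racquet is present, which is the second benefit described above.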
Foreign Language Reading without Learning that Language
* Full-Automatic Layered-Quality Machine Translation (FALQ-MT) -
Lexical and syntactic ambiguities are translated to fuzzy concepts and
structures instead of precise but error-prone results. Less
information is better than misinformation. If the reader can't guess
the meaning of a fuzzy occurrence from its context, he can "zoom in"
and see more detailed translation possibilities if he feels that
occurrence is important.
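The "fuzzy by default, zoom in on demand" behaviour might be modelled as below. This is an illustrative sketch only: FuzzyToken, render, the bracket notation and the fuzzy gloss "look-at" are all assumptions, and the candidates reuse the read/watch ambiguity of 看 from the earlier example.

```python
from dataclasses import dataclass


@dataclass
class FuzzyToken:
    fuzzy: str         # vague but safe rendering
    candidates: tuple  # precise alternatives, shown only when zoomed in


def render(tokens, zoomed=frozenset()):
    """Render a translation; positions listed in zoomed show all candidates."""
    parts = []
    for i, tok in enumerate(tokens):
        if isinstance(tok, str):
            parts.append(tok)                 # unambiguous text
        elif i in zoomed:
            parts.append("{" + " | ".join(tok.candidates) + "}")
        else:
            parts.append(f"<{tok.fuzzy}>")    # fuzzy concept
    return " ".join(parts)


tokens = ["He", FuzzyToken("look-at", ("read", "watch")), "a book"]
print(render(tokens))        # fuzzy view
print(render(tokens, {1}))   # reader zooms in on position 1
```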
Foreign Language Writing without Learning that Language
* Formal Language Writing and Machine Translation (FLW) - A person
not knowing a target language can generate information in that
language by composing in a formal language based on his native
vocabulary and having the composition machine-translated. Tools such
as the input-driven syntax aid and input-driven ontology aid can be
borrowed to assist the person in formal language writing. Manual word
sense disambiguation (WSD) can be conducted after the composition is
finished, one domain at a time, because it is cognitively easier for
the writer to focus on a single domain and answer a series of
questions of the form "Does <word_i> belong to this domain?"
Ontology-Based Resource Sharing
* Wikipedia-Based Resource Sharing (WP-RES) - A useful property of
Wikipedia is that each Wikipedia article or category can serve as a
unique address, or "coordinates", for the topic it corresponds to.
With this property, we can enable people with the same interest to
rendezvous at the same Wikipedia page and therefore talk with each
other. People could also register resources at a Wikipedia page's
External Links section so that other people with the same interest can
find them. People could even "subscribe" to a Wikipedia page for new
and updated resources and opportunities on that topic.
Ontology-Based Problem-Solving Skills Sharing
* Wikipedia: From Knowledgebase to Strategybase (STRABASE) - If
we're solving a problem, say, a math problem, we choose a seemingly
promising strategy from our "strategy bases" in our minds, according
to the problem's main type and characteristic conditions. Such a
"strategy base" is something we can build up externally using a wiki.
A "strategy" is a special kind of knowledge that caters to certain
problem characteristics and provides certain problem-solving
frameworks. The wiki can store and categorize strategies and domain
knowledge by their intended problem types and characteristics, so the
human can better evaluate, select and apply strategies relevant to his
problem.
* Chinese Pinyin Input Method Revisited (PYIME) - Today's Chinese
pinyin input methods inherit the single-row candidates window from the
DOS era. If we categorize candidate characters into multiple rows
according to some criteria, the user can more easily home in on his
desired character. For example, each row contains characters that have
the same phonetic radical, and one row reads "马 吗 妈 码 玛", while
another row reads "麻 嘛 䗫". Rows can also correspond to the five
possible tones in Chinese (four tones plus the neutral tone), since
most mainland Chinese users don't type tones. In either case, there can
be a special first row for the most frequently used words and
characters.
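The tone-based layout can be sketched as a grouping step over the candidate list. This is a minimal sketch, assuming a hypothetical TONES table (tone 5 stands for the neutral tone) and a caller-supplied list of frequent characters; a real input method would draw both from its dictionary.

```python
# Hypothetical tone table for a few "ma" candidates (5 = neutral tone).
TONES = {"妈": 1, "麻": 2, "马": 3, "码": 3, "玛": 3, "骂": 4, "吗": 5, "嘛": 5}


def candidate_rows(candidates, tones, frequent=()):
    """Arrange candidates into rows: frequent ones first, then one row per tone."""
    rows = [list(frequent)] if frequent else []
    by_tone = {tone: [] for tone in range(1, 6)}
    for ch in candidates:
        if ch in frequent:
            continue  # already shown in the first row
        by_tone[tones[ch]].append(ch)
    # Skip tones that have no candidate so no empty row is displayed.
    rows.extend(by_tone[tone] for tone in range(1, 6) if by_tone[tone])
    return rows


for row in candidate_rows(list("马吗妈码玛麻嘛骂"), TONES, frequent=("吗", "马")):
    print(" ".join(row))
```

Grouping by phonetic radical instead would only need a different lookup table in place of TONES.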
* A Politically Correct New Name for English (ARCS) - As
technology like automatic code-switching would make English a much
cheaper commodity for non-native people to acquire, for the first time
it will become possible for most people in the world to use decent
English. But nationalist sentiments can discourage some people from
adopting English. While everybody recognizes logically that all natural
languages are made of equally random syllables, emotionally people can
still feel a sense of inequality in the fact that one language is more
international than others. A reason for this paradox is that languages
are named after their nations of origin: English, French, Spanish, etc.
Therefore, we can use a "renaming" technique to better reflect a
language's random nature rather than a nationalist connotation. In
fact, the word "language" itself already has a strong nationalist
connotation, and I propose the term "code system" to eliminate that
connotation. As for English, let's rename it "A Random Code System", or
ARCS for short.
* Foreign Language Proficiency Measurement (FLPM) - How can a
non-native speaker describe his language level to a native speaker in
an understandable manner? The computer can test his proficiency and
compare it with that of native speakers at different ages. A
description like "My English level is like that of a 10-year-old
American child" should be readily understood by a native speaker.
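The comparison against native speakers of different ages amounts to a lookup in a norms table. The sketch below is purely illustrative: the NORMS values are made-up placeholders, not measured data, and age_equivalent is a hypothetical function name.

```python
import bisect

# Placeholder norms: (age, typical test score of native speakers at that age).
# These numbers are invented for illustration and must be replaced by real data.
NORMS = [(6, 100), (8, 200), (10, 300), (12, 400)]


def age_equivalent(score, norms=NORMS):
    """Return the oldest age whose typical score the learner reaches, or None."""
    scores = [s for _, s in norms]        # must be sorted in ascending order
    i = bisect.bisect_right(scores, score)
    if i == 0:
        return None                       # below the youngest age group
    return norms[i - 1][0]


age = age_equivalent(320)
print(f"My English level is like that of a {age}-year-old native speaker.")
```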