~~~ Ramana Rao's INFORMATION FLOW ~~~ Issue 2.3 ~~ Mar 2003 ~~~~

Information Flow is a monthly opt-in newsletter.  Your email
address was entered on www.ramanarao.com or www.inxight.com.
You may forward this issue in its entirety.  
Send me your thoughts and questions:	     [email protected]

~~~ IN THIS ISSUE ~~~ March 2003 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

* Introduction
* 26000 Languages
* A Review of Open Innovation by Henry Chesbrough
* Light Linking on Language

~~~ Introduction ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

I had fun with this month's issue, as I hope you will see.  The
first article and the link section focus on linguistics, a
central element of the technology at Inxight.  The second article
is a review of a book about capitalizing on innovation, which I
recommend strongly.  Enjoy!  As always, comments are appreciated.

~~~ 26000 languages ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In 1981, Ron Kaplan and Martin Kay gave a paper on computational
morphology that started the effort that has led to Inxight's
core engine for linguistic analysis.  Morphology is an area of
linguistics that explores the structure of words.  Kaplan and Kay
intended to work on the structure of sentences but they took a
step back to get warmed up.  But an interesting thing happened.
They found an amazingly interesting world inside words.

(Like discovering that way down in the deepest ocean, under
extreme pressure, with no light, and with geysers shooting out
400 degree Celsius water at fire hose speeds, live tube worms,
giant clams, and blind crabs in baths of sulphuric acid.)

If I were to give you a word that you didn't know (for example
morphology), you would know whether to bother looking it up in
the English dictionary or not.  Or if you speak French, I could
give you a word, and you might say, no way, not English, but it
could be French.

So what is it you know?  You know something about the structure
of words in the languages you speak.  Words are made from parts,
and you recognize the parts: prefixes, suffixes, middles,
endings, etc.  A linguist would speak of stems, morphemes,
inflections, and many other words that describe the particles
that make up words.

What Kaplan and Kay theorized is that the structure that you see
inside a word could be modeled with a simple kind of abstract
computer called a finite state machine.  In all languages. This
is a theory about a "universal" in languages, which is an
achievement with serious bragging rights among linguists.

We won't go into the intricacies of finite state machines (FSMs).
Basically it means that you should be able to rip words apart
perfectly in all human languages.  And fast.  Great theory, now
on to reducing this to practice.
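
For the curious, here is a minimal sketch of the idea in Python.
It is a toy, not the Kaplan and Kay formalism or Inxight's
engine: the three small word-part lists, the two states, and the
function name analyze are all assumptions made up for
illustration.  The machine reads any number of prefixes, then
exactly one stem, then any number of suffixes.

```python
# Toy lexicon; every entry is an illustrative assumption.
PREFIXES = {"un", "re"}
STEMS = {"do", "read", "lock"}
SUFFIXES = {"s", "ing", "able"}

# Transition table: state -> list of (allowed word parts, next state).
# START loops on prefixes and moves to STEM on a stem;
# STEM loops on suffixes.
TRANSITIONS = {
    "START": [(PREFIXES, "START"), (STEMS, "STEM")],
    "STEM": [(SUFFIXES, "STEM")],
}
ACCEPTING = {"STEM"}  # a word must contain a stem to be accepted


def analyze(word, state="START"):
    """Return every way the machine can split `word` into known parts."""
    if not word:
        # Input is used up: succeed only if we stopped in an accepting state.
        return [[]] if state in ACCEPTING else []
    results = []
    for parts, next_state in TRANSITIONS[state]:
        for part in parts:
            if word.startswith(part):
                for rest in analyze(word[len(part):], next_state):
                    results.append([part] + rest)
    return results


print(analyze("unlockable"))  # [['un', 'lock', 'able']]
print(analyze("xyzzy"))       # []
```

An empty result is the machine's way of saying "not a word of
this language," which is roughly the judgment you make before
deciding whether to reach for the dictionary.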

All through the eighties, Kaplan and his colleagues developed
tools and algorithms for working with really big finite state
machines and achieving their theoretical speed.  And tools to
allow linguists to build lists of words, word parts, and assembly
instructions in ways familiar to them.  This was also a user
interface problem: the linguists didn't speak FSMs, but instead

Then on through the 80s and 90s, linguists at universities around
the world started to model the words of different languages with
FSMs.  20 years of such efforts have proven Kaplan and Kay right.
Words in all languages can be ripped apart fast.  Inxight's core
LinguistX Platform currently supports 26 languages.  If you only
know English you would hardly be able to appreciate the
challenges thrown by other languages.

The ripping action starts with breaking the stream of characters
into words.  Easy, you say, but Japanese doesn't use spaces between
words.  And the level of ambiguity is comparable to that in
"parsing" an English sentence, the classic example being "time
flies like an arrow".  As a more direct example of putting spaces

The next step is to rip a word apart into its beginnings,
middles, and ends.  The beginnings and ends (and sometimes
similar things in the insides) can get quite complicated in
highly inflected languages.  In Finnish, there are certain nouns
and verbs in some forms that have never been uttered.  And the
middles can also get quite complicated, say, in compounding
languages.  You can probably construct a German word that prints
from Hamburg to Munich.
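
To get a feel for the ambiguity in breaking unspaced text into
words (or a long compound into its parts), here is a small
Python sketch.  It is only an illustration under assumptions:
the five-word lexicon and the function name segmentations are
made up, and a real system would also have to rank the
candidates rather than just list them.

```python
from functools import lru_cache

# Tiny illustrative lexicon (an assumption, not real product data).
LEXICON = frozenset({"no", "now", "here", "where", "nowhere"})


def segmentations(text, lexicon=LEXICON):
    """List every way to cut `text` into consecutive lexicon entries."""

    @lru_cache(maxsize=None)
    def split_from(start):
        # All splits of text[start:]; an empty remainder has one split.
        if start == len(text):
            return [[]]
        found = []
        for end in range(start + 1, len(text) + 1):
            piece = text[start:end]
            if piece in lexicon:
                for rest in split_from(end):
                    found.append([piece] + rest)
        return found

    return split_from(0)


print(segmentations("nowhere"))
# [['no', 'where'], ['now', 'here'], ['nowhere']]
```

Even seven letters and five dictionary entries give three
readings; the number of candidates grows quickly with longer
text and a full-size lexicon.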

But it's not just about 26 languages, but about 26000 languages.
Not just Japanese or German, but also languages like Pfizerese
and AstraZenecese.  Every organization or community speaks in its
own language, certainly based on one of the world languages, but
many of these nested languages contain quite large vocabularies
and even occasionally language patterns that wouldn't be
understood by speakers of the base language.

Certainly Dilbert knows this. Consider Marketingu and
Engineerish.  The comic strip is poking fun at languages that
seem to have no purpose other than to hide incompetence or to
keep the powerless in their cubicles.

Yet, legitimately, nested languages often support real needs for
greater communicational economy and greater precision in
specialized disciplines.  You can hardly take any college or even
high school course without finding, at the end of each section
of the textbook, a list of the special vocabulary of the subject.

Words and concepts are the gateways into the ideas of a
discipline.  This is another way to understand the importance of
the languages we create to allow people to access, route, and
mine content more effectively.

We start with simple controlled word lists and move up to more
formally structured "controlled vocabularies" and "taxonomies."
And eventually we focus on the links and relationships between
the words, and there we start to call the structures ontologies.
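
One rough way to picture this progression is as three data
structures of increasing richness.  The Python sketch below is
only an illustration: the drug terms, the relation name
"treats", and the helper names broader_terms and related are
all made-up assumptions, not any standard vocabulary format.

```python
# 1. Controlled word list: just the approved terms.
WORD_LIST = {"drug", "analgesic", "aspirin", "ibuprofen"}

# 2. Taxonomy: each term points to its single broader term.
TAXONOMY = {
    "analgesic": "drug",
    "aspirin": "analgesic",
    "ibuprofen": "analgesic",
}

# 3. Ontology: named relationships between terms,
#    stored as (subject, relation, object) triples.
ONTOLOGY = [
    ("aspirin", "treats", "headache"),
    ("ibuprofen", "treats", "fever"),
]


def broader_terms(term):
    """Walk up the taxonomy from `term` to the top."""
    chain = []
    while term in TAXONOMY:
        term = TAXONOMY[term]
        chain.append(term)
    return chain


def related(subject, relation):
    """Find everything linked to `subject` by `relation`."""
    return [obj for s, r, obj in ONTOLOGY if s == subject and r == relation]


print(broader_terms("aspirin"))      # ['analgesic', 'drug']
print(related("aspirin", "treats"))  # ['headache']
```

The word list can only answer "is this an approved term?"; the
taxonomy adds "what is it a kind of?"; the triples add "how is
it connected to other things?"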

Certainly power comes with moving up the representational food
chain, but we should not forget that at the bottom are, say, the
single-cell organisms of sharing thoughts: the words.

~~~ Review of "Open Innovation" by Henry Chesbrough ~~~~~~~~~~~~

Open Innovation: The New Imperative for Creating and Profiting
from Technology
By Henry William Chesbrough

   ~> http://ramanarao.com/cgi-bin/book.cgi?isbn=1578518377
      Harvard Business School Press, March 2003

Maybe it was in the early nineties that I heard Ron Kaplan (the
same as mentioned above) ask the question: how does PARC, with
its 200 researchers, compete with, say, 100 garages with 2
ex-researchers each?  This question was asked in the context of many
years of discussing Xerox fumbling the future.  In fact, there
must be a Microsoft Word template file (*.dot) for journalists,
starting: "Xerox PARC, the famed research center, nestled in the
hills behind Stanford, invented blah blah blah ... and failed to
capture the commercial value."

Henry Chesbrough, a Harvard Business School professor, has just
published a book that maps and explains the world that Ron
Kaplan's question is gesturing at.  Chesbrough interviewed me in
1997, just after we spun Inxight up but not quite out of Xerox.
I've talked with Chesbrough a number of times over the years.

Open Innovation is carefully researched, well organized,
articulate, and fun to read.  And Chesbrough comes through with
many clear observations and valuable insights.  Even as someone
who, for 10 years at PARC, participated in many discussions on
the question of capitalizing on research, and then took the
spin-out journey myself, I have gained a broader perspective and
a coherent framework to organize my experiences.

Even if you are not interested in how large companies can capitalize
on R&D, you will still find this book interesting if you care
about innovation at all.  Just as the Open Source movement is not
just about software and software business, but about business and
social practices in general, Open Innovation is also about the
much broader economic and social realities necessitating a change
in the management of innovation.

Xerox, for its failure, provides the perfect starting point for
the book as a model of how you can hit the highest highs of
invention and still sink to the deepest abyss blah blah.
Actually, the book is extremely fair to the challenges that were
faced by Xerox and doesn't stand on the simplistic theories of
the Word template.

Instead, it focuses on the broader context that enables the truly
radical inventions in the first place, and the set of structural
factors and social changes that made capitalizing on the
inventions near impossible.  I always felt that if Xerox had
managed to control its ideas, *you* wouldn't be scrolling
this email right now.  It was a massive parallel social
investment into various configurations of technologies and
markets and business models that really created the computing and
networking infrastructure we all have now.

Chesbrough covers this with just the right amount of historical
background and focused research.  He looks at the birth of
industrial research at the beginning of the 20th century and how
large industrial giants with near-monopolies on the practical
knowledge of their arena were best served by a closed model of
innovation.  And he looks at the more recent changes in the
knowledge landscape.  For example, the increasing availability of
knowledge enabled by the growing mobility of high-skill people,
and the improved identification and realization of high-risk,
high-reward market opportunities enabled by venture capital.

All of this leads to the model of open innovation as the best
that large companies can do as we move forward.  The model moves
away from regulating and controlling knowledge and knowledge
workers to fostering the effective flow of ideas into and out of
a company.  It provides a new vision of how a company can capture
the greatest achievable share of value for ideas it generates.
More broadly, the world of open innovation depicted in the book
has implications for academic research and government policy.
And even for small companies and teams of all kinds.

Beyond Xerox, Chesbrough looks at the successful transformation
of IBM Research from a closed to an open model, at Intel's
experience in connecting with academic research and its use of
venture capital, and at Lucent's corporate venture effort, which,
successful as it was, ends on a grim note.  Along the way,
Chesbrough provides insights from Cisco, Microsoft, Merck, and
many other companies.

I am a big fan of books intended for broad audiences that can be
read by those that generally don't read books of the given genre.
For example, I like reading science books that can be read by
non-scientists, that communicate the essential ideas of a subject
matter simply, and that convey a sense of why anybody would ever
choose to be a scientist.  Substitute design, technology, or
business into that sentence, and I would apply the same test.

Chesbrough's book succeeds easily because of its mix of
scholarship and practical conception.  I would certainly
recommend it to entrepreneurs and business managers, and also to
scientists, designers, and technologists.

~~~ Light Linking on Language ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

~> Kids Creoles, and the Coconuts.  Thinking about nested
   languages, I vaguely remembered stuff about pidgins and
   creoles.  Web searching, I found this article from 1992 on an
   interesting language experiment by Bickerton, who is one of the
   pioneers in the study of creoles.

   ~> At Discover.com, search for "Creoles" in archive
      Or perhaps the following might work:

~> The Ethnosphere and Language Extinction.  Last month, I was
   enthralled by a talk by Wade Davis at the TED conference.
   Wade is an anthropologist and an explorer with the National
   Geographic Society.  Juxtaposing his coined concept of
   Ethnosphere with some reading on language extinction, besides
   provoking social questions, brings up questions about the
   relationship between language and thought.
   ~> http://www.sacredbalance.com/web/drilldown.html?sku=91
   ~> http://news.nationalgeographic.com/news/2002/06/0627_020628_wadedavis.html
   ~> http://abcnews.go.com/sections/world/DailyNews/endangered_languages.html
   ~> http://www.lsadc.org/web2/faq/endangered.htm

~> The relationship between language and thought has been a topic
   in linguistics and psychology for most of the 20th century.
   Benjamin Whorf, who worked as a fire prevention specialist for
   an insurance company and did linguistics research on the side,
   argued for the influence of language on thought.  Though
   Whorfian theory fell into disregard, more recent work is
   reconsidering the possibilities.

   ~> http://sciam.com/article.cfm?articleID=00009A6B-B402-1CDA-B4A8809EC588EEDF

~> Whorf to Whorf.  On the topic of Whorfs, there is a language
   that probably won't be going extinct any time soon.  In fact,
   it may still need to be invented.  It isn't transmitted from
   parent to child, but rather from trekkie to trekkie.  It's

   ~> http://www.kli.org/

~> The "Cold Fusion" of Linguists.  Okay, if that's not enough
   quirkiness in my linking, then try this one on the Nostratic
   hypothesis on the roots of language.

   ~> http://www.santafe.edu/~johnson/articles.nostratic.html

~> Meanwhile, for the seriously interested, Pinker's book beats a
   fistful (or fiveful) of light links.  And it meets my book
   tests above grandly.  Or if you prefer a good documentary, years
   ago, there was a series on PBS called the Story of English
   with an excellent companion book.
   ~> The Language Instinct

   ~> The Story of English

Ramana Rao is Founder and CTO of Inxight Software, Inc.  
Copyright (c) 2003 Ramana Rao.  All Rights Reserved.
You may forward this issue in its entirety.

See:  http://www.ramanarao.com
Send:   [email protected]
Archive:  http://www.ramanarao.com/informationflow/archive/ 
Subscribe:  mailto:[email protected]
Unsubscribe:  mailto:[email protected]