1st July
Rohit is arranging to upload the TTS voice for use with Festival; we should be
doing that soon.
31st May 2002
Deepti, the Hindi-speaking chat bot, is intended
to be a Computing Time Companion (CTC) providing human-computer interaction
through a natural-language interface in an Indian language. With this aim in
mind we started the development of Deepti. We decided to build an AIML-based
bot with speech output capabilities, and we picked Hindi as the language that
Deepti would speak. We also decided to divide the development into two
modules. The first was concerned with the bot's intelligence, i.e. writing the
AIML categories that ultimately form the bot's brain. Here we had to settle the
notation for writing Hindi with a QWERTY keyboard; we considered several
existing standards, including iTrans and ISCII. Once these issues were clear, we
went on to concentrate our efforts on the Text-to-Speech system. Building up the
categories involved careful analysis of the kinds of questions the bot should be
taught. We also had to decide on the replies for unanswerable queries, on
random alternative replies to the same question so as to make the chat more
natural, and so on.
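To make the idea of "random replies to the same question" and a catch-all for unanswerable queries concrete, here is a minimal Python sketch of category lookup. It is not AIML syntax and not the parser Deepti used; the categories below, written in plain romanized Hindi, and all function names are hypothetical illustrations.

```python
import random
import re

# Hypothetical categories in plain romanized Hindi (not the real Deepti set).
# Each pattern maps to alternative replies, in the spirit of AIML's
# <random><li>...</li></random>; "*" stands for one or more words.
CATEGORIES = {
    "NAMASTE": ["namaste!", "namaste, aap kaise hain?"],
    "AAP KAISE HAIN": ["main theek hoon.", "bilkul theek, dhanyavaad!"],
    "MERA NAAM *": ["aapse milkar khushi hui."],
    # Catch-all category for unanswerable queries.
    "*": ["mujhe nahin pata.", "kuchh aur poochhiye."],
}


def normalize(text):
    """Replace punctuation with spaces, uppercase, and split into words."""
    return re.sub(r"[^\w\s]", " ", text).upper().split()


def matches(pattern_words, input_words):
    """True if the input words match the pattern; '*' eats one or more words."""
    if not pattern_words:
        return not input_words
    head, rest = pattern_words[0], pattern_words[1:]
    if head == "*":
        # Try every non-empty span of input words for the wildcard.
        return any(
            matches(rest, input_words[i:])
            for i in range(1, len(input_words) + 1)
        )
    return (
        bool(input_words)
        and input_words[0] == head
        and matches(rest, input_words[1:])
    )


def literal_words(pattern):
    """Number of non-wildcard words, used to try specific patterns first."""
    return sum(1 for word in pattern.split() if word != "*")


def respond(text, rng=random):
    words = normalize(text)
    for pattern in sorted(CATEGORIES, key=literal_words, reverse=True):
        if matches(pattern.split(), words):
            # Several replies for one pattern: pick one at random.
            return rng.choice(CATEGORIES[pattern])
    # Nothing matched (e.g. empty input): fall back to the catch-all replies.
    return rng.choice(CATEGORIES["*"])
```

For example, `respond("Mera naam Rohit")` returns the single reply of the `MERA NAAM *` category, while repeated calls to `respond("aap kaise hain?")` vary between its two replies. Real AIML interpreters match word by word with their own priority rules; sorting by the number of literal words is a simplification.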
The basic mantra for an intelligent bot
is: the botmaster lends her own intelligence to the bot, so that, over time, the
bot becomes as smart as the botmaster!
With the job of writing the categories under way (and it must be realized that
this is a continuous process: one can never write enough categories, since this
is natural language), the next aim was a Hindi Text-to-Speech system. Again we
explored several TTS systems that could be used for Indian languages, and finally
decided to go ahead with FestVox and build a diphone database. We started by
going through the FestVox manuals. Then the actual database building began, and
we faced the problem of deciding on the phone set for Hindi. We looked at
existing systems to help decide on the phone set, and even consulted some
linguists from the local university. Ultimately a phone set was settled on, and
we followed the procedure for building a TTS voice as specified by FestVox. We
had to generate a nonsense word for each diphone combination.
These turned out to be 2116. We then generated prompts for the nonsense words
with some random duration information, and using these prompts we recorded the
actual wav files. FestVox generates the corresponding label files for all the
wav files, but ours came out almost randomly labeled or not labeled at all, so
all the recorded utterances had to be hand-labeled.
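The figure of 2116 is exactly 46 × 46, which is consistent with taking every ordered pair of phones (including a phone followed by itself) from a 46-symbol set. The Python sketch below enumerates diphones that way and wraps each in a nonsense carrier word. It is not FestVox's own diphone-list generator: the phone names `p00`..`p45` and the carrier phones around the target are hypothetical placeholders, since the actual Deepti phone set is not listed here.

```python
from itertools import product

# Hypothetical 46-symbol phone set: placeholder names p00..p45 stand in for
# the real Deepti phone set, which is not listed here.
PHONES = [f"p{i:02d}" for i in range(46)]

# Hypothetical carrier: fixed context phones on both sides so that the target
# transition is spoken in the middle of the word rather than at an edge.
CARRIER_LEFT = ["pau", "t", "a"]
CARRIER_RIGHT = ["a", "t", "pau"]


def diphone_list(phones):
    """Every ordered pair (first, second), including repeats like (x, x)."""
    return list(product(phones, repeat=2))


def nonsense_word(first, second):
    """Phone sequence for one prompt: carrier + target diphone + carrier."""
    return CARRIER_LEFT + [first, second] + CARRIER_RIGHT


def prompts(phones):
    """Map a diphone name like 'p03-p17' to its nonsense-word phone string."""
    return {
        f"{a}-{b}": " ".join(nonsense_word(a, b))
        for a, b in diphone_list(phones)
    }


if __name__ == "__main__":
    table = prompts(PHONES)
    print(len(table))  # 46 * 46 = 2116
    print(table["p03-p17"])  # pau t a p03 p17 a t pau
```

A real diphone schema usually treats consonant and vowel contexts differently and may drop impossible pairs, so the count for a given language need not be a perfect square; the square here only reflects the "every combination" approach.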
For hand labeling we looked for a suitable tool that could automate the process
as much as possible. Finally Alan suggested the EMU Labeller, and we took his
advice. Labeling was arguably the most cumbersome job in the whole process of
building the TTS.
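Because hand labels are easy to get subtly wrong (a boundary dragged past its neighbour, a typo in a phone name), a quick automated check before the rest of the build can save a rerun. The Python sketch below is one way to do that; it assumes the xlabel-style layout of Festival-style `.lab` files (header lines ending in a line containing just `#`, then one `end_time colour label` line per segment). It is not part of EMU or FestVox, and `parse_lab` and `check_labels` are hypothetical names.

```python
def parse_lab(text):
    """Return [(end_time, label), ...] from an xlabel-style .lab file."""
    lines = [line.strip() for line in text.splitlines()]
    if "#" not in lines:
        raise ValueError("no '#' header terminator found")
    start = lines.index("#") + 1
    segments = []
    for line in lines[start:]:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        end_time = float(fields[0])
        # Fields are: end time, colour code, label (label may be missing).
        label = fields[2] if len(fields) > 2 else ""
        segments.append((end_time, label))
    return segments


def check_labels(segments, phone_set):
    """List human-readable problems: non-increasing times, unknown phones."""
    problems = []
    previous = 0.0
    for index, (end_time, label) in enumerate(segments):
        if end_time <= previous:
            problems.append(
                f"segment {index} ({label}) ends at {end_time}, "
                f"not after {previous}"
            )
        if label not in phone_set:
            problems.append(f"segment {index}: unknown label {label!r}")
        previous = end_time
    return problems
```

Running `check_labels(parse_lab(text), phone_set)` over every label file and printing any non-empty result gives a list of files to reopen in the labeller.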
With the recorded nonsense words and their labels in hand, FestVox automated
most of the remaining process, including the extraction of pitch marks and F0
contours and other processing of the recordings. Ultimately we had a synthesizer
running which, given a sequence of phones, would generate the corresponding
utterance.
The next task was text-to-phone conversion for text typed on a QWERTY keyboard.
We assumed that the text followed the same restrictions we had followed while
writing the categories for the bot's intelligence. Looking into the options that
Festival provides, we found that a lexicon (for all the words in the bot's
knowledge base) plus a set of letter-to-sound (LTS) rules (for words that come in
at runtime, such as names) could serve quite well for text-to-phone conversion.
By then we had a reasonably large knowledge base of about 1500 categories. We
built a word list from these categories and wrote the pronunciations for these
words. Finally we came up with a set of LTS rules, and with them in place we had
text-to-speech output.
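The lookup order described above (lexicon first, letter-to-sound rules as the fallback for runtime words) can be sketched in a few lines of Python. This only mirrors that order; Festival's own lexicons and LTS rules are written in Scheme and its LTS rules can use left and right context, which this sketch does not. The lexicon entries, the rule table and the greedy longest-match strategy are hypothetical choices for illustration.

```python
# Hypothetical lexicon: romanized words from the bot's categories -> phones.
LEXICON = {
    "namaste": ["n", "a", "m", "a", "s", "t", "e"],
    "aap": ["aa", "p"],
}

# Hypothetical letter-to-sound rules: grapheme sequence -> phones.
# Longer graphemes are tried first, so "aa" wins over "a" and "sh" over "s".
LTS_RULES = {
    "aa": ["aa"], "ee": ["ii"], "oo": ["uu"],
    "kh": ["kh"], "sh": ["sh"], "ch": ["ch"],
    "a": ["a"], "e": ["e"], "i": ["i"], "o": ["o"], "u": ["u"],
    "h": ["h"], "k": ["k"], "m": ["m"], "n": ["n"], "p": ["p"],
    "r": ["r"], "s": ["s"], "t": ["t"], "j": ["j"], "y": ["y"],
}
MAX_GRAPHEME = max(len(g) for g in LTS_RULES)


def apply_lts(word):
    """Greedy longest-match conversion of a word's letters to phones."""
    phones, i = [], 0
    while i < len(word):
        for size in range(min(MAX_GRAPHEME, len(word) - i), 0, -1):
            chunk = word[i:i + size]
            if chunk in LTS_RULES:
                phones.extend(LTS_RULES[chunk])
                i += size
                break
        else:
            i += 1  # no rule covers this letter: skip it silently
    return phones


def word_to_phones(word):
    """Lexicon lookup first; fall back to LTS rules for unknown words."""
    word = word.lower()
    if word in LEXICON:
        return list(LEXICON[word])
    return apply_lts(word)


def text_to_phones(text):
    return [word_to_phones(w) for w in text.split()]
```

For instance, `text_to_phones("Aap rohit")` takes `aap` from the lexicon and spells out the name `rohit` with the rules, giving `[['aa', 'p'], ['r', 'o', 'h', 'i', 't']]`.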
We were already seeing several deficiencies, however, which we hope to correct in
the near future; the main problems were the selection of the phone set, the
recording and the labeling. In any case, the two major modules, the bot and the
TTS, were now in place, and it was time to integrate them. This basically meant
tweaking the AIML parser code so that its responder sent output not only to the
HTML responder but also to the Festival server. Soon everything was in place, a
pretty interface was up, and we had the bot with speak-out capabilities at our
service.
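One way to picture the responder tweak is a fan-out: the same reply goes both to the HTML page and to the Festival server. The Python sketch below is hypothetical and is not the AIML parser's code. It assumes Festival was started in server mode (`festival --server`), which by default listens on TCP port 1314 and evaluates Scheme expressions such as `(SayText "...")`; where the audio is played depends on the server's configuration, and a production client would also read the server's replies rather than simply closing the connection. `say_text_command`, `send_to_festival`, `make_html_sink` and `respond_everywhere` are made-up names.

```python
import html
import socket

FESTIVAL_HOST = "localhost"
FESTIVAL_PORT = 1314  # Festival's default server port (assumed unchanged)


def say_text_command(text):
    """Build a Scheme (SayText "...") call, escaping backslashes and quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'(SayText "{escaped}")\n'


def send_to_festival(text, host=FESTIVAL_HOST, port=FESTIVAL_PORT, timeout=5.0):
    """Send one SayText command to a running Festival server (sketch only)."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(say_text_command(text).encode("utf-8"))


def make_html_sink(page):
    """Return a sink that appends the reply, HTML-escaped, to a page buffer."""
    def sink(reply):
        page.append(f"<p>{html.escape(reply)}</p>")
    return sink


def respond_everywhere(reply, sinks):
    """Deliver one bot reply to every configured output."""
    for sink in sinks:
        sink(reply)
```

With a Festival server running, `respond_everywhere(reply, [make_html_sink(page), send_to_festival])` would both render and speak each reply; keeping the outputs as a list of callables means a new output can be added without touching the bot's matching code.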
Although the bot is functional now, we
still see a lot that can be done to improve it further. A review of the phone set
is surely worth the effort; from what we have learnt so far, this has been one of
the biggest deficiencies. We can also write more categories, and perhaps
concentrate on a particular limited but exhaustive domain of application. Then we
can have better recording and better labeling. Finally, we see that good packaging
of the bot for easy distribution and installation will be worth the effort. Going
further, we hope to replace the technologies that currently keep the bot working,
i.e. the AIML parser and the FestVox TTS framework, with our own indigenous work in
the near future.
Ultimately, however, it is the idea and the
application that matter. CTC (Computing Time Companion): an intelligent bot
that speaks out its responses and helps you while you compute, making computing
easier and more interactive. We hope to have Deepti working for you, making your
computing a big pleasure, shall we say?
17th Feb 2002
- We have developed some 450 categories; the
work is slow because the people involved are lazy.
- We gave a presentation for the first evaluation
in our college. See the presentation here.
- Rahul has started reading about text-to-speech
synthesis: the process and various existing TTS systems.
- Rahul has tested MBrola, but the performance
hasn't been as expected.
- Rahul has tested the Dhvani TTS; the performance is
satisfactory, especially that of version 2, which is as yet officially unreleased.
But the biggest problem is that the voice database uses a male voice,
which would not serve our purpose.
- We have also officially contacted the Hindi Department of Punjab
University, so that any needs specific to Hindi can be catered to.
- Rishi is also working on the Hindi AIML sets and
testing Dhvani, though I have not heard from him in some time.
- We have announced the beginning of this project on
the Alicebot lists and have received some responses, mainly from Kim
Sullivan.
- We have also talked to some of the people developing
Dhvani, but the responses have been few and far between; we don't know why.
Keep checking this page for more . . .