1st July

Rohit is arranging to upload the TTS voice for use with Festival; we should be doing that soon.

31st May 2002

Deepti, the Hindi-speaking chat bot, is intended to be a Computing Time Companion (CTC) that provides human-computer interaction through a natural language interface in an Indian language. With this aim in mind we started the development of Deepti. We decided on an AIML-based bot with speech output, and picked Hindi as the language Deepti would speak. We also decided to divide the development into two modules: the bot's intelligence and the text-to-speech system. The first module was primarily concerned with the bot's intelligence, i.e. writing the AIML categories that ultimately form the bot's brain. Here certain issues arose regarding the notation to use for writing Hindi on a QWERTY keyboard, and we considered several existing standards, including iTrans and ISCII. Once these issues were settled, we went on to concentrate our efforts on the text-to-speech system. Building up the categories involved careful analysis of the kinds of questions that should be taught to the bot. We also had to look into the choice of replies for unanswerable queries, random variation among replies to the same question so as to make the chat more natural, and so on.
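The two reply behaviours mentioned above (a fallback for unanswerable queries and a random choice among several replies to the same question) can be illustrated with a minimal Python sketch. This is not the AIML parser Deepti uses: real AIML expresses categories in XML with `<category>`, `<pattern>` and `<template>` elements, and random replies with `<random>` and `<li>`. The category contents, the fallback replies and the function names here are hypothetical placeholders.

```python
import random

# Hypothetical categories: each maps a normalized input pattern to one or
# more candidate replies (romanized placeholders, not Deepti's real data).
CATEGORIES = {
    "NAMASTE": ["namaste", "namaskar"],
    "AAP KAISE HAIN": ["main theek hoon"],
}

# Hypothetical replies used when no category matches the input
# (the "unanswerable query" case).
FALLBACK_REPLIES = ["mujhe samajh nahin aaya", "kripya dobara kahiye"]


def normalize(text):
    """Uppercase the input and replace punctuation with spaces,
    then collapse runs of whitespace."""
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in text)
    return " ".join(cleaned.upper().split())


def respond(text, rng=random):
    """Pick one reply at random from the matching category,
    or from the fallback list if nothing matches."""
    replies = CATEGORIES.get(normalize(text), FALLBACK_REPLIES)
    return rng.choice(replies)
```

Calling `respond("namaste!")` returns one of the two greetings at random, while an input with no matching category returns one of the fallback replies. Exact-match lookup is a simplification; AIML patterns also support wildcards.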

The basic mantra of having an intelligent bot is: the botmaster lends her own intelligence to the bot, so, over time, the bot would be as smart as the botmaster is!

With the job of writing the categories under way (it must be realized that this is a continuous process; one can never write enough categories, since after all this is natural language), the next aim was to have a Hindi text-to-speech system. Again we explored several possibilities for a TTS system that could be used for Indian languages, and finally decided to go ahead with Festvox, using a diphone database. We started by going through the Festvox manuals. When the actual database building started, we faced the problem of deciding the phone set of the Hindi language. We looked at existing systems to decide on the phone set, and even consulted some linguists from the local university. Ultimately a phone set was decided, and we followed the procedure for building a TTS voice as specified by Festvox.

We had to generate nonsense words covering every diphone combination; these turned out to number 2116. Prompts were then generated for the nonsense words, with some arbitrary duration information, and using these prompts the actual wav files were recorded. Festvox generates the corresponding label files for all the wav files, but with almost random or no labeling, so all the recorded utterances had to be hand labeled. For hand labeling we looked for suitable tools that could automate the process as much as possible. Alan suggested the EMU Labeller, and we took his advice. The labeling could be said to be the most cumbersome job in the whole process of TTS building. With the recorded nonsense words and the labels in hand, Festvox automated most of the remaining process, which involved extracting pitch marks and F0 contours and other processing of the recordings. Ultimately we had a synthesizer running which, given the phones, would generate the corresponding utterances.
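As a rough check on the diphone count: a diphone inventory that covers every ordered pair of phones has N × N entries for a phone set of size N, and 2116 = 46 × 46. The log does not state the size of the phone set, so 46 is only an inference, and the actual Festvox diphone schema may treat silence or impossible pairs differently. The sketch below uses dummy phone labels, not the real Hindi phone set.

```python
from itertools import product


def diphone_pairs(phones):
    """Return every ordered pair of phones, including a phone followed by itself."""
    return list(product(phones, repeat=2))


# Placeholder phone set: 46 dummy labels (p0 .. p45). The size 46 is an
# assumption inferred from 2116 = 46 * 46; these are NOT the Hindi phones.
phones = [f"p{i}" for i in range(46)]
pairs = diphone_pairs(phones)
print(len(pairs))  # 46 * 46 ordered pairs
```

Each pair would then be embedded in a nonsense carrier word so that the transition between the two phones can be recorded and labeled.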

The next task was text-to-phone conversion for text written on a QWERTY keyboard. The text was assumed to follow certain restrictions, the same ones we followed while writing the categories for the bot's intelligence. Looking into the options Festival provides, we found that a lexicon (for all the words in the bot's knowledge base) plus a set of letter-to-sound rules (for words that appear only at runtime, such as names) can serve the purpose of text-to-phone conversion quite well. By then we had a good enough knowledge base at our disposal, with about 1500 categories. We built a word list out of these categories and wrote the pronunciations for these words. Finally we came up with a set of LTS rules, and with them in place we were able to get text-to-speech output. We were seeing several deficiencies by then, which we hope to correct in the near future; the main problems were with the selection of the phone set, the recording and the labeling.

With the two major modules in place, i.e. the bot and the TTS, it was time to integrate the two. This basically meant tweaking the AIML parser code so that its responder sends output not only to the HTML responder but also to the Festival server. Finally all was in place, and soon a pretty interface was up and we had the bot with speech output at our service.
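The lexicon-plus-LTS idea described above (look each word up in the lexicon, and fall back to letter-to-sound rules for words seen only at runtime) can be sketched in Python as follows. This only illustrates the lookup order. In Festival itself, lexicons and LTS rules are written in Scheme and the rules can be context-sensitive. The lexicon entry, the rule table and the function names here are hypothetical, and the rules are simplified to context-free longest-match substitutions.

```python
# Hypothetical lexicon: romanized word -> phone list (illustrative entry only).
LEXICON = {
    "namaste": ["n", "a", "m", "a", "s", "t", "e"],
}

# Hypothetical letter-to-sound rules: two-letter sequences that map to a
# single phone. Any other letter is assumed to map to itself.
LTS_RULES = {
    "kh": "kh",
    "aa": "aa",
    "sh": "sh",
}


def letter_to_sound(word):
    """Convert an unknown word to phones, trying two-letter rules first."""
    phones = []
    i = 0
    while i < len(word):
        two = word[i:i + 2]
        if two in LTS_RULES:
            phones.append(LTS_RULES[two])
            i += 2
        else:
            phones.append(word[i])
            i += 1
    return phones


def text_to_phones(text):
    """Use the lexicon where possible, and LTS rules for everything else."""
    phones = []
    for word in text.lower().split():
        phones.extend(LEXICON.get(word) or letter_to_sound(word))
    return phones
```

Here `text_to_phones("namaste")` is answered from the lexicon, while a word such as `khaana` that is absent from the lexicon goes through `letter_to_sound`. Keeping the lexicon first means hand-written pronunciations always override the rules.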

Although the bot is functional now, we still see a lot that could improve it further. A review of the phone set is surely worth the effort; from what we have learnt, this has been one of the biggest deficiencies. We can also write more categories, and perhaps concentrate on some particular limited but exhaustive domain of application. We can then have better recording and better labeling. Finally, good packaging of the bot for easy distribution and installation will be worth the effort. Going further, we hope in the near future to replace the technologies that currently keep the bot working, i.e. the AIML parser and the Festvox TTS framework, with our own indigenous work.

Ultimately, however, it's the idea and the application that matter. CTC: Computing Time Companion, an intelligent bot capable of speaking out its responses and helping you while you compute, making computing easier and more interactive. We hope to have Deepti working for you, making your computing a big pleasure, shall we say?

17th Feb 2002
  • We have developed some 450 categories; the work is slow because the people involved are lazy.
  • Had a presentation for the first evaluation in our college. See the presentation here. 
  • Rahul has started reading about Text to Speech synthesis, the process and various existing TTS systems.
  • Rahul has tested MBROLA, but the performance hasn't been as expected.
  • Rahul has tested the Dhvani TTS; the performance is satisfactory, especially that of the as yet officially unreleased version 2. The biggest problem is that the voice database is in a male voice, which would not serve our purpose.
  • We have also contacted the Punjab University Hindi Department officially, so that any needs specific to Hindi can be catered to.
  • Rishi is also working on the Hindi AIML sets and testing Dhvani, though I have not heard from him in some time.
  • We have announced the beginning of this project on the Alicebot lists and have received some responses, mainly from Kim Sullivan.
  • We have also talked to some people developing Dhvani, but the responses have been few and far between; we don't know why.

Keep checking this page for more . . .

Deepti Home Page