XI. TECHNOLOGY

(Under each head, the various approaches/models used, applications, current state of technology, software tools available, future directions etc. will be given, along with relevant references to organizations/institutions/companies, products, experts in the field, books, journals, articles, websites, CDs, cassettes etc.)

A. Corpus and Corpus Management Tools

(Plain, POS tagged, parsed, parallel, aligned corpora)

		-Corpus creation and updating tools
		-Corpus indexing tools (Concordance, KWIC index etc.) 
		-Corpus compression and encryption tools
		-Text processing tools 

(Basic statistics, n-grams, Markov Chains, Hidden Markov Models, etc. at sentence, word, morpheme, syllable, character, phoneme levels)
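The concordance / KWIC (keyword in context) index listed among the corpus indexing tools above can be illustrated with a minimal sketch. The function name `kwic`, the `width` parameter and the sample sentence are assumptions made for illustration; they are not taken from any of the tools described in this section.

```python
def kwic(tokens, keyword, width=3):
    """Return (left context, keyword, right context) for every occurrence."""
    rows = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            rows.append((left, tok, right))
    return rows


# Placeholder sentence, not drawn from any real corpus.
text = "আমি বই কিনি তুমি বই লেখো"
for left, kw, right in kwic(text.split(), "বই", width=1):
    print(f"{left} [{kw}] {right}")
```

A real concordancer would normally build an inverted index first instead of scanning the whole token list for each query.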

The Indian Statistical Institute (Kolkata) has developed the following corpus resources:

	1. A monolingual sample Bangla text corpus has been generated.
	2. A historical corpus of Bangla novels published in the 19th and 20th centuries has been compiled.
	3. Statistical frequency counting of various linguistic elements in the corpus has been accomplished.
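Frequency counting of linguistic elements, and the n-gram statistics mentioned earlier, reduce to counting runs of consecutive units. The sketch below is only an illustration of that idea and is not the Institute's software; `ngram_counts` and the sample sentence are hypothetical. Note that the "character" level here means Unicode code points, which is not the same as Bangla aksharas or syllables.

```python
from collections import Counter


def ngram_counts(units, n):
    """Count every run of n consecutive units (words, characters, ...)."""
    return Counter(tuple(units[i:i + n]) for i in range(len(units) - n + 1))


# Placeholder sentence, not drawn from any real corpus.
sentence = "আমি বই কিনি তুমি বই লেখো"
word_unigrams = ngram_counts(sentence.split(), 1)
char_bigrams = ngram_counts(list(sentence.replace(" ", "")), 2)

print(word_unigrams.most_common(1))
print(char_bigrams.most_common(3))
```

The same function serves any level (word, morpheme, syllable, phoneme) as long as the text has already been segmented into a list of units at that level.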

B. Text Editors and Word Processors

		-Text editing tools
		-Word Processing tools 
		-DTP tools

iLEAP is an Internet-ready Indian language word processor for Windows. It has the following features:

		∙ Self-explanatory user interface
		∙ Multilingual Spellchecker
		∙ Choice of keyboard layout
		∙ Email facility for Indian languages
		∙ Facility to make web pages in Indian languages
		∙ Language sensitive multilingual editor 
	Address: www.cdacindia.com/html/gist/products/ileap.asp

Bangla Word is a word processing application specially designed for writing Bengali documents. It allows the user to enter Bangla text using vowels and consonants; conjunct characters are placed automatically by the system where possible, or as indicated by the user. Bangla letters are mapped phonetically onto the standard QWERTY keyboard, independent of any Bangla font, and the application supports over 200 Bangla fonts. It allows the user to create Web pages and send emails directly from the application, both as attached Bangla Word documents and as RTF format text.

	Address: www.banglasoftware.com/banglaword.asp

Lekho: A Bangla Unicode Editor is a "plain-text" editor designed to take phonetic input from a standard US keyboard and transliterate it into Bangla text as the user types. It supports redo and export to Bangla text. It is based on the Qt toolkit and runs on X11 (Linux, FreeBSD, Unix) and Windows systems. It is released under the GPL.

	Address: www.lekho.sourceforge.net
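Phonetic input of the kind described above can be sketched as a greedy longest-match lookup over a Roman-to-Bangla table. The tables and the `transliterate` function below are hypothetical and deliberately tiny; they are not Lekho's actual mapping, and conjunct formation is not handled.

```python
# Hypothetical, deliberately small tables (NOT Lekho's real mapping).
CONSONANTS = {"k": "ক", "kh": "খ", "g": "গ", "t": "ত", "n": "ন",
              "b": "ব", "m": "ম", "r": "র", "l": "ল"}
# Roman vowel -> (independent letter, dependent vowel sign)
VOWELS = {"a": ("আ", "া"), "i": ("ই", "ি"), "u": ("উ", "ু"), "e": ("এ", "ে")}


def transliterate(text):
    out = []
    after_consonant = False
    i = 0
    while i < len(text):
        for size in (2, 1):  # try the longer key first, so "kh" beats "k"
            chunk = text[i:i + size]
            if chunk in CONSONANTS:
                out.append(CONSONANTS[chunk])
                after_consonant = True
                break
            if chunk in VOWELS:
                independent, sign = VOWELS[chunk]
                # A vowel right after a consonant becomes a dependent sign.
                out.append(sign if after_consonant else independent)
                after_consonant = False
                break
        else:
            # No table entry: copy the character through unchanged.
            out.append(chunk)
            after_consonant = False
        i += len(chunk)
    return "".join(out)


print(transliterate("ami tumi"))
```

Two adjacent consonants are simply written side by side here; a real editor would insert a hasanta to form the conjunct, as the word processors in this section do automatically.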

Unitype Global Writer is a word processor that lets users write in their own language. It has a built-in, customizable spell-checker and thesaurus.

	Address: www.untitype.com/unitype.htm

Parabaas is a word-processor that allows the user to write Indian languages in Roman letters using a simple transliteration scheme based on Itrans.

	Address: www.parabaas.com/parabaasaxar

Lekhok is a multi-system word processor developed by Protivasoft Inc. and packaged with 34 fonts. It lets the user insert graphics into the text.

	Address: www.protivasoft.com 

Hatey Khori 2.0 is Bangla scripting software designed for children. It is a multimedia program with animation, audio feedback, visuals and print options. Bangla letters are animated on a blackboard, and a speaker button lets the user hear the pronunciation.

	Address: www.protivasoft.com 

Samsad Shabdajagat is a Bangla software package marketed by Sishu Sahitya Samsad. It has the following features:

	∙ A text editor for writing Bangla, with the possibility of designing one's own keyboard layout in case the standard Samsad keyboard layout is not acceptable to the user
	∙ An option for writing Bangla in traditional and transparent type style
	∙ Spelling errors are detected, with suggestions for the correct spelling
	∙ Normal cut, paste and copy
	∙ Text can be transferred to other popular DTP packages
	∙ Advantages of draft printing 
	∙ A list of homophones 
	∙ A list of alternate spellings
	∙ Conjugation of about 1000 Bangla verbs 
	∙ Find and change possibilities 
	∙ Global keyboard driver 
	∙ Alphabetical sorting 
	∙ Indexing facility 
	∙ Special system for paperless proofreading and correction. 

This information was provided by Professor Ashoke Mukhopadhyay.

	 Address: ashoke@cal3.vsnl.net.in

For purchase enquiries, write to

	spcom@cal3.vsnl.net.in

-Fonts

The following website contains links to Dr. Berlin's Foreign Font Archive, which features a list of demo font packages and full information for purchasing the full versions.

 
	Address: http://www.user.dtcc.edu/`berlin/font/Indian.htm 

The following page lists a number of fonts that can be used with Microsoft Windows 95/98, Windows NT and Windows 2000. Package names and price lists are given on the page.

	Address: http://www.genguly.de/banglafont.html

The following document presents information that will assist font developers in creating fonts for all Bengali script languages covered by the Unicode Standard.

	Address: http://www.Microsoft.com/typography/ofntdev/bengalo/default.com 

The Free Bangla Fonts Project is engaged in creating free, high quality, completely Unicode compliant OpenType Bengali fonts. Their website offers four fonts for download:

		Akaash, developed by Sayamindu Dasgupta
		Ani, developed by Anirban Mitra
		Likhan, developed by Deepayan Sarkar
		Mukti, which uses glyphs donated by Cyberscape Multimedia Ltd.

The site http://www.akruti.com has released a set of TrueType fonts for 9 Indic languages, including Bangla.

The page http://cgm.cs.mcgill.ca/`luc/Bengali.html lists the available fonts and links to them.

C. Dictionary Tools (monolingual/bilingual/multilingual)

	-Word lists/vocabulary
	-Electronic/ Online dictionaries 

Virtual Bangladesh:

English to Bengali Dictionary: an English to Bengali dictionary hosted by Virtual Bangladesh.

	Address: www.virtualbangladesh.com/dictionary.html

	-Electronic/Online thesaurus
	-Morphological analyzers/generators

Bengali WordNet:

Semantic net of synonym sets (synsets) with corresponding roots, etymology, lexical category, and inter-synset relationships. Relationships are based on Princeton WordNet, with augmentation. STATUS: Underway

D. Spell Checkers/Grammar Checkers/Style checkers

-Linguistic/statistical/hybrid systems

Bspeller: Bspeller is a lightweight text editor with a Bengali spell checker. For further information visit

	www.bengalinux.org/project/dictionary/bspller.php

The Indian Statistical Institute (Kolkata) has analyzed human spelling error patterns and onomatopoeic words in Bangla. They have developed a Bangla spell-checker for automatic detection and correction of spelling errors in a corpus.
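One common way to detect and correct spelling errors is to look each word up in a lexicon and, when it is missing, to suggest lexicon entries within a small edit distance. The sketch below illustrates that generic technique only; it is not the Institute's method, and `edit_distance`, `suggest` and the three-word lexicon are assumptions.

```python
def edit_distance(a, b):
    """Levenshtein distance, computed one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution or match
        prev = cur
    return prev[-1]


def suggest(word, lexicon, max_distance=1):
    """Return nearby lexicon entries, or [] if the word is already known."""
    if word in lexicon:
        return []
    scored = sorted((edit_distance(word, w), w) for w in lexicon)
    return [w for d, w in scored if d <= max_distance]


# Placeholder lexicon; a real checker would load a full word list.
lexicon = {"আমি", "তুমি", "নাম"}
print(suggest("আমী", lexicon))  # a short/long "i" confusion
```

Distances here are counted over Unicode code points. A checker informed by observed human error patterns, as described above, would weight likely confusions (for example short versus long vowels) more cheaply than arbitrary substitutions.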

E. Parsing Systems

		-Phonological
		-Morphological

Bengali Morphological Analyzer: Given a root word and information such as Tense-Aspect-Mood for a verb, the synthesizer generates the surface form of the word. Append rules are applied to the root word. STATUS: Complete [TOOL AVAILABLE]

For further information visit the website

	http://www.cel.iitkgp.ernet.in
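The idea of generating a surface form by applying append rules to a root can be sketched as a table lookup followed by concatenation. The rule table below is a hypothetical fragment covering only the simple present tense of regular roots; it is not the rule set of the tool described above, and real Bangla morphology also needs stem alternations that this sketch ignores.

```python
# Hypothetical append rules: (tense, person) -> suffix.
# Only the simple present is covered; this is an illustrative fragment.
APPEND_RULES = {
    ("present", 1): "ি",
    ("present", 2): "ো",
    ("present", 3): "ে",
}


def synthesize(root, tense, person):
    """Append the suffix selected by the grammatical features to the root."""
    try:
        suffix = APPEND_RULES[(tense, person)]
    except KeyError:
        raise ValueError(f"no append rule for {tense!r}, person {person}")
    return root + suffix


for person in (1, 2, 3):
    print(person, synthesize("কর", "present", person))
```

Keeping the rules in a table keyed by feature bundles makes it straightforward to add further tenses, aspects or moods without touching the function.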

GSMorph:

This is a structuralist grapheme-based morphological analyzer. Given an input for analysis, the analyzer parses the word and gives the root, its meaning and the suffix along with its meaning as an output.

For further information write to

	 gsghyd@icqmail.com,gsg@eth.net 
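Analysis of a word into a root and a suffix, each with its meaning, can be sketched as suffix stripping against a root lexicon. The lexicon, the suffix table and the `analyze` function below are hypothetical and are not taken from GSMorph.

```python
# Hypothetical lexicon and suffix table, for illustration only.
ROOTS = {"ছেলে": "boy", "মা": "mother"}
SUFFIXES = {"রা": "plural", "কে": "objective case", "র": "genitive case"}


def analyze(word):
    """Return (root, root meaning, suffix, suffix meaning), or None."""
    if word in ROOTS:
        return (word, ROOTS[word], "", "")
    # Try longer suffixes first so that "রা" is preferred over "র".
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        root = word[:-len(suffix)]
        if word.endswith(suffix) and root in ROOTS:
            return (root, ROOTS[root], suffix, SUFFIXES[suffix])
    return None


print(analyze("ছেলেরা"))
print(analyze("মাকে"))
```

This handles a single suffix only; stacked suffixes would require repeating the stripping step on the remaining stem.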

-Syntactic

Bengali and Hindi Parser: Parses an input Bengali or Hindi sentence into a verb-driven frame using computational Paninian Grammar. Uses the Bengali/Hindi Morphological Analyzer. STATUS: Underway

For further information visit the website

	http://www.cel.iitkgp.ernet.in 

F. Machine Translation and Translation Systems

	-Translation memories 
	-Terminology data banks

Bengali/Hindi to English and English to Bengali/Hindi Translation System: Translation between Indo-Aryan languages using an intermediate frame structure. Uses the Bengali and Hindi Parsers and Morphological Synthesizers. STATUS: Underway

	-Post-editing tools

For further information visit the website

	http://www.cel.iitkgp.ernet.in 

The Indian Statistical Institute (Kolkata) is developing a tool for domain-specific translation between Bangla and Hindi.

Machine Aided Translation System (MAT):

A demo system for the language pair Kannada to Hindi was initially developed at IIT Kanpur. This technology was demonstrated at various forums and named Anusaraka. The work was later extended to languages such as Telugu, Marathi, Bangla, Hindi and Punjabi.

The project was jointly carried out by IIT Kanpur and University of Hyderabad.

-Word sense disambiguation (WSD) tools

The Indian Statistical Institute (Kolkata) has developed OCR systems for

	i.   Printed Bangla script (as well as printed Devanagari script). The accuracy rate is 92%.
	ii.  Handwritten Bangla numerals
	iii. Isolated handwritten Bangla characters

H. Information Retrieval /Information Extraction (IL/Multilingual)

	-Text Mining 
	-Web mining 

I. Search Engines/Web Technologies (IL/Multilingual)

J. Speech Technology

	-Signal processing

Bengali and Hindi Grapheme to Phoneme Mapper: Given a string of graphemes in Bengali/Hindi, converts them to the corresponding sequence of phonemes. STATUS: Bengali G to P Underway; Hindi G to P Completed using an Optimality Theory approach.

For further information visit the website

	http://www.cel.iitkgp.ernet.in 

The Indian Statistical Institute (Kolkata) has worked out grapheme to phoneme conversion rules for Bangla.
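A rule-based grapheme to phoneme conversion can be sketched as a table lookup plus a rule for the inherent vowel. The sketch below is a naive illustration, not the Optimality Theory approach or the conversion rules mentioned above. The tables, the phoneme symbols and the rule "drop the inherent vowel word-finally" are simplifying assumptions that fail for many real Bangla words.

```python
# Hypothetical, deliberately small tables with simplified phoneme symbols.
CONSONANTS = {"ক": "k", "ন": "n", "ব": "b", "ম": "m", "র": "r", "ল": "l"}
INDEPENDENT_VOWELS = {"আ": "a", "ই": "i", "উ": "u", "এ": "e"}
VOWEL_SIGNS = {"া": "a", "ি": "i", "ু": "u", "ে": "e"}
HASANTA = "\u09cd"   # suppresses the inherent vowel
INHERENT = "ɔ"       # assumed value of the inherent vowel


def g2p(word):
    phonemes = []
    for i, ch in enumerate(word):
        nxt = word[i + 1] if i + 1 < len(word) else None
        if ch in INDEPENDENT_VOWELS:
            phonemes.append(INDEPENDENT_VOWELS[ch])
        elif ch in VOWEL_SIGNS:
            phonemes.append(VOWEL_SIGNS[ch])
        elif ch in CONSONANTS:
            phonemes.append(CONSONANTS[ch])
            # Add the inherent vowel unless a vowel sign or hasanta follows,
            # or the consonant is word-final (a simplifying assumption).
            if nxt is not None and nxt not in VOWEL_SIGNS and nxt != HASANTA:
                phonemes.append(INHERENT)
        elif ch != HASANTA:
            raise ValueError(f"no mapping for {ch!r}")
    return phonemes


print(g2p("নাম"))
print(g2p("কলা"))
```

Context-dependent behaviour, such as the inherent vowel surfacing as a different vowel in some environments, is exactly what a fuller rule set or a constraint-based model has to capture.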

	-Text to Speech (TTS) 

Prosodic Text-to-Speech Systems for Hindi and Bengali: Given an ordered list of graphemes (letters) in Bengali/Hindi, generates the corresponding speech units and concatenates them to give speech output. A concatenative speech synthesizer and a screen reader have been developed. STATUS: Flat speech has been achieved; natural (prosodic) speech is being modeled. [TOOL AVAILABLE, TTS IS BEING PLUGGED INTO OTHER SOFTWARE]

For further information visit the website

	http://www.cel.iitkgp.ernet.in 
	-Speech to text (STT)
	-Speech recognition/ Understanding 
		-Language Recognition 
		-Speaker Identification

K. Standardization Issues

-Character level standards: ISCII/UNICODE

ISCII (Indian Standard Code for Information Interchange): A standardization committee under the Department of Electronics evolved the ISCII standard during 1986-88. In 1991, the Bureau of Indian Standards adopted it as the Indian Standard Code for Information Interchange.

For an introduction to ISCII and ISCII code table visit the site:

	http://tdil.mit.gov.in/standards.htm 

Unicode Standard:

The Unicode Consortium develops, extends and promotes the use of the Unicode Standard, which specifies the representation of text in modern software products and standards. Unicode enables a single software product or a single website to be targeted across multiple platforms, languages and countries without re-engineering.

Indian Languages on Unicode:

The Unicode Standard has incorporated Indian scripts under the group named Asian Scripts. This includes Bangla, Devanagari, Gurmukhi, Oriya, Gujarati, Tamil, Telugu, Malayalam and Kannada. The Indian language blocks of the Unicode Standard are based on ISCII-88.

	Address: http://www.unicode.org 

To view charts of Indian language character sets in Unicode, visit:

	http://charts.Unicode.org
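As a small illustration of character-level encoding, the Bengali block of Unicode occupies the code points U+0980 to U+09FF, and Python's standard `unicodedata` module can report each character's official name. The helper names `is_bengali` and `describe` are assumptions made for this sketch.

```python
import unicodedata

# The Unicode Bengali block: U+0980 through U+09FF inclusive.
BENGALI_BLOCK = range(0x0980, 0x0A00)


def is_bengali(ch):
    """True if the character's code point lies in the Bengali block."""
    return ord(ch) in BENGALI_BLOCK


def describe(text):
    """List (code point, Unicode name) for each character in the text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
            for ch in text]


for code, name in describe("বাংলা"):
    print(code, name)
```

Listing the code points this way shows that vowel signs and the anusvara are stored as separate characters after the consonant, which is why glyph shaping is left to the rendering engine.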
	-Glyph standardization 
	-Keyboard layout 

Ilkeyb:

This is an Indian language keyboard program developed by Avinash Chopde. The software lets the user type text in any Indian language script after memorizing only 50-60 keys. The user needs to remember only the basic vowels and consonants of a language, and the program automatically generates the 200+ characters (glyphs) required to correctly typeset text in any Indian language. It contains a high quality TrueType font (developed by Shrikrishna Patel) and a software module that runs in the background under Microsoft's Windows operating system. The software maps the ASCII English keyboard to a particular Indian language script.

For more information visit

	www.aczone.com/ilkeyb 

INSCRIPT

This keyboard layout is used for data entry in Indian languages. The layout uses the standard 101-key keyboard. The mapping of the characters is such that it remains common for all Indian languages. All the vowels are placed on the left side of the keyboard layout and the consonants on the right side.

For more information visit

	www.cdacindia.com/htmlgis/standard/inscript.asp 
	-Rendering engines 
-Operating System level support

BIOS (Bangla Innovative Open Source) is developing a Linux based operating system. They are working on providing Bangla support for major X server applications such as office suites and databases, and for desktop environments such as GNOME and KDE.

	Address: www.banlalinux.org

-Browser level support



Copyright CIIL-India Mysore