(Under each head, the various approaches/models used, applications, current state of technology, available software tools, future directions, etc. are given, along with relevant references to organizations/institutions/companies, products, experts in the field, books, journals, articles, websites, CDs, cassettes, etc.)
-Corpus creation and updating tools -Corpus indexing tools (Concordance, KWIC index etc.) -Corpus compression and encryption tools -Text processing tools
(Basic statistics, n-grams, Markov Chains, Hidden Markov Models, etc. at sentence, word, morpheme, syllable, character, phoneme levels)
The Indian Statistical Institute (Kolkata) has developed the following corpus resources:
1. A monolingual sample Bangla text corpus has been generated.
2. A historical corpus of Bangla novels published in the 19th and 20th centuries has been compiled.
3. Statistical frequency counting of various linguistic elements in the corpus has been completed.
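Frequency counting of this kind can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Institute's tool: the function name `ngram_counts` and the two-sentence sample text are made up for the example, and the same function is applied at word level and at character level.

```python
from collections import Counter


def ngram_counts(items, n):
    """Count every run of n consecutive items (words, characters, ...)."""
    return Counter(tuple(items[i:i + n]) for i in range(len(items) - n + 1))


# Hypothetical two-sentence sample standing in for a real corpus.
sample = "আমি ভাত খাই আমি বই পড়ি"
words = sample.split()

word_unigrams = ngram_counts(words, 1)                    # word frequencies
word_bigrams = ngram_counts(words, 2)                     # adjacent word pairs
char_bigrams = ngram_counts(sample.replace(" ", ""), 2)   # adjacent code points

print(word_unigrams.most_common(3))
print(word_bigrams.most_common(3))
```

Note that the character-level counts here operate on Unicode code points; counting at syllable or phoneme level would need a segmentation step first.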
-Text edition tools -Word Processing tools -DTP tools
iLEAP is an Internet-ready Indian-language word processor for Windows. It has the following features:
∙ Self-explanatory user interface
∙ Multilingual spellchecker
∙ Choice of keyboard layout
∙ Email facility for Indian languages
∙ Facility to make web pages in Indian languages
∙ Language-sensitive multilingual editor
Address: www.cdacindia.com/html/gist/products/ileap.asp
Bangla Word is a word-processing application designed specifically for writing Bengali documents. It allows the user to enter Bangla text using vowels and consonants. Conjunct characters are placed automatically by the system where possible, or as indicated by the user. The Bangla alphabet is mapped phonetically onto the standard QWERTY keyboard, independent of any Bangla font, and the application supports over 200 Bangla fonts. It also allows the user to create web pages and to send emails directly from the application, both as attached Bangla Word documents and as RTF text.
Address: www.banglasoftware.com/banglaword.asp
Lekho: A Bangla Unicode Editor is a "plain-text" editor designed to take phonetic input from a standard US keyboard and transliterate it online into Bangla text. It allows the user to redo actions and to export to Bangla text. It is based on the Qt toolkit and runs on X11 (Linux, FreeBSD, Unix) and Windows systems. It is released under the GPL.
Address: www.lekho.sourceforge.net
Unitype Global Writer is a word processor that lets users write in their own language. It has a built-in, customizable spell-checker and thesaurus.
Address: www.untitype.com/unitype.htm
Parabaas is a word processor that allows the user to write Indian languages in Roman letters using a simple transliteration scheme based on ITRANS.
Address: www.parabaas.com/parabaasaxar
Lekhok is a multi-system word processor developed by Protivasoft Inc. and packaged with 34 fonts. It lets the user insert graphics into the text.
Address: www.protivasoft.com
Hatey Khori 2.0 is Bangla scripting software designed for children. It is a multimedia program with animation, audio feedback, visuals, and print options. Bangla letters are animated on a blackboard, and a speaker button lets the child hear the pronunciation.
Address: www.protivasoft.com
Samsad Shabdajagat is a Bangla software package marketed by Sishu Sahitya Samsad. It has the following features:
∙ A text editor for writing Bangla, with the option of designing one's own keyboard layout if the standard Samsad keyboard layout is not acceptable to the user
∙ An option for writing Bangla in traditional and transparent type styles
∙ Detection of spelling errors, with suggestions for the correct spelling
∙ Normal cut, paste, and copy
∙ Transfer of text to other popular DTP packages
∙ Draft printing
∙ A list of homophones
∙ A list of alternate spellings
∙ Conjugation of about 1000 Bangla verbs
∙ Find and change
∙ Global keyboard driver
∙ Alphabetical sorting
∙ Indexing facility
∙ A special system for paperless proofreading and correction
This information was provided by Professor Ashoke Mukhopadhyay.
Address: ashoke@cal3.vsnl.net.in
For purchase enquiries, write to
spcom@cal3.vsnl.net.in
-Fonts
The following website contains links to Dr. Berlin's Foreign Font Archive, which lists demo packages of various fonts and gives full information for purchasing the full versions.
Address: http://www.user.dtcc.edu/~berlin/font/Indian.htm
This page lists a number of fonts that can be used with Microsoft Windows 95/98, Windows NT, and Windows 2000. Package names and price lists are given on the page.
Address: http://www.genguly.de/banglafont.html
This document presents information that will assist font developers in creating fonts for all Bengali-script languages covered by the Unicode Standard.
Address: http://www.Microsoft.com/typography/ofntdev/bengalo/default.com
The Free Bangla Fonts Project is engaged in creating free, high-quality, fully Unicode-compliant OpenType Bengali fonts. Its website offers four fonts for download:
∙ Akaash, developed by Sayamindu Dasgupta
∙ Ani, developed by Anirban Mitra
∙ Likhan, developed by Deepayan Sarkar
∙ Mukti, which uses glyphs donated by Cyberscape Multimedia Ltd.
The site http://www.akruti.com has released a set of TrueType fonts for 9 Indic languages, including Bangla.
The page http://cgm.cs.mcgill.ca/~luc/Bengali.html lists available fonts and links to them.
-Word lists/vocabulary -Electronic/ Online dictionaries
English to Bengali Dictionary: an online English-to-Bengali dictionary.
Address: www.virtualbangladesh.com/dictionary.html
-Electronic/Online thesaurus -Morphological analyzers/generators
Semantic net of synonym sets with corresponding roots, etymology, lexical category, and inter-synset relationships. The relationships are based on Princeton WordNet, with augmentation. STATUS: Underway
-Linguistic/statistical/hybrid systems
Bspeller: Bspeller is a lightweight text editor with a Bengali spell checker. For further information visit
www.bengalinux.org/project/dictionary/bspller.php
The Indian Statistical Institute (Kolkata) has analyzed human spelling-error patterns and onomatopoeic words in Bangla. It has developed a Bangla spell-checker for automatic detection and correction of spelling errors in a corpus.
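As an illustration of what detection and correction can involve, the sketch below flags a word that is missing from a word list and proposes entries within a small edit distance. It is a generic textbook approach, not the Institute's method; the names `edit_distance` and `suggest` and the four-word lexicon are assumptions made for the example.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings, counted in code points."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def suggest(word, lexicon, max_distance=1):
    """Return lexicon entries within max_distance edits, closest first."""
    if word in lexicon:
        return []  # word is in the lexicon: nothing to correct
    scored = sorted((edit_distance(word, entry), entry) for entry in lexicon)
    return [entry for dist, entry in scored if dist <= max_distance]


# Hypothetical miniature lexicon; a real checker would load a full word list.
lexicon = {"বই", "ভাত", "আমি", "বাংলা"}
print(suggest("বাংল", lexicon))  # the final vowel sign is missing
```

A checker tuned to observed human error patterns would typically weight some edits (for example, confusable vowel signs) as cheaper than others rather than treating every edit equally.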
-Phonological -Morphological
Bengali Morphological Analyzer: Given a root word and information such as Tense-Aspect-Mood for a verb, the synthesizer generates the surface form of the word. Append rules are applied to the root word. STATUS: Complete [TOOL AVAILABLE]
For further information visit the website
http://www.cel.iitkgp.ernet.in
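The idea of append rules can be sketched as a lookup from grammatical features to a suffix that is attached to the root. The rule table below is a hypothetical, romanized toy covering only first-person forms of the verb root "kor" (do); it is not the rule set of the tool described above, and real Bangla synthesis also has to handle stem alternations that this sketch ignores.

```python
# Hypothetical append rules, keyed by (tense/aspect, person).
APPEND_RULES = {
    ("present", 1): "i",
    ("present-progressive", 1): "chhi",
    ("past", 1): "lam",
    ("future", 1): "bo",
}


def synthesize(root, tense, person):
    """Append the suffix selected by the grammatical features to the root."""
    try:
        suffix = APPEND_RULES[(tense, person)]
    except KeyError:
        raise ValueError(f"no append rule for {tense!r}, person {person}") from None
    return root + suffix


for tense in ("present", "present-progressive", "past", "future"):
    print(tense, synthesize("kor", tense, 1))
```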
This is a structuralist grapheme-based morphological analyzer. Given an input for analysis, the analyzer parses the word and gives the root, its meaning and the suffix along with its meaning as an output.
For further information write to
gsghyd@icqmail.com,gsg@eth.net
-Syntactic
Bengali and Hindi Parser: Given an input sentence, parses the Bengali words into a verb-driven frame using computational Paninian Grammar. Uses the Bengali/Hindi Morphological Analyzer. STATUS: Underway
For further information visit the website
http://www.cel.iitkgp.ernet.in
-Translation memories -Terminology data banks -Post-editing tools
Bengali/Hindi to English and English to Bengali/Hindi Translation system: Translation between Indo-Aryan languages using an intermediate frame structure. Uses the Bengali and Hindi Parsers and Morphological Synthesizers. STATUS: Underway
For further information visit the website
http://www.cel.iitkgp.ernet.in
The Indian Statistical Institute (Kolkata) is developing a tool for domain-specific translation between Bangla and Hindi.
A demo system for the Kannada-to-Hindi language pair was initially developed at IIT Kanpur. The technology was demonstrated at various forums and named Anusaraka. The work was later extended to languages such as Telugu, Marathi, Bangla, Hindi, and Punjabi.
The project was jointly carried out by IIT Kanpur and University of Hyderabad.
-Word sense disambiguation (WSD) tools
The Indian Statistical Institute (Kolkata) has developed OCR systems for:
i. Printed Bangla script (as well as printed Devanagari script). The accuracy rate is 92%.
ii. Handwritten Bangla numerals
iii. Isolated handwritten Bangla characters
-Text Mining -Web mining
-Signal processing
Bengali and Hindi Grapheme to Phoneme Mapper: Given a string of graphemes in Bengali/Hindi, converts them to the corresponding set of phonemes. STATUS: Bengali G to P underway; Hindi G to P completed using an Optimality Theory approach.
For further information visit the website
http://www.cel.iitkgp.ernet.in
The Indian Statistical Institute (Kolkata) has worked out grapheme-to-phoneme conversion rules for Bangla.
-Text to Speech (TTS)
Prosodic Text-to-Speech Systems for Hindi and Bengali: Given an ordered list of graphemes (letters) in Bengali/Hindi, generates the corresponding speech units and concatenates them to give speech output. A concatenative speech synthesizer and a screen reader have been developed. STATUS: Flat speech has been achieved; natural (prosodic) speech is being modeled. [TOOL AVAILABLE, TTS IS BEING PLUGGED INTO OTHER SOFTWARE]
For further information visit the website
http://www.cel.iitkgp.ernet.in
-Speech to text (STT) -Speech recognition/ Understanding -Language Recognition -Speaker Identification
-Character level standards: ISCII/UNICODE
ISCII (Indian Script Code for Information Interchange): A standardization committee under the Department of Electronics evolved the ISCII standard during 1986-88, and the Bureau of Indian Standards adopted it in 1991.
For an introduction to ISCII and ISCII code table visit the site:
http://tdil.mit.gov.in/standards.htm
The Unicode Consortium develops, extends, and promotes the use of the Unicode Standard, which specifies the representation of text in modern software products and standards. Unicode enables a single software product or a single website to be targeted across multiple platforms, languages, and countries without re-engineering.
The Unicode Standard incorporates Indian scripts under the group named Asian Scripts. These include Bangla, Devanagari, Gurumukhi, Oriya, Gujarati, Tamil, Telugu, Malayalam, and Kannada. The Indian-language blocks of the Unicode Standard are based on ISCII-88.
Address: http://www.unicode.org
To view charts of the Indian-language character sets in Unicode,
visit: http://charts.Unicode.org
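The Bengali block of Unicode occupies the code points U+0980 to U+09FF. The short Python sketch below uses the standard `unicodedata` module to print the code point and official character name of each character in a Bangla word; the helper name `is_bengali` is an assumption made for the example.

```python
import unicodedata

# The Bengali block of the Unicode Standard: U+0980 through U+09FF.
BENGALI_BLOCK = range(0x0980, 0x0A00)


def is_bengali(ch):
    """Return True if the character's code point lies in the Bengali block."""
    return ord(ch) in BENGALI_BLOCK


for ch in "বাংলা":
    print(f"U+{ord(ch):04X}", unicodedata.name(ch), is_bengali(ch))
```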
-Glyph standardization -Keyboard layout
This is an Indian-language keyboard program developed by Avinash Chopde. The software makes it possible to type text in any Indian-language script by memorizing only 50-60 keys. The user needs to remember only the basic vowels and consonants of a language, and the program automatically generates the 200+ characters (glyphs) required to typeset text correctly in that language. It contains a high-quality TrueType font (developed by Shrikrishna Patel) and a software module that runs in the background under the Microsoft Windows operating system. The software maps the ASCII English keyboard to a particular Indian-language script.
For more information visit
www.aczone.com/ilkeyb
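The general idea of mapping ASCII keystrokes to an Indian script, with conjuncts and vowel signs generated automatically, can be sketched as follows. The key table is a small hypothetical one for Bangla and is not the mapping used by the program above: a vowel typed after a consonant becomes a dependent vowel sign, a lone "a" after a consonant is the inherent vowel, and two consonants in a row are joined with a hasanta to form a conjunct.

```python
# Hypothetical key tables (not any real program's mapping).
CONSONANTS = {"k": "\u0995", "kh": "\u0996", "t": "\u09A4", "n": "\u09A8",
              "b": "\u09AC", "m": "\u09AE", "l": "\u09B2"}
# Each vowel key maps to (independent letter, dependent vowel sign).
VOWELS = {"a": ("\u0985", ""), "aa": ("\u0986", "\u09BE"),
          "i": ("\u0987", "\u09BF"), "u": ("\u0989", "\u09C1")}
HASANTA = "\u09CD"  # joins two consonants into a conjunct


def match_key(text, i):
    """Return the longest key (two letters, then one) starting at position i."""
    for length in (2, 1):
        key = text[i:i + length]
        if len(key) == length and (key in CONSONANTS or key in VOWELS):
            return key
    return None


def transliterate(text):
    out = []
    after_consonant = False  # True while the last consonant awaits a vowel
    i = 0
    while i < len(text):
        key = match_key(text, i)
        if key is None:  # unmapped character: copy through unchanged
            out.append(text[i])
            after_consonant = False
            i += 1
            continue
        if key in CONSONANTS:
            if after_consonant:
                out.append(HASANTA)
            out.append(CONSONANTS[key])
            after_consonant = True
        else:
            independent, sign = VOWELS[key]
            out.append(sign if after_consonant else independent)
            after_consonant = False
        i += len(key)
    return "".join(out)


print(transliterate("aami"), transliterate("kaal"), transliterate("kta"))
```

Trying two-letter keys before one-letter keys lets "kh" and "aa" take priority over "k" and "a", which is how a small set of keys can cover a much larger set of output glyphs.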
This keyboard layout is used for data entry in Indian languages. It uses the default 101-key keyboard. The mapping of characters is designed to remain common across all Indian languages. All the vowels are placed on the left side of the layout and the consonants on the right side.
For more information visit
www.cdacindia.com/htmlgis/standard/inscript.asp
-Rendering engines
-Operating System level support
BIOS (Bangla Innovative Open source) is developing a Linux-based operating system. The group is working on providing Bangla support for major X server applications such as office suites and databases, and for desktop environments such as GNOME and KDE.
Address: www.banlalinux.org
-Browser level support
Copyright CIIL-India Mysore