We are pleased to invite you to the following seminar: Document Image Compression and Compressed Domain Document Processing, by Dr. Utpal Garain, Indian Statistical Institute, Kolkata, India.
The seminar is jointly organized by the School of Electrical & Electronic Engineering, Nanyang Technological University (NTU) and the Pattern
Recognition and Machine Intelligence Association (PREMIA).
Date: 25 July 2008 (Friday)
Time: 10.30 - 11.30 am
Venue: Meeting Room C, Room Number S1-B1c-111.
School of Electrical and Electronic Engineering, Nanyang
Technological University
(Light refreshments will be provided after the talk)
Title: Document Image Compression and Compressed Domain
Document Processing.
Speaker: Dr. Utpal Garain, Indian Statistical Institute,
Kolkata, India.
Abstract:
With the advent of digital libraries, research
in document image compression and compressed domain document processing has taken on a
new dimension. Document image compression serves an obvious practical purpose: it conserves storage in a database of document images,
thereby improving computer systems' performance by reducing accesses to storage
devices, and enhances utilization of available bandwidth in a network. Researchers
have studied document image compression as a special case of general data
compression for the last quarter century and several techniques and standards
have been proposed. This talk therefore starts by revisiting the popular
and frequently used algorithms for compression of scan-digitized document
images. Standards such as CCITT-IV, JBIG, and JBIG2 will be discussed in
reasonable detail. The role of symbolic compression in JBIG2 will be
highlighted. The idea of residue coding will then be introduced, and lossy and
lossless versions of a JBIG2-compliant method will be discussed. With
an increasing number of document images being stored in compressed format,
efficient information retrieval in the compressed domain has also become a
major challenge in designing digital libraries and document
viewing facilities for resource-constrained systems such as handheld devices. The
discussion of compressed domain pattern matching and retrieval is broadly based
on the JBIG2 compression method. Requirements for designing a complete compressed
domain document image retrieval system will be outlined. Existing research
in related areas will be surveyed, and then current studies on
accomplishing several IR (information retrieval) tasks for document images
will be reviewed. Several applications, such as stopword spotting, word stemming,
keyword spotting, duplicate detection, and summarization, will be discussed,
and the research needed to realize these applications in the compressed domain
will be presented.
Brief biography of the Speaker:
Utpal Garain received his
bachelor's and master's degrees in computer science and engineering in 1994 and
1997, respectively, from Jadavpur University, India, and his Ph.D. degree from the Indian
Statistical Institute in 2005. Dr. Garain started his career in the software
industry and later joined the Indian Statistical Institute, where he is
currently serving as an Assistant Professor. His research interests include
document image analysis (including document compression and OCR), technology
development for Indian languages, and machine learning. Dr. Garain is one of the key scientists who first developed
an Indic script (Devanagari-Hindi and Bengali) OCR system that was eventually
transferred to industry for commercialization. For his significant contributions
to pattern recognition and its applications in document image analysis (DIA) and language engineering,
Dr. Garain received the prestigious Young Engineer Award in 2006 from the Indian
National Academy of Engineering (INAE).