Industrial Invited Speakers for RSCTC 2008
Distinguished Speakers from Industry:
Dr. Dominik Slezak (Chief Scientist and Co-Founder, Infobright)
Dr. Arvind Srinivasan (CTO and Co-Founder, ZL Technologies)
Dr. Slezak's Talk
Rough Sets in Data Warehousing: The Case Study of Infobright Community Edition
The theory of rough sets provides a powerful model for representing patterns and dependencies, applicable to both databases and data mining. On the one hand, although there are numerous applications of rough sets to data mining and knowledge discovery, the use of rough sets inside database engines is still largely uncharted territory. On the other hand, this situation is not exceptional, given that even the best-known paradigms of machine learning, soft computing, artificial intelligence, and approximate reasoning are still awaiting wider recognition in database research.
Rough set-based algorithms and similar techniques can be applied to improve database performance in several ways. We focus on the idea of using available information to calculate rough approximations of the data needed to resolve queries, and to assist the database engine in accessing relevant data. We partition data into rough rows, each consisting of 64K original rows. We automatically label rough rows with compact information about their values on data columns, often involving multi-column and multi-table relationships. One may say that we create new information systems, where objects correspond to rough rows and attributes correspond to various flavors of rough information.
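The abstract does not specify what compact information is attached to each rough row. As a minimal sketch, assuming the simplest possible label (the minimum and maximum of a single numeric column) and a single range predicate, each rough row can be classified as irrelevant (no original row can match), relevant (every original row matches), or suspect (the boundary case). This mirrors the rough-set idea of lower and upper approximations: only suspect rough rows need their original rows examined. The names `RoughRowInfo`, `build_rough_rows`, `classify`, and `count_in_range` are hypothetical and are not Infobright's API. "64K" is interpreted here as 65,536 rows, and the multi-column and multi-table labels mentioned above are not modeled.

```python
from dataclasses import dataclass
from typing import List, Tuple

# "64K" original rows per rough row, interpreted as 65,536 (assumption).
ROUGH_ROW_SIZE = 65536


@dataclass
class RoughRowInfo:
    """Hypothetical compact label of one rough row on one numeric column."""
    start: int    # index of the first original row in this rough row
    end: int      # one past the index of the last original row
    min_val: int  # smallest value in the rough row
    max_val: int  # largest value in the rough row


def build_rough_rows(column: List[int], size: int = ROUGH_ROW_SIZE) -> List[RoughRowInfo]:
    """Split a column into consecutive rough rows and label each with its min/max."""
    infos = []
    for start in range(0, len(column), size):
        chunk = column[start:start + size]
        infos.append(RoughRowInfo(start, start + len(chunk), min(chunk), max(chunk)))
    return infos


def classify(info: RoughRowInfo, low: int, high: int) -> str:
    """Classify a rough row against the predicate low <= x <= high.

    'irrelevant': no original row can satisfy it (outside the upper approximation)
    'relevant':   every original row satisfies it (inside the lower approximation)
    'suspect':    boundary region; the original rows must be inspected
    """
    if info.max_val < low or info.min_val > high:
        return "irrelevant"
    if low <= info.min_val and info.max_val <= high:
        return "relevant"
    return "suspect"


def count_in_range(column: List[int], infos: List[RoughRowInfo],
                   low: int, high: int) -> Tuple[int, int]:
    """Answer COUNT(*) WHERE low <= x <= high.

    Returns (count, number of rough rows whose original rows were inspected).
    Relevant rough rows are counted from their labels alone; irrelevant ones
    are skipped; only suspect ones are scanned row by row.
    """
    total = 0
    inspected = 0
    for info in infos:
        label = classify(info, low, high)
        if label == "relevant":
            total += info.end - info.start
        elif label == "suspect":
            inspected += 1
            total += sum(1 for x in column[info.start:info.end] if low <= x <= high)
    return total, inspected


if __name__ == "__main__":
    column = list(range(200_000))  # 4 rough rows: three full, one partial
    infos = build_rough_rows(column)
    print([classify(i, 60_000, 140_000) for i in infos])
    # ['suspect', 'relevant', 'suspect', 'irrelevant']
    print(count_in_range(column, infos, 60_000, 140_000))
    # (80001, 2)
```

In this toy run, the second rough row is answered from its label alone and the last is skipped, so only two of the four rough rows are scanned. Richer labels (for example, histograms or cross-column relationships) would shrink the suspect region further, at the cost of more metadata per rough row.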
In this talk, we show how the above ideas guided us toward implementing a fully functional data warehouse product, with interfaces provided via integration with MySQL and internals based on the newest database trends. Thanks to compact, flexible rough information, we became especially competitive in the field of analytical data warehouses, where users want to query terabytes of data in complex, dynamically changing ways. Recently, we announced the open source edition of our data warehouse at www.infobright.org, ready for free use and further extension. In the talk, we illustrate the best scenarios for applying our software to various aspects of data processing. We also discuss the most promising directions for further improvement of our technology, with special attention to ideas based on the theory of rough sets and corresponding techniques.
Dr. Srinivasan's Talk
Classification Challenges in Email Archiving
This talk focuses on Email Archiving technology and how it has changed the way corporations around the world handle email and similar communications. Today, email tops the list of preferred modes of communication. Emails are increasingly accepted as evidence in legal disputes and are treated as a company's most important source of information. Every year, governments in countries such as the United States of America introduce new laws governing the use and handling of email in the corporate world. In addition, industry standards are being established that make retention of email communications mandatory for companies. These laws and regulations impose heavy fines on corporations that fail to comply. Beyond the fines, any loss of email data exposes corporations to lawsuits and financial losses. To comply with these laws and regulations, and to make use of the wealth of information in email, companies need to retain email data and to support search, discovery, and surveillance over it. Email Archiving technology has made it possible for companies to meet these challenges. In this talk, we will elaborate on the key aspects of building an Email Archiving solution and the challenges it involves. We will show how most of the challenges in this area are closely related to the classification and matching of data, and why better, more advanced techniques are needed to improve email archiving technology.