
A Short Course on Data Mining Techniques and Applications
Wednesday, 23 March 2011

A Short Course on Data Mining Techniques and Applications

Organized by Pattern Recognition and Machine Intelligence Association (PREMIA)

Date: 10 & 11 May 2011 (Tue & Wed), 9:00 am – 5:30 pm

Venue: Seminar Room 3/Programming Lab 2, School of Computing, Computing Drive, NUS, Singapore 117417

This course comprises a tutorial on data mining techniques, three invited talks by data mining practitioners on challenging applications of data mining, and a hands-on session with data mining software.

Registration Fees
Members of PREMIA: S$450.00
Non-members: S$550.00
Student members of PREMIA: S$200.00 (Limited seats)
Student non-members: S$250.00 (Limited seats)

The registration fee includes course notes, refreshments, and a free one-year PREMIA membership subscription. For non-members, the entrance fee for PREMIA membership is waived.

Registration Procedure
Please register online under Events and Activities -> Short Course on Data Mining Techniques and Applications. Make your cheque payable to PREMIA and send it to PREMIA’s treasurer, Dr. Lu Shijian, at the following address:

Dr. Lu Shijian, Institute for Infocomm Research, 1 Fusionopolis Way, #21-01 Connexis, South Tower, Singapore 138632.

If you wish to pay by online fund transfer, please email Dr. Lu Shijian (slu@i2r.a-star.edu.sg) for PREMIA account information.


Registration will close on 6 May 2011. However, because lab space for the hands-on session is limited, registration may close before the deadline if the class limit is reached. Please register early to avoid disappointment. If your organization needs an invoice for the course fee, or if you encounter any problem during registration, please contact Miss Zhang Xiaoyan at ZHAN0292@e.ntu.edu.sg.

PREMIA reserves the right to cancel the course due to circumstances beyond its control.

Day 1 (10 May 2011, Tuesday): Tutorial on Data Mining Techniques

Venue: Seminar Room 3, Level 2, Com 1, School of Computing


The tutorial opens with an introduction to analytics: what it is, and what motivates the need for such a domain. Real-world examples will illustrate, at a high level, how analytics can provide in-depth insight and enhance the decision-making process. We will then introduce the concept of data mining, describing various methodologies in detail (such as decision trees and regression methods) as well as advanced elements such as feature selection, and connecting them to the problems highlighted earlier.

9:00 am – 10:30 am              Intro to Analytics (What is analytics? What is the motivation from a real-world perspective?)

10:30 am – 11:00 am           Tea Break

11:00 am – 1:00 pm             Intro to Machine Learning (Linear Discriminant Learning, Perceptron Algorithm, SVM, Kernel Methods and Non-linearity)

1:00 pm – 2:00 pm               Lunch (on your own)

2:00 pm – 3:30 pm               Data Mining methodologies (Goals of data mining, classification, association analysis, cluster analysis, anomaly detection)

3:30 pm – 4:00 pm               Tea Break

4:00 pm – 5:30 pm               Features (feature selection including hyperspectral data analysis, and how they relate to real-world problems, dimensionality reduction, subspace representation); Moving from research into practice.

Biodata: Dr. David R. Hardoon is Principal, Analytics at SAS Singapore. His areas of expertise include, but are not limited to, data mining, information retrieval, knowledge discovery, pattern recognition and machine learning. These have been applied across a wide cross-disciplinary scope including problems/applications in music, medical analysis, retail, time sequence analysis, aerospace, taxonomy, content-based information retrieval, vision and finance. He received a B.Sc. in Computer Science and Artificial Intelligence with first class honors from the Department of Computer Science at Royal Holloway, University of London in 2002, and a PhD in Computer Science in the field of Machine Learning from the University of Southampton, in the Information: Signals, Images, Systems research group, in 2006. He has also received the PhD PASCAL label award for his active participation in the PASCAL Network of Excellence. He is currently an Adjunct Assistant Professor at the School of Computing, National University of Singapore, an Honorary Senior Research Associate at the Centre for Computational Statistics & Machine Learning, University College London, and a visiting Research Fellow at the Institute of Psychiatry, King’s College London. Dr. Hardoon is also a member of the Pattern Analysis, Statistical Modeling and Computational Learning (PASCAL) Network of Excellence, and a board member of the Pattern Recognition and Machine Intelligence Association (PREMIA) Singapore.

Day 2 (11 May 2011, Wednesday): Invited Talks on Data Mining Applications and Hands-on Session

Venue: Programming Lab 2, Basement, COM1, School of Computing

9:00 am – 10:10 am              Semantic Technologies (by Dr. Kanagasabai Rajaraman)

10:10 am – 10:40 am           Tea Break

10:40 am – 11:50 am           Privacy Data Mining (by Dr. Han Shuguo)

11:50 am – 1:00 pm             Statistical Tools for Data Mining (by Dr. Feng Mengling)

1:00 pm – 2:00 pm               Lunch (on your own)

2:00 pm – 3:30 pm               Data Mining Hands-on Workshop, Part I (by Mr. Jason Loh)

3:30 pm – 4:00 pm               Tea Break

4:00 pm – 5:30 pm               Data Mining Hands-on Workshop, Part II (by Mr. Jason Loh)


Talks’ Synopses and Speakers’ Biodata

1.       Semantic Technologies (by Dr. Kanagasabai Rajaraman)

Semantic technology, a key part of Web 3.0, is already changing the way we organize, manage and structure information and data, revolutionizing traditional IT practices and solutions. This talk will introduce semantic technologies and discuss foundations such as ontologies and semantic modeling and querying using W3C recommended standards. A technical introduction to RDF, OWL and SPARQL will be provided together with illustrative examples. State-of-the-art case studies from a variety of domains such as Advertising, Marketing, and Healthcare will be presented. Popular tools will also be discussed to help participants start exploring real-life applications of semantic technologies.

 

Biodata: Dr. Rajaraman Kanagasabai is currently a Principal Investigator at the Data Mining Department, Institute for Infocomm Research (I2R), Singapore, and leads the Semantic Technology Group. He has published widely in top peer-reviewed journals and conferences, and has served on the Programme Committees of many international conferences. He has also chaired or co-chaired several international events related to Semantic technologies and Analytics. He was part of the core research team behind the multiple-award winning iAgent - the first multilingual search engine, WebWatch - the key technology behind the successful startup BuzzCity (www.buzzcity.com), and the KnowleSuite technology that has been spun off as Knorex (www.knorex.com). He was also the leader of the team that won the Tan Kah Kee Young Inventor’s

 

2.       Privacy Data Mining (by Dr. Han Shuguo)

Personal patient health records are one of the most sensitive types of private data. To prevent the misuse of data, laws and regulations prohibit companies or groups from sharing their data, such as the U.S. healthcare laws and the 1996 administrative simplification provisions in HIPAA. On the other hand, data mining over health records is vital for medical, pharmaceutical, and environmental research. For instance, one may wish to study the effect of a certain gene on an adverse reaction to a certain drug. However, due to privacy concerns, the DNA sequences and the medical histories may be stored at different data repositories and cannot be collected together. How to enable hospitals and researchers to run the desired data mining algorithms on such data without even “seeing” the original data has therefore become an interesting but challenging research topic. Privacy-preserving data mining/publishing was proposed to address this problem. This talk will introduce privacy-preserving data mining/publishing, present various common techniques, and discuss possible applications in healthcare and other domains.

Biodata: Dr. Han Shuguo has been with the Data Mining Department of the Institute for Infocomm Research (I2R), A*STAR since his graduation. He received his Bachelor's degree (with honors) and Ph.D. degree from the School of Computer Engineering, Nanyang Technological University, Singapore, in 2005 and 2010 respectively. His research interests include privacy/security issues in data mining/publishing/sharing, machine learning and cryptography technologies. He has published conference and journal papers in venues including ACM SIGKDD, SIAM SDM, IEEE TKDE, PAKDD, and IEEE ICDE. Currently, he is a key member working on privacy/security issues in several industry projects in collaboration with companies (e.g., NBCUniversal). He is a member of the IEEE Computer Society, ACM, and SIAM.

3.       Statistical Tools for Data Mining (by Dr. Feng Mengling)

The objective of data mining is to discover useful and meaningful patterns hidden in the data. Statistical tools (measurements and tests) are often necessary to assess the “usefulness” and “meaningfulness” of patterns. In the literature, the commonly used statistical tools include the t-test, the chi-squared test, Fisher’s exact test, etc. All of these statistical tools were developed, and can be applied, only under certain assumptions and constraints. However, many researchers in the data mining community tend to apply them without carefully examining the validity of the underlying assumptions. This may often lead to false discoveries and invalid conclusions. Therefore, in this talk, we will introduce the commonly used statistical tools for data mining along with their underlying assumptions and constraints. We will also review the common mistakes made when applying these tools. Moreover, alternatives and necessary corrections will be suggested.
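
As an illustration of the kind of assumption check this synopsis refers to (not material taken from the talk itself), the sketch below compares the chi-squared test with Fisher's exact test on a small 2x2 table. The table values and all function names are hypothetical. A common rule of thumb is that the chi-squared approximation is unreliable when expected cell counts fall below about 5, in which case an exact test is one possible alternative.

```python
from math import comb, erfc, sqrt

def expected_counts(table):
    """Expected cell counts of a 2x2 table under independence."""
    (a, b), (c, d) = table
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    return [[r * k / n for k in cols] for r in rows]

def chi2_p_value(table):
    """Pearson chi-squared test, 1 degree of freedom, no continuity correction."""
    exp = expected_counts(table)
    stat = sum((table[i][j] - exp[i][j]) ** 2 / exp[i][j]
               for i in range(2) for j in range(2))
    # Survival function of the chi-squared distribution with 1 d.o.f.
    return erfc(sqrt(stat / 2))

def fisher_exact_p_value(table):
    """Two-sided Fisher's exact test: sum the probabilities of all tables
    (with the same margins) that are no more likely than the observed one."""
    (a, b), (c, d) = table
    r1, r2, c1 = a + b, c + d, a + c
    n = r1 + r2

    def prob(x):
        # Hypergeometric probability of x in the top-left cell.
        return comb(r1, x) * comb(r2, c1 - x) / comb(n, c1)

    p_obs = prob(a)
    lo, hi = max(0, c1 - r2), min(r1, c1)
    return sum(prob(x) for x in range(lo, hi + 1)
               if prob(x) <= p_obs * (1 + 1e-9))

# Hypothetical small sample: every expected count is 2.0, well below 5.
table = [[3, 1], [1, 3]]
min_expected = min(min(row) for row in expected_counts(table))
p_chi2 = chi2_p_value(table)
p_fisher = fisher_exact_p_value(table)
```

On this table the chi-squared approximation gives a p-value of about 0.157, while the exact test gives 34/70, about 0.486; the two tests can disagree substantially when the sample is this small.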

Biodata: Dr. Feng Mengling is currently working in the Data Mining Department of the Institute for Infocomm Research (I2R), A*STAR. He is involved in a wide spectrum of research projects across the fields of bio-imaging, bioinformatics, medical data analysis, continuous time-series analysis, data mining for business strategies and fundamental data mining. Dr. Feng completed his bachelor's degree at Nanyang Technological University, majoring in Electrical & Electronic Engineering. He then found his real interest in knowledge discovery and data mining during his PhD study under the supervision of Prof. Wong Limsoon and Prof. Tan Yap-Peng. Dr. Feng’s current research focus is on biostatistics, data mining for business intelligence and medical time-series analysis.

4.       Data Mining Hands-on Workshop (by Mr. Jason Loh)

This is a three-hour hands-on workshop. Participants will work through a data mining problem in a practical session, using data mining software.

The following is the outline of the workshop.

  • Overview of data mining
      o Basic concept of data mining
      o Applications in various industries (government/banking/telco/commercial)

  • Hands-on workshop – Example: “Who should the bank offer loans to?”
      o Data access
      o Data exploration/visualization
      o Data preparation/partitioning
      o Modeling
          - Decision trees – what? / how? / why?
          - Regression – what? / how? / why?
      o Comparing models
      o Scoring new cases
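
For readers who want a feel for this workflow before the session, the steps in the outline can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the workshop material or the software used in the lab: the applicant data is synthetic, the "ground truth" rule is invented, the decision tree is reduced to a single split (a stump), and every name below is hypothetical.

```python
import math
import random

def make_data(n, rng):
    """Synthetic applicants: (income in $1000s, debt ratio) -> 1 if repaid."""
    rows = []
    for _ in range(n):
        income, debt = rng.uniform(20, 120), rng.uniform(0.0, 1.0)
        # Invented rule plus noise, for illustration only.
        signal = 0.04 * (income - 70) - 3.0 * (debt - 0.5) + rng.gauss(0, 0.5)
        rows.append(((income, debt), 1 if signal > 0 else 0))
    return rows

def partition(rows, train_percent, rng):
    """Data partitioning: shuffle, then split into training and validation."""
    rows = rows[:]
    rng.shuffle(rows)
    cut = len(rows) * train_percent // 100
    return rows[:cut], rows[cut:]

def fit_stump(train):
    """One-level decision tree: the single feature/threshold split
    with the fewest training errors."""
    best = None
    for f in range(2):
        for x, _ in train:
            t = x[f]
            for above in (0, 1):  # label predicted when x[f] > t
                errors = sum((above if xi[f] > t else 1 - above) != y
                             for xi, y in train)
                if best is None or errors < best[0]:
                    best = (errors, f, t, above)
    _, f, t, above = best
    return lambda x: above if x[f] > t else 1 - above

def fit_logistic(train, epochs=200, lr=0.1):
    """Logistic regression by batch gradient descent on standardized features."""
    n = len(train)
    means = [sum(x[f] for x, _ in train) / n for f in range(2)]
    stds = [math.sqrt(sum((x[f] - means[f]) ** 2 for x, _ in train) / n) or 1.0
            for f in range(2)]
    scale = lambda x: [(x[f] - means[f]) / stds[f] for f in range(2)]
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        gw, gb = [0.0, 0.0], 0.0
        for x, y in train:
            z = scale(x)
            p = 1 / (1 + math.exp(-(w[0] * z[0] + w[1] * z[1] + b)))
            gw = [gw[f] + (p - y) * z[f] for f in range(2)]
            gb += p - y
        w = [w[f] - lr * gw[f] / n for f in range(2)]
        b -= lr * gb / n

    def predict(x):
        z = scale(x)
        return 1 if w[0] * z[0] + w[1] * z[1] + b > 0 else 0
    return predict

def accuracy(model, rows):
    return sum(model(x) == y for x, y in rows) / len(rows)

rng = random.Random(0)
train, valid = partition(make_data(200, rng), 70, rng)
models = {"decision stump": fit_stump(train),
          "logistic regression": fit_logistic(train)}
# Comparing models on held-out data, then scoring a new (hypothetical) case.
scores = {name: accuracy(model, valid) for name, model in models.items()}
best_name = max(scores, key=scores.get)
decision = models[best_name]((85.0, 0.3))  # 1 = offer loan, 0 = decline
```

Comparing the models on the validation partition rather than on the training data is the point of the partitioning step: it gives an estimate of how each model would perform on applicants it has not seen.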


Biodata: Mr. Jason Loh has more than 9 years of experience in data mining and analytics solutions from leading vendors SAS and SPSS (IBM). He is currently working with SAS AP as a Product Manager for analytical products with a focus on text analytics, supporting banks, governments, telcos and other organizations across 14 countries in the Asia Pacific. He specializes in solving a wide range of business challenges through analytics, covering the visioning, design, execution and management of a good number of successful projects for organizations in various industries. Having worked in both consulting and user environments, Jason translates enterprise or research data into actionable insights to meet a range of objectives, from CRM, reducing churn, targeting high-propensity up-sell/cross-sell opportunities and sentiment analysis, to promoting voluntary policy compliance and detecting fraud. Jason graduated from Monash University, Australia, with a double degree (Bachelor of Business/Commerce and Bachelor of Computing). During his time at SPSS and SAS, he has delivered numerous workshops, training sessions and talks every year to analyst and management audiences across industries.