NCSC-6461 Data Mining

Contributing Scholar - Joydeep Ghosh, The University of Texas at Austin

 

3 Semester Credit Hours

 

Course Description

 

The information explosion of the past few years has us drowning in data but often starved of knowledge. Many companies that gather huge amounts of electronic data have begun applying data mining techniques to their data warehouses and other data repositories to discover and extract information useful for making smart business decisions. Effective data mining, as opposed to data dredging, requires an understanding of concepts from exploratory data analysis, pattern recognition, machine learning, heterogeneous databases, parallel processing, and data visualization, in addition to knowledge of the problem domain.

 

This course focuses on basic techniques for data mining, including methods useful for analyzing information from the World Wide Web. While studying techniques for database representation/modeling, clustering, classification, finding associations, and sequence processing, emphasis will be placed on algorithm scalability, performance, interpretability, and the ability to handle garbage data. Some demos will be given using WEKA, an open-source Java package. The course involves a midterm exam and a term project; there is no final exam.

 

Prerequisites

 

  • An understanding of basic concepts in probability/statistics and in linear algebra is assumed.
  • Knowledge of databases and of Java is helpful but not necessary.
  • General prerequisite: Students must have the knowledge resulting from completing all coursework in the curriculum for a BS degree in Computer Science from a regionally accredited institution in the United States, or the equivalent from a foreign institution; performance in this coursework should be equivalent to a cumulative undergraduate GPA of 2.9 or better on a 4.0 scale.

     

Course Objectives

  • Understand the process of data mining and its key steps well enough to lead or manage a real-life data mining project.
  • Know the basics of data warehousing and how it facilitates data mining.
  • For each of the major types of data mining procedures (data exploration/pre-processing, clustering, association rules, classification, and predictive modeling), know:
    • the main approaches/algorithms for the procedure;
    • the pros and cons of these approaches and any assumptions underlying their successful application;
    • how to quantitatively evaluate and compare different solutions for the procedure.
  • Understand some fundamental issues in statistical data analysis that cut across all procedures, such as generalization to new data, basic trade-offs, and model validity. This should enable students to judge where data mining can be gainfully applied and to identify situations where it is not appropriate.
  • Get an overview of how web data, including (hyper)text, link information, and web logs, can be mined.
  • Gain exposure to a few cutting-edge data mining techniques through a special topics section.

     

Technical Requirements

Apart from Windows Media Player, which is required to view the lectures, there are no additional software or application requirements for this course. For the standard technical requirements, see: http://www.waldenu.edu/c/Files/DocsGeneral/Getting_Started_Guide.pdf

     

Textbooks

Required:
  • Tan, P., Steinbach, M., and Kumar, V. Introduction to Data Mining. Upper Saddle River, NJ: Addison-Wesley, 2006. ISBN: 0-321-32136-7.
  • Witten, I. and Frank, E. Data Mining: Practical Machine Learning Tools and Techniques, Second Edition. San Francisco, CA: Morgan Kaufmann, 2005. ISBN: 0-12-088407-0.

Supplemental:
  • Hastie, T., Tibshirani, R., and Friedman, J.H. The Elements of Statistical Learning. New York, NY: Springer, 2001. Solid; statistics-oriented.
  • Duda, R., Hart, P., and Stork, D. Pattern Classification, Second Edition. New York, NY: John Wiley & Sons, 2000. Also solid; gives a pattern recognition perspective.
  • Han, J. and Kamber, M. Data Mining: Concepts and Techniques, Second Edition. San Francisco, CA: Morgan Kaufmann, 2006. Database-oriented.
  • Chakrabarti, S. Mining the Web. San Francisco, CA: Morgan Kaufmann, 2003. Focused on Web analytics.

     

Disclaimer: The actual course syllabus may differ slightly from this description; the full course description will be provided in your online course. Textbook information is given here only to describe the course. Do not use it to purchase textbooks; up-to-date information will be provided when you register.





    Walden University is accredited by The Higher Learning Commission and a member of the North Central Association, www.ncahlc.org; 312-263-0456. © Copyright 2007 Walden University; Telephone: 800-925-3368