Course form:

Computer lab workshops.



Course description:

Participants will learn statistical methods for analyzing unstructured data and how to apply SAS Enterprise Miner and SAS Text Miner in economic research. Coursework covers exploring correlations, dependency patterns, and trends in a variety of text document collections using SAS analytics software.



Course outline:


1.    Data Mining, Text Mining, Web Mining. Introduction to SAS Text Miner.


2.    Preprocessing. Text parsing. Textual data decomposition.


3.    Quantitative representation of a document collection. Frequency weights and term weights.


4.    Dimensionality reduction of the frequency matrix. Roll-up and SVD methods.


5.    Clustering and visualization of text data. Concept linking tree graph.


6.    Classification. Predictive modeling and forecasting.


7.    SAS Text Miner and other SAS Enterprise Miner package tools.
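
To make items 2 and 3 of the outline more concrete, the following minimal sketch builds a weighted term-by-document matrix for a toy collection. It is written in Python rather than SAS, purely for illustration. The sample documents, the whitespace tokenizer, and the specific formulas (a log frequency weight, log2(f + 1), and an inverse-document-frequency term weight, log2(N / df) + 1) are assumptions chosen as common textbook variants; the definitions used by SAS Text Miner may differ.

```python
import math
from collections import Counter

# Toy document collection (hypothetical, for illustration only).
docs = [
    "prices rise as demand grows",
    "demand falls as prices rise",
    "central bank holds rates",
]


def tokenize(text):
    """Very simple parsing step: lowercase and split on whitespace."""
    return text.lower().split()


def term_document_counts(documents):
    """Return one Counter of raw term frequencies per document."""
    return [Counter(tokenize(d)) for d in documents]


def idf_weights(counts):
    """Term weight (assumed variant): log2(N / df) + 1,
    where df is the number of documents containing the term."""
    n_docs = len(counts)
    df = Counter()
    for c in counts:
        df.update(c.keys())  # each document counts once per term
    return {term: math.log2(n_docs / d) + 1 for term, d in df.items()}


def weighted_matrix(counts, term_weights):
    """Build a document-by-term matrix whose entries are
    (log frequency weight) * (term weight)."""
    vocab = sorted(term_weights)
    rows = []
    for c in counts:
        # Counter returns 0 for absent terms, so log2(0 + 1) = 0.
        rows.append([math.log2(c[t] + 1) * term_weights[t] for t in vocab])
    return vocab, rows


counts = term_document_counts(docs)
weights = idf_weights(counts)
vocab, matrix = weighted_matrix(counts, weights)

for doc, row in zip(docs, matrix):
    print(doc, [round(x, 2) for x in row])
```

Terms that occur in fewer documents (here, "bank") receive a larger term weight than terms shared across documents (here, "prices"), which is the intuition behind weighting before dimensionality reduction or clustering.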




Literature:

Berry M. W., Kogan J., Text Mining: Applications and Theory, Wiley

Clark A., Fox C., Lappin S., The Handbook of Computational Linguistics and Natural Language Processing, Wiley-Blackwell

Feldman R., Sanger J., The Text Mining Handbook: Advanced Approaches in Analyzing Unstructured Data, Cambridge University Press

Jurafsky D., Martin J. H., Speech and Language Processing, Pearson Prentice Hall

Liu B., Sentiment Analysis and Opinion Mining, Morgan & Claypool Publishers

Markov Z., Larose D. T., Eksploracja zasobów internetowych. Analiza struktury, zawartości i użytkowania sieci WWW, PWN



End of Course Assessment:

Design, construction, and execution of an advanced model for textual data analysis.