# Knowledge Technology

## Basic Concepts

• Data: measurements
• Information: processed data; patterns that hold for the given data
• Knowledge: information interpreted with respect to a user’s context to extend human understanding in a given area
• Concrete tasks: mechanically processing data to an unambiguous solution; limited contribution to human understanding
• Knowledge tasks: the data is unreliable or the outcome is ill-defined; computers mediate between the user and the data, and context is critical; they enhance human understanding

## Document representation

• Structured data: conforms to a schema
• Semi-structured data: conforms in part to a schema
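As a rough illustration of the distinction, the sketch below checks Python dicts against a hypothetical schema (the fields `id`, `title` and `year` are invented for the example, as is the simple "present and correctly typed" criterion). A record with exactly the schema's fields conforms fully; one that shares only some fields, or adds fields the schema does not know about, conforms only in part.

```python
# Hypothetical schema (field name -> expected type), invented for this example.
SCHEMA = {"id": int, "title": str, "year": int}


def conformance(record: dict, schema: dict = SCHEMA) -> str:
    """Return 'full', 'partial' or 'none' depending on how well a record fits the schema."""
    # Fields that are present in the record and have the expected type.
    matching = {key for key, typ in schema.items()
                if key in record and isinstance(record[key], typ)}
    if matching == set(schema) and set(record) == set(schema):
        return "full"      # structured: every field present and typed, nothing extra
    if matching:
        return "partial"   # semi-structured: only part of the record follows the schema
    return "none"          # no field of the record follows the schema


records = [
    {"id": 1, "title": "Intro", "year": 2020},    # all schema fields
    {"id": 2, "title": "Notes", "tags": ["ir"]},  # no year, extra "tags"
]
for r in records:
    print(conformance(r), r)
```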

## String processing

### Regular expression

• `*` : zero or more
• `?` : zero or one
• `+` : one or more

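They are greedy: each quantifier matches as many characters as it can while still letting the overall pattern match.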
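A small check of the greediness claim using Python's standard `re` module, whose quantifier syntax matches the notation above (the sample string is made up):

```python
import re

text = "room 2024b"

# `+` is greedy: [0-9]+ takes the whole run of digits, not just "2".
greedy_plus = re.search(r"[0-9]+", text).group()   # "2024"
# `*` is greedy too: after the literal "2" it swallows every following digit.
greedy_star = re.search(r"2[0-9]*", text).group()  # "2024"
# `?` also prefers to match: "4b?" includes the optional "b" when it is there.
greedy_opt = re.search(r"4b?", text).group()       # "4b"

print(greedy_plus, greedy_star, greedy_opt)
```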

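• `{m,n}` : between m and n repetitions, inclusive
• `[0-9]` = `\d`
• `[a-zA-Z0-9_]` = `\w` (note the underscore)
• `[ \t\r\n\f]` = `\s` (many engines, including Python's `re`, also count the vertical tab `\v`)
• `[^0-9]` = `\D`
• `[^a-zA-Z0-9_]` = `\W`
• `[^ \t\r\n\f]` = `\S`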
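The shorthand classes and the `{m,n}` bound can be compared against their explicit forms in Python (the sample text is invented). The asserts inside the block check that, on ASCII input, each shorthand finds the same matches as the bracketed class.

```python
import re

text = "Call 555-0123 at 9am\tor email bob_99"

digits = re.findall(r"\d+", text)       # runs of digits
words = re.findall(r"\w+", text)        # runs of letters, digits and underscores
spaces = re.findall(r"\s", text)        # five ' ' and one '\t'
non_space = re.findall(r"\S+", text)    # maximal runs without whitespace
codes = re.findall(r"\d{3,4}", text)    # runs of 3 to 4 digits

# On ASCII text each shorthand agrees with its explicit class.
assert digits == re.findall(r"[0-9]+", text)
assert words == re.findall(r"[a-zA-Z0-9_]+", text)
assert non_space == re.findall(r"[^ \t\r\n\f]+", text)

print(digits, words, non_space, codes, sep="\n")
```

Python `str` patterns are Unicode-aware by default, so `\d` and `\w` also match non-ASCII digits and letters; passing the `re.ASCII` flag restricts them to the ASCII classes listed above.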

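Placing part of a pattern in parentheses creates a capturing group: the text matched by that group is stored in a numbered variable.

• `\1`, `\2`, …, `\n` : the text captured by the 1st, 2nd, …, nth group (groups are numbered by the order of their opening parentheses); usable later in the same pattern or in a replacement string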
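A short Python sketch (the HTML-like string and page range are made up). The first pattern uses `\1` inside the pattern to insist that an opening and a closing tag share the same name; the second uses group references in a replacement string. In Python, such patterns are best written as raw strings (`r"..."`) so that `\1` is not read as a string escape.

```python
import re

# Group 1 captures a tag name; \1 forces the closing tag to repeat that same name.
paired_tag = re.compile(r"<([a-z]+)>[^<]*</\1>")

html = "<b>bold</b> <i>broken</b> <em>ok</em>"
# Each pair is (whole match, text captured by group 1); "<i>broken</b>" is rejected.
found = [(m.group(0), m.group(1)) for m in paired_tag.finditer(html)]

# In a replacement string, \1 and \2 insert the text captured by groups 1 and 2.
swapped = re.sub(r"(\d+)-(\d+)", r"\2-\1", "pages 10-20")

print(found)
print(swapped)
```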

## Similarity (of text documents)

### Terminology

• $f_d$: number of terms in document $d$
• $f_{d,t}$: frequency of term $t$ in document $d$ (TF)
• $f_{ave}$: average number of terms contained in a document
• $N$: number of documents
• $f_t$: number of documents that contain term $t$
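These statistics can be computed directly from a tokenised collection. A minimal Python sketch, assuming a made-up three-document corpus that is already split into terms, and assuming $f_d$ counts every term occurrence rather than distinct terms:

```python
from collections import Counter

# Toy corpus (invented): each document is already a list of terms.
docs = {
    "d1": ["apple", "banana", "apple"],
    "d2": ["banana", "cherry"],
    "d3": ["apple", "cherry", "cherry", "date"],
}

# f_{d,t}: frequency of term t in document d
f_dt = {d: Counter(terms) for d, terms in docs.items()}
# f_d: number of terms in d (assumption: every occurrence is counted)
f_d = {d: len(terms) for d, terms in docs.items()}
# N: number of documents
N = len(docs)
# f_ave: average number of terms per document
f_ave = sum(f_d.values()) / N
# f_t: number of documents containing t (iterating a Counter yields each distinct term once)
f_t = Counter(t for counts in f_dt.values() for t in counts)

print(f_d, N, f_ave, dict(f_t))
```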

## Calculate Evaluation Metrics for a Classifier

• In the confusion matrix, the diagonal elements are the counts of correctly classified items.
• Accuracy is a single value for the whole classification: the sum of the diagonal divided by the total number of items.
• Precision and recall are calculated once per class. For a class c, TP is its diagonal element. Summing the items whose actual class is c, excluding the diagonal element, gives FN: these items really belong to c but were classified as something else. Summing the items classified as c, excluding the diagonal element, gives FP: these items were classified as c but actually belong to another class.
• Then apply the formulas: precision $= \frac{TP}{TP + FP}$, recall $= \frac{TP}{TP + FN}$
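A minimal Python sketch of these steps, assuming a made-up 3-class confusion matrix whose rows are the actual classes and whose columns are the classified classes (swap the two sums for the opposite layout). Returning 0 when a denominator is zero is an assumed convention, not part of the notes.

```python
# Hypothetical 3-class confusion matrix: rows = actual class, columns = classified class.
labels = ["cat", "dog", "bird"]
matrix = [
    [5, 2, 0],  # actual cat
    [1, 6, 1],  # actual dog
    [0, 1, 4],  # actual bird
]


def accuracy(m):
    """Sum of the diagonal divided by the total item count (one value overall)."""
    correct = sum(m[i][i] for i in range(len(m)))
    total = sum(sum(row) for row in m)
    return correct / total


def precision_recall(m, c):
    """Precision and recall for class index c (one pair per class)."""
    tp = m[c][c]
    fn = sum(m[c]) - tp                 # actual c, classified as something else
    fp = sum(row[c] for row in m) - tp  # classified as c, actually something else
    # Assumed convention: report 0.0 when a denominator is zero.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


print(f"accuracy = {accuracy(matrix):.3f}")
for i, name in enumerate(labels):
    p, r = precision_recall(matrix, i)
    print(f"{name}: precision = {p:.3f}, recall = {r:.3f}")
```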

## Calculate user-based/item-based recommendations

