Summary of Videos from Intro to AI (first half)
There have been 2 main types of language models:
1. Probabilistic model:
- deals with the surface words, and sometimes letters
– (letter models are good when we are going to be dealing with sequences of unique words)
- actual data of what we see, rather than some underlying model
- learned from the data
- the probability of a sequence of words
2. Tree/Abstract structures (logical):
- boolean; a sentence is either in the language or it is not
- a set of sentences; the set defines the language
- based on abstraction (trees, categories–like noun phrase, verb phrase, etc)
- primarily hand-coded (linguistically applied rules)
- the rules are not hard and fast; probabilistic models can be used within logical ones, but that is not the traditional approach
Assumptions and techniques within probabilistic models:
1. Markov Assumption: the effect of one variable on another is local; when looking at the nth word, the relevant words are the ones that occurred recently, not the ones from a long time ago.
2. Stationarity Assumption: the probability distribution of each variable is the same; the distribution over the first word is the same as the distribution over the nth word. If you keep saying sentences, the words in a sentence depend on the surrounding words in that sentence, but not on whether it is the first, second, or nth sentence.
3. Smoothing: if we learn these probabilities from counts (we go out into the world, observe the data, and estimate the probability of each word), a lot of probabilities come out to zero or very small; smoothing is a technique that fixes this.
**All follow the idea of a probabilistic model over sequences
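A minimal sketch of how the three ideas above fit together, not taken from the videos: a bigram model uses the Markov assumption (each word depends only on the previous word), the stationarity assumption (one shared table of counts for every position), and add-k (Laplace) smoothing so unseen word pairs do not get probability zero. The function names, the `<s>` and `</s>` markers, the toy corpus, and the choice of k = 1 are all assumptions made for illustration.

```python
from collections import Counter

START, END = "<s>", "</s>"  # hypothetical sentence boundary markers


def train_bigram_model(sentences):
    """Count contexts and bigrams over whitespace-tokenized sentences."""
    context_counts = Counter()  # how often each token is followed by something
    bigram_counts = Counter()   # how often each (previous, word) pair occurs
    vocabulary = set()          # every token that can appear as a "next word"
    for sentence in sentences:
        tokens = [START] + sentence.lower().split() + [END]
        context_counts.update(tokens[:-1])
        bigram_counts.update(zip(tokens, tokens[1:]))
        vocabulary.update(tokens[1:])
    return context_counts, bigram_counts, vocabulary


def bigram_probability(prev, word, model, k=1.0):
    """P(word | prev) with add-k smoothing; unseen pairs get a small nonzero value."""
    context_counts, bigram_counts, vocabulary = model
    numerator = bigram_counts[(prev, word)] + k
    denominator = context_counts[prev] + k * len(vocabulary)
    return numerator / denominator


def sentence_probability(sentence, model, k=1.0):
    """Multiply the bigram probabilities of each word given the previous one."""
    tokens = [START] + sentence.lower().split() + [END]
    probability = 1.0
    for prev, word in zip(tokens, tokens[1:]):
        probability *= bigram_probability(prev, word, model, k)
    return probability


# Toy corpus, invented for this example.
corpus = ["the dog barks", "the cat sleeps", "the dog sleeps"]
model = train_bigram_model(corpus)
print(sentence_probability("the dog sleeps", model))  # seen word pairs
print(sentence_probability("the cat barks", model))   # contains an unseen pair
```

With smoothing, the second sentence still gets a nonzero probability even though "cat barks" never appears in the toy corpus; it is just lower than the probability of the sentence made of observed pairs.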
Using Language for Learning
Language has a powerful connection to the real world. We can learn a lot from looking at how language is used. From looking at the patterns of when words are used, we can learn a lot about culture. For example, if we track the phrase “ice cream,” we see that it is used most during the summer and less during the winter, which tells us when ice cream is popular.
Language models can also be used for:
- classification (e.g. spam): by scanning through an email and searching for specific words, we can find out if the email is spam or if it is important
- clustering (e.g. news stories)
- input correction (e.g. spelling, segmentation): looking through the sequence of letters to see if they make an English word, and offering suggestions for words spelled similarly to the one the user typed
- sentiment analysis (e.g. product reviews): we can tell whether someone liked a product or not
- information retrieval (e.g. web search): searching the web for websites containing specific words
- question answering (e.g. IBM’s Watson)
- machine translation (e.g. English to Arabic)
- speech recognition (e.g. Apple’s Siri)
- driving a car autonomously
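The input-correction idea (offering similarly spelled words) could be sketched roughly as follows. This is only an illustration under assumptions: it considers words one edit away from the typed word and ranks them by how often they were seen, and the names `edits1`, `suggest`, and the toy `WORD_COUNTS` table are made up for the example rather than taken from the videos.

```python
import string


def edits1(word):
    """Return every string one delete, swap, replace, or insert away from word."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    swaps = [left + right[1] + right[0] + right[2:]
             for left, right in splits if len(right) > 1]
    replaces = [left + c + right[1:] for left, right in splits if right for c in letters]
    inserts = [left + c + right for left, right in splits for c in letters]
    return set(deletes + swaps + replaces + inserts)


def suggest(word, word_counts):
    """Suggest known words for a possibly misspelled word, most frequent first."""
    if word in word_counts:
        return [word]  # already a known word, nothing to correct
    candidates = edits1(word) & set(word_counts)
    return sorted(candidates, key=lambda w: (-word_counts[w], w))


# Hypothetical word counts; a real system would learn these from a large corpus.
WORD_COUNTS = {"spelling": 5, "spewing": 1, "selling": 3, "model": 7}

print(suggest("speling", WORD_COUNTS))  # ['spelling', 'spewing']
print(suggest("modle", WORD_COUNTS))    # ['model']
```

Ranking by count is a stand-in for a unigram probability: among similarly spelled candidates, the more common word is suggested first.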
Unigrams: single words
Bigrams: 2 words; each word is locally consistent with the previous word
Trigrams: 3 words; produce some complete sentences; fairly consistent
N-grams: N words in general; with N = 4, the text makes the most sense
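As a small illustration of the terms above, here is one way to pull word n-grams out of a sentence. The function name `word_ngrams` and the whitespace tokenization are assumptions for this sketch.

```python
def word_ngrams(text, n):
    """Return the list of n-word tuples found by sliding a window over the text."""
    tokens = text.lower().split()
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


sentence = "to be or not to be"
print(word_ngrams(sentence, 1))  # unigrams
print(word_ngrams(sentence, 2))  # bigrams
print(word_ngrams(sentence, 3))  # trigrams
print(word_ngrams(sentence, 4))  # 4-grams
```

A sentence with fewer than n words simply yields an empty list.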
N-grams can also be applied within single words: instead of breaking the sentence down into words, we break the word down into letters. Unigrams, bigrams, and trigrams are all the same concept, except that the units are the letters of a word rather than the words of a sentence.
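The same sliding-window idea at the letter level might look like this. It is a sketch only; `letter_ngrams` and `letter_ngram_counts` are hypothetical helper names, and the word list is invented.

```python
from collections import Counter


def letter_ngrams(word, n):
    """Return the n-letter substrings of a word, in order."""
    return [word[i:i + n] for i in range(len(word) - n + 1)]


def letter_ngram_counts(words, n):
    """Count how often each letter n-gram occurs across a list of words."""
    counts = Counter()
    for word in words:
        counts.update(letter_ngrams(word.lower(), n))
    return counts


print(letter_ngrams("ice", 2))  # ['ic', 'ce']
print(letter_ngram_counts(["cream", "dream"], 3).most_common(2))
```

Counts like these are what a letter-level model would turn into probabilities, in the same way word counts are used for word-level models.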
More next week on this…