http://videolectures.net/bark08_ghahramani_samlbb/
Should all Machine Learning be Bayesian? Should all Bayesian models be non-parametric?
author:
Zoubin Ghahramani, Department of Engineering, University of Cambridge
published: Oct. 9, 2008, recorded: September 2008, views: 2789
- Why be Bayesian?
- we want to represent the strength of those beliefs numerically in the mind of any artificial agent
- The Cox Axioms (Desiderata)
- The Dutch Book Theorem (?? don’t really understand the example)
- Asymptotic Certainty
- Asymptotic Consensus
- Bayesian Occam’s Razor and Model Comparison
- Potential advantages
- tries to be coherent and honest about uncertainty
- easy to do model comparison, selection
- rational process for model building and adding domain knowledge
- easy to handle missing and hidden data
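The Bayesian Occam's razor / model comparison point above can be illustrated with a toy coin-flip example (my own sketch, not from the talk): compare a fixed fair-coin model M1 against a more flexible model M2 that puts a uniform prior on the heads probability. The marginal likelihood automatically penalizes M2's extra flexibility when the data look "typical" of the simpler model.

```python
from math import comb

def marginal_lik_fair(h, t):
    # M1: fair coin, no free parameters -> marginal likelihood is just 0.5^n
    return 0.5 ** (h + t)

def marginal_lik_uniform(h, t):
    # M2: p ~ Uniform(0,1); integrating p^h (1-p)^t dp gives the Beta
    # function B(h+1, t+1) = 1 / ((n+1) * C(n, h))
    n = h + t
    return 1.0 / ((n + 1) * comb(n, h))

# Balanced data (5 heads / 5 tails): the simpler model wins.
print(marginal_lik_fair(5, 5), marginal_lik_uniform(5, 5))
# Skewed data (9 heads / 1 tail): the flexible model wins
# despite spreading its prior mass over all values of p.
print(marginal_lik_fair(9, 1), marginal_lik_uniform(9, 1))
```

This is the Occam effect: the complex model pays for its flexibility by spreading probability mass over many possible data sets.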
- Where does the prior come from?
- Objective Priors:
- Non-informative priors that attempt to capture ignorance and have good frequentist properties (not very good or helpful)
- Subjective Priors:
- Priors should capture our beliefs as well as possible. They are subjective but not arbitrary.
- Hierarchical Priors:
- multiple levels of priors:
- (parameters and hyperparameters)
- Empirical Priors:
- Learn some of the parameters of the prior from the data
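A minimal sketch of the empirical-prior idea (my own example, not from the talk): in a Gaussian hierarchy y_i ~ N(theta_i, 1) with theta_i ~ N(0, tau^2), the hyperparameter tau^2 can be learned from the data via the marginal y_i ~ N(0, 1 + tau^2) (type-II maximum likelihood), and the resulting posterior means shrink the observations toward the prior mean.

```python
import random
random.seed(0)

# Simulate the hierarchy: theta_i ~ N(0, tau^2), y_i ~ N(theta_i, 1).
true_tau = 2.0  # prior standard deviation (so tau^2 = 4)
thetas = [random.gauss(0.0, true_tau) for _ in range(500)]
ys = [random.gauss(th, 1.0) for th in thetas]

# Empirical Bayes: marginally Var(y_i) = 1 + tau^2, so estimate
# tau^2 from the sample second moment (type-II maximum likelihood).
tau2_hat = max(0.0, sum(y * y for y in ys) / len(ys) - 1.0)

# Posterior mean of each theta_i shrinks y_i toward the prior mean 0.
shrink = tau2_hat / (1.0 + tau2_hat)
posterior_means = [shrink * y for y in ys]
print(round(tau2_hat, 2))
```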
- Two views of machine learning
- The Black Box View: (general and user don’t need to think too much)
- The Case study view: ( really try to understand the problem)
- Bayesian Black Boxes? (where is the prior??)
- Parametric vs Non-parametric Models
- Parametric models: model-based, have a finite, fixed number of parameters {\theta}
- Non-parametric: allow the number of “parameters” to grow with the data set size (also: memory-based learning) (e.g. kernel density estimation)
- example: Infinite mixture models.
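The kernel density estimation example above can be sketched in a few lines (my own sketch, not from the talk): each data point contributes its own Gaussian kernel, so the effective number of “parameters” grows with the data set size.

```python
from math import exp, pi, sqrt

def kde(xs, x, h=0.5):
    # Gaussian kernel density estimate at x: one kernel per data point,
    # so the model's complexity scales with len(xs).
    return sum(
        exp(-0.5 * ((x - xi) / h) ** 2) / (h * sqrt(2 * pi)) for xi in xs
    ) / len(xs)

data = [-2.1, -1.9, -2.0, 1.8, 2.2, 2.0]  # two clusters
# Density is high near a cluster, low in the gap between them.
print(kde(data, -2.0), kde(data, 0.0))
```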
- Is non-parametric the only way to go?
- When do we really believe our parametric model?
- But, when do we really believe our non-parametric model?
- Is a non-parametric model (e.g. a DPM) really better than a large parametric model (e.g. a mixture of 100 components)?
- The approximate inference Conundrum
- All interesting models are intractable.
- So we use approximate inference (MCMC, VB, EP, etc.).
- Since we often can’t control the effect of using approximate inference, are coherence arguments meaningless?
- Is Subjective Bayesianism pointless?
- Reconciling Bayesian and Frequentist Views
- Frequentist theory tends to focus on sampling properties of estimators, i.e. what would have happened had we observed other data sets from our model. It also looks at the minimax performance of methods, i.e. the worst-case performance if the environment is adversarial. Frequentist methods often optimize some penalized cost function.
- How do we do these integrals?
————————————————————–
* ELI5: frequentist and Bayesian:
http://stats.stackexchange.com/questions/22/bayesian-and-frequentist-reasoning-in-plain-english
Here is how I would explain the basic difference to my grandma:
I have misplaced my phone somewhere in the home. I can use the phone locator on the base of the instrument to locate the phone and when I press the phone locator the phone starts beeping.
Problem: Which area of my home should I search?
Frequentist Reasoning:
I can hear the phone beeping. I also have a mental model which helps me identify the area the sound is coming from. Therefore, upon hearing the beep, I infer the area of my home I must search to locate the phone.
Bayesian Reasoning:
I can hear the phone beeping. Now, apart from a mental model which helps me identify the area the sound is coming from, I also know the locations where I have misplaced the phone in the past. So, I combine my inferences from the beeps with my prior information about where I have misplaced the phone in the past to identify an area I must search to locate the phone.
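A toy numeric version of the grandma example (hypothetical numbers, not from the answer): a prior over rooms from past mislaid-phone experience, a likelihood from how the beep sounds in each room, and Bayes' rule combining the two.

```python
# Hypothetical numbers for illustration only.
prior = {"bedroom": 0.5, "kitchen": 0.3, "bathroom": 0.2}       # past experience
likelihood = {"bedroom": 0.2, "kitchen": 0.6, "bathroom": 0.2}  # beep loudness model

# Bayes' rule: posterior proportional to prior * likelihood.
unnorm = {room: prior[room] * likelihood[room] for room in prior}
z = sum(unnorm.values())
posterior = {room: p / z for room, p in unnorm.items()}
print(posterior)
```

The frequentist search uses only the likelihood (the beep); the Bayesian search reweights it by where the phone usually ends up, which is exactly the difference the analogy describes.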