**Thursday November 5, 2015, room C47**

**14h – 15h** : **Jean-Bernard Salomond** (CEREMADE, Univ. Dauphine)

**Title** : Sharp conditions for posterior contraction in the sparse normal means problem **Abstract:** The first Bayesian results for the sparse normal means problem were proven for spike-and-slab priors. However, these priors are less convenient from a computational point of view. In the meantime, a large number of continuous shrinkage priors have been proposed. Many of these shrinkage priors can be written as a scale mixture of normals, which makes them particularly easy to implement. We propose sharp general conditions on the prior on the local variance in scale mixtures of normals such that posterior contraction at the minimax rate is assured. The conditions require tails at least as heavy as Laplace, but not too heavy, and a large amount of mass around zero relative to the tails, more so as the sparsity increases. These conditions give some general guidelines for choosing a shrinkage prior for estimation under a nearly black sparsity assumption. We verify these conditions for the horseshoe-type class of priors, which includes the horseshoe and the normal-exponential-gamma priors, as well as for the horseshoe+, the inverse-Gaussian prior, the normal-gamma prior, and the spike-and-slab Lasso, and thus extend the set of shrinkage priors known to lead to posterior contraction at the minimax estimation rate. (Joint work with Stéphanie van der Pas and Johannes Schmidt-Hieber)
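For reference, the scale-mixture representation mentioned in the abstract can be sketched as follows (a standard formulation with notation of my own, not taken from the talk): each mean parameter receives a conditionally normal prior whose local variance is itself random,

```latex
\theta_i \mid \sigma_i^2 \sim \mathcal{N}(0, \sigma_i^2),
\qquad \sigma_i^2 \sim \pi, \qquad i = 1, \dots, n.
```

For the horseshoe prior, for instance, one takes $\sigma_i = \tau \lambda_i$ with $\lambda_i \sim \mathcal{C}^{+}(0,1)$ a half-Cauchy, whose polynomial tails and unbounded density at zero illustrate the combination of heavy tails and mass near zero described in the abstract.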

**15h – 15h30** : **Moussab Djerrab** (LTCI, Telecom ParisTech)

**Title** : Structured prediction with operator-valued kernels **Abstract:** Operator-valued kernels are gaining momentum in the scientific community. We propose to use this framework to solve prediction problems for structured data. We are currently working on a pipeline for modelling and predicting such data. In this talk, I will present my exploratory work on these topics, with a focus on future developments.

**Thursday October 29, 2015, room C48**

**14h – 15h** : **Albert Thomas** (PhD student, LTCI, Telecom ParisTech)

**Title** : Calibration of One-Class SVM for Minimum Volume set estimation **Abstract:** A general approach for anomaly detection or novelty detection consists in estimating high-density regions or Minimum Volume (MV) sets. The One-Class Support Vector Machine (OCSVM) is a state-of-the-art algorithm for estimating such regions from high-dimensional data. Yet it suffers from practical limitations. When applied to a limited number of samples, it can lead to poor performance even when picking the best hyperparameters. Moreover, the solution of the OCSVM is very sensitive to the choice of hyperparameters, which makes it hard to optimize in an unsupervised setting. After briefly introducing the context of anomaly detection and MV set estimation, I will present a new approach to estimating MV sets with the OCSVM. This approach makes it possible to tune the hyperparameters automatically, and experimental results show that it outperforms the standard OCSVM. (Joint work with Vincent Feuillard and Alexandre Gramfort)
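A minimal sketch of the standard OCSVM baseline discussed above, using scikit-learn (the calibration procedure from the talk is not reproduced here; the data and hyperparameter values are illustrative):

```python
# Estimate a high-density region with a One-Class SVM.
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.RandomState(0)
X = rng.randn(500, 2)  # nominal (anomaly-free) training data

# nu upper-bounds the fraction of training points left outside the region;
# gamma is the RBF kernel bandwidth -- the hyperparameters the talk tunes.
ocsvm = OneClassSVM(kernel="rbf", nu=0.05, gamma=0.5).fit(X)

# +1 = inside the estimated high-density region, -1 = anomaly
labels = ocsvm.predict(X)
inlier_fraction = (labels == 1).mean()
print(f"fraction of training points inside the estimated set: {inlier_fraction:.2f}")
```

As the abstract notes, the resulting region depends strongly on `nu` and `gamma`, which is precisely what makes unsupervised calibration difficult.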

**Title** : Integral approximation by kernel smoothing **Abstract:** We introduce a new stochastic procedure for integral approximation. Given a real-valued function and some points randomly distributed over a compact set of the Euclidean space, the algorithm returns an accurate approximation of the integral of the function over the compact set. The main ingredient of the method is the evaluation of the classical kernel estimator associated with the points. This quantity captures the isolation of the points. In a theoretical part, we give bounds on the rate of convergence and describe the limiting distribution. We then discuss the choice of the bandwidth for the kernel estimator and highlight the good behavior of the procedure through simulations. (Joint work with R. Azais and B. Delyon)
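A toy one-dimensional sketch of the idea, under my own simplifying assumptions (the talk's estimator involves a leave-one-out kernel estimator and a carefully chosen bandwidth, neither of which is reproduced here): weighting each sample by the inverse of a kernel density estimate makes isolated points count more, yielding an integral approximation.

```python
# Integral approximation by inverse-KDE weighting (simplified sketch).
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.RandomState(0)
n = 5000
x = rng.uniform(0.0, 1.0, size=n)   # points scattered over the compact set [0, 1]
f = lambda t: t ** 2                # integrand; the exact integral is 1/3

kde = gaussian_kde(x)               # classical kernel estimator of the point density
estimate = np.mean(f(x) / kde(x))   # inverse-density-weighted average
print(f"estimate = {estimate:.3f}")
```

Boundary bias of the plain kernel estimator degrades the approximation near the edges of the compact set, which is one motivation for the refinements studied in the talk.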

http://postes.smai.emath.fr/2014/AUTRES/PJ/RechercheES.TelecomParistech.pdf



First talk:

Title: Ranking Ordinal Data: Optimality and Pairwise Aggregation

Abstract: In this talk, we describe key insights needed to grasp the nature of K-partite ranking. On the theoretical side, the various characterizations of optimal elements are fully described, as well as the "likelihood ratio monotonicity" condition on the underlying distribution which guarantees that such elements exist. Then, a pairwise aggregation procedure based on the Kendall tau is introduced to relate learning rules dedicated to bipartite ranking to solutions of the K-partite ranking problem. Criteria reflecting ranking performance under these conditions, such as the ROC surface and its natural summary, the volume under the ROC surface, are then considered as targets for empirical optimization. The consistency of pairwise aggregation strategies is studied under these criteria, and they are shown to be efficient under reasonable assumptions. Finally, numerical results illustrate the relevance of the proposed methodology.
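The Kendall tau underlying the pairwise aggregation step measures agreement between two rankings by counting concordant versus discordant pairs. A minimal illustration with SciPy (the rankings below are illustrative, not from the talk):

```python
# Kendall tau between two rankings of five items.
from scipy.stats import kendalltau

r1 = [1, 2, 3, 4, 5]
r2 = [1, 3, 2, 4, 5]   # identical except for one adjacent swap

# One discordant pair out of 10: tau = (9 - 1) / 10 = 0.8
tau, _ = kendalltau(r1, r2)
print(tau)
```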

Second Talk:

Title: An Empirical Comparison of V-fold Penalisation and Cross-Validation for Model Selection in Distribution-Free Regression

Abstract: Model selection is a crucial issue in machine learning, and a wide variety of penalisation methods (with possibly data-dependent complexity penalties) have recently been introduced for this purpose. However, their empirical performance is generally not well documented in the literature. It is the goal of this paper to investigate to what extent such recent techniques can be successfully used for tuning both the regularisation and kernel parameters in support vector regression (SVR) and the complexity measure in regression trees (CART). This task is traditionally solved via V-fold cross-validation (VFCV), which gives efficient results for a reasonable computational cost. A disadvantage of VFCV, however, is that the procedure is known to provide an asymptotically suboptimal risk estimate as the number of examples tends to infinity. Recently, a penalisation procedure called V-fold penalisation has been proposed to improve on VFCV, supported by theoretical arguments. Here we report on an extensive set of experiments comparing V-fold penalisation and VFCV for SVR/CART calibration on several benchmark datasets. We highlight cases in which VFCV and V-fold penalisation respectively provide poor estimates of the risk, and introduce a modified penalisation technique to reduce the estimation error.
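The VFCV baseline for SVR calibration described above can be sketched with scikit-learn (illustrative data and parameter grid; V-fold penalisation itself is not implemented here):

```python
# 5-fold cross-validation for tuning SVR's regularisation (C) and
# RBF kernel parameter (gamma) on a toy regression problem.
import numpy as np
from sklearn.svm import SVR
from sklearn.model_selection import GridSearchCV

rng = np.random.RandomState(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X).ravel() + 0.1 * rng.randn(200)

param_grid = {"C": [0.1, 1.0, 10.0], "gamma": [0.1, 1.0, 10.0]}
search = GridSearchCV(SVR(kernel="rbf"), param_grid, cv=5,
                      scoring="neg_mean_squared_error").fit(X, y)
print("selected parameters:", search.best_params_)
```

The cross-validated risk estimate returned here is exactly the quantity whose asymptotic suboptimality motivates V-fold penalisation.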


Title: Optimal aggregation of affine estimators

Abstract: We consider the problem of combining a (possibly uncountably infinite) set of affine estimators in the non-parametric regression model with heteroscedastic Gaussian noise. Focusing on the exponentially weighted aggregate, we prove a PAC-Bayesian type inequality that leads to sharp oracle inequalities in discrete as well as continuous settings. The framework is general enough to cover combinations of various procedures such as least squares regression, kernel ridge regression, shrinkage estimators, and many other estimators used in the literature on statistical inverse problems. This is joint work with Arnak Dalalyan.
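For reference, the exponentially weighted aggregate studied in this line of work typically takes the following standard form (notation of my own, with $\beta > 0$ a temperature parameter and $\hat{r}_\lambda$ a risk estimate of the affine estimator $\hat{f}_\lambda$):

```latex
\hat{f}_{\mathrm{EWA}} = \int_{\Lambda} \hat{f}_\lambda \,\hat{\theta}(\lambda)\, \pi(\mathrm{d}\lambda),
\qquad
\hat{\theta}(\lambda) \propto \exp\!\bigl(-\hat{r}_\lambda / \beta\bigr),
```

so that estimators with a small estimated risk receive exponentially larger weight under the prior $\pi$ over the family $\Lambda$.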


Title: A GLRT-based framework for the resolvability of two targets in a MIMO context

Abstract: During the last decade, multiple-input multiple-output (MIMO) systems have received increasing interest. One can find several estimation schemes in the literature related to the directions of arrival and/or directions of departure, but their ultimate performance in terms of the statistical resolution limit (SRL) has not been fully investigated. In this presentation we focus on the SRL for resolving two closely spaced sources in clutter interference using a MIMO system with widely separated antennas. Toward this end, we use a hypothesis test formulation based on the generalized likelihood ratio test (GLRT). Furthermore, we investigate the link between the SRL and the minimum signal-to-noise ratio (SNR) required to resolve two closely spaced sources for a given probability of false alarm and a given probability of detection. Finally, theoretical and numerical analyses of the SRL are presented for several scenarios (with/without clutter interference, known/unknown parameters of interest, and known/unknown noise variance).
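For readers unfamiliar with the GLRT, its standard form is (generic notation of my own, not the talk's specific test): given observations $\mathbf{y}$ and hypotheses $\mathcal{H}_0: \theta \in \Theta_0$ (one source) versus $\mathcal{H}_1: \theta \in \Theta_1$ (two resolvable sources),

```latex
\Lambda(\mathbf{y}) =
\frac{\sup_{\theta \in \Theta_1} p(\mathbf{y} \mid \theta)}
     {\sup_{\theta \in \Theta_0} p(\mathbf{y} \mid \theta)},
\qquad \text{decide } \mathcal{H}_1 \text{ if } \ln \Lambda(\mathbf{y}) > \eta,
```

with the threshold $\eta$ set by the target probability of false alarm; the SRL is then the smallest source separation at which the target probability of detection is reached for a given SNR.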


Title: Sparse Prediction, Matrix Completion and First-Order Optimization

Abstract: I will derive a novel regularization method that corresponds to the tightest convex relaxation of sparsity combined with an L2 penalty. I will show that this new method provides a tighter such relaxation than the elastic net and will propose using it as a replacement for the Lasso or the elastic net in sparse prediction problems. In addition, this regularization problem can be solved with an accelerated first-order optimization method and can thus scale to large data sets. In the second part, I will present a new optimization algorithm for minimizing a convex objective which decomposes into three parts: a smooth part, a simple non-smooth Lipschitz part, and a simple non-smooth non-Lipschitz part. Our algorithm combines the methodology of forward-backward splitting, smoothing, and accelerated proximal methods. As a corollary of our convergence results, this algorithm removes the boundedness assumption required by Nesterov's smoothing methods. I will also show empirical results on the learning problems of max-norm regularized matrix completion and clustering, robust PCA, and sparse inverse covariance selection.

Joint work with R. Foygel (University of Chicago), N. Srebro (TTI Chicago) and F. Orabona (TTI Chicago).
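The two baselines the new regularizer is compared against, the Lasso and the elastic net, can be sketched with scikit-learn (illustrative data and penalty strengths; the talk's own penalty is not implemented here):

```python
# Lasso (L1) vs. elastic net (L1 + L2 mix) on a sparse toy problem.
import numpy as np
from sklearn.linear_model import Lasso, ElasticNet

rng = np.random.RandomState(0)
n, d = 100, 20
X = rng.randn(n, d)
w_true = np.zeros(d)
w_true[:3] = [2.0, -1.5, 1.0]           # sparse ground-truth coefficients
y = X @ w_true + 0.1 * rng.randn(n)

lasso = Lasso(alpha=0.1).fit(X, y)                      # pure L1 penalty
enet = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X, y)    # L1 + L2 penalty
print("lasso nonzero coefficients:", np.count_nonzero(lasso.coef_))
print("enet  nonzero coefficients:", np.count_nonzero(enet.coef_))
```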


Transportation and Machine Learning

Transportation distances were first popularized in machine learning and computer vision in the late 90′s under the name of earth mover’s distances. In most applications, transportation distances have been shown – when suitably parameterized – to outperform standard metrics for histograms, such as total variation, Chi-square or Hellinger metrics. We start this talk by recalling this classical body of work and follow by presenting two recent works. (1) We present a metric learning framework for transportation distances and introduce the first algorithm (2011) designed to learn the parameters of transportation distances using exclusively labeled examples. (2) Although transportation distances cannot be embedded in Hilbert spaces, we show that the main mathematical ingredient of transportation distances – the polytope of transportation matrices – can be used to define a positive definite kernel for histograms through its generating function. We conclude this talk by describing open problems and future research directions on this subject.
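A one-dimensional illustration of why transportation distances can outperform bin-wise metrics for histograms (toy histograms of my own): total variation only sees whether bins overlap, while the earth mover's distance grows with how far the mass must be moved.

```python
# Total variation vs. 1-D earth mover's (Wasserstein) distance.
import numpy as np
from scipy.stats import wasserstein_distance

bins = np.arange(5, dtype=float)
h1 = np.array([1.0, 0.0, 0.0, 0.0, 0.0])  # all mass on bin 0
h2 = np.array([0.0, 1.0, 0.0, 0.0, 0.0])  # mass one bin away
h3 = np.array([0.0, 0.0, 0.0, 0.0, 1.0])  # mass four bins away

tv = lambda p, q: 0.5 * np.abs(p - q).sum()
tv12, tv13 = tv(h1, h2), tv(h1, h3)       # both 1.0: TV is blind to distance
d12 = wasserstein_distance(bins, bins, h1, h2)   # 1.0
d13 = wasserstein_distance(bins, bins, h1, h3)   # 4.0
print(tv12, tv13, d12, d13)
```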


Title: Machine Learning: The Bridge Connecting Information, Game, and Learning Theory

Abstract:

In this presentation, strong connections between learning theory, game theory, and information theory are highlighted in the context of multi-agent learning. First, we explore the perspective of reinforcement learning, aiming to model an agent that simultaneously learns (or estimates) its achieved performance and learns its optimal adaptation strategy. Here, optimality is interpreted in light of the agent's aim to optimize its own performance by self-adapting to the environment. In this context, tools from information theory, mainly the notion of maximum entropy, are used to obtain the optimal strategy given the agent's current estimates. In particular, we show how the notion of bounded rationality naturally emerges from this scenario. Following this reasoning, we show that if an equilibrium (stable state) is achieved by such agents, it corresponds to a logit equilibrium, a well-known notion of epsilon-Nash equilibrium in game theory. Finally, we show how the notion of simultaneously learning both performance and adaptation strategy contributes to the solution of long-standing problems in learning theory, such as the existence of cycles in the learning dynamics, noisy observations, and equilibrium selection. Examples using wireless communication scenarios are presented to motivate these ideas and verify our claims.
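The logit (softmax) choice rule behind the logit equilibrium mentioned above can be sketched as follows (a standard formulation; the function name, payoffs, and temperature values below are illustrative, not from the talk). The temperature models bounded rationality: high temperature yields a near-uniform, maximum-entropy strategy, while low temperature approaches the exact best response.

```python
# Logit (softmax) response to estimated payoffs.
import numpy as np

def logit_strategy(payoff_estimates, temperature):
    """Maximum-entropy-regularized response: softmax over estimated payoffs."""
    z = np.asarray(payoff_estimates, dtype=float) / temperature
    z -= z.max()                      # shift for numerical stability
    p = np.exp(z)
    return p / p.sum()

payoffs = [1.0, 0.5, 0.0]
p_hot = logit_strategy(payoffs, temperature=10.0)   # near-uniform play
p_cold = logit_strategy(payoffs, temperature=0.01)  # near best response
print(p_hot, p_cold)
```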

Bio:

Samir M. Perlaza received his B.Sc. degree from Universidad del Cauca, Colombia, in 2005, and his M.Sc. and Ph.D. degrees from École Nationale Supérieure des Télécommunications (Telecom ParisTech), Paris, France, in 2008 and 2011, respectively. During his M.Sc. studies at Institut Eurécom from 2006 to 2008, he was a recipient of the Alban scholarship (European Union Programme of High Level Scholarships for Latin America). During his Ph.D., he was sponsored by France Télécom (Orange Labs, Paris, France), where he also held a position as a research engineer. In 2011, he visited the Alcatel-Lucent Chair in Flexible Radio (Gif-sur-Yvette, France). Currently, he is a post-doctoral fellow in the Department of Electrical Engineering at Princeton University. His research interests lie at the intersection of signal processing, information theory, machine learning, and game theory in wireless communications. He is the recipient of the Crowncom 2009 best student paper award.

The deadline for applying should be around September 15, 2012. A description of the position is available on the TSI department open positions page.