Probability and statistics are essential methodological foundations of data science. Societal needs for optimization and security call for a deep understanding of complex systems such as social networks, telecommunication networks, and computer systems. The ever-growing mass of available data favors a probabilistic treatment of the systems under study, including practical and theoretical guarantees on the behavior of estimators, predictors, tests, and any other statistical decision procedure.

This research theme is naturally linked to the “machine learning” theme and shares a large number of applications with it, with particular attention paid to modeling and inference rather than to prediction and optimization.

The research activities of the probability and statistics theme include methodological and theoretical aspects in various areas such as:

- **Stochastic processes**: Markov chains, time series, long-range dependence, point processes, random graphs and hypergraphs.
- **High dimension**: semi-parametric models, sparse regression, infinite-dimensional analysis.
- **Rare events, extreme values**: multivariate and spatial extremes, with applications to anomaly detection and ranking and to the quantification of risks associated with rare events.
- **Uncertainty quantification**: bootstrap, likelihood methods, empirical processes, concentration, reinforcement learning.
- **Information theory**: interactions with estimation problems, optimal transport and entropy inequalities.

**Activities**

- Seminars: twice a month at Télécom, on Thursday afternoons; the Paris statistics seminar (SemStat) at the IHP.
- Working and reading groups: set up at the participants' suggestion.

**Researchers**:

Anne Sabourin, François Portier, Stephan Clémençon, François Roueff, Pascal Bianchi, Roland Badeau, Olivier Fercoq, Umut Simsekli, Laurent Decreusefond, Olivier Rioul, Eric Moulines, Patrice Bertail

**Contact**

Anne Sabourin, François Portier.

## Séminaires

- 09/11/2017 (amphi Rubis):

**2 pm: Romain Azais (INRIA Nancy)**

**Title**: Inference for conditioned Galton-Watson trees from their Harris path

**Abstract**: Tree-structured data naturally appear in various fields, particularly in biology, where plants and blood vessels may be described by trees, but also in computer science, because XML documents form a tree structure. This talk is devoted to the estimation of the relative scale of ordered trees that share the same layout. The theoretical study is carried out for the stochastic model of conditioned Galton-Watson trees. New estimators are introduced and their consistency is established. A comparison is made with an existing approach from the literature. A simulation study shows the good behavior of our procedure on finite samples, including with missing or noisy data. An application to real data, the analysis of revisions of Wikipedia articles, is also considered. This is joint work with Alexandre Genadot and Benoît Henry.
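To fix ideas, the Harris path of a finite ordered tree can be computed in a few lines (a hypothetical illustration, not the speakers' implementation; the nested-list tree encoding and the function name are assumptions): the depth-first contour walk steps up by one when descending an edge and down by one when returning.

```python
def harris_path(tree):
    """Harris path of a rooted ordered tree given as nested lists of
    children, e.g. [[], [[]]] is a root with two children, the second
    of which has one child. The walk has length 2 * (#edges) + 1."""
    path = [0]

    def walk(node):
        for child in node:
            path.append(path[-1] + 1)   # go down one edge
            walk(child)
            path.append(path[-1] - 1)   # come back up
    walk(tree)
    return path

print(harris_path([[], [[]]]))  # [0, 1, 0, 1, 2, 1, 0]
```

The estimators in the talk act on rescaled versions of such walks.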

**3 pm: Aymeric Dieuleveut (Sierra team)**

**Title**: Bridging the Gap between Constant Step Size Stochastic Gradient Descent and Markov Chains

**Abstract**: We consider the minimization of an objective function given access to unbiased estimates of its gradient through stochastic gradient descent (SGD) with constant step-size. While the detailed analysis was only performed for quadratic functions, we provide an explicit asymptotic expansion of the moments of the averaged SGD iterates that outlines the dependence on initial conditions, the effect of noise and the step-size, as well as the lack of convergence in the general (non-quadratic) case. For this analysis, we bring tools from Markov chain theory into the analysis of stochastic gradient and create new ones (similar but different from stochastic MCMC methods). We then show that Richardson-Romberg extrapolation may be used to get closer to the global optimum and we show empirical improvements of the new extrapolation scheme.
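The extrapolation scheme can be illustrated on a toy non-quadratic objective (a hedged sketch, not the authors' code; the objective, step sizes, and iteration counts are arbitrary choices): averaged SGD is run at steps γ and γ/2, and the two averages are combined to cancel the leading step-size bias term.

```python
import math
import random

def averaged_sgd(gamma, n_iter, seed):
    """Constant-step SGD on f(x) = exp(x - 2) - (x - 2), minimized at
    x* = 2, with N(0, 1) noise added to each gradient; returns the
    Polyak-Ruppert average of the iterates."""
    rng = random.Random(seed)
    x, avg = 0.0, 0.0
    for k in range(1, n_iter + 1):
        grad = math.exp(x - 2.0) - 1.0 + rng.gauss(0.0, 1.0)
        x -= gamma * grad
        avg += (x - avg) / k            # running average of the iterates
    return avg

x_g = averaged_sgd(0.1, 200_000, seed=0)    # step gamma
x_h = averaged_sgd(0.05, 200_000, seed=1)   # step gamma / 2
x_rr = 2.0 * x_h - x_g                      # Richardson-Romberg extrapolation
print(x_g, x_h, x_rr)                       # all close to the optimum x* = 2
```

In the non-quadratic case the averaged iterate carries a bias of order γ, which the combination `2 * x_h - x_g` removes at leading order.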

- 28/09/2017:

Rémi Gribonval (INRIA / Panama Team)

Title: Compressive Statistical Learning with Random Feature Moments

Abstract: We describe a general framework, compressive statistical learning, for resource-efficient large-scale learning: the training collection is compressed in one pass into a low-dimensional sketch (a vector of random empirical generalized moments) that captures the information relevant to the considered learning task. A near-minimizer of the risk is computed from the sketch through the solution of a nonlinear least squares problem. We investigate sufficient sketch sizes to control the generalization error of this procedure. The framework is illustrated on compressive clustering, compressive Gaussian mixture modeling with fixed known variance, and compressive PCA.

We provide theoretical guarantees to control the resulting generalization error, with sketch of size driven by an intrinsic measure of complexity of the learning task, which is independent of the volume of the training collection. Volume reductions of several orders of magnitude are demonstrated while preserving the overall learning quality. The framework is illustrated on MNIST digit clustering and large-scale speaker verification.

Joint work with Nicolas Keriven (Université de Rennes 1, France), Yann Traonmilin (Inria – Rennes, France) and Gilles Blanchard (Universität Potsdam, Germany).
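To make the notion of a sketch of random empirical generalized moments concrete, here is a minimal one-dimensional toy (illustrative code, not the authors' implementation; names and sizes are assumptions): the collection is compressed into a few values of the empirical characteristic function at random frequencies.

```python
import cmath
import random

def sketch(data, freqs):
    """Compress a 1-D training collection into len(freqs) complex
    generalized moments: the empirical characteristic function
    evaluated at random frequencies."""
    s = [0j] * len(freqs)
    for x in data:
        for j, w in enumerate(freqs):
            s[j] += cmath.exp(1j * w * x)
    return [v / len(data) for v in s]

rng = random.Random(0)
data = [rng.gauss(0.0, 1.0) for _ in range(10_000)]   # toy collection
freqs = [rng.gauss(0.0, 1.0) for _ in range(20)]      # m = 20 frequencies
s = sketch(data, freqs)
# For N(0, 1) data, E[exp(i w X)] = exp(-w^2 / 2); learning then
# proceeds from the 20 numbers in s, not from the 10,000 data points.
```

The point of the framework is that the sketch size is driven by the task's complexity, not by the number of samples.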

Vincent Duval (INRIA / Mokaplan team)

Title: A gridless method for super-resolution microscopy

Abstract: TIRF (Total Internal Reflection Fluorescence) microscopy is a technique able to image secretion mechanisms (exocytosis) inside thin layers of cells (with typical depth 100 nm to 500 nm). By exciting fluorophores along different illumination angles, its variant Multi-Angle TIRF can retrieve depth information, provided one is able to invert a partial Laplace transform, a very ill-posed problem.

In this talk, I will discuss preliminary works on this topic. We shall see how a “LASSO for measures”, that is, a LASSO with continuous dictionary, is able to solve this problem in simple cases.

- 21/09/2017:

**2 pm:**

Speaker: **Stéphane Robin** (AgroParisTech)

Title: Detecting change-points in the structure of a network: Exact Bayesian inference

Abstract: We consider the problem of change-point detection in multivariate time-series. The multivariate distribution of the observations is supposed to follow a graphical model, whose graph and parameters are affected by abrupt changes throughout time. We demonstrate that it is possible to perform exact Bayesian inference whenever one considers a simple class of undirected graphs called spanning trees as possible structures. We are then able to integrate on the graph and segmentation spaces at the same time by combining classical dynamic programming with algebraic results pertaining to spanning trees. In particular, we show that quantities such as posterior distributions for change-points or posterior edge probabilities over time can efficiently be obtained. We illustrate our results on experimental data arising from biology and neuroscience.

Reference: Schwaller, L., & Robin, S. (2017). Exact Bayesian inference for off-line change-point detection in tree-structured graphical models. Statistics and Computing, 27(5), 1331-1345.

The slides are available here: https://drive.google.com/file/d/0B_eRF-Q5vYc_UkJSTUQ2b0xveEk/view?usp=sharing
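The algebraic result on spanning trees that makes such exact integration tractable is Kirchhoff's Matrix-Tree theorem; as a small self-contained illustration (hypothetical code, not from the paper), the number of spanning trees of a graph is a cofactor of its Laplacian:

```python
def count_spanning_trees(n, edges):
    """Matrix-Tree theorem: the number of spanning trees of a graph on
    n vertices equals any cofactor of its Laplacian matrix."""
    L = [[0.0] * n for _ in range(n)]
    for u, v in edges:
        L[u][u] += 1.0
        L[v][v] += 1.0
        L[u][v] -= 1.0
        L[v][u] -= 1.0
    # Determinant of the Laplacian with the last row and column deleted,
    # computed by Gaussian elimination with partial pivoting.
    m = n - 1
    A = [row[:m] for row in L[:m]]
    det = 1.0
    for c in range(m):
        p = max(range(c, m), key=lambda r: abs(A[r][c]))
        if abs(A[p][c]) < 1e-12:
            return 0                    # disconnected: no spanning tree
        if p != c:
            A[c], A[p] = A[p], A[c]
            det = -det
        det *= A[c][c]
        for r in range(c + 1, m):
            f = A[r][c] / A[c][c]
            for k in range(c, m):
                A[r][k] -= f * A[c][k]
    return round(det)

# The complete graph K4 has 4^(4-2) = 16 spanning trees (Cayley's formula).
k4_edges = [(i, j) for i in range(4) for j in range(i + 1, 4)]
print(count_spanning_trees(4, k4_edges))  # 16
```

A weighted version of the same determinant identity is what allows posterior sums over all spanning-tree structures to be computed in polynomial time.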

- 14/09/2017:

**2 pm**:

Speaker: **Johan Segers** (Université catholique de Louvain)

Title: Accelerating the convergence rate of Monte Carlo integration through ordinary least squares

Abstract: In numerical integration, control variates are commonly used to reduce the variance of the naive Monte Carlo method. The control functions can be viewed as explanatory variables in a linear regression model with the integrand as dependent variable. The control functions have a known mean vector and covariance matrix, and using this information or not yields a number of variations of the method. A specific variation arises when the control functions are centered and the integral is estimated as the intercept via the ordinary least squares estimator in the linear regression model. When the number of control functions is kept fixed, all these variations are asymptotically equivalent, with asymptotic variance equal to the variance of the error variable in the regression model. Nevertheless, the ordinary least squares estimator presents particular advantages: it is the only one that correctly integrates constant functions and the control functions. In addition, if the number of control functions grows to infinity with the number of Monte Carlo replicates, the ordinary least squares estimator converges at a faster rate than the Monte Carlo procedure, the integration error having a Gaussian limit whose variance can be estimated consistently by the residual variance in the regression model. An extensive simulation confirms the superior performance of the ordinary least squares Monte Carlo method for a variety of univariate and multivariate integrands and control functions.

Joint work with François Portier
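A minimal sketch of the ordinary-least-squares variant (illustrative code with assumed helper names, not the speaker's implementation): draw uniform points on [0, 1], regress the integrand on centered control functions, and return the fitted intercept as the integral estimate.

```python
import math
import random

def ols_mc(f, controls, n, seed=0):
    """Monte Carlo integration of f over [0, 1]: regress f(U) on centered
    control functions and return the OLS intercept as the estimate."""
    rng = random.Random(seed)
    u = [rng.random() for _ in range(n)]
    y = [f(t) for t in u]
    X = [[1.0] + [h(t) for h in controls] for t in u]
    p = len(X[0])
    # Normal equations (X'X) beta = X'y, solved by Gaussian elimination
    # (X'X is positive definite, so no pivoting is needed).
    A = [[sum(X[i][r] * X[i][c] for i in range(n)) for c in range(p)]
         for r in range(p)]
    b = [sum(X[i][r] * y[i] for i in range(n)) for r in range(p)]
    for c in range(p):
        for r in range(c + 1, p):
            fac = A[r][c] / A[c][c]
            for k in range(c, p):
                A[r][k] -= fac * A[c][k]
            b[r] -= fac * b[c]
    beta = [0.0] * p
    for r in reversed(range(p)):
        beta[r] = (b[r] - sum(A[r][k] * beta[k]
                              for k in range(r + 1, p))) / A[r][r]
    return beta[0]                      # intercept = integral estimate

# Centered monomial controls u^k - 1/(k+1), each integrating to 0 on [0, 1].
controls = [lambda u, k=k: u ** k - 1.0 / (k + 1) for k in range(1, 5)]
est = ols_mc(math.exp, controls, n=2000)
print(est, math.e - 1)                  # estimate vs exact integral of e^u
```

Because the polynomial controls approximate the integrand well, the residual variance, and hence the error of the intercept, is far smaller than for vanilla Monte Carlo at the same sample size.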

**3 pm**:

Speaker: **Randal Douc** (Télécom SudParis)

Title: Posterior consistency for partially observed Markov models

Abstract: We establish the posterior consistency for a parametrized family of partially observed, fully dominated Markov models. The prior is assumed to assign positive probability to all neighborhoods of the true parameter, for a distance induced by the expected Kullback-Leibler divergence between the family members’ Markov transition densities. This assumption is easily checked in general. In addition, we show that the posterior consistency is implied by the consistency of the maximum likelihood estimator. The result is extended to possibly non-compact parameter spaces and non-stationary observations. Finally, we check our assumptions on a linear Gaussian model and a well-known stochastic volatility model.

Joint work with François Roueff and Jimmy Olsson.

- 13/04/2017:

2 pm: Rémi Bardenet (CNRS and University of Lille)

Title: Monte Carlo with determinantal point processes

Abstract:

In this talk, we show that using repulsive random variables, it is possible to build Monte Carlo methods that converge faster than vanilla Monte Carlo. More precisely, we build estimators of integrals, the variance of which decreases as $N^{-1-1/d}$, where $N$ is the number of integrand evaluations, and $d$ is the ambient dimension. To do so, we propose stochastic numerical quadratures involving determinantal point processes (DPPs) associated to multivariate orthogonal polynomials. The proposed method can be seen as a stochastic version of Gauss’ quadrature, where samples from a determinantal point process replace zeros of orthogonal polynomials. Furthermore, integration with DPPs is close in spirit to randomized quasi-Monte Carlo methods, leveraging repulsive point processes to ensure low discrepancy samples. (joint work with Adrien Hardy (Univ. Lille))

3 pm: Hong-Phuong Dang (University of Lille)

Title: Bayesian nonparametric approaches and dictionary learning for inverse problems in image processing

Abstract:

Dictionary learning for sparse representation has been widely advocated for solving inverse problems. Optimization methods and parametric approaches to dictionary learning have been particularly explored. These methods have some limitations, particularly related to the choice of parameters: in general, the dictionary size is fixed in advance, and the sparsity or noise level may also be needed. In this work, we show how to learn the dictionary and these parameters jointly, with an emphasis on image processing. We propose and study the Indian Buffet Process for Dictionary Learning (IBP-DL) method, a Bayesian nonparametric approach. The proposed model relies on a nonparametric prior, the Indian Buffet Process, which makes it possible to learn a dictionary of adaptive size. The Monte Carlo method for inference is detailed.

Noise and sparsity levels are also inferred, so that in practice no parameter tuning is required. Numerical experiments illustrate the performance of the approach in different settings: image denoising, inpainting and compressed sensing. Results are compared with state-of-the-art methods. Matlab and C sources are available for the sake of reproducibility.

- 06/04/2017:

2 pm: Gwennaëlle Mabon

Title: Aggregation of Laguerre density estimators

Abstract:

We are interested in finding the best linear combination of $K$ estimators of a density in the convolution model on the nonnegative real line. We consider the model $Z = X + Y$ with $X$, of unknown density $f$, independent of $Y$, when both random variables are nonnegative. Mabon (2017) has already provided a new estimation procedure in the convolution model under the assumption that the random variables are nonnegative. This work is based on projection estimators computed in the Laguerre basis. This basis is defined up to a scale parameter, yet this degree of freedom is usually not taken into account and the scale parameter is arbitrarily fixed. We therefore propose a linear aggregation procedure as an alternative to model selection, to take advantage of that unknown structure in the data. Following Rigollet and Tsybakov (2007), we prove a sharp oracle inequality for the mean squared risk of the aggregate under mild conditions. This work relies mainly on the results developed in Mabon (2016).

3 pm: Jérémie Sublime

Title: Collaborative Clustering and its Applications

Abstract:

Unsupervised frameworks involving several clustering algorithms working together to tackle difficult data sets are a recent area of research with a large number of new clustering paradigms such as multi-view clustering, clustering of distributed data, multi-expert clustering or multi-scale clustering analysis. Most of these frameworks can be regrouped under the umbrella of collaborative clustering, the aim of which is to reveal the common underlying structures found by the different algorithms while analyzing the data. The fundamental concept of collaboration is that clustering algorithms operate locally but collaborate by exchanging information about the local structures found by each algorithm.

While many difficulties remain, finely tuned collaborative frameworks already have several promising real applications, such as remote sensing analysis and the multi-view analysis of complex data.

- 23/03/2017:

2 pm: Julie Josse (Polytechnique)

Title: Low-rank interaction log-linear model for contingency table analysis

Abstract:

Log-linear models are very popular tools for contingency table analysis and Poisson regression. They are particularly useful to model row and column effects as well as row-column interaction terms in two-way tables. We introduce a log-linear model with a low-rank interaction which can incorporate side information such as row and column features. The estimator is defined through the minimization of a negative Poisson quasi-log-likelihood penalized by the nuclear norm of the interaction matrix. We present an algorithm based on the alternating direction method of multipliers. To offer users a complete methodology, we suggest an automatic selection of the regularization parameter. A Monte Carlo simulation reveals that our estimator is particularly well suited to estimating the rank of the interaction in low signal-to-noise ratio regimes. An ecological data analysis illustrates that the results can be easily interpreted through biplot visualization.

3 pm: Balamurugan Palaniappan (LTCI)

Title: Stochastic Variance Reduction Methods for Saddle-point Optimization Problems.

Abstract:

Many convex optimization problems arising in machine learning can be represented as a weighted sum of a suitable regularizer term and a loss term. For typical applications like classification and regression, the related primal and dual optimization problems contain terms that may be decomposable over the training examples. This particular property is exploited by the widely popular primal stochastic gradient descent method and dual coordinate ascent method. In spite of their excellent generalization performance, both these methods lead to moderately accurate solutions. The recently popularized stochastic averaged gradient descent (SAG), its accelerated variant (SAGA), and the stochastic variance reduced gradient (SVRG) methods rectify this situation by providing highly accurate solutions. In this talk, I will discuss an extension of the SAG, SAGA and SVRG methods to a particular class of saddle-point optimization problems. I will first motivate the need for this extension using suitable examples. Since the extension does not reside in the favorable convex optimization framework, I will also present the challenges which need to be handled. The talk will conclude with some natural generalizations, a bit of theory and demonstration of the theoretical results using experiments.
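As background, plain SVRG on a scalar least-squares toy problem (a sketch under assumed settings, not the saddle-point extension discussed in the talk) can be written as follows:

```python
import random

def svrg(a, b, step, epochs, inner, seed=0):
    """Minimal SVRG for the scalar least-squares objective
    (1/n) * sum_i 0.5 * (a[i] * x - b[i])^2."""
    rng = random.Random(seed)
    n = len(a)
    x = 0.0
    for _ in range(epochs):
        ref = x                                   # snapshot point
        full = sum(a[i] * (a[i] * ref - b[i]) for i in range(n)) / n
        for _ in range(inner):
            i = rng.randrange(n)
            # Variance-reduced gradient: stochastic gradient at x, minus
            # the same stochastic gradient at the snapshot, plus the full
            # gradient at the snapshot (unbiased, with shrinking variance).
            g = a[i] * (a[i] * x - b[i]) - a[i] * (a[i] * ref - b[i]) + full
            x -= step * g
    return x

a = [1.0, 2.0, 3.0]
b = [2.0, 4.0, 6.0]                               # exact minimizer x* = 2
x_hat = svrg(a, b, step=0.05, epochs=30, inner=50)
print(x_hat)                                      # close to 2
```

Because the correction term vanishes as the snapshot approaches the optimum, SVRG converges linearly where plain SGD with a constant step would stall at a noise floor; the talk discusses how to carry this mechanism over to saddle-point problems.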

- 16/03/2017: Time series, prediction, non-stationarity (2 pm, room C46)

2 pm: François Roueff (LTCI)

An introduction to locally stationary time series

3 pm: Tobias Kley (LSE)

Title: “Predictive, finite-sample model choice for time series under stationarity and non-stationarity” (joint work with Philip Preuß and Piotr Fryzlewicz)

Abstract: In statistical research there usually exists a choice between structurally simpler or more complex models. We argue that, even if a more complex, locally stationary time series model were true, then a simple, stationary time series model may be advantageous to work with under parameter uncertainty. We present a new model choice methodology, where one of two competing approaches is chosen based on its empirical finite-sample performance with respect to prediction. A rigorous, theoretical analysis of the procedure is provided. As an important side result we prove, for possibly diverging model order, that the localised Yule-Walker estimator is strongly, uniformly consistent under local stationarity. An R package, forecastSNSTS, is provided and used to apply the methodology to financial and meteorological data in empirical examples. We further provide an extensive simulation study and discuss when it is preferable to base forecasts on the more volatile time-varying estimates and when it is advantageous to forecast as if the data were from a stationary process, even though they might not be.

The preprint is available here: https://arxiv.org/abs/1611.04460
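As background for the localised Yule-Walker estimator, a plain (global) Yule-Walker fit from sample autocovariances may be sketched as follows (illustrative code, not the paper's implementation; the localised variant applies the same computation over a sliding window):

```python
import random

def yule_walker(x, p):
    """Fit AR(p) coefficients by solving the Yule-Walker equations
    built from the sample autocovariances of the series x."""
    n = len(x)
    mean = sum(x) / n

    def acov(h):
        return sum((x[t] - mean) * (x[t + h] - mean)
                   for t in range(n - h)) / n

    g = [acov(h) for h in range(p + 1)]
    # Solve the p x p Toeplitz system Gamma a = (g[1], ..., g[p]).
    A = [[g[abs(i - j)] for j in range(p)] for i in range(p)]
    b = [g[i + 1] for i in range(p)]
    for c in range(p):                            # Gaussian elimination
        for r in range(c + 1, p):
            f = A[r][c] / A[c][c]
            for k in range(c, p):
                A[r][k] -= f * A[c][k]
            b[r] -= f * b[c]
    a = [0.0] * p
    for r in reversed(range(p)):
        a[r] = (b[r] - sum(A[r][k] * a[k]
                           for k in range(r + 1, p))) / A[r][r]
    return a

rng = random.Random(1)
x = [0.0]
for _ in range(20_000):             # simulate AR(1): X_t = 0.6 X_{t-1} + eps_t
    x.append(0.6 * x[-1] + rng.gauss(0.0, 1.0))
phi = yule_walker(x, 1)
print(phi)                          # close to [0.6]
```

Restricting the sums in `acov` to a window around each time point gives a localised estimate whose uniform consistency under local stationarity is one of the talk's side results.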

- 9/03/2017: Copulas (2 pm, Amphi Grenat)

Speaker: Jean David Fermanian (ENSAE)

Title: Conditional copulas and some tests of the “simplifying assumption”.

Abstract: We discuss non- and semiparametric estimation of conditional copulas when the conditioning variables influence the underlying copula only through the underlying conditional marginal distributions. This assumption is key in many high-dimensional copula models, such as vines. We propose several omnibus tests of this “simplifying assumption”. We introduce new bootstrap schemes to evaluate the limiting behavior of our test statistics, and their performance is assessed by simulation.

Speaker: Olivier Lopez (LSTA, Paris 6)

Title: Copula estimation under censoring and applications in actuarial sciences

Abstract: In this talk, we introduce the use of copula theory for some problems in insurance involving censored observations. Censoring is a classical issue when analyzing duration variables. We propose new parametric copula estimators in frameworks motivated by applications in life and non-life insurance.