We just finished another exiting SuRI event. In case you want to see what you missed or recall what happened, check out the schedule of our 2010 program below. We thank all participants, and in particular our two keynote speakers, for their memorable talks and hope to see many of you back in Lausanne in the near future.
To watch a video of one of the two keynotes, please click on the corresponding picture below.
Dr. Barbara Liskov (MIT) | Dr. Butler Lampson (Microsoft) |
The property of global injectivity is often essential for the identifiability of parameters in mathematical models, and for showing uniqueness of equilibria for high-dimensional nonlinear dynamical systems. On the other hand, there are no comprehensive computational tools for checking the global injectivity of general nonlinear function, especially if they depend on several variables and contain several unknown parameters. In particular, mathematical models of biochemical reaction networks give rise to dynamical systems that are usually high dimensional, nonlinear, and have many unknown parameters. Nevertheless, we show that it is often possible to use reaction network properties to conclude global injectivity of the associated vector field. Moreover, some of these criteria for global injectivity hold for very wide classes of nonlinear functions, even if these function are not related to any biochemical network. We also explain how these criteria are related to other problems, such as the Jacobian conjecture in algebraic geometry and the Bezier self-intersection problem in computer graphics. Prof. Craciun's homepage
In the last decade fractals from an area of interest restricted for theoreticians and artistically-oriented members of academic communities came to many spectacular applications. This lecture presents an overview of basic geometric concepts of fractal geometry and space-filling-curves. Further a number of examples of very advanced applications in microelectronic circuits will be presented. These applications show enhancement of properties of microelectronic circuits through construction of new geometric structures e.g. in biosensors, fractal electrodes, fractal antenna, solar cells and super-capacitors, etc. Properties of geometric structures exhibiting unusual geometries indicate their applicability in ever growing miniaturization (nano-structures) and frequency of operation (TeraHz range). Fractal theories apart from bringing ideas to non-conventional geometric constructions provide also useful tools for analysis of micro-system operation in the extreme conditions in the nano and tera range when standard macroscopic physical laws and circuit-theoretic approaches (such as eg. the Ohm's law) are not applicable any more. Prof. Ogorzalek's homepage
I will describe recent methods that allows the efficient extraction of information in very large networks. I will first show how a simple generalization of PageRank can be used to compute similarities between graph vertices. I will then describe an algorithm for detecting communities in very large networks and will show the surprising results obtained with this algorithm on a country-wide mobile phone database of more than one billion phone calls. Prof. Blondel's homepage
Presence, broadly defined as an event publish-notification infrastructure for converged applications, has emerged as a key mechanism for collecting and disseminating context attributes for next-generation services in both enterprise and provider domains. Current presence-based solutions and products lack in the ability to a) support flexible user-defined queries over dynamic presence data and b) derive composite presence from multiple provider domains. Accordingly, current uses of context are limited to individual domains/organizations and do not provide a programmable mechanism for rapid creation of context-aware services.
This work presents a large-scale presence virtualization and federation platform that (a) federates across heterogeneous presence domains and thereby enables applications to exploit cross-domain contextual data; (b) provides a programmatic interface for aggregating presence data from various sources and for composing base presence information into abstract, functionally richer entities for enabling applications. An underlying design consideration is to leverage capabilities of protocols that are being widely deployed today.
This is a joint research project between IBM T.J. Watson Research Center and IBM Research India. The platform has undergone customer trials.
Time permitting, In the later part of the talk, I plan to cover our work in the related area of real-time social networking, and specifically talk about R-U-In? (pronounced Are you in?) - an activity-oriented social networking prototype we have developed. Dr. Chakraborty's homepage
Interruptions are ubiquitous in everyday life. This talk will first review research on how often they occur, what effects they have and what types exist. Then, solutions for managing interruptions in human-computer interaction will be discussed before presenting the author’s own work on how people manage interruptions of collaborative tasks through conversation. The talk will end with a discussion of the challenges for modelling and managing constraints of parallel collaborative activities. Prof. Bangerter's homepage
Abstraction is at the center of much work in Computer Science. It encompasses finding the right interface for a system as well as finding an effective design for a system implementation. Furthermore, abstraction is the basis for program construction, allowing programs to be built in a modular fashion. This talk will discuss how the abstraction mechanisms we use today came to be, how they are supported in programming languages, and some possible areas for future research. Prof. Liskov's homepage
For over five years, we have been investigating the reliability of the software and hardware systems that store data. In almost every case, we have found serious design flaws, implementation problems, and software bugs, many of which can lead to system crashes, data loss, or silent data corruption.
In this talk, I will present an overview of our research, highlighting the aforementioned problems. I will discuss why such problems seem to repeatedly occur and thus are fundamental to the current way in which we build software. I will then describe some techniques we have eveloped to build more reliable storage systems in the future.
A sub-theme of my talk will focus on an issue of broader interest to younger researchers: where our ideas came from. Thus, in presenting the work, I will not only discuss the core technical material but also the thinking, reasoning, chance, and dumb luck that led to some of our ideas. Prof. Arpaci-Dusseau's homepage
The Cloud abstracts away infrastructural complexity for the benefit of tenants. But to tenants' detriment, it can also abstract away vital security information. In this talk, I discuss several protocols for remote testing of cloud storage integrity and robustness. Executing these protocols without detailed infrastructural knowledge or trust in cloud providers, clients or auditors can: (1) Verify the integrity of full files without downloading them; (2) Distribute files across cloud providers and ensure strong robustness with periodic, inexpensive checks (in a cloud analog to RAID); and (3) Test whether files are resilient to drive crashes. Joint work with Kevin Bowers, Marten van Dijk, Alina Oprea, and Ron Rivest. Dr Juels' homepage
As use of location-based services (LBSs) is becoming increasingly prevalent, mobile users are more and more enticed to reveal their location, which may be exploited by attackers to infer the points of interest (POIs) the users visit, then compromise their privacy. To protect a user's location privacy, we have been developing a new approach based on unobservability, preventing the attackers from associating any particular POI to the user's location. Specifically, we designed, implemented, and evaluated a privacy-protection system, called the Location Information ScrAmbler (LISA), that adjusts the location noise level in order to remove or significantly weaken the distinguishability of POIs the user may visit. By protecting location privacy locally on each mobile user's device, LISA eliminates the reliance on trusted third-party servers required by most previous approaches, avoiding the vulnerability of a single point of failure and facilitating the deployment of LBSs. Moreover, since energy-efficiency is the most critical requirement for battery-powered mobile devices, LISA explores the trade-off between location noise/privacy and energy consumption to achieve both privacy-protection and energy-efficiency. This is joint work with two of my graduate students, Jerry Chen and Xin Hu. Prof. Shin's homepage
The distinction between lazy and eager (or strict) evaluation has been studied in programming languages since Algol 60’s call by name, as a way to avoid unnecessary work and to deal gracefully with infinite structures such as streams. It is deeply integrated in some languages, notably Haskell, and can be simulated in many languages by wrapping a lazy expression in a lambda.
Less well studied is the role of laziness, and its opposite, speculation, in computer systems, both hardware and software. A wide range of techniques can be understood as applications of these two ideas.
Laziness is the idea behind:
* Redo logging for maintaining persistent state and replicated state machines: the log represents the current state, but it is evaluated only after a failure or to bring a replica online * Copy-on-write schemes * Clipping regions and expose events in graphics and window systems * “Infinity” and “Not a number” results of floating point operations * Futures (in programming) and out of order execution (in CPUs), which launch a computation but are lazy about consuming the result * “Formatting operators” in text editors, which apply properties such as “italic” to large regions of text by attaching a sequence of functions that compute the properties; the functions are not evaluated until the text needs to be displayed
Speculation is the idea behind:
* Optimistic concurrency control in databases, and more recently in transactional memory * Prefetching in memory and file systems * Branch prediction, and speculative execution in general in modern CPUs * Exponential backoff schemes for scheduling a resource, most notably in LANs such as WiFi or classical Ethernet * All forms of caching, which speculate that it’s worth filling up some memory with data in the hope that it will be used again
In both cases it is usual to insist that the laziness or speculation is strictly a matter of scheduling that doesn’t affect the result of a computation but only improves the performance. Sometimes, however, the spec is weakened, for example in eventual consistency. I will discuss many of these examples in detail and examine what they have in common, how they differ, and what factors govern the effectiveness of laziness and speculation in computer systems. Dr. Lampson's homepage
Multivariate Cryptography has been proposed as an alternative to classical public-key schemes such as ElGamal and RSA. One of the most appealing feature these schemes is that the difficult problem on which they are based is difficult even for quantum computers. However, differential cryptanalysis has been used with success on some of these schemes. In this talk, we will discuss some of these attacks and new algorithms for solving the key recovery problem. Prof. Fouque's homepage
Recent advances in geometry processing have resulted in a wide range of acquisition and modeling tools. Easy accessibility of such tools have lead to large online repositories for 3D polygonal models of shapes, digitally acquired from physical objects or modeled from scratch. However, such polygonal representations do not make specific the characteristics and invariants of the underlying shapes. In this talk, I will describe shape analysis techniques, specially for detecting object symmetry, i.e., invariance under the action of certain transformations. Such self-similarity is often related to form, function, utility and aesthetics. As we enter the age of easily accessible 3D geometry, analyzing their global properties and invariants becomes important. I will describe a simple and powerful algorithm for detecting partial and approximate symmetries in 3D shapes, and its generalization to detect regularity among the extracted symmetry transformations. In the later part of the talk, I will describe how such high-level shape invariants, extracted in the analysis phase, can be readily used in a range of applications including shape completion, smart geometry editing, motion visualization, shape abstraction, which are otherwise challenging to perform. Prof. Mitra's homepage
File and storage systems contain design flaws, implementation problems, and software bugs that can lead to system crashes, data loss, and silent data corruption. In this talk, I describe two specific contributions of our group at Wisconsin in designing and building more reliable file and storage systems.
First, I describe SQCK, a new file system checker. File system checkers are necessary in order to fix problems that may occur in file system images. However, existing checkers (such as e2fsck) are overly complex and fail in significant ways. The key contribution of SQCK is that it is based on a declarative query language, which is a natural match for the cross-checking that must be performed across the many structures of a file system image. Thus, SQCK is able to perform more useful checks and repairs than existing checkers with surprisingly elegant and compact queries.
Second, I describe the I/O Shepherd. The main contribution of the I/O shepherd is to make the reliability policy of a file system a first-class concern. With the I/O shepherd, the reliability policy of the file system (e.g., retry, parity, mirrors, checksums, and/or sanity checks) can be cleanly specified and encapsulated. We again show that with the right abstraction, even complex policies can be specified in relatively few lines of code.
Database systems rely heavily on indexes in order to achieve good performance. Selecting the appropriate indexes is a difficult optimization problem, and modern database systems are equipped with automated methods that recommend indexes based on some type of workload analysis. Unfortunately, current methods either require advanced knowledge of the database workload, or force the administrator to relinquish control of which indices are created. This talk will summarize our recent work in semi-automatic index tuning, a novel index recommendation technique that addresses the shortcomings of previous methods. Semi-automatic tuning leverages techniques from online optimization, which allows us to prove strong bounds on the quality of its recommendations. The experimental results show that semi-automatic tuning outperforms previous methods by a large margin, offering index recommendations that achieve close to optimal savings in workload evaluation time. Prof. Polyzotis' homepage
2010 heralds the 30th anniversary of the OECD data privacy guidelines, the US Fair Information Practices (FIP) and the 15th anniversary of the EC data protection Directive 95/46/EC underlying today’s privacy practices. These fundamental documents pre-date the web and the internet. Location Based Services, Social Networking, Online Shopping are introducing new privacy concerns for potential improper use of consumer data. Surveys have shown that every group of internet users has privacy concerns. Consumer advocacy and regulatory stakeholders are anticipating the advent of privacy enabling technology (PET) innovations that will facilitate consumers’ retaking control of their personal data. The talk will present some of the leading hot topics and industry trends prioritising these PET developments.
Petabytes of data about human movements, transactions, and communication patterns are continuously being generated by everyday technologies such as mobile phones and credit cards. This unprecedented volume of information facilitates a novel set of research questions applicable to a wide range of development issues. In collaboration with the mobile phone, internet, and credit card industries, my colleagues and I are aggregating and analyzing behavioral data from over 250 million people from North and South America, Europe, Asia and Africa. I will discuss a selection of projects arising from these collaborations that involve inferring behavioral dynamics on a broad spectrum of scales; from risky behavior in a group of MIT freshman to population-level behavioral signatures, including cholera outbreaks in Rwanda and wealth in the UK. Access to the movement patterns of the majority of mobile phones in East Africa also facilitates realistic models of disease transmission as well as slum formations. This vast volume of data requires new analytical tools - we are developing a range of large-scale network analysis and machine learning algorithms that we hope will provide deeper insight into human behavior. However, ultimately our goal is to determine how we can use these insights to actively improve the lives of the billions of people who generate this data and the societies in which they live. Prof. Eagle's homepage
We present a group-theoretic approach to producing fast algorithms for matrix multiplication. In this framework, one devises algorithms by constructing non-abelian groups with certain properties. The algorithms themselves are natural and make use of the discrete Fourier transform over these groups.
We construct several families of groups that achieve matrix multiplication exponent significantly less than 3 (but not better than the current best bound, 2.376...). This leads to two appealing conjectures, one combinatorial and the other algebraic. Either one would imply that the exponent of matrix multiplication is 2.
This is joint work with Henry Cohn, Bobby Kleinberg, and Balazs Szegedy. Prof. Umans' homepage
Recent years, have witnessed an explosive demand for cellular wireless data -- a trend that is not abating. Cisco, for example, estimates that wireless data may grow by as much as 38 fold by 2013. Meeting this demand in a cost effective manner presents a major challenge for cellular operators and wireless engineers.
One promising new technology for deploying radically cheaper networks is femtocells, which are small (typically < 1W) base stations installed in the customer's premises. With their smaller form factor, self-configuration capabilities, and ability to leverage the customer's backhaul, femtocells can be deployed at a fraction of the cost of traditional macrocellular networks.
This talk will discuss a number of the technical challenges concerning femtocells, with a focus on problems of interference coordination. Devices in mixed femto / macro deployments often experience strong interference conditions not well-handled by traditional cellular power control. We offer alternative interference coordination methods based on subband scheduling -- a new technique available in 4G cellular systems such as LTE.
We will also see that the mathematics of interference coordination can be seen as a type of distributed optimization with linear mixing. Time permitting, I will discuss connections to graphical models and belief propagation and applications to other problems including neural mapping and compressed sensing. Prof. Rangan's homepage
I will talk about the use of multiresolution techniques and frames in the area of biomedical imaging. Experiments on diverse data sets showed that multiresolution techniques and frame in particular have great promise in classification of biomedical images. This prompted us to to pursue two lines of work: trying to explain in a mathematically rigorous way why frame perform so well in practice, as well as design new families of frames tailored to such applications. Prof. Kovacevic's homepage
Parallel channels appear in wide range of communication system applications, in addition to optics and electromagnetic propagation. Two main issues arising when dealing with parallel channels is, first, how to identify the number of available parallel channels, and second, how to exploit the existence of such channels in an optimum way. The first question involves finding the so-called Degrees of Freedom of the channel. This number, for example, corresponds to the number of sub-carriers in an OFDM system, the rank of channel matrix for a MIMO system, the number of independent time slots in a fast fading scenario, and the number of independent routes between source and destination in a communication network. After identifying the number of parallel channels, the second question has been traditionally addressed by the Waterfilling algorithm which optimizes the transmission rate over different sub-channels for a given total transmission power. However, the solution is not trivial when other practical factors such as delay come into consideration. Another important question also arises when diversity and multiplexing trade-offs are taken into account. Extending such ideas to multiuser scenarios has also been addressed through schemes such as Interference Alignment. In this talk, after revisiting the main concepts behind parallel channels, we will discuss new ideas related to this topic and finally, present some open problems in this field. Prof. Khalaj's homepage
This talk investigates the behavior and the control of discrete dynamic systems made of distributed objects evolving in a common environment. We build an information hierarchy, from blind to clairvoyant information schemes on one hand, and from distributed local information to central, total information on the other. This hierarchy has a counterpart for the optimization procedures, from static to adaptative policies on one side and from distributed selfish policies to social optimal policies on the other.
We show that when the number of objects becomes large, the optimal cost of the system converges to the optimal cost of a mean field limit system that is deterministic. Convergence also holds for optimal policies. This implies that the value of information disappears when the number of objects goes to infinity so that the hierarchy constructed above collapses.
This framework is illustrated by a brokering problem in grid computing. We provide several comparisons in the blind versus clairvoyant cases as well as bounds on the price of anarchy (resulting from a lack of central information) and show how this vanishes when the size of the system grows.
Several simulations with growing numbers of processors will also be reported. They compare the performance of the optimal policy of the limit system used in the finite case, with classical policies by measuring its asymptotic gain. Prof. Gaujal's homepage
The field of anonymous communications has received considerable interest in the past 10 years, with systems such as Mixminion and Tor fielded and used. At the same time traffic analysis of anonymity systems, the science of extracting information about who is talking to whom, has made significant progress. In this talk we present our latest result on how to cast most current traffic analysis attacks in the framework of Bayesian inference, and how we apply tools from probabilistic modelling and sampling to evaluate anonymity. We will be reformulating long term de-anonymization attacks, classic mix systems as well as pass-the-parcel Crowds in terms of this paradigm, and look at the road ahead when it comes to analysing anonymity. Dr. Danezis' homepage
Technology projections indicate the possibility of 50 billion transistors on a chip in a few years, with the processor landscape being dominated by multi-core designs. Developing correct and reliable software that takes advantage of the multiple cores to deliver high performance, while at the same time ensuring performance isolation, remains a growing challenge.
In this talk, I will begin by describing our changes to existing coherence mechanisms in order to control data sharing --- in particular, to monitor memory, isolate data modifications, support fine-grain protection, and improve the efficiency of fine-grain sharing.
I will discuss the utility of the mechanisms for a range of applications and show how monitoring and isolation can be combined to support one parallel programming construct --- transactional memory.
The prevalence of multi-core processors also implies more ubiquitous resource sharing at several levels. Controlled resource sharing is essential in order to provide performance isolation, dictating the need for resource-aware policies within the operating system. As time permits, I will also describe our efforts in combining resource utilization prediction, resource control mechanisms, and resource-aware policies in order to effect performance isolation at the operating system level and improve multi-core resource utilization. Prof. Dwarkadas' homepage
In the last few years, we have been witnessing an evergrowing need for continuous observation and monitoring applications. This need is driven by recent technological advances that have made streaming applications possible, and by the fact that analysts in various domains have realized the value that such applications can provide. In this presentation, we describe a general framework for enabling complex applications over data streams. This framework is based on efficiently computing an approximation of multi-dimensional distributions of streaming data, and is amenable to distributed operation, thus, making better use of the available resources. We demonstrate the use of the proposed framework for the diverse problems of deviation detection, and detection and tracking of homogeneous regions. Finally, we briefly discuss future research directions in this general area. Prof. Palpanas' homepage
The goal of deriving good upper bounds for the capacities of real networks under general demands has, for decades, remained a seemingly unattainable objective. This talk explores a new road map for building such computational tools and for assessing the accuracy of the resulting bounds. Prof. Effros' homepage
High-Speed Data-Oriented donwlink has been included in current 3G wireless standards in various similar forms (e.g., 1xEv-Do for CDMA2000 and HSDPA for 3GPP-WCDMA). These schemes make use of opportunistic scheduling that take advantage of instantaneous channel state information in order to allocate the user to the time-frequency slots with the most favorable channel conditions, subject to some ``fairness'' criterion. For delay-tolerant data traffic, opportunistic scheduling provides a significant throughput improvement with respect to conventional CDMA or TDMA/FDMA with fixed round-robin channel assignment. In the pursuit of even higher data rates, the next generation of wireless systems will make widespread use of multiuser multiple antennas (MU-MIMO) techniques. In this talk, we review the principles of MU-MIMO dowlink schemes (MIMO Gaussian broadcast channel), we discuss the challenges related to providing reliable channel state information at the transmitter(s), and to the presence of unpredictable inter-cell interference due to independent uncoordinated operations in neighboring cells, and provide a comprehensive and general framework to solve the opportunistic downlink scheduling problem that takes into account these non-ideal conditions. The proposed approach builds on a general stochastic network optimization framework based on Liapunov queue stability and Liapunov drift. Effects such as non-perfect channel state information and unknown instantaneous interference power levels from adjacent cells are included, and computationally efficient near-optimal implementation are discussed. Finally, we will present an analysis method that combines results from large random matrices and convex optimization, and is able to characterize semi-analytically the system performance under opportunistic scheduling and arbitrary fairness criteria. Prof. Caire's homepage
Computer hardware and software can be modeled precisely in mathematical logic. If expressed appropriately, these models can be executable. An ``appropriate'' logic is an axiomatically formalized functional programming language. This allows models to be used as simulation engines or rapid prototypes. But because they are formal they can be manipulated by symbolic means: theorems can be proved about them, directly, with mechanical theorem provers. But how practical is this vision of machines reasoning about machines? In this highly personal talk, I will describe the 40 year history of the ``Boyer-Moore Project'' and discuss progress toward making formal verification practical. Among other examples I will describe important theorems about commercial microprocessor designs, including parts of processors by AMD, Motorola, IBM, Rockwell-Collins and others. Some of these microprocessor models execute at 90% the speed of C models and have had important functional properties verified. In addition, I will describe a model of the Java Virtual Machine, including class loading and bytecode verification and the proofs of theorems about JVM methods. Prof. J Strother Moore's homepage
Usual techniques in abstract interpretation apply "join" operations when control flows from several nodes. Unfortunately, these techniques introduce imprecision, resulting in not being able to prove desired properties. Modern SMT-solving techniques allow enumerating paths "on demand". We shall see how such path techniques may be combined with conventional polyhedral analysis, quantifier elimination, or with policy iteration, in order to obtain more precise invariants. This is joint work with Laure Gonnord (Université Lille I / LIFL) and Thomas Gawlitza (then CNRS / VERIMAG, now at INRIA). Prof. Monniaux's homepage
The Internet economy relies on the collection and aggregation of personal data on an ever-increasing scale. Information about our tastes, purchases, searches, browsing history, social relationships, health history, genetics, and so forth is shared with advertisers, marketers, and researchers, who use it to predict individual behavior and provide personalized product offerings, recommendations, and even clinical treatments. I will survey privacy issues caused by massive aggregation of personal information. After demonstrating that the existing methods for "anonymizing" the data fail to provide meaningful privacy protection, I will describe new approaches to privacy-preserving computation. This includes Airavat, a new system for large-scale data analysis which integrates mandatory access control and differential privacy. Prof. Shmatikov's homepage
We examine spectrum sensing in a situation of uncertain channel modeling. In particular, we assume that, besides additive noise, the observed signal contains an interference term whose probability distribution is unknown, and only its range and maximum power are known. We discuss the evaluation of the detector performance and its design in this situation. Although this talk specifically deals with the design of spectrum sensors, its scope is wider, as the applicability of its results extends to a general class of problems that may arise in the design of receivers whenever there is uncertainty about how to model the environment in which one is expected to operate. The theory expounded here allows one to determine the performance of a receiver, by combining the available (objective) probabilistic information with (subjective) information describing the designer's attitude. Prof. Biglieri's homepage
We describe work on modeling of industrial laboratories and the use of the models for optimization and control of management processes. The industrial laboratory is viewed as being analogous to a knowledge factory, where research is done at the first stage and is followed by development stages, which terminate with hand-off to the market or other organizations. Uncertainty is pervasive in this framework. There is randomness in the processing time at each stage. Also, stochastic processes govern the evolution of project values through the stages. Management is two-fold, project and investment. Option value is the methodology for project management. It relies on infrastructure support that enables information on project values to be collected after each stage. Such information allows calculations to be made of the option value, which is the expected terminal value of a project that takes into account future decisions. This allows mid-course decisions to be made on whether to allow projects to continue or be terminated, which occurs if its option value drops to zero. Investment optimization concerns allocation of given budgets to the resources in the various stages. Yet another important management mechanism that is considered is controlled access to resources and servers in the development stages. Projects with higher option values are given priority in accessing resources. For the special case of two stages, research and development, we consider a combined model of investment optimization in the presence of optimal project and resource management. For this case results from an asymptotic analysis provide insights on optimal management decisions. Finally, time permitting, we will descrbe preliminary results of an endogeneous growth model in a slow time scale. Throughout the talk we give numerical examples to illustrate concepts. Prof. Mitra's homepage
Mobile sensing applied to psychology research: Challenges and Opportunities The talk will describe how sensing has extended to mobile computing through the use of mobile phones to monitor everyday activities of users. I will describe how various sensors on the phone can be used and be combined to monitor activity and the various projects which have started to use the approach. In particular i will describe our work EmotionSense which allows the sensing of user behaviour and mood to help social psychology research. I will further illustrate how this could be integrated into mobile social network applications. Prof. Mascolo's homepage
The work establishes fundamental limits with respect to rate, reliability and computational complexity, for encoding-decoding in a general setting of outage-limited MIMO communications. In the high-SNR regime, the limits are optimized over all encoders, all decoders, and all complexity regulating policies. The work then proceeds to explicitly identify encoder-decoder designs and policies, that meet this optimal tradeoff. In practice, the limits aim to meaningfully quantify different pertinent measures, such as the optimal rate-reliability capabilities per unit complexity and power, the optimal diversity gains per complexity costs, or the optimal number of numerical operations (i.e., flops) per bit. Finally the tradeoff's simple nature, renders it useful for insightful comparison of the rate-reliability-complexity capabilities for different encoders-decoders. In the same setting we identify the first ever lattice decoding solution that achieves both a vanishing gap to the error-performance of the exact solution of lattice decoding, as well as a computational complexity that is subexponential in the rate and in the problem dimensionality. This complexity behavior is rigorously demonstrated here for the first time, where it is also proven that for most common codes, the complexity cost for asymptotically optimal regularized lattice (sphere) decoding is exponential in the rate, and in many cases it in fact matches the complexity cost of ML (bounded) sphere decoding. In light of the fact that, prior to this work, a vanishing gap was generally attributed only to full lattice searches of exponential complexity, whereas subexponential complexity was generally attributed to early-terminated (linear) solutions having a performance gap that can be up to exponential in dimension and/or rate, the work constitutes the first proof that subexponential complexity need not come at the cost of exponential reductions in lattice decoding error performance. Finally, the performance and complexity guarantees in this work hold for the most general MIMO setting, for all reasonable fading statistics, all channel dimensions, and all lattice codes. Prof. Elia's homepage
Schema matching is the task of providing correspondences between concepts describing the meaning of data in various heterogeneous, distributed data sources. Schema matching is one of the basic operations required by the process of data and schema integration, and thus has a great effect on its outcomes, whether these involve targeted content delivery, view integration, database integration, query rewriting over heterogeneous sources, duplicate data elimination, or automatic streamlining of workflow activities that involve heterogeneous data sources. Although schema matching research has been ongoing for over 25 years, only recently a realization has emerged that schema matchers are inherently uncertain. Since 2003, work on the uncertainty in schema matching has picked up, along with research on uncertainty in other areas of data management. This lecture presents the benefits of modelling schema matching as an uncertain process and shows a single unified framework for it. We also briefly cover two common methods that have been proposed to deal with uncertainty in schema matching, namely ensembles and top-K matchings. The talk is based on a recent manuscript, part of the "Synthesized Lectures on Data Management" by Morgan & Claypool. Prof. Gal's homepage
Data uncertainty is inherent in emerging applications such as pervasive computing, sensor networks, biological databases, and data integration. In this seminar, we will study some interesting issues that we have encountered during the development of the ORION, a system prototype developed to handle the storage and retrieval of a large amount of uncertain information. We will cover topics like uncertain data modeling, probabilistic query classification, query quality, and uncertain data indexing. Prof. Cheng's homepage
Logic is the study of the rules of valid reasoning. Its direction of development has always been inspired by its intended areas of application. When first codified by Aristotle it was applied to reasoning in philosophy which then included biology and other branches of science. In classical times it was applied by Euclid to land measurement (geometry); in medieval Europe it was applied by William of Occam and others to theology. Leibnitz hoped that logic could transform the art of mathematical proof to calculation as in the infinitesimal calculus. Logic was developed further in modern times to secure the foundations of mathematics protecting against inconsistency of its axioms and checking against errors and omissions in individual mathematical proofs. Most recently logic has been applied in the design of the hardware of binary digital computers and in the development of the software programs which make them usable. The consequential reduction in the cost of errors in both hardware and software is estimated in billions of dollars per year. The aim of this lecture is to show that Computer Science belongs to a great intellectual and academic tradition that stretches back into antiquity. And the future is equally exciting. Computers are now getting ever more proficient at logical and mathematical reasoning; and their capabilities are being widely exploited in the humanities in biology and in other branches of science.
Automatic inference of function summaries is the key to compositional verification of numerical programs. In the first part of this talk I will present an algorithm for verifying hierarchical non-recursive programs, based on an efficient computation of transitive closures for several classes of program loops: difference bounds, octagonal and finite monoid affine relations.
On the theoretical side, this provides a common solution to the acceleration problem, for all these three classes of relations. In practice, according to our experiments, the new method performs up to four orders of magnitude better than the previous ones for the first two classes.
In the second part of the talk, I explain how transitive closure computation combines with other methods (quantifier elimination, abstraction, etc.) in order to provide a framework for the analysis of such systems. These ideas led us to the implementation of the FLATA tool.
Joint work with Marius Bozga and Filip Konecny.
In addition to being correct, a system should be robust, that is, it should react reasonably even to unexpected inputs. In this talk, I will summarize two formal notions of robustness that we have introduced previously for reactive systems. One of the notions assigns cost to failures based on a user-provided notion of an incorrect transition in a specification. Here, we define a system to be robust if a finite number of incorrect inputs does not lead to an infinite number of incorrect outputs. We also give a more refined notion of robustness that aims to minimize the ratio of output failures to input failures. The second notion is aimed at liveness. In contrast to the previous notion, it has no concept of recovery from an error. Instead, it compares the ratio of the number of liveness constraints that the system violates to the number of liveness constraints that the environment violates. Dr Jobstmann's homepage
SAT is a problem of satisfiability in propositional logic, and the satisfiability modulo theory (SMT) is a problem of satisfiability with respect to a background theory (e.g., linear arithmetic, theory of arrays) expressed in classical first-order logic with equality. Over the last decade, SAT and SMT solvers made tremendous advances both in theoretical and in practical aspects. They have a wide range of applications, primarily in software verification, planning, scheduling, cryptanalysis, electronic design automation, etc. Typically, problems are reduced to SAT/SMT by problem-specific, ad-hoc tools. In this talk, a general framework, URSA Major, for reducing problems to SAT/SMT will be presented. The system can be also considered as a general modelling system for specifying and solving a wide class of problems (e.g., NP problems). The specification language is C-like, but based on a novel imperative-declarative programming paradigm. A problem is specified by a test (expressed in an imperative form) that given values indeed make a solution to the problem. From a problem specification, a corresponding propositional or first-order formula (in a suitable theory) is generated and passed to a SAT/SMT solver. If the formula is satisfiable, then its model is transformed back to the values of the unknowns. The system will be illustrated by examples from several domains. Prof. Janicic's homepage
Boolean Satisfiability (SAT) is the problem of checking if a propositional logic formula can ever evaluate to true. This problem has long enjoyed a special status in computer science. On the theoretical side, it was the first problem to be classified as being NP-complete. On the practical side, SAT manifests itself in several important application domains such as the design and verification of hardware and software systems, as well as applications in artificial intelligence. Thus, there is strong motivation to develop practically useful SAT solvers. However, the NP-completeness is cause for pessimism, since it seems unlikely that we will be able to scale the solutions to large practical instances. While attempts to develop practically useful SAT solvers have persisted for almost half a century, for the longest time it was a largely academic exercise with little hope of seeing practical use. Fortunately, several relatively recent research developments have enabled us to tackle instances with even millions of variables and constraints - enabling SAT solvers to be effectively deployed in practical applications. The problem is even (erroneously?) considered to be significantly tamed. This talk will explore this transition from theoretical hardness to practical success, including a description of the key ideas used in modern SAT solvers and future research directions. Prof. Malik's homepage
George Boole was a pioneer of the algebraic presentation of logic. I will suggest algebra as a simple way of presenting and reasoning about the semantics of broad families of programming languages. It neatly characterises the similarities and differences between members of the family, and the choices that are made in design of new languages. Denotational, operational, and deductive semantic presentations can be readily derived from algebraic axioms by algebraic reasoning. When a programming language is extended (for example, by introduction of non-determinism and concurrency), all the earlier and simpler derivations and theorems remain valid. Prof. Hoare's homepage
Software model checking has become popular due to the success of predicate abstraction and refinement techniques in finding API usage bugs in programs. In this talk, I will describe an alternate approach we have taken for finding memory safety bugs (such as null pointer dereferences, array buffer overflows, string errors, memory leaks), and violations of API usage and user-specified assertions in C programs.
The highlights of our approach are:
- accurate memory modeling, including an analysis-friendly representation
- powerful static analyses based on abstract interpretation, to simplify models and find relatively easy proofs of correctness
- efficient symbolic model checking, to verify the unresolved properties and generate concrete error traces.
To improve scalability on large software projects, we utilize a DC2 (Depth Cutoff with Design Constraints) framework. DC2 creates manageable partitions based on scope bounding, and uses automated specification inference techniques to derive environment constraints for partitions and stubs for missing code. Furthermore, it can automatically suggest refinements of the environment contraints based on counterexamples generated by a model checker, if a user deems a reported witness to be a false positive. We have implemented these ideas in the F-Soft verification platform, which has been used successfully to find bugs in many large open source as well as industry software projects. Dr Gupta's homepage
There is a semantic gap between the mathematics of control theory and the implementation of controllers on top of real-time operating systems. We show two examples ---stability analysis and controller-scheduler co-design--- of verification algorithms that perform an end-to-end analysis of control system implementations, translating the mathematical analysis of the control system to properties of the implementation. We also outline a theory of robustness for discrete systems, inspired by similar theories in robust control. Prof. Majumdar's homepage
Multithreaded code is getting increasingly important but remains extremely hard to develop and test. Most recent research on testing multithreaded code focuses solely on finding bugs in one given version of code. While there are many promising results, the tools are fairly slow because they explore a large number of thread schedules and do not exploit the fact that code evolves through several versions during development and maintenance. Our overarching goal is to improve (regression) testing of multithreaded code. This talk focuses on a novel approach to specifying and executing schedules for multithreaded tests. Traditionally, developers enforce a particular schedule with time delays, e.g., using Thread.sleep in Java. Unfortunately, this sleep-based approach can produce false positives or negatives, and can result in unnecessarily long testing time. We introduce a new language that allows specifying schedules as constraints on the events during test execution. We provide a tool that automatically controls the code to execute the specified schedule and a tool that helps developers to migrate their legacy, sleep-based tests into event-based tests in our language. The latter tool uses new techniques for inferring events and schedules from the executions of sleep-based tests. We describe our experience in migrating over 200 tests. The inference techniques have high precision and recall, of over 75%, and our approach reduces testing time compared to sleep-based tests, on average 3.4x. This is joint work with Milos Gligoric, Vilas Jagannath, Dongyun Jin, Qingzhou Luo, and Grigore Rosu. Prof. Marinov's homepage
Verification of multi-threaded programs has to grapple with the core problem of handling an explosive number of thread schedules. In this talk,I will describe three main approaches we have proposed to cope with this problem, spanning a range of static and dynamic analyses.
- The first is a static analysis framework for semantic reduction of thread interleavings, where we iteratively refine a transaction graph that captures the interleavings, by deriving sound invariants using abstract interpretation. For data race detection, this framework serves to reduce the number of false warnings captured by an initial lockset analysis, and enables the use of model checking on the remaining warnings to generate concrete error traces.
- The second approach combines dynamic testing with predictive analysis. Given a test execution of the multi-threaded program, we derive a symbolic predictive model to explore all (or some) alternative thread schedules over the same events, and use an SMT solver to find violations.
- The third approach is based on systematic testing, where we propose a coverage-based metric to choose which interleavings to explore during stateless model checking. Although unsound in principle, the metric works well in practice to catch many bugs, while significantly improving performance in comparison to other methods such as dynamic partial order reduction and preemptive context bounding. Dr Gupta's homepage
For some well-known games, such as the Traveler's Dilemma or the Centipede Game, traditional game-theoretic solution concepts -- most notably Nash equilibrium -- predict outcomes that are not consistent with empirical observations. We introduce a new solution concept, iterated regret minimization, which exhibits the same qualitative behavior as that observed in experiments in many games of interest, including Traveler's Dilemma, the Centipede Game, Nash bargaining, and Bertrand competition. As the name suggests, iterated regret minimization involves the iterated deletion of strategies that do not minimize regret. We provide an epistemic characterization of iterated regret minimization, show how it works in many settings. Prof Halpern's homepage
In this talk I shall first highlight the impact of surveillance botnet attacks and discuss the high level design of such botnets using a real-world attack as a case study. We will discuss the reasons behind their spectacular successes despite their centralized design -- a fundamental weakness that limits their scalability and robustness. Botnets designed primarily as vehicles for economic crime reached these limits a while back. As a consequence, they moved to more decentralized designs based on the use of structured overlay topologies. While this has allowed them to scale in vast numbers, it can also be used as a point of detection. In the second part of the talk, I shall present techniques (O(nlogn)) to localize botnet members based on the unique communication patterns arising from their overlay topologies used for command and control. Experimental results on synthetic topologies embedded within Internet traffic traces from an ISP's backbone network indicate that our techniques (i) can localize the majority of bots with low false positive rate, and (ii) are resilient to the partial visibility arising from partial deployment of monitoring systems, and measurement inaccuracies arising from partial visibility and dynamics of background traffic. Prof. Nagaraja's homepage
I will discuss a generalization of the Shannon Sampling Theorem that allows for reconstruction of signals in arbitrary bases. Not only can one reconstruct in arbitrary bases, but this can also be done in a completely stable way. When extra information is available, such as sparsity or compressibility of the signal in a particular bases, one may reduce the number of samples dramatically. This is done via Compressed Sensing techniques, however, the usual finite-dimensional framework is not sufficient. To overcome this obstacle I'll introduce the concept of Infinite Dimensional Compressed Sensing. Prof. Hansen's homepage
Message passing algorithms on graphical models have been highly successful in inference and optimization problems in a wide range of fields. This talks considers graphical models with linear mixing, which model large systems where the interactions between components are through aggregates of small, linearizable perturbations. For such systems, I present a new class of estimation and optimization algorithms called generalized approximate message passing (GAMP) based on Gaussian approximations of loopy belief propagation. GAMP is computationally simple and significantly more general than earlier AMP methods. Also, for the case of large, random graphs, a state evolution analysis provides sharp, easily characterizable asymptotics with verifiable conditions for optimality. The potential for the GAMP algorithm is far reaching, and in this talk, I will discuss applications to inter-cellular interference coordination (ICIC) in next-generation cellular wireless networks. Dynamic interference coordination has emerged as a central challenge as networks introduce femtocells and unplanned, heterogeneous deployments. I will show that the GAMP method provides a tractable method for distributed interference coordination that is computationally simple, has low communication overhead and can incorporate complex physical-layer mechanisms including beamforming, subband frequency allocation and dynamic orthogonalization. Joint work with Ritesh Madan, Qualcomm. Prof. Rangan's homepage
Advances in information technology enable the creation of new markets as social coordination mechanisms in numerous ways, including those where humans and computer algorithms interact. Because humans and computers differ in their capabilities to emit and process market signals, there is a need to understand what determines the aggregate level of signaling. We frame the arising signaling regime as alternative equilibrium outcomes in a coordination game. More specifically, we tackle the general research question from the perspective of new electronic credit markets. On online social lending platforms, borrowers' offers typically contain detailed personal information next to hard facts such as credit ratings. We investigate whether a change of the market mechanisms in the form of the introduction of an automated trading agent shifts the dynamics of information revelation from a high-effort signaling norm to a low-effort information equilibrium. After developing the theoretical background for our analysis, we scrutinize our hypothesis in the form of a natural experiment at Smava.de and find strong support for our proposition. Prof. Grossklags' homepage
In recent times, many generally short lived RFID authentication protocols have been proposed with only moderate success. The vast majority of RFID authentication protocols assume proximity between readers and tags, due to the limited range of the radio channel. However, in real scenarios an intruder can be located between the prover (tag) and the verifier (reader) and trick the latter into thinking that the former is in close proximity. This attack is generally known as a relay attack and can come in different shapes, including distance fraud, mafia fraud and terrorist attacks. Distance bounding protocols represent a promising countermeasure to hinder relay attacks, and some interesting proposals have grown lately. Another very exciting problem, introduced by Juels in 2004, is how to show that two tags have been simultaneously scanned. This could potentially have numerous real life applications. He called this kind of evidence a yoking-proof, and it is expected to be verifiable offline. Then, some authors suggested the generalization of the proof for a larger number of tags. This is another very promising and still quite unexplored research field. Prof. Peris-Lopez' homepage
A mechanism is a rule that determines allocation of a resource and payment across individual users. The theory of mechanism design looks into how to design mechanisms that have good properties with respect to * Truthfulness * Revenue * Social welfare What makes the area extremely important and of much research interest are numerous applications, including sponsored search and the allocation of network and computational resources. In this tutorial, I will focus on the revenue and social welfare properties of various standard mechanisms and give an update on the state-of-the-art research in the area. Dr Vojnovic's homepage
In this paper we propose nontrivial codes that achieve a non-zero zero-error rate for several odd-letter noisy-typewriter channels. Some of these codes (specifically those which are defined for a number of letters of the channel of the form 2^n+1) achieve the best-known lower bound on the zero-error capacity. We build the codes using linear codes over rings as we do not require the multiplicative inverse to build the codes. This is a joint work with Francisco Ruiz-Rodriguez. Prof. Perez-Cruz's homepage
In Shannon's original paper and textbooks in information theory, the entropy of a discrete random variable is assumed or shown to be a continuous function. However, we found that all Shannon's information measures including entropy and mutual information are discontinuous in the general case that the random variables take values in possibly countably infinite alphabets. This fundamental property explains why strong typicality and Fano's inequality can only be applied on finite alphabets. Note that strong typicality and Fano's inequality have wide applications in information theory so that it is important to extend them in full generality. In this talk, details about the discontinuity of Shannon's information measures will be given. We will show how these results lead to a new definition of typicality and an inequality tighter than Fano's inequality. Applications in network coding and information theoretic security will be discussed. Prof. Yeung's homepage
Very much effort has been devoted to the design of LDPC codes to achieve capacity under BP decoding. In this work we rather focus on the decoder design. We borrow from the tree-structured expectation propagation (TEP) algorithm to propose novel decoders that improve the BP performance. We first develop the TEP as a general decoder. Then, we deeply study the impact of this decoding on the BEC. We prove that it can achieve a better threshold at linear complexity with the length of the code. And, for short codes, it exhibits a good performance. The codes can also be designed to improve the threshold. To achieve capacity, we go further and develop decoders with no linear complexity that achieve the MAP solution. We end this talk by presenting some present lines of research. Prof. Murillo-Fuentes' homepage
The goal of this presentation is to promote a new approach for dealing with noisy data --- typically, images or videos here. Image denoising consists in approximating the noiseless image by performing some, usually non-linear, processing of the noisy image. Most standard techniques involve assumptions on the result of this processing (sparsity, low high-frequency contents, etc.); i.e., the denoised image. Instead, the SURE-LET methodology consists in approximating the processing itself (seen as a function) over some linear combination of elementary non-linear processings (LET: Linear Expansion of Thresholds), and to optimize the coefficients of this combination by minimizing a statistically unbiased estimate of the Mean Square Error (SURE: Stein's Unbiased Risk Estimate, in the case of additive Gaussian noise). I will introduce the technique and outline its advantages (fast, noise-robust, flexible, image adaptive). A comprehensive set of results will be shown and compared with the state-of-the-art. Prof. Blu's homepage
We present the first Internet architecture designed to provide route control, failure isolation, and explicit trust information for end-to-end communications. SCION separates ASes into groups of independent routing sub-planes, called trust domains, which then interconnect to form complete routes. Trust domains provide natural isolation of routing failures and human misconfiguration, give endpoints strong control for both inbound and outbound traffic, provide meaningful and enforceable trust, and enable scalable routing updates with high path freshness. As a result, our architecture provides strong resilience and security properties as an intrinsic consequence of good design principles, avoiding piecemeal add-on protocols as security patches. Meanwhile, SCION only assumes that a few top-tier ISPs in the trust domain are trusted for providing reliable end-to-end communications, thus achieving a small Trusted Computing Base. Both our security analysis and evaluation results show that SCION naturally prevents numerous attacks and provides a high level of resilience, scalability, control, and isolation. Prof. Perrig's homepage