2013 Schedule
30/05/2013 @ 15:15, room BC 420
One Size Fits None (Everything You Learned in Your DBMS Class is Wrong)
Prof. Michael Stonebraker, MIT
The traditional wisdom for building DBMSs is to:
1) organize blocks of data on disk, with a main memory block cache.
2) implement an Aries-style write-ahead log.
3) use record-level locking.
4) utilize a multi-threaded architecture.
5) use an active-passive architecture for replication.
This traditional wisdom is exemplified in all the major DBMS products, including DB2, MySQL, Postgres, SQL Server, and Oracle. In a paper in ICDE 2005, we argued that “one size does not fit all” and that a variety of data management solutions should be considered for deployment. Eight years later, we make an even stronger statement; namely, the traditional wisdom is all wrong, and systems architected according to it are not good at anything.
If you squint, the DBMS market is divided into thirds, namely On-Line Transaction Processing (OLTP), data warehousing (DW) and everything else. The “everything else” market is a mix of NoSQL, Hadoop, graph DBMSs, array DBMSs, etc. None of these relate to the traditional wisdom; hence, we deal with the other two markets.
In the DW market, it is clear that the world is moving to column stores, and the only successful vendors will be selling that technology. We explain the architecture of column stores such as Vertica, Hana and Paraccel. As will be crystal clear, column stores have no relationship to the traditional wisdom.
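For readers less familiar with the distinction, the toy sketch below shows why an analytic scan touches far less data in a columnar layout than in a row layout; the table, column names and sizes are invented for illustration and do not describe Vertica, Hana or Paraccel specifically.

```python
# Toy comparison of a row layout and a column layout for one analytic
# query: SUM(price) WHERE year = 2013. Illustrative only.
import random

N = 100_000
rows = [{"year": random.randint(2000, 2013),
         "price": random.random() * 100,
         "note": "x" * 40}            # a wide attribute the query never needs
        for _ in range(N)]

# Row store: every whole row (including the wide 'note' field) is touched.
def scan_rows(rows):
    return sum(r["price"] for r in rows if r["year"] == 2013)

# Column store: only the two referenced columns are scanned, which is the
# main reason DW workloads favor columnar layouts.
years = [r["year"] for r in rows]
prices = [r["price"] for r in rows]

def scan_columns(years, prices):
    return sum(p for y, p in zip(years, prices) if y == 2013)

assert scan_rows(rows) == scan_columns(years, prices)
```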
Then, we turn to OLTP DBMSs and argue, using performance numbers, that a main-memory orientation with anti-caching of main memory objects when space is exhausted is a far better alternative than a disk-based system with caching. Similarly, a transaction log dominates a data log such as Aries, and an active-active architecture for replication is preferred to an active-passive one. Lastly, deterministic concurrency control solutions are similarly dominant compared to dynamic locking. In short, next-generation OLTP systems will be built using anti-caching, single-threading, deterministic execution, and active-active replication. They will bear little resemblance to today’s traditional systems.
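A minimal sketch of the anti-caching idea follows: hot tuples live in a main-memory structure, cold tuples are evicted to disk when capacity is exhausted, and evicted tuples are transparently reloaded on access. The class, capacity and file-per-key layout are invented for illustration; this is not the design of any particular engine.

```python
import os, pickle, tempfile
from collections import OrderedDict

class AntiCache:
    """Toy main-memory table that evicts cold tuples to disk (anti-caching)."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.hot = OrderedDict()                   # LRU order: oldest first
        self.cold_dir = tempfile.mkdtemp()

    def _evict_coldest(self):
        key, value = self.hot.popitem(last=False)  # least recently used tuple
        with open(os.path.join(self.cold_dir, str(key)), "wb") as f:
            pickle.dump(value, f)

    def put(self, key, value):
        self.hot[key] = value
        self.hot.move_to_end(key)
        while len(self.hot) > self.capacity:
            self._evict_coldest()

    def get(self, key):
        if key not in self.hot:                    # "un-evict" from disk
            path = os.path.join(self.cold_dir, str(key))
            with open(path, "rb") as f:
                value = pickle.load(f)
            os.remove(path)
            self.put(key, value)
        self.hot.move_to_end(key)
        return self.hot[key]

store = AntiCache(capacity=2)
for k in range(5):
    store.put(k, {"balance": 100 + k})
assert store.get(0)["balance"] == 100              # transparently reloaded
```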
In summary, the traditional wisdom is not a good idea in any application area, and “one size fits none”. I expect DBMS textbooks, future DBMS courses and the commercial market will quickly reflect this point of view.

03/06/2013 @ 10:00, room BC 420
Protecting Sensitive Data in Web Browsers with ScriptPolice
Prof. Brad Karp, University College London
The web browser has become an attractive target for attackers who wish to obtain users' sensitive data. The browser is rife with untrusted JavaScript: pages execute scripts, and extensions execute with elevated privilege that entitles them to see content from all origins, and to send data to third-party servers.
Two principal threat models apply to a user's sensitive data within a browser. A malicious extension author may write extension code that reads sensitive page content and sends it to a remote server he controls. And a malicious page author may exploit an honest but buggy extension, thus leveraging its elevated privilege to disclose sensitive information from other origins.
In this talk, I will demonstrate zero-day vulnerabilities in real-world extensions for a widely used browser that allow maliciously crafted JavaScript in pages to leak a user's sensitive information. I will then describe two classes of policy that protect sensitive data in web browsers by limiting the privilege of JavaScript code. *Containment* policies block the export of sensitive information from an extension, however obtained. They protect against both malicious extensions and malicious pages. *Prevention* policies, by contrast, stop the misuse of an extension's privileges by a page. Both types of policy are effective for a wide range of extensions, and are thus easy to deploy in browsers. Finally, I will present ScriptPolice, a policy system for the Chrome browser's V8 JavaScript interpreter that supports simple containment and prevention policies. We demonstrate that on a variety of extensions and pages, ScriptPolice effectively protects sensitive data in the browser, while typically incurring added latency that is imperceptible to the user.
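To make the notion of a containment policy concrete, here is a toy monitor in which an extension becomes "tainted" once it reads content from a sensitive origin and is then prevented from exporting data elsewhere. The class and method names are invented for illustration and are not the actual ScriptPolice API.

```python
class ContainmentError(Exception):
    pass

class ExtensionSandbox:
    """Toy containment policy: once the extension has read sensitive page
    content, exports to third-party origins are blocked."""
    def __init__(self, sensitive_origins):
        self.sensitive_origins = set(sensitive_origins)
        self.tainted = False

    def read_page(self, origin, content):
        if origin in self.sensitive_origins:
            self.tainted = True           # extension now holds sensitive data
        return content

    def send(self, destination, payload):
        if self.tainted and destination not in self.sensitive_origins:
            raise ContainmentError(f"blocked export to {destination}")
        return f"sent {len(payload)} bytes to {destination}"

sandbox = ExtensionSandbox(sensitive_origins={"https://bank.example"})
sandbox.read_page("https://bank.example", "account: 1234")
try:
    sandbox.send("https://evil.example", "account: 1234")
except ContainmentError as e:
    print(e)                              # export of sensitive data is blocked
```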
(Joint work with Petr Marchenko of UCL and Ulfar Erlingsson of Google.)
Organized by SuRI 2013

03/06/2013 @ 14:00, room BC 420
Cost-based optimization and algebra-based execution for queries on XML data
Prof. Vasilis Vassalos, Athens University of Economics and Business
This talk will address the problem of optimization and execution of queries on XML data. We will discuss a complete, generic and modular XPath cost-based optimization and execution framework. The framework is based on a logical XPath algebra and a comprehensive set of rewriting rules that together enable us to algebraically capture many existing and novel processing strategies for XPath queries. Key pieces of the framework are the physical operators that are available to the execution engine to turn queries into execution plans. Such operators, to be efficient, need to implement sophisticated algorithms for logical XPath or XQuery operations. Moreover, to enable a cost-based optimizer to choose among them correctly, it is also necessary to provide cost models for such operator implementations. We will present different families of algorithms for XPath physical operators, along with detailed cost models. We will present experimental evaluations of the performance of these operators over different XML storage engines. Another important part of the framework that we will discuss is an efficient cost-based plan selection algorithm for queries. Such a cost-based query optimizer is independent of the underlying physical data model and storage system and of the available logical operator implementations, relying instead on a set of well-defined APIs. Finally, to round out the presentation, we will also present an implementation of those APIs, including primitive access methods, the physical operators, statistics estimators and cost models.
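As a schematic of what cost-based plan selection over such APIs looks like, the sketch below has each physical operator expose a cost model, and the optimizer simply keeps the cheapest candidate. The operator names, statistics and cost formulas are invented placeholders, not the framework's actual operators or models.

```python
from dataclasses import dataclass

@dataclass
class PhysicalOp:
    name: str
    cost_fn: callable            # cost model exposed through the operator API

# Toy statistics that an estimator would normally supply.
stats = {"cardinality": 50_000, "selectivity": 0.02, "fanout": 3}

candidates = [
    PhysicalOp("StructuralJoin",   lambda s: s["cardinality"] * s["selectivity"] * 2.0),
    PhysicalOp("HolisticTwigJoin", lambda s: s["cardinality"] * 0.5 + 1_000),
    PhysicalOp("NavigationalScan", lambda s: s["cardinality"] * s["fanout"] * 0.8),
]

def choose_plan(operators, stats):
    """Return the operator with the lowest estimated cost (plan selection)."""
    return min(operators, key=lambda op: op.cost_fn(stats))

best = choose_plan(candidates, stats)
print(best.name, best.cost_fn(stats))    # cheapest candidate for these statistics
```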
Organized by SuRI 2013

04/06/2013 @ 10:00, room BC 420
How Many Down? Toward Understanding Systematic Risk in Networks
Prof. Jens Grossklags, Pennsylvania State University
The systematic risk of a networked system depends to a large extent on the ways in which the network is connected. In this talk, I will discuss the connection between a network's systematic risk and its topology, using a model of risk propagation from the literature on interdependent security games, and focusing on the loss distribution in terms of the number of compromised nodes. I will show that it is NP-hard to compute this loss distribution for an arbitrary network topology. Nevertheless, it is possible to derive efficient formulae for loss distributions resulting from homogeneous, star, and Erdős-Rényi (E-R) random topologies. Further, I will introduce a simulation algorithm for approximating the loss distribution in general. Applying the simulation methodology to the study of scale-free networks, we find systematic risks which distinguish these networks substantively from even their own random subnets. This implies, on the one hand, that a random subnet of a network with large systematic risk may still be insurable. On the other hand, the true systematic risk of a networked system may not be discoverable by risk assessment methods, such as incident reporting, that are based on subsampling.
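The following simulation sketch conveys how such a loss distribution can be approximated by Monte Carlo: nodes are compromised directly with some probability, compromise then spreads along edges with an indirect probability, and repeated trials estimate the distribution of the number of compromised nodes. The propagation rule and parameters are simplified placeholders rather than the exact interdependent-security model used in the talk.

```python
import random
from collections import Counter

def simulate_losses(nodes, edges, p_direct, q_spread, trials=10_000):
    """Estimate the loss distribution P(number of compromised nodes = k)."""
    neighbors = {v: set() for v in nodes}
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    counts = Counter()
    for _ in range(trials):
        compromised = {v for v in nodes if random.random() < p_direct}
        frontier = list(compromised)
        while frontier:                           # contagion along edges
            v = frontier.pop()
            for u in neighbors[v]:
                if u not in compromised and random.random() < q_spread:
                    compromised.add(u)
                    frontier.append(u)
        counts[len(compromised)] += 1
    return {k: c / trials for k, c in sorted(counts.items())}

# Star topology: node 0 is the hub, nodes 1..9 are leaves.
nodes = list(range(10))
edges = [(0, i) for i in range(1, 10)]
print(simulate_losses(nodes, edges, p_direct=0.05, q_spread=0.5))
```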
Organized by SuRI 2013

04/06/2013 @ 14:00, room BC 420
Checking the World's Software for Exploitable Bugs
Prof. David Brumley, Carnegie Mellon University
Attackers only need to find a single exploitable bug in order to install worms, bots, and other malware on vulnerable computers. Unfortunately, developers rarely have the time or resources to fix all bugs. This raises a serious security question: which bugs are exploitable, and thus should be fixed first? My research team's vision is to automatically check the world's software for exploitable bugs. Our approach is based on program verification, but with a twist. Traditional verification takes a program and a specification of safety as inputs, and checks that all execution paths of the program meet the safety specification. The twist in AEG (automatic exploit generation) is that we replace typical safety properties with an "un-exploitability" property, and the "verification" process becomes finding a program path in which the un-exploitability property does not hold. Our analysis generates working control flow hijack and command injection exploits for exploitable paths. I'll discuss our results with a data set of over 1,000 programs and over 370 days of analysis time. Despite the large amount of analysis, there is still much to be done. In the last part of this talk, I'll describe several of the remaining research challenges.
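The property-flipping idea can be made concrete with a toy model: state an "un-exploitability" property (here, the saved return address is never overwritten) and search for a concrete input on which it fails. Real AEG relies on symbolic execution and constraint solving; this brute-force stand-in only illustrates the shape of the check.

```python
# Toy "program": copies user input into a 16-byte stack buffer.
BUF_SIZE = 16

def overflows_return_address(user_input: bytes) -> bool:
    """Model of one execution path: True if bytes past the buffer (where the
    saved return address would live) get overwritten."""
    return len(user_input) > BUF_SIZE

def find_exploit(max_len=64):
    """'Verification' with the safety property flipped: look for an input on
    which the un-exploitability property does NOT hold."""
    for n in range(max_len + 1):
        candidate = b"A" * n
        if overflows_return_address(candidate):
            return candidate                     # a working proof-of-concept
    return None

poc = find_exploit()
print("un-exploitability violated by input of length", len(poc))
```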
Organized by SuRI 2013

06/06/2013 @ 10:00, room BC 420
Learning a Zonotope and More: Cryptanalysis of NTRUSign Countermeasures
Léo Ducas, Ecole Normale Supérieure, Paris
Lattices have attracted a lot of interest in the domain of Public Key Cryptography, and they are now well understood tools for building schemes whose security can be reduced to the hardness of lattice problems such as finding the shortest vector. Yet the early signature scheme NTRUSign, from 2003, does not rely on those recent tools, and its security was an open question. Despite the existence of provably secure schemes, this question remains essential in practice because NTRUSign is far more efficient than the provably secure schemes.
A first step was taken by Nguyen and Regev in 2006, who showed that a "raw" version of NTRUSign was subject to a statistical attack. Precisely, they showed that signatures belong to a parallelepiped which is related to the secret key, and that it is possible to learn that parallelepiped given enough signatures.
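Numerically, the first ingredient of such an attack is easy to see: points drawn uniformly from the parallelepiped spanned by the secret basis have a covariance proportional to the Gram matrix of that basis, so enough signatures reveal the Gram matrix. The sketch below demonstrates only this moment estimation (the full Nguyen-Regev attack goes on to recover the basis vectors themselves from higher moments); the dimension and sample count are arbitrary toy choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8                                        # toy dimension
B = rng.integers(-5, 6, size=(n, n)).astype(float)    # secret basis (rows)

# "Signatures": points uniform in the parallelepiped spanned by the rows of B,
# i.e. v = x @ B with x uniform in [-1/2, 1/2]^n.
num_sigs = 200_000
x = rng.uniform(-0.5, 0.5, size=(num_sigs, n))
sigs = x @ B

# Since Cov(v) = B^T B / 12, the empirical second moment reveals the Gram matrix.
gram_estimate = 12.0 * (sigs.T @ sigs) / num_sigs
gram_true = B.T @ B
rel_err = np.max(np.abs(gram_estimate - gram_true)) / np.max(np.abs(gram_true))
print(f"relative error of recovered Gram matrix: {rel_err:.3f}")  # shrinks with more signatures
```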
Yet the full version of NTRUSign contained a preventive countermeasure against this kind of attack, consisting of adding a randomized perturbation in the hope of preventing any statistical attack. In this work we will first show that this perturbation results in a zonotope, and that it is still possible to learn that zonotope; this attack was implemented, and the full secret key could be recovered from about 5000 signatures. We will also tackle alternative perturbation techniques, interestingly leading to the famous Graph Isomorphism problem.
Organized by SuRI 2013

07/06/2013 @ 10:00, room BC 420
Less is More: Experiences in Deconstructing Hardware Coherence
Prof. Stefanos Kaxiras, Uppsala University
Much of the complexity and overhead (directory, state bits, invalidations, broadcasts) of a typical hardware coherence implementation stems from the effort to make it “invisible” even to the strongest memory consistency models. In this talk, we show how a much simpler, directory-less/broadcast-less, multicore coherence can outperform a directory protocol without its complexity and overhead, with just a data-race-free guarantee from software. This simplification of coherence brings further simplifications to the entire on-chip memory system, for example unconstrained scaling to multiple buses, simple on-chip networks, and a new solution for efficient virtual-cache coherence. Significant area, energy, and performance benefits ensue as a result of simplifying the multicore memory organization, making our approach a prime candidate for coherent, shared-virtual-memory, heterogeneous architectures.
Organized by SuRI 2013

07/06/2013 @ 11:15, room BC 420
When and how can we compute approximate solutions to NP-hard problems?
Prof. Sanjeev Arora, Princeton University
Can efficient algorithms find approximately optimal solutions (provably) to NP-hard problems? This question has animated a big research effort in theoretical CS in the past few decades. This included the PCP Theorems and many exciting approximation algorithms.
For several problems we even know the precise approximation threshold that can be achieved by efficient algorithms, and also know that improving upon that threshold is no easier than exact optimization. This theory makes connections with a host of other disciplines. Even the PCP Theorem (which says that proofs can be probabilistically checked by examining a constant number of bits in them) seems at first sight to have nothing to do with approximation. This talk, which is geared to a broad scientific audience, will survey this field.
Organized by SuRI 2013

10/06/2013 @ 10:00, room BC 420
A Mobile Platform and Social Stack for Personal Data: Open Mustard Seed
Dr. John Clippinger, Massachusetts Institute of Technology
According to a recent World Economic Forum report, personal data has become a new asset class and the "new oil of the Internet". As such, personal data needs to be protected, but also shared, analyzed and monetized. Regulatory practices have been slow to keep pace with the changing nature of data capture, analysis, and use. As a consequence, innovations in digital legal, regulatory and governance practices and mechanisms are needed to keep pace with advances in sensor, machine learning, and "Big Data" technologies. This talk presents an approach that gives individuals and groups control over their personal data while enabling the trusted exchange of data. Project Open Mustard Seed, a collaboration of ID3 and the MIT Media Lab, provides an open source platform for innovations in governance, authentication, identity management, access control, auditing, analytics and visualization technologies for highly scalable forms of coordinated action and exchange. The goal is to enable new forms of "data banking", collective action, and digital institution building and experimentation that are self-governing and self-correcting.
Organized by SuRI 2013

10/06/2013 @ 11:15, room BC 420
Approximate Strang-Fix: Sparse Sampling with any acquisition device
Prof. Pier Luigi Dragotti, Imperial College London
The problem of reconstructing partially observed or sampled signals is an important one that finds application in many areas of signal processing. Traditional acquisition and reconstruction approaches are heavily influenced by classical Shannon sampling theory, which gives an exact sampling and interpolation formula for bandlimited signals. In the last few years, several new methods have been developed for the sampling and exact reconstruction of specific classes of sparse non-bandlimited signals known as signals with finite rate of innovation (FRI). This is achieved by using adequate acquisition devices (sampling kernels) and reconstruction schemes. Valid sampling kernels make it possible to connect the samples to some essential information of the original signal, typically its Fourier or Laplace transform at specific frequencies. An example of a valid kernel is given by the family of exponential reproducing functions. These satisfy the generalised Strang-Fix conditions, which ensure that proper linear combinations of the kernel with its shifted versions reproduce exponentials exactly.
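For orientation, the key relations behind this machinery can be summarised as follows (generic notation, a background sketch rather than the talk's exact formulation):

```latex
% Exponential reproduction (generalised Strang--Fix condition):
\[
  \sum_{n \in \mathbb{Z}} c_{m,n}\, \varphi(t - n) \;=\; e^{\alpha_m t},
  \qquad m = 0, 1, \dots, P.
\]
% With samples y_n = \langle x(t), \varphi(t/T - n) \rangle, the weighted sums
\[
  s_m \;=\; \sum_{n} c_{m,n}\, y_n
      \;=\; \int_{-\infty}^{\infty} x(t)\, e^{\alpha_m t/T}\, \mathrm{d}t
\]
% are Laplace-domain measurements of the signal at the exponents \alpha_m / T.
```

That is, the weighted samples yield a small set of (generalised) Laplace-transform values of the signal, from which FRI-type signals can then be recovered with Prony/annihilating-filter methods.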
In the first part of the talk, we discuss sampling and perfect reconstruction of parametric sparse signals such as piecewise sinusoidal signals or classes of 2-D signals using exponential reproducing kernels. We then show how to choose the exponential reproducing kernel that leads to the most stable reconstruction when estimating FRI signals from noisy samples. This analysis leads to the design of the e-MOMS family of kernels (maximum order minimum support exponential reproducing kernels), which we show includes all the compact support kernels used in FRI theory so far. Next we depart from the situation in which we can choose the sampling kernel and develop a new strategy that is universal in that it works with any kernel. We do so by noting that meeting the exact exponential reproduction condition is too stringent a constraint. We thus allow for a controlled error in the reproduction formula in order to use the exponential reproduction idea with arbitrary kernels and develop a universal reconstruction method which is also robust to noise.
To emphasize the relevance of these new theories, we conclude the talk by presenting applications in image super-resolution and in neuroscience. In particular, we will present a new fast algorithm for spike detection from two-photon calcium imaging and will show how to perform spike sorting at sub-Nyquist rates.
This is joint work with T. Blu (CUHK), H. Pan (CUHK) J. A. Uriguen (ICL), J. Onativia Bravo (ICL) and S. Schultz (ICL).
This work is supported by the European Research Council (ERC) starting investigator award Nr. 277800 (RecoSamp).
Organized by SuRI 2013

10/06/2013 @ 14:00, room BC 420
Whole Genome Sequencing: Innovation Dream or Privacy Nightmare?
Dr. Emiliano De Cristofaro, Xerox PARC, Palo Alto
Recent advances in DNA sequencing technologies have put ubiquitous availability of whole human genomes within reach. It is no longer hard to imagine the day when everyone will have the means to obtain and store one's own DNA sequence. Widespread and affordable availability of whole genomes immediately opens up important opportunities in a number of health-related fields. In particular, common genomic applications and tests performed in vitro today will soon be conducted computationally, using digitized genomes. New applications will be developed as genome-enabled medicine becomes increasingly preventive and personalized. However, the very same progress also amplifies worrisome privacy concerns, since a genome represents a treasure trove of highly personal and sensitive information.
In this talk, we will overview biomedical advances in genomics and discuss associated privacy, ethical, and security challenges. We begin to address privacy-respecting genomic tests by focusing on some important applications, such as Personalized Medicine, Paternity Tests, Ancestry Testing, and Genetic Compatibility Tests. After carefully analyzing these applications and their requirements, we propose a set of efficient privacy-enhancing techniques based on private set operations. This allows us to implement, in silico and in a secure fashion, some operations that are currently performed via in vitro methods. Experimental results demonstrate that the proposed techniques are both feasible and practical today. Finally, we explore a few alternatives to securely store human genomes and allow authorized parties to run tests in such a way that only the required minimum amount of information is disclosed, and present an Android API framework geared for privacy-preserving genomic testing.
Organized by SuRI 2013
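To make "private set operations" concrete, the sketch below is a toy Diffie-Hellman-style private set intersection cardinality protocol: both parties exchange only blinded values and learn nothing beyond the number of markers they share. The parameters are deliberately tiny and insecure, and this is a generic textbook construction rather than the specific protocols proposed in the talk.

```python
import hashlib
import secrets

# Toy public parameters (far too small to be secure; 2^64 - 59 is prime).
P = 2**64 - 59
Q = (P - 1) // 2

def h(element: str) -> int:
    """Hash an element (e.g. a SNP identifier) into the group mod P."""
    digest = int.from_bytes(hashlib.sha256(element.encode()).digest(), "big")
    return pow(digest % P, 2, P)

def blind(values, key):
    return {pow(v, key, P) for v in values}

# Each party hashes its genomic markers and picks a secret exponent.
alice_markers = {"rs123:A", "rs456:G", "rs789:T"}
bob_markers = {"rs123:A", "rs789:T", "rs999:C"}
a = secrets.randbelow(Q - 2) + 2
b = secrets.randbelow(Q - 2) + 2

alice_once = blind({h(m) for m in alice_markers}, a)   # message sent to Bob
bob_once = blind({h(m) for m in bob_markers}, b)       # message sent to Alice

# Each side re-blinds what it received; exponentiation commutes, so shared
# markers collide as h(m)^(a*b) without the markers ever being revealed.
alice_twice = blind(bob_once, a)
bob_twice = blind(alice_once, b)
print("shared markers:", len(alice_twice & bob_twice))  # -> 2
```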
10/06/2013 @ 15:15, room BC 420
System-wide intrusion recovery and running applications over encrypted data
Prof. Nickolai Zeldovich, Massachusetts Institute of Technology
Computer systems are routinely compromised---as a result of software vulnerabilities, mis-configuration by administrators, or insecure choices by end users---and compromises seem inevitable in almost any system.
This talk will describe two of our recent research projects to provide security despite inevitable compromises. First, for integrity, we have been building systems that provide "system-wide undo", which allows users or administrators to recover the integrity of a system after an intrusion, by undoing the attacker's actions and all causal effects thereof, while preserving legitimate user changes. Second, to protect confidentiality, we have been building systems that run applications over encrypted data, so that even if a server is compromised, an adversary learns only encrypted data, and cannot obtain plaintext confidential information.
Organized by SuRI 2013
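One widely used building block in this line of work is deterministic encryption of individual columns, which lets an untrusted server evaluate equality predicates without ever seeing plaintext. The toy sketch below uses HMAC as a stand-in for a proper deterministic encryption scheme and in-memory lists in place of a real DBMS; it illustrates the idea, not the talk's actual systems.

```python
import hashlib
import hmac

KEY = b"client-side secret key (never sent to the server)"

def det_token(value: str) -> str:
    """Deterministic tag: equal plaintexts map to equal tokens, so the
    untrusted server can evaluate equality predicates over ciphertext.
    (HMAC stands in for a real deterministic encryption scheme here.)"""
    return hmac.new(KEY, value.encode(), hashlib.sha256).hexdigest()

# --- untrusted server: stores and filters only opaque tokens --------------
server_rows = []

def server_insert(row_id, dept_token):
    server_rows.append((row_id, dept_token))

def server_select(dept_token):
    return [rid for rid, tok in server_rows if tok == dept_token]

# --- trusted client: encrypts before inserting, queries by token ----------
for rid, dept in [(1, "cardiology"), (2, "oncology"), (3, "cardiology")]:
    server_insert(rid, det_token(dept))

print(server_select(det_token("cardiology")))   # -> [1, 3]; the server never
                                                # saw the string "cardiology"
```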
10/06/2013 @ 16:30, room BC 420
Protecting confidentiality in Cloud data processing: Europe's Keyser Söze strategy
Caspar Bowden, independent advocate for information self-determination rights
"Strong economic incentives to adopt the Cloud processing paradigm presents obvious risks to the confidentiality of data exposed to foreign jurisdictions. Homomorphic encryption seems unlikely to be practical and current 'trusted computing' technology is designed against consumer-grade adversaries. European policymakers have been proposing frameworks for legal certification which would permit unlimited export of personal data, subject to a commercial security audit of the Cloud platform against external threats. However there appears to be a dissonance between regulator expectations that foreign 'requests' for data will be discrete and follow due process, and evidence from whistle-blowers that apparatus for continuous mass-surveillance is already systematically deployed. Moreover the small-print of these frameworks appears to have been crafted to turn a blind-eye to secret 'national security' access to data, even though relevant foreign laws do not comply with European human rights standards, for example by discriminating by nationality and allowing purely political purposes unrelated to criminality. This talk will describe the recent policy history, from the Safe Harbour Agreement to current controversies over the new draft EU Data Protection Regulation."
Organized by SuRI 2013

11/06/2013 @ 10:00, room BC 420
A high-level language for secure distributed computation
Prof. Andrew Myers, Cornell University
People exchange code and data increasingly freely across the Internet and the Web, but both code and data are vectors for attacks on confidentiality and integrity. The Fabric project is developing higher-level programming models and programming languages that get us closer to programming the Internet Computer directly. Fabric supports the free exchange of code and data across a decentralized, distributed system. But unlike the Web, Fabric has a principled basis for compositional security: language-based information flow. Fabric raises the level of abstraction for programmers, which simplifies programming, but also makes it easier to reason clearly about security, even in the presence of distrusted mobile code.
Organized by SuRI 2013
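A minimal sketch of the language-based information flow idea: values carry confidentiality labels, labels propagate through computation, and releasing a value to a principal is allowed only if that principal may read the value's label. The Labeled class and the set-based label model below are invented for illustration and are unrelated to Fabric's actual label system.

```python
class FlowError(Exception):
    pass

class Labeled:
    """A value tagged with the set of principals allowed to read it."""
    def __init__(self, value, readers):
        self.value = value
        self.readers = frozenset(readers)

    def __add__(self, other):
        # Information flows from both operands, so the result may only be
        # read by principals allowed to read both (label join = intersection).
        return Labeled(self.value + other.value, self.readers & other.readers)

def release(labeled, principal):
    if principal not in labeled.readers:
        raise FlowError(f"{principal!r} may not read this value")
    return labeled.value

salary = Labeled(90_000, readers={"alice", "hr"})
bonus = Labeled(10_000, readers={"alice", "hr", "payroll"})
total = salary + bonus                      # readable only by {"alice", "hr"}

print(release(total, "alice"))              # allowed
try:
    release(total, "payroll")               # blocked: would leak salary data
except FlowError as e:
    print(e)
```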
11/06/2013 @ 11:15, room BC 420
Leveraging FPGA Reconfigurability during Post-Silicon Debug and Validation: Opportunities, Successes, and Challenges
Prof. Steve Wilton, University of British Columbia
Electronic devices have come to permeate every aspect of our daily lives. At the heart of these devices are Integrated Circuits. State-of-the-art chips can now contain several billion transistors; designing and verifying that these circuits function correctly under all expected (and unexpected) operating conditions is extremely challenging. While simulation is an important tool, it is not sufficient; simulation is slow, and it is increasingly important to be able to debug circuits in-situ. Debugging fabricated chips, post-silicon, is the only solution to ensure working systems.
An emerging tool in improving the debug experience is the use of Field-Programmable Gate Array (FPGA) technology. In this talk, we will focus on two ways FPGA-like technology can be used to accelerate post-silicon debug. First, we will show how debug productivity can be enhanced by embedding small reconfigurable logic analyzers on chip, and using these to not only record traces, but intelligently process data to control the system and make better use of existing on-chip trace storage. Challenges include minimizing the overhead while providing enough flexibility to support many debug scenarios.
Second, we will discuss how the reconfigurable nature of FPGAs can be used to efficiently provide observability while debugging in an FPGA prototyping environment. In particular, we will show how we can create flexible overlay networks that provide connectivity between trace buffers and the circuit under test using unused FPGA routing resources. We will show that this technique effectively results in no area and speed overhead to the circuit under test, yet provides a significantly improved debug experience.
Organized by SuRI 2013

11/06/2013 @ 14:00, room BC 420
Censorship Circumvention: Staying Ahead in a Cat-and-Mouse Game
Prof. Nikita Borisov, University of Illinois at Urbana-Champaign
The Internet enables access to a wide variety of information sources; many countries and organizations, however, try to restrict such access for political and social reasons. People whose access has been censored make use of a variety of circumvention technologies to find the information they need; in turn, the censors use increasingly sophisticated tools to render these technologies ineffective. One of the most powerful techniques available to the censors has been the insider attack, wherein the censor pretends to be a user of a system in order to learn secret information about its functions. For example, censors continually update a blacklist of IP addresses belonging to circumvention proxies. I will discuss some new techniques designed specifically to resist this insider threat.
rBridge focuses on the distribution of proxy addresses to users. It tracks the reputation score of each user, representing the likelihood of this user revealing a proxy address to the censors, and uses an introduction mechanism to resist Sybil attacks. A particular challenge of rBridge is to preserve the privacy of its users by keeping the knowledge about which users know which proxies secret.
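A toy rendering of that bookkeeping appears below: users earn credit for each day their assigned bridges stay reachable, are penalized when a bridge they knew gets blocked, and only users with enough credit receive replacements. The constants and update rule are invented for illustration; rBridge's actual scheme differs and, crucially, performs these updates privately.

```python
from collections import defaultdict

CREDIT_PER_DAY = 1.0      # earned per day for each assigned bridge still up
BLOCK_PENALTY = 20.0      # charged to every user who knew a blocked bridge
REPLACEMENT_COST = 15.0   # credit needed before a new bridge is handed out

score = defaultdict(float)        # user -> reputation score
knows = defaultdict(set)          # user -> bridge addresses given to them

def end_of_day(unblocked_bridges):
    for user, bridges in knows.items():
        score[user] += CREDIT_PER_DAY * len(bridges & unblocked_bridges)

def bridge_blocked(bridge):
    for user, bridges in knows.items():
        if bridge in bridges:
            score[user] -= BLOCK_PENALTY    # likely leak: penalize insiders

def request_new_bridge(user, bridge):
    if score[user] >= REPLACEMENT_COST:
        knows[user].add(bridge)
        return True
    return False                            # suspected insider gets nothing

knows["honest"] = {"10.0.0.1"}
knows["insider"] = {"10.0.0.2"}
for _ in range(30):
    end_of_day(unblocked_bridges={"10.0.0.1", "10.0.0.2"})
bridge_blocked("10.0.0.2")                  # the censor learned this address
print(request_new_bridge("honest", "10.0.0.3"),    # True
      request_new_bridge("insider", "10.0.0.3"))   # False
```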
Cirripede is an alternate approach that seeks to eliminate the insider threat entirely. It uses redirection proxies that are activated by a special cryptographic signal, which can be generated using only public information but can only be recognized by the proxies. Instead of hiding the location of the proxies, Cirripede places them in highly connected ISPs, such that blocking Cirripede would result in high collateral damage.
Organized by SuRI 2013

11/06/2013 @ 15:15, room BC 420
Accountable Key Infrastructure (AKI): A Proposal for a Public-Key Validation Infrastructure
Prof. Adrian Perrig, ETH Zürich
Recent trends in public-key infrastructure research explore the tradeoff between decreased trust in Certificate Authorities (CAs), resilience against attacks, communication overhead (bandwidth and latency) for setting up an SSL/TLS connection, and availability with respect to verifiability of public key information. In this work, we propose AKI as a new public-key validation infrastructure to reduce the level of trust in CAs. AKI integrates an architecture for key revocation of all entities (e.g., CAs, domains) with an architecture for accountability of all infrastructure parties through checks-and-balances. AKI efficiently handles common certification operations, and gracefully handles catastrophic events such as domain key loss or compromise. We propose AKI to make progress towards a public-key validation infrastructure with key revocation that reduces trust in any single entity.
Organized by SuRI 2013

11/06/2013 @ 16:30, room BC 420
Title to be announced
L. Aaron Kaplan, CERT
Organized by SuRI 2013

12/06/2013 @ 10:00, room BC 420
Detecting zero day attacks by Cognitive Security
Prof. Michal Pechoucek, Czech Technical University
Current worldwide network infrastructure is under continual attack. Custom-built, sophisticated malware is exploited for economic purposes, and its operation is very difficult to detect once it successfully penetrates the perimeter. Based on state-of-the-art research in the fields of artificial intelligence, machine learning, game theory and agent-based computing studied at the Agent Technology Centre, Czech Technical University, the company Cognitive Security (COSE) has designed and developed an anomaly detection system providing introspection into the vulnerability of customers' networks. The Cognitive One (C1) system processes packet headers in the form of NETFLOW data, combines several custom-built statistical classifiers, and is optimised for delivering an extremely low false positive rate while maintaining high detection capability. COSE, a startup from Czech Technical University, was acquired by Cisco Systems in January 2013, and the Prague-based team became the core of the newly formed Cisco R&D laboratory building a next-generation Threat Defense technology.
Organized by SuRI 2013
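An illustrative sketch of the general recipe (not COSE's actual classifiers): derive per-source features from flow records, score them with several simple statistical detectors, and raise an alert only when the detectors agree, which is one way to keep the false positive rate low. The feature set, thresholds and traffic below are invented.

```python
import statistics
from collections import defaultdict

# Toy flow records: (src_ip, dst_ip, dst_port, bytes)
flows = [("10.0.0.1", "198.51.100.7", 443, 5200)] * 40
flows += [("10.0.0.3", "198.51.100.8", 443, 4800)] * 35
flows += [("10.0.0.2", f"203.0.113.{i}", 22, 60) for i in range(1, 200)]   # a scanner

def per_source_features(flows):
    agg = defaultdict(lambda: {"dsts": set(), "bytes": 0, "flows": 0})
    for src, dst, port, nbytes in flows:
        agg[src]["dsts"].add(dst)
        agg[src]["bytes"] += nbytes
        agg[src]["flows"] += 1
    return {s: (len(a["dsts"]), a["bytes"] / a["flows"]) for s, a in agg.items()}

def zscores(values):
    mu, sigma = statistics.mean(values), statistics.pstdev(values) or 1.0
    return [(v - mu) / sigma for v in values]

feats = per_source_features(flows)
srcs = list(feats)
fanout_z = zscores([feats[s][0] for s in srcs])      # detector 1: destination fan-out
tiny_z = zscores([-feats[s][1] for s in srcs])       # detector 2: unusually small flows

# Combine detectors: flag only when both agree, keeping false positives low.
for s, z1, z2 in zip(srcs, fanout_z, tiny_z):
    if z1 > 1.0 and z2 > 1.0:
        print("anomalous source:", s)                # flags only the scanner
```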
12/06/2013 @ 11:15, room BC 420
Imaging Arithmetic: Physics U Math > Physics + Math
Prof. Gaurav Sharma, University of Rochester
For several real-world problems, signal and image processing approaches are most successful when they combine the insight offered by the physics underlying the problem with the mathematical framework and tools inherent in digital signal and image processing. Electronic imaging systems are a particularly fertile ground for problems in this class because they deal specifically with the capture of physical scenes and with the reproduction of images on physical devices. In this presentation, we highlight specific examples of problems in electronic imaging for which the combination of physical insight, mathematical tools, and engineering ingenuity leads to particularly elegant and effective solutions.
We illustrate the above ideas in some depth using a number of case studies drawn from our research in electronic imaging, in each case highlighting how the combination of physical modeling/insight with mathematical analysis enables solutions that neither of these tools alone can provide. The examples cover a wide range of applications, including methods for show-through cancelation in scanning, print watermarks detectable by viewers without using any viewing aids, multiplexed images that are revealed under varying illumination, improved metrics for the accuracy of color capture devices, and color halftone separation estimation from scans.
Organized by SuRI 2013

12/06/2013 @ 14:00, room BC 420
Cryptosense: Security Analysis for Cryptographic APIs
Prof. Graham Steel, CNRS, ENS de Cachan and INRIA
In practice, most developers use cryptography via an application program interface (API) either to a software library or a hardware device where keys are stored and all cryptographic operations take place. Designing such interfaces so that they offer flexible functionality but cannot be abused to reveal keys or secrets has proved to be extremely difficult, with a number of published vulnerabilities in widely-used APIs appearing over the last decade.
This talk will discuss research on the use of formal methods to specify and verify such interfaces in order to either detect flaws or prove security properties. We will focus on the example of RSA PKCS#11, the most widely used interface for cryptographic devices, and show how research has progressed from initial theoretical results through to a powerful tool, the Cryptosense Analyzer, which can reverse engineer the particular configuration of PKCS#11 in use on some device under test, construct a model of the device's functionality, and call a model checker to search for attacks. If an attack is found, it can be executed automatically on the device, and advice for secure configuration is given. The talk will conclude with a live demonstration.
Organized by SuRI 2013
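To illustrate the kind of reasoning such a tool automates, here is a toy model of two PKCS#11-style commands (wrap and decrypt) together with a breadth-first search for a command sequence that reveals a key marked sensitive; it rediscovers the classic wrap-then-decrypt attack. The model, state representation and key values are deliberately minimal and are not the Cryptosense Analyzer's actual model or API.

```python
from collections import deque
from itertools import permutations

# Toy device: handle k1 is sensitive, handle k2 is an exportable wrapping key.
KEYS = {"k1": "SECRET-1", "k2": "SECRET-2"}
SENSITIVE = {"k1"}

def wrap(known, a, b):
    """C_WrapKey-like command: the attacker obtains key a encrypted under key b."""
    return known | {("enc", KEYS[a], KEYS[b])}

def decrypt(known, ciphertext, b):
    """C_Decrypt-like command: decrypt an attacker-supplied ciphertext under key b."""
    _, plain, wrapping_key = ciphertext
    return known | {("plain", plain)} if wrapping_key == KEYS[b] else known

def leaked(known):
    return any(("plain", KEYS[k]) in known for k in SENSITIVE)

def search(max_depth=3):
    """Breadth-first search over command sequences for one leaking a sensitive key."""
    queue = deque([(frozenset(), [])])
    while queue:
        known, trace = queue.popleft()
        if leaked(known):
            return trace
        if len(trace) >= max_depth:
            continue
        for a, b in permutations(KEYS, 2):
            queue.append((frozenset(wrap(known, a, b)), trace + [f"wrap({a},{b})"]))
        for ct in [t for t in known if t[0] == "enc"]:
            for b in KEYS:
                queue.append((frozenset(decrypt(known, ct, b)), trace + [f"decrypt(.,{b})"]))
    return None

print(search())    # e.g. ['wrap(k1,k2)', 'decrypt(.,k2)']
```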
12/06/2013 @ 15:15, room BC 420
Surveillance, censorship and the Tor network
Jake Appelbaum, T&A
Surveillance and censorship are increasing concerns for people worldwide. The Tor network is one of many tools that users may use to protect themselves - it is a distributed peer-to-peer system that utilizes strong cryptography to protect users online.
Organized by SuRI 2013

12/06/2013 @ 16:30, room BC 420
Unsupervised Network Anomaly Detection
Dr. Philippe Owezarski, CNRS, Toulouse
Network anomaly detection is a critical aspect of network management, for instance for QoS, security, etc. The continual emergence of new anomalies and attacks creates a continuous challenge to cope with events that put the network integrity at risk. Most network anomaly detection systems proposed so far employ a supervised strategy to accomplish the task, using either signature-based detection methods or supervised-learning techniques. However, both approaches present major limitations: the former fails to detect and characterize unknown anomalies (leaving the network unprotected for long periods), while the latter requires training and labeled traffic, which is difficult and expensive to produce. Such limitations impose a serious bottleneck on the problem just described. We introduce an unsupervised approach to detect and characterize network anomalies, without relying on signatures, statistical training, or labeled traffic, which represents a significant step towards the autonomy of networks. Unsupervised detection is accomplished by means of robust data-clustering techniques, combining Sub-Space clustering with Evidence Accumulation or Inter-Clustering Results Association, to blindly identify anomalies in traffic flows. Several post-processing techniques, such as correlation, ranking and characterization, are applied on extracted anomalies to improve results and reduce operator workload. The detection and characterization performances of the unsupervised approach are evaluated on real network traffic.
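A small sketch of the unsupervised recipe: describe each traffic aggregate by a few features, look for isolated points in several feature sub-spaces, and accumulate evidence across sub-spaces so that points flagged repeatedly rise to the top of the ranking. The grid-density "clustering", feature set and thresholds below are simplified stand-ins for the sub-space clustering and evidence accumulation techniques of the talk.

```python
import random
from collections import Counter, defaultdict
from itertools import combinations

random.seed(1)
# Toy per-source traffic aggregates: (num_destinations, avg_packet_size, syn_ratio)
points = [(random.gauss(10, 2), random.gauss(800, 50), random.gauss(0.1, 0.02))
          for _ in range(200)]
points.append((250.0, 60.0, 0.95))           # an injected, scan-like anomaly

def subspace_outliers(points, dims, widths, min_cluster=3):
    """Grid-density 'clustering' in one 2-D sub-space: points that fall in a
    sparsely populated cell are flagged for this sub-space."""
    cell = lambda p: tuple(int(p[d] // widths[d]) for d in dims)
    counts = Counter(cell(p) for p in points)
    return {i for i, p in enumerate(points) if counts[cell(p)] < min_cluster}

widths = {0: 5.0, 1: 100.0, 2: 0.05}          # cell sizes per feature
evidence = defaultdict(int)
for dims in combinations(range(3), 2):        # every 2-D sub-space
    for i in subspace_outliers(points, dims, widths):
        evidence[i] += 1                      # evidence accumulation

ranking = sorted(evidence, key=evidence.get, reverse=True)
top = ranking[0]
print("most suspicious aggregate:", top, "evidence:", evidence[top])
print("injected anomaly:", len(points) - 1, "evidence:", evidence[len(points) - 1])
# The injected point is flagged in all three sub-spaces (maximal evidence).
```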
Organized by SuRI 2013

17/06/2013 @ 10:00, room BC 420
Internet privacy: Towards more transparency
Balachander Krishnamurthy, AT&T Labs Research
Internet privacy has become a hot topic recently with the radical growth of Online Social Networks (OSN) and attendant publicity about various leakages. For the last several years, we have been examining aggregation of users' information by a steadily decreasing number of entities as unrelated Web sites are browsed. I will present results from several studies on leakage of personally identifiable information (PII) via Online Social Networks and popular non-OSN sites. Linkage of information gleaned from different sources presents a challenging problem to technologists, privacy advocates, government agencies, and the multi-billion dollar online advertising industry. Economics might hold the key in increasing transparency of the largely hidden exchange of data in return for access to so-called free services. I will also talk briefly about doing privacy research at scale.
Organized by SuRI 2013

17/06/2013 @ 14:00, room BC 420
Developing Security Protocols by Refinement
Prof. David Basin, ETH Zurich
We survey recent work of ours on developing security protocols by step-wise refinement. We present a refinement strategy that guides the transformation of abstract security goals into protocols that are secure when operating over an insecure channel controlled by a Dolev-Yao-style intruder. The refinement steps used successively introduce local states, an intruder, communication channels with security properties, and cryptographic operations realizing these channels. The abstractions used provide insights on how the protocols work and foster the development of families of protocols sharing a common structure and properties. In contrast to post-hoc verification methods, protocols are developed together with their correctness proofs. We have implemented our method in Isabelle/HOL and used it to develop a number of entity authentication and key transport protocols.
(Joint work with Christoph Sprenger, ETH Zurich)
Organized by SuRI 2013

18/06/2013 @ 10:00, room BC 420
A Decade of Building Broken Chips
Prof. Krishna V. Palem, Rice University
Well over a decade ago, many believed that an engine of growth driving the semiconductor and computing industries, captured nicely by Gordon Moore’s remarkable prophecy (Moore’s law), was speeding towards a dangerous cliff-edge. Ranging from expression of concern to doomsday scenarios, the exact time when serious hurdles would beset us varied quite a bit—some of the more optimistic warnings giving Moore’s law till 2020! Needless to say, a lot of people have spent time and effort with great success to find ways for substantially extending the time when we would encounter the dreaded cliff-edge, if not avoid it altogether. When faced with this issue, I decided to consider a different approach—one which suggested falling off the metaphorical cliff as a design choice, but in a controlled manner. This would result in devices that could switch and produce bits that are correct, namely have the intended value, only with a probabilistic guarantee. As a result, the results could in fact be incorrect. Such devices and associated circuits and computing structures are now broadly referred to as inexact designs, circuits and architectures. In this talk, I will start with the beginnings of this idea in 2002—one that Technology Review labeled as being heretical in their TR10 citation—and give an overview of a range of ideas that my students and other groups around the world have been developing since, that embody inexact computing today. Despite being probabilistic, inexact designs can be significantly more efficient in the energy they consume, their speed of execution and area needs, which makes them attractive for resilient applications which can tolerate error. I will also contrast this style of design with traditional approaches with a rich history, aimed at realizing reliable computing from unreliable elements, starting with von Neumann’s influential lectures and further developed elegantly by Shannon-Weaver and others.
Organized by SuRI 2013

18/06/2013 @ 14:00, room BC 420
Turbo-Decoding of RNA Secondary Structure
Prof. Gaurav Sharma, University of Rochester
RNA has recently emerged as an important molecule in cellular biology with several direct functional roles in addition to its traditionally known role as an intermediary carrier of the code for protein synthesis. For RNAs that serve these direct noncoding roles, knowledge of the molecular structure is fundamental to understanding their function, to finding new RNA genes, and to the design of targeted therapeutics. Due to the difficulty and cost associated with experimental procedures for determining structure, computational methods for predicting structure are of significant research interest. In this talk, we focus on computational techniques for RNA secondary structure prediction – the first step in the hierarchy of RNA structure estimation that predicts the folding configuration of a linear RNA chain. Specifically, we consider methods that predict secondary structure for multiple RNA homologs by combining intra-sequence folding information and inter-sequence alignment information. The comparative analysis implicit in these multi-sequence methods provides significant improvements in accuracy over single sequence prediction methods but the resulting computational complexity is often prohibitive. We present our recent work that addresses this challenge via a novel iterative algorithm, TurboFold. TurboFold formulates RNA folding in a probabilistic framework as the problem of estimating base pairing probabilities and iteratively re-computes these base pairing probabilities by combining intrinsic information, derived from the sequence itself via a thermodynamic model, with extrinsic information, derived from the other sequences in the input set. This process yields updated estimates of base pairing probability, which are in turn used to recompute the extrinsic information for subsequent iterations, resulting in the overall iterative estimation procedure that defines TurboFold. We benchmark TurboFold against alternative methods and highlight several of its advantages. Finally, we explore connections with turbo decoding in digital communications, which inspired TurboFold, and outline our continuing research that seeks to develop algorithms for estimating RNA structure from multiple homologs with high accuracy and low computational cost by using iterative algorithms for approximate maximum a posteriori estimation of RNA structure.
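The iterative structure can be sketched as follows, with random toy matrices standing in for real thermodynamic pairing probabilities and with the alignment mapping omitted; the point is only to show how intrinsic and extrinsic information are combined and re-estimated, not TurboFold's actual update rules.

```python
import numpy as np

rng = np.random.default_rng(0)
L, K = 30, 3                       # toy setting: 3 homologous sequences, length 30

def toy_intrinsic(rng, L):
    """Stand-in for base pairing probabilities from a thermodynamic model."""
    m = rng.random((L, L))
    m = (m + m.T) / 2
    return m / m.sum(axis=1, keepdims=True)

intrinsic = [toy_intrinsic(rng, L) for _ in range(K)]
pairing = [p.copy() for p in intrinsic]

for iteration in range(5):
    new_pairing = []
    for k in range(K):
        # Extrinsic information: pairing evidence pooled from the *other*
        # sequences (a real implementation maps it through the alignment).
        extrinsic = sum(pairing[j] for j in range(K) if j != k) / (K - 1)
        combined = intrinsic[k] * extrinsic        # combine both information sources
        combined /= combined.sum(axis=1, keepdims=True)
        new_pairing.append(combined)
    pairing = new_pairing                          # feed back and iterate

print(pairing[0].sum(axis=1)[:3])                  # rows stay normalized: [1. 1. 1.]
```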
Organized by SuRI 2013

20/06/2013 @ 10:00, room BC 420
Brute force searching, the typical set and Guesswork
Prof. Ken Duffy, National University of Ireland Maynooth
The talk explores the murky waters between computational security, probability and information theory. If an object is selected from a finite list and an inquisitor can ask "is the object X?", in cryptography it is deemed to be computationally secure so long as the list is large enough. Implicit in this is that the inquisitor has no prior information about the distribution from which the object was selected.
The two questions this talk addresses are: if the object was selected stochastically with probabilistic properties known to the inquisitor, what is the distribution for how many attempts it takes them to correctly guess the object? If the object is selected from a source subject to typical set coding, does the Asymptotic Equipartition Property mean we can assume it was uniformly distributed?
Building on curious work that began with J. Massey in '94 and E. Arikan in '96 (with a significant contribution from EPFL's Charles Pfister, in collaboration with W. Sullivan in '04), answering the first question gives a surprising estimate, related to stochastic orders, based on a Legendre-Fenchel transform of a function of Renyi entropy. Worryingly, the second answer transpires to be negative: the uniform AEP ansatz leads to a guessing problem that's exponentially harder in word length than the true typical set guesswork problem.
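For background, the classical moment result this line of work builds on (due to Arikan, stated here in a standard form rather than as the talk's new contribution) is that, for an optimal guessing order over i.i.d. words X^n,

```latex
\[
  \lim_{n \to \infty} \frac{1}{n} \log \mathbb{E}\!\left[ G(X^n)^{\rho} \right]
  \;=\; \rho \, H_{\frac{1}{1+\rho}}(X), \qquad \rho > 0,
\]
```

where G(X^n) is the number of guesses required and H_a denotes the Renyi entropy of order a; the large-deviation behaviour of (1/n) log G(X^n) is then extracted from this limit via a Legendre-Fenchel transform, which is the estimate alluded to above.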
This talk is based on work with M. Christiansen (NUIM), as well as with F. du Pin Calmon & M. Medard (MIT).
Organized by SuRI 2013

20/06/2013 @ 11:15, room BC 420
Demand Side Energy Management in Smart Buildings
Prof. Prashant Shenoy, University of Massachusetts
Today, buildings account for nearly 75% of the electricity usage and 40% of the total energy usage in modern economies. Consequently, techniques for reducing the energy and carbon footprint of buildings have emerged as an important research topic. Demand side energy management refers to techniques employed by smart buildings or their occupants to modulate the energy usage for various energy- or grid-level optimizations. In this talk, I will argue that modeling and prediction of electrical loads is a key ingredient of any demand-side energy management technique. I will describe opportunities and challenges in modeling these loads and describe our recent work in empirically characterizing and modeling common residential loads. I will also describe how better modeling can result in more effective demand side energy management and better energy efficiency improvements in the smart grid.
Organized by SuRI 2013

20/06/2013 @ 15:15, room BC 420
Querying Big, Dynamic, Distributed Data
Prof. Minos Garofalakis, Technical University of Crete
Effective Big Data management and analysis poses several difficult challenges for modern database architectures. One key such challenge arises from the naturally streaming nature of big data, which mandates efficient algorithms for querying and analyzing massive, continuous data streams (that is, data that is seen only once and in a fixed order) with limited memory and CPU-time resources. Such streams arise naturally in emerging large-scale event monitoring applications; for instance, network-operations monitoring in large ISPs, where usage information from numerous sites needs to be continuously collected and analyzed for interesting trends. In addition to memory- and time-efficiency concerns, the inherently distributed nature of such applications also raises important communication-efficiency issues, making it critical to carefully optimize the use of the underlying network infrastructure. In this talk, we introduce the distributed data streaming model, and discuss some of our recent results on tracking complex queries over massive distributed streams, as well as new research directions in this space.
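As one concrete flavour of the memory- and communication-efficient machinery used in this setting, the sketch below implements a Count-Min sketch: each site summarizes its stream in a small counter array, sketches from distributed sites merge by simple addition, and frequency queries are answered approximately from the merged summary. This is a generic textbook data structure offered as background, not one of the talk's specific algorithms.

```python
import hashlib

class CountMinSketch:
    """Approximate frequency counts over a stream in O(width * depth) space."""
    def __init__(self, width=256, depth=4):
        self.width, self.depth = width, depth
        self.table = [[0] * width for _ in range(depth)]

    def _buckets(self, item):
        for row in range(self.depth):
            digest = hashlib.blake2b(item.encode(), salt=bytes([row])).digest()
            yield row, int.from_bytes(digest[:8], "big") % self.width

    def add(self, item, count=1):
        for row, col in self._buckets(item):
            self.table[row][col] += count

    def estimate(self, item):
        # Never under-estimates; hash collisions can only inflate the count.
        return min(self.table[row][col] for row, col in self._buckets(item))

    def merge(self, other):
        """Combine the summaries of two sites by adding counters cell-wise."""
        for row in range(self.depth):
            for col in range(self.width):
                self.table[row][col] += other.table[row][col]

site_a, site_b = CountMinSketch(), CountMinSketch()
for _ in range(900):
    site_a.add("10.0.0.1")               # heavy hitter observed at site A
for _ in range(300):
    site_b.add("10.0.0.1")               # ... and also at site B
site_b.add("192.0.2.9")

site_a.merge(site_b)                     # ship the small sketch, not raw flows
print(site_a.estimate("10.0.0.1"))       # ~1200, never below the true count
```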
Organized by SuRI 2013