## Research

### Multi-level Source Coding

A set of information sources (e.g., bitstreams) are jointly encoded by K encoders. There are N decoders, each of which has access to a non-empty subset of the encoded streams. Each decoder is assigned with a (ordered) level, which depends on the subset of compressed streams it has access to. The decoder at level m is interested in recovering a subset of the original information sources including streams labeled by 1,2,...,m. The goal is find admissible rate region, the minimum number of bits each encoder needs for compression such that the decoders can use the compressed streams for decoding.

This is motivated by asymmetric (unequal) reliabilities of the disks in distributed disk storage application. Moreover, packet erasure applications, such as Internet, in which the erasure probabilities for the sub-packets may be unequal because the paths over which they are sent may have different reliabilities, is a special case of this  problem.

We obtain a single letter characterization of the complete achievable rate region for the 3-description problem. In doing so, we show that it is necessary to jointly encode independent sources (i.e., similar to network coding), and that linear codes are optimal for this problem.

### Approximation of the Multiple Description Problem

In this problem a single source sequence is encoded into several streams, each of a certain rate. There are multiple users, each observes a subset of the compressed streams and aims to reconstruct the source within a certain fidelity. The  design object here is to give distortion guarantee depending on the set of available descriptions for each user. The fundamental question in the so-called Multiple Description (MD) source coding problem is to characterize the trade-off between the tuple of admissible rates and users’ distortions.

A multiple description code is important in application requiring robust communication, as well as in distributed storage systems, where each user may have access to only some if the servers.

This problem was introduced in the late 1970's, and has remained unsolved except for a special case in which K=2, and the source sequence is sampled from a Gaussian distribution.  However, it can be considered as a lossy version of the multi-level diversity coding problem. Given the results obtained for the lossless problem,  we make progress on the MD problem by giving an approximate characterization of the rate-distortion region for the Gaussian source with quadratic distortion. We show that the rate region can be sandwiched between two polytopes with matching hyperplanes, between which the gap can be upper bounded by constants dependent on the number of the descriptions, but not the distortion constraints.

### Relay-Interference Networks

Information transmission in a shared medium is one of the fundamental problems in wireless communication. In such situation a wireless channel is shared between several sources and receivers, and several information flow are competing for resources.  Here, a fundamental question is how to manage interference in a wireless network.

In this work, we study the relay-interference wireless network, in which two transmitters are attempting to communicate with two receivers, each interested in distinct transmitter messages. There are relay (helper) nodes to facilitate competing information flows over a wireless network. We examine this in the context of a deterministic wireless interaction model, which eliminates the channel noise and focuses on the signal interactions.  Using this model, we show that almost all the known schemes such as interference suppression, interference alignment and interference separation are necessary for relay-interference networks. In addition, we discover a new interference management technique, which we call interference neutralization, which allows for over-the-air interference removal, without the transmitters having complete access the interfering signals. We show that interference separation, suppression, and neutralization arise in a fundamental manner, since we show complete characterizations for special configurations of the relay-interference network.

Then, we generalize some of our result to Gaussian noisy wireless networks, where we can get grateful insight through the deterministic results to obtain an approximation (within constant gap) for the capacity region of such networks.

This is the first work in the context of deterministic wireless multiple unicast networks, and the result is quit important in the sense that the new developed transmission technique is shown to be crucial in achieving the capacity region of such networks.

### Information Theoretic Approach to Bioinformatics

DNA sequencing, referring to the process of determining the precise order of nucleotides within a DNA molecule, is the basic workhorse of modern day biology and medicine. The DNA is only observed through a large number of reads, which are short segments sampled from random locations of the sequence, and corrupted by noise. The main task in de novo assembly is to merge the reads, and determine the underlying sequence.

Comparison of the finished DNAs and study of the process of the underlying dynamics show that DNAs of different individuals are significantly similar, and hence, an already assembled DNA can be used as a representative example to facilitate sequencing. Reference genomes are typically used as a guide on which new genomes are built, enabling them to be assembled much more quickly and cheaply than de novo  assembly. Reference-based shotgun sequencing refers to the reconstruction of DNA based on both a reference and set of reads. This is equivalent is to figure out the variations between the target DNA and the one used as a reference.

A common theme in all variations of this problem is to extract a desired information from a set of observations (measurements) with stochastic behavior, where a certain cost has to be paid per unit of observation. We are interested in characterizing the fundamental limits of the problem, such as the minimum number of observation from which the desired information can be reliably determined. Moreover, understanding the trade-off between the cost of observations and the complexity of information extraction is a very interesting problem, which lies in the scope of this work.

### Interference Networks with Output Feedback and/or Delayed CSIT

Various interference management techniques have been proposed over the past few decades. However, these techniques are usually based on availability of instantaneous channel state information (CSI) at the transmitters. Such an assumption is perhaps not very realistic in practical systems, at least when dealing with fast fading links. Surprisingly, it is shown by Maddah-Ali and Tse that even delayed CSI is helpful to improve the achievable rate of wireless network with multiple flows, even if the channel realizations are independent over time. Availability of delayed CSI has been examined for various networks by different research groups, and have shown to be quite useful. A general study of effect of delayed CSIT on the performance of a wireless network is however still missing.

On the other hand, it is well-known that output feedback does not increase the capacity of point-to-point discrete memoryless channels. However, output feedback is beneficial in improving the capacity regions of more complex networks. This because in a source distributed network, output feedback allow each transmitter to (partially and causally) learn the signal sent by the other transmitter.

We have studied various multi-user networks configurations (including K-user SISO Interference and X channels, as well as 2-user MIMO Interference and X channels). In each configuration, characterizing the degrees of freedom for the network under different models of availability of delayed CSI and/or delayed feedback. It is shown that for certain network configurations, combination of delayed CSI and output feedback is as strong as having instantaneous CSI at the transmitters. Many other interesting results are reported for various networks.

### Network Coding

Consider a large network with one source and one destination node and edges with unit capacity. It is known that the point-to-point capacity of such network can be achieved by routing a single bit. Such simple scheme does not achieve the capacity of the network when a single source communicates to many receivers. However, the multicast theorem in network coding shows that, the common min-cut value towards N> 1 receivers can be achieved using packets of length log Nbits and linear operations at intermediate nodes, if the operations are deterministically known at the receivers.

In practical networks, where such deterministic knowledge is not sustainable, the most popular approaches are to append coding vectors at the headers of the packets to keep track of the operations performed on the source packets, or using algebraic subspace coding constructions which does not need the coding vector overheads. In this work we examine what are the information theoretical rates that can be achieved in a network where the intermediate node operations are unknown.

### Averaging Dynamics

One of the key challenges in modeling of social interactions in a complex multi-agent environment is the modeling of opinion  dynamics and how agents influence each  other’s opinion and how new technology  and ideas diffuse through such a network. One of the models that addresses such  dynamics is Hegselmann-Krause dynamics, which is a well accepted model for many engineering applications. Example of such applications include distributed rendezvous problem in a robotic network such as a network of space shuttles, where one may want to gather a set of robots which lack a central coordination to a common place.

Characterizing of the termination time of Hegselmann-Krause dynamics is a challenging problem, and is still unknown, even in the case of scalar dynamics. Although it is known that the termination time of the dynamics any dynamics with n agents is at least      , the best known upper bound for the termination time is           . In this work we introduce a new Lyapunov-like function and show that any dynamics arrives to the steady state in at most           steps, which improves the state of the art by an order of n.

### Reliability and Secrecy in Wireless Networks

Data transmission over relay nodes relies on the assumption that each relay node performs a predetermined function on its received signal and send the result over the network. However, a wireless relay network may have malfunctioning or malicious nodes (inserting errors), and existence of such nodes can reduce the capacity of the network. We develop the model for wireless network communication in the presence of a Byzantine adversary for different assumptions on channel/adversarial knowledge. We examine coding schemes which utilize signal interactions in the deterministic wireless network to reliably deliver information in the presence of such errors due to a Byzantine adversary.

On the other hand, reliably and secrecy of data transmission is an important issue in communication over a wireless network in presence of malicious nodes. Such node may wish to limit the communication by sending jamming signal to the network, or pretending to be the legitimate transmitter. On the other hand, the application may pose to keep the message private from all the nodes in the network, except the legitimate receiver. Both these problems have been well studied in point-to-point communication, under the topics of arbitrarily varying channels, and wiretap channels, respectively. The role of malicious nodes in wired networks has also received recent significant attention in network coded systems. However, these topics and many related problems have to be addressed in the context of wireless network. This is the topic of this project.

### Network Tomography and Group Testing

Motivated by applications in network tomography, sensor networks and infection propagation, we formulate non-adaptive group testing problems on graphs. Unlike conventional group testing problems each group here must conform to the constraints imposed by a graph.

For interesting classes of graphs we arrive at a rather surprising result, namely, that the number of tests required to identify defective items is substantially similar to that required in conventional group testing problems, where no such constraints on pooling is imposed.

### Search in Non-metric Space

In many situations we want to search and navigate a collection of objects in a space with unknown underlying relationship between the objects. More precisely, consider a database with some form of similarity or distance between objects which can not be quantified. More importantly, the similarity relationship does not necessarily satisfy nice properties. Further, tagging the objects is either too costly, or simply not possible because it is hard to define meaningful labels. Such situations are common when human perception is involved, for instance in comparing images, music, videos, and websites. A fundamental challenge here is to design perceptual human-assisted search algorithms. Sorting and ordering is a much harder task for human being compared to pairwise comparison. This    motivates designing efficient algorithms to find the most similar object to a given query by asking human users questions of the type: does object A look more like object B or object C?''.

We proposed a new algorithm, and analyzed its performance based on the number of question have to be answered by users as well as the amount of memory required to store the data. We believe that progress on this problem would lead to a large number of novel applications, ranging from a criminal identification system for law-enforcement to a company image phonebook, and perhaps to social networks, recommendation systems or new e-commerce applications.

### Privacy in Smart Grid

Smart meter collects the electrical energy consumption in short time intervals and report this information to the utility. Such high resolution data can be used by the utility in the purpose of monitoring, billing and efficient scheduling. However, this information can be used to make inference about what each single consumer does in any specific time interval, and thereby it is natural to ask about the privacy, specially of the residential consumers.

There is a tradeoff between the accuracy of the transmitted data and the user privacy. We look at this problem in an abstract way, and use the rate-distortion theory to precisely quantify the tradeoff between the accuracy of reported data (mean square distortion) and privacy (information leakage) for our proposed model. We show that the privacy-utility tradeoffs on the total load are achievable using an interference-aware reverse water-filling solution, which intuitively translates to suppressing low energy components.

### Compressed Sensing

Compressed sensing (CS) deals with the reconstruction of sparse signals from a small number of linear measurements. One of the main challenges in CS is to find the support of a sparse signal from a set of noisy observations. In the CS literature, several information-theoretic bounds on the scaling law of the required number of measurements for exact support recovery have been derived, where the focus is mainly on random measurement matrices.

In this work, we investigate the support recovery problem from an estimation theory point of view, where no specific assumption is made on the underlying measurement matrix. By using the Hammersley-Chapman-Robbins (HCR) bound, we derive a fundamental lower bound on the performance of any unbiased estimator which provides necessary conditions for reliable l2-norm support recovery. We then analyze the optimal decoder to provide conditions under which the HCR bound is achievable. This leads to a set of sufficient conditions for reliable l2-norm support recovery.

The approach used in this paper is different than the usual ones in the recent works. The interesting point about this method is its capability to be applied to arbitrary measurement matrices.

### Source Transmission over Networks

We studied the problem of lossless transmission of a set of correlated sources over a deterministic relay network for two traffic requirements, in two slightly different scenarios: In distributed multicast, where the set of sources have to be delivered to a set of destinations; And in the source exchange problem, where all the nodes with access to a source should be able to reconstruct all other sources observed at other nodes. We develop achievable rate regions and outer bounds for both these situations. For linear deterministic networks, these bounds coincide, yielding a complete characterization.

### Lossy Source Coding over Cascade Network

A lossy source coding with side information problem with one encoder and two cascaded decoders is considered. Several  variations of this problem have been studied for various source and side information distributions, availability  of the side information at encoder and/or either of the decoders. In this work we study this problem for binary source with erasure side information. The set of achievable rate-distortion triples is characterized for three instances of this problem when Hamming distortion is used to quantify the reconstruction quality.

### Huffman Coding

It is well-known that Huffman encoding algorithm provides an optimal prefix-free code for a discrete memoryless source, in the sense that for a given source distribution no other code can have a shorter expected length than that of the Huffman code. The redundancy of a Huffman code is known to be non-negative and exceeding 1. These bounds can be improved if partial knowledge about the source distribution is available.

Our focus in this work is on a class of sources, for which the frequency of only one of the symbols is know. We proved Ye and Yeung’s conjecture about the maximum redundancy of such Huffman codes, which yields in a tight upper bound. We also derived a tight lower bound for the redundancy. In consequence, we introduce a class of distributions that can achieve the derived lower bound, and investigate properties of the Huffman codes induced by such distributions.

### LT Codes and Raptor Codes

The combination of computational and error-correction performance of iterative decoding algorithms has made them the prime choice for practical error correcting systems that operate at rates extremely close to the capacity of the underlying channel. The first such algorithm is due to Gallager who  invented a simple message passing algorithm for LDPC codes using  a version of the majority-logic algorithm.

Fountain codes are a new class of codes, originally designed for transmission of information over binary erasure channels with unknown erasure probabilities. Sub-classes of such codes, such as LT-codes and Raptor codes have very efficient decoding algorithms, and can perform arbitrarily close to the capacity of the  channel.

In this work we investigated the performance of Raptor Codes using Gallager's majority decoding algorithm on the binary symmetric channels. We obtain equations which relate the error probability to the output node degree distribution and then we design good degree distributions using the Differential Evolution method.