Presentation Slide

Download Report

Transcript Presentation Slide

Jointly Optimal Transmission and
Probing Strategies for
Multichannel Systems
Saswati Sarkar
University of Pennsylvania
Joint work with Sudipto Guha (Upenn) and Kamesh
Munagala (Duke)
Multichannel Systems
• Current wireless networks have access to
multiple channels
– Frequencies in FDMA, codes in CDMA, antennas in
MIMO, polarization states of antennas, access points
in systems with multiple access points
– IEEE 802.11a, IEEE802.11b, IEEE802.11h
• Future wireless networks are likely to have
access to larger number of channels
– Potential deregulation of spectrum
– Advances in device technology
Multichannel systems
• Transmission quality in different channels
stochastically vary with time.
• A user is likely to find a channel with acceptable
transmission quality when the number of
channels is large,
– if he samples all the channels
• Sampling consumes energy and generates
interference for other users
• Amount of information a user acquires about its
channels becomes an important decision
variable.
Policy Decisions
• Probing policy
– How many channels to probe?
– Which channels to probe?
– Sequence of probing
• Selection policy
– Which channel to transmit (probed channel or
unprobed channel)?
System Goal
• Gain is the difference between the
expected rate of successful transmission
and the cost of probing
• Maximize the system gain
– Need to jointly optimize the probing and
selection policies
– Policies will be adaptive
– Policies may depend on higher order statistics
and cross statistics of channels.
Sense of déjà vu?
• Stopping Time problem (Bertsekas, Dynamic
Programming, Vol. 1)
– Maximize reward by optimally selecting
between two control actions at any given time:
(a) continue or (b) stop
– IID channels (Kanodia et al, 2004, Ji et al, 2004)
• Generalized stopping time problems
– Select between multiple control actions
– Only limited results are known
Sense of déjà vu?
• Stochastic multi-armed bandit problem
(Bertsekas, Dynamic Programming, Vol. 2)
– Observe the state of only one arm at each slot
– Select the arm to observe and get reward
from it
– State of an arm changes only after you
observe it
– State changes are stochastic
– Gittins index based policies maximize the
reward (Gittins-Jones, 1974)
Sense of déjà vu?
• Adversarial multi-armed bandit problems (Auer et
al, 2002)
– State of an arm may change even when you don’t
observe it
– State changes are adversarial
– Some versions allow you to get reward from only one
selected arm and subsequently observe the states of
all arms in the slot
– Given a class of N policies and a time horizon of T
slots, bound the performance loss as compared to the
best of the policies in terms of O(sqrt(T * log N)).
Our Contribution
• Polynomial complexity optimal solution when
channels are mutually and temporally
independent and each channels has two states.
– Sigmetrics, 2006, poster paper
• Polynomial complexity approximate solution
when channels have multiple states
– Approximation ratio ½ for arbitrary number of states
– Approximation ration 2/3 for 3 states
System Model
•
•
•
•
Single user with access to n channels
Each channel can be in one of K states, 0,…K-1.
Channel i is in state j with probability pji
Probability of successful transmission in state j is
rj
• User knows the state of channel i in a slot only if
it probes it.
• Cost of probing channel i is ci
• Channel states are temporally and mutually
important
Optimal policy for K = 2
(Sigmetrics, 2006)
• Exhaustive (S, i) policy:
– probes all channels in set S,
– if no probed channel is in state 1, transmit in channel i (backup).
• Optimal policy is exhaustive (Si, i) for some i
– Si = {j: p1j(1-p1i) > cj}
– Probes all channels with high cross-uncertainty with the backup
channel
– Probes channels in decreasing order of p1j/cj
• Search space for optimal policy restricted (n policies)
• Determine the optimal by explicitly evaluating the gain in
the above class.
Additional challenges for K > 2
• For K = 2, the optimal policy is static
– Actions can be determined before observation
• For K > 3, the optimal policy may be adaptive
– While probed channels are in state 0, focus on channels with
high expected reward
– After observing a channel in intermediate state, focus on
channels which have higher probabilities of being in the highest
state
– After observing a channel in intermediate state, may select either
the observed channel or the backup.
• Optimal policy is a decision tree
– Exponential number of decision trees
– Storage of decision tree consumes exponential complexity
Constant Factor Approximation
Algorithm for K > 2
• Let policy A be the best among those that always
transmits in a probed channel
– Gain GA
• Let policy B be the best among those that
always transmits in an unprobed channel
– Policy B is clearly the one that never probes any
channel and transmits in the channel with the highest
expected transmission rate.
– Gain GB
• Theorem 1: Max(GA , GB) >= ½ Maximum Gain
– The best among A and B attains half the maximum
gain
Proof of Theorem 1
• Q: Probability that the optimal
policy transmit in an unprobed
channel
• OPT: Expected gain of the
optimal policy
• G: Expected gain of the optimal policy
given that it transmits in an unprobed
channel
Proof of Theorem 1
• Modify the optimal so that it transmits in the best probed
channel whenever it was using the backup
– Expected gain OPT’ which is less than or equal to GA
– Expected Gain given that the optimal uses a backup x
– OPT – OPT’ <= Q (G – x)
<= QG
<= GB
OPT <= OPT’ + GB
<= GA + GB
<= 2 max(GA , GB)
Missing link
• Need to compute the policy that
maximizes gains among those that
transmit only in probed channels.
• Once a channel is observed to be in state
u, probe only those channels for which the
expected improvement above u exceeds
cost.
Special case: K = 3
• Theorem 2: There exists an optimum
policy which uses a unique backup
channel.
– There exists a channel l which satisfies the
following characteristic: if the optimum policy
uses a backup it uses l.
Approximation algorithm for
attaining 2/3 approximation ratio
• Let P(l) be the class of policies which
– never probe channel l and
– never uses any channel as a backup other than l.
• Let C be the best policy in P(l) for the best
choice of l.
• Theorem 3: Max(GA , GB , GC) >= 2/3 Maximum
Gain
– Best among A, B, C attains 2/3 of the optimum gain.
Open Problems
• Is the optimization problem NP-hard when K exceeds 2?
• Can we get better approximation ratios?
• What happens when channels are not mutually
independent?
• What happens when channels are not temporally
independent?
• What happens when you can simultaneously transmit in
multiple channels?
– Power control
• What happens when you need not transmit in every slot?
– Lazy scheduling