Transcript slides

Learning and Testing
Submodular Functions
Grigory Yaroslavtsev
http://grigory.us
Slides at
http://grigory.us/cis625/lecture3.pdf
CIS 625: Computational Learning Theory
Submodularity
• Discrete analog of convexity/concavity: the "law of diminishing returns"
• Applications: combinatorial optimization, AGT, etc.
Let $f: 2^X \to [0, R]$:
• Discrete derivative: $\partial_x f(S) = f(S \cup \{x\}) - f(S)$, for $S \subseteq X$, $x \notin S$
• Submodular function: $\partial_x f(S) \ge \partial_x f(T)$ for all $S \subseteq T \subseteq X$, $x \notin T$
Approximating everywhere
• Q1: Can a submodular $f: 2^X \to [0, R]$ be approximated on all arguments with only $\mathrm{poly}(|X|)$ queries?
• A1: Only a $\Theta(\sqrt{|X|})$-approximation (multiplicative) is possible [Goemans, Harvey, Iwata, Mirrokni, SODA'09].
• Q2: What about only a $(1-\epsilon)$-fraction of arguments (PAC-style learning with membership queries under the uniform distribution)?
$\Pr_{\mathrm{randomness\ of\ } A}\Big[\Pr_{S \sim U(2^X)}\big[A(S) = f(S)\big] \ge 1 - \epsilon\Big] \ge \frac{1}{2}$
• A2: Almost as hard [Balcan, Harvey, STOC'11].
Approximate learning
• PMAC-learning (multiplicative), with $\mathrm{poly}(|X|)$ queries:
$\Pr_{\mathrm{rand.\ of\ } A}\Big[\Pr_{S \sim U(2^X)}\big[\tfrac{1}{\alpha} f(S) \le A(S) \le \alpha f(S)\big] \ge 1 - \epsilon\Big] \ge \frac{1}{2}$
where $\Omega(|X|^{1/3}) \le \alpha \le O(\sqrt{|X|})$ [Balcan, Harvey '11]
• PAAC-learning (additive):
$\Pr_{\mathrm{rand.\ of\ } A}\Big[\Pr_{S \sim U(2^X)}\big[\,|f(S) - A(S)| \le \beta\,\big] \ge 1 - \epsilon\Big] \ge \frac{1}{2}$
• Running time: $|X|^{O(R^2/\beta^2)} \cdot \log\frac{1}{\epsilon}$ [Gupta, Hardt, Roth, Ullman, STOC'11]
• Running time: $\mathrm{poly}\big(|X|^{R^2/\beta^2}, \log\frac{1}{\epsilon}\big)$ [Cheraghchi, Klivans, Kothari, Lee, SODA'12]
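Both guarantees quantify over a uniformly random argument, so for a fixed hypothesis the inner probability can be estimated by sampling. A small Monte Carlo sketch (the target `f` and hypothesis `A` below are hypothetical stand-ins, not the algorithms from the cited papers):

```python
import random

n = 20

def f(S):
    """Hypothetical target: a monotone submodular function, f(S) = min(|S|, 5)."""
    return min(len(S), 5)

def A(S):
    """Hypothetical learned hypothesis (a crude surrogate, for illustration only)."""
    return min(len(S), 4) + 1

def random_set():
    return {i for i in range(n) if random.random() < 0.5}  # S ~ U(2^X)

def pmac_fraction(alpha, samples=100_000):
    """Estimate Pr_{S ~ U(2^X)}[ f(S)/alpha <= A(S) <= alpha * f(S) ]."""
    return sum(
        f(S) / alpha <= A(S) <= alpha * f(S)
        for S in (random_set() for _ in range(samples))
    ) / samples

def paac_fraction(beta, samples=100_000):
    """Estimate Pr_{S ~ U(2^X)}[ |f(S) - A(S)| <= beta ]."""
    return sum(
        abs(f(S) - A(S)) <= beta
        for S in (random_set() for _ in range(samples))
    ) / samples

print(pmac_fraction(alpha=2), paac_fraction(beta=1))
```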
Learning $f: 2^X \to [0, R]$
• For all algorithms $\epsilon = \mathrm{const}$:
• Goemans, Harvey, Iwata, Mirrokni: $O(\sqrt{|X|})$-approximation everywhere; time $\mathrm{poly}(|X|)$
• Balcan, Harvey: PMAC (multiplicative $\alpha$), $\alpha = O(\sqrt{|X|})$; time $\mathrm{poly}(|X|)$; extra: under arbitrary distribution
• Gupta, Hardt, Roth, Ullman: PAAC (additive $\beta$); time $|X|^{O(R^2/\beta^2)}$; extra: tolerant queries
• Cheraghchi, Klivans, Kothari, Lee: PAAC (additive $\beta$); time $|X|^{O(R^2/\beta^2)}$; extra: SQ queries, agnostic
• Raskhodnikova, Y.: PAC for $f: 2^X \to \{0, \dots, R\}$ (bounded integral range $R \le |X|$); time $|X|^3 \cdot R^{O(R \log R)}$; extra: $\mathrm{polylog}(|X|) \cdot R^{O(R \log R)}$ queries
Learning: Bigger picture
• Additive (linear) ⊆ OXS ⊆ Gross substitutes ⊆ Submodular ⊆ XOS = Fractionally subadditive ⊆ Subadditive [Badanidiyuru, Dobzinski, Fu, Kleinberg, Nisan, Roughgarden, SODA'12]
• Coverage (valuations) ⊆ Submodular
Other positive results:
• Learning valuation functions [Balcan, Constantin, Iwata, Wang, COLT'12]
• $(1+\epsilon)$-PMAC-learning (sketching) of coverage functions [BDFKNR'12]
• $(1+\epsilon)$-PMAC-learning of Lipschitz submodular functions [BH'10] (concentration around the average via Talagrand's inequality)
Discrete convexity
• Monotone convex $f: \{1, \dots, n\} \to \{0, \dots, R\}$
[Figure: bar chart of a monotone convex function on $\{1, \dots, n\}$; it is non-constant on at most $R$ points (region marked "≤ R").]
• Convex $f: \{1, \dots, n\} \to \{0, \dots, R\}$
[Figure: bar chart of a convex function on $\{1, \dots, n\}$; it is non-constant only on the first points (region marked "≤ R") and the last points (region marked "≥ n − R").]
Discrete submodularity $f: 2^X \to \{0, \dots, R\}$
• Case study: $R = 1$ (Boolean submodular functions $f: \{0,1\}^n \to \{0,1\}$)
Monotone submodular = $x_{i_1} \vee x_{i_2} \vee \dots \vee x_{i_a}$ (monomial)
Submodular = $(x_{i_1} \vee \dots \vee x_{i_a}) \wedge (\bar{x}_{j_1} \vee \dots \vee \bar{x}_{j_b})$ (2-term CNF)
• Monotone submodular: [Figure: lattice of $2^X$ with the region $|S| \le R$ just above $\emptyset$ highlighted.]
• Submodular: [Figure: lattice of $2^X$ with the regions $|S| \le R$ above $\emptyset$ and $|S| \ge |X| - R$ below $X$ highlighted.]
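The $R = 1$ case can be sanity-checked by brute force. A sketch with one arbitrarily chosen formula of each type (the clause choices are illustrative only):

```python
from itertools import combinations

n = 5
X = range(n)
subsets = [frozenset(c) for r in range(n + 1) for c in combinations(X, r)]

def monotone_disjunction(S):
    """f = x0 or x1 or x4: a monotone Boolean submodular function."""
    return int(0 in S or 1 in S or 4 in S)

def two_term_cnf(S):
    """f = (x0 or x1) and (not x2 or not x3): positive clause AND negated clause."""
    return int((0 in S or 1 in S) and (2 not in S or 3 not in S))

def is_submodular(f):
    """Brute-force check of the diminishing-returns inequality."""
    return all(
        f(S | {x}) - f(S) >= f(T | {x}) - f(T)
        for S in subsets for T in subsets if S <= T
        for x in X if x not in T
    )

print(is_submodular(monotone_disjunction), is_submodular(two_term_cnf))  # True True
```

Note that a 2-term CNF with two positive clauses, e.g. $x_0 \wedge x_1$, fails this check ($\partial_{x_0} f(\emptyset) = 0 < \partial_{x_0} f(\{x_1\}) = 1$), which is why the second clause consists of negated variables.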
Discrete monotone submodularity
• Monotone submodular $f: 2^X \to \{0, \dots, R\}$
[Figure: lattice of $2^X$ with two sets $S_1, S_2$ in the region $|S| \le R$; by monotonicity every superset of $S_1$ has value $\ge f(S_1)$, every superset of $S_2$ has value $\ge f(S_2)$, and every common superset has value $\ge \max(f(S_1), f(S_2))$.]
Discrete monotone submodularity
• Theorem: for monotone submodular $f: 2^X \to \{0, \dots, R\}$ and all $T$: $f(T) = \max_{S \subseteq T,\, |S| \le R} f(S)$
• $f(T) \ge \max_{S \subseteq T,\, |S| \le R} f(S)$ (by monotonicity)
[Figure: lattice of $2^X$ with $T$ and the region of its subsets $S \subseteq T$ with $|S| \le R$.]
Discrete monotone submodularity
• $f(T) \le \max_{S \subseteq T,\, |S| \le R} f(S)$:
• Let $S'$ be a smallest subset of $T$ such that $f(T) = f(S')$.
• For every $x \in S'$ we have $\partial_x f(S' \setminus \{x\}) > 0$ (otherwise $S' \setminus \{x\}$ would be a smaller such set) ⇒ by submodularity $\partial_x f(A) > 0$ for all $A \subseteq S' \setminus \{x\}$, so the restriction of $f$ to $2^{S'}$ is strictly increasing ⇒ since $f$ is integer-valued and bounded by $R$, $|S'| \le R$.
[Figure: lattice of $2^X$ with $T$ and the set $S'$ satisfying $f(S') = f(T)$ and $\partial_x f(S' \setminus \{x\}) > 0$, inside the region $|S| \le R$.]
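The theorem is easy to verify exhaustively for a concrete monotone submodular function; a coverage function over a universe of size $R$ is a convenient choice (this specific instance is just an illustration):

```python
from itertools import combinations

# Coverage function over a universe of size R = 3: monotone submodular,
# with integral range {0, ..., R}.
R = 3
A = {0: {0}, 1: {0, 1}, 2: {2}, 3: {1, 2}, 4: {0, 2}}  # covering sets
n = len(A)

def f(S):
    covered = set()
    for i in S:
        covered |= A[i]
    return len(covered)

def max_over_small_subsets(T):
    """max of f(S) over S subseteq T with |S| <= R."""
    return max(
        f(set(S))
        for r in range(min(R, len(T)) + 1)
        for S in combinations(sorted(T), r)
    )

# Verify f(T) = max_{S subseteq T, |S| <= R} f(S) for every T.
for r in range(n + 1):
    for T in combinations(range(n), r):
        assert f(set(T)) == max_over_small_subsets(set(T))
print("theorem verified on this example")
```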
Representation by a formula
• Theorem: for monotone submodular $f: 2^X \to \{0, \dots, R\}$ and all $T$:
$f(T) = \max_{S \subseteq T,\, |S| \le R} f(S)$
• Alternative notation: $X \to [n]$, $2^X \to (x_1, \dots, x_n)$
• Boolean $k$-DNF: $\bigvee_i \big(x_{i_1} \wedge x_{i_2} \wedge \dots \wedge x_{i_k}\big)$
• Pseudo-Boolean $k$-DNF ($\vee \to \max$, $A_i = 1 \to A_i \in \mathbb{R}$):
$\max_i \big[A_i \cdot (x_{i_1} \wedge x_{i_2} \wedge \dots \wedge x_{i_k})\big]$ (monotone if there are no negations)
• Theorem (restated): a monotone submodular $f: \{0,1\}^n \to \{0, \dots, R\}$ can be represented as a monotone pseudo-Boolean $R$-DNF with constants $A_i \in \{0, \dots, R\}$.
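Evaluating a pseudo-Boolean $k$-DNF is just a max over weighted terms. A minimal evaluator (the formula below is an arbitrary illustration, not one produced by the theorem):

```python
def eval_pb_dnf(clauses, x):
    """clauses: list of (A_i, set of variable indices); x: 0/1 assignment.
    Value is max over clauses of A_i if all variables of the term are 1, else 0."""
    return max((A if all(x[i] for i in term) else 0) for A, term in clauses)

# A monotone pB 2-DNF with constants in {0, ..., R} for R = 3.
F = [(3, {0, 1}), (2, {2}), (1, set())]  # the empty term is a constant clause

print(eval_pb_dnf(F, [1, 1, 0]))  # 3: the first term fires
print(eval_pb_dnf(F, [0, 1, 1]))  # 2: the second term fires
print(eval_pb_dnf(F, [0, 0, 0]))  # 1: only the constant term fires
```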
Discrete submodularity
• A submodular $f: \{0,1\}^n \to \{0, \dots, R\}$ can be represented as a pseudo-Boolean $2R$-DNF with constants $A_i \in \{0, \dots, R\}$.
• Hint [Lovasz] (submodular monotonization): given submodular $f$, define
$f^{mon}(S) = \min_{S \subseteq T} f(T)$.
Then $f^{mon}$ is monotone and submodular.
[Figure: lattice of $2^X$ with the regions $|S| \le R$ above $\emptyset$ and $|S| \ge |X| - R$ below $X$.]
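The monotonization in the hint can be computed by brute force and its stated properties checked directly. A sketch using a cut function plus a modular shift as the (non-monotone) submodular input; the instance is illustrative:

```python
from itertools import combinations

n = 4
edges = [(0, 1), (1, 2), (2, 3)]  # a path; cut functions are submodular
subsets = [frozenset(c) for r in range(n + 1) for c in combinations(range(n), r)]

def f(S):
    """Cut function plus a modular term: submodular but not monotone."""
    return sum((u in S) != (v in S) for u, v in edges) + 2 * (0 in S)

def f_mon(S):
    """Lovasz monotonization: f_mon(S) = min over T with S subseteq T of f(T)."""
    return min(f(T) for T in subsets if S <= T)

is_monotone = all(f_mon(S) <= f_mon(T) for S in subsets for T in subsets if S <= T)
is_submodular = all(
    f_mon(S | {x}) - f_mon(S) >= f_mon(T | {x}) - f_mon(T)
    for S in subsets for T in subsets if S <= T
    for x in range(n) if x not in T
)
print(is_monotone, is_submodular)  # True True
```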
Proof
• We are done if we have a coverage $C \subseteq 2^X$ such that:
1. All $T \in C$ have large size: $|T| \ge |X| - R$
2. For all $S \in 2^X$ there exists $T \in C$ with $S \subseteq T$
3. For every $T \in C$ the restriction $f_T$ of $f$ to $2^T$ is monotone
• Every $f_T$ is a monotone pB $R$-DNF (by 3)
• Add at most $R$ negated variables to every clause to restrict it to $2^T$ (by 1)
• $f(S) = \max_{T \in C} f_T(S)$ (by 2)
[Figure: lattice of $2^X$ from $\emptyset$ to $X$, with one large set $T$ and its monotone restriction $f_T$.]
Proof
• Such a coverage may not exist ⇒ relaxation [GHRU'11]:
– All $T \in C$ have large size: $|T| \ge |X| - R$
– For all $S \in 2^X$ there exists a pair $T' \subseteq T \in C$ with $T' \subseteq S \subseteq T$
– The restriction of $f$ to every interval $\{S : T' \subseteq S \subseteq T\}$ is monotone
[Figure: lattice of $2^X$ with a pair $T' \subseteq T$.]
Coverage by monotone lower bounds
π’‡π’Žπ’π’
(𝑺)
𝑻
𝑻
= 𝒇(𝑺)
π’‡π’Žπ’π’
𝑺 ≀ 𝒇(𝑺)
𝑻
𝑺
𝑺
𝑻’
βˆ…
π’Žπ’π’
β€’ Let π’‡π’Žπ’π’
be
defined
as
𝒇
(𝑺) = 𝐦𝐒𝐧
𝒇(𝑺′)
𝑻
𝑻
β€²
π‘ΊβŠ†π‘Ί βŠ†π‘»
– π’‡π’Žπ’π’
is monotone submodular [Lovasz]
𝑻
– For all 𝑺 βŠ† 𝑻 we have π’‡π’Žπ’π’
𝑺 ≀ 𝒇(𝑺)
𝑻
– For all 𝐓 β€² βŠ† 𝑺 βŠ† 𝑻 we have π’‡π’Žπ’π’
(𝑺) = 𝒇(𝑺)
𝑻
β€’ 𝒇 𝑺 = 𝐦𝐚𝐱 π’‡π’Žπ’π’
(𝑺) (where π’‡π’Žπ’π’
is a monotone pB R-DNF)
𝑻
𝑻
π‘»βˆˆπ‘ͺ
Learning pB-formulas and k-DNF
• $\mathrm{DNF}_{k,R}$ = class of pB $k$-DNF with $A_i \in \{0, \dots, R\}$
• The $i$-slice $f_i: \{0,1\}^n \to \{0,1\}$ is defined by $f_i(x_1, \dots, x_n) = 1$ iff $f(x_1, \dots, x_n) \ge i$
• If $f \in \mathrm{DNF}_{k,R}$, its $i$-slices $f_i$ are $k$-DNF and:
$f(x_1, \dots, x_n) = \max_{1 \le i \le R} \big(i \cdot f_i(x_1, \dots, x_n)\big)$
• PAC-learning:
$\Pr_{\mathrm{rand}(A)}\Big[\Pr_{S \sim U(\{0,1\}^n)}\big[A(S) = f(S)\big] \ge 1 - \epsilon\Big] \ge \frac{1}{2}$
• Learn every $i$-slice $f_i$ on a $(1 - \epsilon/R)$-fraction of arguments ⇒ union bound
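The slice decomposition is mechanical to verify. A sketch with an arbitrary pB 2-DNF standing in for $f$:

```python
from itertools import product

R = 3

def f(x):
    """Illustrative pB 2-DNF with constants in {0, ..., 3}."""
    return max(3 * (x[0] & x[1]), 2 * x[2], 1 * (x[3] & x[4]))

def slice_i(i):
    """Boolean i-slice: f_i(x) = 1 iff f(x) >= i."""
    return lambda x: int(f(x) >= i)

# Check f(x) = max_{1 <= i <= R} i * f_i(x) on every point of {0,1}^5.
for x in product((0, 1), repeat=5):
    assert f(x) == max(i * slice_i(i)(x) for i in range(1, R + 1))
print("slice decomposition verified")
```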
Learning Fourier coefficients
β€’ Learn π’‡π’Š (π’Œ-DNF) on 1 βˆ’ πœ– β€² = (1 βˆ’ πœ– / 𝑹) fraction of arguments
β€’ Fourier sparsity 𝑺π‘ͺ 𝝐 = # of largest Fourier
coefficients sufficient to PAC-learn every 𝒇 ∈ π‘ͺ
𝑢(π’Œ log
β€’ π‘Ίπ’Œβˆ’DNF 𝝐 = π’Œ
𝟏
𝝐
)
[Mansour]: doesn’t depend on n!
– Kushilevitz-Mansour (Goldreich-Levin): π‘π‘œπ‘™π‘¦ 𝑛, 𝑺𝑭 queries/time.
– ``Attribute efficient learning’’: π’‘π’π’π’šπ’π’π’ˆ 𝑛 β‹… π‘π‘œπ‘™π‘¦ 𝑺𝑭 queries
– Lower bound: Ξ©(2π’Œ ) queries to learn a random π’Œ-junta (∈ π’Œ-DNF) up to
constant precision.
𝑢(π’Œ log
β€’ π‘Ίπ·π‘πΉπ’Œ,𝑹 𝝐 = π’Œ
𝑹
𝝐
)
– Optimizations: Do all R iterations of KM/GL in parallel by reusing queries
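The sparse-Fourier approach can be illustrated end to end on a tiny example: compute all Fourier coefficients of a $k$-DNF by brute force, keep the largest ones, and see how well the rounded sparse polynomial agrees with the function. (KM/GL finds the large coefficients using queries only; the exhaustive transform below is purely for illustration.)

```python
from itertools import combinations, product

n = 5

def f(x):
    """A 2-DNF: (x0 and x1) or (x2 and x3)."""
    return int((x[0] and x[1]) or (x[2] and x[3]))

points = list(product((0, 1), repeat=n))
subsets = [frozenset(c) for r in range(n + 1) for c in combinations(range(n), r)]

def chi(S, x):
    """Fourier character chi_S(x) = (-1)^{sum of x_i over i in S}."""
    return (-1) ** sum(x[i] for i in S)

# Exhaustive Fourier transform: hat{f}(S) = E_x[f(x) * chi_S(x)].
coeff = {S: sum(f(x) * chi(S, x) for x in points) / 2 ** n for S in subsets}

# Keep the t largest coefficients; predict by rounding the sparse polynomial.
t = 8
top = sorted(coeff, key=lambda S: abs(coeff[S]), reverse=True)[:t]

def g(x):
    return round(sum(coeff[S] * chi(S, x) for S in top))

agreement = sum(f(x) == g(x) for x in points) / 2 ** n
print(f"agreement of the {t}-sparse approximation: {agreement:.3f}")
```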
Property testing
β€’ Let π‘ͺ be the class of submodular 𝒇: 0,1 𝑛 β†’ {0, … , 𝑹}
β€’ How to (approximately) test, whether a given 𝒇 is in π‘ͺ?
β€’ Property tester: (randomized) algorithm for distinguishing:
1. 𝑓 ∈ π‘ͺ
2. (𝝐-far): min 𝒇 – π’ˆ
π‘”βˆˆπ‘ͺ
𝝐-far
𝑯
β‰₯ 𝝐 2𝑛
𝝐-close
π‘ͺ
β€’ Key idea: π’Œ-DNFs have small representations:
– [Gopalan, Meka,Reingold CCC’12] (using quasi-sunflowers [Rossman’10])
βˆ€πœ– > 0, βˆ€ π’Œ-DNF formula F there exists:
π’Œ-DNF formula F’ of size ≀ π’Œ
1 𝑂(π’Œ)
log
𝝐
such that 𝐹 – 𝐹’
𝐻
≀ 𝝐2𝑛
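The distance in the $\epsilon$-far condition is a probability over a uniform argument, so the distance to any single candidate $g$ can be estimated by sampling (both functions below are illustrative stand-ins):

```python
import random

n = 10

def f(x):
    """Function under test (illustrative)."""
    return int((x[0] and x[1]) or x[2])

def g(x):
    """A candidate function from the class (illustrative)."""
    return int(x[0] and x[1])

def est_distance(f, g, samples=50_000):
    """Estimate dist(f, g) = Pr_{x ~ U({0,1}^n)}[f(x) != g(x)].
    f is eps-far from a class C if dist(f, g) >= eps for every g in C."""
    disagreements = 0
    for _ in range(samples):
        x = [random.randint(0, 1) for _ in range(n)]
        disagreements += f(x) != g(x)
    return disagreements / samples

print(est_distance(f, g))  # about 0.375 here: f != g iff x2 = 1 and not (x0 and x1)
```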
Testing by implicit learning
• Good approximation by juntas ⇒ efficient property testing [Diakonikolas, Lee, Matulef, Onak, Rubinfeld, Servedio, Wan]
– $\epsilon$-approximation by a $J(\epsilon)$-junta
– Good dependence on $\epsilon$: $J_{k\text{-DNF}}(\epsilon) = \big(k \log\frac{1}{\epsilon}\big)^{O(k)}$
• For submodular functions $f: \{0,1\}^n \to \{0, \dots, R\}$:
– Query complexity $\big(R \log\frac{R}{\epsilon}\big)^{O(R)}$, independent of $n$!
– Running time exponential in $J(\epsilon)$
– $\Omega(k)$ lower bound for testing $k$-DNF (reduction from Gap Set Intersection)
• [Blais, Onak, Servedio, Y.]: an exact characterization of submodular functions gives $J(\epsilon) = O\big(R \log R + \log\frac{1}{\epsilon}\big)^{R+1}$
Previous work on testing submodularity
$f: \{0,1\}^n \to [0, R]$ [Parnas, Ron, Rubinfeld '03; Seshadhri, Vondrak, ICS'11]:
• Upper bound: $(1/\epsilon)^{O(\sqrt{n})}$
• Lower bound: $\Omega(\sqrt{n})$
(a gap in query complexity remains)
Special case: coverage functions [Chakrabarty, Huang, ICALP'12].
Directions
• Close the gaps between upper and lower bounds; extend to more general learning/testing settings
• Connections to optimization?
• What if we use the $L_1$-distance between functions instead of the Hamming distance in property testing? [Berman, Raskhodnikova, Y.]