Transcript slides
Learning and Testing
Submodular Functions
Grigory Yaroslavtsev
http://grigory.us
Slides at
http://grigory.us/cis625/lecture3.pdf
CIS 625: Computational Learning Theory
Submodularity
• Discrete analog of convexity/concavity; the "law of diminishing returns"
• Applications: combinatorial optimization, algorithmic game theory (AGT), etc.
Let f: 2^X → [0, R]:
• Discrete derivative: ∂_x f(S) = f(S ∪ {x}) − f(S), for S ⊆ X, x ∉ S
• Submodular function: ∂_x f(S) ≥ ∂_x f(T), for all S ⊆ T ⊆ X, x ∉ T
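The definition can be checked by brute force on a small ground set. A minimal Python sketch (illustrative only, not from the slides; names like is_submodular are my own):

```python
from itertools import combinations

def subsets(items):
    """All subsets of a collection, as frozensets."""
    items = sorted(items)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)

def derivative(f, S, x):
    """Discrete derivative: d_x f(S) = f(S u {x}) - f(S)."""
    return f(S | {x}) - f(S)

def is_submodular(f, X):
    """Check d_x f(S) >= d_x f(T) for all S <= T <= X and x not in T."""
    return all(derivative(f, S, x) >= derivative(f, T, x)
               for T in subsets(X)
               for S in subsets(T)
               for x in X - T)

X = frozenset(range(4))
print(is_submodular(lambda S: min(len(S), 2), X))  # True: diminishing returns
print(is_submodular(lambda S: len(S) ** 2, X))     # False: increasing returns
```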
Approximating everywhere
• Q1: Can we approximate a submodular f: 2^X → [0, R] on all arguments with only poly(|X|) queries?
• A1: Only a Θ̃(√n)-approximation (multiplicative) is possible [Goemans, Harvey, Iwata, Mirrokni, SODA'09]
• Q2: What if we only need a (1 − ε)-fraction of arguments (PAC-style learning with membership queries under the uniform distribution)?
  Pr_{coins of 𝒜} [ Pr_{S ∼ U(2^X)} [ 𝒜(S) = f(S) ] ≥ 1 − ε ] ≥ 1/2
• A2: Almost as hard [Balcan, Harvey, STOC'11]
Approximate learning
• PMAC-learning (multiplicative), with poly(|X|) queries [Balcan, Harvey '11]:
  Pr_{coins of 𝒜} [ Pr_{S ∼ D(2^X)} [ f(S) ≤ 𝒜(S) ≤ C · f(S) ] ≥ 1 − ε ] ≥ 1/2
  with Ω̃(n^{1/3}) ≤ C ≤ √n
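To make the nested quantifiers concrete: for a fixed hypothesis 𝒜, the inner probability can be estimated by sampling. A hypothetical sketch (f, A, C and the name pmac_success_rate are toy placeholders, not from the slides):

```python
import random

def pmac_success_rate(f, A, C, n, samples=10_000):
    """Estimate Pr_{S ~ U(2^X)} [ f(S) <= A(S) <= C * f(S) ]
    for a fixed hypothesis A; PMAC requires this to be >= 1 - eps."""
    hits = 0
    for _ in range(samples):
        S = frozenset(i for i in range(n) if random.random() < 0.5)
        hits += f(S) <= A(S) <= C * f(S)
    return hits / samples

# Toy example: A approximates f(S) = min(|S|, 3) within factor C = 2.
f = lambda S: min(len(S), 3)
A = lambda S: min(len(S) + 1, 3)   # hypothetical learned hypothesis
print(pmac_success_rate(f, A, C=2, n=10))
```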
• PAAC-learning (additive):
  Pr_{coins of 𝒜} [ Pr_{S ∼ D(2^X)} [ |f(S) − 𝒜(S)| ≤ α ] ≥ 1 − ε ] ≥ 1/2
• Running time: n^{O(1/α²)} [Gupta, Hardt, Roth, Ullman, STOC'11]
• Running time: n^{O(log(1/α)/α²)} [Cheraghchi, Klivans, Kothari, Lee, SODA'12]
Learning f: 2^X → [0, R]
• For all algorithms: ε = const.
Summary (learning model; running time; extra features):
• Goemans, Harvey, Iwata, Mirrokni: √n-approximation, everywhere; time poly(|X|)
• Balcan, Harvey: PMAC (multiplicative C); time poly(|X|); C = √n
• Gupta, Hardt, Roth, Ullman: PAAC (additive α); time n^{O(1/α²)}; under arbitrary distributions, tolerant queries
• Cheraghchi, Klivans, Kothari, Lee: PAAC (additive α); time n^{O(log(1/α)/α²)}; SQ queries, agnostic
• Raskhodnikova, Y.: PAC, f: 2^X → {0, …, R} (bounded integral range, R ≤ |X|); time poly(|X|) · R^{O(R log(R/ε))}; polylog(|X|) · R^{O(R log(R/ε))} queries
Learning: Bigger picture
• Hierarchy of valuation functions:
  Subadditive ⊇ XOS (= fractionally subadditive) [Badanidiyuru, Dobzinski, Fu, Kleinberg, Nisan, Roughgarden, SODA'12]
  ⊇ Submodular (⊇ coverage valuations) ⊇ Gross substitutes ⊇ OXS ⊇ Additive (linear)
Other positive results:
• Learning valuation functions [Balcan, Constantin, Iwata, Wang, COLT'12]
• (1 + ε)-PMAC-learning (sketching) coverage functions [BDFKNR'12]
• (1 + ε)-PMAC-learning Lipschitz submodular functions [BH'11] (concentration around the average via Talagrand)
Discrete convexity
• Monotone convex f: {1, …, n} → {0, …, R}
  [Figure: bar chart of a monotone convex function; it changes value only on ≤ R of the points 1, …, n and is flat elsewhere.]
• Convex f: {1, …, n} → {0, …, R}
  [Figure: bar chart of a convex function; it changes value only on the first ≤ R points and on the last points ≥ n − R.]
Discrete submodularity f: 2^X → {0, …, R}
• Case study: R = 1 (Boolean submodular functions f: {0,1}^n → {0,1})
  – Monotone submodular = x_{i1} ∨ x_{i2} ∨ ⋯ ∨ x_{ik} (disjunction)
  – Submodular = (x_{i1} ∨ ⋯ ∨ x_{ik}) ∧ (x̄_{j1} ∨ ⋯ ∨ x̄_{jl}) (2-term CNF)
Discrete monotone submodularity
• Monotone submodular f: 2^X → {0, …, R}
  [Figure: Boolean lattice with the level |S| ≤ R marked; by monotonicity, f(S₁ ∪ S₂) ≥ max(f(S₁), f(S₂)) ≥ f(S₁), f(S₂).]
Discrete monotone submodularity
• Theorem: for monotone submodular f: 2^X → {0, …, R}, for all T:
  f(T) = max_{S⊆T, |S|≤R} f(S)
• f(T) ≥ max_{S⊆T, |S|≤R} f(S) (by monotonicity)
Discrete monotone submodularity
• f(T) ≤ max_{S⊆T, |S|≤R} f(S):
  – Let S* be a smallest subset of T such that f(S*) = f(T)
  – For every x ∈ S* we have ∂_x f(S* ∖ {x}) > 0 (otherwise S* ∖ {x} would be a smaller such set)
  – By submodularity, ∂_x f(S) ≥ ∂_x f(S* ∖ {x}) > 0 for every S ⊆ S* ∖ {x}, so the restriction of f to 2^{S*} is strictly monotone increasing along every chain; since f is integer-valued, f(S*) ≥ |S*|, and f(S*) ≤ R gives |S*| ≤ R
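The theorem can be verified exhaustively on any small monotone submodular example. A sketch using a coverage function (the sets A_i are arbitrary illustrative data):

```python
from itertools import combinations

# Coverage function: f(S) = |union of A_i for i in S|; monotone submodular,
# with range {0, ..., R} for R = |{1, 2, 3}| = 3.
A = {0: {1}, 1: {1, 2}, 2: {3}, 3: {2, 3}}
R = 3

def f(S):
    covered = set()
    for i in S:
        covered |= A[i]
    return len(covered)

ground = sorted(A)
for r in range(len(ground) + 1):
    for T in combinations(ground, r):
        best = max(f(S) for k in range(min(len(T), R) + 1)
                        for S in combinations(T, k))
        assert f(T) == best  # theorem: max over S subset of T with |S| <= R
print("f(T) = max_{S in T, |S| <= R} f(S) verified on all T")
```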
Representation by a formula
• Theorem: for monotone submodular f: 2^X → {0, …, R}, for all T:
  f(T) = max_{S⊆T, |S|≤R} f(S)
• Alternative notation: S ⊆ X ⟷ (x₁, …, x_n) ∈ {0,1}^n
• Boolean k-DNF = ⋁_t (x_{t1} ∧ x_{t2} ∧ ⋯ ∧ x_{tk})
• Pseudo-Boolean k-DNF (replace ⋁ by max and the constant 1 by a_t ∈ ℝ):
  max_t [a_t · x_{t1} ∧ x_{t2} ∧ ⋯ ∧ x_{tk}] (monotone, if no negations)
• Theorem (restated): monotone submodular f(x₁, …, x_n) ∈ {0, …, R} can be represented as a monotone pseudo-Boolean R-DNF with constants a_t ∈ {0, …, R}.
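A pseudo-Boolean DNF evaluates as a max of weighted AND-terms. A minimal evaluator sketch (the term encoding below is assumed for illustration):

```python
def eval_pb_dnf(terms, x):
    """Evaluate a pseudo-Boolean DNF: max over terms of a_t * (AND of literals).
    Each term is (a_t, positive_vars, negated_vars); x is a 0/1 tuple."""
    value = 0
    for a, pos, neg in terms:
        if all(x[i] for i in pos) and not any(x[j] for j in neg):
            value = max(value, a)
    return value

# Monotone pB 2-DNF: max(2 * x0 * x1, 1 * x2)
terms = [(2, (0, 1), ()), (1, (2,), ())]
print(eval_pb_dnf(terms, (1, 1, 0)))  # 2
print(eval_pb_dnf(terms, (0, 0, 1)))  # 1
```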
Discrete submodularity
• Submodular f(x₁, …, x_n) ∈ {0, …, R} can be represented as a pseudo-Boolean 2R-DNF with constants a_t ∈ {0, …, R}.
• Hint [Lovász] (submodular monotonization): given submodular f, define
  f_mon(S) = min_{T: S⊆T} f(T)
  Then f_mon is monotone and submodular.
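The monotonization is computable by brute force on a small ground set. A sketch assuming the min-over-supersets definition above (names are my own):

```python
from itertools import combinations

def monotonization(f, X):
    """Lovasz monotonization: f_mon(S) = min over T with S <= T <= X of f(T).
    If f is submodular, f_mon is monotone and submodular."""
    X = frozenset(X)
    def f_mon(S):
        rest = sorted(X - S)
        return min(f(frozenset(S) | frozenset(extra))
                   for r in range(len(rest) + 1)
                   for extra in combinations(rest, r))
    return f_mon

# Example: a cut-like submodular but non-monotone function on {0, 1, 2}:
# f(S) = |S| * |X \ S|, the cut function of the complete graph K_3.
f = lambda S: len(S) * (3 - len(S))
f_mon = monotonization(f, range(3))
print([f_mon(frozenset(s)) for s in [(), (0,), (0, 1), (0, 1, 2)]])  # [0, 0, 0, 0]
```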
Proof
• We are done if we have a coverage 𝒞 ⊆ 2^X such that:
  1. All T ∈ 𝒞 have large size: |T| ≥ |X| − R
  2. For every S ∈ 2^X there exists T ∈ 𝒞 with S ⊆ T
  3. For every T ∈ 𝒞 the restriction f_T of f to 2^T is monotone
• Every f_T is a monotone pB R-DNF (by 3)
• Add at most R negated variables to every clause to restrict it to 2^T (by 1)
• f(S) = max_{T∈𝒞} f_T(S) (by 2)
Proof
• There is no such coverage => use a relaxation [GHRU'11]:
  – All T ∈ 𝒞 have large size: |T| ≥ |X| − R
  – For every S ∈ 2^X there exists a pair S′ ⊆ T in 𝒞 with S′ ⊆ S ⊆ T
  – The restriction of f to each interval {S: S′ ⊆ S ⊆ T} is monotone
Coverage by monotone lower bounds
• Define f_mon^T by: f_mon^T(S) = min_{S″: S ⊆ S″ ⊆ T} f(S″)
  – f_mon^T is monotone submodular [Lovász]
  – For all S ⊆ T we have f_mon^T(S) ≤ f(S)
  – For all S with S′ ⊆ S ⊆ T we have f_mon^T(S) = f(S)
• f(S) = max_{T∈𝒞} f_mon^T(S) (where each f_mon^T is a monotone pB R-DNF)
Learning pB-formulas and k-DNF
• DNF_{k,R} = class of pB k-DNFs with a_t ∈ {0, …, R}
• i-slice f_i(x₁, …, x_n) ∈ {0,1} defined by: f_i(x₁, …, x_n) = 1 iff f(x₁, …, x_n) ≥ i
• If f ∈ DNF_{k,R}, its i-slices f_i are k-DNFs and:
  f(x₁, …, x_n) = max_{1≤i≤R} (i · f_i(x₁, …, x_n))
• PAC-learning:
  Pr_{coins of 𝒜} [ Pr_{S ∼ U({0,1}^n)} [ 𝒜(S) = f(S) ] ≥ 1 − ε ] ≥ 1/2
• Learn every i-slice f_i on a (1 − ε/R) fraction of arguments => union bound
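The slice decomposition can be confirmed exhaustively. A small sketch (the function f is an arbitrary toy choice):

```python
from itertools import product

R, n = 3, 4
f = lambda x: min(sum(x), R)   # toy f: {0,1}^n -> {0,...,R}

def f_slice(x, i):
    """i-slice: f_i(x) = 1 iff f(x) >= i."""
    return 1 if f(x) >= i else 0

# f is recovered from its Boolean slices: f(x) = max_i i * f_i(x).
for x in product((0, 1), repeat=n):
    assert f(x) == max(i * f_slice(x, i) for i in range(1, R + 1))
print("slice decomposition verified")
```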
Learning Fourier coefficients
• Learn each f_i (a k-DNF) on a 1 − ε′ = 1 − ε/R fraction of arguments
• Fourier sparsity 𝔰_𝒞(ε) = # of largest Fourier coefficients sufficient to PAC-learn every f ∈ 𝒞
• 𝔰_{k-DNF}(ε) = k^{O(k log(k/ε))} [Mansour]: doesn't depend on n!
  – Kushilevitz-Mansour (Goldreich-Levin): poly(n, 𝔰_𝒞) queries/time
  – "Attribute-efficient learning": polylog(n) · poly(𝔰_𝒞) queries
  – Lower bound: Ω(2^k) queries to learn a random k-junta (⊆ k-DNF) up to constant precision
• 𝔰_{DNF_{k,R}}(ε) = k^{O(k log(kR/ε))}
  – Optimization: do all R iterations of KM/GL in parallel by reusing queries
Property testing
• Let 𝒞 be the class of submodular f: {0,1}^n → {0, …, R}
• How to (approximately) test whether a given f is in 𝒞?
• Property tester: a (randomized) algorithm for distinguishing:
  1. f ∈ 𝒞
  2. f is ε-far from 𝒞: min_{g∈𝒞} |{x: f(x) ≠ g(x)}| ≥ ε · 2^n
  [Figure: the class 𝒞 with its ε-neighborhood; functions inside the neighborhood are ε-close, functions outside are ε-far.]
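For a fixed pair (f, g) the normalized Hamming distance can be estimated by sampling. A sketch of the distance notion only (an actual tester must of course work without knowing the closest g ∈ 𝒞):

```python
import random

def distance_estimate(f, g, n, samples=10_000):
    """Estimate Pr_{x ~ U({0,1}^n)} [f(x) != g(x)]; f is eps-far from a
    class C if this distance is >= eps for every g in C."""
    disagreements = 0
    for _ in range(samples):
        x = tuple(random.randint(0, 1) for _ in range(n))
        disagreements += f(x) != g(x)
    return disagreements / samples
```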
• Key idea: k-DNFs have small approximate representations:
  – [Gopalan, Meka, Reingold, CCC'12] (using quasi-sunflowers [Rossman'10]): for all ε > 0 and every k-DNF formula F there exists a k-DNF formula F′ of size ≤ (k log(1/ε))^{O(k)} such that |F − F′|_H ≤ ε · 2^n
Testing by implicit learning
• Good approximation by juntas => efficient property testing [Diakonikolas, Lee, Matulef, Onak, Rubinfeld, Servedio, Wan]
  – ε-approximation by a J(ε)-junta
  – Good dependence on ε: J_{k-DNF}(ε) = (k log(1/ε))^{O(k)}
• For submodular functions f: {0,1}^n → {0, …, R}:
  – Query complexity (R log(R)/ε)^{O(R)}, independent of n!
  – Running time exponential in J(ε)
  – Ω(k) lower bound for testing k-DNF (reduction from Gap Set Intersection)
• [Blais, Onak, Servedio, Y.]: exact characterization of submodular functions via juntas:
  J(ε) = O(R log R + log(1/ε)) · (R + 1)
Previous work on testing submodularity
f: {0,1}^n → [0, R] [Parnas, Ron, Rubinfeld '03; Seshadhri, Vondrák, ICS'11]:
• Upper bound: (1/ε)^{O(√n)}
• Lower bound: Ω(√n)
=> gap in query complexity
Special case: coverage functions [Chakrabarty, Huang, ICALP'12].
Directions
β’ Close gaps between upper and lower bounds,
extend to more general learning/testing settings
β’ Connections to optimization?
• What if we use L₁-distance between functions instead of Hamming distance in property testing?
[Berman, Raskhodnikova, Y.]