An Evaluation of Solutions to
Filtering of Search Results
by Access Constraints
Gert Schmeltz Pedersen and Christian Tønsberg
Technical Information Center
@ Technical University of Denmark
funded by the DEFF Fedora project
triggered by discussion on fedora-commons-developers list in January 2008
investigation presented at Open Repositories, Southampton, April 2008
Overview
➲ The Problem
➲ Analysis
➲ Demonstration
➲ Larger-scale experiment
➲ Conclusion
The Problem
➲ Search results contain hits that the user does not have the access rights to read
- This has become a problem for repositories that want fine-grained control over access rights by defining XACML policies
- e.g. eSciDoc, RepoMMan / REMAP
- XACML = OASIS eXtensible Access Control Markup Language
Analysis
➲ The ideal solution
➲ What can you do with XACML policies?
➲ What are the costs of various solutions to filtering of search results?
➲ What are the characteristics of repositories and their usage that are decisive for the choice of solution?
The Ideal Solution
[Figure: abstract example of five digital objects (1 to 5). The search result without access filtering contains objects 2, 4 and 5. With access filtering, the search result for User1 contains objects 2 and 4, and the search result for User2 contains objects 2 and 5.]
➲ The ideal solution includes:
- Filtering mechanism must correspond to XACML access control mechanism
- Hits after filtering must be readable by user
- Objects readable by user must not be filtered out
- Normal paging of hits
- Show number of hits
- Supported for large number of users / (“virtual”) user groups
- Acceptable performance
What can you do with XACML policies?
<Policy PolicyId="deny-apia-if-not-tomcat-role">
  <Target>
    <Subjects>
      <AnySubject />
    </Subjects>
    <Resources>
      <AnyResource />
    </Resources>
    <Actions>
      <Action>
        <ActionMatch MatchId="urn:oasis:names:tc:xacml:1.0:function:string-equal">
          <AttributeValue>urn:fedora:names:fedora:2.1:action:api-a</AttributeValue>
          <ActionAttributeDesignator AttributeId="urn:fedora:names:fedora:2.1:action:api" />
        </ActionMatch>
      </Action>
    </Actions>
  </Target>
  <Rule RuleId="1" Effect="Deny">
    <Condition FunctionId="urn:oasis:names:tc:xacml:1.0:function:not">
      <Apply FunctionId="urn:oasis:names:tc:xacml:1.0:function:string-at-least-one-member-of">
        <SubjectAttributeDesignator AttributeId="fedoraRole" MustBePresent="false" />
        <Apply FunctionId="urn:oasis:names:tc:xacml:1.0:function:string-bag">
          <AttributeValue>administrator</AttributeValue>
          <AttributeValue>professor</AttributeValue>
        </Apply>
      </Apply>
    </Condition>
  </Rule>
</Policy>
What can you do with XACML policies?
Authorization (AuthZ) components:
- Policy enforcement point (PEP) - The system entity that performs access control, by making decision requests and enforcing authorization decisions.
- Policy decision point (PDP) - The system entity that evaluates applicable policy and renders an authorization decision.
- Policy administration point (PAP) - The system entity that creates a policy or policy set.
- Policy information point (PIP) - The system entity that acts as a source of attribute values, e.g. authentication (AuthN) providing user attributes.
Source: http://docs.oasis-open.org/xacml/2.0/access_control-xacml-2.0-core-spec-os.pdf
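The split between PEP and PDP can be illustrated with a small Java sketch. It hard-codes one rule in the spirit of the deny-apia-if-not-tomcat-role policy shown earlier (deny API-A unless the subject holds an allowed role). All class and method names are hypothetical, the action is reduced to a short string, and the choice that only an explicit deny blocks access is an assumption of this sketch, not a statement about Fedora's enforcement.

```java
import java.util.Set;

/** Sketch of the PEP / PDP split; all names here are illustrative, not Fedora APIs. */
final class PepPdpSketch {

    enum Decision { PERMIT, DENY, NOT_APPLICABLE }

    /** PDP: evaluates the applicable policy and renders a decision. */
    interface PolicyDecisionPoint {
        Decision decide(Set<String> subjectRoles, String action, String resourcePid);
    }

    /** A PDP with one hard-coded rule: deny API-A unless the subject has an allowed role. */
    static final class DenyApiAUnlessRole implements PolicyDecisionPoint {
        private final Set<String> allowedRoles;

        DenyApiAUnlessRole(Set<String> allowedRoles) {
            this.allowedRoles = allowedRoles;
        }

        @Override
        public Decision decide(Set<String> subjectRoles, String action, String resourcePid) {
            if (!"api-a".equals(action)) {
                return Decision.NOT_APPLICABLE;      // target does not match this action
            }
            for (String role : subjectRoles) {
                if (allowedRoles.contains(role)) {
                    return Decision.NOT_APPLICABLE;  // rule condition is false, so no Deny
                }
            }
            return Decision.DENY;
        }
    }

    /** PEP: asks the PDP and enforces the answer (assumption: only an explicit DENY blocks). */
    static boolean mayAccess(PolicyDecisionPoint pdp, Set<String> roles, String action, String pid) {
        return pdp.decide(roles, action, pid) != Decision.DENY;
    }

    public static void main(String[] args) {
        PolicyDecisionPoint pdp = new DenyApiAUnlessRole(Set.of("administrator", "professor"));
        System.out.println(mayAccess(pdp, Set.of("professor"), "api-a", "demo:1"));
        System.out.println(mayAccess(pdp, Set.of("student"), "api-a", "demo:1"));
    }
}
```

In a search setting the PEP role is played by whatever component filters the hits, while the attribute values (here the roles) would come from a PIP such as the authentication layer.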
Three Alternatives - Basically
➲ Post-search filtering
- after search, ask deny/permit for each hit in the page
- after a deny, add another hit to the page
- no exact hit count until the end
➲ In-search filtering
- logical partitioning of index
- adding index fields corresponding to XACML policies
- adding query conditions similarly (query rewrite)
- correspondence only in simple cases
➲ Pre-search filtering
- physical partitioning of index
- each index contains only accessible objects
- at search, no filtering
- correspondence only in simple cases
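The post-search steps listed above (check each hit, top up the page after a deny) can be sketched in Java as follows. The class name, the Page type and the use of a Predicate as a stand-in for the XACML permit/deny decision are all assumptions made for illustration; the sketch also shows why an exact hit count is only known once every raw hit has been checked.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Sketch of post-search filtering: fill one page with permitted hits only. */
final class PostSearchPaging {

    /** One filtered page plus the position in the raw hit list where the next page starts. */
    static final class Page {
        final List<String> hits;
        final int nextOffset;

        Page(List<String> hits, int nextOffset) {
            this.hits = hits;
            this.nextOffset = nextOffset;
        }
    }

    /**
     * Walks the raw (unfiltered) hits from 'offset', asks the permit check for each hit,
     * and keeps going after a deny until the page is full or the hits run out.
     */
    static Page fillPage(List<String> rawHits, int offset, int pageSize, Predicate<String> isPermitted) {
        List<String> page = new ArrayList<>();
        int i = offset;
        while (i < rawHits.size() && page.size() < pageSize) {
            String pid = rawHits.get(i);
            if (isPermitted.test(pid)) {
                page.add(pid);
            }
            i++;
        }
        return new Page(page, i);
    }

    public static void main(String[] args) {
        List<String> rawHits = List.of("demo:1", "demo:2", "demo:3", "demo:4", "demo:5");
        // Stand-in for the XACML decision: this user may read objects 2, 4 and 5 only.
        Predicate<String> permit =
                pid -> pid.equals("demo:2") || pid.equals("demo:4") || pid.equals("demo:5");
        Page first = fillPage(rawHits, 0, 2, permit);
        System.out.println(first.hits + " next=" + first.nextOffset);
    }
}
```

Each page costs at least one policy evaluation per displayed hit, and more when many hits are denied, which is the cost the post-search rows of a cost model have to account for.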
[Figure: one panel per alternative, each showing an index of objects 1 to 5 and a search result containing objects 2, 4 and 5. In one panel the index entries carry extra fields (abc, def, ghi, jkl, mno). A filtering step is annotated "XACML deny for red user => filter out" and "XACML deny for green user => filter out".]
Cost Model
Cost model indicating importance of repository and usage characteristics
Cost expressed as “time to compute” - here in HOURS - details behind in milliseconds - TIME UNIT here 24 HOURS
Many underlying assumptions to be checked experimentally

4 important characteristics (one scenario per column):
number of foxml records in repository              10000    100000   1000000  10000000
number of updates per time unit                       50       500      5000     50000
number of searches per time unit                     500      5000     50000    500000
percentage of permits, average over user/foxml        50        20         5         1

INDEX CREATION --- IN HOURS:
Post-Search                                         0.06      0.56      5.56     55.56
In-Search                                           0.07      0.69      6.94     69.44
Pre-Search                                          0.45      9.17    167.71   2778.47

INDEX UPDATE PER TIME UNIT --- IN HOURS:
Post-Search                                            0      0.01      0.06      0.56
In-Search                                              0         0      0.01      0.14
Pre-Search                                             0      0.09      1.68     16.67

SEARCH first hit page PER TIME UNIT --- IN HOURS:
Post-Search                                         0.01      0.35     13.89    694.44
In-Search                                           0.01      0.07      0.69      6.94
Pre-Search                                             0         0      0.03      0.69

TOTAL COST PER TIME UNIT --- IN HOURS:
Post-Search                                         0.01      0.35     13.94    695.00
In-Search                                           0.01      0.07      0.71      7.08
Pre-Search                                             0      0.09      1.71     17.37

In-Search COST: Keeping correspondence between policies and extra index fields and query rewrite
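The slide does not state how the total is computed. Judging from the numbers alone, the total cost per time unit appears to be the index update cost plus the search cost, up to rounding. The Java sketch below checks that reading against the Post-Search figures; the class and method names are hypothetical, and the interpretation is an inference from the values, not something the cost model is documented to do.

```java
/** Checks the reading "total = index update + search" against figures copied from the slide. */
final class CostTotals {

    /** Adds index-update and search cost per scenario (both in hours per time unit). */
    static double[] total(double[] update, double[] search) {
        double[] sum = new double[update.length];
        for (int i = 0; i < update.length; i++) {
            sum[i] = update[i] + search[i];
        }
        return sum;
    }

    /** True if every computed value is within 'tolerance' of the reported one. */
    static boolean agrees(double[] computed, double[] reported, double tolerance) {
        for (int i = 0; i < computed.length; i++) {
            if (Math.abs(computed[i] - reported[i]) > tolerance) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Post-Search figures for the four scenarios, as shown on the slide.
        double[] update = {0, 0.01, 0.06, 0.56};
        double[] search = {0.01, 0.35, 13.89, 694.44};
        double[] reported = {0.01, 0.35, 13.94, 695.00};
        // The slide values are rounded to two decimals, hence the tolerance.
        System.out.println(agrees(total(update, search), reported, 0.015));
    }
}
```

Index creation is a one-off cost and does not seem to be included in the per-time-unit totals.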
Demonstration – the Smiley objects case
Demonstration – three user roles - postsearch
Demonstration setup
[Diagram: three users (fedoraAdmin, smileyAdmin, smileyUser); the interface SearchResultFiltering, implemented by class SearchResultFilteringDemoImpl, reflecting the XACML policy for smileyAdmin and the XACML policy for smileyUser; Lucene indexes for presearch built by allFoxmlToLucene.xslt (all Demo objects, 25), smileyFoxmlToLucene.xslt (SmileyStuff++, 13) and inSmileyFoxmlToLucene.xslt (in SmileyStuff, 12); example objects include SmileyStuff, SmileyBeerGlass and SmileyWastebasket.]
Demonstration setup
fedora-users.xml
<users>
  <user name="fedoraAdmin" password="fedoraAdmin">
    <attribute name="fedoraRole">
      <value>administrator</value>
    </attribute>
  </user>
  <user name="smileyAdmin1" password="smileyAdmin1">
    <attribute name="smileyRole">
      <value>SmileyAdmin</value>
    </attribute>
  </user>
  <user name="smileyUser1" password="smileyUser1">
    <attribute name="smileyRole">
      <value>SmileyUser</value>
    </attribute>
  </user>
</users>
Demonstration setup
XACML policy for SmileyAdmin
<Policy xmlns="urn:oasis:names:tc:xacml:1.0:policy"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        PolicyId="administrator"
        RuleCombiningAlgId="...:xacml:1.0:rule-combining-algorithm:first-applicable">
  <Target>
    <Subjects>
      <Subject>
        <SubjectMatch MatchId="urn:oasis:names:tc:xacml:1.0:function:string-equal">
          <AttributeValue DataType="...#string">SmileyAdmin</AttributeValue>
          <SubjectAttributeDesignator AttributeId="smileyRole" MustBePresent="false"
                                      DataType="http://www.w3.org/2001/XMLSchema#string"/>
        </SubjectMatch>
      </Subject>
    </Subjects>
    <Resources>
      <Resource>
        <ResourceMatch MatchId="urn:oasis:names:tc:xacml:1.0:function:regexp-string-match">
          <AttributeValue DataType="...#string">demo:Smiley.*</AttributeValue>
          <ResourceAttributeDesignator
              AttributeId="urn:fedora:names:fedora:2.1:resource:object:pid"
              DataType="http://www.w3.org/2001/XMLSchema#string"/>
        </ResourceMatch>
      </Resource>
    </Resources>
  </Target>
  <Rule RuleId="1" Effect="Permit"/>
</Policy>
Demonstration setup
XACML policy for SmileyUser
<Policy xmlns="urn:oasis:names:tc:xacml:1.0:policy"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        PolicyId="administrator"
        RuleCombiningAlgId="...:xacml:1.0:rule-combining-algorithm:first-applicable">
  <Target>
    <Subjects>
      <Subject>
        <SubjectMatch MatchId="urn:oasis:names:tc:xacml:1.0:function:string-equal">
          <AttributeValue DataType="...#string">SmileyUser</AttributeValue>
          <SubjectAttributeDesignator AttributeId="smileyRole" MustBePresent="false"
                                      DataType="http://www.w3.org/2001/XMLSchema#string"/>
        </SubjectMatch>
      </Subject>
    </Subjects>
    ...
  </Target>
  <Rule RuleId="1" Effect="Permit">
    <Condition FunctionId="urn:oasis:names:tc:xacml:1.0:function:not">
      <Apply FunctionId="urn:oasis:names:tc:xacml:1.0:function:string-is-in">
        <AttributeValue DataType="...#string">demo:SmileyStuff</AttributeValue>
        <ResourceAttributeDesignator
            AttributeId="urn:fedora:names:fedora:2.1:resource:object:pid"
            DataType="http://www.w3.org/2001/XMLSchema#string"/>
      </Apply>
    </Condition>
  </Rule>
</Policy>
Demonstration setup
public interface SearchResultFiltering {
    // Pre-search: pick the physical index that matches the user's access rights.
    public String selectIndexNameForPresearch(String fgsUserName, String indexName);
    // In-search: add access conditions to the user's query.
    public String rewriteQueryForInsearch(String fgsUserName, String indexName, String query);
    // Post-search: remove hits the user may not read from the result set.
    public StringBuffer filterResultsetForPostsearch(String fgsUserName, StringBuffer resultSetXml);
}

/**
 * This demo implementation of SearchResultFiltering shall reflect the XACML policies
 * for the two demo smiley roles SmileyAdmin and SmileyUser.
 * The helper getUserRole(...) is not shown on the slide.
 */
public class SearchResultFilteringDemoImpl implements SearchResultFiltering {

    public String selectIndexNameForPresearch(String fgsUserName, String indexNameParam) {
        String indexName = indexNameParam;
        String userRole = getUserRole(fgsUserName);
        if ("SmileyAdmin".equals(userRole)) indexName = "SmileyAdminIndex";
        else if ("SmileyUser".equals(userRole)) indexName = "SmileyUserIndex";
        else if ("administrator".equals(userRole)) indexName = "AllObjectsIndex";
        return indexName;
    }

    public String rewriteQueryForInsearch(String fgsUserName, String indexName, String query) {
        // query rewriting shall correspond to the additional index field(s) in the xslt indexing stylesheet.
        String rewrittenQuery = query;
        String userRole = getUserRole(fgsUserName);
        if ("SmileyAdmin".equals(userRole))
            rewrittenQuery = "( " + query + " ) AND smiley AND PID:demo*";
        else if ("SmileyUser".equals(userRole))
            rewrittenQuery = "( " + query + " ) AND smiley AND PID:demo* NOT PID:\"demo:SmileyStuff\"";
        return rewrittenQuery;
    }

    public StringBuffer filterResultsetForPostsearch(String fgsUserName, StringBuffer resultSetXml) {
        StringBuffer result = resultSetXml;
        // foreach hit in resultset, evaluate XACML policies, if not deny (~permit) then include in result
        ...
    }
}
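To see what the in-search rewrite produces, the following self-contained Java sketch repeats the rewrite logic of the demo implementation and adds a hypothetical in-memory getUserRole that mirrors the three users in fedora-users.xml. The class name and the map-based lookup are assumptions for illustration; the slide does not show how the role is actually resolved.

```java
import java.util.Map;

/** Runnable sketch of the in-search query rewrite with a stand-in role lookup. */
final class InsearchRewriteDemo {

    // Hypothetical stand-in for getUserRole: user name -> role, mirroring fedora-users.xml.
    private static final Map<String, String> ROLES = Map.of(
            "fedoraAdmin", "administrator",
            "smileyAdmin1", "SmileyAdmin",
            "smileyUser1", "SmileyUser");

    static String getUserRole(String userName) {
        return ROLES.getOrDefault(userName, "");
    }

    /** Appends role-specific conditions to the query, as in the demo implementation. */
    static String rewriteQueryForInsearch(String userName, String query) {
        String role = getUserRole(userName);
        if ("SmileyAdmin".equals(role)) {
            return "( " + query + " ) AND smiley AND PID:demo*";
        }
        if ("SmileyUser".equals(role)) {
            return "( " + query + " ) AND smiley AND PID:demo* NOT PID:\"demo:SmileyStuff\"";
        }
        return query; // other roles: query is left unchanged, as in the demo implementation
    }

    public static void main(String[] args) {
        for (String user : new String[] {"fedoraAdmin", "smileyAdmin1", "smileyUser1"}) {
            System.out.println(user + ": " + rewriteQueryForInsearch(user, "beer"));
        }
    }
}
```

Note that a user with an unknown role gets the query back unchanged, so a real deployment would presumably want a restrictive default instead.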
Demonstration setup
Indexing stylesheet for administrator
...
<xsl:if test="foxml:digitalObject/foxml:objectProperties/foxml:property[@NAME='...#state' and @VALUE='Active']">
  <xsl:if test="not(foxml:digitalObject/foxml:datastream[@ID='METHODMAP']
                or foxml:digitalObject/foxml:datastream[@ID='DS-COMPOSITE-MODEL'])">
    <xsl:apply-templates mode="activeFedoraObject"/>
  </xsl:if>
</xsl:if>
...
Indexing stylesheet for SmileyAdmin
...
<xsl:if test="foxml:digitalObject/foxml:objectProperties/foxml:property[@NAME='...#state' and @VALUE='Active']">
  <xsl:if test="not(foxml:digitalObject/foxml:datastream[@ID='METHODMAP']
                or foxml:digitalObject/foxml:datastream[@ID='DS-COMPOSITE-MODEL'])">
    <xsl:if test="starts-with($PID,'demo:Smiley')">
      <xsl:apply-templates mode="activeFedoraObject"/>
    </xsl:if>
  </xsl:if>
</xsl:if>
...
Indexing stylesheet for SmileyUser
...
<xsl:if test="foxml:digitalObject/foxml:objectProperties/foxml:property[@NAME='...#state' and @VALUE='Active']">
  <xsl:if test="not(foxml:digitalObject/foxml:datastream[@ID='METHODMAP']
                or foxml:digitalObject/foxml:datastream[@ID='DS-COMPOSITE-MODEL'])">
    <xsl:if test="starts-with($PID,'demo:Smiley') and not(starts-with($PID,'demo:SmileyStuff'))">
      <xsl:apply-templates mode="activeFedoraObject"/>
    </xsl:if>
  </xsl:if>
</xsl:if>
...
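Keeping an index in step with a policy is where small mismatches creep in. The Java sketch below compares two predicates for the SmileyUser role: one modelled on the XACML policy (PID is not exactly demo:SmileyStuff) and one modelled on the SmileyUser indexing stylesheet (PID does not start with demo:SmileyStuff). Several things are assumptions: the policy's Resources target is elided on the slide and is taken here to be demo:Smiley.* as for SmileyAdmin, the regular expression is approximated with an anchored Java match, and demo:SmileyStuffBox is a made-up PID used only to expose the difference.

```java
import java.util.List;
import java.util.regex.Pattern;

/** Compares a policy-style check with the index-style check for the SmileyUser role. */
final class PolicyIndexCorrespondence {

    // Assumed resource target; the slide elides it for the SmileyUser policy.
    private static final Pattern SMILEY = Pattern.compile("demo:Smiley.*");

    /** Policy side (assumed): PID matches demo:Smiley.* and is not exactly demo:SmileyStuff. */
    static boolean policyPermitsSmileyUser(String pid) {
        return SMILEY.matcher(pid).matches() && !pid.equals("demo:SmileyStuff");
    }

    /** Index side: mirrors the starts-with tests of the SmileyUser indexing stylesheet. */
    static boolean indexedForSmileyUser(String pid) {
        return pid.startsWith("demo:Smiley") && !pid.startsWith("demo:SmileyStuff");
    }

    public static void main(String[] args) {
        // demo:SmileyStuffBox is a hypothetical PID, not one of the demo objects.
        List<String> pids = List.of("demo:SmileyStuff", "demo:SmileyBeerGlass", "demo:SmileyStuffBox", "demo:5");
        for (String pid : pids) {
            System.out.println(pid + " policy=" + policyPermitsSmileyUser(pid)
                    + " index=" + indexedForSmileyUser(pid));
        }
    }
}
```

Under these assumptions the two checks agree on the demo objects but would disagree on a PID that merely begins with demo:SmileyStuff, which illustrates why correspondence holds only in simple cases.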
Demonstration – three user roles - insearch
Demonstration – three user roles - presearch
Larger-scale experiment - setup
[Diagram: many user roles; the interface SearchResultFiltering, implemented by class SearchResultFilteringEvalImpl, reflecting the XACML policy for user role i and the XACML policy for user role j; 130,000 objects from Danish Research Databases indexed with Lucene; for presearch, the stylesheets allFoxmlToLucene.xslt, userroleiFoxmlToLucene.xslt and userrolejFoxmlToLucene.xslt build allObjectsIndex, userRoleiIndex and userRolejIndex.]
Conclusion
➲ The investigation and analysis clarified important aspects
- Three alternative approaches, none close to the ideal solution
- Cost model indicated importance of characteristics
- Tailored application-specific shortcuts necessary
➲ The demonstration implementation has pinpointed details of the approaches
- The most significant and difficult aspect of in-search and pre-search is how to map from policy to index creation and, for in-search, how to rewrite queries
- This indicates that policies must be kept simple
- For post-search, the policy check for each hit is costly
➲ A larger-scale experiment on a large repository with many user roles is underway
➲ Next version of GSearch to include search result filtering, October-November 2008.