Indexing structured data with security information
A system for indexing and searching includes an input interface and a processor. The interface is to receive a request to search for a term. The processor is to determine a search response based at least in part on a security associated with an index field and the term.
This application is a continuation of U.S. patent application Ser. No. 14/814,376, now U.S. Pat. No. 10,733,162, entitled INDEXING STRUCTURED DATA WITH SECURITY INFORMATION, filed Jul. 30, 2015, which is incorporated herein by reference for all purposes.
BACKGROUND OF THE INVENTION
A business database system stores business information including personnel information, financial information, technical information, etc. Typical searching of a database system uses an index. However, since many or most items are in the index, a search may return sensitive items. The sensitive items, in some cases, are not supposed to be revealed to the user performing the search.
Various embodiments of the invention are disclosed in the following detailed description and the accompanying drawings.
The invention can be implemented in numerous ways, including as a process; an apparatus; a system; a composition of matter; a computer program product embodied on a computer readable storage medium; and/or a processor, such as a processor configured to execute instructions stored on and/or provided by a memory coupled to the processor. In this specification, these implementations, or any other form that the invention may take, may be referred to as techniques. In general, the order of the steps of disclosed processes may be altered within the scope of the invention. Unless stated otherwise, a component such as a processor or a memory described as being configured to perform a task may be implemented as a general component that is temporarily configured to perform the task at a given time or a specific component that is manufactured to perform the task. As used herein, the term ‘processor’ refers to one or more devices, circuits, and/or processing cores configured to process data, such as computer program instructions.
A detailed description of one or more embodiments of the invention is provided below along with accompanying figures that illustrate the principles of the invention. The invention is described in connection with such embodiments, but the invention is not limited to any embodiment. The scope of the invention is limited only by the claims and the invention encompasses numerous alternatives, modifications and equivalents. Numerous specific details are set forth in the following description in order to provide a thorough understanding of the invention. These details are provided for the purpose of example and the invention may be practiced according to the claims without some or all of these specific details. For the purpose of clarity, technical material that is known in the technical fields related to the invention has not been described in detail so that the invention is not unnecessarily obscured.
Indexing structured data with security information is disclosed. A system for indexing comprises an input interface to receive a request to search for a term, and a processor to determine a search response based at least in part on a security associated with an index field and the term. In some embodiments, the system comprises a memory coupled to the processor and configured to provide the processor with instructions.
In some embodiments, a system for indexing structured data with security information comprises a record database and a database search system. The database search system builds a search index from the record database and searches the search index to determine a set of search results. In some embodiments, records stored by the record database comprise hierarchically stored objects, each object comprising one or more record fields. Each record comprises a security policy (e.g., an indication of users that may or may not access the record and/or the manner in which they may or may not access the record). In addition, each record field comprises a security policy. The search index comprises a set of index fields, each index field corresponding to a record field. Each index field comprises a security policy derived from the security policy of the associated record field and the security policy of the record associated with the associated record field. Each index field additionally stores a value corresponding to the value stored by the record field. When a user performs a search, the index is searched for index fields that the user is allowed to access (e.g., according to the security policy) and for values that match the search term.
In some embodiments, a method to provide high performance and highly secure search by indexing and searching structured data based on contextual, user, and role-based security policies of the indexed data and checking against the security privileges of the search user is disclosed. Security is highly configurable and dynamic.
In some embodiments, separate security policies apply to the visibility of the record and to the visibility of its individual fields. In some embodiments, in the event that a search user is not allowed to view a record, they are not presented any information hinting at its existence. In some embodiments, in the event that a search user is allowed to view a record, but not some fields within the record, they are not able to find that record through search using the value for the non-viewable field. In some embodiments, every record of a given type might have different security policies and every record might have multiple security policies. In some embodiments, security policy assignments to records and search user security membership change regularly and those changes need to be reflected in search results as soon as possible.
In some embodiments, achieving this requires that:
- The index includes the security groups needed for that record and its fields as well as those fields' values.
- Security changes are reflected in the index as soon as possible after the change.
- When a user invokes a search, the set of security groups the user is a member of is included in the search query.
- In the event that a user's security settings change, this is reflected in the search results on the next search after the change.
- A list of the user's security groups is included with the query expression to compare against the security groups authorized for the record and its fields.
In some embodiments, there are many different security group types that are all accounted for in the indexing and search solution. These can include two basic forms and an aggregated form:
- Unconstrained access:
- Visibility of a record or field depends on the search user belonging to one of the security groups associated with the record or field.
- e.g. “All Users” (All search users can see this record), or “All Managers” (search users who are in the manager group can see this record), or “Located in Europe” (search users whose location is in Europe can see this record)
- In some embodiments, a security group used for unconstrained access has the form SGn, where n represents an integer.
- Constrained access:
- Visibility of a record or field depends on the search user both belonging to the security group defined on the record or field and matching the additional constraint.
- In some embodiments, a security group SGn has a further constraint with the form SGn_Cm, where n represents an integer, where m represents an integer.
- For example, in the constrained access “Employee as Self”, SG123 might represent the Employee as Self security group and 678 might represent an employee number, so SG123_C678 could identify Employee as Self for employee 678.
- Aggregation:
- Security based on required membership of multiple security groups. This aggregation can be a combination of Constrained and Unconstrained forms.
- e.g. (“Manager” and “Located in Europe”), or (“Located in US” and “Employee as Self”), or (“Manager” and “Cost center 123”)
- In some embodiments, an aggregation where membership must be simultaneously in 2 or more groups is indicated using an ampersand (&) between the security groups (e.g., SG2_C5&SG7).
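The three security group forms above can be illustrated with a minimal Python sketch. It assumes that a user's memberships are held as a set of identifier strings (e.g., "SG22" or "SG33_C4"), that a constrained identifier must match exactly, and that an aggregation is satisfied only when every group joined by "&" is held. The function names `expression_satisfied` and `is_visible` are hypothetical and are not taken from the disclosure.

```python
from typing import Iterable, Set


def expression_satisfied(expression: str, user_groups: Set[str]) -> bool:
    """Return True if the user holds every group named in the expression.

    An expression is a single group ("SG22", "SG33_C4") or an aggregation
    of groups joined by "&" ("SG2_C5&SG7").
    """
    required = expression.split("&")
    return all(group in user_groups for group in required)


def is_visible(allowed: Iterable[str], user_groups: Set[str]) -> bool:
    """A record or field is visible if any one allowed expression is satisfied."""
    return any(expression_satisfied(expr, user_groups) for expr in allowed)


# Unconstrained, constrained, and aggregated checks.
print(expression_satisfied("SG22", {"SG22"}))                  # True
print(expression_satisfied("SG33_C4", {"SG33_C4"}))            # True
print(expression_satisfied("SG2_C5&SG7", {"SG7"}))             # False
print(is_visible(["SG2_C5&SG7", "SG22"], {"SG2_C5", "SG7"}))   # True
```

In this sketch a user who holds SG33 under a different constraint (for example "SG33_C9") does not satisfy "SG33_C4", since the identifiers are compared as whole strings.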
In various embodiments, a security group comprises a set of users and the access permissions for the set of users. For example, a given user might belong to multiple security groups (e.g., the number of security groups is 1, 2, 3, 4, 5, 10, 20, 50, 100, 200, 1000, or any other number of security groups):
In some embodiments, for each record to be indexed, the security groups for the visibility of the record, the values for each field contained within the record, and the security groups corresponding to each instance of each field within the record are stored.
In some embodiments, an example of how the information on a single record is packaged for indexing in a typical inverted index based search engine is as follows:
This example is a record describing a gift with two fields, one for the ID of the gift and another with the Title of the gift. For simplicity, assume a simple tokenization based on splitting on whitespace, ignoring punctuation, and lowercasing alphabetic characters. This is shown here in javascript object notation (JSON) to show the structure of the data to be indexed.
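The simple tokenization assumed above can be sketched in Python as follows. This is one possible reading, in which punctuation is treated as a token separator in the same way as whitespace; the input strings are hypothetical stand-ins for the ID and Title values, which are not reproduced here.

```python
import re
from typing import List


def tokenize(text: str) -> List[str]:
    """Lowercase the text and keep runs of letters and digits.

    Whitespace and punctuation both act as separators (an assumption
    about how "ignoring punctuation" is applied).
    """
    return re.findall(r"[a-z0-9]+", text.lower())


# Hypothetical field values chosen for illustration only.
print(tokenize("GIFT-16-3"))                # ['gift', '16', '3']
print(tokenize("HCM Gift 13 HRCore WATS"))  # ['hcm', 'gift', '13', 'hrcore', 'wats']
```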
The above example record results in a postings list being created for each of the following values if it does not already exist and the record id (1111) is added to each list:
- (for record visibility):
- 12345:SG2_C5&SG7→1111
- 12345:SG22→1111
- 12345:SG33_C4→1111
- (for the tokenized values of the first indexed field (ID)):
- 12345$1:value:“gift”→1111
- 12345$1:value:“16”→1111
- 12345$1:value:“3”→1111
- (for the security groups of the first indexed field (ID)):
- 12345$1:field_security:SG22→1111
- 12345$1:field_security:SG2_C5&SG7→1111
- (for the tokenized values of the second indexed field (Title)):
- 12345$2:value:“hcm”→1111
- 12345$2:value:“gift”→1111
- 12345$2:value:“13”→1111
- 12345$2:value:“hrcore”→1111
- 12345$2:value:“wats”→1111
- (for the security groups of the second indexed field (Title)):
- 12345$2:field_security:SG2_C5&SG7→1111
- 12345$2:field_security:SG22→1111
- 12345$2:field_security:SG33_C4→1111
This record is a simple example showing unconstrained security group access (SG22), constrained security group access (SG33_C4), and an aggregated access based on membership in both an unconstrained security group and a constrained security group (SG2_C5&SG7).
A user belonging to group SG22 or a user belonging to group SG33 and a constraining context of C4 is able to search for this record via its “Title” field, but only a user belonging to group SG22 is able to search for this record via its “ID” field. This record's existence is visible to both of these security groups. A user belonging to group SG22 uses unconstrained access while a user of group SG33 requires the additional constraining context of C4 for access. Similarly, a user that belongs to both group SG7 and also group SG2 with the additional constraint of C5, can search by either “Title” or “ID” and this record is visible to them.
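As a rough illustration of how such postings lists might be produced, the following Python sketch indexes a single record into an in-memory inverted index. The record layout, the field values ("GIFT-16-3" and "HCM Gift 13 HRCore WATS"), and the names `record`, `index_record`, and `tokenize` are all hypothetical. The ID field's security groups are taken from the prose description of who may search by ID, and the quotation marks around values are omitted from the keys for brevity.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set


def tokenize(text: str) -> List[str]:
    # Lowercase; whitespace and punctuation act as separators (assumption).
    return re.findall(r"[a-z0-9]+", text.lower())


# Hypothetical reconstruction of the example record (doc_type 12345, id 1111).
record = {
    "doc_type": "12345",
    "record_id": 1111,
    "record_security": ["SG2_C5&SG7", "SG22", "SG33_C4"],
    "fields": [
        {"field_id": 1, "name": "ID", "value": "GIFT-16-3",
         "field_security": ["SG22", "SG2_C5&SG7"]},
        {"field_id": 2, "name": "Title", "value": "HCM Gift 13 HRCore WATS",
         "field_security": ["SG2_C5&SG7", "SG22", "SG33_C4"]},
    ],
}


def index_record(rec: dict, postings: Dict[str, Set[int]]) -> None:
    """Add the record id to one postings list per security group and token."""
    doc_type, rid = rec["doc_type"], rec["record_id"]
    for group in rec["record_security"]:          # record visibility
        postings[f"{doc_type}:{group}"].add(rid)
    for field in rec["fields"]:
        prefix = f"{doc_type}${field['field_id']}"
        for token in tokenize(field["value"]):    # tokenized field values
            postings[f"{prefix}:value:{token}"].add(rid)
        for group in field["field_security"]:     # field visibility
            postings[f"{prefix}:field_security:{group}"].add(rid)


postings: Dict[str, Set[int]] = defaultdict(set)
index_record(record, postings)
for key in sorted(postings):
    print(key, "->", sorted(postings[key]))
```

Because security groups and values are stored as ordinary index terms, both can be resolved with the same postings lookup at query time.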
In some embodiments, a search user is expected to enter a simple text string of key words to search. The search system then modifies this query so that the proper security checks are included. This is done by using the index definition for the record type to understand the configured record and field security groups, and the search user's security group memberships. The basic logic used to expand the query is:
- For each doc_type and each record of that type
- Return the list of records that match the search terms within the visible fields
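One way the expansion step might look is sketched below in Python. It assumes a hypothetical index definition that maps each doc_type to its field identifiers, and it emits one clause per field containing the index keys to check for record visibility, field visibility, and field values. The function name `expand_query` and the clause layout are illustrative assumptions; aggregated identifiers (such as SG2_C5&SG7) would need to be supplied by the caller in `user_groups` and are not derived here.

```python
from typing import Dict, List


def expand_query(terms: List[str],
                 user_groups: List[str],
                 index_definition: Dict[str, List[int]]) -> List[dict]:
    """Build one clause per (doc_type, field) from the user's terms and groups."""
    clauses = []
    for doc_type, field_ids in index_definition.items():
        # Keys that make the record itself visible to this user.
        record_keys = [f"{doc_type}:{g}" for g in user_groups]
        for field_id in field_ids:
            prefix = f"{doc_type}${field_id}"
            clauses.append({
                "record_visibility_any": record_keys,
                "field_security_any":
                    [f"{prefix}:field_security:{g}" for g in user_groups],
                "value_any": [f"{prefix}:value:{t}" for t in terms],
            })
    return clauses


# Hypothetical index definition: doc_type 12345 has fields 1 and 2.
clauses = expand_query(["gift", "13", "hrcore"],
                       ["SG77", "SG88", "SG33_C4"],
                       {"12345": [1, 2]})
for clause in clauses:
    print(clause)
```

A record satisfies a clause only when it appears under at least one key from each of the three lists, and it is returned if it satisfies any clause.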
In some embodiments, an example query setup and execution is as follows:
- Assume user belongs to Security Groups=SG77, SG88, and SG33_C4
- In SG77 and SG88, the user has unconstrained access.
- In SG33, the user is constrained to only documents or fields that have the further constraint of C4 (whatever that is defined to be within the data model).
- The user does not belong to any aggregation security.
- Assume no tokenization of security group identifiers
- User's typed-in search: gift 13 hrcore
- Assume we want to find a match for any token the user typed (relevancy will put the best matches at the top of the result list)
Here is the example query using a pseudo-query language after query expansion, including comments marked by double slash (//) to help provide clarity:
- In the check for record visibility, record 1111 matched on 12345:SG33_C4, so it is visible. We continue checking this record for field matches.
- In the check for visibility on Field 1, there are no matches, so we move on to next field since we must match on both security groups and field values.
- In the check for visibility on Field 2, record 1111 matches on 12345$2:field_security:SG33_C4, so we then check on matching the values for Field 2.
- In the check for values on Field 2, record 1111 matches on “gift”, “13”, and “hrcore”. Matching on any of these values would be considered a match from the assumptions given for this example.
- Record 1111 would be placed on the list of records to be returned.
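The walk-through above can be reproduced with a small Python sketch that evaluates the three checks as set operations over postings lists. The `POSTINGS` dictionary is a hand-written stand-in for the example index, with Field 1 security following the walk-through (no entry for SG33_C4); the names `union` and `search` are hypothetical, and this is a sketch under those assumptions rather than the disclosed query language.

```python
from typing import Dict, Iterable, List, Set

# Stand-in for the example index (quotation marks around values omitted).
POSTINGS: Dict[str, Set[int]] = {
    "12345:SG2_C5&SG7": {1111}, "12345:SG22": {1111}, "12345:SG33_C4": {1111},
    "12345$1:value:gift": {1111}, "12345$1:value:16": {1111},
    "12345$1:value:3": {1111},
    "12345$1:field_security:SG22": {1111},
    "12345$1:field_security:SG2_C5&SG7": {1111},
    "12345$2:value:hcm": {1111}, "12345$2:value:gift": {1111},
    "12345$2:value:13": {1111}, "12345$2:value:hrcore": {1111},
    "12345$2:value:wats": {1111},
    "12345$2:field_security:SG2_C5&SG7": {1111},
    "12345$2:field_security:SG22": {1111},
    "12345$2:field_security:SG33_C4": {1111},
}


def union(keys: Iterable[str]) -> Set[int]:
    """Union of the postings lists for the given keys (missing keys are empty)."""
    found: Set[int] = set()
    for key in keys:
        found |= POSTINGS.get(key, set())
    return found


def search(terms: List[str], user_groups: List[str],
           doc_type: str, field_ids: List[int]) -> Set[int]:
    """Return records that are visible and match a term in a visible field."""
    visible_records = union(f"{doc_type}:{g}" for g in user_groups)
    results: Set[int] = set()
    for field_id in field_ids:
        prefix = f"{doc_type}${field_id}"
        visible_field = union(f"{prefix}:field_security:{g}" for g in user_groups)
        value_match = union(f"{prefix}:value:{t}" for t in terms)
        # All three checks must hold for the same record and field.
        results |= visible_records & visible_field & value_match
    return results


user = ["SG77", "SG88", "SG33_C4"]
print(search(["gift", "13", "hrcore"], user, "12345", [1, 2]))  # {1111}
print(search(["16"], user, "12345", [1, 2]))                    # set()
```

The second call returns nothing because "16" occurs only in Field 1, which this user cannot see, so the record cannot be found through the value of a non-viewable field.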
In some embodiments, performance comes from standard query optimizations of the above logic and the indexing of security and field value information for near direct lookup.
Although the foregoing embodiments have been described in some detail for purposes of clarity of understanding, the invention is not limited to the details provided. There are many alternative ways of implementing the invention. The disclosed embodiments are illustrative and not restrictive.
Claims
1. A system for indexing and searching, comprising:
- an input interface to receive a query to search for a term; and
- a processor to determine a response to the query based at least in part on the term and an index field of a search index, wherein the index field comprises an index field value associated with a record field of a record and a security policy associated with the record and the record field, wherein the security policy comprises a first set of security groups for visibility of the record and a second set of security groups for visibility of the record field, wherein determining the response to the query comprises to: determine a user security associated with the query to search for the term, wherein the user security includes a set of user security groups associated with a user; modify the query, using the index field, to determine an expanded query that includes checks to perform record matching and field visibility matching of the record field to provide the record for the response to the query to search for the term, wherein the expanded query comprises the first set of security groups for visibility of the record, the second set of security groups for visibility of the record, and a set of tokenized values of the index field; determine, using the expanded query, whether both the record and the record field are visible to the user, comprising determining the first set of security groups for visibility of the record includes a first at least one user security group of the set of user security groups and the second set of security groups for visibility of the record field of the record includes a second at least one user security group of the set of user security groups; in response to a determination that both the record and the record field are visible to the user, determine, using the expanded query, whether the term matches the index field; and in response to a determination that the term matches the index field, include the record in the response to the query to search for the term, wherein the record in the response is obtained using the expanded query.
2. The system of claim 1, wherein the index field comprises part of the search index.
3. The system of claim 1, wherein the search index comprises a set of index fields.
4. The system of claim 1, wherein the index field is associated with an instance of the record.
5. The system of claim 4, wherein the instance of the record is associated with an identifier.
6. The system of claim 1, wherein the record is associated with an identifier.
7. The system of claim 4, wherein the index field is associated with an instance of the record field of the instance of the record.
8. The system of claim 7, wherein the instance of the record field comprises a record field value.
9. The system of claim 8, wherein the field is associated with the record field value.
10. The system of claim 1, wherein in the event that the record field of the record is created, the index field is added to the search index.
11. The system of claim 10, wherein adding the index field comprises queuing adding the index field.
12. The system of claim 10, wherein adding the index field comprises adding the index field.
13. The system of claim 10, wherein adding the index field comprises adding the security policy of the index field.
14. The system of claim 1, wherein the security policy is determined by combining the security groups for visibility of the record and the security groups for visibility of the record field of the record.
15. The system of claim 10, wherein adding the index field comprises adding a record identifier.
16. The system of claim 10, wherein adding the index field comprises adding a record instance identifier.
17. The system of claim 10, wherein adding the index field comprises adding a numerical record field identifier.
18. The system of claim 7, wherein in the event that the instance of the record field of the instance of a record is changed, the index field is changed.
19. A method for indexing and searching, comprising:
- receiving a query to search for a term; and
- determining, using a processor, a response to the query based at least in part on the term and an index field of a search index, wherein the index field comprises an index field value associated with a record field of a record and a security policy associated with the record and the record field, wherein the security policy comprises a first set of security groups for visibility of the record and a second set of security groups for visibility of the record field, wherein determining the response to the query comprises: determining a user security associated with the query to search for the term, wherein the user security includes a set of user security groups associated with a user; modifying the query, using the index field, to determine an expanded query that includes checks to perform record matching and field visibility matching of the record field to provide the record for the response to the query to search for the term, wherein the expanded query comprises the first set of security groups for visibility of the record, the second set of security groups for visibility of the record, and a set of tokenized values of the index field; determining, using the expanded query, whether both the record and the record field are visible to the user, comprising determining the first set of security groups for visibility of the record includes a first at least one user security group of the set of user security groups and the second set of security groups for visibility of the record field of the record includes a second at least one user security group of the set of user security groups; in response to a determination that both the record and the record field are visible to the user, determining, using the expanded query, whether the term matches the index field; and in response to a determination that the term matches the index field, including the record in the response to the query to search for the term, wherein the record in the response is obtained using the expanded query.
20. A computer program product for indexing and searching, the computer program product being embodied in a non-transitory computer readable storage medium and comprising computer instructions for:
- receiving a query to search for a term; and
- determining a response to the query based at least in part on the term and an index field of a search index, wherein the index field comprises an index field value associated with a record field of a record and a security policy associated with the record and the record field, wherein the security policy comprises a first set of security groups for visibility of the record and a second set of security groups for visibility of the record field, wherein determining the response to the query comprises: determining a user security associated with the query to search for the term, wherein the user security includes a set of user security groups associated with a user; modifying the query, using the index field, to determine an expanded query that includes checks to perform record matching and field visibility matching of the record field to provide the record for the response to the query to search for the term, wherein the expanded query comprises the first set of security groups for visibility of the record, the second set of security groups for visibility of the record, and a set of tokenized values of the index field; determining, using the expanded query, whether both the record and the record field are visible to the user, comprising determining the first set of security groups for visibility of the record includes a first at least one user security group of the set of user security groups and the second set of security groups for visibility of the record field of the record includes a second at least one user security group of the set of user security groups; in response to a determination that both the record and the record field are visible to the user, determining, using the expanded query, whether the term matches the index field; and in response to a determination that the term matches the index field, including the record in the response to the query to search for the term, wherein the record in the response is obtained using the expanded query.
Type: Grant
Filed: Jun 30, 2020
Date of Patent: Apr 22, 2025
Patent Publication Number: 20210004360
Assignee: Workday, Inc. (Pleasanton, CA)
Inventors: Michael Wilson (San Jose, CA), Philip Monroe (San Francisco, CA), Darius Kasad (San Ramon, CA), Tejas Mandke (Emeryville, CA), David Vieira (Oakland, CA), Vladimir Giverts (San Francisco, CA)
Primary Examiner: Debbie M Le
Assistant Examiner: Huen Wong
Application Number: 16/917,386
International Classification: G06F 16/22 (20190101); G06F 21/62 (20130101);