An IRM system which can transfer the responsibility of protection from human beings to a content aware automated process will be extremely valuable in case of large organizations.The need to integrate DLP and IRM is critical
Lots have been written about famous data breaches and the need for Data Loss Prevention. I will spare the reader the aggravation of reading it again here. There are hundreds of data security systems designed to control and prevent data breaches, and yet, every week we here about a new Data Breach. It is clear that users and administrators are unable to fully protect sensitive data. The main problem is that Data changes all the time. Users are focused on doing their job and not on data security. Aggravating the problem is that Hackers, Malware, Spyware and Viruses are focused on extracting such data from the perimeter.
What is a CSO to do?
Content awareness and the 4 W's
A good solution is to provide Content-Aware Information Rights Management System. Automatic Content visibility transfers the obligation of Data Security from users to a process. Imagine a system that automatically identifies files containing Credit Cards, Source Code, Images or any other intellectual property. Furthermore, imagine a process in which pre-defined IRM Policies are automatically enforced on such files as soon as they are saved on desktops or files-hares. Such policies are the 4 W’s that are so crucial to protecting Data.
The 4 W’s – Who – What – Where and When
Access controls and usage control are two aspects of Data Security that are often ignored. Mapping the content discovery to the IRM policies (see example picture below) provides automatic control of the 4 W’s:
WHO can access the information: The IRM system's identity establishment method, LDAP or non-LDAP databases as defined in custom applications and portals.
WHAT can recipients do with the information: Control specific allowed actions on files: View, Edit, Print (Print Screen), Forward/Share, Copy/Paste.
WHEN can each user access the information: IRM can control the time-span in which the recipient has access to the file. A document may have allowed access from August, 20, 4 pm to August 23rd, midnight. Alternatively time span may be defined as 2 days from first access.
WHERE the information can be used : This important Control restricts usage of the information to only a pre - specified list of computers identified by the hardware (mac address) or to a specific range of IP addresses or networks. CSO’s can now control Data even if such data is outside the perimeter. This is a very good way to provide data protection for Smart Mobile Devices. One can prevent such devices from ever seeing the data. Users, who have such credentials, may view the files with the local Browser.
The discovery agent must be monitoring the system constantly so that anytime a file is saved; it is scanned for a pattern or fingerprint and then the mapped IRM Policy is enforced.

Detecting the data correctly
It is worth mentioning here that there are two types of Data: Structured and unstructured Data. In my many meetings with CSOs I found that this is somewhat confusing. Here I refer to the need to protect files which hold either Intellectual Property or data in the file that also resides in the Database. Intellectual digital Property is any file that is deemed sensitive or confidential. Database Data is often multiple fields residing in an email or a file and is typically comes from the Human Resource Database, the CRM or any other application utilizing a Database. Such data may be the Last Name and the Salary of an employee.
Discovery systems use multiple detection engines to detect data inside files. The detection technique can be divided to Precise Algorithms and Imprecise Algorithms. Precise Algorithms are those that use fingerprints or registered data for exact data matching. Among them are Cyclical hashes, Rolling hashes, Watermarking/tagging, Recursive Transitional Gaps (GTB proprietary). Of course, not all fingerprinting engines are the same. One has to avoid false positives and false negatives at all cost.
Imprecise Algorithms are those that use Data Patterns, Bayesian analysis and Statistical analysis. Such engines prove to be highly inaccurate and present an unacceptable rate of false positive. It is highly recommended to test these techniques and to determine the acceptable level of false positives and of false positives. Of course, much attention must be paid to the array of file types supported by such engines. Naturally, a Bank may be interested in support for Microsoft Office, while Engineering Company may be more interested in support for DXF files or binary fingerprinting.
Organization will be well advised to use the appropriate detection technique based on the data they want to protect.
Conclusions
The marriage of Content-awareness and IRM provide organization comprehensive access control on sensitive data for internal and external constituents. Sensitive or confidential data is automatically encrypted based on file content and access to such data is controlled by either the File Owner or designated Administrator. External constituents may also have access rights to such files but only if they have been approved. This way organizations are able to secure files even after such files are circulating outside the perimeter.






