Data Catalog API Primer

Open Raven's Data Catalog is where you find the aggregated results of your scans. It has a few key characteristics:

  1. Findings are current, not historical. If a scan previously found files in an asset that has since been deleted, our APIs will not return any information about it.
  2. Filters reference only the latest scan of any given asset; previous scans are not considered.
  3. Previews of data are redacted and limited in cardinality. The cardinality limit varies by asset type: in S3, we store no more than 100 previews per object and data class; in RDS, we store no more than 100 previews per column, data class, and table.

The UI experience handles several common user flows:

  1. Browsing your findings at the asset level and at a sub-level specific to each scannable asset type
  2. Viewing a limited set of previews for the data you found
  3. Managing your findings by marking them as false positives or ignoring them
  4. Viewing the results per scan job, keeping in mind the limitations of how the Data Catalog works

Our external API, on the other hand, has tools to support use cases such as:

  1. Gathering the data class findings for a specific asset
  2. Obtaining data previews in files for a specific asset to help with triage
  3. Getting the findings by asset in bulk to augment information found in other security tools
  4. Ignoring findings for an asset as part of your own external triage process
  5. Gathering the findings for a recent data scan

Available Endpoints

List Asset Data Classification Findings

We offer a cursor-paginated API that returns findings aggregated at the asset level. Note that ignored and false-positive findings are not returned here.

https://developer.openraven.com/reference/list

This API supports use cases such as getting findings by asset in bulk to augment information found in other security tools.
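
A cursor-paginated listing is usually consumed by following the cursor until the server stops returning one. The sketch below shows that loop under stated assumptions: the response keys `items` and `nextCursor` and the injected `fetch_page` function are illustrative names, not taken from the Open Raven reference, so check the linked page for the actual request and response fields.

```python
from typing import Callable, Iterator, Optional

# Hypothetical response shape: {"items": [...], "nextCursor": "<token>" or None}.
Page = dict


def iter_findings(fetch_page: Callable[[Optional[str]], Page]) -> Iterator[dict]:
    """Yield every finding by following the cursor until none is returned."""
    cursor: Optional[str] = None  # None requests the first page
    while True:
        page = fetch_page(cursor)
        yield from page.get("items", [])
        cursor = page.get("nextCursor")  # assumed key name
        if not cursor:
            break
```

In practice `fetch_page` would wrap an authenticated HTTP request to the List endpoint and pass the cursor along whenever it is not None. Keeping the HTTP call outside the loop makes the pagination logic easy to test with canned pages.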

Get Asset Data Classification Finding

This API fetches findings aggregated at the asset level for an individual asset. The asset is targeted by assetId, which is a global identifier. For AWS this is an ARN; the different ARN formats are documented on pages like this one: https://docs.aws.amazon.com/AmazonS3/latest/userguide/s3-arn-format.html

https://developer.openraven.com/reference/get
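
ARNs contain colons and sometimes slashes, so an assetId should be percent-encoded before it is placed in a URL. The sketch below assumes the assetId travels as a path segment; the `/findings/assets/` prefix is a made-up placeholder, not the real route, which is given on the reference page above.

```python
from urllib.parse import quote


def asset_path(asset_id: str) -> str:
    """Return a URL path with the asset ID safely percent-encoded."""
    # safe="" also encodes "/" so the ARN stays a single path segment.
    # The path prefix is hypothetical and only here for illustration.
    return "/findings/assets/" + quote(asset_id, safe="")


print(asset_path("arn:aws:s3:::my-bucket"))
# prints /findings/assets/arn%3Aaws%3As3%3A%3A%3Amy-bucket
```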

Search Asset Data Classification Findings

This API filters findings by Data Class or Scan Job.

https://developer.openraven.com/reference/search

The Scan Job filter can be used to list the findings from a Scan Run. Because the catalog is not historical, the filter is best used immediately after a Scan Run completes, before a later scan of the same assets becomes the one referenced.
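
To see why timing matters for the Scan Job filter, consider a toy in-memory model in which only the latest scan of each asset is kept. This is an illustration of the behavior described in this primer, not Open Raven's implementation; the class and method names are invented for the example.

```python
from typing import Dict, List, Tuple


class ToyCatalog:
    """Toy model: stores only the latest scan result per asset."""

    def __init__(self) -> None:
        # asset_id -> (scan_job_id, data classes found by that scan)
        self._latest: Dict[str, Tuple[str, List[str]]] = {}

    def record_scan(self, asset_id: str, scan_job_id: str, data_classes: List[str]) -> None:
        # A newer scan replaces whatever was stored for the asset.
        self._latest[asset_id] = (scan_job_id, list(data_classes))

    def search_by_scan_job(self, scan_job_id: str) -> Dict[str, List[str]]:
        # Only assets whose latest scan is this job can match.
        return {a: dcs for a, (job, dcs) in self._latest.items() if job == scan_job_id}


catalog = ToyCatalog()
catalog.record_scan("asset-a", "job-1", ["SSN"])
catalog.record_scan("asset-b", "job-1", ["EMAIL"])
catalog.record_scan("asset-a", "job-2", ["SSN"])  # rescans asset-a
print(catalog.search_by_scan_job("job-1"))
# prints {'asset-b': ['EMAIL']}
```

Once job-2 rescans asset-a, a search for job-1 no longer returns that asset, which is why results for a Scan Run are most complete right after it finishes.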

List Finding Previews for an Asset

You can retrieve partially redacted "previews" of the data we classified in your data sources, per asset. Previews are returned per data class and at the finest granularity we scan for your asset, e.g. an S3 object or an RDS table column.

Every preview entry contains two core fields: the "preview" itself and the number of redacted characters. The remaining values can differ by Data Class implementation and are stored in an unstructured field.

https://developer.openraven.com/reference/getpreviews
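
One way to handle this mix of fixed and free-form fields on the client side is to type the two core fields and keep everything else in a dictionary. The sketch below assumes the JSON keys are named `preview` and `redactedCount`; those names are guesses for illustration, so confirm them against the reference page before relying on them.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class Preview:
    preview: str          # partially redacted sample of the matched data
    redacted_count: int   # number of redacted characters
    extra: Dict[str, Any] = field(default_factory=dict)  # data-class-specific values


def parse_preview(entry: Dict[str, Any]) -> Preview:
    """Split one preview entry into its core fields and unstructured rest."""
    rest = dict(entry)  # copy so the caller's dict is not mutated
    return Preview(
        preview=rest.pop("preview"),               # assumed key name
        redacted_count=rest.pop("redactedCount"),  # assumed key name
        extra=rest,
    )
```

Code that consumes `extra` should treat every key as optional, since its contents depend on the Data Class.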

Ignoring Asset Data Classification Findings

When triaging data externally in your own system, you can choose to ignore the current findings for an asset. You can perform this action for all data classes or for a single data class.

https://developer.openraven.com/reference/ignore
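
The two scopes could be expressed in client code as a single helper with an optional data class argument. The payload keys `assetId` and `dataClass`, and the convention that omitting `dataClass` means "all data classes", are assumptions made for this sketch; the actual request format is on the reference page above.

```python
from typing import Any, Dict, Optional


def build_ignore_request(asset_id: str, data_class: Optional[str] = None) -> Dict[str, Any]:
    """Build a hypothetical ignore payload for one asset."""
    body: Dict[str, Any] = {"assetId": asset_id}  # assumed key name
    if data_class is not None:
        # Restrict the ignore action to a single data class.
        body["dataClass"] = data_class  # assumed key name
    return body
```

Calling `build_ignore_request(arn)` would target every data class on the asset, while `build_ignore_request(arn, "SSN")` would target one; the data class name here is only an example value.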