Skip to main content

Reachability Analysis

depscan can perform reachability analysis for Java, JavaScript, TypeScript, and Python with built-in support for parsing atom reachables slices.

Our approach

There are many ways to perform reachability analysis. Most tools—including commercial ones—rely on a vulnerability database with affected modules (sinks) to detect reachable flows.

This has several downsides:

  1. These databases are often incomplete and manually maintained.
  2. If a CVE or ADP enhancement isn’t available yet, reachability won’t be detected.

In contrast, depscan computes reachable flows (via atom) without relying on vulnerability data upfront. It then identifies a smaller subset of those flows that are actually vulnerable. From there, we can further narrow it down to flows that are Endpoint-Reachable, Exploitable, Container-Escapable, etc.

Getting started

Simply invoke depscan with the research profile and language type to enable this feature.

To receive a verbose output including the reachable flows, pass the argument --explain

--profile research -t language [--explain]

Example analysis for a Java project

depscan --profile research -t java -i <source directory> --reports-dir <reports directory> --explain

Example analysis for a JavaScript project

depscan --profile research -t js -i <source directory> --reports-dir <reports directory> --explain

Example analysis for a PHP project

Ensure PHP > 7.4 is installed. However, we support scanning PHP 5.2 - 8.3. Alternatively, use the depscan container image.

depscan --profile research -t php -i <source directory> --reports-dir <reports directory> --explain

Customize the Reachability Analyzer

depscan supports two different reachability analyzers with FrameworkReachability being the default:

  • FrameworkReachability: The Framework-Forward Reachability (FFR) algorithm computes reachability by identifying sources or entry points originating from framework-provided inputs or routes that reach a library sink.
  • SemanticReachability: The Semantic Reachability algorithm builds on Framework-Forward Reachability by incorporating endpoint and service information to better prioritize vulnerabilities.

Pass the argument --reachability-analyzer SemanticReachability to use the Semantic Reachability analyzer.

SemanticReachability achieves optimal results when supplemented with comprehensive data, including BOM files from multiple project types, various life cycle stages, container SBOMs, atom slices (reachables and usages), and openapi specification files. Since depscan cannot automatically generate all required BOMs (aka formulation problem), it is recommended to build the project and container images, manually generate the necessary BOM files, and then execute depscan using the --bom-dir argument.

Things to consider:

  • Ensure the BOM files correspond to the same version of the source code. Mixing versions may lead to unexpected results, such as duplicate-looking component names.
  • When the application or service uses a single container image, set the DEPSCAN_SOURCE_IMAGE environment variable to enable depscan to automatically generate the container BOM during lifecycle analysis or semantic reachability analysis.
  • When multiple container images are involved, such as in a docker-compose setup, follow these steps:
    1. Run docker compose build.
    2. Manually generate BOMs for each container image, saving them with unique filenames ending in .cdx.json, and place them in a single directory. Example: cdxgen -t docker -o reports/sbom-container1.cdx.json image_name
    3. Run depscan with the --bom-dir argument pointing to that directory. Example: --bom-dir reports --reports-dir reports
  • Semantic analysis (computing the atom slices) is resource-intensive. For larger codebases, allocate over 64GB of memory and expect runtimes of 15 minutes or more.

Sample semantic analysis reports, including the BOM and slices dataset, are available in this Hugging Face repository.

Example semantic reachability analysis for a Java project

depscan -t java -i <source directory> --reports-dir <reports directory> --reachability-analyzer SemanticReachability --explain

Example semantic reachability analysis for a JavaScript project

depscan -t js -i <source directory> --reports-dir <reports directory> --reachability-analyzer SemanticReachability --explain
depscan --bom-dir <bom directory> --reports-dir <reports directory> --reachability-analyzer SemanticReachability --explain

This is the recommended invocation style for a thorough semantic analysis.

Explainability

Pass the --explain argument to get a detailed explanation of the analysis performed by depscan.

Explanation Mode

When SemanticReachability is enabled and the --explanation-mode argument is provided, the explanation style can be customized. depscan supports the following explanation styles:

  1. Endpoints – Lists all HTTP endpoints, their methods, and code hotspots detected through static analysis.
  2. EndpointsAndReachables – Includes endpoints along with a selection of up to 20 reachable data flows, providing detailed explanations. This is the default.
  3. NonReachables – Highlights up to 20 non-reachable data flows, useful for showcasing areas where validation and sanitization are properly implemented.
  4. LLMPrompts - Generates prompt text for use with LLMs for code fix suggestions.
depscan --bom-dir <bom directory> --reports-dir <reports directory> --reachability-analyzer SemanticReachability --explain --explanation-mode NonReachables

Use the environment variable MAX_REACHABLE_EXPLANATIONS to customize the number of data-flow explanations.

Troubleshooting

atom file or slices are not generated under the reports directory

Generating atom (intermediate representation) and performing static slicing for reachability and usages is a memory-intensive operation. For large projects, memory upwards of 32GB is the minimum. For example, for DejaCode (a Python application), we need a minimum of 40GB RAM. In resource-constrained environments, atom might crash with the log messages usually suppressed unless depscan is invoked with the environment variable SCAN_DEBUG_MODE=debug.

To troubleshoot step by step, try generating the atom file directly using the instructions from the atom documentation.

No reachable flows were identified

Computing reachable flows using atom requires an accurate SBOM generated by cdxgen with certain arguments. These are typically --profile research or --deep depending on the project type. For some project types such as Python or JavaScript, additional semantic tags may be required to correctly identify reachable flows. Look for files matching the pattern *-reachables.slices.json in the reports directory. If the file is empty or small, then no reachable flows were identified. In certain cases, there could be a file with contents that lack any value for the "purls" attribute. This usually indicates that the SBOM was lacking suitable information for PURL identification.

Consider starting a discussion or filing a bug if the issue could be reproduced with a public project.

Container SBOMs were not generated

For simple container scans, pass the image name via the --src argument. For lifecycle or semantic reachability analysis, use --src to pass the source directory and pass the container image name using the environment variable DEPSCAN_SOURCE_IMAGE.

Post-build binary SBOMs were not generated

Ensure the application is first built, since depscan cannot automatically build arbitrary projects and container images. To customize the build directory, use the environment variable DEPSCAN_BUILD_DIR.

pip install owasp-depscan[all]

Alternatively, use the official depscan container image.

docker pull ghcr.io/owasp-dep-scan/dep-scan
# podman pull ghcr.io/owasp-dep-scan/dep-scan