vdiez commented Nov 12, 2025

JS-946

Summary

This PR replaces typescript-eslint's internal program creation with direct TypeScript program management, giving full control over CompilerHost and enabling efficient program caching with incremental updates for SonarLint analysis.

Core Objectives

  1. Direct program creation control - Create and manage TypeScript programs directly rather than delegating to typescript-eslint
  2. Efficient caching with GC - Cache programs while allowing garbage collection under memory pressure
  3. Incremental updates - Support program updates when files change without full recreation
  4. Simplified configuration - Merge compiler options from all tsconfigs without complex per-file mapping
  5. Per-file caching strategy - Cache based on individual files for better reuse
  6. Custom CompilerHost - Full control over filesystem access for content injection and monitoring

Architecture

New Components

1. IncrementalCompilerHost (packages/jsts/src/program/incrementalCompilerHost.ts)

  • Custom CompilerHost implementation for TypeScript programs
  • Injects file contents without disk writes (in-memory overlay)
  • Tracks all filesystem calls from TypeScript for monitoring
  • Supports incremental file updates with version tracking
  • Caches source files for performance optimization
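The sketch below makes the overlay-and-tracking idea concrete. It makes several assumptions:

- `OverlayFileSystem`, `updateFile`, `getVersion` and `accessLog` are hypothetical names, not the PR's `IncrementalCompilerHost` API.
- Disk access is injected as a plain function, so the sketch has no dependency on the `typescript` package.
- A real host would expose the same lookups through `ts.CompilerHost` methods such as `fileExists` and `readFile`.

```typescript
// Hypothetical sketch of an in-memory overlay with access tracking and versions.
type DiskReader = (fileName: string) => string | undefined;

class OverlayFileSystem {
  private readonly overlay = new Map<string, string>();
  private readonly versions = new Map<string, number>();
  private readonly readFromDisk: DiskReader;
  // Every lookup is recorded so filesystem usage can be monitored.
  readonly accessLog: string[] = [];

  constructor(readFromDisk: DiskReader) {
    this.readFromDisk = readFromDisk;
  }

  // Injects new content in memory (nothing is written to disk) and bumps the version.
  updateFile(fileName: string, content: string): void {
    this.overlay.set(fileName, content);
    this.versions.set(fileName, this.getVersion(fileName) + 1);
  }

  getVersion(fileName: string): number {
    return this.versions.get(fileName) ?? 0;
  }

  fileExists(fileName: string): boolean {
    this.accessLog.push(`fileExists ${fileName}`);
    return this.overlay.has(fileName) || this.readFromDisk(fileName) !== undefined;
  }

  readFile(fileName: string): string | undefined {
    this.accessLog.push(`readFile ${fileName}`);
    // Overlay content takes precedence over the disk copy.
    return this.overlay.get(fileName) ?? this.readFromDisk(fileName);
  }
}
```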

2. ProgramCacheManager (packages/jsts/src/program/programCacheManager.ts)

Hybrid LRU + WeakMap caching strategy:

LRU Map (strong refs)          WeakMap (weak refs)
┌─────────────────┐           ┌──────────────────┐
│ Metadata (~1KB) │──keyObj──>│ Program (~50MB)  │
│ - File list     │           │ - Can be GC'd    │
│ - Hashes        │           │ - Under pressure │
│ - Timestamps    │           │                  │
└─────────────────┘           └──────────────────┘
   Always in memory              Reclaimable
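  • LRU Map: Strong references to metadata (always in memory); evicts the least recently used entry at capacity
  • WeakMap: Programs keyed by the key object held in each metadata entry; once the LRU evicts that metadata, the key becomes unreachable and GC can reclaim the program (a WeakMap does not release entries whose keys are still reachable, regardless of memory pressure)
  • No explicit program disposal: evicting metadata from the LRU is enough to release its program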
  • Finds cached programs containing a single source file
  • Detects file content changes via hash comparison
  • Handles incremental program updates when files change
  • Configurable max cache size (default: 10 programs)
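Here is a minimal sketch of the LRU + WeakMap split. It assumes an id-keyed cache built from a hypothetical `HybridProgramCache` class and `CacheKey` object; it is not the PR's `ProgramCacheManager`. Recency tracking relies on a `Map` iterating in insertion order, so re-inserting an entry on access keeps the least recently used entry first.

```typescript
// Hypothetical sketch of a hybrid LRU (metadata) + WeakMap (program) cache.
interface CacheKey {
  readonly id: string; // identity object; only its reachability matters
}

interface ProgramMetadata {
  key: CacheKey;      // strong reference: keeps the WeakMap entry alive
  files: Set<string>; // file list
  lastUsed: number;   // timestamp
}

class HybridProgramCache<P> {
  // A Map iterates in insertion order, so the first key is the least recently used.
  private readonly lru = new Map<string, ProgramMetadata>();
  private readonly programs = new WeakMap<CacheKey, P>();
  private readonly maxSize: number;

  constructor(maxSize = 10) {
    this.maxSize = maxSize;
  }

  set(id: string, files: Iterable<string>, program: P): void {
    const key: CacheKey = { id };
    this.lru.delete(id);
    this.lru.set(id, { key, files: new Set(files), lastUsed: Date.now() });
    this.programs.set(key, program);
    if (this.lru.size > this.maxSize) {
      // Dropping the oldest metadata makes its key unreachable, so the GC
      // may then reclaim the matching program from the WeakMap.
      const oldest = this.lru.keys().next().value;
      if (oldest !== undefined) this.lru.delete(oldest);
    }
  }

  get(id: string): P | undefined {
    const meta = this.lru.get(id);
    if (meta === undefined) return undefined;
    // Re-insert to mark the entry as most recently used.
    this.lru.delete(id);
    meta.lastUsed = Date.now();
    this.lru.set(id, meta);
    return this.programs.get(meta.key);
  }

  get size(): number {
    return this.lru.size;
  }
}
```

Each metadata entry holds its key object strongly. As a result, a program only becomes collectable after the LRU evicts that metadata. Releasing programs while their metadata is still cached would need a `WeakRef` instead.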

3. Program Creation Functions (packages/jsts/src/program/program.ts)

  • extractCompilerOptions(): Parses tsconfig.json with proper extends resolution (including from node_modules)
  • mergeCompilerOptions(): Merges all discovered tsconfig compiler options
  • createOrGetCachedProgramForFile(): Main entry point for program caching

Modified Components

4. analyzeWithWatchProgram (packages/jsts/src/analysis/projectAnalysis/analyzeWithWatchProgram.ts)

Complete rewrite of SonarLint analysis flow:

  • Merges compiler options from all tsconfigs upfront (ignoring files/include/exclude)
  • Analyzes each file individually using cached programs
  • Passes actual ts.Program to typescript-eslint (not tsConfigs array)
  • Logs cache statistics after analysis

5. Simplified TSConfig Discovery (packages/jsts/src/analysis/projectAnalysis/tsconfigCache.ts, file-stores/tsconfigs.ts)

Removed (~80 lines):

  • Complex file-to-tsconfig mapping logic
  • BFS search and pending queue for per-file lookup
  • Lazy lookup optimization

Kept:

  • TSConfig discovery (property/lookup/fallback)
  • Project reference handling

TypeScript Program Type: SemanticDiagnosticsBuilderProgram

Why This Choice

The implementation uses ts.createSemanticDiagnosticsBuilderProgram(), which appears to be a good fit because:

Program Type | Incremental | Memory | Emit | Best For
--- | --- | --- | --- | ---
createProgram() | No | High | Yes | One-off compilation
SemanticDiagnosticsBuilder | Yes | Medium | No | Type checking only
EmitAndSemanticDiagnosticsBuilder | Yes | High | Yes | Full builds with emit
IncrementalProgram | Yes | Low* | Yes | CLI builds (requires .tsbuildinfo)

* Requires disk I/O for .tsbuildinfo files

Advantages:

  • Designed specifically for type-checking without code emission
  • No overhead for tracking emit state or managing output files
  • Built-in incremental update support - reuses ASTs for unchanged files
  • Request/response friendly (no filesystem watching or disk persistence)
  • Part of TypeScript's public builder-program API, the same family of APIs that tsc's watch mode is built on

Incremental behavior:

// Initial analysis - parses fileA.ts + all imports (150 files)
const program1 = ts.createSemanticDiagnosticsBuilderProgram(['fileA.ts'], ...);

// User edits fileA.ts
const program2 = ts.createSemanticDiagnosticsBuilderProgram(
  ['fileA.ts'], options, host,
  program1  // Reuses 149 unchanged files, re-parses only fileA.ts
);

Trade-offs with alternatives:

  • createProgram() - Simpler but no incremental updates (parses everything each time)
  • EmitAndSemanticDiagnosticsBuilder - Higher memory for emit tracking we don't use
  • IncrementalProgram - Lower memory but requires .tsbuildinfo disk I/O (problematic for concurrent IDE instances)
  • Watch mode - Long-running process model doesn't fit request/response pattern as naturally

All program types provide identical type checker APIs - the difference is in update efficiency, not functionality.

Key Features

1. Per-File Caching Strategy

Instead of creating programs containing all requested files, we:

  • Create programs from single entry points
  • Let TypeScript discover dependencies automatically
  • Cache based on what files a program contains
  • Subsequent requests for any file in that dependency tree → cache hit

Example:

Request 1: fileA.ts → Creates program with [fileA, fileB, fileC, lib.d.ts...] (150 files)
Request 2: fileB.ts → Cache HIT! (fileB is already in cached program)
Request 3: fileZ.ts → Cache MISS (create new smaller program for different area)

This allows multiple smaller, focused programs to coexist rather than requiring large programs containing all files.
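The lookup behind this strategy can be sketched as a membership test over the files each cached program contains. The sketch makes these assumptions:

- `CachedEntry`, `findProgramContaining` and `getOrCreate` are hypothetical names.
- The `create` callback stands in for building a program from a single root and collecting its source file names.

```typescript
// Hypothetical sketch: reuse any cached program whose file set contains the file.
interface CachedEntry<P> {
  rootFile: string;
  files: ReadonlySet<string>; // root plus every dependency TypeScript discovered
  program: P;
}

function findProgramContaining<P>(
  entries: readonly CachedEntry<P>[],
  fileName: string,
): CachedEntry<P> | undefined {
  return entries.find((entry) => entry.files.has(fileName));
}

function getOrCreate<P>(
  entries: CachedEntry<P>[],
  fileName: string,
  create: (rootFile: string) => { files: Iterable<string>; program: P },
): { entry: CachedEntry<P>; hit: boolean } {
  const cached = findProgramContaining(entries, fileName);
  if (cached !== undefined) return { entry: cached, hit: true };
  // Miss: build a new, smaller program rooted at this file only.
  const created = create(fileName);
  const entry: CachedEntry<P> = {
    rootFile: fileName,
    files: new Set(created.files),
    program: created.program,
  };
  entries.push(entry);
  return { entry, hit: false };
}
```

Replaying the three requests from the example gives a miss for fileA.ts, then a hit for fileB.ts served by fileA's program, then a second miss for fileZ.ts.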

2. Incremental Updates with Content Change Detection

// Content change detection via SHA-256 hashing
findProgramForFile(sourceFile, fileContent, compilerOptionsHash) {
  // Find cached program containing this file
  const cachedProgram = ...;

  // Compare content hash
  if (hash(fileContent) !== cachedHash) {
    // Update CompilerHost with new content
    host.updateFile(sourceFile, fileContent);

    // Recreate program incrementally (reuses unchanged files)
    const newProgram = ts.createSemanticDiagnosticsBuilderProgram(
      [sourceFile], options, host,
      cachedProgram  // Passed for reuse
    );
  }
}
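The hash comparison on its own can be sketched with Node's `crypto` module. `ContentHashTracker` is a hypothetical helper. On a `"changed"` result, a caller would update the host and pass the cached builder program as `oldProgram`, as in the pseudocode above.

```typescript
import { createHash } from "node:crypto";

// Hypothetical helper; only the SHA-256 comparison reflects the description above.
function sha256(content: string): string {
  return createHash("sha256").update(content).digest("hex");
}

type ContentStatus = "new" | "unchanged" | "changed";

class ContentHashTracker {
  private readonly hashes = new Map<string, string>();

  // Compares the incoming content with the last hash recorded for the file,
  // then stores the new hash for the next comparison.
  check(fileName: string, content: string): ContentStatus {
    const next = sha256(content);
    const previous = this.hashes.get(fileName);
    this.hashes.set(fileName, next);
    if (previous === undefined) return "new";
    return previous === next ? "unchanged" : "changed";
  }
}
```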

3. Compiler Options Merging

// Parse tsconfig.json (handles 'extends' properly) and keep only the compiler options
extractCompilerOptions(tsconfig) {
  const config = ts.readConfigFile(tsconfig, ts.sys.readFile);
  return ts.parseJsonConfigFileContent(config.config, ...).options;
}

// Merge all options, ignore files/include/exclude
mergeCompilerOptions(tsconfigs) {
  const allOptions = tsconfigs.map(extractCompilerOptions);
  return merge({}, ...allOptions);
}

This approach:

  • Properly resolves extends (including from node_modules like @tsconfig/node16)
  • Doesn't create programs (just extracts configuration)
  • Lets TypeScript discover dependencies based on actual imports
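Here is a sketch of the merge step. It makes these assumptions:

- The inputs are already-parsed results shaped like `ts.ParsedCommandLine`, represented by a local `ParsedConfig` stand-in.
- Later tsconfigs win key by key.
- `mergeParsedOptions` is a hypothetical name. The description does not say whether the PR's `merge` is shallow or deep.

```typescript
// Local stand-ins for ts.CompilerOptions / ts.ParsedCommandLine.
type CompilerOptions = Record<string, unknown>;

interface ParsedConfig {
  options: CompilerOptions; // compilerOptions after 'extends' resolution
  fileNames: string[];      // derived from files/include/exclude; deliberately ignored
}

// Shallow merge: for each option, the last tsconfig that sets it wins.
function mergeParsedOptions(configs: readonly ParsedConfig[]): CompilerOptions {
  const merged: CompilerOptions = {};
  for (const { options } of configs) {
    for (const [key, value] of Object.entries(options)) {
      if (value !== undefined) merged[key] = value;
    }
  }
  return merged;
}
```

Replacing whole values keeps array options such as `lib` intact. A deep merge such as lodash's `merge` combines arrays index by index: merging `["a", "b"]` with `["c"]` yields `["c", "b"]`. That could produce a `lib` list that no tsconfig declared.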

Analysis Flow

┌─────────────────────────────────────────────────────────┐
│ 1. Discover & merge tsconfig compiler options           │
└────────────────────┬────────────────────────────────────┘
                     │
                     ▼
┌─────────────────────────────────────────────────────────┐
│ 2. For each file in request:                            │
│    ├─ Check cache: findProgramForFile(file)             │
│    │  ├─ Cache HIT (unchanged) → reuse program          │
│    │  ├─ Cache HIT (changed) → update & recreate        │
│    │  └─ Cache MISS → create new program                │
│    │                                                     │
│    └─ Pass program.getProgram() to typescript-eslint    │
└────────────────────┬────────────────────────────────────┘
                     │
                     ▼
┌─────────────────────────────────────────────────────────┐
│ 3. Cache management:                                    │
│    ├─ LRU evicts old entries when at capacity           │
│    └─ WeakMap allows GC to reclaim programs             │
└─────────────────────────────────────────────────────────┘

Expected Performance Characteristics

Scenario | Time | Notes
--- | --- | ---
First analysis of file | ~500ms-2s | Parses file + all dependencies
Cache hit (unchanged) | ~1ms | Hash comparison only
Cache hit (changed) | ~50-200ms | Re-parse changed file, reuse others
Cache miss after GC | ~500ms-2s | Recreate program (same as first)

These are estimates that should be validated with real-world usage.

Memory Profile

Typical SonarLint session (10 cached programs):

  • LRU metadata: ~10KB (always in memory)
  • Programs in WeakMap: ~500MB (can be GC'd)
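  • After LRU eviction: the evicted metadata's key object becomes unreachable and GC can reclaim that program; programs whose metadata is still cached stay in memory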

Benefits

  1. Control: Full control over CompilerHost for customization and interception
  2. Performance: Incremental updates and program reuse minimize parsing overhead
  3. Memory Safety: Programs evicted from the LRU are released to GC through the WeakMap without explicit cleanup
  4. Cache Efficiency: Per-file strategy yields better hit rates
  5. Simplicity: Removed complex per-file mapping (~80 lines) in favor of straightforward caching
  6. Flexibility: Can inject file contents, monitor filesystem access, optimize resolution

Breaking Changes

Removed APIs

  • Cache.getTsConfigForInputFile() - No longer needed
  • Cache.getTsConfigMapForInputFile() - Removed BFS logic
  • Cache.clearFileToTsConfigCache() - Not applicable
  • TsConfigStore.getTsConfigForInputFile() - Not used anymore

Test Updates Needed

Test file packages/jsts/tests/analysis/tsconfigs.test.ts needs updates as it tests the removed file-to-tsconfig mapping functionality.

Testing

Main source code compiles successfully:

npx tsc -b packages

Test files have compilation errors due to removed APIs (need updates).

Future Improvements

  1. Configurable cache size: Make maxSize configurable via settings
  2. Cache statistics endpoint: Expose cache metrics for monitoring
  3. Warmup strategies: Pre-create programs for common files
  4. Multi-root programs: Group related files in single program for even better reuse
  5. Persistent cache: Serialize programs to disk between sessions

hashicorp-vault-sonar-prod bot changed the title from "Implement direct TypeScript program caching for SonarLint" to "JS-946 Implement direct TypeScript program caching for SonarLint" on Nov 12, 2025
vdiez marked this pull request as draft on November 13, 2025 22:46

vdiez commented Nov 15, 2025

Latest Changes Summary

1. Global Source File Content Cache with Lazy Loading

  • Added sourceFileContentCache and currentFilesContext in program.ts for truly lazy file loading
  • IncrementalCompilerHost now reads: (1) global cache → (2) files context → (3) disk
  • When reading from disk, both global cache AND files object are updated (keeps them in sync)
  • Removed redundant fileContentsMap from IncrementalCompilerHost
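The three-step read order can be sketched as follows. The `sourceFileContentCache` name comes from the description above. Its shape, the `FileEntry` type, `readThroughCaches` and the injected disk reader are assumptions for illustration.

```typescript
// Hypothetical sketch of the lookup order: global cache -> files context -> disk.
type FileEntry = { fileContent?: string };

const sourceFileContentCache = new Map<string, string>();

function readThroughCaches(
  fileName: string,
  filesContext: Record<string, FileEntry>,
  readFromDisk: (fileName: string) => string | undefined,
): string | undefined {
  // 1. Global cache shared by every program.
  const cached = sourceFileContentCache.get(fileName);
  if (cached !== undefined) return cached;
  // 2. Content supplied with the current analysis request.
  const provided = filesContext[fileName]?.fileContent;
  if (provided !== undefined) return provided;
  // 3. Disk, reached only when TypeScript actually asks for the file.
  const fromDisk = readFromDisk(fileName);
  if (fromDisk !== undefined) {
    // Keep the global cache and the files object in sync.
    sourceFileContentCache.set(fileName, fromDisk);
    filesContext[fileName] = { ...filesContext[fileName], fileContent: fromDisk };
  }
  return fromDisk;
}
```

The global cache is consulted first. As a result, an entry would need invalidating whenever a request supplies newer content for the same file; that step is not shown.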

2. Global TSConfig Content Cache

  • Added tsconfigContentCache in program.ts to avoid repeated disk reads of tsconfig files
  • Integrated with createSafeParseConfigHost() for automatic caching
  • Cache is cleared proactively when tsconfigs change via TsConfigStore.clearCache()

3. Refactored createProgram() Function

  • Old signature: createProgram(tsConfig: string)
  • New signature: createProgram(programOptions: ProgramOptions, baseDir: string)
  • Now returns SemanticDiagnosticsBuilderProgram instead of ts.Program
  • Uses IncrementalCompilerHost for lazy file loading
  • Extracts project reference sanitization into reusable logic

4. Unified Program Creation in analyzeFilesFromTsConfig

  • Now reuses createProgram() instead of manually creating SemanticDiagnosticsBuilderProgram
  • Eliminates code duplication between SonarQube and SonarLint paths
  • Both paths now use same IncrementalCompilerHost infrastructure

5. Project Reference Sanitization

  • Extracted sanitizeProjectReferences() helper function
  • Handles directory paths → /tsconfig.json conversion consistently
  • Used in 3 locations: createProgram(), analyzeFilesFromTsConfig() (2 places)
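A possible shape for the directory-to-`tsconfig.json` conversion is sketched below. It makes these assumptions:

- References arrive as objects with a `path` field, like `ts.ProjectReference`.
- The local `ProjectReference` interface and this body of `sanitizeProjectReferences` are illustrative, not the PR's code.
- Joining uses POSIX rules, so Windows-style backslash paths would need normalizing first (not shown).

```typescript
import * as path from "node:path";

// Local stand-in for ts.ProjectReference.
interface ProjectReference {
  path: string;
  prepend?: boolean;
  circular?: boolean;
}

// A reference may name a directory or a specific tsconfig file; map directories
// to "<dir>/tsconfig.json" and leave explicit .json paths untouched.
function sanitizeProjectReferences(refs: readonly ProjectReference[]): ProjectReference[] {
  return refs.map((ref) =>
    ref.path.endsWith(".json")
      ? ref
      : { ...ref, path: path.posix.join(ref.path, "tsconfig.json") },
  );
}
```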

6. Test Updates

  • Updated 6 tests in program.test.ts to use new createProgram signature
  • Updated 7 tests in analyzer.test.ts to extract underlying program with .getProgram()
  • Updated 1 test in options.test.ts
  • All tests now follow the pattern: createProgramOptions(tsConfig) → createProgram(options, baseDir)

Benefits

  • Single source of truth for file contents (global cache)
  • Truly lazy loading - files only read when TypeScript needs them
  • Cache consistency - files object stays in sync with cache
  • Less duplication - createProgram() reused everywhere
  • Better maintainability - centralized program creation logic

vdiez changed the title from "JS-946 Implement direct TypeScript program caching for SonarLint" to "JS-946 Implement direct TypeScript program caching" on Nov 19, 2025

vdiez commented Nov 19, 2025

Updates - November 19, 2025

Module Resolution Fix

Fixed failing test "jsonParse does not resolve imports, createProgram does" by adding proper module resolution defaults:

  • Added module: ts.ModuleKind.CommonJS to defaultCompilerOptions
  • Added moduleResolution: ts.ModuleResolutionKind.NodeJs to defaultCompilerOptions

Root cause: Without these defaults, TypeScript falls back to Classic module resolution, which doesn't support:

  • Path mappings from tsconfig paths configuration
  • Automatic index.ts file resolution in directories

This ensures modern module resolution works correctly for all analysis runs.

Global SourceFile Cache

Implemented a significant performance optimization by adding a global cache for parsed TypeScript SourceFile ASTs:

Architecture:

  • Cache structure: Map<fileName, Map<scriptTarget, {contentHash, sourceFile}>>
  • Multiple programs now share the same parsed AST for identical files
  • Parsing happens once per unique (file, target, content) combination

Benefits:

  • Memory efficiency: One parsed AST per file+target across all programs
  • Performance: Eliminates redundant parsing when creating multiple programs
  • Incremental updates: Cache invalidation when file content changes

API (in sourceFileCache.ts):

getCachedSourceFile(fileName, scriptTarget, contentHash): ts.SourceFile | undefined
setCachedSourceFile(fileName, scriptTarget, contentHash, sourceFile): void
invalidateCachedSourceFile(fileName): void
clearSourceFileContentCache(): void // Now clears both content and parsed caches
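Here is a sketch of the `Map<fileName, Map<scriptTarget, {contentHash, sourceFile}>>` layout behind this API. The function names follow the list above, but the bodies are assumptions. The generic `T` stands in for `ts.SourceFile` and a plain number for `ts.ScriptTarget`, so the example runs without the `typescript` package.

```typescript
// Hypothetical bodies for the listed API.
interface ParsedEntry {
  contentHash: string;
  sourceFile: unknown; // a ts.SourceFile in the real cache
}

// fileName -> scriptTarget -> { contentHash, sourceFile }
const parsedSourceFiles = new Map<string, Map<number, ParsedEntry>>();

function getCachedSourceFile<T>(fileName: string, scriptTarget: number, contentHash: string): T | undefined {
  const entry = parsedSourceFiles.get(fileName)?.get(scriptTarget);
  // A different hash means the content changed since parsing: treat it as a miss.
  return entry !== undefined && entry.contentHash === contentHash ? (entry.sourceFile as T) : undefined;
}

function setCachedSourceFile<T>(fileName: string, scriptTarget: number, contentHash: string, sourceFile: T): void {
  let byTarget = parsedSourceFiles.get(fileName);
  if (byTarget === undefined) {
    byTarget = new Map<number, ParsedEntry>();
    parsedSourceFiles.set(fileName, byTarget);
  }
  byTarget.set(scriptTarget, { contentHash, sourceFile });
}

function invalidateCachedSourceFile(fileName: string): void {
  parsedSourceFiles.delete(fileName); // drops the entries for every target
}
```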

Design Decision - Per-Target Caching:

We cache SourceFiles per ScriptTarget even though testing showed:

  • AST structure is identical across different targets (ES5, ES2020, ESNext)
  • Type checking uses the program's target, not the SourceFile's languageVersion

Rationale for keeping target-specific caching:

  1. Marginal performance cost - Most analysis runs use the same target configuration
  2. Metadata integrity - SourceFile.languageVersion matches the intended target
  3. Future-proofing - Enables rule filtering based on configured target if needed
  4. TypeScript API contract - Creating SourceFiles with their intended target is the documented usage pattern

See detailed rationale in the code comments. If profiling shows diverse targets are common, we can simplify to target-agnostic caching (always use ESNext) with minimal impact.

Config File Diagnostics

Added support for tsconfig parsing diagnostics:

  • Filter diagnostics by severity (errors vs warnings)
  • Throw only on fatal errors, preserve warnings for program reporting
  • Pass diagnostics via configFileParsingDiagnostics property
  • Programs can now report tsconfig warnings via getConfigFileParsingDiagnostics()
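Here is a sketch of the severity split. It assumes every `Error`-category diagnostic is fatal and everything else is kept for `configFileParsingDiagnostics`. `ConfigDiagnostic` and `splitConfigDiagnostics` are hypothetical. The numeric categories match `ts.DiagnosticCategory`.

```typescript
// Same numeric values as ts.DiagnosticCategory.
const DiagnosticCategory = { Warning: 0, Error: 1, Suggestion: 2, Message: 3 } as const;

// Local stand-in for ts.Diagnostic (only the fields used here).
interface ConfigDiagnostic {
  category: number;
  messageText: string;
}

// Throws on fatal (Error-category) diagnostics; returns the rest so they can be
// passed to the program as configFileParsingDiagnostics.
function splitConfigDiagnostics(diagnostics: readonly ConfigDiagnostic[]): ConfigDiagnostic[] {
  const errors = diagnostics.filter((d) => d.category === DiagnosticCategory.Error);
  if (errors.length > 0) {
    throw new Error(errors.map((d) => d.messageText).join("\n"));
  }
  return [...diagnostics];
}
```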

TSConfig Cache Enhancement

Updated tsconfig cache structure to track missing files:

  • Changed from Map<string, string> to Map<string, {contents: string, missing: boolean}>
  • Enables better error handling and diagnostics for missing extended tsconfigs
  • missingTsConfig flag now properly tracked through the CustomParseConfigHost
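Here is a sketch of a tsconfig content cache using the `{contents, missing}` entry shape described above. It makes these assumptions:

- `readTsConfigCached` and `clearTsConfigContentCache` are hypothetical names.
- The missing case is cached too, so a missing extended tsconfig is not re-read on every parse.

```typescript
// Entry shape from the description: Map<string, {contents, missing}>.
interface TsConfigCacheEntry {
  contents: string;
  missing: boolean;
}

const tsconfigContentCache = new Map<string, TsConfigCacheEntry>();

function readTsConfigCached(
  fileName: string,
  readFromDisk: (fileName: string) => string | undefined,
): TsConfigCacheEntry {
  let entry = tsconfigContentCache.get(fileName);
  if (entry === undefined) {
    const contents = readFromDisk(fileName);
    // Record missing files as well, so callers can report them later.
    entry = contents === undefined ? { contents: "", missing: true } : { contents, missing: false };
    tsconfigContentCache.set(fileName, entry);
  }
  return entry;
}

// Hypothetical hook for when tsconfig files change (cf. TsConfigStore.clearCache()).
function clearTsConfigContentCache(): void {
  tsconfigContentCache.clear();
}
```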

Test Updates

  • All tests now use createStandardProgram() instead of direct ts.createProgram() calls
  • Enforces branded ProgramOptions type safety at compile time
  • Removed unnecessary direct TypeScript API usage

Documentation

Added branded-program-options.md with comprehensive documentation:

  • Explains the branded type pattern and its benefits
  • Documents three single points of control for program creation
  • Provides rationale for always using ts.parseJsonConfigFileContent
  • Includes architecture diagrams and examples
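Here is a minimal sketch of the branded type pattern, using a `unique symbol` brand. `BaseProgramOptions` is a local stand-in for `ts.CreateProgramOptions`. `toProgramOptions` and `createDemoProgram` are hypothetical stand-ins for the PR's creation functions.

```typescript
// The brand exists only at the type level; nothing is emitted for it at runtime.
declare const programOptionsBrand: unique symbol;

interface BaseProgramOptions {
  rootNames: readonly string[];
  options: Record<string, unknown>;
}

// A structurally identical object without the brand is NOT assignable to this type.
type ProgramOptions = BaseProgramOptions & { readonly [programOptionsBrand]: true };

// The single place allowed to mint the brand (e.g. after ts.parseJsonConfigFileContent).
function toProgramOptions(rootNames: readonly string[], options: Record<string, unknown>): ProgramOptions {
  return { rootNames, options } as ProgramOptions;
}

function createDemoProgram(programOptions: ProgramOptions): string {
  return `program with ${programOptions.rootNames.length} root(s)`;
}

// createDemoProgram({ rootNames: ["a.ts"], options: {} }); // compile error: brand missing
```

The cast inside `toProgramOptions` is the only way to obtain a `ProgramOptions`. Funnelling creation through it is what lets the compiler enforce a single point of control.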

Status: ✅ All 13 tests passing, TypeScript compilation successful

Commit: a3addf9 - "Add module resolution defaults and global SourceFile cache"

vdiez force-pushed the typescript-program-caching branch 3 times, most recently from d98064d to b22519f on November 26, 2025 19:59
vdiez force-pushed the typescript-program-caching branch 2 times, most recently from 266c555 to 1b5a26d on December 2, 2025 13:23
vdiez commented Dec 2, 2025

🤖 Claude Code Investigation Update

I'm continuing to debug the backslash-reference test issue. Initial findings:

  • tsconfig parsing works correctly (rootNames are correct)
  • IncrementalCompilerHost is set on programOptions
  • But ts.createProgram returns no source files other than lib.d.ts from getSourceFiles()
  • Host methods (fileExists, getSourceFile) are never called for the test files

Currently investigating filesContext and other filesystem methods like readDirectory. Will commit and push when I have meaningful progress.

vdiez force-pushed the typescript-program-caching branch from a071859 to 2b7d5cb on December 2, 2025 19:44
vdiez force-pushed the typescript-program-caching branch from 636b9e4 to 88dcae3 on December 5, 2025 10:02
sonarqube-next bot commented Dec 5, 2025
