AnalogRetriever
Learning Cross-Modal Representations for Analog Circuit Retrieval

1 Tsinghua University 2 The University of Hong Kong 3 University of Cambridge 4 Nanjing Univ. of Posts & Telecom.
*Equal contribution    Corresponding authors

Try It Live

An interactive AnalogRetriever demo is hosted on Hugging Face Spaces: search by text, schematic, or netlist to retrieve matching analog circuits across modalities.

Motivation for AnalogRetriever

Figure 1. Motivation. (a) In traditional analog design, engineers manually search across fragmented sources by keyword, followed by time-consuming trial-and-error implementation. (b) AnalogRetriever maps text descriptions, schematic images, and SPICE netlists into a shared semantic embedding space, enabling unified cross-modal retrieval and downstream design generation via RAG.

Abstract

Analog circuit design relies heavily on reusing existing intellectual property (IP), yet searching across heterogeneous representations such as SPICE netlists, schematics, and functional descriptions remains challenging. Existing methods are largely limited to exact matching within a single modality, failing to capture cross-modal semantic relationships. To bridge this gap, we present AnalogRetriever, a unified tri-modal retrieval framework for analog circuit search.

We first build a high-quality dataset on top of Masala-CHAI through a two-stage repair pipeline that raises the netlist compile rate from 22% to 100%. Built on this foundation, AnalogRetriever encodes schematics and descriptions with a vision-language model and netlists with a port-aware relational graph convolutional network (RGCN), mapping all three modalities into a shared embedding space via curriculum contrastive learning. Experiments show that AnalogRetriever achieves an average Recall@1 of 75.2% across all six cross-modal retrieval directions, significantly outperforming existing baselines. When integrated into the AnalogCoder agentic framework as a retrieval-augmented generation module, it consistently improves functional pass rates and enables previously unsolved tasks to be completed. Our code and dataset will be released.

  • 75.2%: Avg Recall@1 across 6 cross-modal directions
  • 15×: over the strongest baseline (CROP, 4.7%)
  • 22→100%: netlist compile rate after refinement
  • 86.7%: new SoTA on AnalogCoder (Claude Sonnet 4.6 + RAG)

Tri-Modal Framework

AnalogRetriever maps SPICE netlists (port-aware RGCN), schematic images (ViT), and text descriptions (Transformer) into a shared d=768 embedding space. Images and text are encoded with a pretrained CLIP backbone (bottom 16 of 24 ViT blocks frozen to avoid catastrophic forgetting), while netlists are parsed into heterogeneous graphs with 20 port-aware edge types covering all device terminals (MOSFET drain/gate/source/bulk, BJT collector/base/emitter, diodes, passives, controlled sources, and shared-net connections). The model is trained with tri-modal contrastive learning, an auxiliary circuit-type classification loss, and a three-phase curriculum with hard-negative mining.
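
To make the idea of port-aware edge types concrete, the following is a minimal sketch of how a flat SPICE netlist might be turned into typed device-to-net edges. It is not the authors' parser: the `TERMINALS` table covers only four device kinds, and the edge-type names (such as `M_gate`) and the function `netlist_to_typed_edges` are hypothetical labels chosen for illustration.

```python
# Terminal order per SPICE device prefix (assumed subset for illustration;
# the source describes 20 edge types, which are not enumerated here).
TERMINALS = {
    "M": ("drain", "gate", "source", "bulk"),  # MOSFET
    "Q": ("collector", "base", "emitter"),     # BJT
    "R": ("p", "n"),                           # resistor
    "C": ("p", "n"),                           # capacitor
}


def netlist_to_typed_edges(netlist: str):
    """Return (device, net, edge_type) triples for a flat SPICE netlist."""
    edges = []
    for line in netlist.splitlines():
        line = line.strip()
        if not line or line[0] in "*.":  # skip comments and dot-directives
            continue
        tokens = line.split()
        name = tokens[0]
        kind = name[0].upper()
        terminals = TERMINALS.get(kind)
        if terminals is None:
            continue  # device kind not covered by this sketch
        # The first len(terminals) tokens after the name are the connected nets.
        for terminal, net in zip(terminals, tokens[1:]):
            edges.append((name, net, f"{kind}_{terminal}"))
    return edges


example = """* common-source stage
M1 out in 0 0 nmos W=1u L=180n
R1 vdd out 10k
.end"""

edges = netlist_to_typed_edges(example)
for edge in edges:
    print(edge)
```

Because each edge carries the terminal it attaches to, a relational GNN can learn different weights for, say, a gate connection and a drain connection to the same net, which an edge-agnostic graph would treat identically.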

Figure 2. AnalogRetriever framework. Three modality-specific encoders map netlists (port-aware RGCN), schematic images (ViT), and text descriptions (Transformer) into a shared embedding space, trained with tri-modal contrastive learning so that matching circuits are pulled together and non-matching ones pushed apart.
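
The tri-modal contrastive objective can be sketched as a symmetric InfoNCE loss averaged over the six retrieval directions. This is a minimal pure-Python illustration, not the released training code: the temperature of 0.07, the equal weighting of the six directions, and the function names are assumptions, and the auxiliary classification loss is omitted.

```python
import math


def normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def info_nce(queries, keys, temperature=0.07):
    """Mean cross-entropy of matching query i to key i among all keys in the batch."""
    total = 0.0
    for i, q in enumerate(queries):
        logits = [sum(a * b for a, b in zip(q, k)) / temperature for k in keys]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_z = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_z - logits[i]
    return total / len(queries)


def tri_modal_loss(img, txt, net, temperature=0.07):
    """Average InfoNCE over the six directions among Image, Text, and Circuit."""
    mods = {
        "I": [normalize(v) for v in img],
        "T": [normalize(v) for v in txt],
        "C": [normalize(v) for v in net],
    }
    directions = [("I", "T"), ("T", "I"), ("I", "C"),
                  ("C", "I"), ("T", "C"), ("C", "T")]
    losses = [info_nce(mods[a], mods[b], temperature) for a, b in directions]
    return sum(losses) / len(losses)


# Toy batch of two circuits with 2-D embeddings.
aligned = [[1.0, 0.0], [0.0, 1.0]]
swapped = [[0.0, 1.0], [1.0, 0.0]]
print(tri_modal_loss(aligned, aligned, aligned))  # small: all modalities agree
print(tri_modal_loss(aligned, aligned, swapped))  # larger: netlists mismatched
```

The loss is low only when the three embeddings of the same circuit are mutually closest within the batch, which is the "pulled together, pushed apart" behaviour the caption describes.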

Three-Phase Curriculum Training

Jointly training a randomly initialized RGCN with pretrained CLIP is unstable. We address this with a three-phase curriculum that progressively increases both the number of trainable parameters (RGCN → RGCN+CLIP) and the sampling difficulty (hard-negative ratio α: 0.05 → 0.3):

  • Phase 1 — Graph Encoder Warm-Up. Only the RGCN is trained; CLIP is frozen. The graph encoder aligns to the existing CLIP space using the code-involved directions with random in-batch negatives.
  • Phase 2 — Transition. CLIP is unfrozen and the full six-way contrastive loss is enabled, still with random sampling, establishing a stable joint-optimisation trajectory.
  • Phase 3 — Curriculum Hard-Negative Mining. Negatives are sampled from the same functional cluster, sharpening discrimination among structurally distinct circuits that share similar functionality (e.g. common-source vs. common-gate, Miller-compensated vs. folded-cascode op-amps).

Figure 3. Three-phase curriculum training. Phase 1 warms up the graph encoder with frozen CLIP weights; Phase 2 enables full six-way contrastive learning with random negatives; Phase 3 introduces hard-negative mining to distinguish topologically similar circuits.
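
The three phases can be summarised as a small schedule plus a negative sampler. The sketch below is illustrative only: the epoch boundaries (`p1_end`, `p2_end`, `total`), the linear ramp of α from 0.05 to 0.3 inside Phase 3, the choice of α = 0 in Phases 1 and 2, and all names are assumptions, since the page does not specify them.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class PhaseConfig:
    name: str
    train_clip: bool            # False means CLIP weights stay frozen
    directions: tuple           # contrastive directions that contribute to the loss
    hard_negative_ratio: float  # alpha: share of negatives from the same cluster


CODE_DIRECTIONS = ("I->C", "C->I", "T->C", "C->T")
ALL_DIRECTIONS = CODE_DIRECTIONS + ("T->I", "I->T")


def phase_for_epoch(epoch, p1_end=5, p2_end=10, total=30,
                    alpha_start=0.05, alpha_end=0.3):
    """Map an epoch index to its phase (boundaries here are hypothetical)."""
    if epoch < p1_end:
        return PhaseConfig("warm-up", False, CODE_DIRECTIONS, 0.0)
    if epoch < p2_end:
        return PhaseConfig("transition", True, ALL_DIRECTIONS, 0.0)
    progress = min(1.0, (epoch - p2_end) / max(1, total - 1 - p2_end))
    alpha = alpha_start + (alpha_end - alpha_start) * progress
    return PhaseConfig("hard-negative", True, ALL_DIRECTIONS, alpha)


def sample_negatives(anchor, cluster_of, k, alpha, rng):
    """Draw up to k negatives; about alpha * k come from the anchor's cluster."""
    same = [i for i in cluster_of
            if i != anchor and cluster_of[i] == cluster_of[anchor]]
    other = [i for i in cluster_of if cluster_of[i] != cluster_of[anchor]]
    n_hard = min(len(same), round(alpha * k))
    hard = rng.sample(same, n_hard)
    easy = rng.sample(other, min(len(other), k - n_hard))
    return hard + easy


clusters = {0: "amp", 1: "amp", 2: "amp", 3: "osc", 4: "osc", 5: "ldo", 6: "ldo"}
negatives = sample_negatives(0, clusters, k=4, alpha=0.5, rng=random.Random(0))
print(phase_for_epoch(12), negatives)
```

Keeping the schedule in one function makes it easy to check that trainable parameters and sampling difficulty only ever increase as training proceeds.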

High-Quality Tri-Modal Dataset (coming soon)

We build upon Masala-CHAI, the largest publicly available tri-modal analog circuit dataset. Our audit revealed severe quality issues: the dataset contains 6,371 schematic images, yet only 22.0% of its netlists compile under Ngspice and a mere 11.4% pass a DC operating-point check. We design a two-stage LLM-based refinement pipeline, with the Ngspice simulator acting as the ground-truth oracle between stages, that audits and repairs the dataset into 6,354 verified triplets with 100% compilation and DC pass rates (Table 1).

Figure 4. Two-stage LLM-based dataset refinement pipeline. Stage 1 performs initial netlist repair with Ngspice validation; Stage 2 applies iterative feedback-guided refinement, where a teacher model repairs failed cases using DC error logs until convergence.
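
The control flow of such a simulator-in-the-loop repair can be sketched as follows. This is an assumption-laden outline rather than the authors' pipeline: the simulator and the LLM calls are passed in as plain callables (`check`, `initial_repair`, `feedback_repair`), `max_rounds` is a hypothetical stopping rule standing in for "until convergence", and returning `None` models the final filtering step.

```python
from typing import Callable, Optional, Tuple

# Returns (ok, log): whether the netlist compiles and passes the DC check,
# plus the simulator's log text. In practice this would wrap Ngspice.
Checker = Callable[[str], Tuple[bool, str]]
# Takes the current netlist and the latest error log, returns a new netlist.
Repairer = Callable[[str, str], str]


def refine_netlist(netlist: str,
                   check: Checker,
                   initial_repair: Repairer,
                   feedback_repair: Repairer,
                   max_rounds: int = 3) -> Optional[str]:
    """Return a verified netlist, or None if it still fails after max_rounds."""
    ok, log = check(netlist)
    if ok:
        return netlist

    # Stage 1: one-shot repair, then validate with the simulator.
    candidate = initial_repair(netlist, log)
    ok, log = check(candidate)

    # Stage 2: feed the error log back to the repair model until it passes.
    rounds = 0
    while not ok and rounds < max_rounds:
        candidate = feedback_repair(candidate, log)
        ok, log = check(candidate)
        rounds += 1

    return candidate if ok else None


# Stub oracle and repairers, used only to exercise the control flow.
def stub_check(n):
    if ".end" not in n:
        return False, "missing .end"
    if "??" in n:
        return False, "unknown parameter ??"
    return True, ""


def stub_initial(n, log):
    return n + "\n.end"


def stub_feedback(n, log):
    return n.replace("??", "1k")


print(refine_netlist("R1 a b ??", stub_check, stub_initial, stub_feedback))
```

Treating the simulator as the only judge of success keeps the LLM from certifying its own output; a case that never passes is dropped rather than kept with an unverified netlist.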

Figure 6. Before / after refinement. Top: SPICE netlist with errors and fixes. Bottom: generic vs. refined functional description.

Stage                              # Triplets   Compile (%)   DC Pass (%)
Original Masala-CHAI               6,069        22.0          11.4
+ Stage 1 (Initial Repair)         6,371        99.2          74.1
+ Stage 2 (Feedback Refinement)    6,371        100.0         99.7
Final (after filtering)            6,354        100.0         100.0

Table 1. Dataset quality before and after our two-stage refinement — compile rate 22.0% → 100% and DC pass rate 11.4% → 100%.

Cross-Modal Retrieval Results

AnalogRetriever achieves an average Recall@1 of 75.2% across all six cross-modal directions, outperforming the strongest external baseline (CROP, 4.7%) by over 15×. Introducing the netlist (code) modality yields mutual enhancement: even the Text↔Image directions improve over the bi-modal TI model by up to +8.7 points R@1. In Table 2, the port-aware RGCN adds +1.2 points Avg R@1 over the edge-agnostic GCN (66.5 → 67.7), and the three-phase curriculum with auxiliary classification adds a further +7.5 points over the non-curriculum variant (67.7 → 75.2). For the full model, every direction exceeds 94% at R@5 and 97% at R@10.

Model            I→C                 T→I                 T→C                 C→I                 I→T                 C→T                 Avg R@1
                 R@1   R@5   R@10    R@1   R@5   R@10    R@1   R@5   R@10    R@1   R@5   R@10    R@1   R@5   R@10    R@1   R@5   R@10
CLIP [13]        1.2   4.6   8.9     2.2   8.2   13.5    2.9   9.7   16.2    2.9   10.1  15.7    2.7   7.6   12.9    3.2   11.2  17.4    2.5
CROP [14]        2.3   8.4   15.0    2.2   8.2   13.5    9.2   25.8  37.1    2.4   8.5   14.2    2.7   7.6   12.9    9.4   24.7  33.0    4.7
ChatLS [28]      0.9   5.9   11.3    2.2   8.2   13.5    9.5   24.0  35.1    1.6   6.6   10.0    2.7   7.6   12.9    7.2   20.4  31.3    4.0
NetTAG [29]      0.2   2.5   4.5     2.2   8.2   13.5    0.5   3.4   6.8     0.1   1.4   4.1     2.7   7.6   12.9    0.4   3.4   7.1     1.0
TI (Bi-Modal)    -     -     -       70.5  92.9  96.8    -     -     -       -     -     -       69.8  94.1  96.7    -     -     -       -
TIC (GCN)        62.9  92.2  95.7    70.8  95.7  98.4    67.7  95.7  99.0    60.9  91.6  95.6    71.3  96.4  98.8    65.5  95.7  98.9    66.5
TIC (RGCN)       65.2  92.5  96.8    71.8  96.2  98.8    69.1  96.4  99.2    61.9  92.9  96.7    71.1  96.7  98.9    67.3  96.4  99.0    67.7
AnalogRetriever  74.7  95.5  98.1    78.2  96.8  97.9    75.6  96.3  98.6    72.0  94.8  97.9    78.5  96.7  98.3    72.4  96.7  98.3    75.2

Table 2. Cross-modal retrieval performance on the test set (N=1,000). I: Image (schematic), T: Text (description), C: Circuit (netlist). The upper block shows external baselines; the lower block our ablations. AnalogRetriever is the full model with RGCN and curriculum learning. Cell shading scales with the value within each column; the best per column is bold.
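
For readers unfamiliar with the metric, Recall@K can be computed from a query-by-gallery similarity matrix as in the sketch below. It assumes one correct gallery item per query, located at the same index, which matches a paired test set but is an assumption about the evaluation protocol; the function name is illustrative. The last lines recompute the Avg R@1 entry for AnalogRetriever from the six per-direction values in Table 2.

```python
def recall_at_k(similarity, ks=(1, 5, 10)):
    """similarity[i][j] scores query i against gallery item j; item i is the match.

    Returns {k: percentage of queries whose match ranks within the top k}.
    """
    hits = {k: 0 for k in ks}
    for i, row in enumerate(similarity):
        # Zero-based rank of the true match: how many items score strictly higher.
        rank = sum(1 for score in row if score > row[i])
        for k in ks:
            if rank < k:
                hits[k] += 1
    n = len(similarity)
    return {k: 100.0 * hits[k] / n for k in ks}


# Toy example: 3 queries against 3 gallery items.
sim = [
    [0.9, 0.1, 0.2],  # match ranked first
    [0.8, 0.7, 0.1],  # match ranked second
    [0.1, 0.2, 0.3],  # match ranked first
]
print(recall_at_k(sim, ks=(1, 2)))

# Avg R@1 for AnalogRetriever: mean of the six directions in Table 2.
r1 = [74.7, 78.2, 75.6, 72.0, 78.5, 72.4]
avg_r1 = sum(r1) / len(r1)
print(round(avg_r1, 1))  # 75.2
```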

Retrieval-Augmented Generation

We integrate AnalogRetriever into a retrieval-augmented generation (RAG) pipeline with AnalogCoder, a training-free LLM agent for analog design via PySpice code generation. Evaluated on AnalogCoder's 24-task benchmark across eight LLMs, AnalogRetriever delivers a positive gain on all eight models, averaging +5.6 percentage points (62.0% → 67.6%). Augmenting Claude Sonnet 4.6 reaches 86.7%, a new state of the art, and the benefit generalizes across model families and scales.

LLM                  Baseline   +RAG    Δ
GPT-4o-mini          30.8       40.8    +10.0
Nemotron-120B        34.2       37.5    +3.3
GPT-5.4-mini         65.0       67.5    +2.5
Gemini-3-Flash       65.0       70.0    +5.0
Kimi-K2              65.0       73.3    +8.3
GLM-4.6              73.3       80.0    +6.7
Qwen3.5-397B         78.3       85.0    +6.7
Claude-Sonnet-4.6    84.2       86.7    +2.5
Average              62.0       67.6    +5.6

Table 3. Functional correctness (%) on the AnalogCoder benchmark for eight LLMs, with and without AnalogRetriever. Δ is the absolute gain (avg +5.6).
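
The averages in Table 3 can be reproduced directly from the per-model rows. The short script below copies the values from the table; the variable and function names are illustrative.

```python
# (baseline %, +RAG %) copied from Table 3.
RESULTS = {
    "GPT-4o-mini": (30.8, 40.8),
    "Nemotron-120B": (34.2, 37.5),
    "GPT-5.4-mini": (65.0, 67.5),
    "Gemini-3-Flash": (65.0, 70.0),
    "Kimi-K2": (65.0, 73.3),
    "GLM-4.6": (73.3, 80.0),
    "Qwen3.5-397B": (78.3, 85.0),
    "Claude-Sonnet-4.6": (84.2, 86.7),
}


def mean(values):
    values = list(values)
    return sum(values) / len(values)


baseline_avg = mean(b for b, _ in RESULTS.values())
rag_avg = mean(r for _, r in RESULTS.values())
gains = {name: r - b for name, (b, r) in RESULTS.items()}
avg_gain = mean(gains.values())

print(f"{baseline_avg:.1f} -> {rag_avg:.1f} (avg gain {avg_gain:+.1f})")
# 62.0 -> 67.6 (avg gain +5.6)
print(all(g > 0 for g in gains.values()))  # True: every model improves
```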

Figure 7. Case studies. Without retrieval (Failure), the LLM produces structurally incorrect circuits (red). With a retrieved schematic as a topological reference (Success), it generates functionally valid circuits — Task 9: Miller amplifier (0/5 → 5/5); Task 17: Wien-bridge oscillator (0/5 → 4/5).
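
One way to picture the retrieval step in such a pipeline is a cosine-similarity lookup followed by prompt assembly, sketched below. Everything here is hypothetical: the library layout, the prompt wording, and the use of netlist text as the reference (the case studies above use a retrieved schematic) are assumptions for illustration and do not describe AnalogCoder's actual interface.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def retrieve(query_vec, library, k=1):
    """library: list of (circuit_id, embedding, netlist). Returns the top-k entries."""
    ranked = sorted(library, key=lambda item: cosine(query_vec, item[1]),
                    reverse=True)
    return ranked[:k]


def build_prompt(task, references):
    """Assemble a generation prompt (hypothetical template) with retrieved circuits."""
    parts = [f"Task: {task}",
             "Reference circuits (retrieved; adapt rather than copy):"]
    for circuit_id, _, netlist in references:
        parts.append(f"--- {circuit_id} ---\n{netlist}")
    parts.append("Write PySpice code that implements the task.")
    return "\n".join(parts)


# Toy library with 2-D embeddings standing in for the shared embedding space.
library = [
    ("miller_opamp", [0.9, 0.1], "M1 ...\nCc ..."),
    ("wien_bridge", [0.1, 0.9], "R1 ...\nC1 ..."),
]
query = [0.8, 0.2]  # would come from encoding the task description
top = retrieve(query, library, k=1)
print(build_prompt("Design a two-stage Miller-compensated amplifier", top))
```

Because all three modalities share one embedding space, the same lookup would work whether the query is a text description, a schematic, or a partial netlist.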

BibTeX

@article{wang2026analogretriever,
  title   = {AnalogRetriever: Learning Cross-Modal Representations for Analog Circuit Retrieval},
  author  = {Wang, Yihan and Li, Lei and Lai, Yao and Wang, Jing and Lu, Yan},
  journal = {arXiv preprint arXiv:2604.23195},
  year    = {2026}
}