Innovation Mining for Patents

Using LLM-driven analysis, similarity search, and knowledge graphs to reveal overlooked applications at scale.
Date: Fall 2025
Figure: Venn diagram of the main ideas from the project.

This project investigates how an agentic AI system can “revive” dormant patents by converting dense, underused IP into structured, cross-domain opportunity hypotheses that are easier to explore and act on. The process book frames the core mission as unlocking hidden innovation through automated analysis and cross-industry discovery, with an end-to-end pipeline designed to produce actionable insights rather than just summaries.

From a research perspective, the work starts by studying current patent evaluation workflows and benchmarking tool limitations to identify where human effort bottlenecks occur and where automation can add value. Key findings shaping the approach include that evaluation remains deeply manual, meaningful “hidden value” often sits at the modular technology level, and stakeholders require explainable outputs rather than black-box recommendations. These insights drove explicit design principles around interpretability, scalability, domain-agnostic reasoning, and human-in-the-loop oversight.

Methodologically, the project operationalizes patent understanding as a sequence of modular reasoning tasks implemented as five agents: ingestion, decomposition, analogy discovery, knowledge graph construction, and evaluation/scoring. Ingestion converts raw patent PDFs into a structured object with extracted sections, metadata, summaries, keywords, and embeddings; decomposition then isolates reusable functional modules (sensors, control logic, mechanical systems, algorithms, interfaces) to enable granular matching beyond whole-document similarity. Analogy discovery combines embedding similarity with LLM reasoning to generate multiple analogy types (functional, system-level, component-level, industry-level), translating module-level capabilities into cross-domain opportunity candidates.
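
The five-agent sequence described above can be pictured as a chain of stages that each enrich one shared record. The following is a minimal sketch under stated assumptions, not the project's implementation: every name (PatentRecord, Module, CATEGORY_HINTS, run_pipeline) is hypothetical, and the keyword lookup only stands in for the LLM-driven decomposition and analogy steps.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Module:
    name: str
    category: str  # e.g. "sensor", "control logic", "mechanical system"


@dataclass
class PatentRecord:
    # Hypothetical schema; the source does not publish its structured object.
    patent_id: str
    text: str
    sections: Dict[str, str] = field(default_factory=dict)
    keywords: List[str] = field(default_factory=list)
    modules: List[Module] = field(default_factory=list)
    analogies: List[str] = field(default_factory=list)
    graph_edges: List[Tuple[str, str, str]] = field(default_factory=list)
    review_status: Dict[str, str] = field(default_factory=dict)


# Assumption: a toy keyword table replaces LLM-based module detection.
CATEGORY_HINTS = {
    "sensor": "sensor",
    "controller": "control logic",
    "motor": "mechanical system",
}


def ingest(rec: PatentRecord) -> PatentRecord:
    # Treat the whole text as one section and keep longer words as keywords.
    rec.sections = {"abstract": rec.text}
    words = {w.strip(".,").lower() for w in rec.text.split()}
    rec.keywords = sorted(w for w in words if len(w) > 4)
    return rec


def decompose(rec: PatentRecord) -> PatentRecord:
    # Keep only keywords that map to a known module category.
    rec.modules = [Module(w, CATEGORY_HINTS[w]) for w in rec.keywords if w in CATEGORY_HINTS]
    return rec


def discover_analogies(rec: PatentRecord) -> PatentRecord:
    # Placeholder text; the real step combines embeddings with LLM reasoning.
    rec.analogies = [f"reuse {m.name} ({m.category}) in another domain" for m in rec.modules]
    return rec


def build_graph(rec: PatentRecord) -> PatentRecord:
    # Record patent-to-module relationships as (subject, relation, object).
    rec.graph_edges = [(rec.patent_id, "HAS_MODULE", m.name) for m in rec.modules]
    return rec


def evaluate(rec: PatentRecord) -> PatentRecord:
    # Human-in-the-loop stub: every candidate starts as pending review.
    rec.review_status = {a: "pending review" for a in rec.analogies}
    return rec


STAGES: List[Callable[[PatentRecord], PatentRecord]] = [
    ingest, decompose, discover_analogies, build_graph, evaluate,
]


def run_pipeline(rec: PatentRecord) -> PatentRecord:
    for stage in STAGES:
        rec = stage(rec)
    return rec


example = run_pipeline(PatentRecord(
    "toy-001",
    "A light sensor feeds a controller that drives a motor to move the shade.",
))
print([m.name for m in example.modules])  # ['controller', 'motor', 'sensor']
```

Keeping each stage as a plain function over one record makes it straightforward to swap a stub for an LLM-backed agent, or to insert a review checkpoint between stages.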

The technical architecture is organized around three stores: a document store, a vector database for similarity search, and a knowledge graph database that captures relationships among patents, modules, domains, and applications. Dense representations are generated with sentence-transformers at the patent, module, domain, and application levels to support hierarchical retrieval for analogy discovery. The LLM layer is driven by prompt templates for summarization, decomposition, analogical reasoning, scoring, and explanation generation. Chain-of-thought prompting and few-shot examples encourage stepwise logic and consistent outputs, while prompt versioning and A/B testing maintain traceability. The process book also describes internal validation mechanisms, including expert review for module consistency, similarity-metric checks for analogy coherence, and hold-out test sets to validate scoring reliability.
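
To make the interplay between level-specific similarity search and the knowledge graph more concrete, here is a small in-memory sketch. It is an illustration under assumptions rather than the project's stack: LevelIndex and KnowledgeGraph are hypothetical stand-ins for the vector database and graph database, the relation names are invented, and the three-dimensional vectors are hand-written toy values, not sentence-transformers embeddings.

```python
import math
from typing import Dict, List, Set, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    # Cosine similarity; returns 0.0 when either vector has zero length.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class LevelIndex:
    """Toy vector store that keeps one bucket per level (patent, module, ...)."""

    def __init__(self) -> None:
        self._items: Dict[str, List[Tuple[str, List[float]]]] = {}

    def add(self, level: str, key: str, vector: List[float]) -> None:
        self._items.setdefault(level, []).append((key, vector))

    def search(self, level: str, query: List[float], top_k: int = 3) -> List[Tuple[str, float]]:
        # Brute-force scan of one level, highest similarity first.
        scored = [(key, cosine(query, vec)) for key, vec in self._items.get(level, [])]
        return sorted(scored, key=lambda kv: kv[1], reverse=True)[:top_k]


class KnowledgeGraph:
    """Toy triple store: (subject, relation, object)."""

    def __init__(self) -> None:
        self.triples: Set[Tuple[str, str, str]] = set()

    def add(self, subject: str, relation: str, obj: str) -> None:
        self.triples.add((subject, relation, obj))

    def objects(self, subject: str, relation: str) -> List[str]:
        return sorted(o for s, r, o in self.triples if s == subject and r == relation)


index = LevelIndex()
# Toy vectors (assumption): real ones would come from an embedding model.
index.add("module", "Light Sensor", [1.0, 0.0, 0.0])
index.add("application", "smart-home blinds", [0.8, 0.6, 0.0])
index.add("application", "greenhouse shading", [0.6, 0.8, 0.0])
index.add("application", "delivery drones", [0.0, 0.0, 1.0])

graph = KnowledgeGraph()
graph.add("sunshade patent", "HAS_MODULE", "Light Sensor")

# Search the application level with a module-level vector, then store the link.
hits = index.search("application", [1.0, 0.0, 0.0], top_k=2)
for name, _score in hits:
    graph.add("Light Sensor", "CANDIDATE_FOR", name)

print([name for name, _ in hits])  # ['smart-home blinds', 'greenhouse shading']
print(graph.objects("Light Sensor", "CANDIDATE_FOR"))
```

Separating levels in the index lets a module-level query be matched only against application-level entries, which is one plausible reading of the hierarchical retrieval described above.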

Empirically, the framework is demonstrated on two case studies: a sunshade patent, selected for its clear module boundaries, and a more complex Active Vehicle Control patent, used to stress-test software-centric decomposition and multi-sensor/control reasoning. For the sunshade, the pipeline extracts four core modules (Light Sensor, Control Logic Unit, Motor Actuator, Shade Barrier), proposes cross-domain applications such as smart-home blinds, greenhouse shading, and solar panel protection, and ranks these opportunities by quantitative match score, with smart automated blinds among the ranked examples. For the vehicle control patent, decomposition isolates algorithmic modules (Behavior Prediction, Sensor Fusion, Auto-Control Logic, Safety/Stability Model) and surfaces high-scoring analogies in emerging mobility domains (self-driving vehicles, delivery drones, smart traffic control). Together, the two cases illustrate how module-level representations and structured reasoning can produce non-obvious transfer hypotheses and a ranked decision view.
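
As a rough illustration of how a ranked decision view could be derived from module-level matches, the sketch below scores each candidate application by the Jaccard overlap between the sunshade modules and the modules the application would need. This is an assumed stand-in: the source does not state the project's scoring formula, the "required module" sets in CANDIDATES are invented for the example, and the resulting numbers are not project results.

```python
from typing import Dict, List, Set, Tuple

# Module names taken from the sunshade case study.
SUNSHADE_MODULES: Set[str] = {
    "Light Sensor", "Control Logic Unit", "Motor Actuator", "Shade Barrier",
}

# Assumption: hypothetical module requirements per candidate application.
CANDIDATES: Dict[str, Set[str]] = {
    "smart-home blinds": {
        "Light Sensor", "Control Logic Unit", "Motor Actuator", "Shade Barrier",
    },
    "greenhouse shading": {
        "Light Sensor", "Control Logic Unit", "Motor Actuator", "Shade Barrier",
        "Humidity Sensor",
    },
    "solar panel protection": {
        "Light Sensor", "Control Logic Unit", "Motor Actuator",
        "Hail Sensor", "Protective Cover",
    },
}


def match_score(patent_modules: Set[str], required: Set[str]) -> float:
    # Jaccard overlap: shared modules divided by all distinct modules.
    union = patent_modules | required
    return len(patent_modules & required) / len(union) if union else 0.0


def rank(patent_modules: Set[str], candidates: Dict[str, Set[str]]) -> List[Tuple[str, float]]:
    # Highest-scoring application first.
    scored = [(name, match_score(patent_modules, req)) for name, req in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


ranking = rank(SUNSHADE_MODULES, CANDIDATES)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
# smart-home blinds: 1.00
# greenhouse shading: 0.80
# solar panel protection: 0.50
```

A set-overlap score is easy to explain to a reviewer, since each ranking can be traced back to the specific modules that matched or were missing. A fuller scorer would presumably blend it with embedding similarity and LLM-generated rationale.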
