
Patent Sequence Search: Why You’re Missing Crucial Sequences


22 Jun 2017

Henk Heus

I’m going to let you in on a little secret: there’s a reason why we are the market leader in patent sequence search. It has surprisingly little to do with our user-friendly search interface, our stellar customer support - or our good looks. While (at least some of) these things certainly help, it is the content that can only be found in our GQ-Pat database of patent sequences that makes the real difference. Think all patent sequence databases are the same? Let me explain what I mean in some more detail.

When a life science patent application is filed at the USPTO, for example, the office asks the applicant to put all sequences into a nicely formatted list. This so-called “ST.25 listing” helps the examiners with their workflow and makes it straightforward to collect all sequences submitted to the office over time. In an ideal world, every inventor and every patent office would list sequences like this and that would be the end of it.

Unfortunately, very few patent offices in the world actually have official sequence-filing rules, and even when they do have them, they’re frequently ignored. As a result, sequences can be found anywhere in a patent: inside the text, in tables or even as part of the figures. If your search only spans the ST.25 sequence listings, you’re going to miss out on a lot of them!

You might be wondering, “so how big is this problem, really? How many sequences can be found outside of the official ST.25 listings?”

I’m afraid that the answer is “a whole lot”. If you don’t know what you’re doing, you can easily miss more than 38% of the US, WO, EP, JP and KR patent documents that contain sequences. How sure are we that this is the right number? It is based on a massive amount of internal sequence data, and we confirmed it by comparing our count of patent numbers (PNs) with the counts in products built on ST.25 listings, such as US Gene, WO Gene and PatSeq.

Sure, anyone can download ST.25 sequence listings and index them for BLAST search. But this approach will cause you to miss real sequences that are critical to your company. Why? Because all those sequences that are located in text, tables and figures aren’t indexed!
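To make the "download and index" route concrete, here is a minimal sketch of turning the nucleotide entries of an ST.25 listing into FASTA records that a search tool could index. The sample listing and the function name st25_to_fasta are invented for illustration; the sketch relies only on the standard ST.25 numeric identifiers (<210> for the SEQ ID NO, <212> for the molecule type, <400> for the sequence) and skips protein entries. Note what it cannot do: it never sees a sequence that appears only in the description, a table or a figure.

```python
import re

# Hypothetical two-entry listing, made up for this example.
SAMPLE_LISTING = """\
<210> 1
<211> 20
<212> DNA
<213> Homo sapiens
<400> 1
atgcatgcat gcatgcatgc                                                  20

<210> 2
<211> 12
<212> DNA
<213> Artificial Sequence
<400> 2
ggatccgaat tc                                                          12
"""


def st25_to_fasta(listing: str) -> str:
    """Convert DNA/RNA entries of an ST.25-style listing to FASTA text."""
    records = []
    # Each entry starts at its <210> (SEQ ID NO) identifier.
    for entry in re.split(r"(?=<210>)", listing):
        seq_id = re.search(r"<210>\s*(\d+)", entry)
        mol_type = re.search(r"<212>\s*(\w+)", entry)
        body = re.search(r"<400>\s*\d+\s*\n(.*?)(?=<\d{3}>|\Z)", entry, re.S)
        if not (seq_id and mol_type and body):
            continue
        if mol_type.group(1).upper() not in {"DNA", "RNA"}:
            continue  # protein entries use three-letter codes; not handled here
        # Drop spacing and the running position counts, keep only the letters.
        residues = re.sub(r"[^a-zA-Z]", "", body.group(1)).upper()
        records.append(f">SEQ_ID_NO_{seq_id.group(1)}\n{residues}")
    return "\n".join(records)


if __name__ == "__main__":
    print(st25_to_fasta(SAMPLE_LISTING))
```

Everything this parser produces comes from the formal listing, which is exactly the limitation described above.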

Still not convinced that you’re missing documents that are relevant to you? Here’s a list with some of the largest patent assignees in our patent sequence database, and the percentage of patent applications they filed with sequences hidden in the text, tables and figures.

GQ-Pat Database Sample

These are all documents that would never be found anywhere except in the GQ-Pat database.

Patent assignee (normalized)    Patents filed with sequences hidden in text, tables and figures
SHANGHAI BIOWINDOW              52.85%
ABBOTT                          41.33%
SANOFI                          39.32%
ASTRAZENECA                     30.74%
NOVO NORDISK                    29.11%
PFIZER                          27.83%
NOVARTIS                        27.35%
JOHNSON & JOHNSON               26.12%
MERCK SHARP DOHME               23.50%
BAYER                           20.42%
BRISTOL MYERS SQUIBB            20.07%
TAKEDA                          19.27%
UNIVERSITY CALIFORNIA           17.18%
GLAXO SMITHKLINE                14.05%
ROCHE                           12.85%
BASF                            10.31%
LIFE TECHNOLOGIES               9.88%
MONSANTO                        8.53%
AMGEN                           8.53%
IONIS PHARMACEUTICALS           7.17%

How does GQ Life Sciences from Aptean make sure these documents are found? Over the last five years, we have invested millions of dollars and countless hours to find every last sequence that is out there. We use proprietary algorithms to flag documents with even a minuscule chance of containing sequences. That set is then manually curated to capture all of the sequences in them. Our human curators also verify additional information like the SEQ ID NO and whether a sequence is mentioned in the claims or not. We have examined the entire backfile of over 100 million historical patents for sequences.
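To give a feel for what "flagging" a document can mean, here is a toy sketch. It is emphatically not the proprietary GQ algorithm, whose details are not public; the two heuristics (a run of 15 or more nucleotide letters, or four or more hyphen-joined three-letter amino acid codes) and the function names are assumptions chosen purely for illustration.

```python
import re

# Assumed heuristics, for illustration only.
NUCLEOTIDE_RUN = re.compile(r"\b[ACGTUacgtu]{15,}\b")
AMINO_ACIDS = (
    "Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|"
    "Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val"
)
PEPTIDE_RUN = re.compile(rf"\b(?:{AMINO_ACIDS})(?:-(?:{AMINO_ACIDS})){{3,}}\b")


def find_candidate_sequences(text: str) -> list[str]:
    """Return substrings of free text that look like sequences."""
    hits = [m.group(0) for m in NUCLEOTIDE_RUN.finditer(text)]
    hits += [m.group(0) for m in PEPTIDE_RUN.finditer(text)]
    return hits


def flag_document(text: str) -> bool:
    """Flag a document for manual review if any candidate is found."""
    return bool(find_candidate_sequences(text))


if __name__ == "__main__":
    sample = "The primer 5'-ATGGCCAAGTTGACCAGTGC-3' and the linker Gly-Ser-Gly-Ser-Ala were used."
    print(find_candidate_sequences(sample))
```

A pattern match like this only nominates candidates. Sequences in figures, unusual notations and false positives are why a manual curation step follows the flagging.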

Of course, we haven’t stopped there. We continuously process new patents as they are published to ensure our patent database contains the most up-to-date sequence information available. In addition to 38% more US, WO, EP, JP and KR patents, we have also indexed 153,000 documents from authorities in China, Canada, India and Brazil that you will not easily find anywhere else.

With its extensive data coverage (over 500 million sequences), powerful search tools and user-friendly functionality, Aptean GenomeQuest is the obvious choice for searching the entire sequence domain, both patent and non-patent.

Avoid the pitfalls of using free solutions for IP sequence searching. Download our RFP template or start a free trial today!

