ii-sdv 2014 analysing patent full text – comparison against analysis of abstract and bibliographic...
TRANSCRIPT
Analyzing Patent Full-Text
A Study
April 7, 2014
Analysing Patent Full Text
Richard Gynn - LexisNexis
Agenda
1) Full Text Availability
2) Analyzing full text
- Discussion/considerations
- Big picture analysis
- Detailed analysis - Study
3) Conclusions
Full Text content available from vendors has evolved to a point
where most of the top publishing authorities are readily available.
Analysing Patent Full Text. Availability
Full Text Availability – Top 10 Publishing Authorities (available from most big vendors)
China, Korea, Japan are not the big deal they used to be!
Text can be available to analyse in English
Full Text Availability – Authorities available from at least one vendor
Full Text Availability by volume - > 100k publications
Bar chart: publications per authority, in millions (axis 0 to 25). Authorities shown, in chart order: JP, US, CN, DE, EP, KR, GB, FR, WO, CA, AU, TW, SU, ES, AT, SE, IT, RU, CH, NL, BE, FI, BR, DK, IN, NO, PL, IL, DD, ZA, MX, HU, PT, CS, AR, IE, NZ, CZ, GR
31 of these 39 are currently available from vendors
They account for the vast majority of total volume
Full Text Availability by volume - < 100k publications
Bar chart: publications per authority (axis 0 to 100,000). Authorities shown, in chart order: HK, YU, RO, SG, TR, MY, LU, BG, PH, UA, TH, CL, EA, ID, HR, SK, CO, SI, VN, PE, UY, OA, EG, IS, EC
Much smaller amounts currently available from vendors: ~300,000
If all were to become available, they would add about 1.5% to the full text that is currently available, e.g. equivalent to Spain or Taiwan
Full Text Availability by volume - < 10k publications
Bar chart: publications per authority (axis 0 to 10,000). Authorities shown, in chart order: MA, AP, VE, EE, LV, GT, CU, LT, MD, CR, PA, CY, DO, MC, ZM, ZW, SV, SM, JO, PY, GE, DZ, KE, MT, HN, MW, NI, ME, TJ, GC, BO, MN, BA, KZ, BY, TT
One is currently available from vendors
In total these would add about 0.1% to the full text that is currently available
Full Text Availability
Are we nearly there yet?
• There’s a lot of full text available to make use of
• Most vendors have significant volumes
• Rapidly diminishing returns for each authority added
Bringing You The World
• We are already in a good place
• In terms of % availability at least
Analysing Patent Full Text. Discussion/considerations
Full Text – What Is It?
• Everything of course?!
― …will concentrate on:
Considerations
There’s clearly a lot out there, so why don’t we see so much analysis of patent full text?
Considerations - Language
• Can only compare like for like in the same language
…non-Latin character issues too
• Noise – patent full text likes to state things like
…the complete opposite of what it’s about!
Considerations - Language
How I might introduce myself… if I was a patent!
나는사람들이밥, 앤드류, 데이브앨런같은이름이, 이름이. 나는밥, 앤드류, 데이브나앨런아니에요. 내이름은리처드입니다
I have a name, people have names like
Bob, Andrew, Dave and Alan. I’m not
Bob, Andrew, Dave or Alan.
My name is Richard
私は人々がボブ、アンドリュー、デイブとアランのような名前を持っている、名前を持っています。私はボブ、アンドリュー、デイブかアランないよ。私の名前はリチャードです
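The like-for-like language point can be illustrated with a minimal sketch that groups texts by their dominant script before any term comparison. This is only a rough heuristic based on Unicode character names, not anything the study itself describes; the function names `dominant_script` and `group_by_script` are hypothetical.

```python
import unicodedata
from collections import Counter


def dominant_script(text: str) -> str:
    """Return the most common script label among the letters in `text`.

    The label is the first word of each letter's Unicode name
    (e.g. LATIN, HANGUL, HIRAGANA, KATAKANA, CJK). This is a rough
    heuristic, assumed here for illustration only.
    """
    counts = Counter(
        unicodedata.name(ch, "UNKNOWN").split()[0]
        for ch in text
        if ch.isalpha()  # ignore digits, punctuation and whitespace
    )
    return counts.most_common(1)[0][0] if counts else "NONE"


def group_by_script(texts):
    """Bucket texts by dominant script so each bucket is compared separately."""
    groups = {}
    for t in texts:
        groups.setdefault(dominant_script(t), []).append(t)
    return groups


if __name__ == "__main__":
    samples = ["My name is Richard", "내이름은리처드입니다"]
    for script, items in group_by_script(samples).items():
        print(script, len(items))
```

In practice the study sidestepped this by using English translations, so a step like this would only matter when original-language text is analysed.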
Considerations
Other Considerations:
• Massive amounts of data
– Time?
– How to deal with it?
• Will it contain anything useful?
– Does the benefit outweigh the effort?
• Tools
– Big picture?
– Details?
Big Picture - Landscape Analysis
Big picture, topographic mapping (Discussion)
Here more full text could provide:
• Broader country analysis (often full text is not available)
• More consistency across authorities – e.g. more claims
― Compare like for like, e.g. not claims, title & abstract against title
• Full text more useful for details
• Themes/commonalities easier to find using claims, title, abstract
• Whilst useful, the vast majority of landscape analysis is done elsewhere, …i.e. details rather than big picture
Analysing Patent Full Text. Study
The Details - Study
Detailed analysis – looking for what?
• New/emerging, different
• Competitive/market comparisons
• Strength, weakness, opportunity, threat
What can I find using the full text that I couldn’t using title, abstract and bibliography?
The Details - The Technology
Terahertz analysis, e.g. imaging, spectroscopy?
Terahertz radiation - between infrared and microwave
The Details - The Search
• Broad Strategy
― Analysis IPCs + Terahertz Radiation Synonyms
― Keyword Terahertz Imaging & Spectroscopy
5,955 documents/3,365 families
Study - PatentOptimizer
Analysis Details:
• Small/emerging areas of 6-7 families
• Look at terms & phrases, parts, claim elements (all numbers represent families)
PatentOptimizer™ Analysis of EP, PCT & US results
• English Translations
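Looking for small or emerging areas of 6-7 families amounts to counting, for each term, how many distinct families use it and keeping the terms in a narrow band. A minimal sketch of that idea follows. It is not PatentOptimizer's method, which the slides do not describe; the record layout (a family identifier plus a list of extracted terms per document) and the function name are assumptions.

```python
from collections import defaultdict


def terms_by_family_count(records, min_families=6, max_families=7):
    """Return {term: family_count} for terms used by a small band of families.

    `records` is an iterable of (family_id, terms) pairs, one per document.
    Several documents may share a family_id; counting families rather than
    documents avoids inflating a term that one family repeats.
    """
    families_per_term = defaultdict(set)
    for family_id, terms in records:
        for term in terms:
            families_per_term[term.lower()].add(family_id)
    return {
        term: len(fams)
        for term, fams in families_per_term.items()
        if min_families <= len(fams) <= max_families
    }


if __name__ == "__main__":
    # Toy data (hypothetical), with the band narrowed to exactly 2 families.
    toy = [
        ("F1", ["tattoo removal", "imaging"]),
        ("F1", ["Tattoo removal"]),
        ("F2", ["tattoo removal", "imaging"]),
        ("F3", ["imaging"]),
    ]
    print(terms_by_family_count(toy, min_families=2, max_families=2))
```

Counting sets of family identifiers, rather than raw document hits, matches the slide's note that all numbers represent families.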
PatentOptimizer – Terms & Phrases
Diagnosis - General
PatentOptimizer – Terms & Phrases
Not found in Title, Abstract (or claims) – all from Spectral Image Inc
Learned – Something seemingly unique to them
SAME DOCUMENTS
PatentOptimizer – Terms & Phrases
Not found in Title, Abstract (or claims) – all monitoring vitamin K concentration in blood
Learned – A more recent (emerging?) use
Diagnosis - General
PatentOptimizer – Parts
Remote monitoring, e.g. of Bluetooth® headset user
Learned – Interesting, but not a massively relevant result; would like to investigate applications further
Diagnosis - general
PatentOptimizer – Claim Elements
Looking for infiltration or extravasation during intravenous infusion
Learned – New, possibly interesting area, seemingly dominated by one organisation
Diagnosis – general
A61M – introducing remedies
Study - VantagePoint
Analysis Details:
• Data Statistics
• Terms uniquely appearing in full text
• Highly occurring terms used in small numbers of documents
• Investigate terms unique to 2013 priority onward
Vantage Point Analysis of TotalPatent full text results
• English Translations
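Two of the analysis steps listed above, terms uniquely appearing in full text and terms unique to 2013 priority onward, reduce to set operations. The sketch below shows one way to express them. It is not VantagePoint's implementation; the dictionary keys (`priority_year`, `title_abstract_terms`, `full_text_terms`) and function names are hypothetical, and term extraction is assumed to have happened already.

```python
def full_text_only_terms(docs):
    """Terms seen in some document's full text but in no title or abstract."""
    front = set()
    full = set()
    for d in docs:
        front |= set(d["title_abstract_terms"])
        full |= set(d["full_text_terms"])
    return full - front


def terms_new_since(docs, year=2013):
    """Full-text-only terms whose earliest priority year is `year` or later."""
    candidates = full_text_only_terms(docs)
    first_seen = {}
    for d in docs:
        for term in set(d["full_text_terms"]) & candidates:
            # Track the earliest priority year in which each term appears.
            first_seen[term] = min(first_seen.get(term, d["priority_year"]),
                                   d["priority_year"])
    return {t for t, y in first_seen.items() if y >= year}


if __name__ == "__main__":
    # Toy records (hypothetical) to show the shape of the input.
    docs = [
        {"priority_year": 2010,
         "title_abstract_terms": ["terahertz"],
         "full_text_terms": ["terahertz", "spectroscopy", "glucose"]},
        {"priority_year": 2013,
         "title_abstract_terms": ["terahertz", "spectroscopy"],
         "full_text_terms": ["terahertz", "tetracycline", "glucose"]},
    ]
    print(sorted(full_text_only_terms(docs)))
    print(sorted(terms_new_since(docs, year=2013)))
```

Comparing against the union of all titles and abstracts, rather than document by document, is a design choice: it keeps only terms that never surface in any title or abstract in the result set.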
Vantage Point - Statistics
A very low percentage of the terms and words available for analysis actually appear in the title and abstract
Title & Abstract
• 42,614 words & phrases
• 16,251 words
Claims
• ~132k words and phrases not in Title or Abstract
• ~44k words in Title or Abstract
Full-text
• ~1.3M unique words & phrases
• ~650k unique words
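As a rough check on how small that percentage is, the counts above can be divided directly. The sketch assumes the title and abstract terms are a subset of the full-text terms, and it treats the approximate full-text figures (~1.3M and ~650k) as exact, so the results are indicative only.

```python
# Counts taken from the statistics above; the full-text figures are approximate.
TITLE_ABSTRACT_TERMS = 42_614     # words & phrases in title and abstract
TITLE_ABSTRACT_WORDS = 16_251     # words in title and abstract
FULL_TEXT_TERMS = 1_300_000       # ~1.3M unique words & phrases in full text
FULL_TEXT_WORDS = 650_000         # ~650k unique words in full text


def share(part: int, whole: int) -> float:
    """Return `part` as a percentage of `whole`."""
    return 100.0 * part / whole


if __name__ == "__main__":
    print(f"Words & phrases: {share(TITLE_ABSTRACT_TERMS, FULL_TEXT_TERMS):.1f}%")  # 3.3%
    print(f"Words: {share(TITLE_ABSTRACT_WORDS, FULL_TEXT_WORDS):.1f}%")            # 2.5%
```

On these figures, only about 3% of the vocabulary available for analysis is visible from the title and abstract alone.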
Vantage Point – Terms only appearing in full text 2013 onwards
Detection of tetracycline drug – concern in resistance to antibiotics
Learned – New area (clearer language in full-text)
optical investigation
Vantage Point – Terms only appearing in full text 2013 onwards
Looking for gas hydrates (fracking)
Learned – New area (uncovered by more consistent repetition in full text)
general investigation, sampling
Analysing Patent Full Text. Conclusions
Findings
• Full text useful
• Claims less so (in this case)
Most words and phrases in the “full text” did not appear in Abstract & Title
• Text mined wasn’t necessarily applications, but pointed towards them
• More consistent repetition in full text
Helped mainly to find new/niche applications
• Probably wouldn’t have found them other ways
Interesting companies & technologies to look at further
Conclusions
Conclusions (Noise and huge amounts of info):
• Background did not really come in as an issue
• Used English translations to avoid language issues
• Most noise was from search results
• My judgement – about 50% proved somewhat interesting upon further investigation
• Can this be automated/put into a process?
• 4/5+ family groupings seem to be about the sweet spot
What More?
Further this:
• Life Sciences
• Define processes
Dedicated machine?
• Detailed full-text analysis
Study analysis of parts
• Sellers, inventors, manufacturers etc.
Easier than expected
More possible & better timescales
Questions
Analysing Patent Full Text. Study – Additional Examples
PatentOptimizer – Terms & Phrases
2 of 6 have tattoo in Abstract OR Title (same if claims are included)
Learned – THz radiation can be used for tattoo removal
Diagnosis, surgery - General
PatentOptimizer – Terms & Phrases
Not found in Abstract & Title
(One claimed - Optical Diagnostics)
Determining microorganism presence/kind
PatentOptimizer – Claim Elements
SAME DOCUMENTS
Identifying/determining antimicrobial resistance of Burkholderia cepacia
Learned – Smaller, more niche areas?
PatentOptimizer – Terms & Phrases
Not found in Title, Abstract (or claims) – some detectors, some looking for heavy metal contamination
Learned – Some areas to investigate further?
PatentOptimizer – Claim Elements
Glucose Monitoring – Far-IR (5/7 have in Abstract & Title)
Learned – Not much more than from Title & Abstract
Blood measurement