Page | 1 © 2013 Fort Knox Networks, Frank Angiolelli. All Rights Reserved.
Online Counterfeit Enterprise Pioneering Criminal Online Sociometry
Author: Frank Angiolelli Contributions by: Eric Feinberg
12/15/2013
Contents
Abstract
Prolific Counterfeit Enterprises
Chinese Actors
Russian Actors
Free OSP Content
Free Email Providers
Finding Victims
Socially Engineered Content
Attribution Through Forensics and Sociometrics
Offensive Social Engineering Points to Source
How Much Money Are They Making?
Current Weaknesses in Takedowns:
Resiliency Insulates the Counterfeit Enterprise
The Response to Takedowns
Identity Theft & Financial Fraud
Cost to Effect Ratio
Addressing Criminal Counterfeit Enterprise
Takeaways
Appendix A: Prior Work in This Area
Appendix B: Methodology & Theory
Abstract
This paper presents the results of our study of the online Counterfeit Enterprise (CE) and our efforts
to define it through Online Criminal Sociometry. The analysis, conducted using the HIIT System from
July 2013 through December 2013, shows that a few small groups engaged in criminal activity are
responsible for the vast majority of counterfeit e-commerce websites selling apparel, accessories and
pharmaceuticals.
Prolific Counterfeit Enterprises
Counterfeit websites are not a new problem, but they are growing in speed and intensity. Based on our
research, three primary groups are responsible for the vast majority of counterfeit websites in the high
fashion, shoes, sports apparel, watches (HSSW) and pharmaceuticals space.
These groups are Chinese HSSW, Chinese Pharmaceuticals and Russian affiliate-based Pharmaceuticals.
The most prolific are the Chinese actors, who operate a very sophisticated criminal Counterfeit
Enterprise that is resilient, widespread, brand indiscriminate and effective.
The Chinese operation itself appears to be divided into two separate, highly siloed “units”: HSSW and
pharmaceuticals. Both units employ very similar MOs, which include linkfarms, compromised websites,
compromised hosting accounts and the creation of counterfeit and trademark-infringing websites en masse.
These methods appear to differ in frequency, construction, distribution and complexity from those of
other pharmaceutical counterfeit operations and of small-scale HSSW counterfeiters, which are more
brand specific.
The Russian-based operations have entirely different MOs, which are discernible.
Chinese Actors
Chinese actors have a sophisticated network of operations that is large in size and scope. It
includes a sophisticated distribution network, paid sponsored advertisements and Blackhat search engine
optimization.
Chinese Distribution Network
The current (link and website) distribution network of CEs is a hierarchical botnet that provides
obfuscation and frustrates detection efforts. The term botnet is not used lightly: the CE controls the
websites, content and links programmatically and can change them at will and at speed, including
intra-day changes.
Figure 1: Visualization of CE Distribution Hierarchy
Bottom Tier – The Link Farm
At the bottom of the distribution network stand three distinct types of websites that we have identified
so far. The majority consists of content created through Markovian[1] generators, as discussed in the
previous work of Thomas Lavergne, Tanguy Urvoy and François Yvon[2] and demonstrated by Jason Bury[3].
Blackhat SEO techniques, such as keyword stuffing[4] and linkfarming[5], are used in conjunction with
the Markovian generators to boost the ranking of the second tier sites in search engines. These
concepts have also been discussed in previous works by Bharat and Henzinger[6] as well as Wu and
Davison[7].
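To make the idea of a Markovian generator concrete, the following is a minimal sketch of a first-order, word-level Markov chain text generator. It is an illustration only and is not the CE's actual tooling; the function names, the seed corpus and the keywords are hypothetical.

```python
import random
from collections import defaultdict

def build_chain(text):
    """Map each word to the list of words observed directly after it."""
    words = text.split()
    chain = defaultdict(list)
    for current, following in zip(words, words[1:]):
        chain[current].append(following)
    return dict(chain)

def generate(chain, start, length, rng):
    """Walk the chain for up to `length` words, starting at `start`."""
    word = start
    output = [word]
    for _ in range(length - 1):
        followers = chain.get(word)
        if not followers:  # dead end: no observed successor
            break
        word = rng.choice(followers)
        output.append(word)
    return " ".join(output)

# Hypothetical seed corpus mixing filler words with stuffed keywords.
corpus = "cheap brand shoes online cheap brand watches online store cheap shoes store"
chain = build_chain(corpus)
sample = generate(chain, "cheap", 12, random.Random(0))
print(sample)
```

Every adjacent word pair in the output also occurs in the seed text, which is why such pages look locally plausible to a crawler while being meaningless as a whole.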
The first type is a botnet[8] of tens of thousands of compromised websites and compromised web
hosting accounts, which makes up a large percentage of the bottom tier. These compromised sites are
used to host specific campaigns, and multiple simultaneous campaigns can be hosted on the same
compromised website. The content at this level is highly specific and reflects replicated (and
detectable) methods of compromise and upload.
[1] https://en.wikipedia.org/wiki/Markov_chain
[2] http://www.uni-weimar.de/medien/webis/research/events/pan-08/pan08-papers-final/lavergne08-detecting-fake-content-with-relative-entropy-scoring.pdf
[3] http://www.soliantconsulting.com/blog/2013/02/draft-title-generator-using-markov-chains
[4] https://support.google.com/webmasters/answer/66358?hl=en
[5] http://www.webopedia.com/TERM/L/link_farming.html
[6] K. Bharat and M. R. Henzinger. Improved algorithms for topic distillation in a hyperlinked environment. In Proceedings of the 21st International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 104-111, Melbourne, AU, Aug. 1998.
[7] http://www.cse.lehigh.edu/~brian/pubs/2005/www/link-farm-spam.pdf
[8] http://www.microsoft.com/security/resources/botnet-whatis.aspx
In an interview, one webmaster whose websites were compromised described how the CE had gained
access to his email, reset his GoDaddy account password and used FTP scripts to mass upload content
to hundreds of sites under his control. The webmaster's email credentials were compromised and used
by the same enterprise uncovered in this paper. We expect, though we have not proven, that the
credentials were harvested through malware or social engineering.
Figure 2: Bottom Tier Compromised Website Hosting Unauthorized Blog Indexed By Google
The second type in the bottom tier consists of hundreds of blog websites created by the CE, which
provide backlinks to the second tier using the text generators [See Figure 3: Markovian Generated Text].
The CE uses these blogs to dump content and links at will, improving search engine rankings through
the link farm.
Figure 3: Markovian Generated Text
Figure 4: The Links on the Page Show Numerous Brands
The third type in the bottom tier consists of linkfarms built by posting backlinks through existing web
forms that have weak or no authentication [See Figure 5: Comment/Forum Spam With Backlinks to Second
Tier]. In some cases, these are millions of links strong for an individual search term.
Figure 5: Comment/Forum Spam With Backlinks to Second Tier
By parsing through this data, we have observed definitively that the HSSW CE operations are
interconnected and brand indiscriminate. These websites provide Blackhat SEO and resiliency to the CE.
The Pharmaceutical CE, or division, is brand indiscriminate as well.
Forum Spam Sources
The forum spam has numerous sources and email addresses with patterns. For example, very active
hosts include:
• .com.cn dynamic IP addresses
• Pegtech IP Address Blocks
• Wholesale Internet IP Addresses
• Chinese ISP Dynamic IPs
These posts have been observed at rates of 200-300 per hour per website, placing the volume of this
activity into the billions and possibly trillions per year. The sources are spread throughout the world,
but patterns do emerge: an individual IP address is associated with multiple email addresses, individual
email addresses are associated with multiple brands being spammed, and individual email addresses are
associated with multiple IPs.
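The cross-sourcing pattern described above can be explored by treating IP addresses and email addresses as nodes and linking any pair that appears in the same post. The sketch below is one possible approach, not the method used in this study; the records, addresses and brand names are hypothetical, and a union-find structure is assumed as the grouping technique.

```python
from collections import defaultdict

# Hypothetical forum-spam records: (source IP, poster email, brand being spammed).
posts = [
    ("198.51.100.7", "a@example.com", "BrandA"),
    ("198.51.100.7", "b@example.com", "BrandB"),
    ("203.0.113.9", "b@example.com", "BrandC"),
    ("192.0.2.44", "c@example.com", "BrandA"),
]

def cluster_sources(records):
    """Group IPs and emails into clusters linked by any shared IP or email."""
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    def union(a, b):
        parent[find(a)] = find(b)

    # Each post links its IP to its email; brands are recorded but never link.
    for ip, email, _brand in records:
        union(("ip", ip), ("email", email))

    clusters = defaultdict(lambda: {"ips": set(), "emails": set(), "brands": set()})
    for ip, email, brand in records:
        root = find(("ip", ip))
        clusters[root]["ips"].add(ip)
        clusters[root]["emails"].add(email)
        clusters[root]["brands"].add(brand)
    return list(clusters.values())

clusters = cluster_sources(posts)
for c in clusters:
    print(sorted(c["ips"]), sorted(c["emails"]), sorted(c["brands"]))
```

Brands are deliberately not used as links: because the operation is brand indiscriminate, two unrelated spammers may target the same brand, while a shared IP or email address is a stronger tie.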
Redirection & Cloak Methods:
• 302 Redirect
• Standard HTML (i.e. Base HREF Methods)
• Obfuscated Javascript (Visual Presentation & on Mouse Click)
• JS files called by HTML5 asynchronous processing (Visual Presentation and on Mouse Click)
Figure 6: Example of Forum Spam
Figure 7: Forum Spam is Cross Sourced Frustrating Blocking Efforts
Second Tier – Redirection & Cloaking
In the second tier of the distribution network are cloaked[9] and compromised websites. While the
compromised sites are currently used as a “Second Tier”, they can be converted by the CE to host
Bottom Tier content. Standard 302 redirectors are being employed, but the cloaking often occurs
through replicated obfuscated JavaScript or links to JS files loaded using the HTML5 asynchronous
method. The JavaScript itself is not difficult to de-obfuscate; however, when the HTML links to
asynchronous JavaScript files, detection can be quite challenging, though not impossible. Both the
content of these cloaking techniques and the sites hosting them are changed regularly.
A second tier website may be used for both cloaking and 302 redirection over time, indicating the level
of control the owner has over these websites.
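As a rough illustration of how the redirection and cloaking methods listed above might be surfaced from a single HTTP response, the sketch below labels a response by its status code and a few markup features. It is an assumption-laden starting point rather than a detector: the class and function names are hypothetical, the labels are coarse hints that also occur on legitimate pages, and it does not de-obfuscate any script.

```python
from html.parser import HTMLParser

class CloakIndicatorParser(HTMLParser):
    """Collect markup features associated with the methods listed above."""

    def __init__(self):
        super().__init__()
        self.base_hrefs = []
        self.async_scripts = []
        self.inline_script_chars = 0
        self._in_inline_script = False

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if tag == "base" and attr_map.get("href"):
            self.base_hrefs.append(attr_map["href"])
        elif tag == "script":
            if "src" in attr_map:
                if "async" in attr_map:  # external JS loaded asynchronously
                    self.async_scripts.append(attr_map["src"])
            else:
                self._in_inline_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_inline_script = False

    def handle_data(self, data):
        if self._in_inline_script:
            self.inline_script_chars += len(data)

def redirect_indicators(status_code, html):
    """Return coarse labels for redirect or cloaking hints in one HTTP response."""
    parser = CloakIndicatorParser()
    parser.feed(html)
    labels = []
    if status_code == 302:
        labels.append("302-redirect")
    if parser.base_hrefs:
        labels.append("base-href")
    if parser.async_scripts:
        labels.append("async-external-js")
    if parser.inline_script_chars:
        labels.append("inline-js")
    return labels
```

A page flagged as `async-external-js` would still need the referenced script fetched and inspected separately, which is the step the text describes as challenging.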
[9] http://webdesign.about.com/od/seo/i/aa092704.htm
These websites are currently common in search engines and counterfeit paid advertisements because
they are difficult to detect. The search engine and the advertiser believe they are presenting content
which may be legitimate; however, once a user actually arrives at the site, they are redirected or their
browser overlays another website, which is a counterfeit. The second tier websites funnel users to the
top tier, but the destination, or “payload”, changes over time.
Figure 8: Visualization of Distribution Network
In the case where the website is overlaid on top of the original content, if the user clicks on a link, they
are redirected.
These websites serve multiple purposes:
• Frustrating detection methods – The detection of the actual counterfeit website through
programmatic means is difficult due to rotating obfuscation methods.
• Providing a common link of distributed command and control – The top tier can change at will
while the second tier maintains direction and control over which top tier websites are presented to
the users.
• Resiliency – Because the second tier is generally not addressed, seizing a specific domain will not
have an impact on the CE's ability to operate. In response, the CE simply replicates the code onto a new
domain and hosting account and drops that domain into the second tier, which is constantly linked to
through the botnet and linkfarms. As a result, the new website is placed at the top of the search engine
results very quickly. Usually, this happens well before any individual website is “seized”.
Figure 9: Example of a Second Tier Compromised Website
Top Tier – The Counterfeit Website
At the top tier are counterfeit websites that rotate on a seemingly irregular basis ranging from days
to weeks. New or existing counterfeit sites are entered into the distribution network at varying
frequencies based on the CE's objectives at the time, which appear to be based on social engineering
in addition to standard counterfeit content. Once a site is entered into the network, either the links
in the second tier are changed out or the second tier redirection is updated to point to the new
website.
Figure 10: Bottom and Second Tier Websites (A-E) Form Cyclical References As Well as Outputting to the Payload Website (F)
Additionally, the paths from the bottom tier to the top tier vary (as demonstrated in the figure above).
The links in the second tier of the botnet change to point to different websites whose objective is to
funnel the user to the same top tier website using a different path. We have reason to believe this is
based on a sophisticated system of command and control.
Russian Actors
Russian actors operate with a different MO. Their operation is smaller in scale than that of the
Chinese actors, much less obfuscated and unsiloed. For example, the Russian operators will compromise
websites; however, that compromise is not used to create an active botnet of compromised sites with
Markovian text, but only to place passive, static backlinks.
The Russian actors also have an unsiloed property where their operation can be tied through
Sociometrics to other activities including:
- PayDay Loan Scams
- Pornographic Websites
- Law Firm Referral Services
- Illegal Movie Downloads
The Russian actors are more centered in the Pharmaceutical space where they apply an “Affiliate
Based” methodology for distribution. Additionally, they are much less active in the Paid
Sponsored Advertisement space than the Chinese actors.
Figure 11: Affiliate Based Distribution of Websites
Figure 12: Notice Replication in the Phone Numbers
One operation is running TDS redirectors (in.cgi) known to be used in some malware deployments and
clickfraud. This activity was traced back to tds.moncarlo.net, registered to
[email protected], a reference to Markov chain Monte Carlo as a randomized
redirection system. That user also owns a number of pharmaceutical sites, and when you
attempt to connect to the site without a proper URL, you are presented with only the following.
Domains Directly Owned by This “Name”
247-health-online.com
canada-express-shop.com
moncarlo.net
my-health-24.com
Compromised Websites
Free OSP Content
Free OSPs inside the United States are being used to deliver content to victims as well. These OSPs
include Facebook, Tumblr, Pinterest, Twitter, Blogspot, Webs.com and others. These sites are being
used to deliver Second and Top Tier content in addition to advertising email addresses for sales
contacts.
Examples of Counterfeit Pharmaceuticals on Free Website Providers
- online-mexico-pharmacy-prednisone.webs.com
- drugsatchemistcoltedpharma.webs.com
- nolvadex-web-pharmacy45.webs.com
- pharmacy-2616195.webs.com
- nolvadex-canada-pharmacy.webs.com
- pharmacy-that-sell-synthroid.webs.com
- pharmacyzyvoxlinezolidinintern60.tumblr.com
Among our sites, the top “free website” providers in descending order are:
1. Tumblr.com
2. Webs.com
3. Blogspot.com
4. Wordpress.com
5. Tripod.com
6. Weebly.com
Free Email Providers
Of the 2,706 unique free email addresses we have identified, the distribution is weighted heavily
toward @gmail.com and @hotmail.com. Their reuse on other websites varies from single use to dozens of
sites.
Finding Victims
Delivering this content to victims is achieved through a few different methods.
• Search Engines
• Sponsored Advertisements[10]
• Spam
• Social Networks & OSPs (i.e. Facebook, Twitter, Tumblr, Pinterest, etc.)
Socially Engineered Content
In addition to the normal counterfeit content that is created weekly, CEs are creating websites socially
engineered to match the seasons and holidays throughout the year, and these can be country specific. In
summer, content specific to summertime, such as fashion sites, sunglasses and Major League Baseball, is
created and distributed. As the year progresses and football season approaches, National Football
League jerseys become prevalent.
The content development is specific to holidays as well. This includes “Black Friday”, “Cyber Monday”
and even country-specific holidays like “Boxing Day” (UK).
[10] http://fortknoxnetworks.blogspot.com/2013/08/cybercriminals-using-facebook-paid.html
Figure 13: Examples of Socially Engineered, Holiday Specific Counterfeit Site
The expectation is that the user will enter these search terms during the holiday seasons, or be more
prone to search for sunglasses during the summer.
After the development period, the code is deployed to websites which are pushed through the botnet for
SEO results relevant to the socially engineered term and potentially advertised and spammed. This
process occurs for numerous brands and trademarks and can easily result in hundreds or even thousands
of unique users on the first day.
In one case, a website was created, deployed and then reached 12 Million in Alexa ranking within 10
days. The ranking was increasing quickly enough to show the distribution network was achieving success.
Attribution Through Forensics and Sociometrics
We employ a number of methods to uncover entire operations, or to perform entity disambiguation. The
Criminal Online Sociogram below represents an operation of 125 websites we have definitively tied
together, valued at about $100M/year in sales. The MO and forensics tie this operation to the larger
Chinese HSSW operation, which likely exceeds several billion dollars in counterfeit sales.
Figure 14: Criminal Sociogram of 125 Counterfeit Websites
Counterfeit Pharmaceuticals
o Group 1 - Chinese actors – Their MO and distribution model are similar enough to identify their
methods as the same as those of the HSSW counterfeiters; however, their semantics and programming are
different enough to suggest a different unit or subset of the same operation. This group employs
the same Markovian generator techniques, compromised websites, compromised social media accounts
and other blackhat techniques as the HSSW counterfeiters.
o Group 2 - Russian/Ukrainian Actors – The Russian and Ukrainian actors tend to rely more on the
“affiliate” model for replication and distribution. Their methods, distribution and resiliency model
differ from the MO and semantic commonalities of the Chinese actors.
High Fashion/Sports Apparel/Sunglasses/Watches (HSSW)
o Group 1 - Chinese Actors – Our data and intelligence show that this group is leviathan in scale. A
well-organized operation, it includes hackers, spammers, advertisers, developers, “sales” support
(term used loosely), order fulfillment and the financial administration needed to maintain numerous
bank, e-commerce and credit card processing accounts. For example, this group seems to effectively
manage hundreds, if not thousands, of email addresses and “customer support chat” accounts.
Where we have been able to make contact with specific people in this organization, we have seen
mostly Fuzhou and Changsha, China as their locations; however, on-the-ground intelligence has
additionally notified us of physical recruitment operations in Guangdong Province. This group is
highly replicated in its MO, distribution and semantics, and is responsible for activity that is orders
of magnitude higher than any other group on the web for HSSW. Their activities include:
▪ HSSW Content
▪ Social Media & Digital Media Advertisements
▪ Website Compromise
▪ Hosting Account Compromise
▪ Forum and Social Media Spam
▪ Email Spam
▪ Counterfeit Social Media Groups
▪ Fake Social Media Accounts
▪ Twitter and Facebook Postings in Groups
o Group 2 - Other unidentified groups operate at much smaller volumes, with MOs and resiliency models
that differ greatly. For example, one website operated by a small-time counterfeiter was seized in a
legal action. That counterfeiter took to his Facebook account to announce that he had moved his
content to another web page. This individual is interested in customer loyalty, which is of no concern
to the large-scale Chinese HSSW operation.
Various Groups
o Outside of the HSSW and pharmaceutical groups, additional actors appear to be engaged in
various activities on a small scale worldwide. For example, a number of driver's license,
passport and social security counterfeiters, as well as diploma counterfeiters, appear to be operating.
We have dedicated only a few cycles to this content but have nonetheless identified some individuals
utilizing replicated content for high resiliency. A few fake ID operations appear to center around
British Columbia, Canada and South East Asia, though more resources or time would prove useful.
Offensive Social Engineering Points to Source
We performed controlled requests for takedowns on a few targeted sites from the distribution botnet
using a method where the domain owner would see the domain we were sending the email from. We
picked three different domains distributed inside the HSSW botnet.
On three separate occasions, as the takedown notices were sent out, a computer in China with the same
user agent string connected to the website. This computer was running Windows XP with Internet
Explorer 6 and using a free version of Chinese enterprise chat software, indicating a possible usage
relationship with qq.com.
The first connection was made from Changsha, China, followed by two from Fuzhou, China a few days
later. Interestingly, when we contacted the prospective sellers of this merchandise and performed
forensics on the email headers, the users were located in Fuzhou, China as well.
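One way to look for this kind of repeat visitor is to ask which user agent strings appear on a notified site shortly after every notice. The sketch below assumes access logs have already been parsed into (site, time, user agent) tuples; the function name, the 24-hour window, the sites, the timestamps and the exact user agent strings are all hypothetical.

```python
from datetime import datetime, timedelta

def agents_following_notices(notices, hits, window=timedelta(hours=24)):
    """Return user agents seen within `window` after every notice on the notified site."""
    per_notice = []
    for site, sent_at in notices:
        seen = {
            ua
            for hit_site, hit_time, ua in hits
            if hit_site == site and sent_at <= hit_time <= sent_at + window
        }
        per_notice.append(seen)
    return set.intersection(*per_notice) if per_notice else set()

# Illustrative data only; the source does not publish the actual strings or times.
XP_IE6 = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
OTHER = "Mozilla/5.0 (X11; Linux x86_64)"
notices = [
    ("site-a.example", datetime(2013, 10, 1, 9, 0)),
    ("site-b.example", datetime(2013, 10, 4, 9, 0)),
]
hits = [
    ("site-a.example", datetime(2013, 10, 1, 11, 30), XP_IE6),
    ("site-a.example", datetime(2013, 10, 1, 12, 0), OTHER),
    ("site-b.example", datetime(2013, 10, 4, 10, 15), XP_IE6),
    ("site-b.example", datetime(2013, 10, 6, 10, 15), OTHER),
]
common = agents_following_notices(notices, hits)
print(common)
```

A shared user agent is weak evidence on its own, so in practice it would be combined with source location and timing, as in the observations above.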
How Much Money Are They Making?
We estimate $250 to $1600 per day per site, based on our limited intelligence in this area at this time.
On that basis, the sites we have identified amount to between $1.3 Billion and $8 Billion annually,
likely somewhere in the middle. Since we have not identified all of the sites they operate, the Chinese
actors involved in these operations are likely the top grossing CE on the web; however, more
information is required to refine that determination.
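The site count behind this estimate is not stated, but it can be back-solved from the figures given. The sketch below does that arithmetic, assuming a 365-day year and that both annual bounds were computed from the same set of sites; the variable names are ours.

```python
DAYS_PER_YEAR = 365

low_per_site_day, high_per_site_day = 250, 1600   # USD per site per day, from the estimate
low_total_year, high_total_year = 1.3e9, 8e9      # USD per year, from the estimate

# Annual revenue per site at each end of the range.
low_per_site_year = low_per_site_day * DAYS_PER_YEAR
high_per_site_year = high_per_site_day * DAYS_PER_YEAR

# Site counts implied by dividing the annual totals by per-site revenue.
implied_sites_low = low_total_year / low_per_site_year
implied_sites_high = high_total_year / high_per_site_year
print(round(implied_sites_low), round(implied_sites_high))
```

Both bounds imply roughly 14,000 sites, which suggests the range was produced by multiplying one site count by the low and high daily figures.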
The graph below was created by studying the log files obtained for the cybersquatted domain loulsvuitton.com. This domain was hosted under an
account which was made public and contained four different brand-counterfeiting websites. The log files for loulsvuitton.com were intact and
clearly demonstrated the website was copied from louisvuitt0n.com, which we had identified in July in a Facebook Advertisement.
On April 22nd, the site louisvuitt0n.com was in testing for e-commerce code for a few days. Once published and advertised, the site grossed
$1,476 in sales on the first day it was advertised.
Meanwhile, on the same hosting account were multiple brand-counterfeiting websites, one of which was an NFL Jersey counterfeit site whose
code and metadata definitively tied it to more than 100 sites. Based on the estimated sales volume, this specific HSSW operation is worth >$100M/year.
The site louisvuitt0n.com was seized sometime in December 2013, by which point it had grossed about $376,000. The sister site,
loulsvuitton.com, was still online.
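Both domains above differ from the genuine brand name by a single character. A simple way to surface such lookalikes is an edit-distance check against a brand label, sketched below. The threshold of 2 and the function names are assumptions of this sketch, and it only compares the first label of the domain.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]

def is_lookalike(domain, brand_label, max_distance=2):
    """Flag a domain whose first label is close to, but not equal to, the brand label."""
    label = domain.lower().split(".")[0]
    distance = edit_distance(label, brand_label)
    return 0 < distance <= max_distance

for d in ["loulsvuitton.com", "louisvuitt0n.com", "louisvuitton.com", "example.com"]:
    print(d, is_lookalike(d, "louisvuitton"))
```

A check like this catches character swaps but not domains that append words to a brand name, so it would complement rather than replace keyword matching.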
Current Weaknesses in Takedowns:
This counterfeit enterprise exploits weaknesses in the protocol, legislative and regulatory
frameworks. Our studies of this operation reveal some relevant information on response.
1. 3% of counterfeit sites are being seized. We discover 97 counterfeit websites for every 3 that
we find are seized. While there are some minor variations, we have evidence that the vast majority of
sites go undetected.
2. They operate over a year before they are found. Across our data set, the average length of time
a counterfeit domain has been operating when addressed or identified is 553 days, or 1.5 years (this is a
moving number). During that time frame, using the logs which we have obtained, we estimate each
domain is billing between $138,000 and $553,000.
3. The math does not add up to impacting the overall operation. Seizing 3% of sites which have
been operating for 1.5 years is not enough to impact the CE. Even when trademark owners are
aggressively pursuing these websites and approach a 50% seizure rate through our observations, that
impact is 50% of only one brand, which leads to a much smaller impact on the overall organization. For
example, if the group is counterfeiting 10 brands, and one brand’s enforcement approaches 50%
seizure, the impact to the organization is 5%. They are undeterred.
4. Alexa rankings are a poor metric. Some in the industry are addressing only websites which rank
high in Alexa rankings, believing they will be addressing the lowest hanging fruit. The issue with this
approach is that the enterprise can collect tens of thousands of dollars from a counterfeit website prior
to it achieving a rank on Alexa and hundreds of thousands of dollars before the ranking warrants any
attention. Because the site is part of an interwoven CE comprising thousands of websites, the money
continues to flow.
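The figures in items 2 and 3 can be checked with a few lines of arithmetic. The sketch below uses only numbers quoted in the list; the even spread of sites across brands is an assumption of the sketch, as are the variable names.

```python
avg_days_online = 553                      # item 2: average days before a domain is addressed
years_online = avg_days_online / 365       # about 1.5 years

# Daily billing implied by the per-domain range in item 2.
implied_low_per_day = 138_000 / avg_days_online
implied_high_per_day = 553_000 / avg_days_online

# Item 3: one brand out of ten reaching a 50% seizure rate,
# assuming sites are spread evenly across brands.
brands = 10
brand_seizure_rate = 0.50
overall_impact = brand_seizure_rate / brands

print(round(years_online, 2), round(implied_low_per_day), round(implied_high_per_day), overall_impact)
```

The per-domain range corresponds to roughly $250 to $1,000 per day over 553 days, so its upper end is lower than the $1600 per day upper bound quoted in the revenue estimate.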
Resiliency Insulates the Counterfeit Enterprise
Since our analysis shows that this activity is rooted in a limited number of organizations, the strategy
of resiliency shows itself as effective. It protects their profits through diversification, as
demonstrated in the graph below. If the overall enterprise deploys a large-scale, distributed model, it
can funnel money at high volume despite individual domain seizures. Additionally, successful
website designs are replicated before they are seized and then placed into the distribution network.
Socially engineered domains containing specific phrases like “blackfriday” or “cybermonday” receive
large volumes of traffic before they appear to be discovered.
Figure 15: Demonstration of Resiliency in the Profit Model
While it is true that seizure of bank accounts and funding may be occurring, the billing processors and
accounts are distributed across a number of providers and accounts, again achieving resiliency.
The Response to Takedowns
In controlled experiments, we have effected takedowns of cherry-picked domains owned by the Chinese
Pharmaceutical and HSSW units. The domains were not seized in a legal action, only suspended. After a
takedown is completed, the website is re-hosted or re-registered in an average of 3 days, sometimes
with the same hoster or registrar.
In one case, we completed a takedown of a website hosted at a specific web hoster in the United
States. In response to our takedown, the CE created another account at the same hoster and brought the
content back online.
The CE also moved their content to numerous other hosters, mostly reacting by moving their websites
outside the United States. Some reacted by moving their hosting to dedicated server environments;
others moved to Moldova or Switzerland. We theorize that if we scale up, the Sociometry of their
reactions would further expose operations.
Identity Theft & Financial Fraud
In our studies of the operations of these websites, large numbers contain “account creation” and
“payment” mechanisms that do not use standard HTTPS encryption. In addition to being transmitted in
plain text across the internet, the PII is being collected by a Criminal Enterprise based in China or
Russia.
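A basic check for this condition is to resolve every form target on a page and flag those that would be submitted over a scheme other than HTTPS. The sketch below is a minimal illustration using only the Python standard library; the class and function names and the example URLs are hypothetical, and it does not detect forms submitted by script.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

class FormActionParser(HTMLParser):
    """Collect the action attribute of every <form> on a page."""

    def __init__(self):
        super().__init__()
        self.actions = []

    def handle_starttag(self, tag, attrs):
        if tag == "form":
            # A missing or empty action submits to the page's own URL.
            self.actions.append(dict(attrs).get("action") or "")

def insecure_form_targets(page_url, html):
    """Return resolved form targets that would be submitted without HTTPS."""
    parser = FormActionParser()
    parser.feed(html)
    targets = [urljoin(page_url, action) for action in parser.actions]
    return [t for t in targets if urlparse(t).scheme != "https"]

html = (
    '<form action="/checkout.php"><input name="card"></form>'
    '<form action="https://pay.example/submit"></form>'
)
print(insecure_form_targets("http://shop.example/cart", html))
```

Relative actions inherit the scheme of the page they appear on, which is why the page URL is passed in alongside the markup.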
In events where data has been collected, it is not subject to any standards of storage and PII protections
generally accepted by the internet at large. In cases where the log files or databases are inadvertently
left in public facing positions, the data is complete and readable by anyone with little recourse or
consequences to the CE.
Additionally, payment methods like Moneygram and Western Union increase the likelihood of simple
financial fraud.
Cost to Effect Ratio
The cost for the counterfeit enterprise to create and publish a website is nominal. For the purposes of
example, we will say it costs $25 to put up a counterfeit website and $500 to seize it. The ratio of
interdiction to crime is 20/1, meaning it costs $20 to interdict every $1 of criminal CE investment.
Unless we bend this cost curve down significantly, the CE wins every day. Their expansion in recent
years seems to support that their ROI is quite favorable.
During 2013, Google received 235 Million DMCA requests[11][12]; meanwhile, the Counterfeit Enterprise
created > 400 Billion links in their linkfarm for Chinese HSSW alone. This would give credence to the
feeling expressed by the RIAA: “We are using a bucket to deal with an ocean”[13]. If every single one of
those resources were tasked with interdicting the HSSW botnet, it would result in addressing less than
1% of the problem.
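The two ratios in this section follow directly from the quoted figures. The sketch below works them through, assuming each request could address exactly one link; the variable names are ours.

```python
cost_to_publish = 25      # USD, the paper's illustrative figure
cost_to_seize = 500       # USD, the paper's illustrative figure
interdiction_ratio = cost_to_seize / cost_to_publish   # dollars spent per criminal dollar

requests_2013 = 235e6     # takedown requests Google received in 2013, per the text
hssw_links = 400e9        # lower bound on Chinese HSSW linkfarm links, per the text

# Fraction of the linkfarm addressed if every request removed one link.
coverage = requests_2013 / hssw_links
print(f"{interdiction_ratio:.0f}:1, coverage {coverage:.4%}")
```

Under that assumption the coverage is about 0.06%, and because the link count is a lower bound, the true figure would be smaller still.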
[11] http://torrentfreak.com/google-discarded-21000000-takedown-requests-in-2013-131227/?utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+Torrentfreak+(Torrentfreak)
[12] http://www.google.com/transparencyreport/removals/copyright/?hl=en
[13] http://www.riaa.com/blog.php?content_selector=riaa-news-blog&blog_selector=One-Year-&news_month_filter=5&news_year_filter=2013
“We are using a bucket to deal with an ocean” - Brad Buckles, RIAA
In the current DMCA & OCILLA landscape, most registrars, hosters and OSPs opt for a reporting system
that requires filling out CAPTCHAs and other methods that force reporting on a one-at-a-time basis.
Additionally, some organizations require specific information for each website, further increasing
the cost to respond.
While some organizations are cooperative in addressing bulk complaints, the counterfeiters gravitate to
organizations that are not.
If, under the DMCA, hosters and registrars are not required to monitor their own networks for the
publishing of counterfeit and trademark-infringing materials, and their processes slow down bulk
reporting by those employed in detection and identification, then our system currently supports a cost
to effect ratio that favors counterfeiting.
Addressing Criminal Counterfeit Enterprise
The methods of forensics and Sociometrics can be used to expose the enterprise as a whole, as
demonstrated in this paper. This requires specific technology and skill sets employed in unison with law
enforcement to interdict the CE itself. Intellectual Property owners are ill-equipped to address this
problem because of the malicious nature and volume of activity, including compromising websites,
compromising credentials and violating Trademark and Intellectual Property laws en masse.
The motivations are clear. The criminal operation negatively impacts:
• Intellectual property owners
• Government tax revenues
• Online advertisers
• Auction websites
• Consumers
• Resale marketplaces
• Search engines
• Banks and Financial Organizations
• Payment Processors
• Social networks
• Jobs in America, further impacting the economy, tax revenues and business.
• The economy as a whole
Most importantly it is negatively impacting consumer confidence in the internet.
Takeaways
This is a criminal problem being run by highly sophisticated enterprises. Those enterprises can be
exposed and that information can be used to further diminish their influence and revenues. We are
interested in a cooperative framework to address this problem using a system that can vet and correlate
their activity.
Our research shows that the CEs are highly resistant to the takedown of a single website or a group of
websites, and that cloaking is effective enough to prevent a sufficient number of websites from being
identified. More sophisticated intelligence gathering which employs successful disambiguation methods
can expose the organization as a whole.
The CE itself can reap hundreds of thousands of dollars from a website before it is ranked on Alexa.
Suspended websites can be put back online in 24 to 72 hours, and seized websites can be recreated and
promoted in days.
The concept that we have applied here, identifying the enterprise operation as a whole, could
potentially have a greater impact on online criminal operations by identifying whole portions of their
Sociometrically mapped networks and, with the cooperation of the industry, may further secure the
internet from nefarious parties.
The existence of counterfeit factories, order systems and sales depends on the counterfeiter placing
their online content in front of an audience. Currently, they are very successful at achieving this goal.
Appendix A: Prior Work in This Area
• Foundations of Sociometry (1941): J. L. Moreno
http://www.psicologia1.uniroma1.it/repository/387/Moreno_1941.pdf
• An Introduction to Sociogram Construction: Hollander
http://asgpp.org/pdf/carl%20hollander%20sociogram.pdf
• Sociometric Applications in Criminology: Berg and Rounds
http://www.educ.ttu.edu/uploadedFiles/personnel-folder/lee-duemer/epsy-6304/documents/Sociometric%20application%20in%20criminology%20and%20other%20settings.pdf
• Research Methods for Criminology and Criminal Justice (Criminal Sociometry): Dantzker and Hunter
http://books.google.com/books?id=pvXzFpKGQUgC&pg=PA63&lpg=PA63&dq=sociometry+criminal&source=bl&ots=qWEtHYySZt&sig=uR9VsqvJZgEzGpfxL2TV6CUFc4Q&hl=en&sa=X&ei=9ZS8UuPHCc-GrAeUqIGADw&ved=0CCsQ6AEwAA
• Detecting Fake Content with Relative Entropy Scoring: Lavergne, Urvoy and Yvon
http://www.uni-weimar.de/medien/webis/research/events/pan-08/pan08-papers-final/lavergne08-detecting-fake-content-with-relative-entropy-scoring.pdf
• Improved Algorithms for Topic Distillation in a Hyperlinked Environment: Bharat and Henzinger
In Proceedings of the 21st International ACM SIGIR Conference on Research and
Development in Information Retrieval, pages 104-111, Melbourne, AU, Aug. 1998.
• Identifying Link Farm Spam: Wu and Davison
http://www.cse.lehigh.edu/~brian/pubs/2005/www/link-farm-spam.pdf
• Entity Identification in the Semantic Web: Morris, Velegrakis and Bouquet
http://disi.unitn.it/~velgias/docs/MorrisVB08.pdf
• Finding User Semantics on the Web Using Word Co-occurrence: Mori, Matsuo and Ishizuka
http://www.win.tue.nl/persweb/Camera-ready/5-Mori-short.pdf
• Entity Disambiguation for Knowledge Base Population: Dredze, McNamee, Rao, Gerber and Finin
http://www.cs.jhu.edu/~delip/entity_linking_coling10.pdf
• Entropy Compression and Information Content: Fossum
http://www.isi.edu/~vfossum/entropy.pdf
• Shannon Entropy and Kolmogorov Complexity: Grunwald and Vitanyi
http://homepages.cwi.nl/~paulv/papers/info.pdf
• Defining Habits: Dickens and the Psychology of Repetition
http://www.case.edu/artsci/engl/Library/Vrettos--Defining%20Habits.pdf
Appendix B: Methodology & Theory
In an effort to better study Counterfeit Enterprise (referred to as CE) operations worldwide, we required
a system that collected, analyzed and scored websites as well as performing correlation of data. This
information could be used to map associations. The design resulted in the HIIT System, which is the
brainchild of this research project, and was based on the premise that weaknesses must be present in
the CE which will uncover relationships between seemingly unrelated sites and methods. Some of the
weaknesses are:
Modus operandi (MO), defined as “A method of operating or functioning."14 The premise is that
complexity has limitations when limited resources are applied to creating a widespread system, and that
MOs are detectable even if well obfuscated. This takes the form of repetition in technology, in text
communication (AKA ngrams15) and in distribution methods. Our methods primarily center on
applied sciences in data mining, data correlation, ontological semantics, observational behavior and
forensics in semantics and programming, the results of which are deobfuscated into relationships.
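Repetition in text can be measured by comparing the word ngrams two pages share. The following is a minimal sketch of that idea, not the HIIT System's actual scoring; the function names, the choice of word trigrams, the Jaccard measure and the sample texts are all assumptions made for illustration.

```python
import re


def word_ngrams(text: str, n: int = 3) -> set:
    """Return the set of lowercase word n-grams found in the text."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def jaccard(a: set, b: set) -> float:
    """Shared n-grams divided by all distinct n-grams across both sets."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Hypothetical page text; real input would be the scraped site content.
site_a = "Cheap designer handbags outlet, free shipping on all orders today"
site_b = "Designer handbags outlet, free shipping on all orders worldwide"
site_c = "Local bakery offering fresh bread and pastries every morning"

grams_a, grams_b, grams_c = (word_ngrams(t) for t in (site_a, site_b, site_c))
print(f"a vs b: {jaccard(grams_a, grams_b):.2f}")
print(f"a vs c: {jaccard(grams_a, grams_c):.2f}")
```

Pages built from the same template text score high against each other and near zero against unrelated pages, which is the kind of repetition the MO premise relies on.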
Prior work in these areas is well documented in the semantic science areas including the following
examples [See Appendix A for a Complete list]:
• Mori, Matsuo and Ishizuka, Finding User Semantics on the Web16
• Morris, Velegrakis, Bouquet – Entity Identification on the Semantic Web17
• Saferstein, R. 2004. Criminalistics: An introduction to forensic science.18
14 http://www.merriam-webster.com/dictionary/modus%20operandi
15 https://en.wikipedia.org/wiki/N-gram
16 http://www.win.tue.nl/persweb/Camera-ready/5-Mori-short.pdf
17 http://disi.unitn.it/~velgias/docs/MorrisVB08.pdf
18 http://www.amazon.com/Criminalistics-Introduction-Forensic-Science-Edition/dp/0135045207
Criminal Sociometry. We build on the concept of Sociograms & Sociometry introduced by Jacob L.
Moreno19 in the 1930s. This is defined as “the inquiry into the evolution and organization of groups and
the position of individuals within them” and further discussed by Carl Hollander20 as “a map
of…interpersonal lines of communication”. The application here is dual-homed and based more on the
Criminology Sociometry presented by Berg and Rounds21, described as “assessing group relational
structures such as hierarchies [and] friendship networks”, where the subject of that study is a set of
online activities.
Those nodes can be correlated using a series of data points including HTML structure, software, text
content (otherwise known as ngrams), registrar information, email information and a number of other
data points. This concept was discussed in “Research Methods for Criminology and Criminal Justice”
(Dantzker and Hunter22), though the application here is based on a behavioral relationship of an online
enterprise.
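The correlation of nodes described above can be sketched as a small graph problem: link two sites whenever they share a data point, then read off the connected groups. This is only an illustration under stated assumptions; the domains, attribute names and values below are hypothetical, and the paper does not specify how the HIIT System weights or combines data points.

```python
from collections import defaultdict

# Hypothetical observations: domain -> data points collected for that site.
sites = {
    "shop-a.example": {"registrant_email": "x@mail.example", "html_hash": "h1"},
    "shop-b.example": {"registrant_email": "x@mail.example", "html_hash": "h2"},
    "shop-c.example": {"registrant_email": "y@mail.example", "html_hash": "h2"},
    "blog-d.example": {"registrant_email": "z@mail.example", "html_hash": "h9"},
}


def build_sociogram(observations: dict) -> dict:
    """Link two sites when they share any (attribute, value) pair."""
    by_value = defaultdict(set)
    for domain, attrs in observations.items():
        for key, value in attrs.items():
            by_value[(key, value)].add(domain)
    edges = {domain: set() for domain in observations}  # keep isolated nodes
    for group in by_value.values():
        for domain in group:
            edges[domain] |= group - {domain}
    return edges


def clusters(edges: dict) -> list:
    """Return connected components found by depth-first traversal."""
    seen, result = set(), []
    for start in edges:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(edges[node] - component)
        seen |= component
        result.append(component)
    return result


for component in clusters(build_sociogram(sites)):
    print(sorted(component))
```

In this toy data, shop-a and shop-c share nothing directly but are grouped together through shop-b, which illustrates how seemingly unrelated sites can be tied to one operation.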
The Law of Large Numbers. In probability theory, the law of large numbers (LLN) is a theorem that
describes the result of performing the same experiment a large number of times.23 The concept is that
discrepancies or variations are less pronounced when you conduct an experiment thousands of times, or
in this case, observe behavior across thousands of sites. The application of this theory is that if we track
very specific pieces of information that can identify commonalities among actors, over time the statistics
will flatten out and we can identify a relationship (sociometry) in the enterprise as well as behavioral
patterns in the Sociometrical relationship.
To demonstrate, when one flips a coin 10 times, it is possible that the result is 5 heads and 5 tails,
however this is not predictable and variations can be extreme. On the other hand, if one flips a coin
10,000 times, the variations are much less pronounced and the proportion of heads settles very close
to 50% overall.
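The coin-flip illustration can be reproduced with a short simulation. This is a minimal sketch assuming a fair coin; the function name, the seed value and the sample sizes are arbitrary choices for illustration.

```python
import random


def heads_fraction(flips: int, rng: random.Random) -> float:
    """Simulate fair coin flips and return the observed fraction of heads."""
    heads = sum(rng.random() < 0.5 for _ in range(flips))
    return heads / flips


rng = random.Random(2013)  # fixed seed so reruns match; the value itself is arbitrary
for flips in (10, 100, 10_000, 100_000):
    fraction = heads_fraction(flips, rng)
    print(f"{flips:>7,} flips -> {fraction:.4f} heads (off by {abs(fraction - 0.5):.4f})")
```

Small runs can land far from 0.5, while the large runs typically sit within a fraction of a percentage point of it.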
19 http://www.psicologia1.uniroma1.it/repository/387/Moreno_1941.pdf
20 http://asgpp.org/pdf/carl%20hollander%20sociogram.pdf
21 http://www.educ.ttu.edu/uploadedFiles/personnel-folder/lee-duemer/epsy-6304/documents/Sociometric%20application%20in%20criminology%20and%20other%20settings.pdf
22 http://books.google.com/books?id=pvXzFpKGQUgC&pg=PA63&lpg=PA63&dq=sociometry+criminal&source=bl&ots=qWEtHYySZt&sig=uR9VsqvJZgEzGpfxL2TV6CUFc4Q&hl=en&sa=X&ei=9ZS8UuPHCc-GrAeUqIGADw&ved=0CCsQ6AEwAA
23 http://en.wikipedia.org/wiki/Law_of_large_numbers
Figure 16: Graph of Coin Flips Demonstrates How Anomalies are Smoothed Over A Large Data Set24
By identifying the MO, using advanced correlation, exploiting the limited number of ngrams and tracking
that information out to build a data set large enough (>10,000) to flatten out inconsistencies (Law of
Large Numbers), we expose the enterprise.
24 http://upload.wikimedia.org/wikipedia/commons/thumb/f/f9/Largenumbers.svg/400px-Largenumbers.svg.png