Skip User Agents - BloomReach Experience - Open Source CMS

This article covers Hippo CMS version 11. There's an updated version available that covers our most recent release.

04-07-2016

Skip User Agents

For a relevant customer experience, crawlers such as those used by Google, Yahoo, Bing and others do not need to receive targeted pages. It is also undesirable for crawlers to influence Scoring and Normalization averages. The same holds for link checker tools and services such as WatchMouse, W3C-checklink or Xenu Link Sleuth. Nor is it likely to be desirable for these kinds of crawler or link checker visitors to show up in the Real-Time Visitor Analysis screen. From a crawler's point of view, indexing a targeted page makes no sense either. For example, a "stores nearby" block on the right of a page adds no value when indexed by a search engine.

Hence, by default, these kinds of requests are not targeted at all: the Relevance Module ignores requests from a large set of commonly used crawlers and link checkers. It does so by checking whether the request's User-Agent header contains a string that indicates a robot or link checker. You can add extra user agents to skip in the repository, in the multi-valued property at: /targeting:targeting/targeting:skipUserAgents
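
The substring check described above can be illustrated with a minimal sketch. This is not the Relevance Module's actual implementation: the function name should_skip_targeting, the shortened sample list, the case-insensitive comparison and the handling of a missing header are all assumptions made for illustration only.

```python
# Hypothetical sketch; not the Relevance Module's actual implementation.
# A small sample of entries taken from the default list in this article.
SKIP_USER_AGENTS = [
    "Googlebot",
    "Bingbot",
    "Yahoo! Slurp",
    "W3C-checklink",
    "Xenu Link Sleuth",
]


def should_skip_targeting(user_agent, skip_list=SKIP_USER_AGENTS):
    """Return True when the User-Agent header contains any skip-list entry."""
    if not user_agent:
        # Assumption: a request without a User-Agent header is targeted as usual.
        return False
    # Assumption: the comparison is case-insensitive; the article does not say.
    header = user_agent.lower()
    return any(entry.lower() in header for entry in skip_list)


crawler = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
browser = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Gecko/20100101 Firefox/45.0"
print(should_skip_targeting(crawler))
print(should_skip_targeting(browser))
```

Because the check is a "contains" match, a single entry such as Googlebot would also cover longer header values that embed it, which is why short or generic entries deserve some care when extending the list.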

It pays to review the collected request log data to find additional agents that do not need to be targeted and can be excluded.
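
One way to spot candidates is to count how often each User-Agent value occurs in your logs. The sketch below assumes, purely for illustration, a web server access log in the Apache/NCSA "combined" format, where the User-Agent is the last quoted field on each line; the Relevance Module's own request log may be stored differently. The function name count_user_agents and the sample lines are hypothetical.

```python
import re
from collections import Counter

# Assumption: lines follow the Apache/NCSA "combined" format, in which the
# User-Agent is the last double-quoted field on the line.
USER_AGENT_PATTERN = re.compile(r'"([^"]*)"\s*$')


def count_user_agents(log_lines):
    """Count how many requests each User-Agent value accounts for."""
    counts = Counter()
    for line in log_lines:
        match = USER_AGENT_PATTERN.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Made-up sample data; replace with lines read from a real log file.
sample_lines = [
    '192.0.2.1 - - [04/Jul/2016:10:00:00 +0000] "GET / HTTP/1.1" 200 512 "-" "ExampleCrawler/1.0"',
    '192.0.2.1 - - [04/Jul/2016:10:00:01 +0000] "GET /news HTTP/1.1" 200 734 "-" "ExampleCrawler/1.0"',
    '192.0.2.7 - - [04/Jul/2016:10:00:02 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0 (X11; Linux x86_64)"',
]

# Print the most frequent agents first.
for agent, hits in count_user_agents(sample_lines).most_common():
    print(hits, agent)
```

Agents that appear frequently and are clearly automated are candidates for the targeting:skipUserAgents property.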

The default list of skipped user agents is as below, largely taken from http://www.useragentstring.com/pages/useragentstring.php

  1. ABACHOBot

  2. Accoona-AI-Agent

  3. AddSugarSpiderBot

  4. AnyApexBot

  5. Arachmo

  6. B-l-i-t-z-B-O-T

  7. Baiduspider

  8. BecomeBot

  9. BeslistBot

  10. BillyBobBot

  11. Bimbot

  12. Bingbot

  13. BlitzBOT

  14. boitho.com-dc

  15. boitho.com-robot

  16. btbot

  17. CatchBot

  18. Cerberian Drtrs

  19. Charlotte

  20. ConveraCrawler

  21. cosmos

  22. Covario IDS

  23. DataparkSearch

  24. DiamondBot

  25. Discobot

  26. Dotbot

  27. EARTHCOM.info

  28. EmeraldShield.com WebBot

  29. envolk[ITS]spider

  30. EsperanzaBot

  31. Exabot

  32. FAST Enterprise Crawler

  33. FAST-WebCrawler

  34. FDSE robot

  35. FindLinks

  36. FurlBot

  37. FyberSpider

  38. g2crawler

  39. Gaisbot

  40. GalaxyBot

  41. genieBot

  42. Gigabot

  43. Girafabot

  44. Googlebot

  45. Googlebot-Image

  46. Googlebot-Mobile

  47. Googlebot-News

  48. Googlebot-Video

  49. GurujiBot

  50. HappyFunBot

  51. hl_ftien_spider

  52. Holmes

  53. htdig

  54. iaskspider

  55. ia_archiver

  56. iCCrawler

  57. ichiro

  58. igdeSpyder

  59. IRLbot

  60. IssueCrawler

  61. Jaxified Bot

  62. Jyxobot

  63. KoepaBot

  64. L.webis

  65. LapozzBot

  66. Larbin

  67. LDSpider

  68. LexxeBot

  69. Linguee Bot

  70. LinkWalker

  71. lmspider

  72. lwp-trivial

  73. mabontland

  74. magpie-crawler

  75. Mediapartners-Google

  76. MJ12bot

  77. MLBot

  78. Mnogosearch

  79. mogimogi

  80. MojeekBot

  81. Moreoverbot

  82. Morning Paper

  83. msnbot

  84. MSRBot

  85. MVAClient

  86. mxbot

  87. NetResearchServer

  88. NetSeer Crawler

  89. NewsGator

  90. NG-Search

  91. nicebot

  92. noxtrumbot

  93. Nusearch Spider

  94. NutchCVS

  95. Nymesis

  96. obot

  97. oegp

  98. omgilibot

  99. OmniExplorer_Bot

  100. OOZBOT

  101. Orbiter

  102. PageBitesHyperBot

  103. Peew

  104. polybot

  105. Pompos

  106. PostPost

  107. Psbot

  108. PycURL

  109. Qseero

  110. Radian6

  111. RAMPyBot

  112. RufusBot

  113. SandCrawler

  114. SBIder

  115. ScoutJet

  116. Scrubby

  117. SearchSight

  118. Seekbot

  119. semanticdiscovery

  120. Sensis Web Crawler

  121. SEOChat::Bot

  122. SeznamBot

  123. Shim-Crawler

  124. ShopWiki

  125. Shoula robot

  126. silk

  127. Sitebot

  128. Snappy

  129. sogou spider

  130. Sosospider

  131. Speedy Spider

  132. Sqworm

  133. StackRambler

  134. suggybot

  135. SurveyBot

  136. SynooBot

  137. Teoma

  138. TerrawizBot

  139. TheSuBot

  140. Thumbnail.CZ robot

  141. TinEye

  142. truwoGPS

  143. TurnitinBot

  144. TweetedTimes Bot

  145. TwengaBot

  146. updated

  147. Urlfilebot

  148. Vagabondo

  149. VoilaBot

  150. Vortex

  151. voyager

  152. VYU2

  153. webcollage

  154. Websquash.com

  155. wf84

  156. WoFindeIch Robot

  157. WomlpeFactory

  158. Xaldon_WebSpider

  159. yacy

  160. Yahoo! Slurp

  161. Yahoo! Slurp China

  162. YahooSeeker

  163. YahooSeeker-Testing

  164. YandexBot

  165. YandexImages

  166. YandexMetrika

  167. Yasaklibot

  168. Yeti

  169. YodaoBot

  170. yoogliFetchAgent

  171. YoudaoBot

  172. Zao

  173. Zealbot

  174. zspider

  175. ZyBorg

  176. AbiLogicBot

  177. Link Valet

  178. Link Validity Check

  179. LinkExaminer

  180. LinksManager.com_bot

  181. Mojoo Robot

  182. Notifixious

  183. online link validator

  184. Ploetz + Zeller

  185. Reciprocal Link System PRO

  186. REL Link Checker Lite

  187. SiteBar

  188. Vivante Link Checker

  189. WatchMouse

  190. W3C-checklink

  191. Xenu Link Sleuth
