Robots

Robots (also known as crawlers or spiders) are programs which navigate Internet sites and download their content without explicit supervision, typically to build a database for use by search engines, although they may also be engaged in other forms of data-mining. Although such programs can serve a useful purpose, they can place a high load on the sites they visit by requesting large numbers of pages and other resources; they can also expose content of little general interest, drowning out the interesting content in the search engine results eventually produced for the site.

(!) Note that MoinMoin's own search functionality does not depend on having robots access the pages of a wiki. See HelpOnXapian for details of the search engine indexing that can be done internally within MoinMoin.

MoinMoin controls robots through the following mechanisms:

ua_spiders
    This configuration setting controls access to actions, preventing such programs from visiting things like past page revisions (through the "info" action) or from attempting to change the content on a MoinMoin site in some way.

html_head_index, html_head_normal, html_head_posts, html_head_queries
    These configuration settings control the <HEAD> tags that appear in HTML output. Since some robots process <META> tags and observe instructions related to link-following, these settings may be changed to influence how robots navigate a site.

robots.txt
    MoinMoin deploys a collection of static resources inside a directory called htdocs, including a file called robots.txt. By editing this file, you can indicate which parts of a site robots should (or should not) access.
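
All of these settings are class attributes of the wiki's configuration class, while robots.txt is a plain file served from htdocs. The following is a minimal sketch of where the settings go, assuming a MoinMoin 1.9-style wikiconfig.py whose configuration class extends multiconfig.DefaultConfig; the file name, class name and example values may differ on your installation.

    # wikiconfig.py -- a minimal sketch of a MoinMoin 1.9-style configuration
    from MoinMoin.config import multiconfig

    class Config(multiconfig.DefaultConfig):
        sitename = u'My Wiki'   # existing settings stay as they are

        # robot-related settings listed above are added as further
        # class attributes, for example:
        html_head_normal = '<meta name="robots" content="index,follow">\n'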

Making a site more publicly searchable

By default, MoinMoin tends to forbid robots, to deny them access to actions, and to instruct them not to follow links if they do end up accessing pages on a wiki. Although a wiki configured in this fashion may still appear in search engine results, mostly because other sites may have linked to its pages, many pages will remain unknown to such search engines.
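
For illustration (this is not necessarily the exact file shipped with MoinMoin), a robots.txt that forbids all robots resembles the following:

    User-agent: *
    Disallow: /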

To permit indexing by robots:

  1. Change the robots.txt file to resemble the following:

    User-agent: *
    Allow: /
    Crawl-delay: 20

    This allows any robot (User-agent: *) but asks that each robot access the site no more than once every 20 seconds. You can give a more specific User-agent pattern to permit only certain robots, and add multiple sections describing the allowed access rules for different robots, as shown in the first sketch after this list. Make sure that the URL path given for Allow matches the root of the wiki, or the part of the wiki that should be indexed.

  2. In your configuration, change or add the html_head_normal setting so that it resembles the following:

    html_head_normal = '<meta name="robots" content="index,follow">\n'
    

    This lets robots know that they should index normal pages and follow links to find other pages. The values can be changed to noindex and nofollow to instruct robots to do the opposite; the second sketch after this list shows the related html_head settings handled together.
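
As mentioned in step 1, robots.txt may contain several sections with more specific User-agent patterns. A sketch with placeholder robot names (examplebot, otherbot) and a placeholder path, permitting two named robots at different rates and forbidding all others:

    User-agent: examplebot
    Allow: /
    Crawl-delay: 20

    User-agent: otherbot
    Allow: /wiki/
    Crawl-delay: 60

    User-agent: *
    Disallow: /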
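
The other html_head settings listed earlier work the same way; the following sketch keeps normal and index pages searchable while keeping robots away from query results and form posts (the values shown are illustrative, not required defaults):

    html_head_normal  = '<meta name="robots" content="index,follow">\n'
    html_head_index   = '<meta name="robots" content="index,follow">\n'
    html_head_queries = '<meta name="robots" content="noindex,nofollow">\n'
    html_head_posts   = '<meta name="robots" content="noindex,nofollow">\n'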

Note that robots are free to ignore these instructions, although doing so is considered bad practice, so the better-known search engine services tend to observe them rather than risk damaging their reputation by ignoring them.

Making actions accessible by certain clients

Some users wish to use programs other than their normal web browser to access wiki content. For example, a calendar client may need to invoke an action in order to download content in a format it can understand, and it may identify itself as the curl program when accessing a site; or it may be convenient for some users to use tools like wget to download files stored as attachments.

To allow certain clients to perform actions, change the ua_spiders setting in your configuration. One way of doing this is to redefine the setting as follows:

ua_spiders = multiconfig.DefaultConfig.ua_spiders.replace("|wget", "").replace("|curl", "")

The above assumes that the configuration class is multiconfig.DefaultConfig - you will need to check which class your own configuration uses - and it edits the setting to remove wget and curl from the regular expression describing robots and clients that are blocked from invoking actions.
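
With wget and curl removed from ua_spiders, such clients can invoke actions directly. A usage sketch, assuming a wiki at the placeholder address http://wiki.example.org/ with a placeholder page and attachment name; raw and AttachFile are typical MoinMoin actions, but what is available and permitted depends on your installation:

    # fetch the raw wiki markup of a page
    curl 'http://wiki.example.org/SomePage?action=raw'

    # download an attachment stored on a page
    wget 'http://wiki.example.org/SomePage?action=AttachFile&do=get&target=events.ics'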