

bee4/robots.txt


This library allows you to parse a robots.txt file and then check the status of URLs against the defined rules. It follows the rules defined in the RFC draft available here: http://www.robotstxt.org/norobots-rfc.txt
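
For reference, a robots.txt file is made of records: each record names one or more user-agents and lists the Allow/Disallow path rules that apply to them. A minimal illustrative example (not tied to any real site):

User-agent: *
Disallow: /private
Allow: /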

Installing


This project can be installed using Composer. Add the following to your composer.json:

{
    "require": {
        "bee4/robots.txt": "~2.0"
    }
}

or run this command:

composer require bee4/robots.txt:~2.0

Usage

<?php

use Bee4\RobotsTxt\Content;
use Bee4\RobotsTxt\ContentFactory;
use Bee4\RobotsTxt\Parser;

// Extract content from URL
$content = ContentFactory::build("https://httpbin.org/robots.txt");

// or directly from robots.txt content
$content = new Content("
User-agent: *
Allow: /

User-agent: google-bot
Disallow: /forbidden-directory
");

// Then you must parse the content
$rules = Parser::parse($content);

// ...or with a reusable Parser instance
$parser = new Parser();
$rules = $parser->analyze($content);

// Raw robots.txt content can also be parsed directly from a string
$rules = Parser::parse('User-Agent: Bing
Disallow: /downloads');

// You can use the match method to check if a URL is allowed for a given user-agent...
$rules->match('Google-Bot v01', '/an-awesome-url');      // true
$rules->match('google-bot v01', '/forbidden-directory'); // false

// ...or get the applicable rule for a user-agent and match
$rule = $rules->get('*');
$result = $rule->match('/'); // true
$result = $rule->match('/forbidden-directory'); // true, the * record allows everything (only google-bot is restricted)
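
Putting the pieces together, a small guard function can wrap the fetch-parse-match flow shown above. This is a minimal sketch: the isAllowed helper and the /deny path are illustrative, while ContentFactory::build, Parser::parse and $rules->match are the calls documented above.

<?php

use Bee4\RobotsTxt\ContentFactory;
use Bee4\RobotsTxt\Parser;

// Illustrative helper, not part of the library: fetch a robots.txt file,
// parse it, and check whether the given user-agent may fetch the path.
function isAllowed(string $robotsUrl, string $userAgent, string $path): bool
{
    $content = ContentFactory::build($robotsUrl); // fetch the remote robots.txt
    $rules = Parser::parse($content);             // build the rule set

    return $rules->match($userAgent, $path);
}

var_dump(isAllowed('https://httpbin.org/robots.txt', 'google-bot', '/deny'));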
