Commit 284f3d1

0.2, see Changelog
1 parent fc3972e commit 284f3d1

17 files changed

+629
-43
lines changed

.scrutinizer.yml

Lines changed: 1 addition & 15 deletions
@@ -2,7 +2,6 @@ filter:
     excluded_paths: [tests/*]
 checks:
     php:
-        code_rating: true
         remove_extra_empty_lines: true
         remove_php_closing_tag: true
         remove_trailing_whitespace: true
@@ -19,17 +18,4 @@ checks:
 tools:
     external_code_coverage:
         timeout: 600
-        runs: 3
-    php_analyzer: true
-    php_code_coverage: false
-    php_code_sniffer:
-        config:
-            standard: PSR2
-        filter:
-            paths: ['src']
-    php_loc:
-        enabled: true
-        excluded_dirs: [vendor, tests]
-    php_cpd:
-        enabled: true
-        excluded_dirs: [vendor, tests]
+        runs: 3

README.md

Lines changed: 221 additions & 2 deletions
@@ -19,16 +19,235 @@ Via Composer
 composer require swader/diffbot-php-client
 ```

-## Usage
+## Usage - simple

-Todo
+Simplest possible use case:
+
+```php
+$diffbot = new Diffbot('my_token');
+
+$articleApi = $diffbot->createArticleAPI('http://www.sitepoint.com/diffbot-crawling-visual-machine-learning/');
+
+echo $articleApi->call()->author; // prints out "Bruno Skvorc"
+```
+
+That's it, this is all you need to get started.
+
+## Usage - advanced
+
+A full API reference manual is in progress, but the instructions below should do for now - the library was designed with brutal UX simplicity in mind.
+
+### Setup
+
+To begin, always create a Diffbot instance. A Diffbot instance will spawn API instances.
+To get your token, sign up at http://diffbot.com
+
+```php
+$diffbot = new Diffbot('my_token');
+```
+
+### Pick API
+
+Then, pick an API.
+
+Currently available [*automatic*](http://www.diffbot.com/products/automatic/) APIs are:
+
+- [product](http://www.diffbot.com/products/automatic/product/) (crawls products and their reviews, if available)
+- [article](http://www.diffbot.com/products/automatic/article/) (crawls news posts, blogs, etc, with comments if available)
+- [image](http://www.diffbot.com/products/automatic/image/) (fetches information about images - useful for 500px, Flickr etc). The Image API can return several images - depending on how many are on the page being crawled.
+- [discussion](http://www.diffbot.com/products/automatic/discussion/) (fetches discussion / review / comment threads - can be embedded in the Product or Article return data, too, if those contain any comments or discussions)
+- [analyze](http://www.diffbot.com/products/automatic/analyze/) (combines all the above in that it automatically determines the right API for the URL and applies it)
+
+Video is coming soon.
+
+There is also a [Custom API](http://www.diffbot.com/products/custom/), like [this one](http://www.sitepoint.com/analyze-sitepoint-author-portfolios-diffbot/). Unless otherwise configured, custom APIs return instances of the Wildcard entity.
+
+All APIs can also be tested on http://diffbot.com
+
+The API you picked can be spawned through the main Diffbot instance:
+
+```php
+$api = $diffbot->createArticleAPI('http://www.sitepoint.com/diffbot-crawling-visual-machine-learning/');
+```
+
+### API configuration
+
+All APIs have some optional fields you can pass with parameters. For example, to extract the 'meta' values of the page alongside the normal data, call `setMeta`:
+
+```php
+$api->setMeta(true);
+```
+
+Some APIs have other flags that don't qualify as fields. For example, the Article API can be told to ignore Discussions (i.e. to not extract comments). This can speed up the fetching, because by default, it does look for them. The configuration methods all have the same format, though, so to accomplish this, just use `setDiscussion`:
+
+```php
+$api->setDiscussion(false);
+```
+
+All config methods are chainable:
+
+```php
+$api->setMeta(true)->setDiscussion(false);
+```
+
+### Calling
+
+All API instances have the `call` method which returns a collection of results. The collection is iterable:
+
+```php
+$imageApi = $diffbot->createImageAPI('http://smittenkitchen.com/blog/2012/01/buckwheat-baby-with-salted-caramel-syrup/');
+/** @var Image $imageEntity */
+foreach ($imageApi->call() as $imageEntity) {
+    echo 'Image dimensions: ' . $imageEntity->getHeight() . ' x ' . $imageEntity->getWidth() . '<br>';
+}
+
+/* Output:
+Image dimensions: 333 x 500
+Image dimensions: 333 x 500
+Image dimensions: 334 x 500
+Image dimensions: 333 x 500
+Image dimensions: 333 x 500
+Image dimensions: 333 x 500
+Image dimensions: 333 x 500
+Image dimensions: 333 x 500
+Image dimensions: 333 x 500
+*/
+```
+
+In cases where only one entity is returned, like Article or Product, iterating works all the same; it just iterates through the one single element. The return data is **always** a collection!
+
+However, for brevity, you can access properties directly on the collection, too.
+
+```php
+$articleApi = $diffbot->createArticleAPI('http://www.sitepoint.com/diffbot-crawling-visual-machine-learning/');
+echo $articleApi->call()->author;
+// or $articleApi->call()->getAuthor();
+```
+
+In this case, the collection applies the property call to the first element which, coincidentally, is also the only element. If you use this approach on the image collection above, the same thing happens - but the call is only applied to the first image entity in the collection.
+
+### Just the URL, please
+
+If you just want the final generated URL (for example, to paste into Postman Client or to test in the browser and get pure JSON), use `buildUrl`:
+
+```php
+$url = $articleApi->buildUrl();
+```
+
+You can continue regular API usage afterwards, which makes this very useful for logging, etc.
+
+### Pure response
+
+You can extract the pure, full Guzzle Response object from the returned data and then manipulate it as desired (maybe parsing it as JSON and processing it further on your own):
+
+```php
+$articleApi = $diffbot->createArticleAPI('http://www.sitepoint.com/diffbot-crawling-visual-machine-learning/');
+$guzzleResponse = $articleApi->call()->getResponse();
+```
+
+Individual entities do not have access to the response - to fetch it, always fetch from their parent collection (the object that the `call()` method returns).
+
+### Discussion and Post
+
+The Discussion API returns some data about the discussion and contains another collection of Posts. A Post entity corresponds to a single review / comment / forum post, and is very similar in structure to the Article entity.
+
+You can iterate through the posts as usual:
+
+```php
+$url = 'http://community.sitepoint.com/t/php7-resource-recap/174325/';
+$discussion = $diffbot->createDiscussionAPI($url)->call();
+/** @var Post $post */
+foreach($discussion->getPosts() as $post) {
+    echo 'Author: '.$post->getAuthor().'<br>';
+}
+
+/*
+Output:
+
+Author: swader
+Author: TaylorRen
+Author: s_molinari
+Author: s_molinari
+Author: swader
+Author: s_molinari
+Author: swader
+Author: s_molinari
+Author: swader
+Author: s_molinari
+Author: TomB
+Author: s_molinari
+Author: TomB
+Author: Wolf_22
+Author: swader
+Author: swader
+Author: s_molinari
+*/
+```
+
+An Article or Product entity can contain a Discussion entity. Access it via `getDiscussion` on an Article or Product entity and use as usual (see above).
+
+## Custom API
+
+Used just like all others. There are only two differences:
+
+1. When creating a Custom API call, you need to pass in the API name
+2. It always returns Wildcard entities which are basically just value objects containing the returned data. They have `__call` and `__get` magic methods defined so their properties remain just as accessible as the other Entities', but without autocomplete.
+
+The following is a usage example of my own custom API for author profiles at SitePoint:
+
+```php
+$diffbot = new Diffbot('brunoskvorc');
+$customApi = $diffbot->createCustomAPI('http://sitepoint.com/author/bskvorc', 'authorFolioNew');
+
+$return = $customApi->call();
+
+foreach ($return as $wildcard) {
+    dump($wildcard->getAuthor()); // Bruno Skvorc
+    dump($wildcard->author); // Bruno Skvorc
+}
+```
+
+Of course, you can easily extend the basic Custom API class and make your own, as well as add your own Entities that perfectly correspond to the returned data. This will all be covered in a tutorial in the near future.

 ## Testing

+Just run PHPUnit in the root folder of the cloned project.
+Some calls do require an internet connection (see `tests/Factory/EntityTest`).
+
 ```bash
 phpunit
 ```

+### Adding Entity tests
+
+**I'll pay $10 for every new set of 5 Entity tests, submissions verified set per set - offer valid until I feel like there's enough use cases covered.** (a.k.a. don't submit 1500 of them at once, I can't pay that in one go).
+
+If you would like to contribute by adding Entity tests, I suggest following this procedure:
+
+1. Pick an API you would like to contribute a test for. E.g., Product API.
+2. In a scratchpad like `index.php`, build the URL:
+
+```php
+$diffbot = new Diffbot('my_token');
+$url = $diffbot
+    ->createProductAPI('http://someurl.com')
+    ->setMeta(true)
+    ->...(insert other config methods here as desired)...
+    ->buildUrl();
+echo $url;
+```
+
+3. Grab the URL and paste it into a REST client like Postman or into your browser. You'll get Diffbot's response back. Keep it open for reference.
+4. Download this response, with headers, into a JSON file. Preferably into `tests/Mocks/Products/[date]/somefilename.json`, as the other tests do. This is easily accomplished by executing `curl -i "[url]" > somefilename.json` in the Terminal/Command Line.
+5. Go into the appropriate tests folder, in this case `tests/Entity`, and open `ProductTest.php`. Notice how each file is added into the batch of files to be tested against. Every provider has it referenced, along with the value the method being tested should produce. Slowly go through every test method and add your file, taking the expected values from the JSON you got in step 3.
+6. Run `phpunit tests/Entity/ProductTest.php` to test just this file (much faster than entire suite). If OK, send PR :)
+
+If you'd like to create your own Test classes, too, that's fine, no need to extend the ones that are included with the project. Apply the whole process, but rather than extending the existing `ProductTest` class, make a new one.
+
+### Adding other tests
+
+Other tests don't have specific instructions, contribute as you see fit. Just try to minimize actual remote calls - we're not testing the API itself (a.k.a. Diffbot), we're testing this library. If the library parses values accurately from an inaccurate API response because, for example, Diffbot is currently bugged, that's fine - the library works!
+
 ## Contributing

 Please see [CONTRIBUTING](CONTRIBUTING.md) for details and [TODO](TODO.md) for ideas.
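
The README hunk above describes chainable config methods and a `buildUrl` step, but does not show how they fit together. The following is a rough, self-contained sketch of that pattern, not the library's implementation: `SketchApi`, the `api.example.com` endpoint and the query parameter names are all assumptions made for illustration, and the token is left out entirely.

```php
<?php
// Hypothetical API object: class name, endpoint and query keys are made up.
class SketchApi
{
    private $apiUrl = 'https://api.example.com/article';
    private $url;
    private $fields = [];
    private $options = [];

    public function __construct($url)
    {
        $this->url = $url;
    }

    public function setMeta($bool)
    {
        $this->fields['meta'] = (bool) $bool;

        return $this; // returning $this is what makes the calls chainable
    }

    public function setDiscussion($bool)
    {
        $this->options['discussion'] = (bool) $bool;

        return $this;
    }

    public function buildUrl()
    {
        $query = ['url' => $this->url];

        // Only fields that were switched on are requested.
        $fields = array_keys(array_filter($this->fields));
        if ($fields) {
            $query['fields'] = implode(',', $fields);
        }
        // Flags are sent explicitly, whether true or false.
        foreach ($this->options as $key => $value) {
            $query[$key] = $value ? 'true' : 'false';
        }

        return $this->apiUrl . '?' . http_build_query($query);
    }
}

$api = new SketchApi('http://example.com/post');
echo $api->setMeta(true)->setDiscussion(false)->buildUrl(), PHP_EOL;
```

Because `buildUrl` only reads state and returns a string, it can be called for logging and the same object can still be configured or used afterwards, which matches the behaviour the README describes.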

TODO.md

Lines changed: 2 additions & 3 deletions
@@ -9,14 +9,13 @@ Active todos, ordered by priority

 ## Medium

-- write usage example
 - add streaming to Crawlbot - make it stream the result (it constantly grows)
 - implement Video API (currently beta)
-- improve Custom API
-- improve Wildcard Entity - apply to Custom API
+- add test case with mock for product that has discussion (Amazon?)

 ## Low

+- add more usage examples
 - work on PhpDoc consistency ($param type vs type $param)
 - get more mock responses and test against them
 - write example with custom EntityIterator (different Entity set for different API) and custom Entity (i.e. authorProfile, which parses some of the data and prepares for further use)

src/Abstracts/Entity.php

Lines changed: 2 additions & 0 deletions
@@ -33,5 +33,7 @@ public function __call($name, $arguments)
             $property = lcfirst(substr($name, 3, strlen($name) - 3));
             return $this->$property;
         }
+
+        throw new \BadMethodCallException('No such method: '.$name);
     }
 }
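
Before this change, a call that did not start with `get` fell through the `if` and returned `null`. The getter mapping and the new exception can be tried in isolation with the sketch below. `SketchEntity` and its `__get` are hypothetical stand-ins; only the `__call` body mirrors the hunk.

```php
<?php
// Hypothetical stand-in for the library's abstract Entity, not the real class.
class SketchEntity
{
    /** @var array Raw key/value data, as decoded from an API response. */
    private $data;

    public function __construct(array $data)
    {
        $this->data = $data;
    }

    // Assumed property-style access: $entity->author
    public function __get($name)
    {
        return array_key_exists($name, $this->data) ? $this->data[$name] : null;
    }

    // Getter-style access: getAuthor() maps to the "author" property.
    public function __call($name, $arguments)
    {
        if (substr($name, 0, 3) === 'get') {
            // substr($name, 3) is equivalent to substr($name, 3, strlen($name) - 3)
            $property = lcfirst(substr($name, 3));

            return $this->$property;
        }

        // Anything that is not a getter is rejected instead of silently returning null.
        throw new \BadMethodCallException('No such method: ' . $name);
    }
}

$entity = new SketchEntity(['author' => 'Bruno Skvorc']);
echo $entity->getAuthor(), PHP_EOL; // Bruno Skvorc
```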

src/Api/Custom.php

Lines changed: 12 additions & 0 deletions
@@ -11,6 +11,18 @@ class Custom extends Api

     public function __construct($url, $name)
     {
+
+        /*
+        @todo Throw exception for invalid names.
+        Diffbot HQ will provide regex for invalid chars in API names. Once
+        done, modify this case to throw exceptions for invalid ones, and write
+        test cases.
+
+        Note that all API names with ? and / in their name currently fail to
+        execute in the Diffbot test runner, so it's questionable whether they're
+        even supposed to be supported.
+        */
+
         parent::__construct($url);
         $this->apiUrl .= '/' . trim($name);
     }
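
Until an official pattern is available, the `@todo` above could be approximated along these lines. This is only a sketch under stated assumptions: `assertValidApiName` is a hypothetical helper, it rejects just the two characters the comment mentions (`?` and `/`), and treating an empty name as invalid is an extra assumption that the commit does not state.

```php
<?php
/**
 * Hypothetical validator for custom API names.
 * Rejects only "?" and "/" (the characters named in the @todo) and,
 * as an additional assumption, names that are empty after trimming.
 */
function assertValidApiName($name)
{
    $name = trim($name);

    if ($name === '' || preg_match('#[?/]#', $name) === 1) {
        throw new \InvalidArgumentException('Invalid custom API name: ' . $name);
    }

    return $name;
}

echo assertValidApiName(' authorFolioNew '), PHP_EOL; // authorFolioNew
```

The constructor could call such a helper before appending the name to `$this->apiUrl`, so that a bad name fails before any request is built.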

src/Entity/Discussion.php

Lines changed: 1 addition & 0 deletions
@@ -17,6 +17,7 @@ public function __construct(array $data)
         foreach ($this->data['posts'] as $post) {
             $this->posts[] = new Post($post);
         }
+        $this->data['posts'] = $this->posts;
     }

     /**
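
The added line writes the wrapped objects back into `$data`. Presumably this is so that magic property access returns `Post` objects rather than raw arrays, though the commit does not say so. A minimal sketch of that effect, using hypothetical `SketchPost` and `SketchDiscussion` classes, an assumed `__get` that reads from `$data`, and made-up sample data:

```php
<?php
// Hypothetical stand-ins for the library's Post and Discussion entities.
class SketchPost
{
    private $data;

    public function __construct(array $data)
    {
        $this->data = $data;
    }

    public function getAuthor()
    {
        return $this->data['author'];
    }
}

class SketchDiscussion
{
    private $data;
    private $posts = [];

    public function __construct(array $data)
    {
        $this->data = $data;

        foreach ($this->data['posts'] as $post) {
            $this->posts[] = new SketchPost($post);
        }
        // Overwrite the raw arrays so magic access sees the same objects as getPosts().
        $this->data['posts'] = $this->posts;
    }

    public function getPosts()
    {
        return $this->posts;
    }

    // Assumed magic accessor reading from $data.
    public function __get($name)
    {
        return isset($this->data[$name]) ? $this->data[$name] : null;
    }
}

// Sample data invented for this sketch.
$discussion = new SketchDiscussion([
    'title' => 'Sample thread',
    'posts' => [['author' => 'swader'], ['author' => 'TomB']],
]);

echo $discussion->posts[1]->getAuthor(), PHP_EOL; // TomB
```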

src/Entity/EntityIterator.php

Lines changed: 19 additions & 0 deletions
@@ -64,4 +64,23 @@ public function valid()
     {
         return ($this->cursor < $this->count());
     }
+
+    public function __call($name, $args)
+    {
+        $isGetter = substr($name, 0, 3) == 'get';
+        if ($isGetter) {
+            $property = lcfirst(substr($name, 3, strlen($name) - 3));
+
+            return $this->$property;
+        }
+
+        throw new \BadMethodCallException('No such method: ' . $name);
+    }
+
+    public function __get($name)
+    {
+        $entity = ($this->cursor == -1) ? $this->data[0] : $this->current();
+
+        return $entity->$name;
+    }
 }
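
The two magic methods above make the collection forward property reads to a single entity: the first one before iteration starts, the current one afterwards. The sketch below reproduces that delegation in a stripped-down form. `SketchCollection` is a hypothetical class that keeps only the cursor logic, and the entities are plain `stdClass` objects with invented URLs.

```php
<?php
// Hypothetical, simplified collection; the real EntityIterator has more methods.
class SketchCollection implements \Countable
{
    private $data;
    private $cursor = -1;

    public function __construct(array $entities)
    {
        $this->data = array_values($entities);
    }

    public function count(): int
    {
        return count($this->data);
    }

    public function current()
    {
        return $this->data[$this->cursor];
    }

    public function next()
    {
        $this->cursor++;
    }

    public function __get($name)
    {
        // Before iteration starts (cursor at -1), fall back to the first entity.
        $entity = ($this->cursor === -1) ? $this->data[0] : $this->current();

        return $entity->$name;
    }

    public function __call($name, $args)
    {
        if (substr($name, 0, 3) === 'get') {
            $property = lcfirst(substr($name, 3));

            return $this->$property; // undefined on the collection, so routed through __get
        }

        throw new \BadMethodCallException('No such method: ' . $name);
    }
}

$images = new SketchCollection([
    (object) ['type' => 'image', 'url' => 'http://example.com/a.jpg'],
    (object) ['type' => 'image', 'url' => 'http://example.com/b.jpg'],
]);

echo $images->url, PHP_EOL;      // http://example.com/a.jpg (first entity)
echo $images->getUrl(), PHP_EOL; // same value, via __call
```

Like the hunk, this sketch reads `$this->data[0]` unconditionally, so it assumes the collection holds at least one entity.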

tests/Abstracts/EntityIteratorTest.php

Lines changed: 18 additions & 0 deletions
@@ -38,6 +38,24 @@ protected function prepareResponses()
         return $this->responses;
     }

+    public function testBadMethodCall()
+    {
+        $ef = new \Swader\Diffbot\Factory\Entity();
+        $ei = $ef->createAppropriateIterator($this->prepareResponses()['Images/one_image_zola.json']);
+
+        $this->setExpectedException('BadMethodCallException');
+        $ei->invalidMethodCall();
+    }
+
+    public function testMagic()
+    {
+        $ef = new \Swader\Diffbot\Factory\Entity();
+        $ei = $ef->createAppropriateIterator($this->prepareResponses()['Images/one_image_zola.json']);
+
+        $this->assertEquals('image', $ei->type);
+        $this->assertEquals('image', $ei->getType());
+    }
+
     public function testCount()
     {
         $fileExpectations = [
