<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>CRAN Task View: Web Technologies and Services</title>
<link rel="stylesheet" type="text/css" href="../CRAN_web.css" />
<meta http-equiv="content-type" content="text/html; charset=UTF-8" />
<meta name="citation_title" content="CRAN Task View: Web Technologies and Services" />
<meta name="citation_author" content="Thomas Leeper, Scott Chamberlain, Patrick Mair, Karthik Ram, Christopher Gandrud" />
<meta name="citation_publication_date" content="2017-06-16" />
<meta name="citation_public_url" content="https://CRAN.R-project.org/view=WebTechnologies" />
<meta name="DC.title" content="CRAN Task View: Web Technologies and Services" />
<meta name="DC.creator" content="Thomas Leeper, Scott Chamberlain, Patrick Mair, Karthik Ram, Christopher Gandrud" />
<meta name="DC.issued" content="2017-06-16" />
<meta name="DC.identifier" content="https://CRAN.R-project.org/view=WebTechnologies" />
</head>
<body>
<h2>CRAN Task View: Web Technologies and Services</h2>
<table summary="WebTechnologies task view information">
<tr><td valign="top"><b>Maintainer:</b></td><td>Thomas Leeper, Scott Chamberlain, Patrick Mair, Karthik Ram, Christopher Gandrud</td></tr>
<tr><td valign="top"><b>Contact:</b></td><td>thosjleeper at gmail.com</td></tr>
<tr><td valign="top"><b>Version:</b></td><td>2017-06-16</td></tr>
<tr><td valign="top"><b>URL:</b></td><td><a href="https://CRAN.R-project.org/view=WebTechnologies">https://CRAN.R-project.org/view=WebTechnologies</a></td></tr>
</table>
<div>
<p>
This Task View contains information about how to use R and the World Wide Web together. The base version of R does not ship with many tools for interacting with the web, but a growing number of contributed packages fill that gap. This task view focuses on packages for obtaining web-based data and information, frameworks for building web-based R applications, and online services that can be accessed from R. A list of available packages and functions is presented below, grouped by the type of activity. The
<a href="https://github.com/ropensci/opendata">
Open Data Task View
</a>
provides further discussion of online data sources that can be accessed from R.
</p>
<p>
If you have any comments or suggestions for additions or improvements for this Task View, go to GitHub and
<a href="https://github.com/ropensci/webservices/issues">
submit an issue
</a>, or make some changes and
<a href="https://github.com/ropensci/webservices/pulls">
submit a pull request
</a>. If you can't contribute on GitHub,
<a href="mailto:[email protected]">
send Thomas an email
</a>. If you have an issue with one of the packages discussed below, please contact the maintainer of that package. If you know of a web service, API, data source, or other online resource that is not yet supported by an R package, consider adding it to
<a href="https://github.com/ropensci/webservices/wiki/ToDo">
the package development to do list on GitHub
</a>.
</p>
<h2 id="tools-for-working-with-the-web-from-r">
Tools for Working with the Web from R
</h2>
<p>
<strong>
Core Tools For HTTP Requests
</strong>
</p>
<p>
There are two packages that should cover most use cases of interacting with the web from R.
<a href="../packages/httr/index.html">httr</a>
provides a user-friendly interface for executing HTTP methods (GET, POST, PUT, HEAD, DELETE, etc.) and supports modern web authentication protocols (OAuth 1.0, OAuth 2.0). HTTP status codes are helpful for debugging HTTP calls, and httr makes this easier using, for example,
<tt>stop_for_status()</tt>, which gets the HTTP status code from a response object and raises an error if the call was not successful. (See also
<tt>warn_for_status()</tt>.) Note that you can pass additional libcurl options to the
<tt>config</tt>
parameter in HTTP calls. (A minimal httr sketch appears after the list below.)
<a href="../packages/RCurl/index.html">RCurl</a>
is a lower-level package that provides a closer interface between R and the
<a href="https://curl.haxx.se/libcurl/">
libcurl C library
</a>, but is less user-friendly. It may be useful for operations on web-based XML or to perform FTP operations. For more specific situations, the following resources may be useful:
</p>
<ul>
<li>
<a href="../packages/curl/index.html">curl</a>
is another libcurl client that provides the
<tt>curl()</tt>
function as an SSL-compatible replacement for base R's
<tt>url()</tt>
and support for HTTP/2, SSL (https, ftps), gzip, deflate, and more. For websites serving insecure HTTP (i.e., using the "http" rather than the "https" prefix), most R functions can extract data directly, including
<tt>read.table</tt>
and
<tt>read.csv</tt>; this also applies to functions in add-on packages such as
<tt>jsonlite::fromJSON()</tt>
and
<tt>XML::xmlParse()</tt>.
<a href="../packages/httpRequest/index.html">httpRequest</a>
is another low-level package for HTTP requests that implements the GET, POST and multipart POST verbs.
<a href="../packages/crul/index.html">crul</a>
(
<a href="https://github.com/ropenscilabs/crul">
GitHub
</a>) is an R6-based curl interface.
</li>
<li>
<a href="https://github.com/hrbrmstr/curlconverter">
curlconverter
</a>
(not on CRAN) is a useful package for converting curl command-line invocations, for example those copied from a browser's developer console, into R code.
</li>
<li>
<a href="../packages/request/index.html">request</a>
(
<a href="https://github.com/sckott/request">
GitHub
</a>) is a high-level package that is useful for developing other API client packages.
<a href="../packages/httping/index.html">httping</a>
(
<a href="https://github.com/sckott/httping">
GitHub
</a>) provides simplified tools to ping and time HTTP requests, built around httr calls.
<a href="../packages/httpcache/index.html">httpcache</a>
(
<a href="https://github.com/nealrichardson/httpcache">
GitHub
</a>) provides a mechanism for caching HTTP requests.
</li>
<li>
For dynamically generated webpages (i.e., those requiring user interaction to display results),
<a href="../packages/RSelenium/index.html">RSelenium</a>
(
<a href="https://github.com/ropensci/RSelenium/">
GitHub
</a>) can be used to automate those interactions and extract page contents. It provides a set of bindings for the Selenium 2.0 webdriver using the
<a href="https://github.com/seleniumhq/selenium-google-code-issue-archive">
JsonWireProtocol
</a>. It can also aid in automated application testing, load testing, and web scraping.
<a href="../packages/seleniumPipes/index.html">seleniumPipes</a>
(
<a href="https://github.com/johndharrison/seleniumPipes">
GitHub
</a>) provides a "pipe"-oriented interface to the same.
<a href="https://github.com/cpsievert/rdom">
rdom
</a>
(not on CRAN) uses
<a href="http://phantomjs.org/">
phantomjs
</a>
to access a webpage's Document Object Model (DOM).
</li>
<li>
Another higher-level package useful for web scraping is
<a href="../packages/rvest/index.html">rvest</a>
(
<a href="https://github.com/hadley/rvest">
GitHub
</a>), which is designed to work with
<a href="../packages/magrittr/index.html">magrittr</a>
to make it easy to express common web scraping tasks.
</li>
<li>
Many base R tools can be used to download web content, provided that the website does not use SSL (i.e., the URL does not have the "https" prefix).
<tt>download.file()</tt>
is a general purpose function that can be used to download a remote file. For SSL, the
<tt>download()</tt>
function in
<a href="../packages/downloader/index.html">downloader</a>
wraps
<tt>download.file()</tt>, and takes all the same arguments.
</li>
<li>
Tabular data sets (e.g., txt or csv files) can be input using
<tt>read.table()</tt>,
<tt>read.csv()</tt>, and friends, again assuming that the files are not hosted via SSL. An alternative is to use
<tt>httr::GET</tt>
(or
<tt>RCurl::getURL</tt>) to first read the file into R as a character vector before parsing with
<tt>read.table(text=...)</tt>, or you can download the file to a local directory first (see the sketch after this list).
<a href="../packages/rio/index.html">rio</a>
(
<a href="https://github.com/leeper/rio">
GitHub
</a>) provides an
<tt>import()</tt>
function that can read a number of common data formats directly from an https:// URL. The
<a href="../packages/repmis/index.html">repmis</a>
function
<tt>source_data()</tt>
can load and cache plain-text data from a URL (either http or https). That package also includes
<tt>source_Dropbox()</tt>
for downloading/caching plain-text data from non-public Dropbox folders and
<tt>source_XlsxData()</tt>
for downloading/caching Excel xlsx sheets.
</li>
<li>
<em>
Authentication
</em>: Using web resources can require authentication, either via API keys, OAuth, a username:password combination, or other means. Additionally, some web resources require authentication details to be passed in the header of an HTTP call, which requires a little extra work. API keys and username:password combinations can be embedded directly in the URL for a call to a web resource (API key: http://api.foo.org/?key=yourkey; user/pass: http://username:password@api.foo.org), or can be specified via commands in
<a href="../packages/RCurl/index.html">RCurl</a>
or
<a href="../packages/httr/index.html">httr</a>. OAuth is the most complicated authentication process, and can be most easily done using
<a href="../packages/httr/index.html">httr</a>. See the 6 demos within
<a href="../packages/httr/index.html">httr</a>, three for OAuth 1.0 (linkedin, twitter, vimeo) and three for OAuth 2.0 (facebook, GitHub, google).
<a href="../packages/ROAuth/index.html">ROAuth</a>
is a package that provides a separate R interface to OAuth. OAuth is easier to do in
<a href="../packages/httr/index.html">httr</a>, so start there.
<a href="https://github.com/MarkEdmondson1234/googleAuthR">
googleAuthR
</a>
provides an OAuth 2.0 setup specifically for Google web services.
</li>
</ul>
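<p>
As a minimal sketch of the httr workflow and the download approaches described above (the URLs are placeholders for illustration):
</p>
<pre>
library("httr")

# Execute a GET request and halt with an informative error on failure
response &lt;- GET("https://api.example.com/data", query = list(format = "json"))
stop_for_status(response)
content(response, as = "parsed")

# Read a CSV directly from an insecure (http) URL with base R
dat &lt;- read.csv("http://example.com/data.csv")

# For an https URL, fetch the body first, then parse the text
response &lt;- GET("https://example.com/data.csv")
dat &lt;- read.csv(text = content(response, as = "text"))
</pre>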
<p>
<strong>
Parsing Structured Web Data
</strong>
</p>
<p>
The vast majority of web-based data is structured as plain text, HTML, XML, or JSON (JavaScript Object Notation). Web service APIs increasingly rely on JSON, but XML is still prevalent in many applications. There are several packages specifically for working with these formats. These functions can be used to interact directly with insecure webpages or can be used to parse locally stored or in-memory web files.
</p>
<ul>
<li>
<em>
XML
</em>: There are two packages for working with XML:
<a href="../packages/XML/index.html">XML</a>
and
<a href="../packages/xml2/index.html">xml2</a>
(
<a href="https://github.com/hadley/xml2">
GitHub
</a>). Both support general XML (and HTML) parsing, including XPath queries. The package
<a href="../packages/xml2/index.html">xml2</a>
is less fully featured, but more user-friendly with respect to memory management, classes (e.g., XML node vs. node set vs. document), and namespaces. Of the two, only the
<a href="../packages/XML/index.html">XML</a>
package supports
<em>
de novo
</em>
creation of XML nodes and documents. The
<a href="../packages/XML2R/index.html">XML2R</a>
(
<a href="https://github.com/cpsievert/XML2R">
GitHub
</a>) package is a collection of convenient functions for coercing XML into data frames. An alternative to
<a href="../packages/XML/index.html">XML</a>
is
<a href="https://sjp.co.nz/projects/selectr/">
selectr
</a>, which parses CSS3 selectors and translates them into XPath 1.0 expressions. The
<a href="../packages/XML/index.html">XML</a>
package is often used for parsing XML and HTML, and selectr allows you to use CSS selectors in place of XPath expressions.
</li>
<li>
<em>
HTML
</em>: All of the tools that work with XML also work for HTML, though in practice HTML is more prone to be malformed. Some tools are designed specifically to work with HTML.
<tt>xml2::read_html()</tt>
is a good first function to use for importing HTML.
<a href="../packages/htmltab/index.html">htmltab</a>
(
<a href="https://github.com/crubba/htmltab">
GitHub
</a>) extracts structured information from HTML tables, similar to
<tt>XML::readHTMLTable</tt>
of the
<a href="../packages/XML/index.html">XML</a>
package, but automatically expands row and column spans in the header and body cells, and users are given more control over the identification of header and body rows which will end up in the R table. The
<a href="http://selectorgadget.com/">
selectorgadget browser extension
</a>
can be used to identify page elements.
<a href="http://www.Omegahat.org/RHTMLForms/"><span class="Ohat">RHTMLForms</span></a>
reads HTML documents and obtains a description of each of the forms it contains, along with the different elements and hidden fields.
<a href="../packages/scrapeR/index.html">scrapeR</a>
provides additional tools for scraping data from HTML documents.
<a href="../packages/htmltidy/index.html">htmltidy</a>
(
<a href="https://github.com/hrbrmstr/htmltidy">
GitHub
</a>) provides tools to "tidy" messy HTML documents.
</li>
<li>
<em>
JSON
</em>: There are several packages for reading and writing JSON:
<a href="../packages/rjson/index.html">rjson</a>,
<a href="../packages/RJSONIO/index.html">RJSONIO</a>, and
<a href="../packages/jsonlite/index.html">jsonlite</a>.
<a href="../packages/jsonlite/index.html">jsonlite</a>
includes a different parser from
<a href="../packages/RJSONIO/index.html">RJSONIO</a>
called
<a href="https://lloyd.github.io/yajl/">
yajl
</a>. We recommend using
<a href="../packages/jsonlite/index.html">jsonlite</a>. Check out the paper describing jsonlite by Jeroen Ooms
<a href="https://arxiv.org/abs/1403.2805" class="uri">
https://arxiv.org/abs/1403.2805
</a>.
<a href="../packages/tidyjson/index.html">tidyjson</a>
(
<a href="https://github.com/sailthru/tidyjson">
GitHub
</a>) converts JSON into a data.frame.
<a href="../packages/jqr/index.html">jqr</a>
provides bindings for the fast JSON library,
<a href="http://stedolan.github.io/jq/">
jq
</a>.
<a href="../packages/jsonvalidate/index.html">jsonvalidate</a>
(
<a href="https://github.com/ropenscilabs/jsonvalidate">
GitHub
</a>) validates JSON against a schema using the "is-my-json-valid" Node.js library;
<a href="../packages/validatejsonr/index.html">validatejsonr</a>
does the same using the RapidJSON C++ library.
<a href="../packages/ndjson/index.html">ndjson</a>
(
<a href="https://gitlab.com/hrbrmstr/ndjson">
GitHub
</a>) supports the "ndjson" format.
</li>
<li>
<em>
RSS/Atom
</em>:
<a href="../packages/feedeR/index.html">feedeR</a>
(
<a href="https://github.com/DataWookie/feedeR">
GitHub
</a>) can be used to parse RSS or Atom feeds.
</li>
<li>
<a href="https://github.com/hrbrmstr/swagger">
swagger
</a>
(not on CRAN) can be used to automatically generate functions for working with a web service API that provides documentation in
<a href="http://swagger.io/">
Swagger.io
</a>
format.
</li>
</ul>
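<p>
As a brief sketch of parsing JSON and HTML/XML with the packages above (the JSON string is invented for illustration):
</p>
<pre>
library("jsonlite")
# Parse a JSON string into R objects
fromJSON('{"name": "R", "released": 1995}')

library("xml2")
# Parse an HTML page and extract all link targets via an XPath query
page &lt;- read_html("https://www.r-project.org/")
xml_attr(xml_find_all(page, ".//a"), "href")
</pre>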
<p>
<strong>
Tools for Working with URLs
</strong>
</p>
<ul>
<li>
The
<tt>httr::parse_url()</tt>
function can be used to extract portions of a URL. The
<tt>RCurl::URLencode()</tt>
and
<tt>utils::URLencode()</tt>
functions can be used to encode character strings for use in URLs.
<tt>utils::URLdecode()</tt>
decodes back to the original strings (examples follow this list).
<a href="../packages/urltools/index.html">urltools</a>
(
<a href="https://github.com/Ironholds/urltools">
GitHub
</a>) can also handle URL encoding, decoding, parsing, and parameter extraction.
</li>
<li>
The
<a href="https://github.com/jayjacobs/tldextract">
tldextract
</a>
package extracts top-level domains and subdomains from a host name. It's a port of
<a href="https://github.com/john-kurkowski/tldextract">
a Python library of the same name
</a>.
</li>
<li>
<a href="https://github.com/hrbrmstr/iptools">
iptools
</a>
can facilitate working with IPv4 addresses, including for use in geolocation.
</li>
<li>
<a href="../packages/urlshorteneR/index.html">urlshorteneR</a>
(
<a href="https://github.com/dmpe/urlshorteneR">
GitHub
</a>) offers URL expansion and analysis for Bit.ly, Goo.gl, and is.gd.
<a href="../packages/longurl/index.html">longurl</a>
uses the
<a href="http://longurl.org/">
longurl.org
</a>
API to provide similar functionality.
</li>
<li>
<a href="../packages/gdns/index.html">gdns</a>
(
<a href="https://github.com/hrbrmstr/gdns">
GitHub
</a>) provides access to Google's secure HTTP-based DNS resolution service.
</li>
</ul>
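<p>
A short sketch of the URL helpers mentioned above:
</p>
<pre>
library("httr")
# Split a URL into its components (scheme, hostname, path, query, ...)
parse_url("https://example.com/search?q=web%20scraping")

# Percent-encode and decode strings for safe use in URLs (base R)
utils::URLencode("web scraping", reserved = TRUE)
utils::URLdecode("web%20scraping")
</pre>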
<p>
<strong>
Tools for Working with Scraped Webpage Contents
</strong>
</p>
<ul>
<li>
Several packages can be used for parsing HTML documents.
<a href="../packages/boilerpipeR/index.html">boilerpipeR</a>
provides generic extraction of main text content from HTML files, removing ads, sidebars, and headers, using the boilerpipe Java library.
<a href="http://www.Omegahat.org/RTidyHTML/"><span class="Ohat">RTidyHTML</span></a>
interfaces to the libtidy library for correcting common errors in HTML documents that are not well-formed.
<a href="../packages/W3CMarkupValidator/index.html">W3CMarkupValidator</a>
provides an R interface to the W3C Markup Validation Services for validating HTML documents.
</li>
<li>
For XML documents, the
<a href="http://www.Omegahat.org/XMLSchema/"><span class="Ohat">XMLSchema</span></a>
package provides facilities in R for reading XML schema documents and processing them to create definitions for R classes and functions for converting XML nodes to instances of those classes. It provides the framework for meta-computing with XML schema in R.
<a href="https://github.com/hrbrmstr/xslt">
xslt
</a>
is a package providing an interface to
<a href="http://vslavik.github.io/xmlwrapp/">
xmlwrapp
</a>, an XML processing library that provides an XSLT engine for transforming XML data using a transform stylesheet. (It can be seen as a modern replacement for
<a href="http://www.Omegahat.org/Sxslt/"><span class="Ohat">Sxslt</span></a>, which is an interface to Dan Veillard's libxslt translator, and the
<a href="http://www.Omegahat.org/SXalan/"><span class="Ohat">SXalan</span></a>
package.) This may be useful for webscraping, as well as transforming XML markup into another human- or machine-readable format (e.g., HTML, JSON, plain text, etc.).
<a href="http://www.Omegahat.org/SSOAP/"><span class="Ohat">SSOAP</span></a>
provides a client-side SOAP (Simple Object Access Protocol) mechanism. It aims to provide a high-level interface to invoke SOAP methods provided by a SOAP server.
<a href="http://www.Omegahat.org/XMLRPC/"><span class="Ohat">XMLRPC</span></a>
provides an implementation of XML-RPC, a relatively simple remote procedure call mechanism that uses HTTP and XML. This can be used for communicating between processes on a single machine or for accessing Web services from within R.
</li>
<li>
<a href="http://www.Omegahat.org/Rcompression/"><span class="Ohat">Rcompression</span></a>
(not on CRAN): an interface to the zlib and bzip2 libraries for performing in-memory compression and decompression in R. This is useful when receiving or sending content to remote servers, e.g., web services and HTTP requests via RCurl.
</li>
<li>
<a href="../packages/tm.plugin.webmining/index.html">tm.plugin.webmining</a>: Extensible text retrieval framework for news feeds in XML (RSS, ATOM) and JSON formats. Currently, the following feeds are implemented: Google Blog Search, Google Finance, Google News, NYTimes Article Search, Reuters News Feed, Yahoo Finance and Yahoo Inplay.
</li>
<li>
<a href="../packages/webshot/index.html">webshot</a>
uses
<a href="http://phantomjs.org/">
PhantomJS
</a>
to provide screenshots of web pages without a browser. It can be useful for testing websites, such as Shiny applications (a one-line sketch follows this list).
</li>
</ul>
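<p>
For example, a one-line sketch of webshot (the output filename is arbitrary; PhantomJS must be installed):
</p>
<pre>
library("webshot")
# Render the page with PhantomJS and save a screenshot as a PNG
webshot("https://www.r-project.org/", file = "r-project.png")
</pre>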
<p>
<strong>
Other Useful Packages and Functions
</strong>
</p>
<ul>
<li>
<em>
Javascript
</em>:
<a href="../packages/V8/index.html">V8</a>
(
<a href="https://github.com/jeroenooms/v8">
GitHub
</a>) is an R interface to Google's open source, high-performance JavaScript engine. It can wrap JavaScript libraries as well as NPM packages. The
<a href="http://www.Omegahat.org/SpiderMonkey/"><span class="Ohat">SpiderMonkey</span></a>
package provides another means of evaluating JavaScript code, creating JavaScript objects, and calling JavaScript functions and methods from within R. This can work by embedding the JavaScript engine within an R session or by embedding R in a browser such as Firefox, allowing calls from R to JavaScript and back again.
</li>
<li>
<em>
Email
</em>:
<a href="../packages/mailR/index.html">mailR</a>
is an interface to Apache Commons Email to send emails from within R.
<a href="../packages/sendmailR/index.html">sendmailR</a>
provides a simple SMTP client.
<a href="../packages/gmailr/index.html">gmailr</a>
provides access to Google's Gmail RESTful API.
</li>
<li>
<em>
Miscellaneous
</em>:
<a href="../packages/webutils/index.html">webutils</a>
(
<a href="https://github.com/jeroenooms/webutils">
GitHub
</a>) contains various functions for developing web applications, including parsers for
<tt>application/x-www-form-urlencoded</tt>
as well as
<tt>multipart/form-data</tt>.
<a href="../packages/mime/index.html">mime</a>
(
<a href="https://github.com/yihui/mime">
GitHub
</a>) guesses the MIME type of a file from its extension (see the sketch after this list).
<a href="../packages/rsdmx/index.html">rsdmx</a>
(
<a href="https://github.com/opensdmx/rsdmx/wiki">
GitHub
</a>) provides tools to read data and metadata documents exchanged through the Statistical Data and Metadata Exchange (SDMX) framework. The package currently focuses on the SDMX XML standard format (SDMX-ML).
<a href="https://github.com/ropenscilabs/robotstxt">
robotstxt
</a>
(not on CRAN) provides R6 classes for parsing and checking robots.txt files.
<a href="../packages/uaparserjs/index.html">uaparserjs</a>
(
<a href="http://github.com/hrbrmstr/uaparserjs">
GitHub
</a>) uses the JavaScript
<a href="https://github.com/ua-parser">
"ua-parser" library
</a>
to parse User-Agent HTTP headers.
</li>
</ul>
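<p>
For instance, a minimal sketch of the mime package:
</p>
<pre>
library("mime")
# Guess MIME types from file extensions (vectorized over file names)
guess_type(c("report.html", "data.json", "figure.png"))
</pre>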
<h2 id="web-and-server-frameworks">
Web and Server Frameworks
</h2>
<ul>
<li>
<a href="https://msdn.microsoft.com/en-us/microsoft-r/deployr-welcome">
DeployR
</a>
is part of Microsoft R Server and provides support for integrating R as an application and website backend.
</li>
<li>
The
<a href="../packages/shiny/index.html">shiny</a>
package makes it easy to build interactive web applications with R (a minimal example follows this list).
</li>
<li>
Other web frameworks include:
<a href="../packages/fiery/index.html">fiery</a>
(
<a href="https://github.com/thomasp85/fiery">
GitHub
</a>), which is meant to be more flexible, though less easy to use, than shiny;
<a href="https://github.com/nteetor/prairie">
prairie
</a>
(not on CRAN) which is a lightweight web framework that uses magrittr-style syntax and is modeled after
<a href="http://expressjs.com/">
expressjs
</a>;
<a href="https://github.com/att/rcloud">
rcloud
</a>
(not on CRAN), which provides an IPython-notebook-style web-based R interface; and
<a href="../packages/Rook/index.html">Rook</a>, which contains the specification and convenience software for building and running Rook applications.
</li>
<li>
The
<a href="../packages/opencpu/index.html">opencpu</a>
framework for embedded statistical computation and reproducible research exposes a web API interfacing R, LaTeX, and Pandoc. This API is used, for example, to integrate statistical functionality into systems, share and execute scripts or reports on centralized servers, and build R-based apps.
</li>
<li>
Several general purpose server/client frameworks for R exist.
<a href="../packages/Rserve/index.html">Rserve</a>
and
<a href="../packages/RSclient/index.html">RSclient</a>
provide server and client functionality for TCP/IP or local socket interfaces.
<a href="../packages/httpuv/index.html">httpuv</a>
provides low-level socket and protocol support for handling HTTP and WebSocket requests directly within R. A related package, which
<a href="../packages/httpuv/index.html">httpuv</a>
effectively supersedes, is
<a href="https://cran.rstudio.com/src/contrib/Archive/websockets/">
websockets
</a>.
<a href="../packages/servr/index.html">servr</a>
provides a simple HTTP server to serve files under a given directory based on httpuv.
</li>
<li>
Several packages offer functionality for turning R code into a web API.
<a href="../packages/jug/index.html">jug</a>
is a simple API-builder web framework, built around
<a href="../packages/httpuv/index.html">httpuv</a>.
<a href="../packages/FastRWeb/index.html">FastRWeb</a>
provides some basic infrastructure for this.
<a href="../packages/plumber/index.html">plumber</a>
allows you to create a REST API by decorating existing R source code.
</li>
<li>
The
<a href="http://www.Omegahat.org/WADL/"><span class="Ohat">WADL</span></a>
package (not on CRAN) provides tools to process Web Application Description Language (WADL) documents and to programmatically generate R functions to interface to the REST methods described in those WADL documents.
</li>
<li>
The
<a href="http://www.Omegahat.org/RDCOMServer/"><span class="Ohat">RDCOMServer</span></a>
package provides a mechanism to export R objects as (D)COM objects in Windows. It can be used along with the
<a href="http://www.Omegahat.org/RDCOMClient/"><span class="Ohat">RDCOMClient</span></a>
package (not on CRAN), which provides user-level access from R to other COM servers.
</li>
<li>
<a href="http://rapporter.net/welcome/en">
rapporter.net
</a>
provides an online environment (SaaS) to host and run
<a href="../packages/rapport/index.html">rapport</a>
statistical report templates in the cloud.
</li>
<li>
<a href="../packages/radiant/index.html">radiant</a>
(
<a href="https://github.com/radiant-rstats/radiant">
GitHub
</a>) is a Shiny-based GUI for R that runs in a browser, from a server or local machine.
</li>
<li>
<a href="https://github.com/seankross/neocities">
neocities
</a>
wraps the API for the
<a href="https://neocities.org/">
Neocities
</a>
web hosting service. (not on CRAN)
</li>
<li>
The
<a href="https://r.tiki.org/tiki-index.php">
Tiki
</a>
Wiki CMS/Groupware framework has an R plugin (
<a href="https://doc.tiki.org/PluginR">
PluginR
</a>) to run R code from wiki pages, and use data from their own collected web databases (trackers). A demo:
<a href="https://r.tiki.org/tiki-index.php">
http://r.tiki.org/
</a>. More info in a
<a href="http://ueb.vhir.org/2011+UseR">
useR!2013 presentation
</a>.
</li>
<li>
The
<a href="https://www.mediawiki.org/wiki/MediaWiki">
MediaWiki
</a>
has an extension (
<a href="https://www.mediawiki.org/wiki/Extension:R">
Extension:R
</a>) to run R code from wiki pages, and use uploaded data. A mailing list is available:
<a href="https://stat.ethz.ch/mailman/listinfo/r-sig-mediawiki">
R-sig-mediawiki
</a>.
</li>
<li>
<a href="../packages/whisker/index.html">whisker</a>: Implementation of logicless templating based on
<a href="http://mustache.github.io/">
Mustache
</a>
in R. Mustache syntax is described in
<a href="http://mustache.github.io/mustache.5.html" class="uri">
http://mustache.github.io/mustache.5.html
</a>
</li>
<li>
<a href="http://www.Omegahat.org/CGIwithR/"><span class="Ohat">CGIwithR</span></a>
(not on CRAN) allows one to use R scripts as CGI programs for generating dynamic Web content. HTML forms and other mechanisms to submit dynamic requests can be used to provide input to R scripts via the Web to create content that is determined within that R script.
</li>
</ul>
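<p>
As a minimal illustration of the shiny framework mentioned above:
</p>
<pre>
library("shiny")
# A minimal interactive application: a slider wired to a text output
shinyApp(
  ui = fluidPage(
    sliderInput("n", "Number of observations:", min = 1, max = 100, value = 50),
    textOutput("summary")
  ),
  server = function(input, output) {
    output$summary &lt;- renderText(paste("You selected", input$n, "observations."))
  }
)
</pre>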
<h2 id="web-services">
Web Services
</h2>
<p>
<strong>
Cloud Computing and Storage
</strong>
</p>
<ul>
<li>
Amazon Web Services is a popular, proprietary cloud service offering a suite of computing, storage, and infrastructure tools.
<a href="../packages/aws.signature/index.html">aws.signature</a>
provides functionality for generating AWS API request signatures.
<ul>
<li>
<em>
Simple Storage Service (S3)
</em>
is a commercial storage service that allows one to store content and retrieve it from any machine connected to the Internet.
<a href="http://www.Omegahat.org/RAmazonS3/"><span class="Ohat">RAmazonS3</span></a>
and
<a href="https://github.com/robertzk/s3mpi">
s3mpi
</a>
(not on CRAN) provide basic infrastructure for communicating with S3.
<a href="https://cran.rstudio.com/src/contrib/Archive/AWS.tools/">
AWS.tools
</a>
(
<a href="https://github.com/armstrtw/AWS.tools">
GitHub
</a>) interacts with S3 and EC2 using the AWS command line interface (an external system dependency). The CRAN version is archived.
<a href="https://github.com/lalas/awsConnect">
awsConnect
</a>
(not on CRAN) is another package using the AWS Command Line Interface to control EC2 and S3, which is only available for Linux and Mac OS.
</li>
<li>
<em>
Elastic Cloud Compute (EC2)
</em>
is a cloud computing service. AWS.tools and
<a href="https://github.com/lalas/awsConnect">
awsConnect
</a>
(not on CRAN) both use the AWS command line interface to control EC2.
<a href="http://code.google.com/p/segue/"><span class="Gcode">segue</span></a>
(not on CRAN) is another package for managing EC2 instances and S3 storage, which includes a parallel version of
<tt>lapply()</tt>
for the Elastic Map Reduce (EMR) engine called
<tt>emrlapply()</tt>. It uses Hadoop Streaming on Amazon's EMR in order to get simple parallel computation.
</li>
<li>
<em>
DBREST
</em>:
<a href="http://www.Omegahat.org/RAmazonDBREST/"><span class="Ohat">RAmazonDBREST</span></a>
provides an interface to Amazon's Simple DB API.
</li>
<li>
<a href="https://cloudyr.github.io/">
The cloudyr project
</a>, which is currently under active development on GitHub, aims to provide a unified interface to the full Amazon Web Services suite without the need for external system dependencies.
</li>
</ul>
</li>
<li>
<em>
Cloud Storage
</em>:
<a href="../packages/googleCloudStorageR/index.html">googleCloudStorageR</a>
interfaces with Google Cloud Storage.
<a href="../packages/boxr/index.html">boxr</a>
(
<a href="https://github.com/brendan-R/boxr">
GitHub
</a>) is a lightweight, high-level interface for the
<a href="https://docs.box.com/docs/">
box.com API
</a>.
<a href="https://github.com/karthik/rdrop2">
rDrop2
</a>
(
<a href="https://github.com/karthik/rdrop2">
GitHub
</a>; not on CRAN) is a Dropbox interface that provides access to a full suite of file operations, including dir/copy/move/delete operations, account information (including quotas) and the ability to upload and download files from any Dropbox account.
<a href="../packages/backblazer/index.html">backblazer</a>
(
<a href="https://github.com/phillc73/backblazer">
GitHub
</a>) provides access to the Backblaze B2 storage API.
</li>
<li>
<em>
Docker
</em>:
<a href="https://github.com/sckott/analogsea">
analogsea
</a>
is a general purpose client for the Digital Ocean v2 API. In addition, the package includes functions to install various R tools, including base R, RStudio Server, and more. It also provides an evolving interface for interacting with Docker on your remote droplets (a brief sketch follows this list).
</li>
<li>
<a href="https://github.com/Crunch-io/rcrunch">
rcrunch
</a>
(not on CRAN) provides an interface to
<a href="http://crunch.io/">
crunch.io
</a>
storage and analytics.
</li>
<li>
<a href="https://github.com/vpnagraj/rrefine">
rrefine
</a>
(not on CRAN) provides a client for the
<a href="http://openrefine.org/">
OpenRefine
</a>
(formerly Google Refine) data cleaning service.
</li>
</ul>
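<p>
As a hedged sketch of analogsea (this assumes a Digital Ocean account with a personal access token available in the <tt>DO_PAT</tt> environment variable; the droplet name and region are illustrative):
</p>
<pre>
library("analogsea")
# List the droplets on the authenticated Digital Ocean account
droplets()
# Create a new droplet (the name and region here are placeholders)
droplet_create(name = "r-box", region = "nyc1")
</pre>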
<p>
<strong>
Document and Code Sharing
</strong>
</p>
<ul>
<li>
<em>
Code Sharing
</em>:
<a href="../packages/gistr/index.html">gistr</a>
(
<a href="https://github.com/ropensci/gistr">
GitHub
</a>) works with GitHub gists (
<a href="https://gist.github.com/">
gist.github.com
</a>) from R, allowing you to create new gists, update gists with new files, rename files, delete files, get and delete gists, star and un-star gists, fork gists, open a gist in your default browser, get embed code for a gist, list gist commits, and get rate limit information when authenticated (a short sketch follows this list).
<a href="../packages/git2r/index.html">git2r</a>
provides bindings to the git version control system and
<a href="https://github.com/cscheid/rgithub">
rgithub
</a>
(not on CRAN) provides access to the GitHub.com API, both of which can facilitate code or data sharing via GitHub.
<a href="../packages/gitlabr/index.html">gitlabr</a>
is a <a href="https://about.gitlab.com/">GitLab</a>-specific client.
</li>
<li>
<em>
Data archiving
</em>:
<a href="../packages/dvn/index.html">dvn</a>
(
<a href="https://github.com/ropensci/dvn">
GitHub
</a>) provides access to The Dataverse Network API.
<a href="../packages/rfigshare/index.html">rfigshare</a>
(
<a href="https://github.com/ropensci/rfigshare">
GitHub
</a>) connects with
<a href="https://figshare.com/">
Figshare.com
</a>.
<a href="https://cran.rstudio.com/src/contrib/Archive/dataone/">
dataone
</a>
is an archived CRAN version that provides read/write access to data and metadata from the
<a href="https://www.dataone.org/">
DataONE network
</a>
of Member Node data repositories.
<a href="../packages/dataone/index.html">dataone</a>
(
<a href="https://github.com/DataONEorg/rdataone">
GitHub
</a>) provides a client for
<a href="https://www.dataone.org">
DataONE
</a>
repositories.
</li>
<li>
<em>
Google Drive/Google Documents
</em>:
<a href="https://github.com/noamross/driver">
driver
</a>
(not on CRAN) is a thin client for the Google Drive API. The
<a href="http://www.Omegahat.org/RGoogleDocs/"><span class="Ohat">RGoogleDocs</span></a>
package is an example of using the RCurl and XML packages to quickly develop an interface to the Google Documents API.
<a href="http://www.Omegahat.org/RGoogleStorage/"><span class="Ohat">RGoogleStorage</span></a>
provides programmatic access to the Google Storage API, allowing R users to access and store data on Google's storage: upload and download content, create, list, and delete folders/buckets, and set access control permissions on objects and buckets.
</li>
<li>
<em>
Google Sheets
</em>:
<a href="../packages/googlesheets/index.html">googlesheets</a>
(
<a href="https://github.com/jennybc/googlesheets">
GitHub
</a>) can access private or public Google Sheets by title, key, or URL; extract or edit data; and create, delete, rename, copy, upload, or download spreadsheets and worksheets.
<a href="../packages/gsheet/index.html">gsheet</a>
(
<a href="https://github.com/maxconway/gsheet">
GitHub
</a>) can download Google Sheets using just the sharing link. Spreadsheets can be downloaded as a data frame, or as plain text to parse manually.
</li>
<li>
<a href="../packages/imguR/index.html">imguR</a>
(
<a href="https://github.com/cloudyr/imguR">
GitHub
</a>) is a package to share plots using the image hosting service
<a href="http://imgur.com/">
Imgur.com
</a>. knitr also has a function
<tt>imgur_upload()</tt>
to upload images from literate programming documents.
</li>
<li>
<a href="https://github.com/cloudyr/rscribd">
rscribd
</a>
(not on CRAN): API client for publishing documents to
<a href="https://www.scribd.com/">
Scribd
</a>.
</li>
</ul>
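<p>
For example, a sketch of sharing a local file as a gist with gistr (the filename is a placeholder; an authenticated GitHub session is assumed):
</p>
<pre>
library("gistr")
# Upload a local script as a new GitHub gist
gist_create(files = "analysis.R", description = "Example analysis script")
</pre>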
<p>
<strong>
Data Analysis and Processing Services
</strong>
</p>
<ul>
<li>
<em>
Crowdsourcing
</em>: Amazon Mechanical Turk is a paid crowdsourcing platform that can be used to semi-automate tasks that are not easily automated.
<a href="../packages/MTurkR/index.html">MTurkR</a>
(
<a href="https://github.com/cloudyr/MTurkR">
GitHub
</a>) provides access to the Amazon Mechanical Turk Requester API.
<a href="https://github.com/cloudyr/microworkers">
microworkers
</a>
(not on CRAN) can distribute tasks and retrieve results for the Microworkers.com platform.
</li>
<li>
<em>
Geolocation/Geocoding
</em>: Several packages connect to geolocation/geocoding services.
<a href="../packages/rgeolocate/index.html">rgeolocate</a>
(
<a href="https://github.com/ironholds/rgeolocate">
GitHub
</a>) offers several online and offline tools.
<a href="https://github.com/trestletech/rydn">
rydn
</a>
(not on CRAN) is an interface to the Yahoo Developer Network geolocation APIs;
<a href="https://github.com/corynissen/geocodeHERE">
geocodeHERE
</a>
(not on CRAN, a wrapper for Nokia's
<a href="https://maps.here.com/">
HERE
</a>
geocoding API) is another; and
<a href="https://github.com/hrbrmstr/ipapi">
ipapi
</a>
(
<a href="https://github.com/hrbrmstr/ipapi">
GitHub
</a>) can be used to geolocate IPv4/6 addresses and/or domain names using the
<a href="http://ip-api.com/">
ip-api.com
</a>
API.
<a href="../packages/threewords/index.html">threewords</a>
connects to the
<a href="http://what3words.com/">
What3Words API
</a>, which represents every 3-meter by 3-meter square on earth as a three-word phrase.
<a href="../packages/opencage/index.html">opencage</a>
(
<a href="https://github.com/ropenscilabs/opencage">
GitHub
</a>) provides access to the
<a href="https://geocoder.opencagedata.com/">
OpenCage
</a>
geocoding service.
<a href="../packages/geoparser/index.html">geoparser</a>
(
<a href="https://github.com/ropenscilabs/geoparser">
GitHub
</a>) interfaces with the
<a href="https://geoparser.io/">
Geoparser.io
</a>
web service to identify place names from plain text.
<a href="https://github.com/hrbrmstr/nominatim">
nominatim
</a>
(not on CRAN) connects to the
<a href="https://github.com/hrbrmstr/nominatim">
OpenStreetMap Nominatim API
</a>
for reverse geocoding.
<a href="https://github.com/ropenscilabs/rgeospatialquality">
rgeospatialquality
</a>
(not on CRAN) provides bindings for the geospatial quality API.
<a href="https://github.com/erzk/PostcodesioR">
PostcodesioR
</a>
(not on CRAN) provides postcode lookup and geocoding for the United Kingdom.
</li>
<li>
<em>
Image Processing
</em>:
<a href="https://github.com/cloudyr/RoogleVision">
RoogleVision
</a>
(not on CRAN) links to the Google Cloud Vision image recognition service.
</li>
<li>
<em>
Machine Learning as a Service
</em>: Several packages provide access to cloud-based machine learning services.