Commit

Add OCR to vignette
jeroen committed Aug 4, 2017
1 parent b62de12 commit 71c1004
Showing 3 changed files with 50 additions and 2 deletions.
12 changes: 12 additions & 0 deletions inst/doc/intro.R
@@ -11,6 +11,8 @@ dev.off <- function(){
invisible(grDevices::dev.off())
}

has_tesseract <- isTRUE(require(tesseract, quietly = TRUE))

## ------------------------------------------------------------------------
str(magick::magick_config())

@@ -242,3 +244,13 @@ buf <- as.integer(frink[[1]])
rr <- raster::brick(buf)
raster::plotRGB(rr, asp = 1)

## ----eval=FALSE----------------------------------------------------------
# install.packages("tesseract")

## ---- eval = has_tesseract-----------------------------------------------
img <- image_read("http://jeroen.github.io/images/testocr.png")
print(img)

# Extract text
cat(image_ocr(img))

20 changes: 19 additions & 1 deletion inst/doc/intro.Rmd
@@ -15,7 +15,7 @@ vignette: >
%\VignetteEncoding{UTF-8}
---

```{r, echo = FALSE}
```{r, echo = FALSE, }
library(knitr)
"print.magick-image" <- function(x, ...){
ext <- ifelse(length(x), tolower(image_info(x[1])$format), "gif")
@@ -27,6 +27,8 @@ library(knitr)
dev.off <- function(){
invisible(grDevices::dev.off())
}
has_tesseract <- isTRUE(require(tesseract, quietly = TRUE))
```

The new [magick](https://cran.r-project.org/package=magick) package is an ambitious effort to modernize and simplify high-quality image processing in R. It wraps the [ImageMagick STL](https://www.imagemagick.org/Magick++/STL.html) which is perhaps the most comprehensive open-source image processing library available today.
@@ -511,3 +513,19 @@ raster::plotRGB(rr, asp = 1)
```

The raster package also does not seem to support transparency, which perhaps makes sense in the context of spatial imaging.

## OCR text extraction

A recent addition to the package is the ability to extract text from images using OCR. This requires the tesseract package:

```{r eval=FALSE}
install.packages("tesseract")
```

```{r, eval = has_tesseract}
img <- image_read("http://jeroen.github.io/images/testocr.png")
print(img)
# Extract text
cat(image_ocr(img))
```
20 changes: 19 additions & 1 deletion vignettes/intro.Rmd
@@ -15,7 +15,7 @@ vignette: >
%\VignetteEncoding{UTF-8}
---

```{r, echo = FALSE}
```{r, echo = FALSE, }
library(knitr)
"print.magick-image" <- function(x, ...){
ext <- ifelse(length(x), tolower(image_info(x[1])$format), "gif")
@@ -27,6 +27,8 @@ library(knitr)
dev.off <- function(){
invisible(grDevices::dev.off())
}
has_tesseract <- isTRUE(require(tesseract, quietly = TRUE))
```

The new [magick](https://cran.r-project.org/package=magick) package is an ambitious effort to modernize and simplify high-quality image processing in R. It wraps the [ImageMagick STL](https://www.imagemagick.org/Magick++/STL.html) which is perhaps the most comprehensive open-source image processing library available today.
@@ -511,3 +513,19 @@ raster::plotRGB(rr, asp = 1)
```

The raster package also does not seem to support transparency, which perhaps makes sense in the context of spatial imaging.

## OCR text extraction

A recent addition to the package is the ability to extract text from images using OCR. This requires the tesseract package:

```{r eval=FALSE}
install.packages("tesseract")
```

```{r, eval = has_tesseract}
img <- image_read("http://jeroen.github.io/images/testocr.png")
print(img)
# Extract text
cat(image_ocr(img))
```
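
The vignette guards the OCR chunk with `has_tesseract`, which attaches tesseract via `require()`. As a hedged sketch of the same guard outside a vignette, the snippet below checks availability with `requireNamespace()` (so nothing is attached to the search path) and only then runs OCR. The helper names `has_pkg` and `ocr_or_null` are hypothetical and not part of magick or this commit; it assumes `image_ocr()` accepts a `language` argument, as in current magick releases.

```r
# Hypothetical helper: TRUE if a package can be loaded, without attaching it.
has_pkg <- function(pkg) {
  isTRUE(requireNamespace(pkg, quietly = TRUE))
}

# Hypothetical helper: OCR an image, or return NULL when magick or
# tesseract is not installed, instead of raising an error.
ocr_or_null <- function(path, language = "eng") {
  if (!has_pkg("magick") || !has_pkg("tesseract")) {
    return(NULL)
  }
  img <- magick::image_read(path)
  magick::image_ocr(img, language = language)
}
```

Calling `ocr_or_null("http://jeroen.github.io/images/testocr.png")` would mirror the vignette chunk when both packages are present; returning `NULL` otherwise is one possible design choice, analogous to skipping the chunk with `eval = has_tesseract`.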
