[WIP] First DataStreams version #13

evetion · 2018-01-10T16:44:41Z

Here is a WIP for implementing DataStreams. Still rough around the edges, but I'd like some feedback for the overall direction/API.

What

DataStreams seems to be the way to go in the Julia data ecosystem: it enables streaming conversions between, for example, CSV and SQLite, or DataFrames and DataTables. With it, users can easily read LAS files into DataFrames and write them back, without needing to know anything about the raw header or point layout.

Why

This addresses most of the comments from @c42f in #4 for a new API and v0.1:

Correct header creation
No raw header, but a Schema (with the raw header in the metadata)
LasIO.Source provides only the header, points on request with stream!, so no tuple (header, points) anymore
LasIO.Source works without FileIO
DataStreams enables, well, streaming 😄

TODO

For using the Source:

Implement adding scale and offset on the fly for XYZ on stream!
Expose CRS if there

For using the Sink:

Conversion XYZ Float to Int32 ?
Determine scaling/offset ?
Implement CRS (and thus VLR)

Discussion

The Sink needs some discussion. Writing currently only works when the Source columns match a LasPoint exactly. Do we fill in missing columns, and thus allow an invalid point type? Also, the xyz coordinates will arrive as Float, which we need to scale and offset. Doing this after streaming is detrimental to performance. For float input I would propose:

The user is required to give a bounding box, which is used to determine the scaling/offset. Precision (for scaling) is a keyword argument that defaults to 2 (=> scale=0.01). We give errors/warnings when values overflow during streaming. Creation of a Sink: LasIO.Sink(filename, bbox; precision=2, crs=nothing)
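To make the proposal concrete, here is a rough sketch of the Float to Int32 conversion with an overflow check. The names `scale_from_precision` and `to_raw` are hypothetical and not part of LasIO, and the offset is simply passed in rather than derived from a bounding box.

```julia
# Hypothetical helpers, not part of LasIO: a sketch of the proposed conversion.

# precision=2 means two decimals are kept, so scale=0.01
scale_from_precision(precision::Integer) = 10.0^(-precision)

# Convert a Float64 coordinate to the raw Int32 stored in a LAS point.
# Throws when the scaled value does not fit in an Int32.
function to_raw(value::Float64, scale::Float64, offset::Float64)
    scaled = round((value - offset) / scale)
    if !(typemin(Int32) <= scaled <= typemax(Int32))
        throw(OverflowError("coordinate $value does not fit in Int32 with scale=$scale, offset=$offset"))
    end
    return Int32(scaled)
end

scale = scale_from_precision(2)
raw_x = to_raw(289814.15, scale, 0.0)
```

In a real Sink the offset would probably come from the bounding box (for example its minimum corner), which keeps the scaled values small and makes an overflow less likely.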

Further improvements can be made to the point types. Since this implementation hardly uses them (only to look up attributes and types when creating the Schema), we could explode some attributes, such as the flag_byte, into their individual components for better accessibility. I'm not sure what this would do to performance, though.
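As an illustration of what exploding the flag_byte could look like, the sketch below unpacks it following the bit layout of LAS point formats 0 to 5 (return number in bits 0-2, number of returns in bits 3-5, scan direction flag in bit 6, edge of flight line in bit 7). `unpack_flag_byte` is a hypothetical name, not an existing LasIO function.

```julia
# Hypothetical helper, not part of LasIO: split the flag byte into its components.
function unpack_flag_byte(flag::UInt8)
    return (
        return_number       = flag & 0x07,        # bits 0-2
        number_of_returns   = (flag >> 3) & 0x07, # bits 3-5
        scan_direction_flag = (flag >> 6) & 0x01, # bit 6
        edge_of_flight_line = flag >> 7,          # bit 7
    )
end

# 0x30 is the flag_byte of every point in test/srs.las
flags = unpack_flag_byte(0x30)
```

For 0x30 this gives a return number of 0 and a number of returns of 6, which agrees with the lasinfo report in the demo.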

Demo

julia> s = LasIO.Source("test/srs.las")
LasIO.Source(Data.Schema:
rows: 10  cols: 10
Columns:
 "x"                   Int32  
 "y"                   Int32  
 "z"                   Int32  
 "intensity"           UInt16 
 "flag_byte"           UInt8  
 "raw_classification"  UInt8  
 "scan_angle"          Int8   
 "user_data"           UInt8  
 "pt_src_id"           UInt16 
 "gps_time"            Float64, LasHeader with 10 points.
, IOStream(<file test/srs.las>), "test/srs.las", 759)

julia> d = Data.stream!(s, DataFrame)
DataFrames.DataFrameStream{Tuple{Array{Int32,1},Array{Int32,1},Array{Int32,1},Array{UInt16,1},Array{UInt8,1},Array{UInt8,1},Array{Int8,1},Array{UInt8,1},Array{UInt16,1},Array{Float64,1}}}((Int32[28981415, 28981464, 28981512, 28981560, 28981608, 28981656, 28981703, 28981753, 28981801, 28981850], Int32[432097861, 432097884, 432097906, 432097928, 432097950, 432097971, 432097992, 432098016, 432098038, 432098059], Int32[17076, 17076, 17075, 17074, 17068, 17066, 17063, 17062, 17061, 17058], UInt16[0x0104, 0x0118, 0x0118, 0x0118, 0x0104, 0x00f0, 0x00f0, 0x0118, 0x0118, 0x0104], UInt8[0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30], UInt8[0x02, 0x02, 0x02, 0x02, 0x02, 0x02, 0x02, 0x02, 0x02, 0x02], Int8[0, 0, 0, 0, 0, 0, 0, 0, 0, 0], UInt8[0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00], UInt16[0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000], [4.99451e5, 4.99451e5, 4.99451e5, 4.99451e5, 4.99451e5, 4.99451e5, 4.99451e5, 4.99451e5, 4.99451e5, 4.99451e5]), String["x", "y", "z", "intensity", "flag_byte", "raw_classification", "scan_angle", "user_data", "pt_src_id", "gps_time"])

julia> Data.close!(d)
10×10 DataFrames.DataFrame
│ Row │ x        │ y         │ z     │ intensity │ flag_byte │ raw_classification │ scan_angle │ user_data │ pt_src_id │ gps_time  │
├─────┼──────────┼───────────┼───────┼───────────┼───────────┼────────────────────┼────────────┼───────────┼───────────┼───────────┤
│ 1   │ 28981415 │ 432097861 │ 17076 │ 0x0104    │ 0x30      │ 0x02               │ 0          │ 0x00      │ 0x0000    │ 4.99451e5 │
│ 2   │ 28981464 │ 432097884 │ 17076 │ 0x0118    │ 0x30      │ 0x02               │ 0          │ 0x00      │ 0x0000    │ 4.99451e5 │
│ 3   │ 28981512 │ 432097906 │ 17075 │ 0x0118    │ 0x30      │ 0x02               │ 0          │ 0x00      │ 0x0000    │ 4.99451e5 │
│ 4   │ 28981560 │ 432097928 │ 17074 │ 0x0118    │ 0x30      │ 0x02               │ 0          │ 0x00      │ 0x0000    │ 4.99451e5 │
│ 5   │ 28981608 │ 432097950 │ 17068 │ 0x0104    │ 0x30      │ 0x02               │ 0          │ 0x00      │ 0x0000    │ 4.99451e5 │
│ 6   │ 28981656 │ 432097971 │ 17066 │ 0x00f0    │ 0x30      │ 0x02               │ 0          │ 0x00      │ 0x0000    │ 4.99451e5 │
│ 7   │ 28981703 │ 432097992 │ 17063 │ 0x00f0    │ 0x30      │ 0x02               │ 0          │ 0x00      │ 0x0000    │ 4.99451e5 │
│ 8   │ 28981753 │ 432098016 │ 17062 │ 0x0118    │ 0x30      │ 0x02               │ 0          │ 0x00      │ 0x0000    │ 4.99451e5 │
│ 9   │ 28981801 │ 432098038 │ 17061 │ 0x0118    │ 0x30      │ 0x02               │ 0          │ 0x00      │ 0x0000    │ 4.99451e5 │
│ 10  │ 28981850 │ 432098059 │ 17058 │ 0x0104    │ 0x30      │ 0x02               │ 0          │ 0x00      │ 0x0000    │ 4.99451e5 │

julia> Data.reset!(s)

julia> d = Data.stream!(s, LasIO.Sink, "test_final.las")
Stream now at 227
LasIO.Sink{LasIO.LasPoint1}(IOStream(<file test_final.las>), LasHeader with 10 points.
, LasIO.LasPoint1)

julia> Data.close!(d)
LasIO.Sink{LasIO.LasPoint1}(IOStream(<file test_final.las>), LasHeader with 10 points.
, LasIO.LasPoint1)

➜ lasinfo test_final.las
lasinfo (170528) report for test_final.las
reporting all LAS header entries:
  file signature:             'LASF'
  file source ID:             0
  global_encoding:            0
  project ID GUID data 1-4:   00000000-0000-0000-2020-202020202020
  version major.minor:        1.0
  system identifier:          'LasIO.jl datastream             '
  generating software:        'LasIO.jl                        '
  file creation day/year:     10/2018
  header size:                227
  offset to point data:       227
  number var. length records: 0
  point data format:          1
  point data record length:   28
  number of point records:    10
  number of points by return: 10 0 0 0 0
  scale factor x y z:         0.01 0.01 0.01
  offset x y z:               0 0 0
  min x y z:                  289814.15 4320978.61 170.58
  max x y z:                  289818.50 4320980.59 170.76
reporting minimum and maximum for all LAS point record entries ...
  X            28981415   28981850
  Y           432097861  432098059
  Z               17058      17076
  intensity         240        280
  return_number       0          0
  number_of_returns   6          6
  edge_of_flight_line 0          0
  scan_direction_flag 0          0
  classification      2          2
  scan_angle_rank     0          0
  user_data           0          0
  point_source_ID     0          0
  gps_time 499450.805994 499450.806120
number of first returns:        10
number of intermediate returns: 0
number of last returns:         0
number of single returns:       0
WARNING: for return 1 real number of points by return (0) is different from header entry (10).
WARNING: there are 10 points with return number 0
overview over number of returns of given pulse: 0 0 0 0 0 10 0
histogram of classification of points:
              10  ground (2)
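
The raw integer coordinates in the streamed DataFrame relate to the real-world values in the lasinfo report through the header's scale and offset. A minimal sketch of that arithmetic, using the scale (0.01) and offset (0) reported above; `real_coord` is a hypothetical helper, not part of LasIO:

```julia
# Hypothetical helper, not part of LasIO: apply the LAS scale and offset
# to a raw integer coordinate.
real_coord(raw::Integer, scale::Real, offset::Real) = raw * scale + offset

# First three raw x values from the demo above
raw_x = Int32[28981415, 28981464, 28981512]

# Scale and offset as reported by lasinfo for test_final.las
xs = real_coord.(raw_x, 0.01, 0.0)
```

The first value comes out at roughly 289814.15, the "min x" that lasinfo reports. This is the conversion the Source TODO item would apply on the fly during stream!.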

evetion · 2018-01-10T19:58:07Z

I think this also comes close to the comment at https://github.com/FugroRoames/PointClouds.jl:

Perhaps one day PointCloud can be implemented in terms of an underlying DataFrame [..]

julia> s = LasIO.Source("test/srs.las")
julia> d = Data.stream!(s, DataFrame)
julia> d = Data.close!(d)
julia> d[:intensity]
10-element Array{UInt16,1}:
 0x0104
 0x0118
 0x0118
 0x0118
 0x0104
 0x00f0
 0x00f0
 0x0118
 0x0118
 0x0104

evetion · 2018-01-17T10:51:50Z

Document API changes (add high level interface) by @visr
Fail on non-fitting schemas in Sink
Use xcoords() and other existing functions
Add issues for architecture change to transparently rewrite raw fields

visr · 2019-01-15T14:38:11Z

I believe now, a year later, it makes more sense to add support for the new Tables.jl interface instead. Perhaps it would be good to focus on getting #16 in first, and then revisit this? #16 will also affect the API.

evetion · 2019-01-16T21:32:15Z

Not sure about this clashing with #16, these are two separate approaches in my opinion.

visr · 2019-01-16T21:53:15Z

What do you mean by two separate approaches? As in, we should have one or the other? I thought we could have both, right?

In any case it might be good to try to get LAS 1.3 and 1.4 support in first, as those versions are becoming increasingly common.

[WIP] First DataStreams version without LasPoint initialization.

d56458c

evetion requested review from c42f and visr January 10, 2018 16:44

Header is now updated by streaming points to Sink.

66f8286

Coordinate conversion between Float and Int implemented.

b9ab16f

visr removed request for c42f and visr October 18, 2019 14:46

evetion mentioned this pull request Jun 25, 2020

API for returning scaled and offset point cloud #34

Open
