add bj data
aptx1231 committed Jun 19, 2023
1 parent 6c16381 commit a2be4be
Showing 2 changed files with 68 additions and 1 deletion.
63 changes: 63 additions & 0 deletions bj-data-introduction.md
@@ -0,0 +1,63 @@
# Data Introduction

The Beijing dataset was collected from taxis in Beijing in November 2015 and contains 1018312 trajectories. We obtained the corresponding road network data from OpenStreetMap and preprocessed the trajectory data to produce a Beijing trajectory dataset matched to the road network. We believe this dataset can promote the development of urban trajectory mining tasks.

[Download](https://drive.google.com/file/d/1NXHrmh2H5ZTBZCUmYeGSJGQbgl4bjQnt/view)

The statistical information of the data is as follows:

| Dataset | Beijing |
| ----------- | -------------------- |
| Time span | 2015.11.1~2015.11.30 |
| #Trajectory | 1018312 |
| #Usr | 1677 |
| #road/geo | 40306 |
| #edge/rel | 101023 |

The directory structure and data description are as follows:

- `bj_roadmap_edge/`
  - `bj_roadmap_edge.geo`: the geo file, which stores the road segment information of the road network.
    - `geo_id,type,coordinates,highway,lanes,tunnel,bridge,roundabout,oneway,length,maxspeed,u,v`
  - `bj_roadmap_edge.rel`: the rel file, which stores the adjacency information between road segments.
    - `rel_id,type,origin_id,destination_id`
  - The format definition follows the [LibCity library](https://bigscity-libcity-docs.readthedocs.io/en/latest/user_guide/data/atomic_files.html).
- `traj_bj_11.csv` is a semicolon-separated CSV file in which each line holds the data for one trajectory. The meaning of each column is as follows:
  - `id`, the unique ID of the trajectory, which is not consecutive due to data processing.
  - `path`, the list of road segment IDs; each ID corresponds to a `geo_id` in `bj_roadmap_edge.geo`.
  - `tlist`, the list of timestamps (UTC) corresponding to each road ID in `path`.
  - `length`, the routing length of the trajectory, accumulated from the road lengths provided by the `geo` file.
  - `speed`, the average speed of the trajectory.
  - `duration`, the total time from the start to the end of the trajectory.
  - `hop`, the number of IDs contained in `path`, i.e. the number of hops.
  - `usr_id`, the ID of the driver of the trajectory.
  - `traj_id`, the ID that distinguishes trajectories of the same driver, which is not consecutive due to data processing.
  - `vflag`, the passenger marker: 0 means empty, 1 means carrying passengers.
  - `start_time`, the start time of the trajectory.
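
The layout described above can be read with the Python standard library. The following is a minimal sketch, not the authors' loader. It assumes that both files start with a header row using the column names listed above, and that `path` and `tlist` are serialized as JSON-style lists such as `[10, 11, 12]`; neither detail is stated in this document, so adjust the parsing if the real files differ. The sample rows and the helper names (`load_adjacency`, `load_trajectories`, `is_connected`) are made up for illustration.

```python
import csv
import io
import json
from collections import defaultdict

# Made-up sample rows that only mimic the documented columns.
SAMPLE_REL = (
    "rel_id,type,origin_id,destination_id\n"
    "0,geo,10,11\n"
    "1,geo,11,12\n"
)
SAMPLE_TRAJ = (
    "id;path;tlist;length;speed;duration;hop;usr_id;traj_id;vflag;start_time\n"
    "7;[10, 11, 12];[1446336000, 1446336030, 1446336075];"
    "850.0;11.3;75;3;42;5;1;2015-11-01 00:00:00\n"
)


def load_adjacency(rel_file):
    """Map each origin_id to the set of destination_ids from a .rel file."""
    adjacency = defaultdict(set)
    for row in csv.DictReader(rel_file):
        adjacency[int(row["origin_id"])].add(int(row["destination_id"]))
    return dict(adjacency)


def load_trajectories(traj_file):
    """Yield one dict per trajectory from the semicolon-separated CSV."""
    for row in csv.DictReader(traj_file, delimiter=";"):
        # Assumption: list-valued columns are JSON-style lists.
        row["path"] = json.loads(row["path"])
        row["tlist"] = json.loads(row["tlist"])
        row["hop"] = int(row["hop"])
        yield row


def is_connected(path, adjacency):
    """Check that every consecutive pair of segments is adjacent in the .rel file."""
    return all(b in adjacency.get(a, ()) for a, b in zip(path, path[1:]))


if __name__ == "__main__":
    adjacency = load_adjacency(io.StringIO(SAMPLE_REL))
    for traj in load_trajectories(io.StringIO(SAMPLE_TRAJ)):
        print(traj["id"], traj["hop"], is_connected(traj["path"], adjacency))
```

To run this against the released files, replace the `io.StringIO(...)` objects with files opened via `open(..., newline="")`, for example `bj_roadmap_edge/bj_roadmap_edge.rel` and `traj_bj_11.csv`.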

Please ensure that this data is **used for research purposes only**.

If you use this data, please cite the two papers below. Thank you.

```
@inproceedings{START,
title={Self-supervised Trajectory Representation Learning with Temporal Regularities and Travel Semantics},
author={Jiawei Jiang and Dayan Pan and Houxing Ren and Xiaohan Jiang and Chao Li and Jingyuan Wang},
booktitle={2023 IEEE 39th international conference on data engineering (ICDE)},
year={2023},
organization={IEEE}
}
@inproceedings{libcity,
author = {Jingyuan Wang and
Jiawei Jiang and
Wenjun Jiang and
Chao Li and
Wayne Xin Zhao},
title = {LibCity: An Open Library for Traffic Prediction},
booktitle = {{SIGSPATIAL/GIS}},
pages = {145--148},
publisher = {{ACM}},
year = {2021}
}
```
6 changes: 5 additions & 1 deletion readme.md
@@ -1,5 +1,7 @@
# [ICDE2023] Self-supervised Trajectory Representation Learning with Temporal Regularities and Travel Semantics

**Update 2023/06/20: We have released the BJ data; please see the Data section to obtain it.**

This is a PyTorch implementation of Self-supervised Trajectory Representation Learning with Temporal Regularities and Travel Semantics Framework (**START**) for generic trajectory representation learning, as described in our paper: Jiawei Jiang, Dayan Pan, Houxing Ren, Xiaohan Jiang, Chao Li, Jingyuan Wang, **[Self-supervised Trajectory Representation Learning with Temporal Regularities and Travel Semantics](https://arxiv.org/abs/2211.09510)**, ICDE2023.

![](./framework.png)
@@ -23,7 +25,9 @@ For example, if you unzip the **Porto** dataset, please make sure your directory

Here `porto_roadmap_edge_porto_True_1_merge/` stores the road network data, and `porto/` stores the trajectory data.

For data privacy, we did not release the BJ data.
~~For data privacy, we did not release the BJ data.~~

We have released the Beijing trajectory dataset, collected in November 2015 and containing 1018312 trajectories. We obtained the corresponding road network data from OpenStreetMap and preprocessed the trajectory data to produce a Beijing trajectory dataset matched to the road network. We believe this dataset can promote the development of urban trajectory mining tasks. Please refer to [bj-data-introduction.md](./bj-data-introduction.md) for a more detailed data introduction. [Data Download](https://drive.google.com/file/d/1NXHrmh2H5ZTBZCUmYeGSJGQbgl4bjQnt/view)

## Pre-Train
