Commit 3e0683c

Merge pull request #1 from googlemaps/Sarthak_refactoring
refactoring
2 parents 304e223 + 7ccb7dc commit 3e0683c

25 files changed

+717
-56529
lines changed

.gitignore

Lines changed: 7 additions & 1 deletion
@@ -132,13 +132,19 @@ dmypy.json
132132
#ignore the shelve file to commit
133133
*.db
134134

135+
#ignore all csv files to commit
136+
*.csv
137+
135138
# ignore run files
136139
addresses
137140
.addresses
138141
output.json
142+
output.csv
139143
api-key.js
144+
duplicationReport.csv
145+
only_addresses.csv
140146

141147
#other
142148
/.vscode
143149
.DS_Store
144-
*.code-workspace
150+
*.code-workspace

README.md

Lines changed: 67 additions & 38 deletions
@@ -7,7 +7,7 @@ This program is a wrapper around [Address Validation API](https://developers.goo
77

88
![High-Level-overview](/doc_images/High-Volume-Address-Validation-overview.png)
99

10-
The program takes a `csv` file. It then uses the API key configured in config.yaml to start the processing of the addresses.
10+
The program takes a `csv` file. It then uses the API key configured in `config.yaml` to start the processing of the addresses.
1111

1212
## Overview
1313

@@ -20,23 +20,41 @@ You will need an API Key to call the Address Validation API.
2020

2121
Running modes are essentially different scenarios or use cases under which the software can be run. There are three running modes, all of which can be configured using the `config.yaml` file described in the next section:
2222

23+
Details of the elements discussed in this section can be found in the [validateAddress object reference guide](https://developers.google.com/maps/documentation/address-validation/reference/rest/v1/TopLevel/validateAddress).
24+
2325
1. ### Test Mode : 1
2426

25-
In test mode you are allowed to store more details from the Address Validation API response (this can be configured from `main.py` in variable `header`).
27+
In test mode you can store more details from the Address Validation API response:
28+
29+
- place_ID
30+
- latlong
31+
- formatted_address
32+
- postal_address
33+
- verdict
34+
- address_type
35+
- usps_data
36+
- address_components
37+
38+
> **Note:** This is an extremely permissive mode and should be avoided in most scenarios. Use it only for testing, and only with a very limited number of addresses. The responses must be deleted within 15 days.
39+
40+
2. ### Production mode -Users : 2 (default)
41+
42+
A production mode <ins>not</ins> initiated after user/human interaction. Only minimal data elements may be stored, as per the [Google Maps Platform Terms of Service](https://cloud.google.com/maps-platform/terms). It typically involves multiple successive programmatic requests to the Address Validation API.
2643

27-
2. ### Production mode -NoUsers : 2 (default)
44+
- place_ID
45+
- latlong
46+
- verdict
47+
- address_components
2848

29-
a Production mode <ins>not</ins> initiated after user/human interaction, only minimal data elements are allowed to be stored as per [Google Maps Platform Terms of Service](https://cloud.google.com/maps-platform/terms). Typically involves successive and multiple programmatic requests to Address Validation API.
49+
> **Note:** All the data elements in this mode can only be cached for a maximum of 30 days and must be deleted afterwards. Only place_ID can be stored indefinitely.
3050
31-
3. ### Production mode -Users : 3
51+
3. ### Production mode -NoUsers : 3
3252

3353
A production mode initiated after user/human interaction. Some more data may be cached for the sole purpose of letting the user complete their single task.
3454

35-
* Update the mode in `config.yaml` file inside `/src` folder :
55+
- place_ID
3656

37-
```
38-
run_mode : 2
39-
```
57+
- Update the mode in `config.yaml` file:
4058

4159
### config.yaml
4260

@@ -68,49 +86,29 @@ separator : ","
6886
***Shelve db file:*** This is a temporary file created to maintain persistence for a long-running process.
6987
```shelve_db : addresses```
7088

71-
### Overall Flow of logic
72-
73-
* Reads a `csv` file
74-
* Constructs the address as per configuration
75-
* Stores the formatted addresses in a `shelve` object. This is done to make the program more resilient and async.
76-
* The library then picks up addresses one by one from the `shelve` object and call the Address Validation API
77-
* It gets the response back, parse it and store configured values back to the `shelve` object
78-
* After all the addresses are inserted back to the datastructure, another piece of code executes and exports the data in a `csv` file
79-
* Once the program is executed, it stores the [geocode](https://developers.google.com/maps/documentation/address-validation/requests-validate-address#response) and [`place ID`](https://developers.google.com/maps/documentation/places/web-service/place-id) against each given address and exports it in a `csv` file.
80-
8189
### Key features
8290

83-
* Maintains QPM limits set by the Address Validation API
84-
* Async code and maintains state
85-
* Checks for duplicates and runs repeated addresses only once
86-
* Modes help create parity with Terms of Service
91+
- Maintains QPM limits set by the Address Validation API
92+
- Async code and maintains state
93+
- Checks for duplicates and runs repeated addresses only once
94+
- Generates a duplication report which shows which addresses are duplicated and how often
95+
- Modes help create parity with Terms of Service
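
The duplicate handling listed above can be illustrated with a minimal sketch. It assumes addresses are compared as exact strings; the function name `split_duplicates` and the sample rows are hypothetical and are not taken from the library's source.

```python
from collections import Counter


def split_duplicates(addresses):
    """Return (unique addresses in first-seen order, {address: count} for repeats)."""
    counts = Counter(addresses)
    # dict.fromkeys keeps the first occurrence of each address and preserves order.
    unique = list(dict.fromkeys(addresses))
    # Only addresses seen more than once belong in the duplication report.
    duplicates = {addr: n for addr, n in counts.items() if n > 1}
    return unique, duplicates


rows = ["1600 Amphitheatre Pkwy", "1 Main St", "1600 Amphitheatre Pkwy"]
unique, duplicates = split_duplicates(rows)
```

Only the entries in `unique` would need an API request, while `duplicates` holds what a duplication report could list: each repeated address and how often it occurred.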
8796

8897
## Install and run
8998

90-
* Requires `python3` and `PyYAML`:
99+
- Requires `python3` and `PyYAML`:
91100

92101
`brew install python3`
93102
`brew install PyYAML`
94103

95-
* Install: python-high-volume-address-validation-library software also requires to have [google-maps-services-python](https://github.com/googlemaps/google-maps-services-python) installed, the latest version that includes Address Validation API:
104+
- Install: the python-high-volume-address-validation-library software also requires [google-maps-services-python](https://github.com/googlemaps/google-maps-services-python), in the latest version that includes the Address Validation API:
96105
`
97106
pip3 install googlemaps
98107
`
99108

100-
* Update `config.yaml` file in `/src` folder with your API key, `csv` output path, and mode in which to run the library (see "Running Modes" section):
101-
102-
```
103-
## Address Validation API key
104-
api_key : 'YOUR_API_KEY'
105-
106-
## Name of the output csv file
107-
output_csv : './test-results.csv'
108-
109-
## There are three modes for running the software.
110-
run_mode : 1
111-
```
109+
- Update the `config.yaml` file with your API key, `csv` output path, and the mode in which to run the library (see "Running Modes" section):
112110

113-
* Run:
111+
- Run:
114112
`
115113
python3 main.py
116114
`
@@ -131,6 +129,37 @@ separator : ","
131129

132130
The software works in three modes. You can set the mode to comply with [Google Maps Platform Terms of Service](https://cloud.google.com/maps-platform/terms), by configuring the `config.yaml` file corresponding to the use case under which this is run.
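
As a rough illustration of how a mode could restrict what is stored, the sketch below maps each `run_mode` value to the data elements listed for it in the Running Modes section. The names `FIELDS_BY_MODE` and `filter_for_mode` are hypothetical, and the actual library may organise this differently.

```python
# Data elements per run_mode, copied from the Running Modes lists in this README.
FIELDS_BY_MODE = {
    1: ["place_ID", "latlong", "formatted_address", "postal_address",
        "verdict", "address_type", "usps_data", "address_components"],
    2: ["place_ID", "latlong", "verdict", "address_components"],
    3: ["place_ID"],
}


def filter_for_mode(parsed_response, run_mode=2):
    """Keep only the elements that the given run_mode allows to be stored."""
    allowed = FIELDS_BY_MODE[run_mode]
    return {key: parsed_response[key] for key in allowed if key in parsed_response}


# Hypothetical parsed response, used only to exercise the filter.
parsed = {"place_ID": "abc123", "latlong": (37.42, -122.08), "formatted_address": "example"}
stored = filter_for_mode(parsed, run_mode=2)
```

Keeping the mapping in one table means a change to the permitted elements only has to be made in a single place.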
133131

132+
### Overall Flow of logic
133+
134+
- Reads a `csv` file
135+
- Constructs the address as per configuration
136+
- Stores the formatted addresses in a `shelve` object. This is done to make the program more resilient and async.
137+
- The library then picks up addresses one by one from the `shelve` object and calls the Address Validation API
138+
- It gets the response back, parses it, and stores the configured values back to the `shelve` object
139+
- After all the addresses are inserted back into the data structure, another piece of code executes and exports the data to a `csv` file
140+
- Once the program is executed, it stores the [geocode](https://developers.google.com/maps/documentation/address-validation/requests-validate-address#response) and [`place ID`](https://developers.google.com/maps/documentation/places/web-service/place-id) against each given address and exports it in a `csv` file.
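
The flow above can be sketched end to end as follows. This is a simplified illustration, not the library's implementation: `run_pipeline` and `fake_validate` are hypothetical names, the API call is replaced by a stub, and the address is assumed to be built by joining all columns of a row.

```python
import csv
import io
import os
import shelve
import tempfile


def run_pipeline(csv_text, db_path, validate):
    """Read addresses, persist them in a shelve file, validate each once, export csv text."""
    # Read the csv and build one address string per row.
    rows = csv.reader(io.StringIO(csv_text))
    addresses = [", ".join(cell.strip() for cell in row) for row in rows if row]

    with shelve.open(db_path) as db:
        # Store every distinct address with an empty result so progress survives a restart.
        for address in addresses:
            if address not in db:
                db[address] = None

        # Validate only the entries that have no stored result yet.
        for address in list(db.keys()):
            if db[address] is None:
                db[address] = validate(address)

        # Export the stored results as csv, in first-seen input order.
        out = io.StringIO()
        writer = csv.writer(out)
        writer.writerow(["address", "place_ID"])
        for address in dict.fromkeys(addresses):
            writer.writerow([address, db[address]["place_ID"]])
    return out.getvalue()


calls = []


def fake_validate(address):
    """Hypothetical stand-in for the real Address Validation API request."""
    calls.append(address)
    return {"place_ID": f"stub-{len(calls)}"}


sample = (
    "1600 Amphitheatre Pkwy,Mountain View,CA\n"
    "1 Main St,Springfield,IL\n"
    "1600 Amphitheatre Pkwy,Mountain View,CA\n"
)
with tempfile.TemporaryDirectory() as tmp:
    report = run_pipeline(sample, os.path.join(tmp, "addresses"), fake_validate)
```

Because results are written back to the `shelve` file as they arrive, a rerun against the same file would skip addresses that already have a stored result.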
141+
134142
## Output
135143

136144
This program outputs a CSV file. The contents of the CSV file depend on the mode selected above.
145+
146+
It also outputs a duplication report `csv` file listing all the addresses that were duplicated in the input.
147+
148+
## License
149+
150+
Copyright 2022 Google LLC.
151+
152+
Licensed to the Apache Software Foundation (ASF) under one or more contributor
153+
license agreements. See the NOTICE file distributed with this work for
154+
additional information regarding copyright ownership. The ASF licenses this
155+
file to you under the Apache License, Version 2.0 (the "License"); you may not
156+
use this file except in compliance with the License. You may obtain a copy of
157+
the License at
158+
159+
<http://www.apache.org/licenses/LICENSE-2.0>
160+
161+
Unless required by applicable law or agreed to in writing, software
162+
distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
163+
WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
164+
License for the specific language governing permissions and limitations under
165+
the License.

__init__.py

Lines changed: 15 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,15 @@
1+
# Copyright 2022 Google LLC
2+
#
3+
# Licensed under the Apache License, Version 2.0 (the "License");
4+
# you may not use this file except in compliance with the License.
5+
# You may obtain a copy of the License at
6+
#
7+
# http://www.apache.org/licenses/LICENSE-2.0
8+
#
9+
# Unless required by applicable law or agreed to in writing, software
10+
# distributed under the License is distributed on an "AS IS" BASIS,
11+
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
12+
# See the License for the specific language governing permissions and
13+
# limitations under the License.
14+
15+
__version__ = "0.1"

src/__init__.py

Lines changed: 15 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,15 @@
1+
# Copyright 2022 Google LLC
2+
#
3+
# Licensed under the Apache License, Version 2.0 (the "License");
4+
# you may not use this file except in compliance with the License.
5+
# You may obtain a copy of the License at
6+
#
7+
# http://www.apache.org/licenses/LICENSE-2.0
8+
#
9+
# Unless required by applicable law or agreed to in writing, software
10+
# distributed under the License is distributed on an "AS IS" BASIS,
11+
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
12+
# See the License for the specific language governing permissions and
13+
# limitations under the License.
14+
15+
__version__ = "0.1"
