-
Notifications
You must be signed in to change notification settings - Fork 1
C URL Library : a set a C functions to manipulate URL strings and achieve URL Canonicalization as described by Google in its Safe Browsing developer's guide
License
kirabou/Url-Canonicalization
Folders and files
| Name | Name | Last commit message | Last commit date | |
|---|---|---|---|---|
Repository files navigation
URL CANONICALIZATION AND UTILITIES FUNCTIONS This is a set a C functions to manipulate URL strings (in an RFC 3986 compliant way) and achieve URL Canonicalization as described by Google in its Safe Browsing developer's guide. See : https://developers.google.com/safe-browsing/developers_guide_v3#Canonicalization It includes the following functions (see url.h for full documentation) : - url_RemoveTabCRLF() : removes leading and trailing spaces, as well as tab (0x09), CR (0x0d), and LF (0x0a) characters from the URL. - url_RemoveFragment() : removes the fragment part of an URL. - url_RemoveQuery() : removes the query part of an URL. - url_Unescape() : percent-decode an URL, calling itself until there is no more percent-decoding to be done. - url_Normalize() : applies URL normalization rules as described in Google Safe Browsing Developer's Guide. - url_Escape() : Percent-encode an URL. Reserved characters (from RFC 3986 that is one of "!*'();:@&=+$,/?#[]") are not encoded. - url_EscapeIncludingReservedChars() : Percent-encode a string, including reserved characters. - url_Encode() : encodes a string to be compliant with application/x-www-form-urlencoded format. Same as url_EscapeIncludingReservedChars() but also replaces spaces with '+'. - url_Canonicalize() : canonicalizes an URL by successively calling url_Normalize(), then url_Escape() to get a percent-encoded url. - url_CanonicalizeWithFullEscape() : canonicalizes an URL (same as url_Canonicalize()) but returned string is fully percent-encoded, even the reserved characters. - url_ParseNextKeyValuePair() : allows parsing of a "key=value&key=value&..." string. - url_Split() : splits an URL into its schme, link and query parts, as defined by RFC3986. - url_GetHostname() : returns the hostname part extracted from an URL. - url_GetBase() : returns the base part of an URL. - url_MakeAbsolute() : turn a relative URL into an absolute URL? All these functions are supposed to be thread safe. Tests were made with Valgrind to find and fix memory leaks. test_url.c implements the tests provided by Google in its documentation to help validate a canonicalization implementation. test_url.c also provides example of basic uses of the provided functions. To compile : gcc -std=c99 test_url.c url.c -o test_url Tu run tests : ./test_url
About
C URL Library : a set a C functions to manipulate URL strings and achieve URL Canonicalization as described by Google in its Safe Browsing developer's guide
Resources
License
Stars
Watchers
Forks
Releases
No releases published
Packages 0
No packages published