Kafka Connect source connector used for generating simulated events for demos and tests.

IBM/kafka-connect-loosehangerjeans-source

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

20 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

kafka-connect-loosehangerjeans-source

It produces messages simulating the following events:

Topic name        Description
DOOR.BADGEIN      An employee using their ID badge to go through a door
CANCELLATIONS     An order being cancelled
CUSTOMERS.NEW     A new customer has registered on the website
ORDERS.NEW        An order has been placed
SENSOR.READINGS   A sensor reading captured from an IoT sensor
STOCK.MOVEMENT    A stock shipment received by a warehouse
ORDERS.ONLINE     An online order has been placed
STOCK.NOSTOCK     A product has run out of stock
PRODUCT.RETURNS   A return request has been issued
PRODUCT.REVIEWS   A product review has been posted

Avro schemas and sample messages for each of these topics can be found in the ./doc folder.

Config

Minimal

Only minimal configuration is needed: specifying that keys are strings and payloads are JSON is enough to start the connector.

apiVersion: eventstreams.ibm.com/v1beta2
kind: KafkaConnector
metadata:
  name: kafka-datagen
  labels:
    eventstreams.ibm.com/cluster: kafka-connect-cluster
spec:
  class: com.ibm.eventautomation.demos.loosehangerjeans.DatagenSourceConnector
  tasksMax: 1
  config:
    key.converter: org.apache.kafka.connect.storage.StringConverter
    key.converter.schemas.enable: false
    value.converter: org.apache.kafka.connect.json.JsonConverter
    value.converter.schemas.enable: false
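With `value.converter.schemas.enable: false`, Kafka Connect's `JsonConverter` writes each value as a plain JSON object; with it set to `true`, the value is wrapped in an envelope that has `schema` and `payload` fields. The Python sketch below shows how a consumer might read either shape. The sample record and its field names are hypothetical (the real sample messages are in the ./doc folder), the envelope's schema is simplified, and `decode_value` is an illustrative helper, not part of the connector.

```python
import json


def decode_value(raw: bytes, schemas_enabled: bool) -> dict:
    """Decode a record value written by Kafka Connect's JsonConverter.

    schemas.enable=false -> the value is the JSON object itself.
    schemas.enable=true  -> the value is an envelope {"schema": ..., "payload": ...}.
    """
    doc = json.loads(raw.decode("utf-8"))
    if schemas_enabled:
        return doc["payload"]
    return doc


# Hypothetical record values: field names are for illustration only,
# and the envelope's schema is cut down to a bare struct.
plain = b'{"id": "abc-123", "quantity": 2}'
envelope = (
    b'{"schema": {"type": "struct", "fields": [], "optional": false},'
    b' "payload": {"id": "abc-123", "quantity": 2}}'
)

print(decode_value(plain, schemas_enabled=False))    # → {'id': 'abc-123', 'quantity': 2}
print(decode_value(envelope, schemas_enabled=True))  # → {'id': 'abc-123', 'quantity': 2}
```

Disabling schemas keeps each message small and readable, which suits the demo use case; enabling them only adds the envelope around the same payload.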

Custom

Config overrides are available to allow demos based on different domains or industries.

The example config below lists every available option, each set to its default value.

apiVersion: eventstreams.ibm.com/v1beta2
kind: KafkaConnector
metadata:
  name: kafka-datagen
  labels:
    eventstreams.ibm.com/cluster: kafka-connect-cluster
spec:
  class: com.ibm.eventautomation.demos.loosehangerjeans.DatagenSourceConnector
  tasksMax: 1
  config:
    #
    # format of messages to produce
    #
    key.converter: org.apache.kafka.connect.storage.StringConverter
    key.converter.schemas.enable: false
    value.converter: org.apache.kafka.connect.json.JsonConverter
    value.converter.schemas.enable: false

    #
    # name of the topics to produce to
    #
    topic.name.orders: ORDERS.NEW
    topic.name.cancellations: CANCELLATIONS
    topic.name.stockmovements: STOCK.MOVEMENT
    topic.name.badgeins: DOOR.BADGEIN
    topic.name.newcustomers: CUSTOMERS.NEW
    topic.name.sensorreadings: SENSOR.READINGS
    topic.name.onlineorders: ORDERS.ONLINE
    topic.name.outofstocks: STOCK.NOSTOCK
    topic.name.returnrequests: PRODUCT.RETURNS
    topic.name.productreviews: PRODUCT.REVIEWS

    #
    # startup behavior
    #
    # if true, the connector will generate one week of historical
    #  events when starting for the first time
    startup.history.enabled: false

    #
    # format of timestamps to produce
    #
    #    default is chosen to be suitable for use with Event Processing,
    #    but you could modify this if you want to demo how to reformat
    #    timestamps to be compatible with Event Processing
    #
    #    NOTE: sensor readings topic is an exception. Events on this topic
    #           ignore this config option
    #
    formats.timestamps: yyyy-MM-dd HH:mm:ss.SSS
    # format of timestamps with local time zone (UTC time in ISO 8601 format)
    #    NOTE: this format is used by default for online orders
    formats.timestamps.ltz: yyyy-MM-dd'T'HH:mm:ss.SSS'Z'

    #
    # how often events should be created
    #
    # 'normal' random orders
    timings.ms.orders: 30000              # every 30 seconds
    # cancellations of a large order followed by a small order of the same item
    timings.ms.falsepositives: 600000     # every 10 minutes
    # repeated cancellations of a large order followed by a small order of the same item
    timings.ms.suspiciousorders: 3600000  # every hour
    # stock movement events
    timings.ms.stockmovements: 300000     # every 5 minutes
    # door badge events
    timings.ms.badgeins: 600              # sub-second
    # new customer events
    timings.ms.newcustomers: 543400       # a little over 9 minutes
    # sensor reading events
    timings.ms.sensorreadings: 27000      # every 27 seconds
    # sensor reading events
    #  from a single sensor that periodically generates very high and increasing readings (before returning to a normal range)
    timings.ms.highsensorreadings: 18000  # every 18 seconds
    # online orders
    timings.ms.onlineorders: 30000        # every 30 seconds
    # return requests
    timings.ms.returnrequests: 300000     # every 5 minutes
    # product reviews
    timings.ms.productreviews: 60000      # every 1 minute

    #
    # how much of a delay to introduce when producing events
    #
    #    this is to simulate events from systems that are slow to
    #    produce to Kafka
    #
    #    events with a delay will be produced to Kafka a short
    #    time after the timestamp contained in the message payload
    #
    #    the result is that the timestamp in the message metadata
    #    will be later than the timestamp in the message payload
    #
    #    because the delay will be random (up to the specified max)
    #    the impact of this is that messages on the topic will be
    #    slightly out of sequence (according to the timestamp in
    #    the message payload)
    #
    # orders
    eventdelays.orders.secs.max: 0              # payload time matches event time by default
    # cancellations
    eventdelays.cancellations.secs.max: 0       # payload time matches event time by default
    # stock movements
    eventdelays.stockmovements.secs.max: 0      # payload time matches event time by default
    # door badge events
    eventdelays.badgeins.secs.max: 180          # payload time can be up to 3 minutes (180 secs) behind event time
    # new customer events
    eventdelays.newcustomers.secs.max: 0        # payload time matches event time by default
    # sensor readings events
    eventdelays.sensorreadings.secs.max: 300    # payload time can be up to 5 minutes (300 secs) behind event time
    # online orders
    eventdelays.onlineorders.secs.max: 0        # payload time matches event time by default
    # out-of-stock events
    eventdelays.outofstocks.secs.max: 0         # payload time matches event time by default
    # return requests
    eventdelays.returnrequests.secs.max: 0      # payload time matches event time by default
    # product reviews
    eventdelays.productreviews.secs.max: 0      # payload time matches event time by default

    #
    # how many events should be duplicated
    #
    #   this is to simulate events from systems that offer
    #   at-least-once semantics
    #
    #   messages will occasionally be duplicated, according
    #   to the specified ratio
    #   between 0.0 and 1.0 : 0.0 means events will never be duplicated,
    #                         0.5 means approximately half of the events will be duplicated
    #                         1.0 means all events will be duplicated
    #
    # orders
    duplicates.orders.ratio: 0              # events not duplicated
    # cancellations
    duplicates.cancellations.ratio: 0       # events not duplicated
    # stock movements
    duplicates.stockmovements.ratio: 0.1    # duplicate roughly 10% of the events
    # door badge events
    duplicates.badgeins.ratio: 0            # events not duplicated
    # new customer events
    duplicates.newcustomers.ratio: 0        # events not duplicated
    # sensor reading events
    duplicates.sensorreadings.ratio: 0      # events not duplicated
    # online orders
    duplicates.onlineorders.ratio: 0        # events not duplicated
    # out-of-stock events
    duplicates.outofstocks.ratio: 0         # events not duplicated
    # return requests
    duplicates.returnrequests.ratio: 0      # events not duplicated
    # product reviews
    duplicates.productreviews.ratio: 0      # events not duplicated

    #
    # product names to use in events
    #
    #    these are combined into description strings, to allow for
    #    use of Event Processing string functions like regexp extracts
    #    e.g. "XL Stonewashed Bootcut Jeans"
    #
    #    any or all of these can be modified to theme the demo for a
    #    different industry
    products.sizes: XXS,XS,S,M,L,XL,XXL
    products.materials: Classic,Retro,Navy,Stonewashed,Acid-washed,Blue,Black,White,Khaki,Denim,Jeggings
    products.styles: Skinny,Bootcut,Flare,Ripped,Capri,Jogger,Crochet,High-waist,Low-rise,Straight-leg,Boyfriend,Mom,Wide-leg,Jorts,Cargo,Tall
    products.name: Jeans

    #
    # prices to use for individual products
    #
    #    prices will be randomly generated between the specified range
    prices.min: 14.99
    prices.max: 59.99
    # prices following large order cancellations will be reduced by a random value up to this limit
    prices.maxvariation: 9.99

    #
    # number of items to include in an order
    #
    # "normal" orders will be between small.min and large.max
    #   (i.e. between 1 and 15, inclusive)
    #
    # a "small" order is between 1 and 5 items (inclusive)
    orders.small.quantity.min: 1
    orders.small.quantity.max: 5
    # a "large" order is between 5 and 15 items (inclusive)
    orders.large.quantity.min: 5
    orders.large.quantity.max: 15

    #
    # controlling when orders should be cancelled
    #
    # what fraction of orders on the orders topic (ORDERS.NEW by default) should be cancelled (between 0.0 and 1.0)
    cancellations.ratio: 0.005
    # how long after an order should the cancellation happen
    cancellations.delay.ms.min: 300000   # 5 minutes
    cancellations.delay.ms.max: 7200000  # 2 hours
    # reason given for a cancellation
    cancellations.reasons: CHANGEDMIND,BADFIT,SHIPPINGDELAY,DELIVERYERROR,CHEAPERELSEWHERE

    #
    # suspicious orders
    #
    #  these are the events that are looked for in lab 5 and lab 6
    #
    # how quickly the large order will be cancelled
    suspicious.cancellations.delay.ms.min: 900000    # at least 15 minutes
    suspicious.cancellations.delay.ms.max: 1800000   # within 30 minutes
    # how many large orders will be made and cancelled
    suspicious.cancellations.max: 3   # up to three large orders
    # customer names to be used for suspicious orders will be selected from this
    #  list, to make it easier in lab 5 and 6 to see that you have created the
    #  flow correctly, and to make it easier in lab 4 to see that there are false
    #  positives in the simplified implementation
    suspicious.cancellations.customernames: Suspicious Bob,Naughty Nigel,Criminal Clive,Dastardly Derek

    #
    # new customers
    #
    #  these events are intended to represent new customers that
    #   have registered with the company
    #
    # how many new customers should quickly create their first order
    #  between 0.0 and 1.0 : 0.0 means new customers will still be created, but they will
    #                           never create orders,
    #                         1.0 means all new customers will create an order
    newcustomers.order.ratio: 0.22
    # if a new customer is going to quickly create their first order, how long
    #  should they wait before making their order
    newcustomers.order.delay.ms.min: 180000     # wait at least 3 minutes
    newcustomers.order.delay.ms.max: 1380000    # order within 23 minutes

    #
    # online orders
    #
    #  these events are intended to represent orders for several products,
    #   illustrating the use of complex objects and primitive arrays
    #
    # number of products to include in an online order: between 1 and 5 (inclusive)
    onlineorders.products.min: 1
    onlineorders.products.max: 5
    # number of emails for the customer who makes an online order: between 1 and 2 (inclusive)
    onlineorders.customer.emails.min: 1
    onlineorders.customer.emails.max: 2
    # number of phones in an address for an online order: between 0 and 2 (inclusive)
    #    NOTE: if there are 0 phone numbers, `null` is generated as the value of the `phones` property
    onlineorders.address.phones.min: 0
    onlineorders.address.phones.max: 2
    # how many online orders use the same address as shipping and billing address
    #  between 0.0 and 1.0 : 0.0 means no online order will use the same address as shipping and billing address
    #                        1.0 means all online orders will use the same address as shipping and billing address
    onlineorders.reuse.address.ratio: 0.55
    # how many online orders have at least one product that runs out of stock after the order has been placed
    #  between 0.0 and 1.0 : 0.0 means no online order has a product that runs out of stock
    #                        1.0 means all online orders have products that run out of stock
    onlineorders.outofstock.ratio: 0.22

    #
    # out-of-stocks
    #
    #  these events are intended to represent products that run out of stock in online orders
    #
    # how long after an out-of-stock should the restocking happen (in days)
    outofstocks.restocking.delay.days.min: 1  # 1 day
    outofstocks.restocking.delay.days.max: 5  # 5 days
    # how long after an online order should the out-of-stock happen (in milliseconds)
    outofstocks.delay.ms.min: 300000   # 5 minutes
    outofstocks.delay.ms.max: 7200000  # 2 hours

    #
    # return requests
    #
    #  these events are intended to represent return requests for several products,
    #   illustrating the use of complex objects and complex arrays
    #
    # number of products to include in a return request: between 1 and 4 (inclusive)
    returnrequests.products.min: 1
    returnrequests.products.max: 4
    # quantity for each product to include in a return request: between 1 and 3 (inclusive)
    returnrequests.product.quantity.min: 1
    returnrequests.product.quantity.max: 3
    # number of emails for the customer who issued a return request: between 1 and 2 (inclusive)
    returnrequests.customer.emails.min: 1
    returnrequests.customer.emails.max: 2
    # number of phones in an address for a return request: between 0 and 2 (inclusive)
    #    NOTE: if there are 0 phone numbers, `null` is generated as the value of the `phones` property
    returnrequests.address.phones.min: 0
    returnrequests.address.phones.max: 2
    # how many return requests use the same address as shipping and billing address
    #  between 0.0 and 1.0 : 0.0 means no return request will use the same address as shipping and billing address
    #                        1.0 means all return requests will use the same address as shipping and billing address
    returnrequests.reuse.address.ratio: 0.75
    # reason given for a product return
    returnrequests.reasons: CHANGEDMIND,BADFIT,SHIPPINGDELAY,DELIVERYERROR,CHEAPERELSEWHERE,OTHER
    # how many return requests have at least one product that has a review that is posted after the return request has been issued
    #  between 0.0 and 1.0 : 0.0 means no return request has some product that has a review that is posted
    #                        1.0 means all return requests have products that have a review that is posted
    returnrequests.review.ratio: 0.32
    # how many products have a size issue in a return request
    #  between 0.0 and 1.0 : 0.0 means no product has a size issue in a given return request
    #                        1.0 means all products have a size issue in a given return request
    returnrequests.product.with.size.issue.ratio: 0.22

    #
    # product reviews
    #
    #  these events are intended to represent reviews for products returned in return requests.
    #
    # number of products that have a size issue for product reviews
    productreviews.products.with.size.issue.count: 10
    # how many product reviews have a size issue for products that are supposed to have a size issue
    #  between 0.0 and 1.0 : 0.0 means no review with a size issue is generated for products that are supposed to have a size issue
    #                        1.0 means all generated reviews have a size issue for products that are supposed to have a size issue
    productreviews.review.with.size.issue.ratio: 0.75
    # how long after a return request should the product review happen (in milliseconds)
    productreviews.delay.ms.min: 300000   # 5 minutes
    productreviews.delay.ms.max: 3600000  # 1 hour

    #
    # locations that are referred to in generated events
    #
    locations.regions: NA,SA,EMEA,APAC,ANZ
    # countries in each region
    #  NA   : CA (Canada), US (United States), MX (Mexico)
    #  SA   : BR (Brazil), PY (Paraguay), UY (Uruguay)
    #  EMEA : BE (Belgium), FR (France), CH (Switzerland), GB (United Kingdom), DE (Germany), ES (Spain)
    #  APAC : ID (Indonesia), SG (Singapore), BN (Brunei), PH (Philippines)
    #  ANZ  : AU (Australia), NZ (New Zealand)
    locations.regions.countries: NA:CA,US,MX;SA:BR,PY,UY;EMEA:BE,FR,CH,GB,DE,ES;APAC:ID,SG,BN,PH;ANZ:AU,NZ
    locations.warehouses: North,South,West,East,Central
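The `eventdelays.*.secs.max` and `duplicates.*.ratio` comments in the config describe two effects: a record's metadata timestamp lags its payload timestamp by a random amount, and some records are produced twice. The Python sketch below models both under stated assumptions; it is not the connector's code. The `simulate` function and `Produced` type are hypothetical, delays are assumed to be uniform between 0 and the maximum, and a duplicate is assumed to be an identical copy of the original record.

```python
import random
from dataclasses import dataclass


@dataclass
class Produced:
    payload_ts: float  # timestamp inside the message payload (seconds)
    record_ts: float   # timestamp in the Kafka message metadata (seconds)


def simulate(n_events, interval_secs, max_delay_secs, duplicate_ratio, seed=0):
    """Hypothetical model of eventdelays.*.secs.max and duplicates.*.ratio."""
    rng = random.Random(seed)
    produced = []
    for i in range(n_events):
        payload_ts = i * interval_secs
        # assumption: produced a uniform random time (up to the max) after the payload time
        record_ts = payload_ts + rng.uniform(0, max_delay_secs)
        produced.append(Produced(payload_ts, record_ts))
        # assumption: with probability duplicate_ratio, the same record is produced again
        if rng.random() < duplicate_ratio:
            produced.append(Produced(payload_ts, record_ts))
    # records appear on the topic in the order they were produced
    produced.sort(key=lambda p: p.record_ts)
    return produced


# badge-in defaults: an event every 600 ms, delayed by up to 180 s, no duplicates
events = simulate(n_events=1000, interval_secs=0.6, max_delay_secs=180, duplicate_ratio=0.0)
out_of_order = sum(1 for a, b in zip(events, events[1:]) if b.payload_ts < a.payload_ts)
print(f"{out_of_order} adjacent pairs are out of payload-time order")
```

Because the maximum delay (180 seconds) is far longer than the interval between badge-in events (600 ms), a large share of neighbouring records end up out of payload-time order, which is the "slightly out of sequence" behavior the config comments describe. Setting the delay to 0 keeps payload order intact.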

To theme the demo around products from a different industry, adjust products.sizes, products.materials, products.styles and products.name to match your demo. The options don't actually need to be sizes, materials or styles; they just need to be lists that make sense when combined into a single string.
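The config comments say the product lists are combined into description strings such as "XL Stonewashed Bootcut Jeans". The Python sketch below shows one way that combination could work, assuming one random value is picked from each comma-separated list and followed by products.name. `make_description` and the coffee-shop values are hypothetical illustrations, not part of the connector.

```python
import random

# Default values from the config above (comma-separated strings)
config = {
    "products.sizes": "XXS,XS,S,M,L,XL,XXL",
    "products.materials": "Classic,Retro,Navy,Stonewashed,Acid-washed,Blue,Black,White,Khaki,Denim,Jeggings",
    "products.styles": "Skinny,Bootcut,Flare,Ripped,Capri,Jogger,Crochet,High-waist,Low-rise,Straight-leg,Boyfriend,Mom,Wide-leg,Jorts,Cargo,Tall",
    "products.name": "Jeans",
}


def make_description(cfg, rng):
    """Hypothetical: combine one size, material and style with the product name."""
    size = rng.choice(cfg["products.sizes"].split(","))
    material = rng.choice(cfg["products.materials"].split(","))
    style = rng.choice(cfg["products.styles"].split(","))
    return f"{size} {material} {style} {cfg['products.name']}"


rng = random.Random(42)
print(make_description(config, rng))

# Hypothetical re-theme: the lists only need to read sensibly when joined
coffee = {
    "products.sizes": "Small,Medium,Large",
    "products.materials": "Iced,Hot",
    "products.styles": "Vanilla,Caramel,Hazelnut",
    "products.name": "Latte",
}
print(make_description(coffee, rng))
```

Because each part is a single word, a description like this can be split back apart with a regular expression in Event Processing, which is the use case the config comments mention.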

You may also want to modify the prices.min and prices.max values to match the sort of products in your demo.
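The price settings can be sketched the same way. This Python example assumes prices are drawn uniformly between prices.min and prices.max and rounded to cents, and that the small order following a cancelled large order is priced at the large order's price minus a random amount up to prices.maxvariation. `random_price` and `reduced_price` are hypothetical helpers; the connector's actual rounding and distribution may differ.

```python
import random

# Defaults from the config above
PRICES_MIN = 14.99
PRICES_MAX = 59.99
PRICES_MAXVARIATION = 9.99


def random_price(rng, lo=PRICES_MIN, hi=PRICES_MAX):
    """Hypothetical: a price between prices.min and prices.max, rounded to cents."""
    return round(rng.uniform(lo, hi), 2)


def reduced_price(rng, original, max_variation=PRICES_MAXVARIATION):
    """Hypothetical: reduce a price by a random amount up to prices.maxvariation.

    The floor of 0.01 is a safeguard added for this sketch only.
    """
    return max(0.01, round(original - rng.uniform(0, max_variation), 2))


rng = random.Random(3)
large_order_price = random_price(rng)
small_order_price = reduced_price(rng, large_order_price)
print(large_order_price, small_order_price)
```

In this sketch, keeping prices.min above prices.maxvariation guarantees the reduced price stays positive without needing the 0.01 floor; the defaults (14.99 and 9.99) satisfy that.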

Build

mvn package
