Simulation And Evaluation

How can experiments be more systematic and comparable?

What is the State of the Art?

  • Creation of "meaningful" distribution (e.g., Zipf distribution)?
  • Use data from search engines -- didn't work too well
    • It is possible, though. I will put a simple generator online once I am back home and have cleaned up the code; for now it is only ugly bash+awk. For the general idea you can take a look here: http://doi.acm.org/10.1145/1266894.1266939
  • Use PlanetLab for realistic processing and communication delays
  • We also need meaningful real-life topologies. While a plethora of topology generators is available, real-life data is always applicable. One possible starting point is http://www.opte.org/
  • -> What are others doing?
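
As an illustration of the "meaningful distribution" item above, here is a minimal sketch of a workload generator that draws publication topics from a Zipf-like distribution. It is not the bash+awk generator mentioned above; the function names, the default exponent, and the idea of identifying topics by popularity rank are all assumptions made for this example. A fixed seed is used so that two groups running the generator obtain the same workload.

```python
import random
from collections import Counter


def zipf_weights(num_topics, exponent=1.0):
    """Unnormalized Zipf weights: the topic of rank r gets weight 1 / r**exponent."""
    return [1.0 / (rank ** exponent) for rank in range(1, num_topics + 1)]


def generate_publications(num_publications, num_topics, exponent=1.0, seed=0):
    """Return a list of topic ids (0 = most popular) drawn from a Zipf-like distribution."""
    rng = random.Random(seed)  # fixed seed so that runs are repeatable and comparable
    weights = zipf_weights(num_topics, exponent)
    return rng.choices(range(num_topics), weights=weights, k=num_publications)


if __name__ == "__main__":
    # Hypothetical parameters, chosen only for illustration.
    pubs = generate_publications(10_000, num_topics=100, exponent=1.0, seed=42)
    for topic, count in Counter(pubs).most_common(5):
        print(topic, count)
```

Publishing the seed and parameters together with the generator would let others regenerate exactly the same publication stream.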

What Are We Striving For?

  • Have large data-sets for different scenarios
  • Find models for distribution of events, subscriptions, etc. and extrapolate
  • Define benchmarks based on the scenarios/models
  • -> Anything important missing?
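
To make the "find models and extrapolate" item concrete, the following sketch estimates the exponent of a Zipf-like model from observed data, for example the topics of recorded publications. It fits a straight line to the log-log rank/frequency curve by ordinary least squares. This is a crude estimator chosen for brevity, and the function name is hypothetical; a maximum-likelihood fit would usually be preferable for real data sets.

```python
import math
from collections import Counter


def estimate_zipf_exponent(observations):
    """Estimate s in frequency ~ C / rank**s by least squares on log-log data."""
    # Frequencies sorted in decreasing order; position in the list gives the rank.
    counts = sorted(Counter(observations).values(), reverse=True)
    if len(counts) < 2:
        raise ValueError("need at least two distinct items")
    xs = [math.log(rank) for rank in range(1, len(counts) + 1)]
    ys = [math.log(c) for c in counts]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return -cov / var  # the fitted slope is -s
```

The estimated exponent can then be fed back into a synthetic generator to produce larger workloads with the same skew.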

What can WE do?

  • Publish data used for evaluation on the website!
  • Publish workload generator/simulation used!
  • Use pub/sub and produce our own data :)
  • -> Has this already been done?

Where can we obtain realistic workloads and data sets?

  • For a start:
  • [http://www.nysedata.com NYSE Data] -> No subscription information
  • [http://investopedia.com Investopedia]
  • Use traces from peer-to-peer systems (e.g., Gnutella, traces from University of Washington)
  • How can they be converted to be used for our purposes?
  • Look at the benchmark suites currently developed there [UPDATE: Matteo]
  • Information filtering/retrieval benchmarks (Annika?)
  • Can we come up with a kind of game to gather data?
  • Workload scheduling trace
  • Intrusion detection systems may provide realistic data (e.g., Snort)
  • The Gryphon project has data used in papers
  • Ask TIBCO for data
  • Use information from applications that build on pub/sub
    • eBay as a potential source (scrape data and publish)
    • Use information from business processes
    • Can we instrument games (like multiplayer games where characters subscribe to events in their neighborhood)?
    • Can we use data, e.g., from [http://webcq.com WebCQ]?
    • Can we use information extracted from weblogs?
  • PlanetLab as source for processing/communication delays?
  • AOL Log data available here: http://www.gregsadetsky.com/aol-data/ If it goes off-line I can send you a CD (-- ZbigniewJerzak)
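
On the question of how traces can be converted for our purposes, one possible approach (an assumption, not something agreed on above) is to treat each user in a query trace as a subscriber and each query keyword as a topic. The sketch below assumes a hypothetical tab-separated input with the user id in the first column and the query text in the second; it is not the format of any specific trace listed above, and the function name and length threshold are illustrative only.

```python
from collections import defaultdict


def trace_to_subscriptions(lines, min_keyword_length=3):
    """Map each user id to the set of keywords (treated as topics) that user searched for."""
    subscriptions = defaultdict(set)
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) < 2:
            continue  # skip malformed lines without a tab separator
        user_id, query = parts[0], parts[1]
        for keyword in query.lower().split():
            # Drop very short words, which are mostly stop words.
            if len(keyword) >= min_keyword_length:
                subscriptions[user_id].add(keyword)
    return dict(subscriptions)
```

Whether keyword interest in a search trace resembles subscription interest in a real pub/sub deployment is an open question, so data derived this way should be labeled as such when published.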

What Data Do We Need?

  • How are subscriptions distributed, and what do they look like?
    • Predicates, attributes, values
  • What about composite events?
  • How are publications distributed?
  • Message rates?
    • Subscriptions, publications, and meta-data
  • Locality of interest?
  • What does the topology look like?
    • Broker degree, connectedness, communication delays, bandwidth
  • How are clients joining and leaving the system?
  • All of this depends on the application!
  • -> Anything missing?
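
To make the "predicates, attributes, values" item more tangible, here is a small sketch of how content-based subscriptions and publications could be represented in a shared data set, together with a selectivity measure (the fraction of publications a subscription matches). The class and function names, the operator set, and the restriction to conjunctions of predicates are assumptions made for this example, not a proposed standard.

```python
import operator
from dataclasses import dataclass

# Comparison operators supported in this sketch.
OPERATORS = {
    "<": operator.lt,
    "<=": operator.le,
    "=": operator.eq,
    ">=": operator.ge,
    ">": operator.gt,
}


@dataclass(frozen=True)
class Predicate:
    attribute: str
    op: str
    value: float

    def matches(self, publication):
        # A publication lacking the attribute never satisfies the predicate.
        if self.attribute not in publication:
            return False
        return OPERATORS[self.op](publication[self.attribute], self.value)


def subscription_matches(subscription, publication):
    """A subscription is modeled as a conjunction (list) of predicates."""
    return all(p.matches(publication) for p in subscription)


def selectivity(subscription, publications):
    """Fraction of publications matched by the subscription."""
    if not publications:
        return 0.0
    hits = sum(subscription_matches(subscription, pub) for pub in publications)
    return hits / len(publications)
```

For example, a subscription `[Predicate("price", ">", 100.0), Predicate("volume", ">=", 1000)]` matches the publication `{"price": 120.0, "volume": 5000}` but not `{"price": 90.0, "volume": 5000}`. Reporting the selectivity distribution of a subscription set is one way to describe a workload independently of the system under test.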

What benchmarks exist or should exist?

  • There is an EU project WASP with a work group on benchmarks called "Network-level benchmarks"
  • There is work on EP application scenarios (cf. Dagstuhl Seminar) [UPDATE: Arno]
  • "Application kernels" to modularize benchmark
  • There is an EU project on benchmarks for EP (rule-based?) systems [UPDATE: Arno]

How have other communities developed and adopted benchmarks?

  • There is a VLDB paper on benchmarks for SP
  • Benchmarks for JMS (Alex Buchmann?)
  • TPC / SPEC benchmarks
  • Peer-to-Peer
  • -> Are there any benchmarks driven by academia?

What are realistic models for workload generation?

What are good performance metrics?

I would like to draw the attention of the DEBS community to our paper titled "Constructing scalable overlay for pub-sub with many topics", published at PODC'07. The paper is available from: http://www.ifi.uio.no/~romanvi/Papers/scalable-overlay-theory.ps

This work is decidedly not about a new pub-sub system. Rather, it attempts to formally capture and theoretically analyze a fundamental problem of building and evaluating pub-sub overlays. Since many existing pub-sub systems have tackled this problem from a practical standpoint, perhaps this paper can be considered a (rather small) step towards a unifying theory of pub-sub. Specifically, we believe that our work provides the following potential benefits for the DEBS community:

  1. It includes evaluation criteria for pub-sub overlays and can be extended with further ones. This may be relevant for the effort of creating commonly used pub-sub benchmarks.
  2. It determines theoretical limits of what a practical pub-sub system designer should strive for and can hope to achieve. In particular, it includes a nearly optimal centralized algorithm for building an overlay, which can be used as a baseline for distributed implementations in practice.

The current version of the paper targets only topic-based pub-sub. Since it is a conference version limited in length, the list of references is far from comprehensive. In particular, we did not cite any major work on content-based pub-sub. We do intend to compile a comprehensive list of citations for the full version of the paper. This is an additional reason why feedback from the DEBS community would be so useful.

What is a solid evaluation methodology?