Going Native
There are a handful of guiding principles behind WebDriver, and I have Michael Tamm to thank for reminding me that I should perhaps articulate them more clearly. Michael's a committer on the WebDriver project, and he had this to say about one of the features requested by a user: I like the philosophy "Do only one thing, but do it perfect".
That's essentially a reiteration of the UNIX tool philosophy, and that's been a guiding light when talking about which features to add. But what are the other principles?
One of the earliest goals was to offer a developer-focussed API. We strongly believe that developers are best placed to build the tools that are most suitable for their team, so giving them a strong building block that they can use makes sense. This has also influenced the design of the API. We're "Object Based". That is, rather than having a dictionary-style API (as exemplified by the existing Selenium API) we have a few coarse-grained objects. The API isn't so fine-grained that you get lost in the detail. Instead, we try to guide a developer into understanding what actions might make sense given the context they're in, without nannying them too much.
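To make the difference in API shape concrete, here's a rough sketch. Everything in it is a toy model invented for illustration: `DictionaryStyleBrowser`, `Driver`, `Element` and `new_page` are hypothetical names, not the real Selenium or WebDriver classes, and the "page" is just a Python dict.

```python
# Toy model only: these classes are hypothetical and are NOT the real
# Selenium or WebDriver API. The "page" is a plain dict of element state.


def new_page():
    """Build a fake page with a text box called 'q' and a button called 'go'."""
    return {"q": {"value": "", "clicks": 0}, "go": {"value": "", "clicks": 0}}


class DictionaryStyleBrowser:
    """Dictionary style: one big object, every action takes a locator string."""

    def __init__(self, page):
        self._page = page

    def type(self, locator, text):
        self._page[locator]["value"] += text

    def click(self, locator):
        self._page[locator]["clicks"] += 1

    def get_value(self, locator):
        return self._page[locator]["value"]


class Element:
    """Object based: an element only offers actions that make sense on an element."""

    def __init__(self, state):
        self._state = state

    def send_keys(self, text):
        self._state["value"] += text

    def click(self):
        self._state["clicks"] += 1

    @property
    def value(self):
        return self._state["value"]


class Driver:
    """Object based: the driver finds things, the elements do the interacting."""

    def __init__(self, page):
        self._page = page

    def find_element(self, name):
        return Element(self._page[name])


# Dictionary style: the caller has to know which of the many methods apply.
flat_page = new_page()
flat = DictionaryStyleBrowser(flat_page)
flat.type("q", "cheese")
flat.click("go")

# Object based: finding an element hands back an object whose methods are
# exactly the things you can do in that context.
object_page = new_page()
driver = Driver(object_page)
search_box = driver.find_element("q")
search_box.send_keys("cheese")
driver.find_element("go").click()
```

Both halves leave the fake page in the same state. The point is only that in the second style the object you're holding tells you what you can do next.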
Another principle is that of providing safety. As Jason Yip pointed out, there are two sorts of safety: "actual" and "perceived". The difference between these is that one sort (actual) describes whether or not the software does what it's meant to. This is measurable and isn't open to debate. What is open to debate is the perceived safety: do I believe that this software is doing what it should? This depends on the point of view of the person asking the question, but it's one of the major reasons why we try so hard to work with real browsers rather than simply wrapping the rendering engines.
Safety also ties into the next principle: correctness. We should model user behaviour as closely as possible, since we want to know how the app behaves when a user is attempting to interact with it. It was this principle that led me to resist adding a direct way to execute arbitrary chunks of JavaScript; too often I'd seen this capability used to work around limitations in the underlying framework. By not providing the mechanism to work around them, incorrect behaviour is reported as a bug, we fix it in WebDriver and then everyone has the fixed version.
Correctness is also the reason behind the most recent changes I've started feeding into the codebase, starting with the IE driver.
There are two ways in which events, such as "keydown" or "click", may be generated by a testing framework. The first of these is to synthesize the event within the HTML DOM, and fire things off manually. These are the "synthetic events". This approach has the advantage that it's relatively easy to do, but it requires a careful eye for detail, and it's easy to get caught out by the quirks of the various browsers (for example, which browsers generate a "textinput" event?). The major drawback is that it becomes hard to handle edge cases: what if the element is obscured by another, possibly transparent, one? What about the case where the element has 0 height or width (no major framework handles this, as far as I can see)? How about if the element is located somewhere the user will never be able to click, such as at -1000? What about if the default event is cancelled?
It's a nightmare.
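To give a flavour of why, here's a minimal sketch of the checks a synthetic-event framework would have to make for itself before firing a "click". It's a simplified model under loud assumptions: `Box` and `click_problem` are hypothetical names, elements are plain rectangles with a stacking order, the click lands on the centre of the element, and cancelled default events aren't modelled at all.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Hypothetical element geometry: top-left corner, size and stacking order."""
    x: int
    y: int
    width: int
    height: int
    z: int = 0

    def contains(self, px, py):
        """True if the point (px, py) falls inside this box."""
        return (self.x <= px < self.x + self.width
                and self.y <= py < self.y + self.height)


def click_problem(target, others, page_width, page_height):
    """Return why a real user couldn't click `target`, or None if they could.

    Assumes the page covers (0, 0) to (page_width, page_height) and that the
    user would click the centre of the element.
    """
    # An element with no area gives the user nothing to click on.
    if target.width == 0 or target.height == 0:
        return "zero size"

    cx = target.x + target.width // 2
    cy = target.y + target.height // 2

    # Somewhere the user can never reach, such as a large negative offset.
    if not (0 <= cx < page_width and 0 <= cy < page_height):
        return "off page"

    # Anything stacked above the target at the click point swallows the click,
    # even if it happens to be transparent.
    for other in others:
        if other.z > target.z and other.contains(cx, cy):
            return "obscured"

    return None


button = Box(x=10, y=10, width=100, height=20)
overlay = Box(x=0, y=0, width=800, height=600, z=1)

visible = click_problem(button, [], 800, 600)
covered = click_problem(button, [overlay], 800, 600)
squashed = click_problem(Box(10, 10, 0, 20), [], 800, 600)
banished = click_problem(Box(-1000, -1000, 100, 20), [], 800, 600)
```

Even this toy version has to get three separate checks right, and a real page adds scrolling, transforms, frames and per-browser quirks on top. A synthetic event fired at any of the problem elements would still "work", which is exactly the wrong answer.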
The alternative to synthesized events is to generate them at the OS level. I call these "native events" in a stunning lack of originality. These are harder to produce, harder to hook into the app, and also require an eye for detail. The good thing is, though, that once the work is done, the same approach works across multiple browsers. Better still, we automatically isolate ourselves from the mess that is handling all those edge cases, and figuring out which browsers fire what events in which circumstances.
One of the constraining factors is that we'd like to be able to run WebDriver tests in the background, without stealing focus, and in parallel. We've got most of this problem solved on Windows for both keyboard and mouse entry. We'll solve it on other platforms too (starting with X, I believe).
I'll post here as we fix things up, but stay tuned: this is going to get very interesting.
Posted in: /tech/webdriver
Solving the problem on X is trivial: run the tests in an X server serving a virtual display. Xvfb is an entirely virtual server and Xnest creates a virtual display in a new window (letting you see the results as tests are running). You can also use any VNC server in the same way.
You're right: X should be easy because it's designed for this sort of thing.