Layout Tests Results Dashboard  

2010-01-10 22:48:47 | Category: Chrome
This is a mini-design doc for the layout test dashboard at http://src.chromium.org/viewvc/chrome/trunk/src/webkit/tools/layout_tests/flakiness_dashboard.html. This dashboard helps identify flaky tests and compare test expectations across platforms, including upstream webkit.org's expectations.

The dashboard itself consists of flakiness_dashboard.html and dashboard/dashboard_base.js. In order to minimize maintenance cost, there is no server-side logic. Instead, the dashboard grabs JSONP files from each builder. Each builder creates and archives two JSON files that it updates with each run's results. There is one results.json file per builder that holds the results for all runs, and there is an expectations.json file that translates src/webkit/tools/layout_tests/test_expectations.txt into JSON that the dashboard can parse. The expectations.json files will be the same on all the bots, so the dashboard can grab the file from any of them.
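Because the files are JSONP (wrapped in ADD_RESULTS(...) or ADD_EXPECTATIONS(...), as described below), any script that wants to reuse the data outside the dashboard just needs to strip the wrapper before parsing. A minimal sketch in Python, assuming a locally downloaded copy of results.json; this is an illustration, not part of the dashboard:

import json

def strip_jsonp(raw):
    # Strip the ADD_RESULTS(...)/ADD_EXPECTATIONS(...) wrapper, leaving bare JSON.
    start = raw.find('(') + 1
    end = raw.rfind(')')
    return json.loads(raw[start:end])

# Assumes results.json has already been downloaded next to this script.
with open('results.json') as f:
    results = strip_jsonp(f.read())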

The JSON files are created by run_webkit_tests.py. It stores them at the top level of the layout test results directory (the same place as results.html). On each run, it reads in the results.json file from that location, adds the results from the current run and writes it out again. If the file does not exist locally (e.g. if the layout test results directory was deleted), it grabs it from the archive location on the bot. Once both JSON files have been written, they are copied to the archive location for the layout test results so that the dashboard can access them over HTTP.
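In simplified form, that merge step looks something like the sketch below. This is only an illustration: it ignores the JSONP wrapper and the nested per-test data, and the helper and field names are taken from the JSON format described later in this doc rather than from the real run_webkit_tests.py code.

import json

MAX_RUNS = 750  # matches the limit described under "JSON file size optimizations"

def merge_run(results_json_path, current_run):
    # current_run maps a top-level field name to this run's value, e.g.
    # {'buildNumbers': 7279, 'webkitRevision': 51615, 'secondsSinceEpoch': ...}.
    # For simplicity this sketch assumes the file holds bare JSON (no wrapper).
    with open(results_json_path) as f:
        results = json.load(f)
    for field, value in current_run.items():
        runs = results.setdefault(field, [])
        runs.insert(0, value)   # newest run goes first
        del runs[MAX_RUNS:]     # drop runs beyond the stored limit
    with open(results_json_path, 'w') as f:
        json.dump(results, f, separators=(',', ':'))  # no unnecessary whitespace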

Once all the JSON files have loaded, the dashboard processes them and generates the HTML for the page, most of which is then inserted via innerHTML. The results.json file also contains counts of failing tests that are used by the layout test progress dashboard. For the purposes of the flakiness dashboard, these counts can simply be excluded from the JSON; if they are included, the layout test progress dashboard works as well.

The dashboard stores all the relevant state in the URL's hash and uses hashchange events to detect changes. Thus it only works in browsers that support the hashchange event.

Changes needed in order to add new test types (e.g. upstream webkit.org or UI/browser tests)
  1. Output results.json and expectations.json per builder. See the two notes below marked "For upstream webkit.org".
  2. testType: the type of tests being run; defaults to 'layout_test_results'. We'll need to use a new testType and modify the global "builders" variable (defined in dashboard_base.js) to contain the builders for those tests.
  3. getPlatformAndBuildType: needs to be modified to understand any new mapping from builderName to platform/buildType.
  4. Consider excluding the bugs, modifiers, expectations, missing and extra columns from the dashboard since they don't make as much sense for other test suites (unless we add richer expectations to the other test suites).
  5. Consider excluding the ability to hide/show configurations that don't make sense (e.g. hide/show tests with correct/missing expectations makes no sense if we don't have an expectations file).
  6. Make the various URLs that assume Chromium buildbots switch on testType so they can point at the webkit.org bots.
See the webkit bug for progress on making the dashboard work for the webkit bots: https://bugs.webkit.org/show_bug.cgi?id=32954.

JSON file size optimizations

If we store information about every test on every run, the JSON files quickly grow to many megabytes. In order to keep the size manageable we do the following:
  • The number of runs stored is hard-coded at 750.
  • Any test that has passed for all runs and takes less than 1 second to run for all those runs is dropped from the JSON.
  • The data for failures/speed of each test result is run-length encoded (see the sketch after this list).
  • All unnecessary whitespace is stripped.
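To make the run-length encoding concrete, here is a small sketch of the idea (not the code in run_webkit_tests.py): consecutive identical per-run values collapse into [count, value] pairs, newest run first.

def run_length_encode(values):
    # Collapse consecutive identical values into [count, value] pairs.
    encoded = []
    for value in values:
        if encoded and encoded[-1][1] == value:
            encoded[-1][0] += 1
        else:
            encoded.append([1, value])
    return encoded

# A test whose image check failed on the latest 10 runs and passed before that:
assert run_length_encode(['I'] * 10 + ['P'] * 740) == [[10, 'I'], [740, 'P']]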

JSON file formats

The order of all arrays is from newest run to oldest run. In other words, the results from each run are prepended to the existing results.json file's contents.

results.json

ADD_RESULTS({
  "version": NUMBER,
  "wontfixCounts": FAILURE_COUNTS,
  "deferredCounts": FAILURE_COUNTS,
  "fixableCounts": FAILURE_COUNTS,
  "fixableCount": ARRAY_OF_NUMBERS,
  "allFixableCount": ARRAY_OF_NUMBERS,
  "buildNumbers": ARRAY_OF_NUMBERS,
  "webkitRevision": ARRAY_OF_NUMBERS,
  "chromeRevision": ARRAY_OF_NUMBERS,
  "secondsSinceEpoch": ARRAY_OF_NUMBERS,
  "tests": {
    "TEST": {
      "results": RUN_LENGTH_ENCODED_RESULTS,
      "times": RUN_LENGTH_ENCODED_RESULTS
    },
    ...
  }
});

TEST: Relative path to the test, e.g. "LayoutTests/fast/foo/bar.html". TODO(ojan): Exclude the LayoutTests prefix now that all layout tests live in that directory.
FAILURE_COUNTS: Array of objects mapping single-char failure type to number of failures of that type. The mapping from single-char failure type to the human-readable string used in expectations.json is held in EXPECTATIONS_MAP in dashboard_base.js.
ARRAY_OF_NUMBERS: An array of numbers.
RUN_LENGTH_ENCODED_RESULTS: An array of RUN_LENGTH_VALUEs.
RUN_LENGTH_VALUE: An array with exactly two items. The first item is the number of runs that this value represents. The second item is the value of the result for those runs. The values used for the results are in dashboard_base.js as EXPECTATIONS_MAP. If a test is listed in results.json, but was not run during this run for some reason (e.g. it's skipped), then the 'N' value is used for that run to indicate we have no data for that test for that run. There's a copy below for reference (a decoding sketch follows the field definitions):
var EXPECTATIONS_MAP = {
  'T': 'TIMEOUT',
  'C': 'CRASH',
  'P': 'PASS',
  'F': 'TEXT',
  'I': 'IMAGE',
  'Z': 'IMAGE+TEXT',
  // We used to glob a bunch of expectations into "O" as OTHER. Expectations
  // are more precise now though and it just means MISSING.
  'O': 'MISSING',
  'N': 'NO DATA',
  'X': 'SKIP'
};
version: The version of the JSON format that we're using. This is so we can make backwards-incompatible changes and know to upgrade the JSON data appropriately. Currently just hard-coded to 1.
wontfixCounts: Number of tests marked WONTFIX broken down by type of failure that actually occurred.
deferredCounts: Number of tests marked DEFER broken down by type of failure that actually occurred.
fixableCounts: Number of tests in test_expectations.txt not marked DEFER/WONTFIX broken down by type of failure that actually occurred.
fixableCount: Number of tests not marked DEFER/WONTFIX that failed.
allFixableCount: Number of tests not marked DEFER/WONTFIX (includes expected passes and failures).
buildNumbers: The build number for this run on the bot.
webkitRevision: The revision of webkit being used.
chromeRevision: The revision of chrome being used.
secondsSinceEpoch: Number of seconds since the epoch (time.time() in Python).
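As promised above, here is a sketch of decoding a RUN_LENGTH_ENCODED_RESULTS array back into one human-readable result per run. The Python copy of EXPECTATIONS_MAP below is just a mirror of the JavaScript map above, for illustration only.

# Python mirror of EXPECTATIONS_MAP in dashboard_base.js (illustration only).
EXPECTATIONS_MAP = {
    'T': 'TIMEOUT', 'C': 'CRASH', 'P': 'PASS', 'F': 'TEXT', 'I': 'IMAGE',
    'Z': 'IMAGE+TEXT', 'O': 'MISSING', 'N': 'NO DATA', 'X': 'SKIP',
}

def expand_results(run_length_encoded):
    # Expand [[count, value], ...] into one human-readable result per run.
    expanded = []
    for count, value in run_length_encoded:
        expanded.extend([EXPECTATIONS_MAP.get(value, value)] * count)
    return expanded

# The newest 10 runs had image failures; the 740 runs before that passed:
assert expand_results([[10, 'I'], [740, 'P']])[:2] == ['IMAGE', 'IMAGE']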

TODO(ojan): Track V8/Skia revision numbers as well so we can better track regressions from V8/Skia.

For upstream webkit.org results we can exclude chromeRevision and all the count/Counts values. The latter are not meaningful if failing tests are skipped. If that ever changes, then we can change upstream to include these numbers and we can track number of failures over time.

Sample results.json:
ADD_RESULTS({
  "Webkit Mac10.5": {
    "wontfixCounts": [
      {
        "C": 0,
        "F": 105,
        "I": 2,
        "O": 0,
        "T": 1,
        "X": 33,
        "Z": 11
      }
    ],
    "fixableCounts": [
      {
        "C": 2,
        "F": 136,
        "I": 725,
        "O": 1,
        "T": 15,
        "X": 8665,
        "Z": 25
      }
    ],
    "tests": {
      "LayoutTests/fast/frames/viewsource-attribute.html": {
        "results": [
          [
            751,
            "I"
          ]
        ],
        "times": [
          [
            741,
            0
          ],
          [
            10,
            2
          ]
        ]
      }
    },
    "secondsSinceEpoch": [
      1259810345
    ],
    "fixableCount": [
      904
    ],
    "allFixableCount": [
      9598
    ],
    "buildNumbers": [
      "7279"
    ],
    "webkitRevision": [
      "51615"
    ],
    "deferredCounts": [
      {
        "C": 0,
        "F": 56,
        "I": 0,
        "O": 0,
        "T": 0,
        "X": 7,
        "Z": 0
      }
    ],
    "chromeRevision": [
      "33659"
    ]
  },
  "version": 1
});

expectations.json
ADD_EXPECTATIONS({
"TEST": [ MODIFIERS_AND_EXPECTATIONS, MODIFIERS_AND_EXPECTATIONS, ... ],
"TEST": [ MODIFIERS_AND_EXPECTATIONS ]
});

TEST: Same as above in results.json.
MODIFIERS_AND_EXPECTATIONS: An object of the following form: { "modifiers": MODIFIERS, "expectations": EXPECTATIONS }
MODIFIERS: Space separated string of modifiers. This includes bug numbers (BUG1234), platforms (LINUX, MAC, etc), build types (debug, release), SKIP, DEFER, WONTFIX and SLOW.
EXPECTATIONS: The expected results for this test. TIMEOUT, TEXT, IMAGE, PASS, etc.
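
As a sketch of how a test_expectations.txt line could be turned into one of these entries, assuming the old 'MODIFIERS : TEST = EXPECTATIONS' line format (this is an illustration, not the parser run_webkit_tests.py actually uses):

def parse_expectations_line(line):
    # Turn 'MODIFIERS : TEST = EXPECTATIONS' into (test, entry), where entry is
    # the {"modifiers": ..., "expectations": ...} object described above.
    line = line.split('//')[0].strip()  # drop comments
    if not line:
        return None
    modifiers, rest = line.split(':', 1)
    test, expectations = rest.split('=', 1)
    return test.strip(), {
        'modifiers': modifiers.strip(),
        'expectations': expectations.strip(),
    }

# parse_expectations_line('BUG1234 MAC DEBUG : LayoutTests/fast/foo/bar.html = IMAGE PASS')
# returns ('LayoutTests/fast/foo/bar.html',
#          {'modifiers': 'BUG1234 MAC DEBUG', 'expectations': 'IMAGE PASS'})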

For upstream webkit.org, we should probably write a script that parses all the Skipped lists and creates the appropriate expectations.json file. Something like:

ADD_EXPECTATIONS({
  "foo/bar/baz.html", [ {"modifiers": "SKIP LINUX", "expectations": ""}, {"modifiers": "SKIP WIN", "expectations": ""} ],
  ...
});
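
One possible shape for such a script, assuming one Skipped file per platform with one test path per line and '#' comments (the file layout and names here are assumptions, not a description of the real webkit.org tree):

import json

def expectations_from_skipped_lists(skipped_files):
    # skipped_files maps a platform name (e.g. 'LINUX') to the path of its
    # Skipped file. Each listed test gets a SKIP entry for that platform.
    expectations = {}
    for platform, path in skipped_files.items():
        with open(path) as f:
            for line in f:
                test = line.split('#')[0].strip()  # drop comments and blank lines
                if not test:
                    continue
                expectations.setdefault(test, []).append(
                    {'modifiers': 'SKIP ' + platform, 'expectations': ''})
    return 'ADD_EXPECTATIONS(%s);' % json.dumps(expectations)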
