Bug 711794 (Closed): opened 13 years ago, closed 12 years ago

Firefox 9.0 Crash [@ js::Shape::finalize ]

Categories: Core :: JavaScript Engine
Type: defect
Priority: Not set
Severity: critical

Tracking: RESOLVED FIXED (milestone: mozilla12)
Tracking Status: firefox9 + ---

People: Reporter: marcia; Assignee: Unassigned

Details: Keywords: crash, regression; Whiteboard: STR in comment #35

Attachments: 4 files

Seen while looking at Mac Firefox 9.0b6 crash stats. This crash is at the top of the Mac crash stats for b6, and its volume seems much higher than in previous betas. Some reports appear to be dupes, but a variety of users are hitting this. It happens on Linux as well. https://crash-stats.mozilla.com/report/list?signature=js::Shape::finalize. I suspect an add-on may be causing it, but I will have to look at manual correlations since there is not enough volume.

Comments mention Hotmail and viewing a slideshow on kodak.com

https://crash-stats.mozilla.com/report/index/44be3d6a-01af-4ecb-95d5-5c9e12111217

Frame 	Module 	Signature 	Source
0 	XUL 	js::Shape::finalize 	js/src/jspropertytree.cpp:197
1 	XUL 	js::gc::FinalizeArenas 	js/src/jsgc.cpp:301
2 	XUL 	GCCycle 	js/src/jsgc.cpp:1288
3 	XUL 	js_GC 	js/src/jsgc.cpp:2735
4 	XUL 	nsXPConnect::Collect 	js/src/xpconnect/src/nsXPConnect.cpp:415
5 	XUL 	nsXPConnect::GarbageCollect 	js/src/xpconnect/src/nsXPConnect.cpp:423
6 	XUL 	nsTimerImpl::Fire 	xpcom/threads/nsTimerImpl.cpp:424
7 	XUL 	nsTimerEvent::Run 	xpcom/threads/nsTimerImpl.cpp:520
8 	XUL 	nsThread::ProcessNextEvent 	xpcom/threads/nsThread.cpp:631
9 	XUL 	NS_ProcessPendingEvents_P 	obj-firefox/x86_64/xpcom/build/nsThreadUtils.cpp:195
10 	XUL 	nsBaseAppShell::NativeEventCallback 	widget/src/xpwidgets/nsBaseAppShell.cpp:130
11 	XUL 	nsAppShell::ProcessGeckoEvents 	widget/src/cocoa/nsAppShell.mm:424
12 	CoreFoundation 	CoreFoundation@0x12b50 	
13 	CoreFoundation 	CoreFoundation@0x123bc 	
14 	CoreFoundation 	CoreFoundation@0x391a8 	
15 	libsystem_c.dylib 	libsystem_c.dylib@0x4d15f 	
16 	libsystem_c.dylib 	libsystem_c.dylib@0xa0788 	
17 	CoreFoundation 	CoreFoundation@0x9213 	
18 	CoreFoundation 	CoreFoundation@0x3064 	
19 	libobjc.A.dylib 	objc_memmove_collectable 	
20 	CoreFoundation 	CoreFoundation@0xf72f 	
21 	AppKit 	AppKit@0x3e733 	
22 	libsystem_c.dylib 	libsystem_c.dylib@0x6bffe 	
23 	libsystem_c.dylib 	libsystem_c.dylib@0x6b061 	
24 	libsystem_c.dylib 	libsystem_c.dylib@0x6b02e 	
25 	libsystem_c.dylib 	libsystem_c.dylib@0x6703a 	
26 	libsystem_c.dylib 	libsystem_c.dylib@0x4d46f 	
27 	libsystem_c.dylib 	libsystem_c.dylib@0x4d6aa 	
28 	libnspr4.dylib 	PR_Unlock 	nsprpub/pr/src/pthreads/ptsynch.c:237
29 	XUL 	nsDOMEvent::Release 	nsISupportsImpl.h:210
30 	XUL 	nsXULTooltipListener::MouseMove 	nsCOMPtr.h:515
31 		@0x1033af547 	
32 	XUL 	nsEventListenerManager::HandleEventInternal 	dom/base/nsPIDOMWindow.h:693
33 	XUL 	nsEventTargetChainItem::HandleEventTargetChain 	nsCOMPtr.h:515
34 		@0x12a00041f 	
35 	libsystem_c.dylib 	libsystem_c.dylib@0xa0788 	
36 	XUL 	nsPresContext::Release 	
37 	libsystem_c.dylib 	libsystem_c.dylib@0x4d46f 	
38 	CoreFoundation 	CoreFoundation@0x1f55c 	
39 	CarbonCore 	CarbonCore@0x189d3 	
40 	CoreFoundation 	CoreFoundation@0xfc52 	
41 	CoreFoundation 	CoreFoundation@0x170edf 	
42 	CoreFoundation 	CoreFoundation@0x27ed 	
43 	XUL 	PresShell::HandleEventInternal 	
44 		@0x7fff75a514ff 	
45 	libsystem_c.dylib 	libsystem_c.dylib@0x4d46f 	
46 	CoreFoundation 	CoreFoundation@0x38e10 	
47 	CoreFoundation 	CoreFoundation@0xfc52 	
48 	HIToolbox 	HIToolbox@0x189a9 	
49 	HIToolbox 	HIToolbox@0x18948 	
50 	HIToolbox 	HIToolbox@0x188da 	
51 	libsystem_c.dylib 	libsystem_c.dylib@0x3e2b4 	
52 	libsystem_c.dylib 	libsystem_c.dylib@0x3e1ef 	
53 	HIToolbox 	HIToolbox@0x22b3 	
54 	CoreFoundation 	CoreFoundation@0x503e2 	
55 	HIToolbox 	HIToolbox@0x1821a 	
56 	libsystem_c.dylib 	libsystem_c.dylib@0x4d15f 	
57 	CoreFoundation 	CoreFoundation@0x63ed7 	
58 	CoreFoundation 	CoreFoundation@0x63d7f 	
59 	CoreFoundation 	CoreFoundation@0x38ae5 	
60 	HIToolbox 	HIToolbox@0x23d2 	
61 	HIToolbox 	HIToolbox@0x963c 	
62 	HIToolbox 	HIToolbox@0x94c9 	
63 	AppKit 	AppKit@0x93f0 	
64 	libsystem_c.dylib 	libsystem_c.dylib@0xa0788 	
65 	libobjc.A.dylib 	object_dispose 	
66 	CoreFoundation 	CoreFoundation@0x316e5 	
67 	AppKit 	AppKit@0x73b5b 	
68 	libobjc.A.dylib 		
69 	libsystem_blocks.dylib 	_Block_object_dispose 	
70 	AppKit 	AppKit@0x8bc0 	
71 	AppKit 	AppKit@0x2c1f5c 	
72 	AppKit 	AppKit@0x2c1cd1 	
73 	AppKit 	AppKit@0xae8e 	
74 	AppKit 	AppKit@0x6fb79 	
75 	CoreGraphics 	CoreGraphics@0x6d208 	
76 	CoreGraphics 	CoreGraphics@0x14b008 	
77 	AppKit 	AppKit@0x955705 	
78 	AppKit 	AppKit@0x103e2d 	
79 	AppKit 	AppKit@0x6e02f 	
80 	AppKit 	AppKit@0x88b717 	
81 	AppKit 	AppKit@0x8cf4 	
82 	AppKit 	AppKit@0x745be 	
83 	AppKit 	AppKit@0x8a3a77 	
84 	libobjc.A.dylib 	_cache_fill 	
85 	Foundation 	Foundation@0x3ac0b 	
86 	Foundation 	Foundation@0x3abf0 	
87 	libobjc.A.dylib 	objc::DenseMap<objc_object*, unsigned long, true, objc::DenseMapInfo<objc_object*>, objc::DenseMapInfo<unsigned long> >::find 	
88 	libobjc.A.dylib 	objc_autoreleasePoolPush 	
89 	CoreFoundation 	CoreFoundation@0x30ce6 	
90 	AppKit 	AppKit@0x562c 	
91 	XUL 	nsAppShell::Run 	widget/src/cocoa/nsAppShell.mm:771
92 	XUL 	nsAppStartup::Run 	toolkit/components/startup/nsAppStartup.cpp:228
93 	XUL 	XRE_main 	toolkit/xre/nsAppRunner.cpp:3557
94 	firefox 	main 	browser/app/nsBrowserApp.cpp:198
95 	firefox 	firefox@0xac3
Bug 673925 was filed in an earlier time frame for a Windows signature that is pretty similar.
It occurs mainly with the 64-bit architecture.

There's no clear correlations with a specific add-on.

According to some comments, it is not caused by new users who weren't already running 9.0b5: "it happens at startup after the update".

The Beta regression range is:
http://hg.mozilla.org/releases/mozilla-beta/pushloghtml?fromchange=00f41dc4fcc5&tochange=0e3132ba2530
Hardware: x86 → x86_64
FF 9 Beta correlations based on 210 crashes: nothing stands out as highly correlated among add-ons.

     21% (44/210) vs.   8% (64/836) {841468a1-d7f4-4bd3-84e6-bb0f13a06c64}
     24% (50/210) vs.  12% (98/836) personas@christopher.beard (Personas, https://addons.mozilla.org/addon/10900)
     17% (36/210) vs.   6% (54/836) canitbecheaper@trafficbroker.co.uk (InvisibleHand, https://addons.mozilla.org/addon/11377)
     12% (26/210) vs.   5% (41/836) {3e0e7d2a-070f-4a47-b019-91fe5385ba79} (AddThis, https://addons.mozilla.org/addon/4076)
     16% (34/210) vs.   9% (77/836) {D4DD63FA-01E4-46a7-B6B1-EDAB7D6AD389} (Download Statusbar, https://addons.mozilla.org/addon/26)
     16% (33/210) vs.   9% (75/836) foxmarks@kei.com (Xmarks (formerly Foxmarks), https://addons.mozilla.org/addon/2410)
      9% (18/210) vs.   3% (22/836) FFToolbar@upromise
     11% (23/210) vs.   5% (42/836) {51ef49d2-624b-4194-8b97-1c468e9b0efe}
      9% (19/210) vs.   3% (27/836) {635abd67-4fe9-1b23-4f01-e679fa7484c1} (Yahoo! Toolbar, https://addons.mozilla.org/addon/2032)
      8% (16/210) vs.   2% (16/836) {18b8f08d-62fe-4dfc-ad6c-9ce46515d5ec}
One commonality, though, is that toolbars seem to be involved. I will dig a bit further.
It might be a regression from bug 702572 or bug 685321 (two JS bugs), or from bug 708499 (an XPConnect bug).
87% of crashes happen less than 1 minute after startup. Also, I am getting different total crash numbers depending on how I query, but again, we have a Mac crash in the top 20 overall for Beta 6.
Keywords: regression
Keywords: topcrash
The two crash signatures represent 45% of all crashes in 9.0b6 on Mac OS X.
According to some comments, it also happens in safe mode.
Crash Signature: [@ js::Shape::finalize ] → [@ js::Shape::finalize] [@ js::Shape::removeChild]
Including a few more people to see if this regression rings a bell.
This is unlikely to be bug 708499.
These reports seem to have app notes related to GL:

Renderers: 0x22600,0x20400GL Context? GL Context+
GL Layers? GL Layers+

Is that expected, or possibly related to 709369?
Here are some URLs from the js::Shape::finalize signature.

     8 http://xfinity.comcast.net/
      8 http://www.nectar.com/
      8 http://www.facebook.com/home.php?ref=hp
      8 about:blank
      7 jar:file:///Applications/Firefox.app/Contents/MacOS/omni.jar!/chrome/browser/content/browser/aboutHome.xhtml
      5 http://www.facebook.com/
      5 http://my.earthlink.net/
      4 jar:file:///Applications/Internet%20Browsers/Firefox.app/Contents/MacOS/omni.jar!/chrome/browser/content/browser/aboutSessionRestore.xhtml
      4 http://team-twilight.com/20111216/great-opportunity-for-breaking-dawn-part-1-fans/
      3 http://www.msn.co.uk/
      3 http://www.facebook.com/?ref=hp
      2 jar:file:///Applications/Firefox.app/Contents/MacOS/omni.jar!/chrome/toolkit/content/mozapps/extensions/extensions.xul
      2 http://www.yahoo.co.jp/
      2 http://www.symbaloo.com/
      2 http://www.seznam.cz/
      2 http://www.facebook.com/login.php
      2 http://www.coolchaser.com/user/toolbar_install
      2 http://www.att.net/s/context.dll?id=215001&type=clickthru&name=CommNewsletter.clickurl.att.Email_UV_DEC_Watch_now&redirecturl=http://uverse.com
      2 http://uk.search.yahoo.com/search?ourmark=1&ei=utf-8&fr=chr-nectar&slv8-&type=61465&p=firefox
      2 http://search.conduit.com/?ctid=CT206413&SearchSource=13
      2 http://postageestimator.ebay.co.uk/index.php?postage_submit=index&packagetype=largeletter&weight=130&measurement=g&destination=uk&postage_submit.x=69&postage_submit.y=9
      2 http://ph.yearinreview.yahoo.com/2011
      2 http://apps.facebook.com/vikingclan/
      1 wyciwyg://2/http://www.youtube.com/watch?v=B43JyTSknl0&feature=related
      1 wyciwyg://21/https://fb-tc-3.farmville.com/flash.php?ref=bookmarks&isOuterIframe=1&fv_canvas=1&
      1 wyciwyg://18/http://wonderwall.msn.com/music/gossip-cleavage-alert-miley-swells-with-pride-while-out-with-liam-16861.gallery
      1 jar:file:///Previous%20Systems.localized/2011-01-30_0148/Applications/Firefox.app/Contents/MacOS/omni.jar!/chrome/browser/content/browser/aboutSessionRestore.xhtml
      1 jar:file:///Applications/1.%20Daily%20Application/Firefox.app/Contents/MacOS/omni.jar!/chrome/browser/content/browser/aboutHome.xhtml
      1 http://xml.freecause.com/?action=eula&toolid=62781
      1 http://www.youtube.com/watch?v=FdAIMCKK_-w
      1 http://www.youtube.com/
      1 http://www.yahoo.co.uk/
      1 http://www.ugc.fr/complex.do?comeFrom=allMoviesLink&complexId=GOBEL
      1 http://www.targetingnow.com/delivery
      1 http://www.manhunt.net/profile/LUXORS
      1 http://www.magyarvagyok.com/videok/28-Film/7633-Szent-Peter-esernyoje-Magyar-film-05-resz.html
      1 http://www.macworld.com/
      1 http://www.livingsocial.com/cities/858/deals/165497-facial-with-microdermabrasion
      1 http://www.indowebster.web.id/showthread.php?t=132288&page=1
      1 http://www.ikea.com/hu/hu/store/budapest/store180_offers
      1 http://www.google.fr/
      1 http://www.google.co.uk/ig
      1 http://www.google.com/ig?hl=en
      1 http://www.google.com/firefox?client=firefox-a&rls=org.mozilla:en-US:official
      1 http://www.facebook.com/?ref=logo
      1 http://www.denverpost.com/
      1 http://www.comcast.net/xfinity/
      1 http://www.cebupacificair.com/
      1 http://www.breeders.net/popup2.htm?2/22/229/229823_1321077652.JPG
      1 http://www.agdb.de/beratung-service/literaturtipps-und-links/literaturtipps/
      1 http://warnet.ws/news/49877
      1 http://vkontakte.ru/photo113548919_274574287?all=1
      1 http://vk.com/al_friends.php?__query=friends&al=-1&al_id=62219514&_rndVer=85113
      1 http://videos-famosas-gracioso.blogspot.com/
      1 http://uk.msn.com/
      1 http://torrentline.net/games/shooter/page/2/
      1 http://tablica.pl/oferta/tarpan-IDuSEh.html
      1 https://www.mypoints.com/
      1 https://www.facebook.com/dialog/permissions.request
      1 http://support.slingbox.com/get/KB-2000075.html
      1 http://start.gametop.com/?utm_source=Motoracing&utm_medium=start
      1 https://s-static.ak.fbcdn.net/connect/xd_proxy.php?version=3#cb=f177aaeb6fe0c2a&origin=http%3A%2F%2Ffarm-ar-fb-3.socialgamenet.com%2Ff22f55f9758cd4&relation=top.frames%5Biframe_canvas%5D&transport=postmessage&state=closed
      1 http://soft.sibnet.ru/subcat/?id=31&pg=6
      1 http://smaxxi.chatovod.ru/widget/
      1 https://mail.google.com/mail/?tab=wm
      1 http://salmanonline.blogfa.com/post-7.aspx
      1 https://accounts.google.com/ServiceLoginAuth
      1 http://reviews.cnet.com/consoles/sony-playstation-3-slim/4505-10109_7-34175148.html?tag=mncol
      1 http://refdesk.com/
      1 http://plaza-pskov.ru/catalog/7799/3/25911/
      1 http://pics.ebaystatic.com/aw/pics/tbx/s.gif
      1 http://oasisactive.cachefly.net/adserver/adsensegenerated/Colombia/adsense_Colombia_Male_25x34_300x250.htm?0.907374640799888
      1 http://gmail.com/
      1 http://gamersunite.coolchaser.com/toolbar/uninstall
      1 http://gamersunite.coolchaser.com/toolbar/install
      1 http://forum.nov.ru/index.php?showtopic=332832
      1 http://facebook.com/
      1 http://es.yahoo.com/?fr=fptb-msgr
      1 http://en-us.start3.mozilla.com/firefox?client=firefox-a&rls=org.mozilla:en-US:official
      1 http://edition.cnn.com/
      1 http://edge.jeetyetmedia.com/728x90_bvd_youtube.htm?clientId=8c5f1473-cbc4-4696-8e97-71303bfd5f0e
      1 http://dating.uk.msn.com/edito/index.php?mtcmk=080519&name=5/115/2474-things-that-women-don-t-want-men-to-wear-on-a-date.html
      1 http://block.opendns.com/main?url=8888881572669015688078166873668516787084847079727083156980&ablock=&ref=http%3A%2F%2Fwww.gay.com%2F&w=246&h=603
      1 http://att.my.yahoo.com/
I just noticed in the correlations that there is a high correlation with 2-core 64-bit machines.

79% (119/150) vs.  50% (362/723) amd64 with 2 cores
(In reply to Alex Keybl [:akeybl] from comment #10)
> These reports seem to have app notes related to GL:
> 
> Renderers: 0x22600,0x20400GL Context? GL Context+
> GL Layers? GL Layers+
> 
> Is that expected, or possibly related to 709369?

This doesn't seem related at all to 709369. The App Notes you're quoting,

    Renderers: 0x22600,0x20400GL Context? GL Context+
    GL Layers? GL Layers+

is standard on Mac: it means that this user is using OpenGL-accelerated layers,
which are indeed on by default on Mac OS X 10.6+. But this isn't the same thing
as WebGL, and actually, this user hasn't used WebGL at all: if they had started initializing
WebGL, you would see "WebGL?" in the App Notes, followed by WebGL+ or WebGL-
depending on whether WebGL initialization was successful (or nothing at all in case the
crash occurred during WebGL initialization).

In any case, the above app notes show that this particular crash happened to someone who
didn't even start initializing a WebGL context. Bug 709369 only affects WebGL.
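As a rough illustration of the App Notes convention described above, the sketch below classifies a notes string by looking for the "WebGL?" marker and, after it, "WebGL+" or "WebGL-". ClassifyWebGL and WebGLState are hypothetical names (not crash-stats or Gecko APIs), and the sketch assumes the markers appear literally as quoted in this comment; real App Notes may contain other entries.

```cpp
#include <cassert>
#include <string>

// Hypothetical classification of the WebGL entries in a crash report's App Notes.
enum class WebGLState { NeverStarted, Succeeded, Failed, CrashedDuringInit };

WebGLState ClassifyWebGL(const std::string& appNotes) {
    // "WebGL?" is only written once WebGL initialization has started.
    const std::string::size_type started = appNotes.find("WebGL?");
    if (started == std::string::npos) {
        return WebGLState::NeverStarted;  // e.g. only GL layers were in use
    }
    // Look for the outcome marker only after the "WebGL?" marker.
    const std::string rest = appNotes.substr(started);
    if (rest.find("WebGL+") != std::string::npos) {
        return WebGLState::Succeeded;
    }
    if (rest.find("WebGL-") != std::string::npos) {
        return WebGLState::Failed;
    }
    // Started but no outcome recorded: the crash hit during initialization.
    return WebGLState::CrashedDuringInit;
}
```

With the App Notes quoted from comment #10, this helper returns WebGLState::NeverStarted, which matches the conclusion that this user never began initializing WebGL.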
(In reply to Marcia Knous [:marcia] from comment #12)
> I just noticed in the correlations that there is a high correlation to 2
> core 64 machines.
> 
> 79% (119/150) vs.  50% (362/723) amd64 with 2 cores

And the x86 case is the reverse, which makes this look a lot like the regression is 64bit-only (the signature is seen in very low volume in 8.0.1 as well, so the interesting part is what regressed from that state).
The number of these crashes started growing explosively on OS X on 2011/12/16.
> The number of these crashes started growing explosively on OS X on 2011/12/16.

I've no idea why.  As Marcia said, the vast majority of these are in build 20111212185108, which I assume is FF 9 B6.  But maybe B6 only started reaching large numbers of people on 2011/12/16.

Interestingly, the number of crashes peaked on 2011/12/17, and has come down a bit since then.

All very puzzling.
> The number of these crashes started growing explosively on OS X on 2011/12/16.

Could this be the date when B5 started being auto-updated to B6?
(In reply to Steven Michaud from comment #17)
> > The number of these crashes started growing explosively on OS X on 2011/12/16.
> 
> Could this be the date when B5 started being auto-updated to B6?

Yeah, we believe that this is a regression in beta 6, possibly related to JS changes. Dave rolled https://hg.mozilla.org/try/rev/64405f8346fe and Cheng is currently reaching out to affected users to try that build.
I suspect the bug that causes crashes at js::Shape::finalize has existed for a long time (and is cross-platform, since it also happens on Windows and Linux).  But very recently something happened to make it occur much more frequently on OS X.  Yes, it's probably something that's new in B6.  But could it be something in the auto-update process itself?
> But could it be something in the auto-update process itself?

If so, that might explain the odd reduction in crash frequency since 2011-12-17.
Marcia, I've given myself a clean profile and am going to run B5 overnight (hoping that an autoupdate will take place during that time).  Could you try doing the same test on some spare machine?
(In reply to Steven Michaud from comment #21)
> Marcia, I've given myself a clean profile and am going to run B5 overnight
> (hoping that an autoupdate will take place during that time).  Could you try
> doing the same test on some spare machine?

You should be able to trigger an auto-update by setting:
    app.update.interval = 60
    app.update.idletime = 0

After restarting Firefox, it should auto update after a minute.
(In reply to comment #22)

Thanks!

I found that I also had to set app.update.download.backgroundinterval to '0'.  Then I'd get prompted to install the update to B6 2-3 minutes after having restarted B5.

Before each test I gave myself a clean profile.  I tested on OS X 10.6.8.

I've now done the auto-update from B5 to B6 five or six times, and I didn't have trouble any of those times.

So it's probably not something in the auto-update process that triggers this bug.
(In reply to Steven Michaud from comment #20)
> > But could it be something in the auto-update process itself?
> If so, that might explain the odd reduction in crash frequency since
> 2011-12-17.
Here are some comments:
"Just got a new update. Looks like it doesn't work here."
"All I'm doing is starting up Firefox like I do every morning."
"New beta autoupdated and now will not start without crashing."
"I can no longer run firefox 9. It crashes on startup. I'm downloading firefox 8."
About 31,000 Mac users succeeded in updating to 9.0b6, so the auto-update is unlikely to be the cause.

(In reply to Robert Kaiser (:kairo@mozilla.com) from comment #14)
> (In reply to Marcia Knous [:marcia] from comment #12)
> > I just noticed in the correlations that there is a high correlation to 2
> > core 64 machines.
> > 79% (119/150) vs.  50% (362/723) amd64 with 2 cores 
> And the x86 case is the reverse, which makes this look a lot like the
> regression is 64bit-only (the signature is seen in very low volume in 8.0.1
> as well, so the interesting part is what regressed from that state).
There are two crash signatures:
* js::Shape::finalize (85%), which mainly affects x86_64 (99%)
* js::Shape::removeChild (15%), which mainly affects x86 (97%)
So "only" 84.5% of crashes are on x86_64. That's not high enough to conclude that this is a 64-bit-only regression.
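As a quick sanity check on the combined figure, weight each signature's 64-bit share by its share of all crashes. This uses the rounded percentages above and assumes the 3% of js::Shape::removeChild crashes that are not x86 are 64-bit:

```latex
\[
0.85 \times 0.99 + 0.15 \times 0.03 = 0.8415 + 0.0045 = 0.846 \approx 84.6\%
\]
```

That agrees with the quoted 84.5% up to rounding of the per-signature percentages.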

The js::Shape::finalize signature is correlated to fonts:
  js::Shape::finalize|EXC_BAD_ACCESS / 0x0000000d (150 crashes)
    100% (150/150) vs.  70% (507/723) libtidy.A.dylib
    100% (150/150) vs.  73% (525/723) ColorSyncDeprecated.dylib
    100% (150/150) vs.  75% (543/723) libFontRegistry.dylib
    100% (150/150) vs.  75% (543/723) libCGXType.A.dylib
Maybe a regression from bug 693143.

It happens only on Mac OS X, so maybe a regression from bug 705931.
Hardware: x86_64 → All
Do we know if these are mismatched / frankeninstalls? Do all the dll/exe/etc versions match what we expect?
(In reply to Scoobidiver from comment #24)
> It happens only on Mac OS X, so maybe a regression from bug 705931.

Steven - is it possible that a regression caused by 705931 could also affect Linux?

The interesting thing, however, is that we only have one report from FF10 of this same crash on the 17th, when I would have expected to see a crash as early as the 9th (when 705931 landed)...

If we think that bug 705931 could somehow be related, please kick off a try build of beta with 705931 backed out. We're in contact with a couple of affected users and can hand the build off to them to see if it fixes their startup crash.
> It happens only on Mac OS X, so maybe a regression from bug 705931.

Very unlikely.  This patch only affects plugin behavior, and then only that of plugins that request plugin caching -- only the Java plugin, so far as I know.  Furthermore it just changes the size of the plugin cache.
Yes, the patch for bug 705931 landed on all platforms.  But these crashes, while they occur on all platforms, occur in disproportionately large numbers on OS X.  So the trigger we're looking for is Mac-only, I think.

And in any case the trigger isn't the cause.  The actual cause is presumably a bug in GC code.
(In reply to Christian Legnitto [:LegNeato] from comment #25)
> Do we know if these are mismatched / frankeninstalls? Do all the dll/exe/etc
> versions match what we expect?

AFAIK, we don't have library versions on Mac (or Linux for that matter), so we can't determine that easily.

(In reply to Alex Keybl [:akeybl] from comment #26)
> Steven - is it possible that a regression caused by 705931 could also affect
> Linux?

I'm not convinced that the regression affects Linux at all - we know that we had a couple of crashes with those signatures already as a baseline, the regression seems to be on top of that (but increasing it from very low level to quite significant).
(In reply to comment #29)

> (In reply to Christian Legnitto [:LegNeato] from comment #25)
>> Do we know if these are mismatched / frankeninstalls? Do all the
>> dll/exe/etc versions match what we expect?
>
> AFAIK, we don't have library versions on Mac (or Linux for that
> matter), so we can't determine that easily.

Dylibs and frameworks on the Mac actually do have version numbers, but
we don't appear to use them (for example XUL's version is, and
probably always has been, 1.0.0).

And in any case the number of frankeninstalls wouldn't have increased
suddenly on 2011/12/16.  So I don't think this is the trigger we're
looking for.
The only method I see is bisection: generate two try builds, each containing half of the patches in the regression range, and have users who crash at startup test them. Repeat the same steps with the crashing try build, and so on, until you find the culprit.
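The halving procedure proposed here is a bisection (binary search) over the regression range. Below is a minimal sketch of the ordered variant, which tests builds containing the first n changesets of the push log rather than arbitrary halves. crashesWith is a hypothetical stand-in for "make a try build with those patches and have an affected user run it", and the sketch assumes that every build containing the culprit crashes while every build without it does not.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>

// Returns the zero-based index of the first changeset that makes builds crash.
// crashesWith(n): hypothetical test of a build with the first n changesets applied.
// Preconditions: rangeSize >= 1, crashesWith(0) is false, crashesWith(rangeSize) is true.
std::size_t FindFirstBadChangeset(
    std::size_t rangeSize,
    const std::function<bool(std::size_t)>& crashesWith) {
    std::size_t good = 0;         // a build with this many patches does not crash
    std::size_t bad = rangeSize;  // a build with this many patches crashes
    while (bad - good > 1) {
        const std::size_t mid = good + (bad - good) / 2;
        if (crashesWith(mid)) {
            bad = mid;   // culprit is among the first `mid` changesets
        } else {
            good = mid;  // culprit comes after the first `mid` changesets
        }
    }
    return bad - 1;  // the changeset that turns a good build into a bad one
}
```

Each iteration costs one try build plus one round trip with an affected user, so a range of 64 changesets needs about log2(64) = 6 rounds of user testing.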
(In response to comment #31)

I'm afraid that'd be asking more from our users than they'd be willing to give :-)

Let's hope the results from the try build mentioned in comment #18 lead us to the actual cause of these crashes (the GC bug).
For what it's worth, I've now tried auto-updating from all of the FF 9.0 betas (1 through 5), on OS X 10.6.8.  I didn't have trouble with any of them.
I have been able to reproduce this crash on my 10.7 machine.

STR:

1. Download Firefox 9.0b6
2. Install the Nectar toolbar: http://www.nectar.com/collect/toolbar/home.points
3. Crash

https://crash-stats.mozilla.com/report/index/bp-59cf56ce-e879-464c-ade9-11fba2111220
Better set of STR:

1. Download Firefox 9.0b6
2. Install the Nectar toolbar: http://www.nectar.com/collect/toolbar/home.points
3. Wait for doorhanger for Toolbar options.
4. Cancel at the option dialog.
5. Cancel at the Firefox dialog.
6. Crash.
Nectar block request is bug 712369 fwiw.

As I put in that bug, I haven't got aurora or nightly to crash yet, and running Fx9 with the same profile after I have run successfully aurora causes Fx9 to not crash (*shrugs*).
Adding Bill to the bug since he offered to take a look at the crash in a debugger.
Hooray, Marcia!!

I can repro with FF 9.0b6 on OS X 10.6.8, but not with today's mozilla-central nightly.  I'll try to figure out why.
ObjShrink (bug 637931) landed in 11 and changed shapes a lot (along with other JS stuff), so it isn't too surprising that the crash might not show up there.
I managed to get a stack trace while running a debug build with Marcia's STR.
(In reply to Bill McCloskey (:billm) from comment #40)
> Created attachment 583237 [details]
> stack trace from debug build assertion
> 
> I managed to get a stack trace while running a debug build with Marcia's STR.

Hey folks, sorry for the shotgun cc but this crash is holding up release of FF9 and Bob is looking for some help understanding how scary we think it is, and/or how likely we are to see it in the wild. Little help?
(In reply to Johnathan Nightingale [:johnath] from comment #41)
> (In reply to Bill McCloskey (:billm) from comment #40)
> > Created attachment 583237 [details]
> > stack trace from debug build assertion
> > 
> > I managed to get a stack trace while running a debug build with Marcia's STR.
> 
> Hey folks, sorry for the shotgun cc but this crash is holding up release of
> FF9 and Bob is looking for some help understanding how scary we think it is,
> and/or how likely we are to see it in the wild. Little help?

(per khuey, also adding smaug)
> how likely we are to see it in the wild

No idea.  I'm currently narrowing down the regression range among beta-debug builds.  The crashes start somewhere between 2011-12-01 and 2011-12-17.  Once we know what patch caused/triggered these crashes, we should have a much better idea.

I should have that within the hour.
> how likely we are to see it in the wild

Actually we're *already* seeing it in the wild (in 9.0 B6), and it's already the #1 Mac topcrasher there.  So this is *very* scary.
Alex asked me to test the tryserver build in Comment 18 - both 32 and 64 builds crash for me with the Nectar toolbar installed.
Marcia, I think I've found the following regression range for the Nectar crashes:

2011-12-09-mozilla-beta-debug
2011-12-10-mozilla-beta-debug

Could you try to confirm?
I thought we held off finalizing XPCOM objects until after we're done sweeping JS objects. However, that's not what's happening here; see the stack snippet below.

Is this expected? It doesn't surprise me at all that calling nsXULElement::Release will end up calling into JS-land in all sorts of ways.

#30 0x0000000102087569 in nsXULElement::Release (this=0x11a6ee300) at /Users/mozilla/billm/mozilla-beta/content/xul/content/src/nsXULElement.cpp:388
#31 0x000000010227c269 in XPC_WN_NoHelper_Finalize (cx=0x106d3f4d0, obj=0x11a5d2768) at /Users/mozilla/billm/mozilla-beta/js/src/xpconnect/src/xpcwrappednativejsops.cpp:668
#32 0x00000001010729aa in JSObject::finalize (this=0x11a5d2768, cx=0x106d3f4d0) at jsobjinlines.h:200
#33 0x0000000101072d17 in js::gc::Arena::finalize<JSObject> (this=0x11a5d2000, cx=0x106d3f4d0, thingKind=js::gc::FINALIZE_OBJECT2, thingSize=88) at /Users/mozilla/billm/mozilla-beta/js/src/jsgc.cpp:301
#34 0x0000000101074052 in js::gc::FinalizeTypedArenas<JSObject> (cx=0x106d3f4d0, al=0x107049170, thingKind=js::gc::FINALIZE_OBJECT2) at /Users/mozilla/billm/mozilla-beta/js/src/jsgc.cpp:348
#35 0x000000010106bcf8 in js::gc::FinalizeArenas (cx=0x106d3f4d0, al=0x107049170, thingKind=js::gc::FINALIZE_OBJECT2) at /Users/mozilla/billm/mozilla-beta/js/src/jsgc.cpp:389
#36 0x0000000101077dd7 in js::gc::ArenaLists::finalizeNow (this=0x107049010, cx=0x106d3f4d0, thingKind=js::gc::FINALIZE_OBJECT2) at /Users/mozilla/billm/mozilla-beta/js/src/jsgc.cpp:1288
#37 0x000000010106be4e in js::gc::ArenaLists::finalizeObjects (this=0x107049010, cx=0x106d3f4d0) at /Users/mozilla/billm/mozilla-beta/js/src/jsgc.cpp:1389
#38 0x000000010106c06e in SweepPhase (cx=0x106d3f4d0, gcmarker=0x7fff5fbf9ba0, gckind=GC_NORMAL, gcTimer=@0x7fff5fbf9d80) at /Users/mozilla/billm/mozilla-beta/js/src/jsgc.cpp:2320
And no, I can't find anything here indicating how likely this is to happen in the wild. :(
It already *is* happening in the wild.

(Following up comment #46)

My regression range seems to put the blame on Brendan Eich's following patch:

http://hg.mozilla.org/releases/mozilla-beta/rev/bccd17f22cc3

Note that this is "blame" for triggering these crashes in larger numbers, not (probably) for causing them.
I'll try backing out Brendan's patch on the beta branch, to see what effect this has.
(In reply to Steven Michaud from comment #51)
> I'll try backing out Brendan's patch on the beta branch, to see what effect
> this has.

I think this was already covered by:

(In reply to Marcia Knous [:marcia] from comment #45)
> Alex asked me to test the tryserver build in Comment 18 - both 32 and 64
> builds crash for me with the Nectar toolbar installed.
Jonas is right in comment 47.

I think the problem is here:

> XPC_WN_NoHelper_Finalize(JSContext *cx, JSObject *obj)
> ...
>+    if(IS_SLIM_WRAPPER_OBJECT(obj))
> ...
>+        NS_RELEASE(p);

In the non-SLIM_WRAPPER case we call this:

>+    static_cast<XPCWrappedNative*>(p)->FlatJSObjectFinalized(cx);

And that does the deferred release.

I think we should fix this, it's pretty clearly wrong.

As has been pointed out, though, this code is several years old. Something else is triggering this.
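To illustrate the difference between the two paths quoted above, here is a simplified model. It is not XPConnect's actual code; RefCounted, DeferredReleaser and its methods are hypothetical names. A direct release from a finalizer can run a destructor in the middle of the sweep, while a deferred release (what FlatJSObjectFinalized is described as doing) queues the object and releases it once sweeping has finished.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a refcounted XPCOM object such as a DOM node.
struct RefCounted {
    int refCnt = 1;
    int* destroyedCounter = nullptr;  // lets a test observe destruction

    void Release() {
        if (--refCnt == 0) {
            if (destroyedCounter) {
                ++*destroyedCounter;  // stands in for arbitrary destructor side effects
            }
            delete this;
        }
    }
};

// Hypothetical GC driver that defers releases requested during the sweep.
class DeferredReleaser {
public:
    void BeginSweep() { sweeping_ = true; }

    // Called from a finalizer instead of releasing directly (the NS_RELEASE path).
    void ReleaseFromFinalizer(RefCounted* obj) {
        if (sweeping_) {
            pending_.push_back(obj);  // destructor could call into JS: wait
        } else {
            obj->Release();
        }
    }

    // Called after sweeping, when running destructors is safe again.
    void EndSweep() {
        sweeping_ = false;
        std::vector<RefCounted*> toRelease;
        toRelease.swap(pending_);
        for (RefCounted* obj : toRelease) {
            obj->Release();
        }
    }

    std::size_t PendingCount() const { return pending_.size(); }

private:
    bool sweeping_ = false;
    std::vector<RefCounted*> pending_;
};
```

The design choice is simply to move side effects out of the sweep: whatever the destructor does, such as touching JS objects, then happens while the heap is in a consistent state.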
I am trying to bisect this now on beta builds. So far, I do get a failure on beta tip (actually an assert, but one that's really bad and likely a precursor to the crash), but startup OK with Nectar toolbar in 77d42322277d.
(In reply to ben turner [:bent] from comment #53)
> As has been pointed out, though, this code is several years old. Something
> else is triggering this.
Strong parent node was backed out from 9. So perhaps strong parent node was hiding the real
problem? Or something which then triggered the old problem.
I'll push backout-the-backout patch to try, just in case...
Btw, if anyone knows how to reproduce this on linux, please give STR.
> Btw, if anyone knows how to reproduce this on linux, please give STR.

The only STR we have right now is Marcia's Nectar STR, from comment #35.  As far as I know, this has only been tried on OS X.
Whiteboard: STR in comment #35
Update: I think I was not using the right STR when bisecting. I'm fairly sure that it works (no assert) for me in b5, but I'd need to recheck that. It does assert for me in 4d4553e11c57.

smaug: Bill found that the root of the problem was that we GC, then some node destructor tries to remove a property from a JS object (very bad to do during GC), which ends up trying to allocate a new shape (definitely bad).
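The failure mode described here, allocating a new shape while the collector is sweeping, is the kind of thing a sweep-phase guard catches. The toy heap below is a hypothetical model, not SpiderMonkey's API: a scope object marks the sweep phase, and the stand-in allocator refuses to allocate during it. It throws so the behavior is observable in a test, whereas the real engine asserts or aborts.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical heap model that forbids allocating GC things during the sweep.
class ToyHeap {
public:
    // RAII marker for the sweep phase.
    class SweepScope {
    public:
        explicit SweepScope(ToyHeap& heap) : heap_(heap) { heap_.sweeping_ = true; }
        ~SweepScope() { heap_.sweeping_ = false; }
        SweepScope(const SweepScope&) = delete;
        SweepScope& operator=(const SweepScope&) = delete;

    private:
        ToyHeap& heap_;
    };

    // Stand-in for allocating a new shape; refuses while sweeping.
    int AllocateShape() {
        if (sweeping_) {
            throw std::logic_error("GC thing allocated during sweep");
        }
        return ++allocated_;
    }

    bool IsSweeping() const { return sweeping_; }

private:
    bool sweeping_ = false;
    int allocated_ = 0;
};
```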
Because Nectar is built with FreeCause (http://www.freecause.com), this also affects other toolbars. I tried the Dallas Cowboys toolbar myself and got the same behavior/crash, which is understandable because it behaved the same way as Nectar, with a little s/Nectar/Dallas Cowboys/g action.
Backing out bug 708572 fixes it for me in a OS X debug build.
Steven, can you confirm?
I'm not saying bug 708572 has caused the problem, but I was wondering whether it (and some other
change during FF9 development) could have made the problem happen more often.
It looks like the actual problem has been there for a long time.

backout of backout try
https://tbpl.mozilla.org/?tree=Try&rev=c7bf93c5aa2c
Here's the XPI for the Nectar Search Toolbar in case they change it...
(In reply to comment #60)

> Backing out bug 708572 fixes it for me in a OS X debug build.
> Steven, can you confirm?

I should be able to try that in about an hour.  Right now I'm testing backing out Brendan's patch for bug 685321.
Also, I crashed with the San Jose Sharks toolbar, and instead of Breakpad I got an Apple crash report (attached).
Bug 708572 "causing" this would make sense, since before bug 708572 nodes were very likely to appear in cycles and thus get collected by the cycle collector, i.e. they wouldn't get collected during JS-GC.

So two options here are to "back out" bug 708572 (quotes because bug 708572 is itself a backout), or make slim wrappers do deferred releasing.

The latter is a very scary fix though.
I'm ready to land the backout of backout to beta (I assume we're still dealing with beta and not release).
Note, it does not fix the actual crash, but apparently could make it a lot less frequent
(though I can't verify, since I haven't managed to reproduce the problem on Linux).

Also, strong parent node has been on trunk since late July, so it has gotten plenty of testing, but it was backed out from FF9 because it can cause longer CC pauses in some cases.
Here are some correlations I pulled out of chofmann's report manually for the Mac crashes.
(In reply to Olli Pettay [:smaug] from comment #66)
> I'm ready to land the backout of backout to beta (I assume we're dealing
> still beta and not release).

This would need to be on the release branch for 9.0.1 if we decide to move forward with this.

What would be the risk (and user effect) of backing out the backout for FF9?
The risk of backing out the backout is longer cycle collection pauses in some cases, like with IRCCloud (IRCCloud seems to create huge disconnected DOM trees).
We also crash with the same stack in a similar dialog with the toolbar from dallascowboys.com/toolbar.
Ok, here is the current situation, as we understand it:

The underlying problem is that slim-wrappers release XPCOM objects from inside JS-GC. Releasing XPCOM objects can have all sorts of side effects, including allocating JS objects. When this happens the JS engine aborts. Hence the crash.
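To make this failure mode concrete, here is a minimal C++ sketch under stated assumptions: a toy runtime whose allocator refuses to run while `gcRunning` is set (loosely mirroring the `!cx->runtime->gcRunning` assertion quoted later in this bug), plus the "deferred releasing" idea for slim wrappers mentioned above. `ToyRuntime`, `DeferredReleaser` and `runToyGC` are hypothetical names, not SpiderMonkey or XPConnect APIs, and this is not the fix that landed in bug 712448.

```cpp
#include <cstddef>
#include <functional>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical toy model; none of these names exist in SpiderMonkey/XPConnect.
struct ToyRuntime {
    bool gcRunning = false;
    int liveObjects = 0;

    // Allocating while a GC is running is forbidden. The real engine aborts;
    // this toy throws so the failure is observable.
    void allocateObject() {
        if (gcRunning) throw std::logic_error("allocation during GC");
        ++liveObjects;
    }
};

// Queues native "release" work requested during GC and runs it afterwards.
class DeferredReleaser {
public:
    explicit DeferredReleaser(ToyRuntime& rt) : rt_(rt) {}

    void release(std::function<void()> releaseFn) {
        if (rt_.gcRunning) {
            pending_.push_back(std::move(releaseFn));  // side effects unsafe now: defer
        } else {
            releaseFn();  // outside GC it is safe to run side effects immediately
        }
    }

    // Called once the GC has finished and allocation is allowed again.
    void drain() {
        std::vector<std::function<void()>> work;
        work.swap(pending_);
        for (auto& fn : work) fn();
    }

    std::size_t pendingCount() const { return pending_.size(); }

private:
    ToyRuntime& rt_;
    std::vector<std::function<void()>> pending_;
};

// A toy GC cycle: finalizers run with gcRunning set; deferred work runs after.
void runToyGC(ToyRuntime& rt, DeferredReleaser& releaser,
              const std::vector<std::function<void()>>& finalizers) {
    rt.gcRunning = true;
    for (const auto& finalize : finalizers) finalize();
    rt.gcRunning = false;
    releaser.drain();
}
```

Releasing during GC only queues the callback, so a native destructor that allocates (as in the crash described above) runs after `gcRunning` has been cleared instead of inside the collector.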

We *think* that this started happening more in 9 because in bug 708572 we backed out bug 335998. Bug 335998 was sort of hiding the slim-wrapper problem by making a lot of XPCOM objects (specifically nodes) appear in cycles much more often and thus stay alive past JS-GC.

It's still unclear why we didn't see the slim-wrapper problem in FF8, since bug 335998 hadn't landed yet. *Possibly* this would indicate that FF9 wouldn't be more crashy than FF8 if released in its current state. But there could also have been other changes in the FF9 timeframe that would have made us crash more if it weren't for bug 335998.


So, this leaves us with the following options:

1. Release as-is and hope that FF9 won't crash more than FF8. We could do more research into whether we have a basis to believe that, but ultimately it'll be hard to tell.

2. Put bug 335998 back in to FF9. This appears to be fixing crashes. It also puts back in code that's always been in FF10, and was in FF9 for Dec 9. So risk is reasonably mitigated.

The downside with this option is that while bug 335998 was landed, we saw some pretty bad CC pauses. However, it appears that they mostly affected the IRCCloud website (due to it producing a very large number of DOM nodes that aren't in the document).

3. Same as 2, but we also try to take some patch to mitigate those pauses. Specifically we could avoid cc-tracing any Nodes which have a js-wrapper colored black. This would be new code though and it's unclear how much it'd help. Would need review from peterv.

4. Try to fix the underlying slim-wrapper bug. The mechanics of writing the patch is likely not too bad. Bent is giving it a shot. But this carries a very high risk anywhere, especially this late in the game.



Talking to people on IRC, option 2 seemed the most popular, and I agree. CC stalls are better than crashes, especially if they only affect one website (for which we could potentially put in a fix for FF9+).


Another thing to note is that FF10 still has bug 335998 landed, with no mitigations à la option 3.
Filed bug 712448 on the slim wrapper problem.
If someone has time and can reproduce the problem, it might be useful to know why 8 doesn't crash and 9 does crash with non-strong parent pointers.
Further info:

Bent just pushed a patch[1] to mozilla-inbound to fix the slim-wrapper issue. This doesn't really change the fact that it's an extremely risky patch, especially to take anywhere but m-c. But at least we'll get some testing.

[1] https://hg.mozilla.org/integration/mozilla-inbound/rev/84201b200ef3
I've confirmed that backing out the patch for bug 708572 on the beta
branch does "fix" this bug's crashes -- the browser no longer crashes
with the STR in comment #35, using either the Nectar toolbar or the
Dallas Cowboys toolbar.

Backing out Brendan's patch for bug 685321 didn't do the job.  And I
now realize that I shouldn't have thought it would.  My regression
range from comment #46 actually puts the "blame" on a patch somewhere
in the following range (inclusive):

http://hg.mozilla.org/releases/mozilla-beta/rev/1b302e06c6b1
http://hg.mozilla.org/releases/mozilla-beta/rev/bccd17f22cc3
(In reply to Olli Pettay [:smaug] from comment #61)
> backout of backout try
> https://tbpl.mozilla.org/?tree=Try&rev=c7bf93c5aa2c

QA - This build will be available at 
https://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/opettay@mozilla.com-c7bf93c5aa2c/ once ready
Jonas, Steven:

Can you think of any reason why we haven't seen crashes on Windows? A different signature? Differences in native themes trigger the bug or not?
> Can you think of any reason why we haven't seen crashes on Windows?

I can't.  But then I'm not very familiar with the GC code, where the actual bug exists.

Backing out bug 708572 is surely just a band-aid.  But at least a band-aid can be applied quickly, and it seems that's what we need right now.
The band-aid has landed
https://tbpl.mozilla.org/?tree=Mozilla-Release&rev=b78fe362789b

It would still be great if someone who can reproduce the bug could try to find out why 8 doesn't crash with weak parent pointers, but 9 does.
It would mean making your own builds with the patch for
https://bugzilla.mozilla.org/show_bug.cgi?id=708572
applied, covering the range from the FF8 aurora to the FF9 release.
Probably FF8 aurora to FF9 aurora should be enough.
Using binary search, or something like that, it shouldn't take too much time, unless the patch for bug 708572 doesn't always apply cleanly.
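The bisection suggested above could be organised as an ordinary binary search over an oldest-first list of revisions, where each probe means building that revision with the bug 708572 patch applied and running the STR. Below is a minimal C++ sketch; `firstCrashingRevision` and the `crashesWithPatch` callback are hypothetical, and it assumes the first revision is known not to crash, the last is known to crash, and the patch applies to every revision (revisions where it doesn't would have to be skipped, which this sketch does not handle).

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical helper: returns the index of the first revision whose build,
// with the bug 708572 patch applied, reproduces the crash.
// Preconditions (assumed, not checked): revs.size() >= 2, revs.front() does
// not crash, revs.back() does crash, and the patch applies to every revision.
std::size_t firstCrashingRevision(
    const std::vector<std::string>& revs,
    const std::function<bool(const std::string&)>& crashesWithPatch) {
    std::size_t good = 0;               // invariant: revs[good] does not crash
    std::size_t bad = revs.size() - 1;  // invariant: revs[bad] crashes
    while (bad - good > 1) {
        // Each probe is one "build, apply patch, run the STR" cycle.
        std::size_t mid = good + (bad - good) / 2;
        if (crashesWithPatch(revs[mid])) {
            bad = mid;
        } else {
            good = mid;
        }
    }
    return bad;
}
```

With 16 revisions between the known-good and known-bad endpoints, this needs only 4 test builds, which is why the slower "patch every build" workflow is still practical.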
(In reply to Christian Legnitto [:LegNeato] from comment #77)
> Jonas, Steven:
> 
> Can you think of any reason why we haven't seen crashes on Windows? A
> different signature? Differences in native themes trigger the bug or not?

No, I can't think of any reason.

We still don't really understand what tickles the bug, we just know that bug 335998 papers it over (or makes it less likely to occur). Also since the bug is triggered by garbage collection, it is inherently intermittent.
These crashes (in small numbers) go back to FF 5!  The most recent version where they don't happen is FF 4.0.1.  So the actual bug (the actual cause of these crashes) could be quite old.
Bug 712448 is about an old bug, sure.
But something has changed after FF8: in FF8, weak parent nodes don't cause this problem like they do in FF9.
I suspect that whatever changed in FF 8 just uncovered an older bug.

That's just a hunch -- I know very little about GC code.  But I'll spend a few hours tomorrow trying to follow my hunch, to see what I turn up.

I was going to do that today, but got distracted :-)
So, this is also happening on Windows, but as usual, the Windows signature is slightly different because it contains the function parameters. It's also not spiking on Windows, only on Mac (Linux is also not spiking).
Bug 673925 is already connected to the Windows signature, but I think it's best to connect both to both variants of the js::Shape::finalize signature, so I'm doing that.

Note that for shipping 9.0, only Mac is a concern, Linux and Windows are very low in volume and probably the same as in previous releases, they just show that the underlying issue exists everywhere.
Crash Signature: [@ js::Shape::finalize] [@ js::Shape::removeChild] → [@ js::Shape::finalize] [@ js::Shape::finalize(JSContext*) ] [@ js::Shape::removeChild]
(In reply to Christian Legnitto [:LegNeato] from comment #77)
> Can you think of any reason why we haven't seen crashes on Windows? A
> different signature?

The latter. https://crash-stats.mozilla.com/report/list?signature=js%3A%3AShape%3A%3Afinalize%28JSContext*%29 has the Windows crashes in that function; the signature contains the parameter, and that's why it's different. Still, it's low volume and probably no regression from 8, so no concern for 9, but it shows that the underlying cause affects all platforms, as expected.
(In reply to Christian Legnitto [:LegNeato] from comment #59)
> Because Nectar is build with freecause (http://www.freecause.com), this also
> affects other toolbars. Personally I tried with the Dallas Cowboys toolbar
> and got the same behavior/crash...which was understandable because it acted
> the same way as nectar with a little s/Nectar/Dallas Cowboys/g action.

If needed, Jorge Villalobos has a contact point in FreeCause Inc.
I crashed on 10.5 after installing the Nectar toolbar, and I get a different stack signature than the ones listed here, so adding it to the bug as well for tracking: https://crash-stats.mozilla.com/report/index/bp-b7a08e40-2f0b-4476-a33e-76ce42111220. There aren't very many crashes in this stack and they are all Mac and Linux.
Crash Signature: [@ js::Shape::finalize] [@ js::Shape::finalize(JSContext*) ] [@ js::Shape::removeChild] → [@ js::Shape::finalize] [@ js::Shape::finalize(JSContext*) ] [@ js::Shape::removeChild] [@ nsGenericElement::UnbindFromTree ]
(In reply to Kohei Yoshino from comment #87)
> (In reply to Christian Legnitto [:LegNeato] from comment #59)
> > Because Nectar is build with freecause (http://www.freecause.com), this also
> > affects other toolbars. Personally I tried with the Dallas Cowboys toolbar
> > and got the same behavior/crash...which was understandable because it acted
> > the same way as nectar with a little s/Nectar/Dallas Cowboys/g action.
> 
> If needed, Jorge Villalobos has a contact point in FreeCause Inc.

Yes, quite interested in that.
(In reply to Olli Pettay [:smaug] from comment #80)
> It would mean making own builds with the patch for
> https://bugzilla.mozilla.org/show_bug.cgi?id=708572
> starting from the FF8 aurora to FF9 release.
> Probably FF8 aurora to FF9aurora should be enough.
> Using binary search, or something like that, it shouldn't take too much
> time, 
> unless the patch for bug 708572 doesn't always apply cleanly.

Olli, can you please give more detailed information on what we would have to do with those builds? What exactly will tell us about weak parent pointers?
You make FF9 builds (starting from the FF8 aurora uplift), apply the original patch for bug 708572, and test whether this crash happens. It's a lot slower than normal regression hunting, but still doable, I think.
Thanks Olli. I will start with bisecting the range now. I expect to have the result tomorrow morning UTC at latest.
(In reply to Marcia Knous [:marcia] from comment #88)
> I crashed on 10.5 after installing the Nectar toolbar, and I get a different
> stack signature than the ones listed here, so adding it to the bug as well
> for tracking:
> https://crash-stats.mozilla.com/report/index/bp-b7a08e40-2f0b-4476-a33e-76ce42111220. There aren't very many crashes in this stack and they are all
> Mac and Linux.

On Windows, this would again have a different signature, with parameters added. In fact, I have seen this around, and given that we seem to have a pretty good understanding of the ::finalize signature, I'd suspect the ::UnbindFromTree one to be something different - unless you find that the same fix will make both go away.
Due to the changeset http://hg.mozilla.org/mozilla-central/rev/500c2ddb52c1 I will not be able to bisect the whole timespan between Firefox 8 and Firefox 9. It will limit us to Aug 16th to Aug 28th only. I hope that it will be enough.
(Following up comment #84)

I've just opened bug 712831, where I've got a debugging patch that shows js::Shape::finalize can be called more than once on the same js::Shape object.
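One way a debugging patch might detect a repeated finalize is to remember every cell address finalized during the current GC. The C++ below is a hypothetical sketch of that idea (`FinalizeTracker` is not a SpiderMonkey class, and this is not necessarily what the patch in bug 712831 does). The set is cleared at the start of each GC because freed addresses can legitimately be reused for new objects later.

```cpp
#include <unordered_set>

// Hypothetical debugging aid: remembers every address finalized in one GC.
class FinalizeTracker {
public:
    // Call at the start of each GC; addresses may be reused across GCs.
    void beginGC() { seen_.clear(); }

    // Returns true the first time `cell` is finalized in this GC and false on
    // a repeat, which would indicate a double finalize of the same object.
    bool recordFinalize(const void* cell) {
        return seen_.insert(cell).second;
    }

private:
    std::unordered_set<const void*> seen_;
};
```

In a real debug build the `false` case would typically be turned into an assertion at the top of the finalizer, so the second call is caught at the point where it happens.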
For everyone following along, we built Firefox 9.0.1 with bug 708572 backed out (see https://bugzilla.mozilla.org/show_bug.cgi?id=708572#c11). We've pushed Firefox 9.0.1 for all platforms. Although we think Windows is mostly unaffected, we still decided to move forward with Windows->9.0.1 for the following reasons:

* We were living with 702813 for most of the life of Firefox 9, and the backout in 708572 has actually gotten less testing
* New crash signatures (Windows even) have been found which are related to 711794, although none yet are significant or startup crashers
* We are uncertain about the number of sites affected by 702813
* We are uncertain about whether or not we 100% fixed 702813 with the backout in 708572
* The population of Windows users who have already updated to 9.0 is small (in the low millions)
* There is tracking/analysis benefit of having all desktop platforms on the same Firefox version
Because of a build error I'm getting, I haven't been able to start bisecting yet. I hope someone can help me solve this problem: http://groups.google.com/group/mozilla.dev.platform/browse_thread/thread/b6b96a330a58196d# Thanks.
Olli, I also get a crash in my debug build when I use the first available changeset for Firefox 8.0a1 (AURORA_BASE_20110816) on mozilla-central:

Assertion failure: !cx->runtime->gcRunning, at /Volumes/data/code/firefox/nightly/js/src/jsgcinlines.h:199

So would that mean that we have to go even further back, into Firefox 7.0 code? Applying all the necessary patches for compiler errors and not-yet-landed code would become a pain.
No. It more likely means that nothing interesting changed between 8 and 9, and we were just very unlucky that the one backout patch started to trigger this crash
(which is fixed in bug 712448).
Thanks for testing.
Not a top crash anymore.
Keywords: topcrash
Should this bug be closed now that bug 712448 has been fixed?

/be
All 4 signatures still exist, but in very small numbers. Resolving fixed since as Brendan notes it seems the bug referenced in Comment 100 addressed the top crash issue.
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Depends on: 712448
OS: Mac OS X → All
Target Milestone: --- → mozilla12