

DSoS

IST-1999-11585

Dependable Systems of Systems

Failure analysis of an ORB in presence of faults

Report Version: Deliverable IC3
Report Preparation Date: 1 October 2001
Classification: Public Circulation
Contract Start Date: 1 April 2000

Duration: 36m

Project Co-ordinator: Newcastle University

Partners: DERA, Malvern – UK; INRIA – France; CNRS-LAAS – France; TU Wien – Austria; Universität Ulm – Germany; LRI Paris-Sud - France

Project funded by the European Community under the “Information Society Technology” Programme (1998-2002)

Failure analysis of an ORB in presence of faults

Eric Marsden and Jean-Charles Fabre
LAAS-CNRS, Toulouse, France
{emarsden,fabre}@laas.fr

Abstract

This document describes a method and experimental results for the dependability characterization of middleware implementations, and in particular failure mode analysis of CORBA ORB implementations. The aim of the work is to provide an overall approach for identifying and quantifying failure modes using various fault injection techniques and fault models.

Related work in dependability characterization of executive software layers is discussed. An analysis of the architecture of middleware-based systems and their error confinement regions motivates the development of a fault model. A number of fault injection approaches are discussed, and results from network-based corruption experiments targeting four CORBA service implementations are presented.


Table of Contents

1 Introduction
2 Objectives and enabling technologies
2.1 Middleware assessment for DSoS
2.2 Dependability assessment techniques
2.3 Fault injection
2.4 Targeting CORBA middleware
3 Related work
3.1 Faults internal to a component system
3.2 External faults at a linking interface
3.3 External faults at a local interface
4 Fault pathology of CORBA-based systems
4.1 Error confinement regions in a CORBA-based system
4.2 A fault model for CORBA-based systems
4.3 Failure modes of an ORB
5 Method and experimental techniques
5.1 Corruption of the memory space
5.2 Program mutation techniques
5.3 Robustness testing
5.4 Syscall interposition techniques
5.5 Network-level faults
6 Experimental framework and results
6.1 Failure modes
6.2 Experimental setup
6.3 Target implementations
6.4 Analysis of results
6.5 Analysis from an integrator's point of view
7 Conclusions and future work


1 Introduction

This document describes a method for characterizing the dependability of middleware implementations, in particular the failure modes of CORBA ORB implementations. The aim of the work is to provide an overall method to identify and quantify failure modes using various fault injection techniques and fault models.

The method is based on existing work in dependability characterization. Although a large amount of work has been carried out on the characterization of executive software (e.g., kernels and standard operating systems) using fault injection, very little work has targeted the middleware layers. Current work in the field, in particular software-implemented fault injection techniques and tools, is reviewed in this document because of its relevance to the characterization of CORBA-based systems.

The conventional software engineering view of middleware is not sufficient when assessing dependability-related properties. We present a more detailed architectural view of a CORBA-based system, concentrating on the identification of error confinement regions and on a classification of failure modes. Based on this structural analysis, we discuss the classes of faults that can affect such systems.

A number of fault injection techniques that simulate these fault classes are described. We present results from one of these techniques, which measures the impact of corrupt method invocations on a middleware implementation. These experiments are particularly relevant to the DSoS project, since they provide insights into the ways in which errors may propagate between component systems over the interconnection infrastructure. Experimental results we have obtained on a number of CORBA implementations reveal that this impact can be significant.

The document is organized as follows. In Section 2, we discuss the objectives of our work within DSoS, and give an overview of dependability assessment techniques and of CORBA-based middleware.

In Section 3, we discuss some previous work in fault injection that is related to the failure mode analysis of CORBA systems. Some results of experiments targeting CORBA are also reported in this section.

In Section 4, we address the failure modes of middleware-based systems. This involves discussing the architecture of a middleware system from different viewpoints and identifying possible targets, defining fault assumptions and models, classifying failure modes, and identifying possible fault injection strategies.

Section 5 is devoted to the description of our method for experimentally characterizing the dependability of CORBA ORB implementations. A number of fault injection techniques that are well suited to this target are described.


Section 6 reports on the experimental results we obtained when targeting CORBA service implementations using network corruption techniques. In the last section, we present some first lessons that have been learned from our work, from the viewpoint of a DSoS system integrator, and draw some conclusions.


2 Objectives and enabling technologies

In this section we present the aim of our work on middleware assessment with respect to the DSoS project, and give an overview of dependability assessment techniques. We conclude with a brief introduction to CORBA middleware.

2.1 Middleware assessment for DSoS

An increasingly large class of dependable systems of systems will be built on some form of middleware infrastructure. A likely candidate for this middleware infrastructure is the CORBA platform, a standard that is well suited to the interconnection of heterogeneous systems, and is widely used in industry. Figure 1 illustrates possible roles for CORBA-compliant middleware in a system of systems, both as a technology for the implementation of linking interfaces, possibly using wrapping techniques, and as a means of interconnecting component systems. Evidently, the dependability of this middleware layer is crucial to the dependability of the DSoS built above it.

[Figure 1: CORBA infrastructure for a system of systems. Diagram labels: Bank, Hertz, Avis, Rental Agency, connection system, Geographic Information System, Internet, HTTP interfaces, CORBA interfaces.]

There has been little published research on the dependability of middleware-based systems. The DSoS project aims to contribute to this issue in two ways:

• categorize and study the types of faults which can be experienced by a middleware-based system, and the ways in which errors propagate through the system to cause failure;

• design and implement methods to improve a middleware implementation's behaviour in the presence of these faults: improve error detection mechanisms, fail silence and robustness characteristics, using techniques such as wrapping.

This report contributes to the first point. We present a number of approaches for characterizing the behaviour of a middleware-based system (and in particular a CORBA-compliant ORB implementation) in the presence of faults, and present the results of fault injection experiments targeting several implementations of the CORBA Name Service.

The second point is addressed in the Architecture and Design work package, which includes work on the use of wrapping techniques to improve the robustness and fail silence of component systems and of the interconnection infrastructure. The aim is to ensure that individual component systems have a well-defined behaviour in the presence of faults. The Architecture and Design work is closely related to the present deliverable; indeed, the failure mode characterization is an essential input to the design and development of fault containment wrappers.

Our work on fault injection also provides insights into the ways in which errors may propagate between component systems via the connection systems. It provides a mechanism for evaluating the probability of such error propagation, and for characterizing its effects. It also investigates the impact of faults affecting the communication subsystem itself. This work helps the integrator of a system of systems answer two important questions:

• what type of dependability properties can be assumed of the interconnection infrastructure?

• how might the dependability of a component system be affected by the addition of DSoS-related software implementing a linking interface?

2.2 Dependability assessment techniques

The dependability of computer systems can be assessed using either model-based or measurement-based techniques. Modelling work allows system designers to obtain predictions of the dependability attributes of a system, based on probabilistic measures of the behaviour of its subsystems. These measures are useful during the design phase, since they enable the pertinent dependability attributes of different system configurations to be estimated, even before they are built. Work in the Validation work package addresses this problem from a DSoS point of view.

However, modelling techniques can only provide predictions of the dependability attributes of a system. Once a system has been implemented and deployed, measurement-based techniques can be applied to obtain more specific insights and measures. There are two main measurement-based approaches to obtaining information on a system's dependability:


• the observation of a large set of systems in operation, as in [Kalyanakrishnam et al., 1999]. This approach relies on error information obtained either from logs maintained by system administrators or from automatic monitoring mechanisms provided by the system. By analysing the data, one can obtain information on the nature and frequency of failures, and on the type of usage that led to the failure of the component system.

A disadvantage of this approach is that failures are rare, which makes it necessary to collect information on a large population of identical systems over a long time span before being able to make statistically significant analyses. It is thus poorly suited to short development cycles.

• the deliberate insertion of faults into the target system, so as to accelerate the characterization of its behaviour in the presence of faults. These fault injection experiments allow the system's error detection mechanisms to be triggered more frequently than in normal operation. They also allow evaluation of the system's behaviour when error detection coverage is not perfect, as is usually the case for complex systems.

Field-based observations are complementary to fault injection experiments, since they provide data on the types of failures that can be experienced by a system in given operational conditions. The analysis of failure reports can be used to derive a fault model, which is then used to develop fault injection campaigns. This increases the likelihood that fault injection experiments are representative of real faults experienced by the target system.

Unfortunately, there has been no reported work on field observation of failures in middleware-based systems. This makes it difficult to validate the representativeness of a given fault model for these targets.

2.3 Fault injection

Fault injection is a well-known dependability characterization technique [Arlat et al., 1993], which studies a system's reaction to abnormal conditions. It is a testing approach that is complementary to analytical approaches, and which allows the examination of system states that would not be reached by conventional functional testing. The aim of fault injection is to simulate the effect of real faults impacting a target system, namely the error caused by the activation of a fault. Fault injection experiments provide a number of useful results:

• an understanding of the system's failure modes, or its behaviour in the presence of faults;

• information on the fault tolerance mechanisms in the target system, in particular a measurement of their coverage (the conditional probability that, given a fault in the system, the system can tolerate it).
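Because coverage is a conditional probability, a fault injection campaign can only estimate it from a finite sample of experiments. As a minimal sketch, assuming each experiment in the conditioning population is simply recorded as tolerated or not, the function below returns the point estimate together with a Wilson score confidence interval. The function name and the choice of the Wilson interval are our own illustrative assumptions, not part of any tool discussed in this report.

```python
import math

def coverage_estimate(tolerated: int, trials: int, z: float = 1.96):
    """Estimate coverage = P(fault tolerated | fault present).

    `trials` is the number of experiments in which a fault was present
    (e.g. activated), `tolerated` how many of them the system tolerated.
    Returns (point_estimate, lower, upper); z = 1.96 gives roughly a 95%
    Wilson score interval.
    """
    if trials <= 0 or not 0 <= tolerated <= trials:
        raise ValueError("need trials > 0 and 0 <= tolerated <= trials")
    p = tolerated / trials
    z2 = z * z
    denom = 1 + z2 / trials
    centre = (p + z2 / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z2 / (4 * trials * trials)) / denom
    # Clamp to [0, 1] to absorb floating-point rounding at the extremes.
    return p, max(0.0, centre - half), min(1.0, centre + half)
```

For example, `coverage_estimate(900, 1000)` gives a point estimate of 0.9 with an interval of roughly 0.88 to 0.92. The Wilson interval is used here rather than the simple normal approximation because it stays within [0, 1] even when the estimated coverage is close to 1.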


A number of fault injection techniques have been developed. Most early work concerned the injection of physical faults [Karlsson et al., 1998], either by irradiating electronic circuits with heavy ions to simulate the effect of electromagnetic radiation, or by acting directly on the pins of a microprocessor to modify voltages. Due to the complexity and the speed of modern integrated circuits, recent research has concentrated on software-implemented fault injection (SWIFI). In this technique, the corruption is performed by software, and can target different components or different layers in a system (operating system kernel, system services, middleware components, application code, system memory and registers). This approach is very generic and flexible, since a large variety of fault models can be used. Several studies have shown that a single bit-flip leads to errors similar to those produced by physical fault injection techniques (e.g., [Rimén et al., 1994, Fuchs, 1998]), and also that bit-flips simulate errors produced by software faults fairly faithfully [Madeira et al., 2000].
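The single bit-flip model mentioned above is simple to express in software. The following sketch assumes that the target memory region is exposed to the injector as a Python `bytearray`, a stand-in for the real address space a SWIFI tool would patch; it flips one randomly chosen bit and reports where it did so, so that each experiment can later be correlated with the observed failure mode. The function names are hypothetical.

```python
import random

def flip_bit(memory: bytearray, offset: int, bit: int) -> None:
    """Invert bit `bit` (0 = least significant) of the byte at `offset`."""
    if not 0 <= bit < 8:
        raise ValueError("bit index must be in 0..7")
    memory[offset] ^= 1 << bit

def inject_random_bit_flip(memory: bytearray, rng: random.Random):
    """Flip one uniformly chosen bit of `memory`; return (offset, bit)."""
    offset = rng.randrange(len(memory))
    bit = rng.randrange(8)
    flip_bit(memory, offset, bit)
    return offset, bit
```

Applying `flip_bit` a second time at the same location restores the original value, which is one way a transient fault can be removed once it has been activated.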

The target for the fault injection can be either the interface of a software component or its internal address space. Targeting the interface assesses the component's robustness, its ability to function correctly in the presence of invalid inputs and stressful environmental conditions. It is a useful way of evaluating the probability of error propagation from one system component to another, due to their interactions. Targeting the address space assesses the impact of internal corruptions, resulting from either physical faults or software faults, on the component's behaviour.

2.4 Targeting CORBA middleware

Middleware is software that mediates between an application program and the network. It manages the interaction between disparate applications across heterogeneous computing platforms, abstracting from the programming language, operating system and hardware, and often providing convenient access to services such as naming, transactional processing and concurrency management.

CORBA [OMG, 2001a] is a middleware platform that focuses on interactions between distributed objects. The standard is defined by the Object Management Group (OMG), an industry consortium. A key principle of CORBA is its separation of interface and implementation. Interfaces are used to specify the operations and data types that allow access to a service; they are described in an Interface Definition Language (OMG IDL). An interface is independent of the programming language and operating system used to implement the service it describes, and of the location where the service is provided. The software that transports service requests between the client and the server is called the Object Request Broker (ORB).

Figure 2 shows a client invoking a method named foo on an object hosted on a remote computing node. The object's semantics are implemented by a programming-language-dependent entity called a servant. The figure shows the different functional elements composing a CORBA middleware implementation:

• the core of the ORB, which handles marshalling of information to and from the CORBA wire format, and communication with ORBs on remote nodes, including request demultiplexing and concurrency management;

• a set of CORBA services, including naming, trading, event propagation, access control and persistence;

• on a server, an object adapter (OA) that is responsible for dispatching incoming requests to the appropriate servant, and for controlling security issues and the life cycle of servants;

• client stubs that provide an interface to the ORB core, and implementation skeletons that connect the object adapter to the servant (these elements are programming-language dependent, and are generated automatically from the IDL interface);

• modules that handle dynamic invocation, providing an invocation mechanism that can be used when the interface of a service was not known at compile time (DII¹ and DSI² modules);

• an optional Interface Repository service, which allows runtime introspection of the IDL interfaces available in the system.
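The marshalling performed by the ORB core produces the Common Data Representation (CDR) used by GIOP, in which each primitive value is aligned on a boundary equal to its size and a string is sent as an unsigned long length (counting the terminating NUL) followed by its NUL-terminated characters. The hypothetical `CdrEncoder` class below is a minimal sketch of these two rules for octets, unsigned longs and strings. It assumes that alignment is measured from the start of the buffer and that strings use ISO 8859-1, and it is not the API of any ORB.

```python
import struct

class CdrEncoder:
    """Minimal CDR encoder sketch: natural alignment plus CDR string layout."""

    def __init__(self, little_endian: bool = False):
        self.buf = bytearray()
        self.prefix = "<" if little_endian else ">"

    def _align(self, n: int) -> None:
        # Pad with zero octets until the buffer length is a multiple of n.
        self.buf.extend(b"\x00" * (-len(self.buf) % n))

    def octet(self, value: int) -> None:
        self.buf.append(value)  # octets need no alignment

    def ulong(self, value: int) -> None:
        self._align(4)
        self.buf += struct.pack(self.prefix + "I", value)

    def string(self, text: str) -> None:
        data = text.encode("latin-1") + b"\x00"  # trailing NUL is transmitted
        self.ulong(len(data))                    # length includes the NUL
        self.buf += data
```

For example, encoding an octet followed by the unsigned long 69 yields the octet, three padding octets and then `00 00 00 45` in big-endian order.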

[Figure 2: High-level view of a CORBA method invocation. A client holding an IOR invokes serv->foo(69) on a servant. Layers shown on nodes A and B: upperware (client, servant), middleware (stub, DII, CORBA services, Implementation Repository, skeleton, DSI, OA, ORB core) and lowerware (kernel).]

¹ DII: Dynamic Invocation Interface, which allows clients to construct method invocations without passing through a stub.

² DSI: Dynamic Skeleton Interface, which allows servers to handle incoming dynamically constructed requests.


ORB implementations from different vendors, running on different platforms, are able to interoperate by exchanging messages adhering to the General Inter-ORB Protocol (GIOP). This specification describes the data representations, message types and message formats to be used for communication between ORBs. GIOP assumes that the underlying transport protocol is connection-oriented and reliable, and can be viewed as a byte stream. The mapping of GIOP onto TCP/IP is called the Internet Inter-ORB Protocol (IIOP).
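Every GIOP message starts with a fixed 12-octet header: the magic string `GIOP`, the protocol version (major and minor octets), a flags octet whose least significant bit gives the byte order, the message type, and the size of the message body as an unsigned long. Since network-level corruption experiments tamper with exactly these octets, the sketch below shows the kind of sanity checks a receiving ORB can apply before trusting a message. It is an illustrative parser, not code from any ORB; it accepts only GIOP versions 1.0 to 1.2, and for GIOP 1.0 it reads the byte-order boolean through the same bit.

```python
import struct
from typing import NamedTuple

# GIOP message types (Fragment exists only from GIOP 1.1 onwards).
MESSAGE_TYPES = {0: "Request", 1: "Reply", 2: "CancelRequest",
                 3: "LocateRequest", 4: "LocateReply",
                 5: "CloseConnection", 6: "MessageError", 7: "Fragment"}

class GiopHeader(NamedTuple):
    major: int
    minor: int
    little_endian: bool
    msg_type: str
    body_size: int

def parse_giop_header(data: bytes) -> GiopHeader:
    """Decode and sanity-check the 12-octet GIOP header at the start of `data`."""
    if len(data) < 12:
        raise ValueError("truncated GIOP header")
    if data[:4] != b"GIOP":
        raise ValueError("bad magic")
    major, minor, flags, mtype = data[4], data[5], data[6], data[7]
    if major != 1 or minor > 2:
        raise ValueError(f"unsupported GIOP version {major}.{minor}")
    if mtype not in MESSAGE_TYPES or (mtype == 7 and minor == 0):
        raise ValueError(f"invalid message type {mtype}")
    little_endian = bool(flags & 0x01)  # bit 0: byte order (1 = little-endian)
    (body_size,) = struct.unpack("<I" if little_endian else ">I", data[8:12])
    return GiopHeader(major, minor, little_endian, MESSAGE_TYPES[mtype], body_size)
```

A header that passes these checks can still carry a corrupted body size or a corrupted body, so checks of this kind reduce, but do not eliminate, the propagation of corrupted messages into the ORB.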


3 Related work

In this section, we describe previous research on fault injection for dependability characterization. We concentrate on work that has targeted middleware implementations, as well as executive software such as operating system kernels and network protocol stacks. We classify fault injection techniques according to whether the faults they simulate originate inside a component system, or arrive via its linking interface or its local interfaces [Jones et al., 2001].

3.1 Faults internal to a component system

A fault is said to be internal when the corruption of the system's state that it causes originates inside the system. This encompasses programming errors that may affect the internal data of the system, and hardware faults that may corrupt both data and code memory segments. The most common fault model used is the single bit-flip.

A large number of tools have been developed to automate the execution of experiments using this fault model. Two significant examples are Xception, developed at the University of Coimbra, Portugal [Carreira et al., 1998], and MAFALDA, developed at LAAS-CNRS, France [Fabre et al., 2000].

There has been some work [Chung et al., 1999] investigating the impact of high-level faults on CORBA and DCOM applications. The faults simulated are hangs and crashes of threads, processes and computing nodes. The authors found a significant proportion of application hangs, which led them to recommend the use of application-level watchdog mechanisms.

We are not aware of any work on CORBA ORBs simulating finer-grained faults, such as bit-flips. However, the technique has been extensively applied to the failure analysis of other executive software, including operating systems and language runtimes. To illustrate the information that can be obtained from these types of experiments, we present results extracted from campaigns applying the MAFALDA tool to a number of COTS³ real-time microkernels, including Chorus and LynxOS. Figure 3 illustrates results obtained by subjecting an instance of the Chorus/ClassiX microkernel (composed of functional components implementing basic services such as synchronization, memory and scheduling) to a series of SWIFI experiments using MAFALDA.

³ COTS: Commercial Off-The-Shelf.


[Figure 3: Results from MAFALDA applied to Chorus (internal bit-flips). Pie charts of failure-mode distributions for (a) SYN module (2986), (b) MEM module (2918) and (c) COM module (2944). Categories: ERROR STATUS, EXCEPTION, KDB, SYSHANG, APPHANG, APPFAIL, NOOBS.]

These experiments consist of randomly selecting a location in the kernel address space and randomly flipping a bit in this memory cell. The bit is restored as soon as the cell is read, irrespective of whether the cell contains instructions or data. The pie diagram in Figure 3a shows the failure modes observed when about 3000 faults (transient single bit-flips) were injected into the code segment of the standard synchronization component. About 50% of the errors were successfully detected by the microkernel's error detection mechanisms ("error status", "exception", "kernel debugger [KDB]"), while a hang ("system hang [SYSHANG]", "application hang [APPHANG]") occurred in 7.4% of the cases. Nevertheless, 9% of the errors led to an incorrect service ("application failure [APPFAIL]"). Finally, the "no observation [NOOBS]" category (29%) corresponds to errors that had no observable consequences although the injected faults were actually activated. Similar results can be obtained for other kernel components, such as the memory management module (Figure 3b) and the communication management module (Figure 3c).
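The percentages quoted above are obtained by grouping raw experiment outcomes into the categories of Figure 3. A minimal sketch of that bookkeeping is shown below, assuming each experiment's outcome has already been labelled with one of the category names used in the figure. The grouping of ERROR STATUS, EXCEPTION and KDB as "detected" and of SYSHANG and APPHANG as "hang" follows the text; the function itself is hypothetical and is not part of MAFALDA.

```python
from collections import Counter

DETECTED = {"ERROR STATUS", "EXCEPTION", "KDB"}  # caught by kernel mechanisms
HANGS = {"SYSHANG", "APPHANG"}

def summarize(outcomes):
    """Map each failure mode (plus 'detected' and 'hang' totals) to a percentage."""
    counts = Counter(outcomes)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no outcomes to summarize")
    pct = {mode: 100.0 * n / total for mode, n in counts.items()}
    pct["detected"] = sum(pct.get(mode, 0.0) for mode in DETECTED)
    pct["hang"] = sum(pct.get(mode, 0.0) for mode in HANGS)
    return pct
```

Keeping the raw per-mode percentages alongside the aggregated groups makes it possible to report both views, as the text does.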

Section 5.1 discusses how a tool like MAFALDA could be used to characterize the failure modes of a CORBA-based middleware implementation.

3.2 External faults at a linking interface

Another source of faults that can affect a component system is the linking interface, through which it is connected with other systems. This form of fault injection provides a means of measuring a system's robustness, its ability to function correctly in the presence of invalid inputs and stressful environmental conditions. Robustness testing involves injecting corrupted data at the linking interface of the system, and observing its behaviour.

In [Miller et al., 1990], the robustness of different implementations of standard UNIX utilities was measured by submitting them to randomly generated input. Despite using an extremely simple failure mode classification (crash or no crash), this fuzz testing showed that most implementations had quite high failure rates. The technique has also been applied to the robustness testing of POSIX-compliant operating systems in the context of the Ballista project [Koopman and DeVale, 1999]. This work consists of using invalid parameters in system calls, such as null pointers, or using an incorrect sequence of system calls. It is based on a library of corruption test cases, specialized for each data type. Figure 4 shows that all 15 targeted operating systems exhibited a large proportion of non-robust behaviours (e.g., between 18% and 34% of the tests led to an abort failure mode).

[Figure 4: Comparison of 15 POSIX-compliant operating systems. Normalized failure rate (%, 0 to 50) per system, broken down into Abort, Silent, Restart and Catastrophic (*) outcomes. Systems: AIX 4.1, FreeBSD 2.2.5, HP-UX 9.05, HP-UX 10.20, Irix 5.3, Irix 6.2, Linux, LynxOS, NetBSD, DUNIX 3.2, DUNIX 4.0D, QNX 4.22, QNX 4.24, SunOS 4.13, SunOS 5.5.]
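The essence of the Ballista approach is a pool of exceptional test values per data type, combined exhaustively over a function's parameter list. The sketch below illustrates that combination step. The value pools, the `run_robustness_tests` name and the outcome labels are hypothetical, and recording the exception name is only a rough in-process stand-in for Ballista's abort, restart and catastrophic categories, which require observing the whole process or machine.

```python
import itertools

# Hypothetical per-type pools of exceptional values (not Ballista's real sets).
TEST_VALUES = {
    "buffer": [None, b"", b"\x00" * 4096],
    "int": [0, -1, 1, 2**31 - 1, -(2**31)],
}

def run_robustness_tests(func, param_types):
    """Call `func` with every combination of test values for its parameter
    types; map each argument tuple to 'pass' or the raised exception's name."""
    results = {}
    pools = [TEST_VALUES[t] for t in param_types]
    for args in itertools.product(*pools):
        try:
            func(*args)
            outcome = "pass"
        except Exception as exc:  # record how the call failed, keep testing
            outcome = type(exc).__name__
        results[args] = outcome
    return results
```

For instance, testing a hypothetical slicing helper `target(buf, n)` that returns `buf[:n]` against these pools produces 15 calls, and every call with `buf=None` ends in a `TypeError`.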

The Ballista approach has also been applied to the robustness testing of a number of CORBA implementations [Pan et al., 2001], with respect to corrupted invocations of a portion of the client-side interface exposed by an ORB. For example, the object_to_string operation, which converts an object reference into a textual representation, is invoked with an invalid object reference, to see whether the ORB crashes, hangs or signals an exception.

Figure 5 presents results from this paper. It shows the breakdown of experimental outcomes for the different targets. ORB implementations from three different vendors were tested, in different versions and on different operating systems. These experiments show a high proportion of non-robust behaviour, such as thread hangs and crashes.

The fault model considered in this work only targets client-side operations. In particular, activity that involves interaction between a client and a server is not covered. Furthermore, the functionality exposed through the ORB's client-side interface, which was targeted in this research, is mainly used during the initialization of an application. Most of the functionality provided by an ORB is in fact implicit, in the sense that it is activated without any explicit calls made by the application level, and is thus difficult to target using this approach.

[Figure 5: Ballista project applied to ORB characterization. Percentage breakdown (0 to 100%) of outcomes for Orbix 3.0 (sun), Orbix2000 (sun), Orbix2000 (Linux), omniORB2.8 (sun), omniORB3.0 (sun), omniORB3.0 (Linux), VisiBroker3.3 (sun), VisiBroker4.0 (sun) and VisiBroker4.0 (Linux). Categories: thread behaviour (Thread Hang, Thread Abort) and exceptions (Unknown Exception, No Exception, CORBA::Exception, CORBA::SystemException).]

We are not aware of other characterization work on CORBA using fault injection. Other validation efforts have used a functional testing approach (such as the CORVAL project, which aims to test the functional correctness and the interoperability of ORB implementations) or concentrated on performance evaluation (e.g., [Nimmagadda et al., 1999]), without considering the presence of faults.

3.3 External faults at a local interface

A DSoS component system [Jones et al., 2001] may also communicate with its environment through one or more local interfaces, and may be subjected to faults arriving through them. Examples of local interfaces are a system's network stack, and interfaces over which it may be providing legacy services. The types of faults that may arrive over a local interface are protocol errors in communication with a remote system.

There has been work on fault injection for the characterization of the behaviour of the networking stacks in UNIX operating systems [Dawson et al., 1997]. In this work, faults such as message loss, delays and reordering were injected to assess the robustness of the protocol implementations. In [Labovitz et al., 1998], the stability of routing protocols used on the Internet is studied.
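Message loss, duplication and reordering of the kind injected in these studies can be emulated by a filter placed on the message stream. The sketch below applies each fault with an independent probability using a seeded random generator, so that a campaign can be replayed exactly. The function and its parameters are our own illustration, not the tool used in the cited work, and a real injector would operate on packets or GIOP messages in transit rather than on a Python list.

```python
import random

def perturb(messages, rng: random.Random, p_loss=0.0, p_dup=0.0, p_swap=0.0):
    """Return a copy of `messages` with loss, duplication and reordering applied."""
    out = []
    for msg in messages:
        if rng.random() < p_loss:
            continue              # message lost
        out.append(msg)
        if rng.random() < p_dup:
            out.append(msg)       # message duplicated
    # Reordering: swap each adjacent pair with probability p_swap.
    i = 0
    while i < len(out) - 1:
        if rng.random() < p_swap:
            out[i], out[i + 1] = out[i + 1], out[i]
            i += 2                # do not move a swapped message twice
        else:
            i += 1
    return out
```

Keeping the generator explicit, rather than using the global `random` state, is what makes a failing experiment reproducible from its seed.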


4 Fault pathology of CORBA-based systems

In this section we describe the architecture of a middleware-based system, identifying the places in the system where faults may occur, grouping these faults into classes, and classifying the types of failures that the system may exhibit.

4.1 Error confinement regions in a CORBA-based system

Middleware is generally seen as a layer of software that lies between the operating system and the application layer, as shown in Figure 2. This high-level view of an ORB is sufficient for most CORBA development. Indeed, the CORBA specifications are implementation-agnostic, and do not mandate any specific representation for CORBA objects, or require any particular form of interaction with the underlying operating system. For dependability analysis, however, more detailed knowledge of the architecture of a CORBA-based system is necessary, particularly with respect to the error confinement regions implied by the architecture.

CORBA services such as naming and the interface repository are generally implemented as daemons⁴. A service may run on a single computing node, or may involve the collaboration of multiple computing nodes (a federation of name servers, for example).

A CORBA ORB can be implemented in several different ways:

• kernel-based strategy, where the ORB is provided as a service of the operating system. This strategy can allow certain performance optimizations, since the operating system knows the location of objects, and can facilitate authentication of requests.

• daemon-based strategy, where ORB functionality is provided by one or more daemon processes, which mediate between clients and object implementations. For example, each computing node may run an activation daemon that is responsible for activating server processes and dispatching incoming requests, and for routing outgoing requests to the appropriate computing node. This implementation strategy facilitates centralized administration, since all CORBA processes are known to the activation daemons.

• application-resident strategy, where code implementing the ORB functionality runs in the same execution context as the client and the object implementations. The ORB is typically provided as a shared library that is linked with CORBA applications. This is the most common implementation strategy currently used on POSIX-like systems. In fact, even in the two previous implementation strategies, a certain amount of ORB functionality is hosted in each CORBA application, to deal with the language mapping.

⁴ daemon: standalone operating system process that runs in the background, providing some form of service.

The kernel-based implementation strategy protects the ORB from modification by faulty application programs, thanks to the operating system's memory protection mechanisms. However, the ORB service constitutes an error propagation channel for all applications on the same computing node. The daemon-based strategy introduces a single point of failure per computing node, since failure of the activation daemon will impact all application processes on that node. The degree of error confinement offered by the application-resident implementation strategy depends on the way in which CORBA objects are mapped onto the processes and threads provided by the underlying operating system. This mapping is (deliberately) left unspecified by the CORBA standards, and different deployment configurations are possible:

• a dedicated computing node for each CORBA object. In this case the only error propagation channel is through the network (and through calls to CORBA services). However, this configuration is unsuited to a system comprising a large number of lightweight objects.

• each CORBA object in a separate operating system process. This is a heavyweight solution when large numbers of objects are required, but provides good error confinement, since the crash of one object does not automatically cause the crash of other objects running on the same computing node.

• multiple CORBA objects per operating system process. This technique, which is called collocation, leads to several objects sharing the same address space. The crash of one object may cause the crash of all the collocated objects, so this choice clearly provides the least error confinement.
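The error confinement consequences of these deployment choices can be captured by a small model in which each object is mapped to the process hosting it. The hypothetical function below makes the pessimistic assumption stated above for collocation, namely that a crash takes down every object sharing the crashed object's process; it ignores propagation through the network or through CORBA services.

```python
def crash_impact(placement: dict, crashed: str) -> set:
    """Objects lost when `crashed` fails, given placement: object -> process.

    Pessimistic model: all objects collocated in the same process (same
    address space) are assumed to be lost with it.
    """
    process = placement[crashed]
    return {obj for obj, proc in placement.items() if proc == process}
```

With one object per process the impact set contains only the crashed object, whereas with collocation it grows to every object in the shared address space.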

4.2 A fault model for CORBA-based systems

In general, information on the types of failures experienced by a class of systems, and the rates at which they tend to occur, is obtained from field measurements. However, we are not aware of any such study for middleware-based systems. Consequently, we can only derive a list of the classes of faults that affect these systems through structural analysis, by studying the architecture of a typical system, examining the points where faults may arise, and how they can propagate through the system.

Figure 6 provides a more detailed view of the path taken by a remote method invocation, from the invoking object to the servant implementing the service. The figure shows that many different layers of software and hardware are traversed by the request; clearly, the impact of faults affecting each layer must be taken into account when considering the failure modes of a CORBA-based system.

[Figure 6: Details of the path of a CORBA request. Labels: client and servant processes, stub, DII, skeleton, DSI, POA, ORB core and operating system kernel on hosts A and B, connected by the network; layers: upperware, middleware, underware, hardware.]

The types of faults that can affect a CORBA-based distributed system can be classified as follows:

• physical faults affecting RAM or the processor's registers (so-called Single Event Upsets or soft errors [Ziegler and Srinivasan, 1996]). For example, a hardware fault may cause a bit to be flipped at one or more addresses in memory.

• software faults (design or programming errors) at the application, middleware and operating system levels. For instance, an application may pass a NULL pointer to the middleware, or the middleware may fail to check error codes returned by the operating system.

• “environmental” faults, such as the interruption of network connections and disk-full conditions.

• resource-management faults: “process aging” produces effects such as memory leaks (particularly common in CORBA applications), fragmentation effects, and exhaustion of resources such as file descriptors.

• communication faults, such as message loss, duplication, reordering or corruption. While this class of faults is widely assumed not to affect middleware that builds on a reliable network transport protocol, as is the case for CORBA's IIOP, recent research discussed below suggests that they deserve attention.


While a system of systems is subject to each of these fault classes, physical faults and software faults are less specific to the DSoS context than the last three fault classes. Consequently, our work has concentrated on studying environmental, resource-management and communication faults.

4.3 Failure modes of an ORB

In this section, we briefly analyze the ways in which an ORB may fail, and the impact of these types of failures on the system that builds on the middleware. We classify the failure modes of an ORB as follows:

• crash of a process or of a thread;
• hang of a process or of a thread;

• corruption of incoming and outgoing data;
• omission and duplication of messages;
• incorrect signaling of exceptions.
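In an experiment, these failure modes are typically identified by comparing each faulty run with a fault-free reference (golden) run of the same workload. A minimal sketch of such a classifier follows. The `Outcome` record and the labels are hypothetical, and message omission and duplication are not covered, because detecting them requires a message-level trace rather than the per-invocation outcome recorded here.

```python
from dataclasses import dataclass
from typing import Any, Optional

@dataclass
class Outcome:
    completed: bool                  # False if the invocation never returned
    process_alive: bool              # False if the hosting process died
    result: Any = None
    exception: Optional[str] = None  # name of the exception signaled, if any

def classify(observed: Outcome, golden: Outcome) -> str:
    """Classify one faulty run against the fault-free golden run."""
    if not observed.process_alive:
        return "crash"
    if not observed.completed:
        return "hang"
    if observed.exception != golden.exception:
        if observed.exception is None:
            return "missing exception"
        return "spurious or wrong exception"
    if observed.result != golden.result:
        return "data corruption"
    return "no failure observed"
```

The checks are ordered from the most to the least visible symptom, so that a crashed process is never misreported as a hang.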

The impact of these failure modes depends on the capacity of the system to detect the failure, and on the degree to which it can mask or recover from the failure. The most severe failure modes are those which are not detected by the system, and which therefore allow an error to propagate from the middleware to the application level.

As was noted in the previous section, the effect of a process or thread crash in terms of propagation depends on the choice of mapping between CORBA objects and execution entities. The time taken to detect a process crash or hang also depends on the system's configuration; in certain cases, a remote client may not detect the failure within a reasonable time span.
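Detecting a hang requires a timeout, and the choice of timeout bounds the detection latency discussed above; this is also the principle behind the application-level watchdogs recommended in [Chung et al., 1999]. The sketch below runs an invocation in a worker thread and reports a hang if it has not returned in time. It is illustrative only: the function name and return labels are assumptions, and because a hung Python thread cannot be killed, the worker is merely abandoned as a daemon thread.

```python
import threading

def call_with_watchdog(func, timeout_s: float, *args):
    """Run func(*args) in a worker thread; return ("ok", result),
    ("exception", exc) or ("hang", None) if it exceeds timeout_s seconds."""
    box = {}

    def worker():
        try:
            box["result"] = func(*args)
        except Exception as exc:
            box["exception"] = exc

    t = threading.Thread(target=worker, daemon=True)
    t.start()
    t.join(timeout_s)
    if t.is_alive():
        return ("hang", None)  # worker is abandoned, not terminated
    if "exception" in box:
        return ("exception", box["exception"])
    return ("ok", box["result"])
```

A short timeout lowers detection latency but risks reporting a slow yet correct invocation as a hang, so in practice the value has to be calibrated against the workload's normal response times.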

Concerning exception signaling: during a CORBA method invocation, the ORB on the client side is responsible for propagating exceptions that occurred on the server to the application level. On the server side, the ORB is responsible for propagating to the client any exceptions that occur during the processing of a request. If this signaling is incorrect, either because an ORB does not signal an exception when it should have, or because it has signaled a spurious condition, the system's fault tolerance mechanisms will not be activated correctly.


5 Method and experimental techniques

In this section we describe a method for the experimental characterization of the failure modes of a middleware implementation. This method is derived from the fault model presented in the previous section, and from an analysis of the feasibility of different forms of failure observation and fault injection techniques. We present a number of experimental fault injection techniques that can be used to assess the impact of different fault classes on a middleware implementation.

Several factors must be considered before launching a fault injection campaign:

• the fault model: which classes of errors to insert, where to insert them, and when? The injection may be triggered by the occurrence of an event of interest, or occur after a predetermined time period. The fault may be transient in nature (e.g., a single bit-flip), or permanent (e.g., a stuck-at fault).

• the observations: how to monitor the system's behaviour and classify the failure modes? It is important that all significant events be observed, which may be difficult in a distributed system.

• the workload: what operational profile or simulated system activity should be applied during the experiment? The workload is evidently very dependent on the target system. Different workloads may lead to slightly different results, since they cause different system activation patterns.

The rest of this section presents a number of approaches for fault injection in a CORBA environment, which simulate the different fault classes that were identified in Section 4.2. These fault injection techniques are classified according to the origin of the fault they simulate:

• internal faults, arising either from hardware or software faults, simulated respectively using memory bit flips and program mutation techniques;

• faults propagating from the application level, simulated using robustness testing and performance stress-testing;

• faults propagating from the underlying operating system, simulated using system call interposition techniques;

• faults arriving from the network, simulated using message corruption and reordering techniques.

5.1 Corruption of the memory space

This fault model consists of simulating the impact of faults affecting the memory subsystem of the host computer, in regions such as the RAM, the processor's registers, and its I/O controllers.


Two classes of faults can be injected:

• permanent faults, resulting from faulty memory components. These can be stuck-at-1 or stuck-at-0 faults.

• transient faults, resulting from Single Event Upsets caused, for example, by electromagnetic radiation. These faults are usually more difficult to detect than permanent faults.

The trigger for the fault injection can be temporal, in which case the fault is injected a certain number of seconds after the workload has been initialized, or spatial, in which case the fault is injected once the targeted memory word is accessed by the system.
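To make the two fault classes concrete, the following sketch applies a transient bit-flip and permanent stuck-at-0/stuck-at-1 faults to a single 32-bit word. The names `MemoryFault`, `apply_fault` and `StuckAtCell` are illustrative only; a real injector corrupts the target's actual memory rather than a copy of a value.

```cpp
#include <cassert>
#include <cstdint>

// Memory fault models from Section 5.1, applied to one 32-bit word.
enum class MemoryFault { BitFlip, StuckAt0, StuckAt1 };

// Returns the value read from `word` when `fault` affects bit `bit`.
std::uint32_t apply_fault(std::uint32_t word, unsigned bit, MemoryFault fault) {
    assert(bit < 32);
    const std::uint32_t mask = std::uint32_t{1} << bit;
    switch (fault) {
        case MemoryFault::BitFlip:  return word ^ mask;   // transient upset
        case MemoryFault::StuckAt0: return word & ~mask;  // bit forced to 0
        case MemoryFault::StuckAt1: return word | mask;   // bit forced to 1
    }
    return word;  // not reached for valid enumerators
}

// A permanent (stuck-at) fault corrupts every read, including reads of
// values written after the fault appeared, whereas a transient flip is
// overwritten by the next write to the word.
struct StuckAtCell {
    std::uint32_t stored;
    unsigned bit;
    MemoryFault fault;  // StuckAt0 or StuckAt1
    std::uint32_t read() const { return apply_fault(stored, bit, fault); }
    void write(std::uint32_t value) { stored = value; }
};
```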

The memory areas that can be targeted depend on the ORB's implementation strategy. Characterization of a kernel-based ORB requires injections into the kernel's address space, as well as the address spaces of CORBA applications. Characterization of a daemon-based ORB will involve corruption of the memory space of the activation daemon. For an application-resident ORB, different zones of the process's address space can be targeted (see Figure 7):

• the application stack and heap;

• the private code of the stubs and skeletons, which is linked with the process;

• the stack associated with the shared library;

• the code (text zone) of the shared library.

Figure 7: Memory mappings in a CORBA system with an implementation-resident ORB (client: stack and heap, client code, stub, DII/DSI; server: stack and heap, server code, skeleton, DII/DSI; shared: code of ORB core, code of system libraries; operating system kernel)


While the first three sections of the address space are private to each process, the code of the shared library is (on modern operating systems) shared between all processes using the same ORB on that computing node; since it is read-only, it can be shared safely. This means that there is a potential error propagation channel through the ORB via corruption of the shared library's code (indeed, the same is true of other system libraries, which are shared by all processes).

Information on the address ranges corresponding to each zone can be obtained from the operating system, for example by using the /proc filesystem on Linux. As with the other fault injection approaches, this technique requires a workload application and failure observers. Once these elements are set up, the experimental campaigns are similar to those targeting other executive software components, such as operating system kernels. Indeed, existing tools such as MAFALDA (see Section 3.1) can be used to conduct the experiments, and measurements of factors such as exception classes and error detection latencies can be obtained.
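As an illustration of how these address ranges might be obtained on Linux, the sketch below parses one line of `/proc/<pid>/maps` (fields: `start-end perms offset dev inode [pathname]`) and flags executable, file-backed mappings of shared objects as candidate text zones. The helper names are hypothetical, and matching on `.so` in the path is a simplifying assumption; other systems, including the Solaris hosts used in Section 6, expose this information differently.

```cpp
#include <cstdint>
#include <istream>
#include <optional>
#include <sstream>
#include <string>

// One mapped region of a process's address space.
struct Region {
    std::uint64_t start = 0, end = 0;
    std::string perms;     // e.g. "r-xp"
    std::string pathname;  // empty for anonymous mappings
};

// Parses a line such as
//   7f2c4a1e2000-7f2c4a3a9000 r-xp 00000000 08:01 1048602 /usr/lib/libX.so
std::optional<Region> parse_maps_line(const std::string& line) {
    std::istringstream in(line);
    std::string range, offset, dev, inode;
    Region r;
    if (!(in >> range >> r.perms >> offset >> dev >> inode)) return std::nullopt;
    const auto dash = range.find('-');
    if (dash == std::string::npos) return std::nullopt;
    r.start = std::stoull(range.substr(0, dash), nullptr, 16);
    r.end = std::stoull(range.substr(dash + 1), nullptr, 16);
    in >> std::ws;
    std::getline(in, r.pathname);  // the path may contain spaces
    return r;
}

// Executable mapping of a shared object: a candidate "text zone" target.
bool is_shared_text(const Region& r) {
    return r.perms.size() >= 3 && r.perms[2] == 'x' &&
           r.pathname.find(".so") != std::string::npos;
}
```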

5.2 Program mutation techniques

This characterization technique investigates the effect of software faults. It consists of artificially inserting bugs into the source code of the program, and observing the behaviour of the modified candidate (called a mutant). Previous work [Daran and Thévenod-Fosse, 1996] has shown that program mutation induces errors which are similar in nature to the errors produced by real programming faults.

The early focus of work using this technique was testing, where mutation is used to measure the adequacy of a set of test cases. More recent work [Voas and McGraw, 1997] has investigated program mutation as a characterization technique. This is closer to our aim, which is to identify the types of errors and failures which can be caused by software faults, and to investigate the degree to which they are detected by the system's error detection mechanisms.

Most work in the literature consists of injecting faults which change the value of a literal constant, or the type of an operator (for example changing a + operator into a -, or reversing the comparison operator in a conditional statement) [Daran and Thévenod-Fosse, 1996]. Other mutations include replacing the name of a variable or a function by another variable or function. More recent work has targeted object-oriented mutation operators [Chevalley and Thévenod-Fosse, 2001], such as changes between deep and shallow equality comparisons.
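The two classic operators described above can be sketched on a tokenized statement: the code below produces one first-order mutant per arithmetic or relational operator, swapping `+` and `-` and replacing each comparison by its negation. This is a toy illustration with hypothetical function names, not the tool chain of [Daran and Thévenod-Fosse, 1996]; a real mutation tool works on a parsed program rather than a token list.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Arithmetic (+ <-> -) and relational (< <-> >=, > <-> <=, == <-> !=)
// operator replacement; any other token is returned unchanged.
std::string mutate_token(const std::string& tok) {
    if (tok == "+") return "-";
    if (tok == "-") return "+";
    if (tok == "<") return ">=";
    if (tok == ">=") return "<";
    if (tok == ">") return "<=";
    if (tok == "<=") return ">";
    if (tok == "==") return "!=";
    if (tok == "!=") return "==";
    return tok;
}

// Each mutant differs from the original statement in exactly one token.
std::vector<std::vector<std::string>> first_order_mutants(
        const std::vector<std::string>& tokens) {
    std::vector<std::vector<std::string>> mutants;
    for (std::size_t i = 0; i < tokens.size(); ++i) {
        const std::string replacement = mutate_token(tokens[i]);
        if (replacement != tokens[i]) {
            auto mutant = tokens;
            mutant[i] = replacement;
            mutants.push_back(std::move(mutant));
        }
    }
    return mutants;
}
```

Each mutant would then be compiled in place of the original and run under the workload, one mutant per experiment.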

In a middleware-based system, mutation can be applied to a number of different targets:

• the IDL interface itself. For example, a parameter passed using the in convention could be changed to inout. Mutation could also be applied to the data structure definitions, removing an attribute in a record definition, or replacing an unbounded sequence by a fixed-length sequence.

• the stubs and skeletons automatically generated by the IDL compiler. This location simulates faults in the CORBA tool chain. For example, two parameters in a method could be exchanged before being sent over the network. If the types of the parameters are incompatible, this should be detected at compile time.

• the source code of the shared library implementing the ORB. This location evaluates the effect of residual software faults in the middleware itself. Examining the modifications made in successive releases of a particular implementation, to determine the types of bugs that were corrected, could provide input for building a model of these faults.

• the application code. This simulates faults made by the application programmer. Classic fault models such as ODC [Sullivan and Chillarege, 1991], which include initialization faults and corruption of pointers, could be used.

Just as the use of an object-oriented programming language introduces new classes of software faults, which are simulated by object-oriented mutation operators, it would be interesting to identify a number of CORBA-oriented mutation operators which could simulate software faults specific to the use of a CORBA ORB. For instance, memory management is notoriously tricky in a CORBA context when using lower-level programming languages that do not provide automatic storage management (such as C and C++), and would therefore be a promising target for mutation operators. Other mutation operators could involve the use of object references.

Unfortunately, this work is necessarily programming-language specific. While the most commonly used ORBs are implemented in C++, some are implemented in other programming languages. Applying the same work to these ORBs would require porting the program mutation tool chain; the effort required would depend on the extent to which the mutation operators are specific to a particular programming language.

5.3 Robustness testing

The robustness of a system is a measure of its ability to function correctly in the presence of invalid inputs and stressful environmental conditions. Robustness testing involves injecting corrupted data at the external interface of the system, and observing its behaviour.

Robustness testing requires that the system under test present an explicit interface, which will be targeted by the fault injector. This is problematic in the case of an ORB, since most of the functionality provided by an ORB is implicit. Indeed, the explicit functionality provided by an ORB is limited to the following:


• ORB initialization: passing environment information to the ORB library and obtaining bootstrap references for the ORB and services;

• POA⁵ management (on the server side): methods allowing servants to register themselves with the object adapter, and to control their lifecycle;

• policy management: dynamic changes to the way aspects such as the concurrency model, or access control, are handled;

• the conversion of object references to and from a textual representation;

• utility procedures for the creation of certain data types and lists of values.

The work reported in [Pan et al., 2001] on the robustness testing of ORBs targeted about 20 operations in this interface. These operations constitute only a relatively small portion of the functionality provided by an ORB, and are primarily used during the initialization of an application. Indeed, most of the functionality provided by an ORB is implicit, rather than resulting from explicit calls to a public interface. Consider for example a basic CORBA method invocation in a C++ program:

    result = theObject->theMethod("argument1", 42);

The variable theObject refers to an instance of a class that extends classes provided by the ORB implementation. The ORB identifies the computing node on which the object is running, connects to a given port on that machine, serializes the parameters of the call to a standard format, and sends them over the connection. It then waits for the server's response, and either deserializes the reply into the variable result or raises a C++ exception.
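To illustrate what serializing the parameters "to a standard format" involves, the sketch below marshals the arguments of a call such as `theMethod("argument1", 42)` in a simplified form of CORBA's Common Data Representation (CDR): big-endian, 4-byte alignment for `long`, and strings sent as a length (including the terminating NUL) followed by the characters. The class name is hypothetical, only two IDL types are covered, and alignment is computed from the start of this buffer, whereas a real ORB computes it relative to the start of the GIOP message.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

class CdrWriter {
public:
    void put_long(std::int32_t v) { put_ulong(static_cast<std::uint32_t>(v)); }

    void put_ulong(std::uint32_t v) {
        align(4);  // 4-byte integers start on a 4-byte boundary
        for (int shift = 24; shift >= 0; shift -= 8)
            buf_.push_back(static_cast<std::uint8_t>(v >> shift));  // big-endian
    }

    // CDR string: ulong length including the NUL, then the bytes and the NUL.
    void put_string(const std::string& s) {
        put_ulong(static_cast<std::uint32_t>(s.size() + 1));
        buf_.insert(buf_.end(), s.begin(), s.end());
        buf_.push_back(0);
    }

    const std::vector<std::uint8_t>& bytes() const { return buf_; }

private:
    void align(std::size_t n) {
        while (buf_.size() % n != 0) buf_.push_back(0);  // padding octets
    }
    std::vector<std::uint8_t> buf_;
};
```

For the call above the buffer holds 20 bytes: the length 10, the nine characters and their NUL, two padding octets, and the value 42. It is exactly this kind of encoding, invisible to the application programmer, that the robustness tests must exercise indirectly.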

All this activity is transparent to the application programmer, since the call is syntactically identical to a standard method invocation on a local, non-CORBA object. Given that this functionality is not exposed via an explicit interface, the standard robustness testing approach cannot be applied.

This implicit functionality provided by the ORB can be broken down into a number of categories:

• interaction with the application programming language: implementing marshalling and demarshalling code, handling object creation and destruction, exception handling;

• network-related processing: resolving the addresses of hosts, establishing network connections, sending and receiving information from remote hosts;

• handling concurrency according to the requested policy, in cooperation with the operating system;

• resource management: allocating and freeing buffers, etc.

⁵ POA: Portable Object Adapter, responsible for dispatching incoming method invocations to the correct servant, and for controlling the lifecycle of servants.

We would like to apply robustness evaluation techniques to these classes of functionality. Since they are not accessible through a standard interface, we propose to generate synthesized interfaces that can serve as targets for fault injection. The purpose of these synthesized interfaces is to provide a way of activating the implicit functionality provided by an ORB. Ideally, we would like to be able to activate each class of implicit functionality individually, to obtain detailed failure mode information. However, it is not possible to isolate certain functional classes from the others: almost all interactions with an ORB will make use of the marshalling and networking functionality, for instance.

The following requirements should be satisfied by the synthesized interfaces and the corresponding service implementation:

• they should use all the different data types that can be defined in OMG IDL, including compound data types such as structures;

• they should include operations with arguments and return values that cover the possible combinations of these data types, including the different argument passing conventions (in, out and inout);

• given the large number of test cases implied by the two preceding requirements, the injection code targeting these synthesized interfaces should be generated automatically. Likewise, it should be possible to generate a workload application automatically for a given interface (clearly, this will severely limit the semantic level of the services which we can target);

• the service implementation should be deterministic, so that failure of the service can be detected automatically at the application level;

• the service should be dependent on the history of previous invocations (i.e., it should not be stateless). If the service depends on some internal state, there is a greater probability of faults propagating to the interface than if the service were stateless.

These requirements can be met by a delayed echo service, consisting of operations that take any number and type of arguments, and return the arguments supplied by the previous call to the service. This service can be implemented for arbitrary method signatures, is deterministic, and is not stateless.
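The delayed echo service can be sketched as a small class template: each call stores its arguments and returns those of the previous call. In a real campaign the signature would come from a generated IDL interface and the servant would be produced by the IDL compiler; here the signature is a template parameter pack, which is an assumption of this sketch, and `DelayedEcho` is a hypothetical name.

```cpp
#include <optional>
#include <tuple>
#include <utility>

// Deterministic yet stateful: a workload that remembers its own previous
// call can check every reply without any other knowledge of the service.
template <typename... Args>
class DelayedEcho {
public:
    // Returns std::nullopt on the first call, when there is no history yet.
    std::optional<std::tuple<Args...>> call(Args... args) {
        auto previous = std::move(last_);
        last_ = std::make_tuple(std::move(args)...);
        return previous;
    }

private:
    std::optional<std::tuple<Args...>> last_;
};
```

A workload checking reply n against the arguments of call n-1 detects any corruption that reaches the service's internal state, which is why the stateful design is preferred over a plain echo.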

A fault injection campaign using this approach consists of the following steps:


• generate an interface with some combination of data type definitions and operation signatures. Since the set of possible interfaces is infinite, the generation process will probably be random, possibly weighted.

• generate corresponding implementations for the service, the workload, the fault injector, and the fault observer.

• invoke the service with corrupted parameter values, and observe the service's behaviour.
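The first step might look like the following sketch, which draws a random IDL operation signature with weighted type choices and uniformly chosen passing modes. The type list, the weights and the function name are assumptions for illustration. Note that `std::mt19937` produces the same sequence everywhere, but `std::discrete_distribution` is implementation-defined, so the same seed may give different signatures with different standard libraries.

```cpp
#include <random>
#include <string>
#include <vector>

// Draws a synthesized operation such as
//   void echo(in long p0, inout string p1);
// for use with the delayed echo service.
std::string random_idl_operation(std::mt19937& rng, const std::string& name,
                                 int param_count) {
    static const std::vector<std::string> types = {
        "short", "long", "unsigned long", "double",
        "boolean", "char", "octet", "string"};
    static const std::vector<double> weights = {1, 3, 1, 1, 2, 1, 1, 3};
    static const char* const modes[] = {"in", "out", "inout"};

    std::discrete_distribution<int> pick_type(weights.begin(), weights.end());
    std::uniform_int_distribution<int> pick_mode(0, 2);

    std::string op = "void " + name + "(";
    for (int i = 0; i < param_count; ++i) {
        if (i > 0) op += ", ";
        op += modes[pick_mode(rng)];
        op += ' ';
        op += types[pick_type(rng)];
        op += " p" + std::to_string(i);
    }
    return op + ");";
}
```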

The generation of parameters for the invocations of the service is a well-known problem in functional testing. There are a number of possible techniques, including statistical generation [Thévenod-Fosse et al., 1991]. The corruption of these parameters is necessarily dependent on their type, and on the programming language mapping. Many OMG IDL types are "incorruptible", in the sense that all the bit sequences that can be represented in memory have a valid representation in the given type: every 32-bit pattern is a valid long, for example. Certain types, however, such as boolean or enumerations, have a restricted domain, and can thus be subjected to out-of-range corruption.
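Out-of-range corruption can be sketched for two restricted types. In CDR a boolean is a single octet that must be 0 (FALSE) or 1 (TRUE), and an enum is encoded as an unsigned long numbered from zero in declaration order, so any value at or beyond the number of enumerators is invalid. The function names are hypothetical; the sketch only produces the corrupted encodings, which an injector would then place into a marshalled request.

```cpp
#include <cstdint>
#include <limits>
#include <random>

bool valid_cdr_boolean(std::uint8_t octet) { return octet <= 1; }

bool valid_cdr_enum(std::uint32_t value, std::uint32_t enumerator_count) {
    return value < enumerator_count;
}

// Draws an octet that no valid boolean encoding can take.
std::uint8_t corrupt_boolean(std::mt19937& rng) {
    std::uniform_int_distribution<int> octet(2, 255);
    return static_cast<std::uint8_t>(octet(rng));
}

// Draws an enum encoding at or beyond the number of enumerators.
// Precondition: enumerator_count >= 1 (IDL enums are never empty).
std::uint32_t corrupt_enum(std::mt19937& rng, std::uint32_t enumerator_count) {
    std::uniform_int_distribution<std::uint32_t> value(
        enumerator_count, std::numeric_limits<std::uint32_t>::max());
    return value(rng);
}
```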

The interfaces, service implementations and workload described above can be reused for a number of the fault injection techniques described in the following sections.

5.3.1 Performance stress-testing

Another form of robustness testing is performance stress-testing, where the unexpected inputs to the system consist of an unusually intense activity of the workload. These performance tests evaluate the scalability of the service, in terms of the average response time and jitter, as a function of the number of incoming requests per second, and also as a function of the complexity of the request. This approach is particularly well suited to the characterization of CORBA service implementations, since their level of performance can affect the whole system of systems, and timely responses may be critical for services such as Notification⁶.
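The two metrics can be computed from the response times recorded at each load level, as in the sketch below. The report does not define jitter precisely; this sketch assumes the population standard deviation of the response times, and other definitions (such as the mean difference between consecutive samples) are also in common use.

```cpp
#include <cmath>
#include <vector>

struct ResponseStats {
    double mean_ms;
    double jitter_ms;  // population standard deviation (an assumption)
};

ResponseStats response_stats(const std::vector<double>& samples_ms) {
    if (samples_ms.empty()) return {0.0, 0.0};
    double sum = 0.0;
    for (double s : samples_ms) sum += s;
    const double mean = sum / samples_ms.size();
    double squares = 0.0;
    for (double s : samples_ms) squares += (s - mean) * (s - mean);
    return {mean, std::sqrt(squares / samples_ms.size())};
}
```

Plotting both values against the request rate (and against request complexity) gives the scalability curves described above.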

5.4 Syscall interposition techniques

This fault model investigates fault propagation to the middleware from the operating system kernel and system libraries. A failure at the operating system level may result from various types of faults, both hardware and software. The middleware layer depends on services provided by the operating system kernel, such as networking, scheduling of threads, and stable storage provided through the file system.

⁶ The CORBA Notification Service provides a publish/subscribe infrastructure that mediates between event producers and event consumers, according to certain Quality of Service policies.

There are a number of ways for an error to propagate from the operating system to the middleware⁷:

• returning an error code from a system call. Indeed, most system calls return a status code indicating whether the requested operation completed successfully. If not, the application can read an integer code (errno) that indicates the reason for the failure.

• signalling an exception: the program's execution is interrupted by the arrival of a signal. If the application has registered a handler for this signal, it has the opportunity to run the related code; otherwise the signal's default action is taken, which for most signals terminates the application. For example, the ALRM signal is used to notify an application that a timeout has expired.

• taking too long to complete a certain system call (for hard real-time applications).

• corrupting data during input/output operations, for example while reading from and writing to stable storage.

• failing to inform the application that some event has occurred. For instance, applications can use the select system call to sleep until activity is detected on a set of file descriptors. If the operating system does not wake up the application, it will not handle incoming messages.

A robust middleware implementation should be able to handle (certain classes of) failures of the operating system gracefully. In many cases, this would involve signalling an exception to the application level, to allow any error recovery mechanisms to be executed. The response to certain types of exceptional conditions is specified by the CORBA standard. For example, a NO_MEMORY CORBA exception must be used to signal a problem with dynamic memory allocation, and a PERSIST_STORE exception to signal a problem with persistent storage on the server.
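A middleware's translation policy could be written as a small table from the failed service and errno value to a CORBA system exception. In the sketch below, only the NO_MEMORY and PERSIST_STORE cases follow the rules cited above; the network and fallback choices (TRANSIENT, COMM_FAILURE, INTERNAL) are assumptions chosen to illustrate how a more specific exception gives the application more to base its recovery on. The enum and function names are hypothetical.

```cpp
#include <cerrno>
#include <string>

// Which OS service the failed system call belonged to.
enum class SyscallKind { Memory, Storage, Network, Other };

std::string corba_exception_for(SyscallKind kind, int err) {
    if (err == ENOMEM) return "NO_MEMORY";
    switch (kind) {
        case SyscallKind::Memory:
            return "NO_MEMORY";
        case SyscallKind::Storage:
            return "PERSIST_STORE";  // e.g. write() on a log file failed with EIO
        case SyscallKind::Network:
            if (err == ECONNREFUSED || err == EAGAIN)
                return "TRANSIENT";  // retrying later may succeed
            return "COMM_FAILURE";   // e.g. ECONNRESET: connection lost
        case SyscallKind::Other:
            break;
    }
    return "INTERNAL";  // still more specific than UNKNOWN
}
```

A syscall interposition campaign checks, for each injected error code, whether the exception actually observed at the application level matches the one such a policy (or the CORBA specification) calls for.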

A campaign using this fault injection technique consists of observing the effects of these unexpected operating system behaviours at the middleware level. The experimental testbed includes a system call interposition layer, which is able to intercept a given system call made by the middleware. Instead of propagating this call to the operating system kernel, the interposition layer returns an error code to the middleware, possibly after a certain delay. The behaviour of the middleware is then observed, both from the application level (is an exception raised, or is the fault masked?) and from the operating system level, to see whether the system call is repeated (providing information on the middleware's error recovery mechanisms).
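On POSIX systems that support `LD_PRELOAD` (including Linux and Solaris), one way to build such a layer is to define a function with the same signature as the C library's system call wrapper, so that the dynamic linker binds the middleware's calls to it, and to forward non-injected calls with `dlsym(RTLD_NEXT, ...)`. The sketch below does this for `connect()`; the `injector` variables are hypothetical stand-ins for the campaign configuration. Note that this intercepts calls made through the C library, not direct kernel entries, and that on older glibc versions the program must be linked with `-ldl`.

```cpp
#include <cerrno>
#include <dlfcn.h>       // dlsym, RTLD_NEXT (requires _GNU_SOURCE on glibc)
#include <sys/socket.h>

namespace injector {
int fail_on_call = 0;               // 1-based index of the call to fail; 0 = off
int injected_errno = ECONNREFUSED;  // error returned instead of calling the kernel
int calls_seen = 0;
}  // namespace injector

// Same signature as the C library's connect(), so it is found first when
// this code is preloaded into the middleware process.
extern "C" int connect(int fd, const struct sockaddr* addr, socklen_t len) {
    using connect_fn = int (*)(int, const struct sockaddr*, socklen_t);
    static const connect_fn real_connect =
        reinterpret_cast<connect_fn>(dlsym(RTLD_NEXT, "connect"));

    if (++injector::calls_seen == injector::fail_on_call) {
        errno = injector::injected_errno;  // simulated kernel failure
        return -1;
    }
    if (real_connect == nullptr) {  // no underlying definition found
        errno = ENOSYS;
        return -1;
    }
    return real_connect(fd, addr, len);
}
```

Built as a shared object (for example with `g++ -shared -fPIC`) and loaded with `LD_PRELOAD` into the process hosting the ORB, the wrapper sees every `connect()` the middleware makes; adding a delay before returning would simulate the slow-syscall case.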

⁷ In the following we use terminology from the POSIX standard, though similar concepts exist in most modern operating systems.

The fault injection campaign could be run randomly, by arbitrarily selecting the targeted system call for each run. However, more interesting analysis of the experimental outcomes can be obtained by targeting specific system calls, when the middleware is in a known state. Using this approach, it is possible to determine what activity the middleware was involved in when the fault was injected, and to examine the corresponding source code to isolate portions of code that could be made more robust.

A particularly common activity in CORBA middleware is the exchange of a number of messages with a remote host. This activity results in a certain trace of system calls, which is represented in Figure 8.

Figure 8: System call graph for a network communication (address resolution: dnsfd = socket(...), send(dnsfd, symbolic-address), recvfrom(dnsfd, ...), close(dnsfd); then fd = socket(...), connect(fd, ...), setsockopt(fd, ...), fcntl(fd, ...), send(fd, ...), recv(fd, ...), close(fd))

The initial part of the activity resolves the target host's symbolic name into a numerical network address. The middleware then creates a communication socket, which it accesses via a numerical descriptor, connects it to this address, and optionally sets various flags on the socket. It then sends and receives messages using this descriptor, and finally closes it.

In order to use the syscall interposition technique to target specific middleware activities, the following information must be available:

• a description of the operating system's service interface, listing the signature of each system call (including the types of its parameters and the return value) and the meaning of each of the error codes which can be generated by that system call. This information is available in the operating system's programming manual.


• a list of activity graphs for the target ORB implementation. The system call traces can be generated using a tool such as truss⁸.

• a workload and failure observer. Those developed for the robustness testing approach (see Section 5.3) can be reused in this context.

A fault injection campaign for this fault model involves targeting each activity graph. For each activity graph, a given sequence of system calls is selected. Then one system call within this sequence is selected for corruption. For this system call, one of the possible failure modes is selected. A syscall interposition layer is generated for that syscall and fault activation sequence. The interposition layer will detect the targeted sequence of syscalls, which becomes the trigger for the injection.
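The trigger logic of a generated interposition layer can be sketched as a small state machine over a path of an activity graph such as the one in Figure 8: it fires on the last call of the path once the preceding calls have been seen in order. The sketch assumes that unrelated calls may be interleaved and are ignored, and that the trigger fires at most once per experiment; `SequenceTrigger` is a hypothetical name.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

class SequenceTrigger {
public:
    explicit SequenceTrigger(std::vector<std::string> path) : path_(std::move(path)) {}

    // Called for every intercepted syscall; returns true if this call is
    // the one to corrupt (the last element of the path).
    bool on_syscall(const std::string& name) {
        if (fired_ || path_.empty() || name != path_[next_]) return false;
        if (++next_ < path_.size()) return false;  // prefix matched so far
        fired_ = true;
        return true;
    }

private:
    std::vector<std::string> path_;
    std::size_t next_ = 0;
    bool fired_ = false;
};
```

For example, a trigger built from the path socket, connect, send fails the first send() that follows a connect() on the traced activity, leaving earlier DNS traffic untouched only if that traffic does not complete the path first, which is why paths should be chosen to be specific enough.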

5.5 Network-level faults

This fault model consists of simulating the effect of faults affecting the communication subsystem. This approach is particularly interesting in a DSoS context, since it provides information on the way in which errors may propagate between component systems through the communication infrastructure.

This approach investigates the impact of corrupt method invocations arriving over the network. It consists of sending a corrupted request to the target, and observing its behaviour. This fault model simulates three different classes of faults:

• transient physical faults in the communication subsystem, resulting for example from faulty memory banks in routers, or faulty DMA transfers with the network interface card. Network corruption, even over reliable transport protocols such as TCP (on which IIOP is based), is more frequent than is commonly assumed. Based on analysis of traffic traces on a LAN and the Internet, [Stone and Partridge, 2000] report that approximately one packet in 32000 fails the TCP checksum, and that between one in a few million and one in 10 billion packets are delivered corrupted to the application level. This is because the 16-bit checksum used in TCP is not able to detect certain errors. While this proportion is very small, it is non-negligible given the high capacity of modern LANs.

• propagation to the target of a fault that occurred on a remote computing node interacting with the target. The fault may have affected the remote operating system kernel, its protocol stack implementation, or the remote ORB, leading to the emission of a corrupted request.

• malicious faults, such as denial of service attacks against the target. Given the pivotal role of the name service in most CORBA-based systems, an attacker who can crash the service may be able to cause the entire system to fail. We note, however, that most CORBA systems will be deployed on private networks where all parties can be assumed to be trustworthy.

⁸ See Figure 14 for an example of the type of information provided by this tool.

The types of errors that could be investigated include single bit flips and the zeroing of successive bytes in a message. These are among the most common patterns of corruption identified in [Stone and Partridge, 2000], and we assume that they are representative of error propagation from remote nodes.
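The weakness of TCP's 16-bit checksum noted above can be illustrated with the Internet checksum algorithm of RFC 1071: because ones'-complement addition is commutative, swapping two aligned 16-bit words leaves the checksum unchanged, so such a corruption passes undetected, whereas a single bit flip always changes it. This is one class of undetected error, not a model of all the cases measured by Stone and Partridge.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// 16-bit ones'-complement Internet checksum (RFC 1071), as used by TCP.
std::uint16_t internet_checksum(const std::vector<std::uint8_t>& data) {
    std::uint64_t sum = 0;
    for (std::size_t i = 0; i + 1 < data.size(); i += 2)
        sum += (static_cast<std::uint64_t>(data[i]) << 8) | data[i + 1];
    if (data.size() % 2 != 0)
        sum += static_cast<std::uint64_t>(data.back()) << 8;  // pad the odd byte with zero
    while (sum >> 16)
        sum = (sum & 0xFFFF) + (sum >> 16);  // fold carries back into 16 bits
    return static_cast<std::uint16_t>(~sum);
}
```

For the six bytes `47 49 4F 50 01 00` ("GIOP" followed by version 1.0) and the same bytes with the first two 16-bit words swapped, the function returns the same value.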

There are several possible means of injecting these faults. We could use dedicated network hardware, but this is cumbersome and expensive. Using software-implemented fault injection, faults could be injected at the transport protocol layer (for example by instrumenting the operating system's network stack, as in [Dawson and Jahanian, 1995]). However, there is a very high probability that this form of corruption is detected by the remote host's network stack, and therefore not delivered to the middleware. Consequently, it is more efficient to inject the fault at the application level (see Figure 9), before the data is encapsulated by the transport layer. These experiments simulate the proportion of corrupt packets that TCP incorrectly delivers as being valid.
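Injecting at the application level amounts to corrupting the marshalled request before it is written to the socket, so that TCP computes a valid checksum over the already-corrupted bytes. The sketch below implements the two corruption patterns on a byte buffer; the bit-numbering convention (most significant bit of the first byte is bit 0) and the function names are assumptions of the sketch. With one experiment per bit position, a campaign on an n-byte request runs 8n experiments.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Flips bit `bit` of the message, counting from the most significant bit
// of the first byte.
std::vector<std::uint8_t> flip_bit(std::vector<std::uint8_t> msg, std::size_t bit) {
    assert(bit < msg.size() * 8);
    msg[bit / 8] ^= static_cast<std::uint8_t>(0x80u >> (bit % 8));
    return msg;
}

// Zeroes `count` successive bytes starting at `first` (clipped to the end).
std::vector<std::uint8_t> zero_bytes(std::vector<std::uint8_t> msg,
                                     std::size_t first, std::size_t count) {
    for (std::size_t i = first; i < first + count && i < msg.size(); ++i)
        msg[i] = 0;
    return msg;
}

// One bit-flip experiment per bit position of the request.
std::size_t bitflip_experiments(const std::vector<std::uint8_t>& msg) {
    return msg.size() * 8;
}
```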

Figure 9: Protocol levels in CORBA (OSI model: Application, Presentation, Session, Transport, Network, Data link, Physical; CORBA model: Application, GIOP, IIOP, TCP, IP, Ethernet, Hardware)

5.5.1 Network protocol faults

As well as considering faults that corrupt the data contained in incoming messages, it would be interesting to consider the impact of higher-level faults, which affect the semantics of the message rather than its syntactic content.

We do not investigate protocol faults that occur at the transport level of the network protocol stack, since these will be handled by the operating system's networking stack rather than by the middleware. Instead, we concentrate on protocol faults at the level of GIOP, and more specifically IIOP, its mapping onto TCP (see Figure 9). The types of unexpected conditions to which we can expose a target implementation include:

• The reception of unexpected GIOP messages. For example, GIOP specifies a LocateReply message type, which is sent in response to a LocateRequest message. An injected fault could consist of sending a LocateReply message to the target without it having emitted a corresponding LocateRequest message.

• The reception of GIOP messages containing unexpected request-ids. Most GIOP messages, including requests and replies, contain a request-id, which is a numerical identifier for the request. This request-id is then used in the response, to associate the response with its request. The target could be sent dummy responses containing request-ids that it did not send. Additionally, the effect of request-id duplication could be studied (the receiving ORB should drop messages containing request-ids that it has already handled).

• The reception of GIOP messages containing unusual service contexts. The service context is used by the ORB to carry connection-related information (such as a session key, or the identifier of the character set negotiated upon establishing the network connection).
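As a sketch of the first condition, an injector could build an unsolicited GIOP 1.0 LocateReply by hand. The 12-byte GIOP header consists of the magic "GIOP", the protocol version (major, minor), a byte-order flag (0 for big-endian), the message type, and the body size as an unsigned long; a 1.0 LocateReply body holds a request_id and a locate_status. The `giop` namespace and function names are hypothetical, and the request_id is simply whatever the injector chooses, so it need not match any LocateRequest the target has sent.

```cpp
#include <cstdint>
#include <vector>

namespace giop {

// GIOP 1.0 message types (the type octet of the header).
enum MsgType : std::uint8_t {
    Request = 0, Reply = 1, CancelRequest = 2, LocateRequest = 3,
    LocateReply = 4, CloseConnection = 5, MessageError = 6
};

enum LocateStatus : std::uint32_t {
    UNKNOWN_OBJECT = 0, OBJECT_HERE = 1, OBJECT_FORWARD = 2
};

inline void put_ulong_be(std::vector<std::uint8_t>& out, std::uint32_t v) {
    for (int shift = 24; shift >= 0; shift -= 8)
        out.push_back(static_cast<std::uint8_t>(v >> shift));
}

inline std::vector<std::uint8_t> unsolicited_locate_reply(std::uint32_t request_id,
                                                         LocateStatus status) {
    // magic, version 1.0, byte_order = big-endian, message type
    std::vector<std::uint8_t> msg = {'G', 'I', 'O', 'P', 1, 0, 0, LocateReply};
    put_ulong_be(msg, 8);           // message_size: body only, header excluded
    put_ulong_be(msg, request_id);  // need not match any outstanding request
    put_ulong_be(msg, status);
    return msg;
}

}  // namespace giop
```

Writing these 20 bytes on an open IIOP connection to the target exercises its handling of an unexpected message type; varying request_id covers the second condition.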


6 Experimental framework and results

In this section, we present the results of work carried out at LAAS using the network-level corruption fault model described in Section 5.5. We selected this fault model, from the techniques presented in the previous section, for our initial experimental work because of its relevance in the context of the DSoS project. By assessing the robustness of different candidate implementations, our experiments can help a system integrator select a middleware implementation to be used for wrapping and as a support for the interconnection infrastructure. Furthermore, the method provides a means of characterizing the nature and the likelihood of error propagation between component systems through a CORBA-based interconnection infrastructure.

These experiments targeted different implementations of the CORBA Name Service [OMG, 2001b]. This service provides a hierarchical directory for object references, allowing server applications to register a service under a symbolic name, and clients to obtain references by resolving a name.

We chose this target since its standardized interface makes it easy to compare different implementations of the service. Furthermore, the Name Service may constitute a single point of failure in a CORBA-based system: while it is possible to deploy applications without using a naming or trading service, by allocating object references statically, most systems require the dynamism provided by this service. The same failure mode characterization techniques could be applied to other CORBA services, as well as to user services implemented on CORBA. We also believe that the failure modes exhibited by a vendor's implementation of the name service will be present, to a significant extent, in other applications built using the vendor's CORBA ORB. Indeed, a vendor's name service implementation is typically composed of a certain amount of application code implementing the service-specific functionality, which is linked with the vendor's shared library implementing its ORB. A significant proportion of the robustness weaknesses we have observed are relatively low-level, and thus more likely to come from the ORB library than from the application code; we would therefore expect them to be present in other applications using the ORB as well.

6.1 Failure modes

Although the classification of failure modes may depend on the target component, the various possible outcomes of a component's behaviour in the presence of faults are similar. Roughly speaking, either the fault is successfully detected by various error detection mechanisms (behavioural checks, executable assertions, hardware mechanisms, etc.) and signalled by different means (error status, exceptions, interrupts, etc.) to the interacting components, or it is not.

The latter case is the more difficult to classify. The first possible situation is the crash or the hang of the target component. Observing this situation involves external mechanisms that monitor the liveness of the component under test. When no crash or hang is observed, more subtle mechanisms must be used to distinguish correct outcomes of the target from incorrect ones. In testing, this is known as the notion of an oracle. This oracle must be defined beforehand, and is part of both the activation profile of the component under test and the fault injection campaign at runtime. Indeed, during a test experiment, the outputs of the component must be collected so that they can be compared (at the end) to the oracle. This is the only way to detect incorrect behaviour of the target component during the test phase when built-in error detection mechanisms fail.

We classify the experimental outcomes for injections targeting the name service as follows:

• kernel crash: the computing node (or nodes) hosting the service becomes inaccessible from the network. We test for this condition by attempting to execute a command from a remote machine.

• service crash: attempts to establish a network connection to the service are refused. Typically this means that the process implementing the service has died.

• service hang: the service accepts the incoming connection, but does not reply within a given time span. Note that this does not necessarily mean that other clients of the service are blocked, since processing may continue in other threads.

• application failure (error propagation to the application level): the service starts returning erroneous results to clients. We conservatively assume that error propagation to the application causes an application failure.

• exception: an invocation of the service results in a CORBA exception being raised. We distinguish between SystemExceptions (which come from the ORB) and UserExceptions (which are raised at the application level).

The observation of these failure modes is a crucial issue in a fault injection campaign. It is difficult to achieve 100% coverage of the error detection mechanisms, so some failures may go undetected. In particular, since all fault injection experiments are finite in time, it is possible for an injected fault not to lead to any observable effect during the experiment. This does not necessarily mean that the fault has no effect, since its effect may be postponed until after the end of the observation period (notion of error latency).

These failure modes are not equivalent from a dependability point of view. Signalling an exception is the "best" experimental outcome, since the service remains available to other users, and the application can decide on the most appropriate recovery action, such as retrying the operation (in the case of a TRANSIENT exception) or deciding to use an alternative service (for COMM_FAILURE). It is important that the exception provide as much information as possible; COMM_FAILURE is more useful than UNKNOWN, since in the latter case the application has less information on which to base its recovery strategy. The most serious failure mode is error propagation to the application level; indeed, any fault tolerance mechanisms implemented at the application level will not be activated, and the error is free to propagate to the system's service interface. The kernel and service crash and hang failure modes, while not positive outcomes, are considered less serious, since they can be detected by system-dependent mechanisms such as watchdog timers.
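The classification and its ordering could be encoded in the offline analysis tools roughly as follows. The observation fields mirror the checks described above (remote command, connection attempt, reply within the timeout, exception, oracle result); the field names, the numeric ranks and the extra penalty for UNKNOWN are assumptions of this sketch rather than part of the testbed.

```cpp
#include <string>

enum class Outcome {
    NoObservedEffect, SystemException, UserException,
    ServiceHang, ServiceCrash, KernelCrash, ApplicationFailure
};

struct Observation {
    bool node_reachable;       // a remote command on the target host succeeded
    bool connection_accepted;  // the service accepted a new connection
    bool replied_in_time;      // a reply arrived before the timeout
    bool exception_raised;
    bool system_exception;     // SystemException rather than UserException
    bool result_correct;       // verdict of the workload's oracle
};

// Checks are applied from the most to the least global symptom.
Outcome classify(const Observation& ob) {
    if (!ob.node_reachable) return Outcome::KernelCrash;
    if (!ob.connection_accepted) return Outcome::ServiceCrash;
    if (!ob.replied_in_time) return Outcome::ServiceHang;
    if (ob.exception_raised)
        return ob.system_exception ? Outcome::SystemException : Outcome::UserException;
    return ob.result_correct ? Outcome::NoObservedEffect : Outcome::ApplicationFailure;
}

// Higher is worse: informative exceptions < UNKNOWN < crash/hang < propagation.
int severity(Outcome o, const std::string& exception_name = "") {
    switch (o) {
        case Outcome::NoObservedEffect:   return 0;
        case Outcome::UserException:      return 1;
        case Outcome::SystemException:    return exception_name == "UNKNOWN" ? 2 : 1;
        case Outcome::ServiceHang:
        case Outcome::ServiceCrash:
        case Outcome::KernelCrash:        return 3;
        case Outcome::ApplicationFailure: return 4;
    }
    return 4;
}
```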

6.2 Experimental setup

The infrastructure we use to support our fault injection experiments is shown in Figure 10. It consists of the following components:

• the workload application, which activates the target service's functionality (the workload runs on a different computing node from the service);

• the fault injector, which sends a corrupted request to the target once the workload has been running for a certain time span;

• monitoring components, which observe the behaviour of the target and log their observations to an SQL database;

• offline data analysis tools, to identify the various failure modes by examining the data collected by the monitoring components.

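Figure 10: Experimental configuration of our testbed (components: workload, controller, target service, logging database; the workload issues bind(), resolve() and unbind() requests to the service, and a corrupted resolve() request is sent to the target)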


Our workload application repeatedly constructs a naming graph, resolves names against this graph, and then destroys the graph. Since the graph is constructed in a deterministic way, the workload is able to check that the results returned by the service are correct (it plays the rôle of oracle with respect to the functional specification of the service). If the workload detects an anomaly in the operation of the target service, such as an incorrect result, this is signalled as an application failure. If it receives an exception from the target, it signals the appropriate exception outcome.
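Because the graph is a pure function of the iteration number, the oracle can be computed rather than stored. The sketch below is a hypothetical version of that check: the naming scheme, the slash-separated name strings and the object identifiers are illustrative assumptions, and an empty `returned` value stands for a NotFound exception from the service.

```cpp
#include <map>
#include <optional>
#include <string>

// Names bound in iteration k and the object each should resolve to.
std::map<std::string, std::string> expected_bindings(int k, int width) {
    std::map<std::string, std::string> bindings;
    for (int i = 0; i < width; ++i)
        bindings["ctx" + std::to_string(k) + "/obj" + std::to_string(i)] =
            "object-" + std::to_string(k) + "-" + std::to_string(i);
    return bindings;
}

enum class Verdict { Correct, ApplicationFailure };

// A bound name must yield its object; an unbound name must yield NotFound.
Verdict check_resolve(const std::map<std::string, std::string>& expected,
                      const std::string& name,
                      const std::optional<std::string>& returned) {
    const auto it = expected.find(name);
    if (it == expected.end())
        return returned ? Verdict::ApplicationFailure : Verdict::Correct;
    return (returned && *returned == it->second) ? Verdict::Correct
                                                 : Verdict::ApplicationFailure;
}
```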

Each experiment corresponds to a single injected fault. A controller process launches the target service and obtains its object reference (in the implementations which we have targeted, the name service is implemented as a Unix dæmon). It then starts the workload application, passing it the service's reference. After 20 seconds, the fault injector sends a corrupted resolve request to the target service (for a name which has not been given a binding) and waits for the reply. The expected reply is a NotFound exception raised by the naming service. If no reply arrives within 20 seconds, a ServiceHang failure mode is signalled. At the end of the experiment, the monitoring components check for the presence of the different failure modes by trying to launch a command on the target host, checking for returned exceptions, etc.

For each targeted implementation, a fault injection campaign involves running an experiment for each bit or byte position in the resolve request. A campaign lasts around 48 hours per target for the bit-flip fault model.
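The two fault models can be expressed as simple transformations of the marshalled request. In this sketch, bit positions are numbered from the least significant bit of each octet, and the double-zero model is applied at every byte position that has a successor, so a message of n octets yields 8n bit-flip experiments and n - 1 double-zero experiments. These numbering conventions are assumptions, not taken from the testbed.

```python
from typing import Iterator, Tuple


def bit_flip(message: bytes, bit: int) -> bytes:
    """Invert one bit; bit k is bit (k % 8) of octet k // 8."""
    corrupted = bytearray(message)
    corrupted[bit // 8] ^= 1 << (bit % 8)
    return bytes(corrupted)


def double_zero(message: bytes, offset: int) -> bytes:
    """Set the two successive octets at offset and offset + 1 to zero."""
    if not 0 <= offset < len(message) - 1:
        raise IndexError("double-zero fault needs two octets")
    corrupted = bytearray(message)
    corrupted[offset:offset + 2] = b"\x00\x00"
    return bytes(corrupted)


def campaign(message: bytes, model: str) -> Iterator[Tuple[int, bytes]]:
    """Yield (position, corrupted message) for every experiment of a campaign."""
    if model == "bitflip":
        for bit in range(8 * len(message)):
            yield bit, bit_flip(message, bit)
    elif model == "double-zero":
        for offset in range(len(message) - 1):
            yield offset, double_zero(message, offset)
    else:
        raise ValueError(f"unknown fault model: {model}")
```

Each yielded message corresponds to one experiment, which is why the campaign length grows with the size of the request.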

This fault injection technique is very portable, since the only implementation-specific component in our testbed is the code responsible for launching the target implementation. The technique is also non-intrusive, and does not require any instrumentation of the targeted service.

6.3 Target implementations

We have carried out our experiments on four implementations of the CORBA Name Service:

• omniORB 2.8, by AT&T Laboratories Cambridge, freely available under the GNU General Public Licence and implemented in C++;

• ORBit 0.5.0, also available under the GNU General Public Licence, and implemented in C;

• ORBacus 4.0.4, a commercial product from Object Oriented Concepts, implemented in C++;

• the tnameserv bundled with version 1.3 of Sun's Java SDK.

All experiments were carried out on workstations running the Solaris 2.7 operating system, connected by a 100 Mb/s Ethernet LAN. While we tried to make the experimental conditions as similar as possible across implementations, a number of factors require particular attention:

• persistence: the omniORB implementation maintains log files so as to provide persistence across service shutdowns. To ensure a fresh environment for each experiment, we erase these log files before starting the service. The ORBacus implementation can be configured to use log files, but we do not enable them in our experiments. The two other tested implementations do not support persistence.

• number of experiments: we perform experiments for each bit or byte position in the corrupted method invocation. CORBA method invocations contain an ORB-dependent parameter called the service context (which can be used to propagate implementation-specific data and to propagate transactions implicitly). The size of this parameter differs slightly between ORB implementations, so the exact number of experiments changes slightly from target to target.

• object references: the ORBit implementation defaults to using non-interoperable object references. We configured it to use standard IIOP profiles.

In certain experiments, we observe several failure modes: for example, a service crash will generally result in clients of the service receiving an exception indicating that a communication error has occurred. In the figures presented below, the failure modes are classified according to gravity, and for each experiment the most serious mode observed by the testbed is selected.
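Selecting the most serious mode per experiment amounts to taking a maximum under a gravity order. The order below is an assumption for illustration: it places an erroneous result propagated to the application first, then the three outcomes that Figures 11 and 12 mark as "bad", then the remaining exceptions in legend order, with InvalidName placed (arbitrarily) just before the expected NotFound outcome.

```python
from typing import Iterable

# Assumed gravity order, most serious first (illustrative, not the testbed's table).
GRAVITY = [
    "ApplicationFailure",
    "ServiceCrash",
    "ServiceHang",
    "UNKNOWN",
    "COMM_FAILURE",
    "BAD_OPERATION",
    "MARSHAL",
    "OBJECT_NOT_EXIST",
    "InvalidName",
    "NotFound",
]
RANK = {mode: i for i, mode in enumerate(GRAVITY)}


def most_serious(observed: Iterable[str]) -> str:
    """Return the observed failure mode that comes first in the gravity order."""
    modes = list(observed)
    if not modes:
        raise ValueError("no outcome observed")
    return min(modes, key=lambda mode: RANK[mode])
```

For instance, a crash that also makes the workload receive COMM_FAILURE is recorded only as ServiceCrash.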

6.4 Analysis of results

In this section we present the results of our fault injection experiments, for both the double-zero and bit-flip fault models. A more general analysis from a dependable system integrator's perspective is presented in Section 6.5.

Figure 11 compares the experimental outcomes for each target implementation for the double-zero fault model (where two successive octets of the message are set to zero). The three outcomes to the left of the legend are “bad”, whereas those on the right indicate robust behaviour. The outcomes whose names in the legend are in capital letters correspond to CORBA system exceptions. The NotFound outcome is a CORBA application-level exception raised by the naming service when it cannot resolve a name; this is the expected behaviour of the service in our experiments. The bars for each target sum to 100%.


[Figure 11: Experimental outcomes for the double-octet zeroing fault model. Bar chart of the percentage of experiments (axis 0 to 50) for each target (javaORB, omniORB, ORBacus, ORBit); outcomes: ServiceCrash, ServiceHang, UNKNOWN, COMM_FAILURE, BAD_OPERATION, MARSHAL, OBJECT_NOT_EXIST, NotFound.]

A first remark is that we have not observed any cases of error propagation to the application level, which is a positive point. However, there is a relatively large proportion of service hangs and crashes.

As stated earlier, our service hang failure mode does not imply that other clients of the naming service are blocked; we only consider the time taken to reply to the corrupted invocation. However, given its relative frequency, it is one of the most serious dependability problems we have identified. The upcoming CORBA 2.4 specification allows clients to specify timeouts on their requests, which would be helpful for detecting this type of situation without resorting to application-level watchdog mechanisms. Some of the implementations tested already support these interfaces or provide similar mechanisms (but they were not activated in our experiments).

Examining the breakdown of CORBA exceptions in detail, we observe that the Java implementation raises very few COMM_FAILURE exceptions, but a larger proportion of UNKNOWN exceptions (this exception is raised by an ORB when it detects an error in the server execution whose cause it cannot determine; for example, in Java, an attempt to dereference a null pointer). UNKNOWN is a less useful exception to signal to the application layer, since it conveys no information on the cause of the exception, so from this point of view the Java ORB can be considered less robust. The ORBacus service raises a greater proportion of MARSHAL exceptions, which indicates that its marshalling code does more error checking than the other implementations (a positive point from a robustness point of view); ORBit does not raise MARSHAL exceptions.

The proportion of OBJECT_NOT_EXIST exceptions, which the ORB uses to signal that the object reference against which the method was invoked does not exist, is very similar between implementations. This is to be expected, since an ORB is required to check the validity of this value before dispatching the method invocation to the appropriate servant. A similar remark can be made for the BAD_OPERATION exception.

6.4.1 Differences between fault models

Figure 12 shows the experimental outcomes for each target implementation for the bit-flip fault model. The results differ slightly from those for the double-zero fault model. The first difference between the results from the two fault models is the appearance of an InvalidName exception, which is not provoked by the double-zero fault model. This exception is raised by the naming service either when the name it is asked to resolve is empty or, more likely in our case, when the name contains an invalid character.

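[Figure 12: Experimental outcomes for the bit-flip fault model. Bar chart of the percentage of experiments (axis 0 to 50) for each target (javaORB, omniORB, ORBacus, ORBit); outcomes: ServiceCrash, ServiceHang, UNKNOWN, COMM_FAILURE, BAD_OPERATION, MARSHAL, OBJECT_NOT_EXIST, NotFound.]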

A second observation is that the bit-flip fault model results in a greater proportion of MARSHAL and NotFound exceptions. In the latter case, the difference is likely to be due to the service masking certain errors. Indeed, certain bits in an IIOP message are unused. For example, the byte order of a message is represented by a zero or a one marshalled into an octet; seven of these bits are not significant, and so their corruption may not be detected by the ORB. In contrast, a double-zero error is unlikely to escape the notice of the ORB.
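The masking argument for the byte-order octet can be checked by enumerating its eight single-bit flips. The sketch contrasts two hypothetical decoders: a lenient one that only tests the low-order bit (an assumption about how an ORB might read the field), and a strict one that rejects any value other than 0 or 1. It models the octet as the plain boolean described above; later GIOP versions reuse other bits of this octet as flags.

```python
def lenient_byte_order(octet: int) -> str:
    """Decode the byte-order octet by testing only its low-order bit."""
    return "little-endian" if octet & 0x01 else "big-endian"


def strict_byte_order(octet: int) -> str:
    """Accept only the two legal boolean values 0 and 1."""
    if octet not in (0, 1):
        # An ORB could report this as a MARSHAL exception (assumption).
        raise ValueError(f"invalid byte-order octet 0x{octet:02x}")
    return "little-endian" if octet else "big-endian"


def masked_flips(original: int) -> int:
    """Count single-bit flips the lenient decoder cannot distinguish from the original."""
    expected = lenient_byte_order(original)
    return sum(lenient_byte_order(original ^ (1 << bit)) == expected for bit in range(8))


def detected_flips(original: int) -> int:
    """Count single-bit flips that the strict decoder rejects."""
    detected = 0
    for bit in range(8):
        try:
            strict_byte_order(original ^ (1 << bit))
        except ValueError:
            detected += 1
    return detected
```

With the lenient decoder, seven of the eight flips are silently masked; with the strict one, the same seven are detected. The flip of bit 0 escapes both, since it turns one legal value into the other.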

Certain other phenomena, such as the small proportion of COMM_FAILURE and BAD_OPERATION exceptions raised by JavaORB for the bit-flip fault model, would require deep analysis of the source code to explain.

6.4.2 Influence of the error position

Figures 11 and 12 aggregate the results of faults injected at each possible position in the message. It is also interesting to examine the failure modes as a function of the position in the message where the fault was injected. Figure 13 shows the most common experimental outcomes for certain regions of the message.⁹

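[Figure 13: Format of a GIOP message. The request is laid out by offset (marks at 0, 8, 16, 32, 48) in two parts, the IIOP header (magic 'G' 'I' 'O' 'P', GIOP version, byte order, message type, message length) and the operation arguments (service context, ..., response expected?, object key, operation, ..., requesting_principal), annotated with the dominant outcome for each region: OBJECT_NOT_EXIST, MARSHAL, BAD_OPERATION, COMM_FAILURE, ServiceHang, ServiceCrash.]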
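The fixed 12-octet GIOP message header (magic, version, byte order, message type, message size) can be decoded with Python's `struct` module. The sample header below is hypothetical, built to be consistent with the first read() in the truss trace of Figure 14: a little-endian Request announcing an 85-octet body. The last function counts how many of the 32 possible single-bit flips of the size field would announce a longer message than the one actually sent, which is the situation behind the service hangs discussed below.

```python
import struct
from typing import NamedTuple


class GiopHeader(NamedTuple):
    major: int
    minor: int
    little_endian: bool
    message_type: int  # 0 = Request
    message_size: int  # octets following the 12-octet header


def parse_giop_header(data: bytes) -> GiopHeader:
    """Decode magic, version, byte order, message type and size."""
    if len(data) < 12 or data[:4] != b"GIOP":
        # Corrupted magic: the experiments mostly saw COMM_FAILURE here.
        raise ValueError("not a GIOP message")
    major, minor, flags, msg_type = struct.unpack_from("4B", data, 4)
    little_endian = bool(flags & 0x01)
    (size,) = struct.unpack_from("<I" if little_endian else ">I", data, 8)
    return GiopHeader(major, minor, little_endian, msg_type, size)


def flips_that_enlarge(size: int) -> int:
    """Number of single-bit flips of a 32-bit size field that increase its value."""
    return sum(1 for bit in range(32) if (size ^ (1 << bit)) > size)


# Hypothetical sample: GIOP 1.0, little-endian, Request, 85-octet body.
SAMPLE = b"GIOP" + bytes([1, 0, 1, 0]) + struct.pack("<I", 85)
```

Since 85 has only four bits set, 28 of the 32 flips enlarge the announced size; a receiver that trusts this field then waits for octets that never arrive.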

When the fault affects the part of the message which identifies the invoked operation, primarily BAD_OPERATION exceptions are signalled, as would be expected. Similarly, faults injected in the first few bytes of the IIOP request (which contain a special signature identifying the message type) result mainly in COMM_FAILURE exceptions.

When the fault affects the header bits encoding the message's length, we mostly observe service hangs. Given that there are 32 bits to encode the message length, and that our messages are relatively short (around 900 bits), a bit flip in this zone (where most bits are normally zero) is likely to increase the announced message length, so the service waits to read more data than will actually arrive.

6.4.3 Internal error checking mechanisms

The ORBacus service was compiled in its default configuration, without deactivating internal “can't happen” assertions. When these assertions fail, the program voluntarily exits using the abort procedure. This leads to ORBacus showing a relatively high proportion of service failures, some of which could be avoided by using a different configuration. The omniORB implementation can be configured at run time to abort when it detects an internal error, but we did not enable this feature.

⁹ The data in Figure 13 is valid for all the targeted implementations.

6.4.4 System call trace analysis

Our testbed allows us to obtain system call traces and execution stack backtraces of the target process. These show that different middleware implementations exercise the operating system in different ways. For instance, the ORBacus implementation makes a large number of lwp_mutex and lwp_sema calls, which enable the synchronization of threads, whereas the omniORB implementation uses a much narrower range of system calls, primarily for reading from and writing to the network and its log file.

The system call traces also illustrate differences in the level of internal error checking between ORB implementations. For example, when faults are injected into certain bit positions, the ORBit implementation causes a segmentation violation while decoding the corrupted message, and is forcibly aborted by the operating system. In contrast, the ORBacus implementation sometimes detects the corruption internally, and is able to print a warning message indicating the position in the program where the error was detected, before voluntarily aborting. This lack of internal error checking is an implementation decision for ORBit, whose primary design goals are high performance and a small footprint.

Figure 14 shows output from the truss tool on Solaris for the ORBit segmentation violation described above. The tool generates a trace of the interaction between the operating system and a process, showing the system calls performed by the process with their arguments, the machine faults incurred by the process, and the signals delivered to it by the operating system kernel. The trace shows that after having read a GIOP request from the network, the name service attempts to access memory outside of its address space, receives a segmentation violation signal (SIGSEGV), and is then aborted (the process sends itself SIGABRT and exits).


fcntl(7, F_SETFL, 0x00000082) = 0
poll(0xFFBEF300, 3, -1) = 1
read(7, "GIOP01\001\0U\0\0\0", 12) = 12
read(7, "\0\0\0\001\0\0\001\0\0\0".., 85) = 85
    Incurred fault #6, FLTBOUNDS  %pc = 0x00014274
      siginfo: SIGSEGV SEGV_MAPERR addr=0x6E7A1124
    Received signal #11, SIGSEGV [caught]
      siginfo: SIGSEGV SEGV_MAPERR addr=0x6E7A1124
      siginfo: SIGSEGV SEGV_MAPERR addr=0x6E7A1124
[...]
getpid() = 26780 [26779]
kill(26780, SIGABRT) = 0
    Received signal #6, SIGABRT [caught]
      siginfo: SIGABRT pid=26780 uid=3905
fstat(3, 0xFFBED8E0) = 0
[...]
llseek(0, 0, SEEK_CUR) = 0
_exit(1)

Figure 14: truss output showing an ORBit segmentation violation
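Traces such as the one in Figure 14 can be scanned automatically to separate a crash caused by a machine fault from a voluntary abort. The sketch below assumes the line layout shown in the figure ("Incurred fault #N, NAME" and "Received signal #N, NAME"); other truss versions may format these lines differently, and the classification labels are illustrative.

```python
import re
from typing import Dict, List

FAULT_RE = re.compile(r"Incurred fault #(\d+), (\w+)")
SIGNAL_RE = re.compile(r"Received signal #(\d+), (\w+)")


def summarize_trace(lines: List[str]) -> Dict[str, List[str]]:
    """Collect machine faults and delivered signals, in order of appearance."""
    summary: Dict[str, List[str]] = {"faults": [], "signals": []}
    for line in lines:
        fault = FAULT_RE.search(line)
        if fault:
            summary["faults"].append(fault.group(2))
        signal = SIGNAL_RE.search(line)
        if signal:
            summary["signals"].append(signal.group(2))
    return summary


def classify_termination(summary: Dict[str, List[str]]) -> str:
    """A SIGSEGV together with a machine fault points to an unchecked memory access."""
    if summary["faults"] and "SIGSEGV" in summary["signals"]:
        return "segmentation violation"
    if "SIGABRT" in summary["signals"]:
        return "voluntary abort"
    return "no fatal signal observed"
```

Because the segmentation violation is tested first, the ORBit trace is classified as a segmentation violation even though the process finally terminates through SIGABRT; an ORBacus-style assertion failure would show only the SIGABRT.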

6.5 Analysis from an integrator's point of view

The experimental results presented in Section 6.4 show a relatively large variability in the behaviour of the target candidates in the presence of faults. This demonstrates that, although the service's interface is standardized, a particular candidate's behaviour depends on the design and implementation decisions made by the vendor. In this section, we adopt the viewpoint of a system integrator who must select a candidate implementation for a safety-critical system.

We therefore rank first the candidates that deliver relevant error reporting information, i.e., those which exhibit fewer service hangs and UNKNOWN exceptions. These are the most problematic failure modes when deciding on fault tolerance strategies and error recovery mechanisms that can meet the system's dependability requirements. By grouping all the exceptions except UNKNOWN together, we obtain the percentages for the bit-flip fault model shown in Table 1.

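Table 1: Ranking of service implementations (bit-flip fault model, percentage of experiments; "Exception" groups all exceptions except UNKNOWN)

Implementation   Exception   UNKNOWN   Service Hang   Service Crash
ORBacus          88.0        1.3       6.1            4.6
omniORB          79.5        0.6       19.3           0.6
ORBit            76.6        0.0       15.8           7.6
Java SDK         58.0        20.3      20.7           1.0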
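The grouping used for Table 1 and the resulting ranking can be reproduced from the published percentages. The sketch folds every exception other than UNKNOWN into one "Exception" column, assuming that every outcome other than UNKNOWN, ServiceHang and ServiceCrash is a CORBA exception (no application failures were observed), and orders candidates by that column. The per-outcome counts in the test are hypothetical.

```python
from typing import Dict, List


def group_outcomes(counts: Dict[str, int]) -> Dict[str, float]:
    """Fold all exceptions except UNKNOWN together and convert counts to percentages."""
    groups = {"Exception": 0, "UNKNOWN": 0, "ServiceHang": 0, "ServiceCrash": 0}
    for outcome, n in counts.items():
        key = outcome if outcome in groups else "Exception"
        groups[key] += n
    total = sum(groups.values())
    return {k: 100.0 * v / total for k, v in groups.items()}


# Percentages from Table 1 (bit-flip fault model).
TABLE_1 = {
    "ORBacus":  {"Exception": 88.0, "UNKNOWN": 1.3,  "ServiceHang": 6.1,  "ServiceCrash": 4.6},
    "omniORB":  {"Exception": 79.5, "UNKNOWN": 0.6,  "ServiceHang": 19.3, "ServiceCrash": 0.6},
    "ORBit":    {"Exception": 76.6, "UNKNOWN": 0.0,  "ServiceHang": 15.8, "ServiceCrash": 7.6},
    "Java SDK": {"Exception": 58.0, "UNKNOWN": 20.3, "ServiceHang": 20.7, "ServiceCrash": 1.0},
}


def rank(table: Dict[str, Dict[str, float]]) -> List[str]:
    """Order candidates by the share of informative exceptions, best first."""
    return sorted(table, key=lambda impl: table[impl]["Exception"], reverse=True)
```

Sorting on the Exception column alone reproduces the order of Table 1, and each row sums to 100%.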


From this viewpoint, the ORBacus and omniORB implementations exhibit the safest behaviour: more significant exceptions are reported, i.e., fewer UNKNOWN exceptions, and there is a smaller proportion of service hangs. ORBacus has a relatively high rate of service failure, which (as discussed in Section 6.4.3) is partly due to the configuration we chose. This type of reaction to abnormal situations is not necessarily a negative point from a dependability viewpoint. Many fault tolerance strategies, particularly in a distributed computing context, make a fail-silence assumption, which requires components to produce either correct results or none. Silent failures can successfully be handled by replication, either by using identical copies located on different sites to deal with physical or environmental faults [Powell, 1991], or by using diversified copies to protect against software faults [Avizienis, 1975; Randell, 1975; Laprie et al., 1990].

We also observed in the experiments that the behaviour depends on the fault model. The results obtained with double zeroing and with bit flips lead to different statistical distributions of the failure modes. However, the resulting numbers do not disturb the ranking given in Table 1. Many factors can influence the observed results; nevertheless, the fact that both types of experiment lead to the same conclusions reinforces the confidence one can have in the ranking.
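The benefit of fail-silence for replication can be illustrated with a minimal client-side failover sketch. A replica that fails silently (modelled here, as an assumption, by raising `ConnectionError` or returning `None`) is simply skipped, whereas a replica that returned a wrong value would be accepted, which is why the assumption matters. The replica callables are hypothetical and this is not a CORBA API.

```python
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")


def invoke_fail_silent(replicas: Sequence[Callable[[], Optional[T]]]) -> T:
    """Try replicas in order, relying on each one answering correctly or not at all."""
    for call in replicas:
        try:
            result = call()
        except ConnectionError:
            continue  # crashed replica: no result, try the next copy
        if result is not None:
            return result  # accepted unchecked: this is the fail-silence assumption
    raise RuntimeError("all replicas failed silently")
```

A crash such as an ORBacus assertion failure costs only a retry on the next copy; an UNKNOWN exception or a hang gives the client nothing so clear-cut to act on.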

Clearly, many other aspects of middleware dependability must be taken into account in the final selection of a candidate. In particular, the effects of other classes of faults need to be investigated. From this viewpoint, the work done by the Ballista project [Pan et al., 2001], which uses a different fault model and targets a different part of the middleware, is complementary to ours.


7 Conclusions and future work

This document proposes a method for the failure mode analysis of CORBA-based systems. This method relies essentially on conventional fault injection techniques and on a clear identification of possible targets in a middleware implementation. Although the CORBA standard defines the features that must be provided by CORBA ORB implementations, their implementation may vary significantly from one vendor to another. From a dependability viewpoint, the design strategy and the implementation of the standard are of prime importance. Clearly, a middleware such as CORBA includes several facets that make its characterization quite difficult. We analysed the various components and possible targets in a CORBA middleware and justified the use of a particular fault injection technique. In the context of DSoS, the analysis of failure modes targeting sensitive services using network corruption seemed the most relevant, and was the first to be tackled. Experiments have been carried out to obtain significant results on a number of CORBA implementations. These results show the various possible behaviours that can be observed and their impact in a system of systems, from a dependability viewpoint. The insights revealed by these experiments are novel and useful inputs for developing error confinement wrappers (cf. Section 4 of DSoS deliverable IC2).

We have presented an experimental robustness evaluation method for CORBA-based services, and discussed the results of experiments targeting four implementations of the CORBA Name Service. These experiments can be carried out on any CORBA service or user-defined service on top of CORBA. The choice of the Naming Service was justified by its essential role in a CORBA distributed system. It is worth noting that these experiments also evaluate the effect of corrupted method invocations at the middleware level.

The implementations we have tested show a non-negligible number of robustness weaknesses, but we have not observed any failures corresponding to the propagation of an error from the middleware to the application level. Our results suggest that the robustness of CORBA-based systems would be enhanced by the addition of an (application-level) checksum to GIOP. The achieved failure mode characterization aids in the selection of a candidate middleware implementation for critical systems, and helps DSoS system integrators decide on the error detection and recovery mechanisms, fault tolerance strategies and architectural solutions that are needed to meet dependability requirements. Our technique is non-intrusive and (thanks to the transparency provided by CORBA) easy to port, both to new implementations of the service and to alternative operating environments (operating system, hardware platform). The approach could also be applied to the failure mode characterization of other CORBA services, by modifying the workload and the fault injector.
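The suggested application-level checksum could take the form sketched below: the sender appends a CRC-32 of the marshalled message and the receiver verifies it before unmarshalling. This framing is hypothetical (GIOP defines no such trailer), and CRC-32 is chosen only because Python's `zlib.crc32` provides it. CRC-32 detects every single-bit error and every error burst of up to 32 bits, which covers both the bit-flip and the double-zero fault models used in this report.

```python
import struct
import zlib


def add_checksum(message: bytes) -> bytes:
    """Append a big-endian CRC-32 of the message (hypothetical trailer)."""
    return message + struct.pack(">I", zlib.crc32(message) & 0xFFFFFFFF)


def verify_checksum(framed: bytes) -> bytes:
    """Return the message if its CRC-32 trailer matches, else raise ValueError."""
    if len(framed) < 4:
        raise ValueError("frame too short")
    message, (crc,) = framed[:-4], struct.unpack(">I", framed[-4:])
    if zlib.crc32(message) & 0xFFFFFFFF != crc:
        raise ValueError("checksum mismatch: corrupted message")
    return message
```

A receiver using such a check would turn a corrupted request into a clean, reportable error before any field of the message is trusted.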

This method will be used to carry out other experiments and obtain more results regarding the failure modes of a CORBA-based system.¹⁰ First, fault injection will be performed within several target components composing the middleware, using bit-flip fault injection techniques targeting memory segments (simulation of hardware faults, as described in Section 5.1). Second, we will address the robustness of implicit functions of an ORB using an ad hoc interface to these functions. The robustness of these essential functions will be evaluated using parameter fault injection techniques (cf. Section 5.3). Third, we plan to examine the influence of faults propagating from the operating system to the middleware, as described in Section 5.4. The extension of the experiments carried out will provide useful inputs to the definition of error confinement wrappers.

¹⁰ The results obtained from these upcoming experiments will be summarized in the forthcoming DSoS deliverable PCE1, together with the definition of the corresponding robustness-enhancing mechanisms.

Acknowledgements: The authors would like to thank Jean Arlat for helpful comments on their experiments and on early versions of this document.

References

[Arlat et al., 1993] J. Arlat, A. Costes, Y. Crouzet, J.-C. Laprie, and D. Powell. Fault injection and dependability evaluation of fault-tolerant systems. IEEE Transactions on Computers, 42(8):913–923, August 1993.
[Avizienis, 1975] A. Avizienis. Fault-tolerance and fault-intolerance: complementary approaches to reliable computing. ACM SIGPLAN Notices, 10(6):458–4, June 1975.
[Carreira et al., 1998] J. Carreira, H. Madeira, and J. G. Silva. Xception: A technique for the experimental evaluation of dependability in modern computers. IEEE Transactions on Software Engineering, 24(2):125–136, February 1998.
[Chevalley and Thévenod-Fosse, 2001] P. Chevalley and P. Thévenod-Fosse. A mutation analysis tool for Java programs. Technical Report 01356, LAAS-CNRS, September 2001.
[Chung et al., 1999] P. E. Chung, W. Lee, J. Shih, S. Yajnik, and Y. Huang. Fault-injection experiments for distributed objects. In IEEE, editor, Proceedings of the International Symposium on Distributed Objects and Applications, 1999.
[Daran and Thévenod-Fosse, 1996] M. Daran and P. Thévenod-Fosse. Software error analysis: a real case study involving real faults and mutations. In S. J. Zeil, editor, Proceedings of the 1996 International Symposium on Software Testing and Analysis, pages 158–171, New York, January 8–10, 1996. ACM Press.


[Dawson and Jahanian, 1995] S. Dawson and F. Jahanian. Probing and fault injection of dependable distributed protocols. The Computer Journal, 38(4):286–300, 1995.
[Dawson et al., 1997] S. Dawson, F. Jahanian, and T. Mitton. Experiments on six commercial TCP implementations using a software fault injection tool. Software Practice and Experience, 27(12):1385–1410, December 1997.
[Fabre et al., 2000] J.-C. Fabre, M. Rodríguez, J. Arlat, F. Salles, and J.-M. Sizun. Building dependable COTS microkernel-based systems using MAFALDA. In Proceedings of the 2000 Pacific Rim International Symposium on Dependable Computing (PRDC-2000), pages 85–92. IEEE Computer Society Press, 2000.
[Fuchs, 1998] E. Fuchs. Validating the fail-silence of the MARS architecture. In Proc. 6th IFIP Int. Working Conference on Dependable Computing for Critical Applications: DCCA-6, pages 225–247. IEEE Computer Society Press, 1998.
[Jones et al., 2001] C. Jones, K. Kopetz, E. Marsden, M. Paulitsch, D. Powell, B. Randell, and R. Stroud. Revised version of conceptual model. Research report, DSoS, September 2001.
[Kalyanakrishnam et al., 1999] M. Kalyanakrishnam, Z. Kalbarczyk, and R. Iyer. Failure data analysis of a LAN of Windows NT based computers. In Proceedings of the 18th IEEE Symposium on Reliable Distributed Systems (SRDS'99), pages 178–1, Washington-Brussels-Tokyo, October 1999. IEEE.
[Karlsson et al., 1998] J. Karlsson, P. Folkesson, J. Arlat, Y. Crouzet, G. Leber, and J. Reisinger. Application of three physical fault injection techniques to the experimental assessment of the MARS architecture. In Proc. 5th IFIP Working Conference on Dependable Computing for Critical Applications: DCCA-5, pages 267–287. IEEE Computer Society Press, 1998.
[Koopman and DeVale, 1999] P. J. Koopman and J. DeVale. Comparing the robustness of POSIX operating systems. In Proceedings of the 29th Annual International Symposium on Fault-Tolerant Computing (FTCS-29), pages 30–37, Los Alamitos, CA, USA, 1999. IEEE Computer Society Press.
[Labovitz et al., 1998] C. Labovitz, G. R. Malan, and F. Jahanian. Internet routing instability. IEEE/ACM Transactions on Networking, 6(5):515–528, October 1998.
[Laprie et al., 1990] J.-C. Laprie, J. Arlat, C. Beounes, and K. Kanoun. Definition and analysis of hardware- and software-fault-tolerant architectures. Computer, 23(7):39–51, July 1990.


[Madeira et al., 2000] H. Madeira, D. Costa, and M. Vieira. On the emulation of software faults by software fault injection. In Proceedings of the International Conference on Dependable Systems and Networks (DSN 2000), pages 417–426. IEEE Computer Society Press, 2000.
[Miller et al., 1990] B. P. Miller, L. Fredriksen, and B. So. An empirical study of the reliability of UNIX utilities. Communications of the ACM, 33(12):32–44, December 1990.
[Nimmagadda et al., 1999] S. Nimmagadda, C. Liyanaarachchi, A. Gopinath, D. Niehaus, and A. Kaushal. Performance patterns: Automated scenario based ORB performance evaluation. In Proceedings of the Fifth USENIX Conference on Object-Oriented Technologies and Systems, pages 15–28. The USENIX Association, 1999.
[OMG, 2001a] OMG. The Common Object Request Broker: Architecture and Specification. Technical report, September 2001. (formal/2001-09-01).
[OMG, 2001b] OMG. CORBAServices: Common Object Service Specification: Naming Service Specification. Documentation available at www.omg.org, Object Management Group, February 2001.
[Pan et al., 2001] J. Pan, P. Koopman, D. Siewiorek, Y. Huang, R. Gruber, and M. L. Jiang. Robustness testing and hardening of CORBA ORB implementations. In Proceedings of the International Conference on Dependable Systems and Networks (DSN 2001). IEEE, June 2001.
[Powell, 1991] D. Powell. Delta-4: A Generic Architecture for Dependable Distributed Computing. Springer-Verlag, Berlin, Germany, 1991.
[Randell, 1975] B. Randell. System structures for software fault tolerance. IEEE Transactions on Software Engineering, SE-1(2):220–232, June 1975.
[Rimén et al., 1994] M. Rimén, J. Ohlsson, and J. Torin. On microprocessor error behavior modeling. In Proceedings of the 24th Annual International Symposium on Fault-Tolerant Computing, pages 76–85, Los Alamitos, CA, USA, June 1994. IEEE Computer Society Press.
[Stone and Partridge, 2000] J. Stone and C. Partridge. When the CRC and TCP checksum disagree. In Proceedings of the 2000 ACM SIGCOMM Conference, pages 309–319, 2000.
[Sullivan and Chillarege, 1991] M. Sullivan and R. Chillarege. Software defects and their impact on system availability: a study of field failures in operating systems. 21st Int. Symp. on Fault-Tolerant Computing (FTCS-21), pages 2–9, 1991.


[Thévenod-Fosse et al., 1991] P. Thévenod-Fosse, H. Waeselynck, and Y. Crouzet. An experimental study on software structural testing: Deterministic versus random input generation. In Fault Tolerant Computing, pages 410–417, Los Alamitos, CA, USA, June 1991. IEEE Computer Society Press.
[Voas and McGraw, 1997] J. Voas and G. McGraw. Software Fault Injection. John Wiley and Sons, 1997.
[Ziegler and Srinivasan, 1996] J. F. Ziegler and G. R. Srinivasan. Preface: Terrestrial cosmic rays and soft errors. IBM Journal of Research and Development, 40(1):2–2, January 1996.
