DSoS
IST-1999-11585
Dependable Systems of Systems
Failure analysis of an ORB in presence of faults
Report Version: Deliverable IC3
Report Preparation Date: 1 October 2001
Classification: Public Circulation
Contract Start Date: 1 April 2000
Duration: 36m
Project Co-ordinator: Newcastle University
Partners: DERA, Malvern – UK; INRIA – France; CNRS-LAAS – France; TU Wien – Austria; Universität Ulm – Germany; LRI Paris-Sud - France
Project funded by the European Community under the “Information Society Technology” Programme (1998-2002)
Failure analysis of an ORB in presence of faults
Eric Marsden and Jean-Charles Fabre
LAAS-CNRS, Toulouse, France
{emarsden,fabre}@laas.fr
Abstract
This document describes a method and experimental results for the dependability characterization of middleware implementations, and in particular failure mode analysis of CORBA ORB implementations. The aim of the work is to provide an overall approach for identifying and quantifying failure modes using various fault injection techniques and fault models.
Related work in dependability characterization of executive software layers is discussed. An analysis of the architecture of middleware-based systems and their error confinement regions motivates the development of a fault model. A number of fault injection approaches are discussed, and results from network-based corruption experiments targeting four CORBA service implementations are presented.
Table of Contents
1 Introduction
2 Objectives and enabling technologies
  2.1 Middleware assessment for DSoS
  2.2 Dependability assessment techniques
  2.3 Fault injection
  2.4 Targeting CORBA middleware
3 Related work
  3.1 Faults internal to a component system
  3.2 External faults at a linking interface
  3.3 External faults at a local interface
4 Fault pathology of CORBA-based systems
  4.1 Error confinement regions in a CORBA-based system
  4.2 A fault model for CORBA-based systems
  4.3 Failure modes of an ORB
5 Method and experimental techniques
  5.1 Corruption of the memory space
  5.2 Program mutation techniques
  5.3 Robustness testing
  5.4 Syscall interposition techniques
  5.5 Network-level faults
6 Experimental framework and results
  6.1 Failure modes
  6.2 Experimental setup
  6.3 Target implementations
  6.4 Analysis of results
  6.5 Analysis from an integrator’s point of view
7 Conclusions and future work
1 Introduction
This document describes a method for characterizing the dependability of middleware implementations, in particular, the failure modes of CORBA ORB implementations. The aim of the work is to provide an overall method to identify and quantify failure modes using various fault injection techniques and fault models.
The method is based on existing work in dependability characterization. Although a large amount of work has been carried out for the characterization of executive software using fault injection (e.g., kernel and standard operating system), very little work has targeted the middleware layers. The current work in the field, in particular software-implemented fault injection techniques and tools, is reviewed in this document because of its interest for the characterization of CORBA-based systems.
The conventional software engineering view of a middleware is not sufficient when considering the assessment of dependability-related properties. We present a more detailed architectural view of a CORBA-based system, concentrating on the identification of error confinement regions and a failure modes classification. Based on this structural analysis, we discuss the classes of faults that can affect such systems.
A number of fault injection techniques that simulate these fault classes are described. We present results from one of these techniques, which measures the impact of corrupt method invocations on a middleware implementation. These experiments are particularly relevant to the DSoS project, since they provide insights into the ways in which errors may propagate between component systems, over the interconnection infrastructure. Experimental results we have obtained on a number of CORBA implementations reveal that this impact can be significant.
The document is organized as follows: In Section 2, we discuss the objectives of our work within DSoS, and give an overview of dependability assessment techniques and of CORBA-based middleware.
In Section 3, we discuss some previous work in fault injection that is related to the failure mode analysis of CORBA systems. Some results of experiments targeting CORBA are also reported in this section.
In Section 4, we address the failure modes in middleware-based systems. This involves discussing the architecture of a middleware system from different viewpoints, identifying possible targets, defining fault assumptions and models, classifying failure modes and identifying possible fault injection strategies.
Section 5 is devoted to the description of our method for experimentally characterizing the dependability of CORBA ORB implementations. A number of fault injection techniques that are well suited to this target are described.
Section 6 reports on the experimental results we obtained when targeting CORBA service implementations using network corruption techniques. In the last section, we present some first lessons that have been learned from our work, from the viewpoint of a DSoS system integrator, and draw some conclusions.
2 Objectives and enabling technologies
In this section we present the aim of our work on middleware assessment with respect to the DSoS project, and give an overview of dependability assessment techniques. We conclude with a brief introduction to CORBA middleware.
2.1 Middleware assessment for DSoS
An increasingly large class of dependable systems of systems will be built on some form of middleware infrastructure. A likely candidate for this middleware infrastructure is the CORBA platform, a standard that is well suited to the interconnection of heterogeneous systems, and is widely used in industry. Figure 1 illustrates possible roles for CORBA-compliant middleware in a system of systems, both as a technology for the implementation of linking interfaces, possibly using wrapping techniques, and as a means of interconnecting component systems. Evidently, the dependability of this middleware layer is crucial to the dependability of the DSoS built above it.
[Figure 1: CORBA infrastructure for a system of systems. Diagram labels: Bank, Hertz, Avis, Rental Agency, Geographic Information System, Internet, connection system, HTTP interfaces, CORBA interfaces.]
There has been little published research on the dependability of middleware-based systems. The DSoS project aims to contribute to this issue in two ways:
• categorize and study the types of faults which can be experienced by a middleware-based system, and the ways in which errors propagate through the system to cause failure.
• design and implement methods to improve a middleware implementation’s behaviour in the presence of these faults: improve error detection mechanisms, fail silence and robustness characteristics, using techniques such as wrapping.
This report contributes to the first point. We present a number of approaches for characterizing the behaviour of a middleware-based system (and in particular a CORBA-compliant ORB implementation) in the presence of faults, and present the results of fault injection experiments targeting several implementations of the CORBA Name Service.
The second point is addressed in the Architecture and Design workpackage, which includes work on the use of wrapping techniques to improve the robustness and fail silence of component systems, and of the interconnection infrastructure. The aim is to ensure that individual component systems have a well-defined behaviour in the presence of faults. The Architecture and Design work is closely related to the present deliverable; indeed, the failure mode characterization is an essential input to the design and development of fault containment wrappers.
Our work on fault injection also provides insights into the ways in which errors may propagate between component systems, via the connection systems. It provides a mechanism for evaluating the probability of such error propagation, and for characterizing its effects. It also investigates the impact of faults affecting the communication subsystem itself. This work helps the integrator of a system of systems answer two important questions:
• what type of dependability properties can be assumed of the interconnection infrastructure?
• how might the dependability of a component system be affected by the addition of DSoS-related software, implementing a linking interface?
2.2 Dependability assessment techniques
The dependability of computer systems can be assessed using either model-based or measurement-based techniques. Modelling work allows system designers to obtain predictions of the dependability attributes of a system, based on probabilistic measures of the behaviour of its subsystems. These measures are useful during the design phase, since they enable the pertinent dependability attributes of different system configurations to be estimated, even before they are built. There is work in the Validation workpackage that addresses this problem from a DSoS point of view.
However, modelling techniques can only provide predictions of the dependability attributes of a system. Once a system has been implemented and deployed, measurement-based techniques can be applied to obtain more specific insights and measures. There are two main measurement-based approaches to obtaining information on a system’s dependability:
• the observation of a large set of systems in operation, as in [Kalyanakrishnam et al., 1999]. This approach relies on error information obtained either from logs maintained by system administrators or from automatic monitoring mechanisms provided by the system. By analysing the data, one can obtain information on the nature and frequency of failures, and on the type of usage that led to the failure of the component system.
  A disadvantage of this approach is that failures are rare, which makes it necessary to collect information on a large population of identical systems over a long time span before being able to make statistically significant analyses. It is thus poorly suited to short development cycles.
• the deliberate insertion of faults into the target system, so as to accelerate the characterization of its behaviour in the presence of faults. These fault injection experiments allow the system’s error detection mechanisms to be triggered more frequently than in normal operation. They also allow evaluation of the system’s behaviour when error detection coverage is not perfect, as is usually the case for complex systems.
Field-based observations are complementary to fault injection experiments, since they provide data on types of failures that can be experienced by a system, in given operational conditions. The analysis of failure reports can be used to derive a fault model, which is then used to develop fault injection campaigns. This increases the likelihood that fault injection experiments are representative of real faults experienced by the target system.
Unfortunately, there has been no reported work on field observation of failures in middleware-based systems. This makes it difficult to validate the degree of representativeness of a given fault model for these targets.
2.3 Fault injection
Fault injection is a well-known dependability characterization technique [Arlat et al., 1993], which studies a system’s reaction to abnormal conditions. It is a testing approach that is complementary to analytical approaches, and which allows the examination of system states which would not be reached by conventional functional testing. The aim of fault injection is to simulate the effect of real faults impacting a target system, namely the error due to the activation of a fault. Fault injection experiments provide a number of useful results:
• an understanding of the system’s failure modes, or its behaviour in the presence of faults;
• information on the fault tolerance mechanisms in the target system, in particular a measurement of their coverage (the conditional probability that, given a fault in the system, the system can tolerate it).
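As an illustration of the coverage measure defined above, the following minimal sketch estimates coverage from the counts produced by a fault injection campaign. The function name, its parameters and the choice of a Wilson score interval are assumptions made for this example; the report does not prescribe a particular estimator.

```python
import math

def coverage_estimate(tolerated: int, injected: int, z: float = 1.96):
    """Estimate coverage as tolerated/injected, with a Wilson score interval.

    `tolerated` and `injected` are hypothetical campaign counts; z = 1.96
    corresponds to an approximate 95% confidence level.
    Returns (point_estimate, lower_bound, upper_bound).
    """
    if injected <= 0:
        raise ValueError("at least one injected fault is required")
    if not 0 <= tolerated <= injected:
        raise ValueError("tolerated must lie between 0 and injected")
    p = tolerated / injected
    denom = 1 + z * z / injected
    centre = (p + z * z / (2 * injected)) / denom
    half = z * math.sqrt(p * (1 - p) / injected
                         + z * z / (4 * injected * injected)) / denom
    return p, max(0.0, centre - half), min(1.0, centre + half)

# Hypothetical campaign: 90 of 100 injected faults were tolerated.
estimate, low, high = coverage_estimate(90, 100)
print(f"coverage ~ {estimate:.2f} (interval {low:.2f} to {high:.2f})")
```

The interval matters in practice because a campaign injects a finite number of faults, so the point estimate alone can overstate how well the coverage is known.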
A number of fault injection techniques have been developed. Most early work concerned the injection of physical faults [Karlsson et al., 1998], irradiating electronic circuits with heavy ions, to simulate the effect of electromagnetic radiation, or acting directly on the pins of a microprocessor to modify voltages. Due to the complexity and the speed of modern integrated circuits, recent research has concentrated on software-implemented fault injection (SWIFI). In this technique, the corruption is performed by software, and can target different components or different layers in a system (operating system kernel, system services, middleware components, application code, system memory and registers). This approach is very generic and flexible, since a large variety of fault models can be used. Several studies have shown that single bit-flips lead to errors similar to those produced by physical fault injection techniques (e.g., [Rimén et al., 1994, Fuchs, 1998]), and also that they simulate errors produced by software faults fairly faithfully [Madeira et al., 2000].
The target for the fault injection can either be the interface of a software component, or its internal address space. Targeting the interface assesses the component’s robustness, its ability to function correctly in the presence of invalid inputs and stressful environmental conditions. It is a useful way of evaluating the probability of error propagation from one system component to another, due to their interactions. Targeting the address space assesses the impact on the component’s behaviour of internal corruptions, resulting either from physical faults or software faults.
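The single bit-flip fault model mentioned above can be sketched in a few lines. The example below operates on an ordinary byte buffer rather than on a live address space, and the function names are hypothetical; it is meant only to show what the corruption itself looks like, not how a SWIFI tool reaches its target.

```python
import random
from typing import Tuple

def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of `data` with exactly one bit inverted."""
    if not 0 <= bit_index < len(data) * 8:
        raise IndexError("bit index out of range")
    corrupted = bytearray(data)
    byte_pos, bit_pos = divmod(bit_index, 8)
    corrupted[byte_pos] ^= 1 << bit_pos  # XOR inverts the selected bit
    return bytes(corrupted)

def flip_random_bit(data: bytes, rng: random.Random) -> Tuple[bytes, int]:
    """Flip one uniformly chosen bit; also return its index for logging."""
    bit_index = rng.randrange(len(data) * 8)
    return flip_bit(data, bit_index), bit_index

# A seeded generator keeps a campaign reproducible.
corrupted, where = flip_random_bit(b"request payload", random.Random(42))
print(where, corrupted)
```

Recording the index of the flipped bit alongside the observed outcome is what later allows failure modes to be related to the location of the injected fault.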
2.4 Targeting CORBA middleware
Middleware is software that mediates between an application program and the network. It manages the interaction between disparate applications across heterogeneous computing platforms, abstracting from the programming language, operating system and hardware, and often providing convenient access to services such as naming, transactional processing and concurrency management.
CORBA [OMG, 2001a] is a middleware platform that focuses on interactions between distributed objects. The standard is defined by the Object Management Group (OMG), an industry consortium. A key principle of CORBA is its separation of interface and implementation. Interfaces are used to specify the operations and data types that allow access to a service; they are described in an Interface Definition Language (OMG IDL). An interface is independent of the programming language and operating system that is used to implement the service it describes, and of the location where the service is provided. The software that transports service requests between the client and the server is called the Object Request Broker (ORB).
Figure 2 shows a client invoking a method named foo on an object hosted on a remote computing node. The object’s semantics are implemented by a programming-language dependent entity called a servant. The figure shows the different functional elements composing a CORBA middleware implementation:
• the core of the ORB, which handles marshalling of information to and from the CORBA wire format, and communication with ORBs on remote nodes, including request demultiplexing and concurrency management;
• a set of CORBA services, including naming, trading, event propagation, access control and persistency;
• on a server, an object adapter (OA) that is responsible for dispatching incoming requests to the appropriate servant, controlling security issues and the lifecycle of servants;
• client stubs that provide an interface to the ORB core, and implementation skeletons that connect the object adapter to the servant (these elements are programming language dependent, and are generated automatically from the IDL interface);
• modules that handle dynamic invocation, providing an invocation mechanism that can be used when the interface of a service was not known at compile-time (DII¹ and DSI² modules);
• an optional Interface Repository service, which allows runtime introspection of the IDL interfaces available in the system.
[Figure 2: High-level view of a CORBA method invocation. Diagram labels: client, IOR, serv->foo(69), servant (upperware); CORBA services, DII, stub, Implementation Repository, DSI, skeleton, OA, ORB core (middleware); kernel (lowerware); node A, node B.]
¹ DII: Dynamic Invocation Interface, which allows clients to construct method invocations without passing through a stub.
² DSI: Dynamic Skeleton Interface, which allows servers to handle incoming dynamically constructed requests.
ORB implementations from different vendors, running on different platforms, are able to interoperate by exchanging messages adhering to the General Inter-ORB Protocol (GIOP). This specification describes the data representations, message types and message formats to be used for communication between ORBs. GIOP assumes that the underlying transport protocol is connection-oriented, reliable, and can be viewed as a byte stream. The mapping of GIOP onto TCP/IP is called the Internet Inter-ORB Protocol (IIOP).
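To make the notion of a GIOP message format more concrete, the sketch below decodes the fixed 12-byte header that precedes every GIOP message: the magic string "GIOP", the protocol version, a flags octet whose least significant bit gives the byte order, the message type and the body size. The layout follows the OMG specification as we understand it; the Python names are chosen for this example only and are not part of any ORB API.

```python
import struct
from typing import NamedTuple

# Message type codes defined by GIOP.
GIOP_MESSAGE_TYPES = {
    0: "Request", 1: "Reply", 2: "CancelRequest", 3: "LocateRequest",
    4: "LocateReply", 5: "CloseConnection", 6: "MessageError", 7: "Fragment",
}

class GiopHeader(NamedTuple):
    major: int
    minor: int
    little_endian: bool
    message_type: str
    message_size: int

def parse_giop_header(data: bytes) -> GiopHeader:
    """Decode the 12-byte GIOP header found at the start of `data`."""
    if len(data) < 12:
        raise ValueError("a GIOP header is 12 bytes long")
    magic, major, minor, flags, msg_type = struct.unpack_from(">4sBBBB", data, 0)
    if magic != b"GIOP":
        raise ValueError("missing GIOP magic")
    little_endian = bool(flags & 0x01)  # bit 0 selects the byte order
    (size,) = struct.unpack_from("<I" if little_endian else ">I", data, 8)
    name = GIOP_MESSAGE_TYPES.get(msg_type, f"Unknown({msg_type})")
    return GiopHeader(major, minor, little_endian, name, size)

# A hand-built big-endian GIOP 1.0 Request header announcing a 24-byte body.
header = b"GIOP" + bytes([1, 0, 0, 0]) + (24).to_bytes(4, "big")
print(parse_giop_header(header))
```

Because so much meaning is packed into these few bytes, a single corrupted bit can change the announced message type, byte order or body length, which is one reason the header is an interesting target for message-level corruption.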
3 Related work
In this section, we describe previous research on fault injection for dependability characterization. We concentrate on work that has targeted middleware implementations, as well as executive software such as operating system kernels and network protocol stacks. We classify fault injection techniques according to whether they simulate faults that originate internally to a component system, or whether they originate via its linking interface or its local interfaces [Jones et al., 2001].
3.1 Faults internal to a component system
A fault is said to be internal when the corruption of the system’s state that it causes originated inside the system. This encompasses programming errors that may affect the internal data of the system, and hardware faults that may corrupt both data and code memory segments. The most common fault model used is the single bit-flip.
A large number of tools have been developed to automate the execution of experiments using this fault model. Two significant examples are Xception, developed at the University of Coimbra, Portugal [Carreira et al., 1998], and MAFALDA, developed at LAAS-CNRS, France [Fabre et al., 2000].
There has been some work [Chung et al., 1999] investigating the impact of high-level faults on CORBA and DCOM applications. The faults simulated are hangs and crashes of threads, processes and computing nodes. The authors found a significant proportion of application hangs, which led them to recommend the use of application-level watchdog mechanisms.
We are not aware of any work on CORBA ORBs simulating finer-grained faults, such as bit flips. However, the technique has been extensively applied for the failure analysis of other executive software, including operating systems and language runtimes. To illustrate the information that can be obtained from these types of experiments, we present results extracted from campaigns applying the MAFALDA tool to a number of COTS³ real-time microkernels, including Chorus and LynxOS. Figure 3 illustrates results obtained by subjecting an instance of the Chorus/ClassiX microkernel (composed of basic functional components implementing basic services such as synchronization, memory, and scheduling) to a series of SWIFI experiments using MAFALDA.
³ COTS: Commercial Off-The-Shelf
Failure mode    a. SYN module (2986)    b. MEM module (2918)    c. COM module (2944)
EXCEPTION       38.1%                   55.6%                   49.9%
NO OBS.         28.5%                   33.0%                   31.3%
KDB             13.9%                   4.1%                    7%
APPFAIL         9.0%                    2.4%                    3.6%
APPHANG         6.0%                    0.4%                    1.1%
ERROR STATUS    3%                      2.1%                    5.1%
SYSHANG         1.4%                    2.5%                    2.2%
Figure 3: Results from MAFALDA applied to Chorus (internal bit flips)
These experiments consist in randomly selecting a location in the kernel address space and flipping a randomly chosen bit in that memory cell. The bit is restored as soon as the cell is read, irrespective of whether the cell contains instructions or data. The pie diagram in Figure 3a shows the failure modes observed when about 3000 faults (transient single bit flips) were injected in the code segment of the standard synchronization component. Regarding the failure modes, about 50% of the errors were successfully detected by the microkernel error detection mechanisms ("error status", "exception", "kernel debugger [KDB]"), while a hang ("system hang [SYSHANG]", "application hang [APPHANG]") occurred in 7.4% of the cases. Nevertheless, 9% of the errors led to an incorrect service ("application failure [APPFAIL]"). Finally, the "no observation [NOOBS]" category (29%) corresponds to errors that had no observable consequences although the injected faults were actually activated. Similar results can be obtained on different kernel components such as the memory management module (Figure 3b) and the communication management module (Figure 3c).
Section 5.1 discusses how a tool like MAFALDA could be used to characterize the failure modes of a CORBA-based middleware implementation.
3.2 External faults at a linking interface
Another source of faults that can affect a component system is the linking interface, through which it is connected with other systems. This form of fault injection provides a means for measuring a system’s robustness, its ability to function correctly in the presence of invalid inputs and stressful environmental conditions. Robustness testing involves injecting corrupted data at the linking interface of the system, and observing its behaviour.
In [Miller et al., 1990], the robustness of different implementations of standard UNIX utilities was measured, by submitting them to randomly generated input. Despite using an extremely simple failure mode classification (crash or not-crash), this fuzz testing showed that most implementations had quite high failure rates. The technique has also been applied to the robustness testing of POSIX-compliant operating systems in the context of the Ballista project [Koopman and DeVale, 1999]. This work consists of using invalid parameters in system calls, such as null pointers, or using an incorrect sequence of system calls. It is based on a library of corruption test cases, specialized for each data type. Figure 4 shows that all of the 15 targeted operating systems exhibited a large proportion of non-robust behaviours (e.g., between 18 and 34% of the tests lead to an abort failure mode).
[Figure 4: Comparison of 15 POSIX-compliant operating systems. Horizontal axis: normalized failure rate (%), 0 to 50. Failure categories: Abort, Silent, Restart; systems with Catastrophic failures are marked with *. Systems: AIX 4.1, FreeBSD 2.2.5, HP-UX 9.05, HP-UX 10.20, Irix 5.3, Irix 6.2, Linux, LynxOS, NetBSD, DUNIX 3.2, DUNIX 4.0D, QNX 4.22, QNX 4.24, SunOS 4.13, SunOS 5.5.]
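The idea of a library of exceptional test values specialized for each data type can be sketched as follows. This is a toy illustration in the spirit of the approach described above, not the Ballista tool itself: the value pools, the outcome labels and the target function `average_length` are all hypothetical, and the outcome classification is far coarser than the one used in the cited work.

```python
import itertools

# Hypothetical pools of exceptional values, one pool per parameter type.
TEST_VALUES = {
    "int": [0, -1, 2**31 - 1, -2**31],
    "str": ["", "a" * 10_000, "\x00", None],
}

def run_robustness_tests(func, param_types, expected_errors=(ValueError, TypeError)):
    """Call `func` with every combination of exceptional values and tally
    how each call ended."""
    outcomes = {"ok": 0, "error_reported": 0, "abort": 0}
    pools = [TEST_VALUES[t] for t in param_types]
    for args in itertools.product(*pools):
        try:
            func(*args)
            outcomes["ok"] += 1
        except expected_errors:
            outcomes["error_reported"] += 1  # invalid input was rejected cleanly
        except Exception:
            outcomes["abort"] += 1           # unexpected failure: non-robust
    return outcomes

def average_length(text, parts):
    """Toy target with a deliberate flaw: parts == 0 is never checked."""
    if text is None:
        raise ValueError("text must not be None")
    return len(text) / parts

print(run_robustness_tests(average_length, ["str", "int"]))
# → {'ok': 9, 'error_reported': 4, 'abort': 3}
```

Here the three aborts all come from the unchecked zero divisor, which shows how combining per-type pools exposes flaws that a test written around valid inputs would miss.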
The Ballista approach has also been applied to the robustness testing of a number of CORBA implementations [Pan et al., 2001], with respect to corrupted invocations of a portion of the client-side interface exposed by an ORB. For example, the object_to_string operation, which converts an object reference into a textual representation, is invoked with an invalid object reference, to see whether the ORB crashes or hangs or signals an exception.
Figure 5 presents results from this paper. It shows the breakdown of experimental outcomes for the different targets. ORB implementations from three different vendors were tested, with different versions and on different operating systems. Their experiments show a high proportion of non-robust behaviour such as thread hangs and crashes.
The fault model considered in this work only targets client-side operations. In particular, activity that involves interaction between a client and a server is not covered. Furthermore, the functionality exposed through the ORB’s client-side interface, which was targeted in this research, is mainly used during the initialization of an application. Most of the functionality provided by an ORB is in fact implicit, in the sense that it is activated without any explicit calls made by the application level, and is thus difficult to target using this approach.
[Figure 5: Ballista project applied to ORB characterization. Horizontal axis: 0% to 100%. Targets: Orbix 3.0 (sun), Orbix2000 (sun), Orbix2000 (Linux), omniORB 2.8 (sun), omniORB 3.0 (sun), omniORB 3.0 (Linux), VisiBroker 3.3 (sun), VisiBroker 4.0 (sun), VisiBroker 4.0 (Linux). Outcome categories: Thread Hang, Thread Abort (thread behaviour); Unknown Exception, No Exception, CORBA::Exception, CORBA::SystemException (exceptions).]
We are not aware of other characterization work on CORBA using fault injection. Other validation efforts have used a functional testing approach (such as the CORVAL project, which aims to test the functional correctness and the interoperability of ORB implementations) or concentrated on performance evaluation (e.g. [Nimmagadda et al., 1999]), without considering the presence of faults.
3.3 External faults at a local interface
A DSoS component system [Jones et al., 2001] may also communicate with its environment through one or more local interfaces, and may be subjected to faults arriving through them. Examples of local interfaces are a system’s network stack, and interfaces over which it may be providing legacy services. The types of faults that may arrive over a local interface are protocol errors in a communication with a remote system.
There has been work on fault injection for the characterization of the behaviour of the networking stacks in UNIX operating systems [Dawson et al., 1997]. In this work, faults such as message loss, delays and reordering were injected, to assess the robustness of the protocol implementations. In [Labovitz et al., 1998], the stability of routing protocols used on the Internet is studied.
4 Fault pathology of CORBA-based systems
In this section we describe the architecture of a middleware-based system, identifying the places in the system where faults may occur, grouping these faults into classes, and classifying the types of failures that it may exhibit.
4.1 Error confinement regions in a CORBA-based system
Middleware is generally seen as a layer of software that lies between the operating system and the application layer, as shown in Figure 2. This high-level view of an ORB is sufficient for most CORBA development. Indeed, the CORBA specifications are implementation-agnostic, and do not mandate any specific representation for CORBA objects, or require any particular form of interaction with the underlying operating system. For dependability analysis, however, more detailed knowledge of the architecture of a CORBA-based system is necessary, particularly with respect to the error confinement regions implied by the architecture.
CORBA services such as naming and the interface repository are generally implemented as daemons⁴. A service may run on a single computing node, or may involve the collaboration of multiple computing nodes (federation of name servers, for example).
A CORBA ORB can be implemented in several different ways:
• kernel-based strategy, where the ORB is provided as a service of the operating system. This strategy can allow certain performance optimizations, since the operating system knows the location of objects, and can facilitate authentication of requests.
• daemon-based strategy, where ORB functionality is provided by one or more daemon processes, which mediate between clients and object implementations. For example, each computing node may run an activation daemon that is responsible for activating server processes and dispatching incoming requests, and for routing outgoing requests to the appropriate computing node. This implementation strategy facilitates centralized administration, since all CORBA processes are known to the activation daemons.
• application-resident strategy, where code implementing the ORB functionality runs in the same execution context as the client and the object implementations. The ORB is typically provided as a shared library that is linked with CORBA applications. This is the most common implementation strategy currently used on POSIX-like systems. In fact, even in the two previous implementation strategies, a certain amount of ORB functionality is hosted in each CORBA application, to deal with the language mapping.
⁴ daemon: standalone operating system process that runs in the background, providing some form of service.
The kernel-based implementation strategy protects the ORB from modification by faulty application programs, due to the operating system’s memory protection mechanisms. However, the ORB service constitutes an error propagation channel for all applications on the same computing node. The daemon-based strategy introduces a single point of failure per computing node, since failure of the activation daemon will impact all application processes on that node.
The degree of error confinement offered by the application-resident implementation strategy depends on the way in which CORBA objects are mapped onto the processes and threads provided by the underlying operating system. This mapping is (deliberately) left unspecified by the CORBA standards, and different deployment configurations are possible:
• a dedicated computing node for each CORBA object. In this case the only error propagation channel is through the network (and through calls to CORBA services). However, it is unsuited to a system comprising a large number of lightweight objects.
• each CORBA object in a separate operating system process. This is a heavyweight solution when large numbers of objects are required, but provides good error confinement, since the crash of one object does not necessarily cause the crash of other objects running on the same computing node.
• multiple CORBA objects per operating system process. This technique, which is called collocation, leads to several objects sharing the same address space. The crash of one object may cause the crash of all the collocated objects, so this choice clearly provides the least error confinement.
4.2 A fault model for CORBA-based systems
In general, information on the types of failures experienced by a class of systems, and the rates at which they tend to occur, is obtained from field measurements. However, we are not aware of any such study for middleware-based systems. Consequently, we can only derive a list of the classes of faults that affect these systems through structural analysis, by studying the architecture of a typical system, and examining the points where faults may arise, and how they can propagate through the system.
Figure 6 provides a more detailed view of the path taken by a remote method invocation, from the invoking object to the servant implementing the service. The figure shows that many different layers of software and hardware are traversed by the request; clearly, the impact of faults affecting each layer must be taken into account when considering the failure modes of a CORBA-based system.
[Figure 6: Details of the path of a CORBA request. Diagram labels: client process and servant process (upperware); stub, DII, skeleton, DSI, POA, ORB core (middleware); operating system kernel (underware); host A, host B (hardware); network.]
The types of faults that can affect a CORBA-based distributed system can be classified as follows:
• physical faults affecting RAM or the processor’s registers (so-called Single Event Upsets or soft errors [Ziegler and Srinivasan, 1996]). For example, a hardware fault may cause a bit to be flipped at one or more addresses in memory.
• software faults (design or programming errors) at the application, middleware and operating system levels. For instance, an application may pass a NULL pointer to the middleware, or the middleware may omit checking of error codes returned by the operating system.
• “environmental” faults, such as the interruption of network connections and disk-full conditions.
• resource-management faults: “process aging” produces effects such as leaking of memory (particularly common in CORBA applications), fragmentation effects, exhaustion of resources such as file descriptors.
• communication faults, such as message loss, duplication, reordering or corruption. While this class of faults is widely assumed not to affect middleware that builds on a reliable network transport protocol, as is the case of CORBA’s IIOP, recent research discussed below suggests that they deserve attention.
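The communication fault class listed above (loss, duplication, reordering, corruption) can be illustrated with a small sketch that perturbs a sequence of messages before delivery. The function name and its probability parameters are assumptions introduced for this example; a real injector would sit on the network path rather than operate on an in-memory list.

```python
import random
from typing import List

def inject_communication_faults(messages: List[bytes], rng: random.Random,
                                p_loss: float = 0.0, p_dup: float = 0.0,
                                p_corrupt: float = 0.0,
                                reorder: bool = False) -> List[bytes]:
    """Return the message sequence as a faulty channel might deliver it."""
    delivered = []
    for msg in messages:
        if rng.random() < p_loss:
            continue                          # loss: message never arrives
        if msg and rng.random() < p_corrupt:
            bit = rng.randrange(len(msg) * 8)
            buf = bytearray(msg)
            buf[bit // 8] ^= 1 << (bit % 8)   # corruption: one flipped bit
            msg = bytes(buf)
        delivered.append(msg)
        if rng.random() < p_dup:
            delivered.append(msg)             # duplication: delivered twice
    if reorder:
        rng.shuffle(delivered)                # reordering: arbitrary order
    return delivered

rng = random.Random(7)
print(inject_communication_faults([b"req-1", b"req-2", b"req-3"], rng,
                                  p_loss=0.2, p_dup=0.2, p_corrupt=0.2))
```

Keeping each fault type behind its own parameter makes it possible to run one campaign per fault type and attribute the observed failure modes to a single cause.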
While a system of systems is subject to each of these fault classes, physical faults and software faults are less specific to this DSoS context than the last three fault classes. Consequently, our work has concentrated on studying environmental, resource-management and communication faults.
4.3 Failure modes of an ORB
In this section, we briefly analyze the ways in which an ORB may fail, and the impact of these types of failures on the system that builds on the middleware. We classify the failure modes of an ORB as follows:
• crash of a process or of a thread;
• hang of a process or of a thread;
• corruption of incoming and outgoing data;
• omission and duplication of messages;
• incorrect signaling of exceptions.
The impact of these failure modes depends on the capacity of the system to detect the failure, and on the degree to which it can mask or recover from the failure. The most severe failure modes are those which are not detected by the system, and which therefore allow an error to propagate from the middleware to the application level.
As was noted in Section 4.1, the effect of a process or thread crash in terms of propagation depends on the choice of mapping between CORBA objects and execution entities. The time taken to detect a process crash or hang also depends on the system’s configuration; in certain cases, a remote client may not detect the failure in a reasonable time span.
Concerning exception signaling: during a CORBA method invocation, the ORB on the client side is responsible for propagating exceptions that occurred on the server to the application level. On the server side, the ORB is responsible for propagating any exceptions that occur during the processing of a request to the client. If this signaling is incorrect, either because an ORB does not signal an exception when it should have, or because it has signaled a spurious condition, the system’s fault tolerance mechanisms will not be activated correctly.
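One possible way to turn the failure mode classification above into an automated check is sketched below. The `Observation` record and its fields are hypothetical: they stand for whatever a monitoring harness is able to record about one experiment, and the sketch does not cover omission or duplication of messages, which would require counting messages on the wire.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class FailureMode(Enum):
    NONE = "no failure observed"
    CRASH = "crash of a process or of a thread"
    HANG = "hang of a process or of a thread"
    CORRUPT_DATA = "corruption of incoming or outgoing data"
    INCORRECT_EXCEPTION = "incorrect signaling of exceptions"

@dataclass
class Observation:
    """What a (hypothetical) harness recorded for one experiment."""
    process_exited: bool                    # target terminated abnormally
    reply_received: bool                    # a reply arrived before the timeout
    reply: Optional[bytes] = None
    exception: Optional[str] = None         # exception signaled to the client
    expected_reply: Optional[bytes] = None  # taken from a fault-free reference run
    expected_exception: Optional[str] = None

def classify(obs: Observation) -> FailureMode:
    """Map one observation onto a failure mode, most severe check first."""
    if obs.process_exited:
        return FailureMode.CRASH
    if not obs.reply_received:
        return FailureMode.HANG
    if obs.exception != obs.expected_exception:
        return FailureMode.INCORRECT_EXCEPTION  # missing or spurious exception
    if obs.reply != obs.expected_reply:
        return FailureMode.CORRUPT_DATA
    return FailureMode.NONE

print(classify(Observation(False, True, reply=b"42", expected_reply=b"41")))
```

Comparing against a fault-free reference run is what allows undetected data corruption, the most severe case discussed above, to be distinguished from a correct reply.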
5 Method and experimental techniques
In this section we describe a method for the experimental characterization of the failure modes of a middleware implementation. This method is derived from the fault model presented in the previous section, and from analysis of the feasibility of different forms of failure observation and fault injection techniques. We present a number of experimental fault injection techniques that can be used to assess the impact of different fault classes on a middleware.
Several factors must be considered before launching a fault injection campaign:
• the fault model: which classes of errors to insert, where to insert them, and when? The injection may be triggered by the occurrence of an event of interest, or occur after a predetermined time period. The fault may be transient in nature (e.g., a single bit-flip), or permanent (e.g., a stuck-at fault).
• the observations: how to monitor the system’s behaviour and classify the failure modes? It is important that all significant events be observed, which may be difficult in a distributed system.
• the workload: what operational profile or simulated system activity should be applied during the experiment? The workload is evidently very dependent on the target system. Different workloads may lead to slightly different results, since they cause different system activation patterns.
The rest of this section presents a number of approaches for fault injection in a CORBA environment, which simulate the different fault classes that were identified in Section 4.2. These fault injection techniques are classified according to the origin of the fault they simulate:
• internal faults, arising either from hardware or software faults, simulated respectively using memory bit flips and program mutation techniques;
• faults propagating from the application level, simulated using robustness testing and performance stress-testing;
• faults propagating from the underlying operating system, simulated using system call interposition techniques;
• faults arriving from the network, simulated using message corruption and reordering techniques.
5.1 Corruption of the memory space
This fault model consists of simulating the impact of faults affecting the memory subsystem of the host computer, in regions such as the RAM, the processor’s registers, and its I/O controllers.
Two classes of faults can be injected:
• permanent faults, resulting from faulty memory components. These can be stuck-at-1 or stuck-at-0.
• transient faults, resulting from Single Event Upsets caused, for example, by electromagnetic radiation. These faults are usually more difficult to detect than permanent faults.
The trigger for the fault injection can be temporal, in which case the fault is injected a certain number of seconds after the workload has been initialized, or spatial, in which case the fault is injected once the targeted memory word is accessed by the system.
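As an illustration of the two fault classes, the following minimal Python sketch models a memory zone as a bytearray. The function names flip_bit and stuck_at are hypothetical and are not taken from any tool mentioned in this report; a real injector would act on the address space of the target process rather than on a local buffer.

```python
def flip_bit(memory: bytearray, byte_index: int, bit: int) -> None:
    """Transient fault: invert a single bit in place."""
    memory[byte_index] ^= 1 << bit


def stuck_at(memory: bytearray, byte_index: int, bit: int, value: int) -> None:
    """Permanent fault: force a bit to 0 or 1.

    To emulate a permanent fault, this must be re-applied after every
    write to the affected word (assumption of this sketch).
    """
    if value:
        memory[byte_index] |= 1 << bit
    else:
        memory[byte_index] &= ~(1 << bit) & 0xFF


word = bytearray(b"\x0f")
flip_bit(word, 0, 7)       # 0x0f becomes 0x8f
stuck_at(word, 0, 0, 0)    # 0x8f becomes 0x8e
```

A temporal trigger would call one of these functions after a fixed delay, whereas a spatial trigger would call it when the targeted word is accessed.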
The memory areas that can be targeted depend on the ORB’s implementation strategy. Characterization of a kernel-based ORB requires injections into the kernel’s address space, as well as the address spaces of CORBA applications. Characterization of a daemon-based ORB will involve corruption of the memory space of the activation daemon. For an application-resident ORB, different zones of the process’ address space can be targeted (see Figure 7):
• the application stack and heap;
• the private code from the stubs and skeletons, that is linked with the process;
• the stack associated with the shared library;
• the code (text zone) of the shared library.
[Figure 7: Memory mappings in a CORBA system with an implementation-resident ORB. Labels in the diagram: client, server, stack and heap, client code, server code, DII/DSI, stub, skeleton, code of ORB core, code of system libraries, operating system kernel.]
While the first three sections of the address space are private to each process, the code of the shared library is, on modern operating systems, shared between all processes using the same ORB on that computing node (it is read-only, so it can be shared safely). This means that there is a potential error propagation channel through the ORB via corruption of the shared library’s code (indeed, the same is true of other system libraries, which are shared by all processes).
The information on the address ranges corresponding to each zone can be obtained from the operating system, for example by using the /proc file system on Linux. As for the previous fault injection approaches, this technique requires a workload application and failure observers. Once these elements are set up, the experimental campaigns are similar to the targeting of other executive software components, such as operating system kernels. Indeed, existing tools such as MAFALDA (see Section 3.1) can be used to conduct the experiments. Measurements of factors such as exception classes and error detection latencies can be obtained.
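To make the use of the /proc file system concrete, here is a small Python sketch that parses text in the format of a Linux /proc/<pid>/maps file and selects the regions belonging to a given library. The sample content, the path /usr/lib/liborb.so and the helper names are invented for illustration; on a real system the text would be read from the maps file of the target process.

```python
# Hypothetical excerpt in the /proc/<pid>/maps format:
# address range, permissions, offset, device, inode, pathname.
SAMPLE_MAPS = """\
00400000-00452000 r-xp 00000000 08:02 173521 /usr/bin/nameserv
00e03000-00e24000 rw-p 00000000 00:00 0 [heap]
7f1c2a000000-7f1c2a1b0000 r-xp 00000000 08:02 135522 /usr/lib/liborb.so
7ffd4c3e0000-7ffd4c401000 rw-p 00000000 00:00 0 [stack]
"""


def parse_maps(text):
    """Return one dictionary per mapped region."""
    regions = []
    for line in text.splitlines():
        fields = line.split(None, 5)
        if len(fields) < 5:
            continue  # skip malformed or empty lines
        start, end = (int(part, 16) for part in fields[0].split("-"))
        path = fields[5] if len(fields) > 5 else ""
        regions.append({"start": start, "end": end,
                        "perms": fields[1], "path": path})
    return regions


def regions_for(regions, needle):
    """Select the regions whose pathname contains the given substring."""
    return [region for region in regions if needle in region["path"]]


regions = parse_maps(SAMPLE_MAPS)
orb_text = regions_for(regions, "liborb")
```

Here orb_text would describe the text zone of the shared library, which is one of the injection targets listed above.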
5.2 Program mutation techniques
This characterization technique investigates the effect of software faults. It consists of artificially inserting bugs into the source code of the program, and observing the behaviour of the modified candidate (called a mutant). Previous work [Daran and Thévenod-Fosse, 1996] has shown that program mutation induces errors which are similar in nature to the errors produced by real programming faults.
The early focus of work using this technique was testing, where mutation is used to measure the adequacy of a set of test cases. Some more recent work [Voas and McGraw, 1997] has investigated program mutation as a characterization technique. This is closer to our aim, which is to identify the types of errors and failures which can be caused by software faults, and investigate the degree to which they are detected by the system’s error detection mechanisms.
Most work in the literature consists of injecting faults which change the value of a literal constant, or the type of an operator (for example changing a + operator into a -, or changing the sign of the comparison operator in a conditional statement) [Daran and Thévenod-Fosse, 1996]. Other mutations include replacing the name of a variable or a function by another variable or function. There has been more recent work targeting object-oriented mutation operators [Chevalley and Thévenod-Fosse, 2001], such as changes between deep and shallow equality comparisons.
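The operator-replacement mutation described above can be sketched with Python's standard ast module. This is only an illustration of the principle on a toy function: the ORBs discussed in this report are written in C, C++ or Java, and the class name ArithmeticMutator is hypothetical. For simplicity the sketch replaces every + operator, whereas a mutation tool would normally apply one change per mutant.

```python
import ast


class ArithmeticMutator(ast.NodeTransformer):
    """Mutation operator: replace each binary + by a binary -."""

    def visit_BinOp(self, node):
        self.generic_visit(node)
        if isinstance(node.op, ast.Add):
            node.op = ast.Sub()
        return node


ORIGINAL = "def total(a, b):\n    return a + b\n"


def make_mutant(source):
    """Compile a mutated copy of the source and return its namespace."""
    tree = ArithmeticMutator().visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    namespace = {}
    exec(compile(tree, "<mutant>", "exec"), namespace)
    return namespace


original = {}
exec(ORIGINAL, original)
mutant = make_mutant(ORIGINAL)
```

Running the same workload against original["total"] and mutant["total"] and comparing the results shows whether the inserted fault is activated and whether it is detected.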
When applied to a middleware-based system, there are a number of different targets for mutation:
• the IDL interface itself. For example, a parameter that used to be passed using in conventions could be changed to inout. Mutation could also be applied to the data structure definitions, removing an attribute in a record definition, or replacing an unbounded sequence by a fixed length sequence.
• the stubs and skeletons automatically generated by the IDL compiler. This location simulates faults in the CORBA toolchain. For example, two parameters in a method could be exchanged before being sent over the network. If the types of the parameters are incompatible, this should be detected at compile time.
• the source code of the shared library implementing the ORB. This location evaluates the effect of residual software faults in the middleware itself. Examination of the modifications made in successive releases of a particular implementation, to determine the types of bugs that were corrected, could be an input in the construction of a model of these faults.
• the application level. This simulates faults made by the application programmer. Classic fault models such as ODC [Sullivan and Chillarege, 1991], which include initialization faults and corruption of pointers, could be used.
In the same way as the use of an object-oriented programming language introduces new classes of software faults, which are simulated by object-oriented mutation operators, it would be interesting to identify a number of CORBA-oriented mutation operators which could simulate software faults specific to the use of a CORBA ORB. For instance, memory management is notoriously tricky in a CORBA context when using primitive programming languages that do not provide automatic storage management (such as C and C++), so it would be a promising target for mutation operators. Other mutation operators could involve the use of object references.
Unfortunately, this work is necessarily programming-language specific. While the most commonly used ORBs are implemented in C++, some are implemented in other programming languages. Applying the same work to these ORBs would require porting of the program mutation toolchain; the effort required for this work would depend on the extent to which the mutation operators are specific to a particular programming language.
5.3 Robustness testing
The robustness of a system is a measure of its ability to function correctly in the presence of invalid inputs and stressful environmental conditions. Robustness testing involves injecting corrupted data at the external interface of the system, and observing its behaviour.
Robustness testing requires that the system under test present an explicit interface, which will be targeted by the fault injector. This is problematic in the case of an ORB, since most of the functionality provided by an ORB is implicit. Indeed, the explicit functionality provided by an ORB is limited to the following:
• ORB initialization: passing environment information to the ORB library and obtaining bootstrap references for the ORB and services;
• POA⁵ management (on a server object): methods allowing servants to register themselves with the object adapter, and control their life cycle;
• policy management: dynamic changes to the way aspects such as the concurrency model, or access control, are handled;
• the conversion of object references to and from textual representation;
• utility procedures for the creation of certain data types and lists of values.
The work reported in [Pan et al., 2001] on the robustness testing of ORBs targeted about 20 operations in this interface. These operations constitute only a relatively small portion of the functionality provided by an ORB, and are primarily used during the initialization of an application. Indeed, most of the functionality provided by an ORB is implicit, rather than resulting from explicit calls to a public interface. Consider for example a basic CORBA method invocation in a C++ program:
    result = theObject->theMethod("argument1");
The variable theObject is an instance of a class that extends classes provided by the ORB implementation. The ORB identifies the computing node on which the object is running, connects to a given port on that machine, serializes the parameters of the call to a standard format, and sends them over the connection. It then waits for the server’s response, and deserializes the reply into the variable result, or signals a C++ exception.
All this activity is transparent to the application programmer, since the call is syntactically identical to a standard method invocation on a local, non-CORBA object. Given that this functionality is not exposed via an explicit interface, the standard robustness testing approach cannot be applied.
This implicit functionality provided by the ORB can be broken down into a number of categories:
• interaction with the application programming language: implementing marshalling and demarshalling code, handling object creation and destruction, exception handling;
• network-related processing: resolving the addresses of hosts, establishing network connections, sending and receiving information from remote hosts;
• handling concurrency according to requested policy, in cooperation with the operating system;
• resource management: allocating and freeing buffers, etc.
⁵ POA: Portable Object Adapter, responsible for dispatching incoming method invocations to the correct servant, and for controlling the life cycle of servants.
We would like to apply robustness evaluation techniques to these classes of functionality. Since they are not accessible through a standard interface, we propose to generate synthesized interfaces that can serve as targets for fault injection. The purpose of these synthesized interfaces is to provide a way of activating the implicit functionality provided by an ORB. Ideally, we would like to be able to activate each class of implicit functionality individually, to obtain detailed failure mode information. However, it is not possible to isolate certain functional classes from the others: almost all interactions with an ORB will make use of the marshalling and networking functionality, for instance.
The following requirements should be satisfied by the synthesized interfaces and the corresponding service implementation:
• they should use all the different data types that can be defined in OMG IDL, including compound data types such as structures;
• they should include operations with arguments and return values that cover the possible combinations of these data types, including the different argument passing conventions (in, out and inout);
• given the large number of test cases implied by the two preceding requirements, the injection code targeting these synthesized interfaces should be generated automatically. Likewise, it should be possible to generate a workload application automatically for a given interface (clearly, this will severely limit the semantic level of the services which we can target);
• the service implementation should be deterministic, so that failure of the service can be detected automatically at the application level;
• the service should be dependent on the history of previous invocations (i.e., it should not be stateless). If the service depends on some internal state, there is a greater probability of faults propagating to the interface than if the service were stateless.
These requirements can be met by a delayed echo service, consisting of operations that take any number and type of arguments, and return the arguments supplied by the previous call to the service. This service can be implemented for arbitrary method signatures, is deterministic, and is not stateless.
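The behaviour of such a delayed echo service can be sketched in a few lines of Python. This is a local, in-process illustration only: the class name DelayedEcho and the choice of returning None on the first call are assumptions of this sketch, and a real target would be a CORBA servant generated from a synthesized IDL interface.

```python
class DelayedEcho:
    """Returns the arguments of the previous call (deterministic, stateful)."""

    def __init__(self, initial=None):
        # Value returned by the very first invocation (assumption).
        self._previous = initial

    def echo(self, *args):
        previous, self._previous = self._previous, args
        return previous


service = DelayedEcho()
first = service.echo(1, "a")    # nothing has been stored yet
second = service.echo(2.5)      # returns the arguments of the first call
```

Because the client knows what it sent on the previous call, it can check each reply without any knowledge of the service's semantics.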
A fault injection campaign using this approach consists of the following steps:
• generate an interface with some combination of data type definitions and operation signatures. Since the set of possible interfaces is infinite, the generation process is probably random, possibly weighted.
• generate corresponding implementations for the service, the workload, the fault injector, and the fault observer.
• invoke the service with corrupted parameter values, and observe the service’s behaviour.
The generation of parameters for the invocations of the service is a well-known problem in functional testing. There are a number of possible techniques, including statistical generation [Thévenod-Fosse et al., 1991]. The corruption of these parameters is necessarily dependent on their type, and on the programming language mapping. Many OMG IDL types are “incorruptible”, in the sense that all the bit sequences that can be represented in memory have a valid representation in the given type. Certain types, however, have a restricted domain, and can thus be subjected to out-of-range corruption.
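The distinction between “incorruptible” and restricted-domain types can be illustrated as follows. The sketch assumes a hypothetical three-valued IDL enum (called Colour here) carried in a 32-bit field, and compares it with a 32-bit unsigned integer; all names are invented for the example.

```python
# Hypothetical IDL enum with three legal values, carried in 32 bits.
COLOUR_VALUES = {0: "red", 1: "green", 2: "blue"}


def is_valid_ulong(raw):
    """Every 32-bit pattern is a legal unsigned long."""
    return 0 <= raw < 2 ** 32


def is_valid_colour(raw):
    """Only three 32-bit patterns are legal for the enum."""
    return raw in COLOUR_VALUES


def single_bit_corruptions(value, width=32):
    """All values obtained by flipping exactly one bit of the input."""
    return [value ^ (1 << bit) for bit in range(width)]


corrupted = single_bit_corruptions(1)
out_of_range = [v for v in corrupted if not is_valid_colour(v)]
```

For the enum, most single-bit corruptions of the value 1 fall outside the legal domain, whereas every corrupted pattern remains a legal unsigned long and can only be detected by an application-level check.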
The interfaces, service implementations and workload described above can be reused for a number of the fault injection techniques described in the following sections.
5.3.1 Performance stress-testing
Another form of robustness testing is performance stress testing, where the unexpected inputs to the system consist of an unusually intense activity of the workload. These performance tests evaluate the scalability of the service, in terms of the average response time and jitter, as a function of the number of incoming requests per second, and also as a function of the complexity of the request. This approach is particularly well suited to the characterization of CORBA service implementations, since their level of performance can affect the whole system of systems, and timely responses may be critical for services such as Notification⁶.
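A possible way of summarizing such measurements is sketched below. The latency values are made-up placeholders, not experimental results, and jitter is taken here to mean the population standard deviation of the response time, which is only one of several possible definitions.

```python
import statistics


def summarize(latencies_ms):
    """Mean response time and jitter (standard deviation) for one load level."""
    return {
        "mean_ms": statistics.mean(latencies_ms),
        "jitter_ms": statistics.pstdev(latencies_ms),
    }


# Placeholder samples keyed by request rate (requests per second).
measurements = {
    100: [10.0, 12.0, 14.0],
    1000: [40.0, 80.0, 120.0],
}
report = {rate: summarize(samples) for rate, samples in measurements.items()}
```

Plotting the two summary values against the request rate gives the scalability curves mentioned above.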
⁶ The CORBA Notification Service provides a publish/subscribe infrastructure that mediates between event producers and event consumers, according to certain Quality of Service policies.
5.4 Syscall interposition techniques
This fault model investigates fault propagation to the middleware from the operating system kernel and system libraries. The failure at the operating system level can have resulted from various types of faults, both hardware and software. The middleware layer depends on services provided by the operating system kernel, such as networking, scheduling of threads, and stable storage provided through the file system. There are a number of ways for an error to propagate from the operating system to the middleware⁷:
• returning an error code from a system call. Indeed, most system calls return a status code indicating whether the requested operation completed successfully. If not, the application can read an integer code that indicates the reason for the failure.
• signalling an exception: the program’s execution is interrupted by the arrival of a signal. If the application has registered a handler for this signal, it has the opportunity to run the related code; otherwise the application is aborted by the operating system. For example, the ALRM signal is used to notify an application that a timeout has expired.
• taking too long to complete a certain system call (for hard real-time applications).
• corrupting data during input/output operations, for example while reading and writing to stable storage.
• failing to inform the application that some event has occurred. For instance, applications can use the select system call to sleep until activity is detected on a set of file descriptors. If the operating system does not wake up the application, it will not handle incoming messages.
A robust middleware implementation should be able to handle (certain classes of) failures of the operating system gracefully. In many cases, this would involve signalling an exception to the application level, to allow any error recovery mechanisms to be executed. The response to certain types of exceptional conditions is specified by the CORBA standard. For example, a NO_MEMORY CORBA exception must be used to signal a problem with dynamic memory allocation, and a PERSIST_STORE exception to signal a problem with persistent storage on the server.
A campaign using this fault injection technique consists of observing the effects of these unexpected operating system behaviours at the middleware level. The experimental testbed includes a system call interposition layer, which is able to intercept a given system call made by the middleware. Instead of propagating this call to the operating system kernel, the interposition layer returns, possibly after a certain delay, an error code to the middleware. The behaviour of the middleware is then observed, both from the application level (is an exception raised, or is the fault masked?) and from the operating system level, to see whether the system call is repeated (providing information on the middleware’s error recovery mechanisms).
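The logic of an interposition layer can be sketched in Python as a wrapper around a callable standing in for a system call. Everything here is hypothetical (the Interposer class, the fake_send function, the toy middleware_send layer); how the calls of a real ORB are actually intercepted is outside the scope of this sketch.

```python
import errno
import os


class Interposer:
    """Wraps a call and makes its N-th invocation fail with an OS error."""

    def __init__(self, real_call, fail_on_call, error_code):
        self.real_call = real_call
        self.fail_on_call = fail_on_call
        self.error_code = error_code
        self.calls = 0

    def __call__(self, *args, **kwargs):
        self.calls += 1
        if self.calls == self.fail_on_call:
            # The real call is not propagated; an error is returned instead.
            raise OSError(self.error_code, os.strerror(self.error_code))
        return self.real_call(*args, **kwargs)


def fake_send(data):
    """Stand-in for send(): pretends that all bytes were transmitted."""
    return len(data)


send = Interposer(fake_send, fail_on_call=2, error_code=errno.ECONNRESET)


def middleware_send(data):
    """Toy middleware layer mapping an OS error to a CORBA-style outcome."""
    try:
        send(data)
        return "OK"
    except OSError:
        return "COMM_FAILURE"


results = [middleware_send(b"abc") for _ in range(3)]
```

The observer then records whether the injected error surfaced as an exception at the application level (as in the second element of results) and whether the call was retried.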
⁷ In the following we use terminology from the POSIX standard, though similar concepts exist in most modern operating systems.
The fault injection campaign could be run randomly, by arbitrarily selecting the targeted system call for each run. However, more interesting analysis of the experimental outcomes can be obtained by targeting specific system calls, when the middleware is in a known state. Using this approach, it is possible to determine what activity the middleware was involved in when the fault was injected, and examine the corresponding source code to isolate portions of code that could be made more robust.
A particularly common activity in CORBA middleware is the exchange of a number of messages with a remote host. This activity results in a certain trace of system calls, which is represented in Figure 8.
address resolution:
    dnsfd = socket(...)
    send(dnsfd, symbolic-address)
    recvfrom(dnsfd, ...)
    close(dnsfd)
fd = socket(...)
connect(fd, ...)
setsockopt(fd, ...)
fcntl(fd, ...)
send(fd, ...)
recv(fd, ...)
close(fd)
Figure 8: System call graph for a network communication
The initial part of the activity resolves the target host’s symbolic name into a numerical network address. The middleware then creates a communication socket to this address, which it accesses via a numerical descriptor, and optionally sets various flags on the socket. It then sends and receives messages using this descriptor, and finally closes it.
In order to use the syscall interposition technique to target specific middleware activities, the following information must be available:
• a description of the operating system’s service interface, listing the signature of each system call (including the types of its parameters and the return value) and the meaning of each of the error codes which can be generated by that system call. This information is available in the operating system’s programming manual.
• a list of activity graphs for the target ORB implementation. The system call traces can be generated using a tool such as truss⁸.
• a workload and failure observer. Those developed for the robustness testing approach (see Section 5.3) can be reused in this context.
A fault injection campaign for this fault model involves targeting each activity graph. For each activity graph, a given sequence of system calls is selected. Then one system call within this sequence is selected for corruption. For this system call, one of the possible failure modes is selected. A syscall interposition layer is generated for that syscall and fault activation sequence. The interposition layer will detect the targeted sequence of syscalls, which becomes the trigger for the injection.
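The trigger mechanism can be sketched as a small matcher fed with the names of the system calls as they are intercepted. The SequenceTrigger class is hypothetical, and its matching is deliberately naive: it is adequate for patterns that do not overlap with themselves, which is an assumption of this sketch.

```python
class SequenceTrigger:
    """Fires when the targeted sequence of system calls has just been seen."""

    def __init__(self, pattern):
        self.pattern = list(pattern)
        self.matched = 0

    def observe(self, syscall_name):
        """Return True when this call completes the targeted sequence."""
        if syscall_name == self.pattern[self.matched]:
            self.matched += 1
        else:
            # Restart, allowing the current call to begin a new match.
            self.matched = 1 if syscall_name == self.pattern[0] else 0
        if self.matched == len(self.pattern):
            self.matched = 0
            return True
        return False


# Call names taken from the trace of a network communication.
trace = ["socket", "send", "recvfrom", "close",
         "socket", "connect", "setsockopt", "fcntl",
         "send", "recv", "close"]
trigger = SequenceTrigger(["socket", "connect", "setsockopt"])
fired_at = [i for i, name in enumerate(trace) if trigger.observe(name)]
```

Once the trigger has fired, the interposition layer would apply the selected failure mode to the system call chosen for corruption.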
5.5 Network-level faults
This fault model consists of simulating the effect of faults affecting the communication subsystem. This approach is particularly interesting in a DSoS context, since it provides information on the way in which errors may propagate between component systems, through the communication infrastructure.
This approach investigates the impact of corrupt method invocations arriving over the network. It consists of sending a corrupted request to the target, and observing its behaviour. This fault model simulates three different classes of faults:
• transient physical faults in the communication subsystem, resulting for example from faulty memory banks in routers, or faulty DMA transfers with the network interface card. Network corruption, even over reliable transport protocols such as TCP (on which IIOP is based), is more frequent than is commonly assumed. Based on analysis of traffic traces on a LAN and the Internet, [Stone and Partridge, 2000] reports that approximately one packet in 32000 fails the TCP checksum, and that between one in a few million and one in 10 billion packets are delivered corrupted to the application level. This is because the 16-bit checksum used in TCP is not able to detect certain errors. While this proportion is very small, it is non-negligible given the high capacity of modern LANs.
• propagation to the target of a fault that occurred on a remote computing node interacting with the target. The fault may have affected the remote operating system kernel, its protocol stack implementation, or the remote ORB, leading to the emission of a corrupted request.
• malicious faults, such as denial of service attacks against the target. Given the pivotal role of the name service in most CORBA-based systems, an attacker who can crash the service may be able to cause the entire system to fail. We note, however, that most CORBA systems will be deployed on private networks where all parties can be assumed to be trustworthy.
The types of errors that could be investigated include single bit flips, and the zeroing of successive bytes in a message. These are among the most common patterns of corruption identified in [Stone and Partridge, 2000], and we assume that they are representative of error propagation from remote nodes.
⁸ See Figure 14 for an example of the type of information provided by truss.
There are several possible means of injecting these faults. We could use dedicated network hardware, but this is cumbersome and expensive. Using software-implemented fault injection, faults could be injected at the protocol transport layer (for example by instrumenting the operating system’s network stack, as in [Dawson and Jahanian, 1995]). However, there is a very high probability that this form of corruption is detected by the remote host’s network stack, and therefore not delivered to the middleware. Consequently, it would be more efficient to inject the fault at the application level (see Figure 9), before the data is encapsulated by the transport layer. These experiments simulate the proportion of corrupt packets that TCP incorrectly delivers as being valid.
OSI model       CORBA model
Application     Application
Presentation    GIOP
Session         IIOP
Transport       TCP
Network         IP
Data link       Ethernet
Physical        Hardware
Figure 9: Protocol levels in CORBA
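To see why a 16-bit checksum lets some corruption through to the application level, consider the following sketch of a one's-complement checksum in the style used by TCP. It is simplified (the real TCP checksum also covers a pseudo-header, which is ignored here), and the sample byte strings are arbitrary.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement sum of 16-bit words, then complemented."""
    if len(data) % 2:
        data += b"\x00"                 # pad to a whole number of words
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)   # fold the carry back in
    return ~total & 0xFFFF


original = b"\x12\x34\x56\x78"
reordered = b"\x56\x78\x12\x34"      # two 16-bit words swapped
compensated = b"\x12\x35\x56\x77"    # +1 in one word, -1 in the other
```

All three byte strings have the same checksum, so a receiver relying on the checksum alone would accept the two corrupted variants as valid. Injecting faults above the transport layer reproduces exactly this class of undetected corruption.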
5.5.1 Network protocol faults
As well as considering faults that corrupt the data contained in incoming messages, it would be interesting to consider the impact of higher-level faults, which affect the semantics of the message rather than its syntactical information.
We do not investigate protocol faults that occur at the transport level of the network protocol stack, since these will be handled by the operating system’s networking stack rather than by the middleware. Rather, we concentrate on protocol faults at the level of GIOP, and more specifically IIOP, its mapping onto TCP (see Figure 9). The types of unexpected conditions to which we can expose a target implementation include:
• The reception of unexpected GIOP messages. For example, GIOP specifies a LocateReply message type, which is sent in response to a LocateRequest message. An injected fault could consist of sending a LocateReply message to an object, without it having emitted a corresponding LocateRequest message.
• The reception of GIOP messages containing strange request-ids. Each GIOP message contains a request-id, which is a numerical identifier for the request. This request-id is then used in the response, to match the response with its request. The target could be sent dummy responses containing request-ids that it did not send. Additionally, the effect of request-id duplication could be studied (the receiving ORB should drop messages containing request-ids that it has already handled).
• The reception of GIOP messages whose IORs contain unusual service contexts. The service context is used by the ORB to contain connection-related information (such as a session key, or the identifier of the character set negotiated upon establishing the network connection).
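As an illustration of the first two conditions, the sketch below builds the bytes of an unsolicited LocateReply carrying an arbitrary request-id. The layout (12-byte header with the magic "GIOP", version, byte-order flag, message type and body size, followed by the request-id and a locate status) follows our reading of GIOP 1.0 and should be checked against the CORBA specification before use; the function name and the constants' names are our own.

```python
import struct

# Assumed numbering of GIOP message types and locate statuses.
GIOP_LOCATE_REPLY = 4
OBJECT_HERE = 1


def build_locate_reply(request_id, status=OBJECT_HERE):
    """Build a big-endian GIOP 1.0 LocateReply message as raw bytes."""
    body = struct.pack(">II", request_id, status)
    header = struct.pack(
        ">4sBBBBI",
        b"GIOP",            # magic
        1, 0,               # protocol version 1.0
        0,                  # byte order flag: 0 means big-endian
        GIOP_LOCATE_REPLY,  # message type
        len(body),          # size of the body in bytes
    )
    return header + body


message = build_locate_reply(0xDEADBEEF)
```

Sending such a message on an open connection, without a preceding LocateRequest, exposes the target to a reply that it never asked for; sending it twice exercises the duplicate request-id case.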
6 Experimental framework and results
In this section, we present the results of work carried out at LAAS using the network-level corruption fault model described in Section 5.5. Our motivation for selecting this fault model from the list of techniques presented in the previous section, for our initial experimental work, is its relevance in the context of the DSoS project. Our experiments aid a system integrator in the selection of a middleware implementation to be used in wrapping and as a support for the interconnection infrastructure, by assessing the robustness of different candidate implementations. Furthermore, the method provides a means of characterizing the nature and the likelihood of error propagation between component systems, through a CORBA-based interconnection infrastructure.
These experiments targeted different implementations of the CORBA Name Service [OMG, 2001b]. This service provides a hierarchical directory for object references, allowing server applications to register a service under a symbolic name, and clients to obtain references by resolving a name.
We chose this target since its standardized interface makes it easy to compare different implementations of the service. Furthermore, the Name Service may constitute a single point of failure in a CORBA-based system: while it is possible to deploy applications without using a naming or trading service, by allocating object references statically, most systems require the dynamism provided by this service. The same failure mode characterization techniques could be applied to other CORBA services, as well as to user services implemented on CORBA. We also believe that the failure modes exhibited by a vendor’s implementation of the name service will be present, to a significant extent, in other applications built using the vendor’s CORBA ORB. Indeed, a vendor’s name service implementation is typically composed of a certain amount of application code implementing the service-specific functionality, which is linked with the vendor’s shared library implementing its ORB. A significant proportion of the robustness failings we have observed are relatively low level, and thus more likely to come from the ORB library than from the application code; we would therefore expect that they will also be present in other applications using the ORB.
6.1 Failure modes
Although the classification of failure modes may depend on the target component, the various possible outcomes of a component’s behaviour in the presence of faults are similar. Roughly speaking, either the fault is successfully detected by various error detection mechanisms (behavioural checks, executable assertions, hardware mechanisms, etc.) and signalled by different means (error status, exceptions, interrupts, etc.) to the interacting components, or it is not.
The latter case is the more difficult to classify. The first possible situation is the crash or the hang of the target component. Observing this situation involves external mechanisms that check the liveness of the component under test. When no crash or hang is observed, more subtle mechanisms must be used to distinguish the correct outcomes of the target. In testing, this is known as the notion of Oracle. This Oracle must be defined beforehand and is part of both the activation profile of the component under test and the fault injection campaign at runtime. Indeed, during a test experiment, the outputs of the component must be obtained to be compared (at the end) to the Oracle. This is the only way to detect incorrect behaviour of the target component during the test phase, when built-in error detection mechanisms fail.
We classify the experimental outcomes for injections targeting the name service as follows:
• kernel crash: the computing node (or nodes) hosting the service becomes inaccessible from the network. We test for this condition by attempting to execute a command from a remote machine.
• service crash: attempts to establish a network connection to the service are refused. Typically this means that the process implementing the service has died.
• service hang: the service accepts the incoming connection, but does not reply within a given time span. Note that this does not necessarily mean that other clients of the service are blocked, since processing may continue in other threads.
• application failure (error propagation to the application level): the service starts returning erroneous results to clients. We assume conservatively that error propagation to the application causes an application failure.
• exception: an invocation of the service results in a CORBA exception being raised. We distinguish between SystemExceptions (which come from the ORB) and UserExceptions (which are raised at the application level).
The observation of these failure modes is a crucial issue in a fault injection campaign. It is difficult to achieve 100% coverage of the error detection mechanisms, so some failures may be undetected. In particular, since all fault injection experiments are finite in time, it is possible for an injected fault not to lead to any observable effect during the duration of the experiment. This does not necessarily mean that the fault has no effect, since its effect may be postponed after the end of the observation period (notion of error latency).
These failure modes are not equivalent from a dependability point of view. Signalling an exception is the “best” experimental outcome, since the service remains available to other users, and the application can decide on the most appropriate recovery action, such as retrying the operation (in the case of a TRANSIENT exception) or deciding to use an alternative service (for COMM_FAILURE). It is important that the exception provide as much information as possible; COMM_FAILURE is more useful than UNKNOWN, since in the latter case the application has less information on which to base its recovery strategy. The most serious failure mode is error propagation to the application level; indeed, any fault tolerance mechanisms implemented at the application level will not be activated, and the error is free to propagate to the system’s service interface. The kernel and service crash and hang failure modes, while not positive outcomes, are considered less serious, since they can be detected by system-dependent mechanisms such as watchdog timers.
6.2 Experimental setup
The infrastructure we use to support our fault injection experiments is shown in Figure 10. It consists of the following components:
• the workload application, which activates the target service’s functionality (the workload runs on a different computing node from the service);
• the fault injector, which sends a corrupted request to the target once the workload has been running for a certain time span;
• monitoring components, which observe the behaviour of the target and log their observations to an SQL database;
• offline data analysis tools, to identify the various failure modes by examining the data collected by the monitoring components.
[Figure 10: Experimental configuration of our testbed. Labels in the diagram: workload (bind(), resolve(), unbind()), controller, target service, corrupt resolve(), logging, database.]
Our workload application repeatedly constructs a naming graph, resolves names against this graph, and then destroys the graph. Since the graph is constructed in a deterministic way, the workload is able to check that the results returned by the service are correct (it plays the rôle of oracle with respect to the functional specification of the service). If the workload detects an anomaly in the operation of the target service, such as an incorrect result, this is signalled as an application failure. If it receives an exception from the target, it signals the appropriate exception outcome.
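The oracle rôle of the workload can be sketched as follows, using an in-memory dictionary as a stand-in for the name service. The classes InMemoryNaming and CorruptingNaming, the naming scheme and the outcome strings are all hypothetical; the real workload talks to the target through CORBA bind, resolve and unbind operations.

```python
class NotFound(Exception):
    """Stand-in for the naming service's NotFound exception."""


class InMemoryNaming:
    """Minimal local substitute for a name service."""

    def __init__(self):
        self._bindings = {}

    def bind(self, name, ref):
        self._bindings[name] = ref

    def resolve(self, name):
        try:
            return self._bindings[name]
        except KeyError:
            raise NotFound(name) from None

    def unbind(self, name):
        del self._bindings[name]


class CorruptingNaming(InMemoryNaming):
    """Faulty variant that silently returns erroneous references."""

    def resolve(self, name):
        return super().resolve(name) + "!"


def run_workload(service, size=5):
    """Build a deterministic graph, check every resolve, then destroy it."""
    expected = {f"ctx/obj{i}": f"ref-{i}" for i in range(size)}
    try:
        for name, ref in expected.items():
            service.bind(name, ref)
        for name, ref in expected.items():
            if service.resolve(name) != ref:
                return "application failure"
        for name in expected:
            service.unbind(name)
    except Exception as exc:
        return f"exception: {type(exc).__name__}"
    return "ok"
```

With a correct service the function returns "ok"; with the corrupting variant it reports an application failure, which is the outcome that the testbed treats as error propagation.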
Each experiment corresponds to a single injected fault. A controller process launches the target service and obtains its object reference (in the implementations which we have targeted, the name service is implemented as a Unix dæmon). It then starts the workload application, passing it the service’s reference. After 20 seconds, the fault injector sends a corrupted resolve request to the target service (for a name which has not been given a binding) and waits for the reply. The expected reply is a NotFound exception raised by the naming service. If no reply arrives within 20 seconds, a ServiceHang failure mode is signalled. At the end of the experiment, the monitoring components check for the presence of the different failure modes by trying to launch a command on the target host, checking for returned exceptions, etc.
For each targeted implementation, a fault injection campaign involves running an experiment for each bit or byte position in the resolve request. A campaign lasts around 48 hours per target for the bit flip fault model.
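The enumeration of experiments for one campaign can be sketched with two generators, one per corruption pattern (single bit flips, and zeroing of two successive bytes as mentioned in Section 5.5). The function names and the six-byte sample request are placeholders; a real resolve request is considerably longer, and the bit numbering within a byte is an arbitrary choice of this sketch.

```python
def bitflip_campaign(request: bytes):
    """Yield (bit position, corrupted request) for every bit of the request."""
    for position in range(len(request) * 8):
        corrupted = bytearray(request)
        corrupted[position // 8] ^= 1 << (position % 8)
        yield position, bytes(corrupted)


def double_zero_campaign(request: bytes):
    """Yield (offset, corrupted request) with two successive bytes zeroed."""
    for offset in range(len(request) - 1):
        corrupted = bytearray(request)
        corrupted[offset:offset + 2] = b"\x00\x00"
        yield offset, bytes(corrupted)


request = b"GIOP\x01\x00"   # placeholder, not a complete request
bitflip_runs = list(bitflip_campaign(request))
double_zero_runs = list(double_zero_campaign(request))
```

Each yielded message corresponds to one experiment, so the number of experiments grows linearly with the size of the request: eight per byte for bit flips, and one per byte offset for the byte-zeroing pattern.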
This fault injection technique is very portable, since the only implementation-specific component in our testbed is the code responsible for launching the target implementation. The technique is also non-intrusive, and does not require any instrumentation of the targeted service.
6.3 Target implementations
We have carried out our experiments on four implementations of the CORBA Name Service:
• omniORB 2.8, by AT&T Laboratories, Cambridge. Freely available under the GNU General Public Licence, and implemented in C++;
• ORBit 0.5.0, also available under the GNU General Public Licence, and implemented in C;
• ORBacus 4.0.4, a commercial product from Object Oriented Concepts, implemented in C++;
• the tnameserv bundled with version 1.3 of Sun’s Java SDK.
All experiments were carried out on workstations running the Solaris 2.7 operating system, connected by a 100 Mb/s Ethernet LAN. While we tried to make the experimental conditions as similar as possible across implementations, a number of factors require particular attention:
• persistence: the omniORB implementation maintains log files so as to provide persistence across service shutdowns. To ensure a fresh environment for each experiment, we erase these log files before starting the service. The ORBacus implementation can be configured to use log files, but we do not enable them in our experiments. The two other tested implementations do not support persistence.
• number of experiments: we perform experiments for each bit or byte position in the corrupted method invocation. CORBA method invocations contain an ORB-dependent parameter called the service context (which can be used to propagate implementation-specific data and implicitly propagate transactions). The size of this parameter differs slightly between ORB implementations, so the exact number of experiments changes slightly from target to target.
• the ORBit implementation defaults to using non-interoperable object references. We configured it to use standard IIOP profiles.
In certain experiments, we observe several failure modes: for example a service crash will generally result in clients of the service receiving an exception indicating that a communication error has occurred. In the figures presented below, the failure modes are classified according to gravity, and for each experiment the most serious mode observed by the testbed is selected.
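The selection of the most serious mode can be sketched as follows. The exact gravity ordering is an assumption: the list below simply follows the order of the figure legends, with crashes and hangs taken as the most serious outcomes and NotFound (the expected reply) as the least serious. `SEVERITY` and `most_serious` are hypothetical names.

```python
# Assumed ordering, most serious first; it mirrors the figure legends only.
SEVERITY = [
    "ServiceCrash",
    "ServiceHang",
    "UNKNOWN",
    "COMM_FAILURE",
    "BAD_OPERATION",
    "MARSHAL",
    "OBJECT_NOT_EXIST",
    "NotFound",
]


def most_serious(observed):
    """Return the observed failure mode that ranks highest in SEVERITY."""
    return min(observed, key=SEVERITY.index)


if __name__ == "__main__":
    # A crash usually also shows up as a communication error on the client side.
    print(most_serious({"COMM_FAILURE", "ServiceCrash"}))
```

With this rule, an experiment in which the service crashes and the client receives COMM_FAILURE is counted once, as a service crash.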
6.4 Analysis of results
In this section we present the results of our fault injection experiments, for both the double-zero and bit flip fault models. More general analysis from a dependable system integrator's perspective is presented in Section 6.5.
Figure 11 compares the experimental outcomes for each target implementation, for the double-zero fault model (where two successive octets of the message are set to zero). The three outcomes to the left of the legend are “bad”, whereas those on the right indicate robust behaviour. The outcomes whose names in the legend are in capital letters correspond to CORBA SystemExceptions. The NotFound outcome is a CORBA application-level exception raised by the naming service when it cannot resolve a name; this is the expected behaviour of the service for our experiments. The sum of the vertical bars for each target is 100%.
[Figure 11 bar chart; vertical axis ticks 0 to 50 (percent); targets: javaORB, omniORB, ORBacus, ORBit; legend: ServiceCrash, ServiceHang, UNKNOWN, COMM_FAILURE, BAD_OPERATION, MARSHAL, OBJECT_NOT_EXIST, NotFound]
Figure 11: Experimental outcomes for double-octet zeroing fault model
A first remark is that we have not observed any cases of error propagation to the application level, which is a positive point. However, there is a relatively large proportion of service hangs and crashes.
As stated earlier, our service hang failure mode does not imply that other clients of the naming service are blocked; we only consider the time taken to reply to the corrupted invocation. However, given its relative frequency, it is one of the most serious dependability problems we have identified. The upcoming CORBA 2.4 specification allows clients to specify timeouts on their requests, which would be helpful for detecting this type of situation without resorting to application-level watchdog mechanisms. Some of the implementations tested already support these interfaces or provide similar mechanisms (but they were not activated in our experiments).
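An application-level watchdog of the kind mentioned above can be sketched as follows. This is a generic illustration built on Python's standard thread pool, not the CORBA 2.4 timeout interface and not the mechanism used in the testbed; `call_with_timeout` and `slow_service` are hypothetical names.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


def call_with_timeout(func, timeout_s, *args):
    """Run func(*args) in a worker thread; report a hang if no reply in time."""
    executor = ThreadPoolExecutor(max_workers=1)
    future = executor.submit(func, *args)
    try:
        return "reply", future.result(timeout=timeout_s)
    except FutureTimeout:
        return "ServiceHang", None
    finally:
        # Do not block the caller while the worker is still waiting.
        executor.shutdown(wait=False)


def slow_service(delay_s):
    """Hypothetical stand-in for an invocation that takes delay_s to answer."""
    time.sleep(delay_s)
    return "NotFound"


if __name__ == "__main__":
    print(call_with_timeout(slow_service, 1.0, 0.01))  # reply arrives in time
    print(call_with_timeout(slow_service, 0.05, 0.5))  # classified as a hang
```

As in the experiments, the watchdog only measures the time taken to answer one invocation; it says nothing about whether other clients of the service are blocked.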
Examining the details of the breakdown of CORBA exceptions, we observe that the Java implementation raises very few COMM_FAILURE exceptions, but a larger proportion of UNKNOWN exceptions (this exception is raised by an ORB when it detects an error in the server execution whose cause it cannot determine; for example, in Java, an attempt to dereference a null pointer). UNKNOWN is a less useful exception to signal to the application layer, since it conveys no information on the cause of the exception, so from this point of view the Java ORB can be considered less robust. The ORBacus service raises a greater proportion of MARSHAL exceptions, which indicates that its marshalling code does more error checking than other implementations (a positive point from a robustness point of view); ORBit does not raise MARSHAL exceptions.
The proportion of OBJECT_NOT_EXIST exceptions, which the ORB uses to signal that the object reference against which the method was invoked does not exist, is very similar between implementations. This is to be expected, since an ORB is required to check the validity of this value before dispatching the method invocation to the appropriate servant. A similar remark can be made for the BAD_OPERATION exception.
6.4.1 Differences between fault models
Figure 12 shows the experimental outcomes for each target implementation for the bit flip fault model. The results differ slightly from those for the double-zero fault model. The first difference between the results from the two fault models is the appearance of an InvalidName exception, which is not provoked by the double-zero fault model. This exception is raised by the naming service either when the name it is asked to resolve is empty or, more likely in our case, when the name contains an invalid character.
[Figure 12 bar chart; vertical axis ticks 0 to 50 (percent); targets: javaORB, omniORB, ORBacus, ORBit; legend: ServiceCrash, ServiceHang, UNKNOWN, COMM_FAILURE, BAD_OPERATION, MARSHAL, OBJECT_NOT_EXIST, NotFound]
Figure 12: Experimental outcomes for bit flip fault model
A second observation is that the bit flip fault model results in a greater proportion of MARSHAL and NotFound exceptions. In the latter case, the difference is likely to be due to the service masking certain errors. Indeed, certain bits in an IIOP message are unused. For example, the byte order of a message is represented by a zero or a one marshalled into an octet; seven of the eight bits of this octet are not significant, and so their corruption may not be detected by the ORB. In contrast, a double-zero error is unlikely to escape the notice of the ORB.
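The masking effect on the byte-order octet can be made concrete with a small sketch. It contrasts two hypothetical decoding strategies, one that inspects only the least significant bit and one that rejects any value other than 0 or 1; whether a given ORB follows either strategy is an assumption, not something established by the experiments.

```python
def lenient_byte_order(octet: int) -> str:
    """Decode the byte order by looking only at the least significant bit."""
    return "little" if octet & 0x01 else "big"


def strict_byte_order(octet: int) -> str:
    """Decode the byte order, rejecting any value other than 0 or 1."""
    if octet not in (0, 1):
        raise ValueError("invalid byte-order octet: %#04x" % octet)
    return "little" if octet else "big"


def masked_flips(octet: int) -> list:
    """Bit positions (0 = LSB) whose flip the lenient decoder cannot see."""
    reference = lenient_byte_order(octet)
    return [bit for bit in range(8)
            if lenient_byte_order(octet ^ (1 << bit)) == reference]


if __name__ == "__main__":
    print(masked_flips(0x00))  # positions of the seven non-significant bits
```

With the lenient decoder, a flip in any of the seven upper bits leaves the decoded message unchanged, so the service answers NotFound as if no fault had been injected.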
Certain other phenomena, such as the small proportion of COMM_FAILURE and BAD_OPERATION exceptions raised by JavaORB for the bit flip fault model, would require deep analysis of the source code to explain.
6.4.2 Influence of the error position
Figures 11 and 12 aggregate the results of faults injected at each possible position in the message. It is also interesting to examine the failure modes as a function of the position in the message where the fault was injected. Figure 13 shows the most common experimental outcomes for certain regions of the message (footnote 9).
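[Figure 13 diagram; regions: IIOP header, operation arguments; bit offsets marked: 0, 8, 16, 32, 48; fields: #\G #\I #\O #\P, GIOP-version, byte-order, message-type, message-length, service-context..., response-expected?, object-key, operation..., requesting_principal; outcomes marked against regions: OBJECT_NOT_EXIST, MARSHAL, BAD_OPERATION, COMM_FAILURE, ServiceHang, ServiceCrash]
Figure 13: Format of a GIOP message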
When the fault affects the part of the message which identifies the invoked operation, primarily BAD_OPERATION exceptions are signalled, as would be expected. Similarly, faults injected in the first few bytes of the IIOP request (which contain a special signature which identifies the message type) result mainly in COMM_FAILURE exceptions.
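To make the header fields of Figure 13 concrete, the following sketch decodes the fixed 12-octet GIOP header (4-octet "GIOP" signature, 2-octet version, 1 octet carrying the byte order, 1 octet message type, 4-octet message size) and shows the effect of one bit flip in the size field. The function names are hypothetical, the sample header is built by hand rather than captured from the testbed, and the `ValueError` raised for a damaged signature is only a stand-in for whatever an ORB would report.

```python
import struct

GIOP_HEADER_LEN = 12


def parse_giop_header(header: bytes) -> dict:
    """Decode the fixed-size GIOP header into its fields."""
    if len(header) != GIOP_HEADER_LEN:
        raise ValueError("a GIOP header is 12 octets long")
    magic, major, minor, flags, message_type = struct.unpack(">4sBBBB", header[:8])
    if magic != b"GIOP":
        raise ValueError("bad signature: %r" % magic)
    # Least significant bit of the flags octet: 0 = big-endian, 1 = little-endian.
    byte_order = "<" if flags & 0x01 else ">"
    (message_size,) = struct.unpack(byte_order + "I", header[8:])
    return {
        "version": (major, minor),
        "little_endian": bool(flags & 0x01),
        "message_type": message_type,
        "message_size": message_size,
    }


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Invert one bit (bit 0 = most significant bit of the first octet)."""
    corrupted = bytearray(data)
    corrupted[bit_index // 8] ^= 0x80 >> (bit_index % 8)
    return bytes(corrupted)


if __name__ == "__main__":
    # Hand-built big-endian GIOP 1.0 Request header announcing a 100-octet body.
    header = b"GIOP" + bytes([1, 0, 0, 0]) + struct.pack(">I", 100)
    print(parse_giop_header(header)["message_size"])
    # Flip the most significant bit of the size field (bit 64 of the header).
    print(parse_giop_header(flip_bit(header, 64))["message_size"])
```

In this sketch a flip in the signature is rejected immediately, whereas a flip in a high-order bit of the size field yields a syntactically valid header that announces far more data than the sender will ever transmit.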
When the fault affects the header bits encoding the message's length, we mostly observe service hangs. Given that there are 32 bits to encode the message length, and that our messages are relatively short (around 900 bits), a bit flip in this zone (normally set to zero) is likely to increase the announced message length, so the service waits to read more data than will actually arrive.
6.4.3 Internal error checking mechanisms
The ORBacus service was compiled in its default configuration, without deactivating internal “can't happen” assertions. When these assertions fail, the program voluntarily exits using the abort procedure. This leads to ORBacus showing a relatively high proportion of service failures, some of which could be avoided by using a different configuration. The omniORB implementation can be configured at runtime to abort when it detects an internal error, but we did not enable this feature.
Footnote 9: The data in the figure is valid for all the targeted implementations.
6.4.4 System call trace analysis
Our testbed allows us to obtain system call traces and execution stack backtraces of the target process. These show that different middleware implementations activate the operating system in different ways. For instance, the ORBacus implementation makes a large number of lwp_mutex and lwp_sema calls, which enable the synchronization of threads, whereas the omniORB implementation uses a much narrower range of system calls, primarily for reading and writing to the network and to its log file.
The system call traces also illustrate differences in the level of internal error checking between ORB implementations. For example, when faults are injected into certain bit positions, the ORBit implementation causes a segmentation violation while decoding the corrupted message, and is forcibly aborted by the operating system. In contrast, the ORBacus implementation sometimes detects the corruption internally, and is able to print a warning message indicating the position in the program where the error was detected, before voluntarily aborting. This lack of internal error checking is an implementation decision for ORBit, whose primary design goals are high performance and a small footprint.
Figure 14 shows output from the truss tool on Solaris, for the ORBit segmentation violation described above. The tool generates a trace of the interaction between the operating system and a process, showing the system calls performed by the process with their arguments, machine faults incurred by the process and the signals delivered to it by the operating system kernel. The trace shows that after having read a GIOP request from the network, the name service attempts to access memory outside of its address space, receives a segmentation violation signal, and is aborted by the operating system.

    fcntl(7, F_SETFL, 0x00000082) = 0
    poll(0xFFBEF300, 3, -1) = 1
    read(7, "GIOP01\001\0U\0\0\0", 12) = 12
    read(7, "\0\0\0\001\0\0\001\0\0\0".., 85) = 85
        Incurred fault #6, FLTBOUNDS  %pc = 0x00014274
          siginfo: SIGSEGV SEGV_MAPERR addr=0x6E7A1124
        Received signal #11, SIGSEGV [caught]
          siginfo: SIGSEGV SEGV_MAPERR addr=0x6E7A1124
          siginfo: SIGSEGV SEGV_MAPERR addr=0x6E7A1124
    [...]
    getpid() = 26780 [26779]
    kill(26780, SIGABRT) = 0
        Received signal #6, SIGABRT [caught]
          siginfo: SIGABRT pid=26780 uid=3905
    fstat(3, 0xFFBED8E0) = 0
    [...]
    llseek(0, 0, SEEK_CUR) = 0
    _exit(1)

Figure 14: truss output showing an ORBit segmentation violation
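A trace of this kind lends itself to automatic classification. The sketch below scans truss-style text for delivered signals and derives a coarse crash cause; the regular expression, the function names and the shortened sample trace are assumptions made for illustration and do not describe the monitoring code of the testbed.

```python
import re

# Matches lines such as "Received signal #11, SIGSEGV [caught]".
SIGNAL_LINE = re.compile(r"Received signal #(\d+), (SIG[A-Z0-9]+)")


def signals_in_trace(trace: str) -> list:
    """Return (number, name) pairs for every signal delivery in the trace."""
    return [(int(number), name) for number, name in SIGNAL_LINE.findall(trace)]


def classify_trace(trace: str) -> str:
    """Derive a coarse crash cause from the signals seen in the trace."""
    names = [name for _, name in signals_in_trace(trace)]
    if "SIGSEGV" in names:
        return "ServiceCrash (segmentation violation)"
    if "SIGABRT" in names:
        return "ServiceCrash (abort)"
    return "no crash signal seen"


SAMPLE_TRACE = """\
read(7, ..., 85) = 85
    Incurred fault #6, FLTBOUNDS  %pc = 0x00014274
    Received signal #11, SIGSEGV [caught]
kill(26780, SIGABRT) = 0
    Received signal #6, SIGABRT [caught]
_exit(1)
"""

if __name__ == "__main__":
    print(signals_in_trace(SAMPLE_TRACE))
    print(classify_trace(SAMPLE_TRACE))
```

Checking for SIGSEGV before SIGABRT distinguishes a crash provoked by an invalid memory access from a voluntary abort after an internal check has failed.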
6.5 Analysis from an integrator's point of view
The experimental results presented in Section 6 show a relatively large variability of behaviour of the target candidates in the presence of faults. This demonstrates that, although the service's interface is standardized, a particular candidate's behaviour depends on the design and implementation decisions made by the vendor. In this section, we adopt the viewpoint of a system integrator who must select a candidate implementation for a safety-critical system.
As such, we rank first the candidates that deliver relevant error reporting information, i.e., those which exhibit fewer service hangs and UNKNOWN exceptions. These are the most problematic failure modes when deciding on fault tolerance strategies and error recovery mechanisms that can meet the system's dependability requirements. By grouping all the exceptions except for UNKNOWN together, we obtain the percentages for the bit flip fault model shown in Table 1.

Table 1: Ranking of service implementations (percentage of experiments, bit flip fault model)

    Implementation   Exception   UNKNOWN   Service Hang   Service Crash
    ORBacus          88.0        1.3       6.1            4.6
    omniORB          79.5        0.6       19.3           0.6
    ORBit            76.6        0.0       15.8           7.6
    Java SDK         58.0        20.3      20.7           1.0

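The figures of Table 1 can be checked and re-ranked mechanically. In the sketch below the percentages are copied from the table; the ranking key (share of informative exceptions, in decreasing order) is an assumption chosen because it reproduces the row order of the table, and other keys, such as the combined share of hangs and UNKNOWN exceptions, would order omniORB and ORBit differently.

```python
# Percentages from Table 1 (bit flip fault model).
TABLE_1 = {
    "ORBacus":  {"Exception": 88.0, "UNKNOWN": 1.3,  "ServiceHang": 6.1,  "ServiceCrash": 4.6},
    "omniORB":  {"Exception": 79.5, "UNKNOWN": 0.6,  "ServiceHang": 19.3, "ServiceCrash": 0.6},
    "ORBit":    {"Exception": 76.6, "UNKNOWN": 0.0,  "ServiceHang": 15.8, "ServiceCrash": 7.6},
    "Java SDK": {"Exception": 58.0, "UNKNOWN": 20.3, "ServiceHang": 20.7, "ServiceCrash": 1.0},
}


def rank_by_exception_share(table: dict) -> list:
    """Order implementations by decreasing share of informative exceptions."""
    return sorted(table, key=lambda impl: table[impl]["Exception"], reverse=True)


def rows_sum_to_100(table: dict, tolerance: float = 0.05) -> bool:
    """Sanity check: the four outcome classes partition the experiments."""
    return all(abs(sum(row.values()) - 100.0) <= tolerance for row in table.values())


if __name__ == "__main__":
    print(rows_sum_to_100(TABLE_1))
    print(rank_by_exception_share(TABLE_1))
```

Each row sums to 100%, which confirms that the four columns are mutually exclusive outcome classes for a given experiment.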
From this viewpoint, the ORBacus and omniORB implementations exhibit the safest behaviour: more significant exceptions are reported, i.e., fewer UNKNOWN exceptions, and there is a smaller proportion of service hangs. ORBacus has a relatively high rate of service failure, which (as discussed in Section 6.4.3) is partly due to the configuration we chose. This type of reaction to abnormal situations is not necessarily a negative point from a dependability viewpoint. Many fault tolerance strategies, particularly in a distributed computing context, make a fail silence assumption, which requires components to produce either correct results, or none. Silent failures can successfully be handled by replication, either by using identical copies located on different sites, to deal with physical or environmental faults [Powell, 1991], or by using diversified copies to protect against software faults [Avizienis, 1975, Randell, 1975, Laprie et al., 1990].
We also observed in the experiments that the behaviour depends on the fault model. The results obtained with double zeroing and bit flips lead to a different statistical distribution of the failure modes. However, the resulting numbers do not disturb the ranking given in Table 1. Many issues can influence the observed results. Nevertheless, the fact that both types of experiment lead to the same conclusions reinforces the confidence one can have in the ranking.
Clearly, many other aspects of middleware dependability must be taken into account in the final selection of a candidate. In particular, the effects of other classes of faults need to be investigated. From this viewpoint, the work done by the Ballista project [Pan et al., 2001], which uses a different fault model and targets a different part of the middleware, is complementary to ours.
7 Conclusions and future work
This document proposes a method for the failure mode analysis of CORBA-based systems. This method relies essentially on conventional fault injection techniques and on a clear identification of possible targets in a middleware implementation. Although the CORBA standard defines the features that must be provided by CORBA ORB implementations, their implementation may vary significantly from one vendor to another. From a dependability viewpoint, the design strategy and the implementation of the standard are of prime importance.
Clearly, a middleware such as CORBA includes several facets that make the characterisation quite difficult. We analysed the various components and possible targets in a CORBA middleware and justified the use of a particular fault injection technique. In the context of DSoS, the analysis of failure modes targeting sensitive services using network corruption seemed the most relevant, and was the first to be tackled. Experiments have been carried out to obtain significant results on a number of CORBA implementations. These results show the various possible behaviours that can be observed and their impact in a system of systems, from a dependability viewpoint. The insights revealed by these experiments are novel and useful inputs to develop error confinement wrappers (cf Section 4 of DSoS deliverable IC2).
We have presented an experimental robustness evaluation method for CORBA-based services, and discussed results of experiments targeting four implementations of the CORBA Name Service. These experiments can be carried out on any CORBA service or user-defined service on top of CORBA. The choice of the Naming Service was justified by its essential role in a CORBA distributed system. It is worth noting that these experiments also evaluate the effect of corrupted method invocations at the middleware level.
The implementations we have tested show a non-negligible number of robustness weaknesses, but we have not observed any failures corresponding to the propagation of an error from the middleware to the application level. Our results suggest that the robustness of CORBA-based systems would be enhanced by the addition of an (application-level) checksum to GIOP. The achieved failure mode characterization aids in the selection of a candidate middleware implementation for critical systems, and helps DSoS system integrators decide on the error detection and recovery mechanisms, fault tolerance strategies and architectural solutions that are needed to meet dependability requirements. Our technique is non-intrusive, and (thanks to the transparency provided by CORBA) easy to port, both to new implementations of the service, and to alternative operating environments (operating system, hardware platform). The approach could also be applied for the failure modes characterization of other CORBA services, by modifying the workload and the fault injector.
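The checksum suggestion made in the conclusions can be illustrated with a short sketch. The report does not prescribe a checksum algorithm or a framing format; the use of CRC-32 appended as a 4-octet trailer, and the function names, are assumptions made purely for the example.

```python
import struct
import zlib


def add_checksum(payload: bytes) -> bytes:
    """Append a big-endian CRC-32 of the payload as a 4-octet trailer."""
    return payload + struct.pack(">I", zlib.crc32(payload))


def verify_checksum(framed: bytes) -> bytes:
    """Return the payload if the trailer matches, else raise ValueError."""
    if len(framed) < 4:
        raise ValueError("message too short to carry a checksum")
    payload, trailer = framed[:-4], framed[-4:]
    (expected,) = struct.unpack(">I", trailer)
    if zlib.crc32(payload) != expected:
        raise ValueError("checksum mismatch: message corrupted in transit")
    return payload


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Invert one bit (bit 0 = most significant bit of the first octet)."""
    corrupted = bytearray(data)
    corrupted[bit_index // 8] ^= 0x80 >> (bit_index % 8)
    return bytes(corrupted)


if __name__ == "__main__":
    framed = add_checksum(b"resolve request body")
    print(verify_checksum(framed))
    try:
        verify_checksum(flip_bit(framed, 10))
    except ValueError as exc:
        print("rejected:", exc)
```

A receiver that verifies such a trailer before unmarshalling would reject single bit flips before they reach the decoding code, turning potential hangs and crashes into an explicit error report.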
This method will be used to carry out other experiments and obtain more results regarding the failure modes of a CORBA-based system (footnote 10). First, fault injection will be performed within several target components composing the middleware using bit-flip fault injection techniques targeting memory segments (simulation of hardware faults, as described in Section 5.1). Second, we will address the robustness of implicit functions of an ORB using an ad hoc interface to these functions. The robustness of these essential functions will be evaluated using parameter fault injection techniques (cf Section 5.3). Third, we plan to examine the influence of faults propagating from the operating system to the middleware, as described in Section 5.4. The extension of the experiments carried out will provide useful inputs to the definition of error confinement wrappers.
Footnote 10: The results obtained from these upcoming experiments will be summarized in the forthcoming DSoS deliverable PCE1, together with the definition of the corresponding robustness-enhancing mechanisms.
Acknowledgements: The authors would like to thank Jean Arlat for helpful comments on their experiments and on early versions of this document.
References
[Arlat et al., 1993] J. Arlat, A. Costes, Y. Crouzet, J.-C. Laprie, and D. Powell. Fault injection and dependability evaluation of fault-tolerant systems. IEEE Transactions on Computers, 42(8):913–923, August 1993.
[Avizienis, 1975] A. Avizienis. Fault-tolerance and fault-intolerance: complementary approaches to reliable computing. ACM SIGPLAN Notices, 10(6):458–4, June 1975.
[Carreira et al., 1998] J. Carreira, H. Madeira, and J. G. Silva. Xception: A technique for the experimental evaluation of dependability in modern computers. IEEE Transactions on Software Engineering, 24(2):125–136, February 1998.
[Chevalley and Thévenod-Fosse, 2001] P. Chevalley and P. Thévenod-Fosse. A mutation analysis tool for java programs. Technical Report 01356, LAAS-CNRS, September 2001.
[Chung et al., 1999] P. E. Chung, W. Lee, J. Shih, S. Yajnik, and Y. Huang. Fault-injection experiments for distributed objects. In IEEE, editor, Proceedings of the International Symposium on Distributed Objects and Applications, 1999.
[Daran and Thévenod-Fosse, 1996] M. Daran and P. Thévenod-Fosse. Software error analysis: a real case study involving real faults and mutations. In S. J. Zeil, editor, Proceedings of the 1996 International Symposium on Software Testing and analysis, pages 158–171, New York, January 8–10 1996. ACM Press.
[Dawson and Jahanian, 1995] S. Dawson and F. Jahanian. Probing and fault injection of dependable distributed protocols. The Computer Journal, 38(4):286–300, 1995.
[Dawson et al., 1997] S. Dawson, F. Jahanian, and T. Mitton. Experiments on six commercial TCP implementations using a software fault injection tool. Software Practice and Experience, 27(12):1385–1410, December 1997.
[Fabre et al., 2000] J.-C. Fabre, M. Rodríguez, J. Arlat, F. Salles, and J.-M. Sizun. Building dependable COTS microkernel-based systems using MAFALDA. In Proceedings of the 2000 Pacific Rim International Symposium on Dependable Computing (PRDC-2000), pages 85–92. IEEE Computer Society Press, 2000.
[Fuchs, 1998] E. Fuchs. Validating the fail-silence of the MARS architecture. In Proc. 6th IFIP Int. Working Conference on Dependable Computing for Critical Applications: DCCA-6, pages 225–247. IEEE Computer Society Press, 1998.
[Jones et al., 2001] C. Jones, K. Kopetz, E. Marsden, M. Paulitsch, D. Powell, B. Randell, and R. Stroud. Revised version of conceptual model. Research report, DSoS, September 2001.
[Kalyanakrishnam et al., 1999] M. Kalyanakrishnam, Z. Kalbarczyk, and R. Iyer. Failure data analysis of a LAN of Windows NT based computers. In Proceedings of the 18th IEEE Symposium on Reliable Distributed Systems (SRDS '99), pages 178–1, Washington-Brussels-Tokyo, October 1999. IEEE.
[Karlsson et al., 1998] J. Karlsson, P. Folkesson, J. Arlat, Y. Crouzet, G. Leber, and J. Reisinger. Application of three physical fault injection techniques to the experimental assessment of the MARS architecture. In Proc. 5th IFIP Working Conference on Dependable Computing for Critical Applications: DCCA-6, pages 267–287. IEEE Computer Society Press, 1998.
[Koopman and DeVale, 1999] P. J. Koopman and J. DeVale. Comparing the robustness of POSIX operating systems. In Proceedings of the 29th Annual International Symposium on Fault-Tolerant Computing (FTCS-29), pages 30–37, Los Alamitos, CA, USA, 1999. IEEE Computer Society Press.
[Labovitz et al., 1998] C. Labovitz, G. R. Malan, and F. Jahanian. Internet routing instability. IEEE/ACM Transactions on Networking, 6(5):515–528, October 1998.
[Laprie et al., 1990] J.-C. Laprie, J. Arlat, C. Beounes, and K. Kanoun. Definition and analysis of hardware- and software-fault-tolerant architectures. Computer, 23(7):39–51, July 1990.
[Madeira et al., 2000] H. Madeira, D. Costa, and M. Vieira. On the emulation of software faults by software fault injection. In Proceedings of the International Conference on Dependable Systems and Networks (DSN 2000), pages 417–426. IEEE Computer Society Press, 2000.
[Miller et al., 1990] B. P. Miller, L. Fredriksen, and B. So. An empirical study of the reliability of UNIX utilities. Communications of the ACM, 33(12):32–44, December 1990.
[Nimmagadda et al., 1999] S. Nimmagadda, C. Liyanaarachchi, A. Gopinath, D. Niehaus, and A. Kaushal. Performance patterns: Automated scenario based ORB performance evaluation. In Proceedings of the Fifth USENIX Conference on Object-Oriented Technologies and Systems, pages 15–28. The USENIX Association, 1999.
[OMG, 2001a] OMG. The Common Object Request Broker: Architecture and Specification. Technical report, September 2001. (formal/2001-09-01).
[OMG, 2001b] OMG. CORBAServices: Common Object Service Specification: Naming Service Specification. Documentation available at www.omg.org, Object Management Group, February 2001.
[Pan et al., 2001] J. Pan, P. Koopman, D. Siewiorek, Y. Huang, R. Gruber, and M. L. Jiang. Robustness testing and hardening of CORBA ORB implementations. In Proceedings of the International Conference on Dependable Systems and Networks (DSN 2001). IEEE, June 2001.
[Powell, 1991] D. Powell. Delta-4: A Generic Architecture for Dependable Distributed Computing. Springer-Verlag, Berlin, Germany, 1991.
[Randell, 1975] B. Randell. System structures for software fault tolerance. IEEE Transactions on Software Engineering, SE-1(2):220–232, June 1975.
[Rimén et al., 1994] M. Rimén, J. Ohlsson, and J. Torin. On microprocessor error behavior modeling. In Proceedings of the 24th Annual International Symposium on Fault-Tolerant Computing, pages 76–85, Los Alamitos, CA, USA, June 1994. IEEE Computer Society Press.
[Stone and Partridge, 2000] J. Stone and C. Partridge. When the CRC and TCP checksum disagree. In Proceedings of the 2000 ACM SIGCOMM Conference, pages 309–319, 2000.
[Sullivan and Chillarege, 1991] M. Sullivan and R. Chillarege. Software defects and their impact on system availability - a study of field failures in operating systems. 21st Int. Symp. on Fault-Tolerant Computing (FTCS-21), pages 2–9, 1991.
[Thévenod-Fosse et al., 1991] P. Thévenod-Fosse, H. Waeselynck, and Y. Crouzet. An experimental study on software structural testing: Deterministic versus random input generation. In Fault Tolerant Computing, pages 410–417, Los Alamitos, Ca., USA, June 1991. IEEE Computer Society Press.
[Voas and McGraw, 1997] J. Voas and G. McGraw. Software Fault Injection. John Wiley and Sons, 1997.
[Ziegler and Srinivasan, 1996] J. F. Ziegler and G. R. Srinivasan. Preface: Terrestrial cosmic rays and soft errors. IBM Journal of Research and Development, 40(1):2–2, January 1996.