A Low-Voltage 42.4G-BPS Single-Ended Read-Modify-Write Bus and Programmable Page-Size on a 3D Frame-Buffer
Kazunari INOUE†a), Member, Hideaki ABE†, Kaori MORI†,
PAPER Special Issue on Low-Power High-Speed CMOS LSI Technologies
SUMMARY Various kinds of high-bandwidth architecture using embedded DRAM technology have been presented previously. In most cases they use a wide bus implementation and/or a fast bus speed, both of which carry a penalty in die area and power consumption. The proposed single-ended read-modify-write bus doubles the bandwidth while maintaining the same bus size and the same bus speed. The data-bus comprises a 1k-bit read-bus and a 1k-bit write-bus that work concurrently, with an amplitude from 0 V to 1 V; hence the measured power consumption is only 0.3 W at a frequency of 166 MHz. A programmable page-size reduces the page-miss rate and efficiently improves the bandwidth, with an effect comparable to the wide-bus and fast-speed approach. All the proposed features are implemented on a 3D frame-buffer to achieve 42.4 G-BPS bandwidth.
key words:
1. Introduction
Embedded DRAM (eRAM) technology has been reported to offer high bandwidth and further advantages for certain applications [1]–[6]. In particular, a frame-buffer is one of the most notable eRAM applications, because commodity memory is too deep and too narrow in bandwidth for graphics applications. In most cases, earlier designs persisted in a wide bus implementation and/or a fast bus speed, both of which carry a penalty in die area and power. Even if a huge number of data-bus lines are placed over the DRAM array to save die area [3], the power consumption can cause a noise problem. A single-ended read-modify-write bus doubles the bandwidth while maintaining the same bus size and bus speed. In addition, the data bus has an amplitude from 0 V to 1 V; hence the low-voltage operation efficiently eliminates the noise problem and also lowers the power consumption.
Another approach to obtaining a high bandwidth is to reduce the page-miss penalty that is inherent to DRAM. The eRAM technology is also useful here, e.g. having an additional data-latch that communicates consecutively with the data-bus instead of the sense-amp to hide the page-miss [4], or building a cascaded multi-bank architecture with a unique access control circuitry for relieving the page-miss penalty [5], [6]. The eRAM technology reduces the page-miss and brings the average bandwidth closer to the peak bandwidth. In this paper we propose a programmable page-size that adjusts its size to fit the different kinds of data-access sequence that generally occur in graphics applications.
Manuscript received June 30, 1999.
Manuscript revised October 12, 1999.
†The authors are with System-LSI division AS-memory group, Mitsubishi Electric Corporation, Itami-shi, 6-81 Japan.
a) E-mail: inoue.kazunari@lsi.melco.co.jp
and Shuji FUKAGAWA†, Nonmembers
When data-access is sequenced from right to left, a horizontal page-size is applied. When data-access is sequenced as a tile (i.e. right to left four times first, then a shift to the bottom and right to left four times again), a rectangular page-size is applied to reduce the page-miss rate.
In the entire graphics system, the frame-buffer interfaces with a rendering processor and a RAMDAC. The required bandwidth for the frame-buffer is a combination of the rendering pixel-rate and the video pixel-rate.
The required bandwidth will be discussed in Sect. 2. Sections 3, 4, and 5 introduce the proposed high-bandwidth architecture, which can minimize the power consumption and silicon penalty. Section 4.1 explains the external I/O-bus while Sect. 4.2 relates to the internal I/O-bus. Section 5 explains the proposed data transfer buffer and low-voltage bus. The programmable page-size page-miss reduction technique will be discussed in Sect. 6.
2. Band-Gap between Frame-Buffer and Rendering/Video
This section shows the computed bandwidth between the rendering processor, the video output, and the frame-buffer.
Figure 1 is an example of a 3D graphics system. A general rendering primitive in 3D applications is a triangle.
First, the CPU passes the vertex pixel data to the rendering processor, which computes the inner pixel data of the triangle. The computation is done in the setup unit and the scanner of the rendering processor. Second, many rendering procedures, such as texture mapping, are carried out in the pixel-pipeline. Some pixel procedures are done by hardware and some are done by software.
Finally, the rendering processor generates the result-pixel data and provides them to the frame-buffer. A frame-buffer is a memory LSI that holds the pixel data displayed on the screen. The output of the frame-buffer is the video data, which contains A-color and B-color, named after the double buffering used in animation. The video data is coupled to the RAMDAC and follows the horizontal scan-line that is seen on the screen.
Fig. 1 A graphics system consisting of a rendering processor, a frame-buffer, and a RAMDAC.
2.1 Rendering Pixel Rate and Frame-Buffer Bandwidth
A rendering processor consists of multiple pixel-pipelines. The bandwidth that the frame-buffer devotes to the rendering pixel-pipelines is
BW[render] = 2 (Source + Destination) * BPP * N * F    (1)
Note:
BPP = bits per pixel
N * F = number of pixel pipelines * operating frequency
Here, the pixel-pipeline uses only new data (= Source) in 2D, while a 3D application needs both new data and old data (= Destination). Not all 3D procedures use destination data, but some of them certainly need both source and destination, as described in Fig. 1. That is why 3D rendering is called “read-modify-write.”
Figure 2 shows the historical trend of BW[render]. It looks like a quadratic curve because both BPP [bits per pixel] and N*F [pixel pipelines] have grown. Assuming an application has two pixel pipelines working in parallel at 150 MHz with 32-bit color and 32-bit depth for a total of 64 BPP, the bandwidth that the rendering processor requires of the frame-buffer is
BW[render] = 4.8 G-Byte/sec    (2)
IEICE TRANS. ELECTRON., VOL.E83–C, NO.2 FEBRUARY 2000
Fig. 2 The historical trend of the rendering BW.
2.2 Video Pixel Rate and the Frame-Buffer Bandwidth
A RAMDAC comprises a palette-RAM and a DAC (D/A converter) to produce a visual display from the video pixel data. The bandwidth that a RAMDAC requires of the frame-buffer is
BW[video] = 2 (A/B buffer) * BPP * screen-size * 1.4 * frame-rate    (3)
Note:
BPP = bits per pixel for the color channel
A double-buffering technique requires double the data rate from the frame-buffer, because the frame-buffer does not distinguish whether the A-buffer or the B-buffer is shown on the screen; the data of both buffers need to be transferred to the RAMDAC. Also, the video pixel-rate is proportional to the screen size and to the frame rate. Figure 3 shows the historical trend of the video pixel-rate. In today's applications, using 32-bit color, a 1280*1024 screen size, and a frame rate of 60 fps, the bandwidth is
BW[video] = 881 M-Byte/sec    (4)
Fig. 3 The historical trend of video BW.
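As a quick numerical check of Eqs. (1) and (3), the two bandwidth figures can be reproduced with a short Python sketch. The function and parameter names are illustrative and do not come from the paper; the factor 1.4 is taken directly from Eq. (3), whose text does not explain its origin.

```python
def render_bandwidth(bits_per_pixel, pipelines, freq_hz):
    """Eq. (1): bytes/sec moved for source plus destination pixels."""
    bytes_per_pixel = bits_per_pixel / 8
    return 2 * bytes_per_pixel * pipelines * freq_hz


def video_bandwidth(bits_per_pixel, width, height, frame_rate, factor=1.4):
    """Eq. (3): bytes/sec for A/B double-buffered video output.

    `factor` is the 1.4 multiplier of Eq. (3); its name is an assumption.
    """
    bytes_per_pixel = bits_per_pixel / 8
    return 2 * bytes_per_pixel * width * height * factor * frame_rate


# Two pipelines at 150 MHz; 32-bit color plus 32-bit depth = 64 bits per pixel.
bw_render = render_bandwidth(64, 2, 150e6)
# 32-bit color, 1280*1024 screen, 60 fps.
bw_video = video_bandwidth(32, 1280, 1024, 60)

print(f"BW[render] = {bw_render / 1e9:.1f} G-Byte/sec")
print(f"BW[video]  = {bw_video / 1e6:.0f} M-Byte/sec")
```

With these inputs the sketch gives about 4.8 G-Byte/sec and 881 M-Byte/sec, in agreement with Eqs. (2) and (4).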
Lastly, the entire bandwidth that a frame-buffer needs to offer is as below.
BW[frame-buffer] = BW[render] + BW[video]    (5)
= 5.7 G-Byte/sec    (6)
Apart from high-end workstations, no other frame-buffer solution meets the above bandwidth. When 3D animation is used in today's PC applications, sometimes the color mode is compromised to 16-bit color, and at other times the video frame-rate is reduced to 30 fps or less, because of the lack of frame-buffer bandwidth.
INOUE et al.: A LOW-VOLTAGE 42.4G-BPS SINGLE-ENDED READ-MODIFY-WRITE BUS
3. High Bandwidth Approach
Fig. 4 Block diagram of reported frame-buffer.
Fig. 5 The proposed graphics procedure.
Figure 4 shows the block diagram of an experimental embedded 3D frame-buffer. DRAM is used as the frame memory and SRAM is used as a first-level cache. A 1k-bit read-bus and a 1k-bit write-bus, which work concurrently, are placed over the DRAM array. Implementing a 2k-bit bus on a DRAM array is close to the maximum number of bits allowed by the design rule [3].
A programmable page-size control is proposed in order to reduce the page-miss rate. The page-size control is also effective in improving the average bandwidth. The video-port provides the serial access data for the screen. JTAG is the serial port for testing.
4. Bandwidth on I/O-Bus and the Frame-Buffer
No matter how much embedded DRAM technology enables a wider and faster bus, it is more difficult for an external I/O-bus to catch up with such a high bandwidth. Therefore, we first propose a pixel-rate improvement on the external I/O-bus, and then explain a new internal I/O-bus architecture.
4.1 BW on the External I/O-Bus
Current graphics systems contain three kinds of LSI: the rendering processor, the frame-buffer, and the RAMDAC. The rendering processor generates the result-pixel by synthesizing the source and the destination. The source comes from the CPU and the destination comes from the frame-buffer; thus both the destination and the result-pixel go back and forth along the external I/O-bus between the rendering processor and the frame-buffer. Therefore, the actual rendering pixel-rate decreases to 1/2 of the bandwidth on the external I/O-bus. When the pixel-rate increases 2X in speed, the external I/O-bus needs to be 4X as fast. In addition, either a 4X bus size or a 4X operating speed takes 4X the power; thus it is quite difficult to double the rendering pixel-rate on the external I/O-bus. The RAMDAC generates the visual pixel data by MUXing the A-buffer, the B-buffer, and sometimes an overlay. These video pixel data always appear on the external I/O-bus. Thus the actual pixel-rate for video is 1/2 or less of the bandwidth on the external I/O-bus. Again, it is not easy to double the video pixel-rate.
Figure 5 shows a proposed graphics procedure. The indicated data-path improves the actual pixel-rate on the external I/O-bus. Graphics hardware implemented on the frame-buffer reduces the tasks executed on the external I/O-bus. Because the S/W/Z-test unit and the Blending unit are placed in the frame-buffer, the result-pixel is also generated inside the frame-buffer. Hence, only the pure source-pixel appears on the external I/O-bus between the rendering processor and the frame-buffer, and the rendering pixel-rate is exactly the same as the bandwidth on the external I/O-bus [7]. The video-output data from the serial port in Fig. 4 contains the A-buffer and the B-buffer. An implemented W-LUT indicates whether the A-buffer or the B-buffer is requested by each window on the screen. Therefore, the video pixel data coming from the frame-buffer is the visual display data, and the video pixel-rate is also exactly the same as the bandwidth on the external I/O-bus.
The implementation of graphics hardware on the frame-buffer doubles the pixel-rate while maintaining the same data rate and the same power consumption on the external I/O-bus.
4.2 BW on the Internal I/O-Bus
The bandwidth on the internal I/O-bus is calculated as follows.
BW[frame-buffer] = [bus size] * [bus speed] / [∼ page miss overhead]    (7)
Here, [bus size] * [bus speed] is merely the peak bandwidth. Because the rendering process and the RAMDAC handle consecutive pixel data, the peak bandwidth does not apply to the frame-buffer. The key to graphics performance at the system level is the average bandwidth. It is more effective to work on reducing the page-miss and the video overhead than on making a wider and faster bus. In general, the bus size and bus speed depend on the process parameters (i.e. the design rule for the metal line/space determines the maximum bus size, and the transistor performance governs the bus speed).
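Equation (7) can be read as a peak bandwidth divided by an overhead factor. The following Python sketch illustrates that reading; `overhead_factor` is a hypothetical stand-in for the [∼ page miss overhead] term, which the text does not define numerically, and the bus parameters are the 1k-bit read-bus plus 1k-bit write-bus at 166 MHz reported for the experimental chip.

```python
def peak_bandwidth(bus_bits, freq_hz):
    """[bus size] * [bus speed], returned in bytes/sec."""
    return bus_bits / 8 * freq_hz


def average_bandwidth(peak_bytes_per_sec, overhead_factor):
    """Peak bandwidth scaled down by an overhead factor >= 1.

    Assumption: a simple multiplicative model, not a formula from the paper.
    overhead_factor = 1.0 means no page-miss or video overhead at all.
    """
    if overhead_factor < 1.0:
        raise ValueError("overhead_factor must be >= 1.0")
    return peak_bytes_per_sec / overhead_factor


# 1k-bit read-bus plus 1k-bit write-bus, both running at 166 MHz.
peak = peak_bandwidth(1024 + 1024, 166e6)
print(f"peak    = {peak / 1e9:.1f} G-Byte/sec")

# A purely illustrative 25% overhead; this value is not taken from the paper.
print(f"average = {average_bandwidth(peak, 1.25) / 1e9:.1f} G-Byte/sec")
```

With these inputs the peak comes out at roughly 42.5 G-Byte/sec, close to the 42.4 G-BPS reported for the chip; the 1.25 overhead factor is an arbitrary example value.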
It is obvious that the rendering controller, the frame-buffer, and the RAMDAC have to use the latest process technology. Even so, if the frame-buffer focuses on bus size and bus speed only, it never catches up with the rendering and video performance, as the historical trend showed.
This paper discusses the effective pixel-rate and proposes two architectures. One is a low-voltage single-ended read-modify-write bus and the other is a programmable page-size.
5. Single-Ended Read-Modify-Write Bus and DTB (Data Transfer Buffer)
The differential I/O-bus is widely used in DRAM because the sense-amp is not capable of driving the global bus. In a read operation, a slight voltage difference appears on the differential I/O-bus and is amplified again at the other end of the I/O-bus. In a write operation, the differential I/O-bus is fully driven by the write-driver and the data is easily written into the sense-amp. Most of the DRAM speed bottleneck comes from the path between the sense-amp and the end of the I/O-bus, because it behaves like an analog operation using a slight signal.
Fig. 6 Conventional separated I/O-bus architecture.
As the differential I/O-bus carries either the destination or the result-pixel in the frame-buffer, the actual pixel-rate is also reduced to 1/2 of the internal I/O-bus bandwidth. An easy method to acquire more bandwidth is to increase the bus size, but it is limited by design rules such as the minimum metal line width and spacing. The other method to improve the pixel-rate is to have a separated I/O-bus, that is, a read-modify-write bus in which the destination and the result-pixel are transferred concurrently. It makes the pixel rate twice as fast as the differential I/O-bus. However, the read-modify-write bus needs 2X the metal lines, one set for reading and the other for writing, which might reduce the possible bus size. Figure 6 shows an example of a conventional read/write separated I/O-bus architecture. In this figure there are two kinds of bus lines: GBR (global bus for read) and GBW (global bus for write). There are also two kinds of CSL (column select line), R-CSL (read column select line) and W-CSL (write column select line), which allow read and write to operate together. Because the maximum number of metal lines placed over the memory array depends on the process parameters, it is not easy to have double GBs and double CSLs.
Figure 7 shows another example of the read/write separated I/O-bus architecture. R-CSL and W-CSL run horizontally, which makes it easier to run the doubled GB lines vertically. The silicon area penalty for implementing double CSLs and Y-gates is serious.
A considerable disadvantage of a separated I/O-bus architecture is the noise problem. As the write-bus switches fast, from 0 V to the full VDD level, the write-bus affects the weak signals on the read-bus during concurrent operation. This is worst at tRCD = min., because the sense-amp has not completed the bit-line amplification. A longer tRCD must be taken when the write-bus swings in the opposite direction from the read-bus. A shielding power line between the write-bus and the read-bus is one solution to the coupling noise problem. However, it takes extra metal lines and causes a reduction in bus size.
The reported low-voltage single-ended read-modify-write bus doubles the bandwidth without increasing the number of metal lines. The sense-amp is not connected to the read/write bus directly; instead a DTB (Data Transfer Buffer) is placed between the sense-amp and the read/write bus. A DRAM array has sense-amps on both sides, and each sense-amp has a DTB and a column select as shown in Fig. 8. Figure 9 shows the DTB structure. There are ten CSLs connected to the local bus in a page. Because the DTB performs either a read operation or a write operation, it is not necessary to provide a separated local I/O-bus and separated CSLs.
Fig. 7 The other example of separated I/O-bus architecture.
In other words, the data-path from the sense-amp to the DTB keeps the differential architecture to save silicon area, and the data-path between the DTB and GBR/GBW is changed to a single-ended structure to obtain twice the bandwidth. In this experiment, the DTB and the column switch take less than 15 µm of height on the silicon, which is not a significant die penalty compared to the conventional separated I/O-bus architecture. In a read operation, the DTB converts the slight signal on the local I/O-bus into a MOS-level signal on GBR. The load is quite small for the sense-amp because it drives just the local bus that involves ten CSLs. An additional amplifier, such as a current-mirror type, is not necessary because GBR carries a MOS-level signal. In a write operation, the DTB drives the local I/O-bus according to the GBW. In this experiment, the GBW used a differential bus because a write-mask function is adopted. It is also easy to convert other architectures into the single-ended architecture.
A further objective of the DTB implementation is to reduce the power and to eliminate the noise problem. The DTB and GBR/GBW use a 1 V power supply that comes from an internal voltage down-converter (VDC). On the other hand, the sense-amp and the local bus work at a 2 V VDD level. Figure 10 shows the waveforms of GBR and GBW. As for GBR/GBW speed, a 1 V–0 V swing is sufficient to operate at 166 MHz, because the analog operation in the DRAM ends at the DTB. GBR/GBW work as just a simple data-path using MOS-level signals, which has no equalize-time requirement.
The low-voltage bus efficiently removes the coupling noise problem in concurrent operations. Even if coupling noise occurs on GBR, it never goes into the sense-amp because GBR/GBW is not connected to the sense-amp.
Another measure that improves the noise problem is the use of separate power lines: a 2 V power line for the sense-amp and cells, and a 1 V power line for the DTB and the data-bus. Therefore, the column access that uses a large bus size never affects the row access. In the conventional architecture, because of the unified power line, the sense speed (= row access time) is sometimes degraded by a column access operating in other banks. Obviously, a wider bus size and a faster bus speed make it worse. The experimental chip has a 1k-bit read-bus and a 1k-bit write-bus, each working concurrently at 166 MHz, but they do not affect the row access speed. Figure 11 shows the measured IDD current of row access and column access. The IDD of GBR is almost the same as that of the row access, even though 1k bits work at the same time. Also, as column access uses 1 V and row access uses 2 V for their power supply, the power consumption of the 2k-bit bus is less than 0.3 W. According to current SDRAM (x16) characteristic data, the power consumption in column access is even larger than that of row access.
Fig. 8 Single-ended RMW bus and DTB.
Fig. 9 DTB structure.
Fig. 10 GBW/GBR waveform.
6. Page-Miss Rate Reduction
6.1 Page-Miss and Video-Overhead
In 3D applications, the rendering primitives are triangles. There are three types of triangles, strip, fan, and independent, as shown in Fig. 12. In any case, a rectangular page-size is better than a horizontal shape, since it includes more triangles in a page. For a general data-access sequence when rendering a tiled image, that is, 4 pixels left to right first and then a scan to the bottom, a rectangular page-size reduces the rendering page-miss rate. On the other hand, the scan-line always runs from left to right; thus the horizontal page-size is better than the rectangular shape for reducing the video overhead.
Fig. 11 Measured IDD current.
Fig. 12 Primitive triangle in 3D.
It is likely that the page-size is a compromise in the trade-off between the rendering page-miss rate and the video overhead. Table 1 shows the rendering performance limit when a commodity 128-bit bus SDRAM is used. Here, several page sizes, e.g. 128*2 pixels for the horizontal case and 16*16 pixels for the rectangular case, are taken as the parameter in this study. The page-miss rate means the average row-access count when a triangle is drawn. For example, an average of 5.56 row accesses occurs when the 128*2-pixel page-size is used, and an average of 1.86 row accesses occurs when the 16*16-pixel page-size is used. These numbers are calculated using a simple simulation. A rectangular page-size is better for reducing the rendering page-miss rate.
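To illustrate why a compact page shape tends to lower the page-miss rate, the sketch below counts how many distinct pages a small triangle touches under the two page shapes named above. It is not the authors' simulation, whose details are not given; the triangle shape, its position, and the function names are assumptions chosen only for illustration.

```python
def triangle_pixels(x0, y0, size):
    """Pixels of a right triangle: row i (0-based) holds i + 1 pixels."""
    return [(x0 + dx, y0 + dy) for dy in range(size) for dx in range(dy + 1)]


def pages_touched(pixels, page_w, page_h):
    """Number of distinct page_w x page_h pages covered by the pixels."""
    return len({(x // page_w, y // page_h) for x, y in pixels})


# A 28-pixel triangle (7 rows) at an arbitrary position on the screen.
tri = triangle_pixels(120, 1, 7)

print("128*2 page:", pages_touched(tri, 128, 2))
print("16*16 page:", pages_touched(tri, 16, 16))
```

For this single placement the triangle spans four 128*2-pixel pages but only one 16*16-pixel page. A realistic estimate would average over many triangle shapes and positions.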
The page-miss cost is [page-miss rate] * [row-access time], in which [tRCD/tRP: 2/2 = @24 ns] is used for the first row access and [tRRD: 2 = @12 ns] is used from the second row access onward, because of the effect of the multi-bank architecture. The block-size means the pixel cache size in the column access. In Table 1, BPP = 64-bit is used; thus the 128-bit bus is capable of working on two pixels (Px*Py = 1*2-pixel) in parallel. The block-miss rate means the average column-access count when the 25-pixel triangle is drawn. The block-miss cost means [block-miss rate] * [column-access time]. Here, [read-modify-write = @12 ns] is used for the column access. The video overhead means the bandwidth lost to video refresh. A screen size of 1280*1024 @60 fps, that is, a video-output rate of 9 ns/pixel, is used. Taking the case of the 128*2-pixel page-size for example, the video overhead is calculated as follows.
Table 1 SDRAM performance limit.
[video overhead]
= [access time of a scan-line in a page] / [video pixel-rate in a page]    (8)
= (24 ns + 6 ns * 128-pixel / 2-pixel) / (9 ns * 128-pixel) = 35.5%    (9)
Assuming that row access and column access operate independently, the rendering cost is calculated as follows.
[total Triangle Rendering cost]
= max.(page-miss cost, block-miss cost) / (1 - video overhead)    (10)
When the 128*2-pixel page-size is used,
[25-pixel Triangle Rendering cost] = (182.4 ns) / (1 – 35.5%) = 282.8 ns    (11)
[25-pixel Triangle Rendering rate] = 1 / [rendering cost] = 3.53 M-triangle/sec    (12)
In this example, regardless of the rendering page-size, the bottleneck is always in the column access. The horizontal page-size is better for reducing the video overhead and getting a higher rendering rate. The other example, in Table 2, is based on the embedded frame-buffer in this paper. The block-size increases to 4×4 pixels since [BPP * 4*4-pixel = 1k-bit bus]. The block-miss cost equation does not need the factor *2 because the read-modify-write bus can transfer the destination and the result-pixel at the same time. Also, this experimental frame-buffer has a video-port, which does not need the column access for video refresh. As a result, the performance limit of the frame-buffer, which provides 4*4-pixel column access, is as follows.
Table 2 4*4-pixel column access performance limit.
[FRB: 25-pixel Triangle Rendering cost] = 44.9 ns    (13)
when the 32×8-pixel page-size is used.
6.2 Programmable Page-Size
The last high-bandwidth technique presented in this paper is a programmable page-size. It offers two different page-sizes: a rectangular page-size for the rendering process and a horizontal page-size for video refresh. Figure 13 illustrates the programmable page-size and the word-line decoding. The row-decoder consists of a main row-decoder (MRD) for the primary select and a sub row-decoder (SRD) for the secondary select. The word-line is generated by ANDing the outputs of the MRD and the SRD. An MRD output is placed every four word-lines, so that the SRD decodes four to one using the internal signal X3...0, as shown in Fig. 14.
In the video mode, the same Xn (n = 3...0) is activated in all SRDs. In Fig. 13, word-line a0 in sub-array-A, b0 in sub-array-B, c0 in sub-array-C, and d0 in sub-array-D are selected to form the horizontal page. On the other hand, in the rendering mode, a different Xn is activated in each SRD. Word-line a0 in sub-array-A, a1 in sub-array-B, a2 in sub-array-C, and a3 in sub-array-D are selected to form the rectangular page. Figure 14 shows the SRD circuitry and the decoding of X3...0.
Fig. 13 A programmable page-size.
Fig. 14 SRD structure.
Using the programmable page-size, with 128*2 pixels for the video page and 32*8 pixels for the rendering page, the final rendering rate of this experimental frame-buffer is updated as follows.
[Rendering page-miss rate]: 2.11    (14)
[Rendering page-miss cost]: 37.4 ns/triangle    (15)
This is because of the effect of the 32*8-pixel rectangular page-size for the rendering.
[Video overhead]: 4.2%    (16)
This is because of the effect of the 128*2-pixel horizontal page-size for the video.
[Final 25-pixel Triangle Rendering cost] = max.(page-miss cost, block-miss cost) / (1 - video overhead) = 39.0 ns    (17)
[Final 25-pixel Triangle Rendering rate] = 1 / [rendering cost] = 25.6 M-triangle/sec    (18)
The resulting contribution of the programmable page-size is a 15% higher rendering rate than described in Sect. 6.1.
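The cost model of Sect. 6 can be reproduced with a few lines of Python. This is a sketch under the assumption stated in the text (row access and column access operate independently); the function names are illustrative, and the inputs 182.4 ns, 37.4 ns, 35.5% and 4.2% are the values quoted in the text rather than values derived here.

```python
def video_overhead(row_ns, col_ns, page_width_px, px_per_access, video_ns_per_px):
    """Eqs. (8)-(9): scan-line access time over video time for one page row."""
    access_time = row_ns + col_ns * page_width_px / px_per_access
    video_time = video_ns_per_px * page_width_px
    return access_time / video_time


def rendering_cost_ns(bottleneck_ns, overhead):
    """Eq. (10), where bottleneck_ns = max(page-miss cost, block-miss cost)."""
    return bottleneck_ns / (1.0 - overhead)


def triangles_per_sec(cost_ns):
    """Rendering rate as the reciprocal of the per-triangle cost."""
    return 1e9 / cost_ns


# 128*2-pixel page on the commodity SDRAM case: 24 ns row access,
# 6 ns per column access of 2 pixels, 9 ns/pixel video output rate.
overhead_sdram = video_overhead(24, 6, 128, 2, 9)
cost_sdram = rendering_cost_ns(182.4, 0.355)

# Programmable page-size case with the quoted 37.4 ns and 4.2%.
cost_final = rendering_cost_ns(37.4, 0.042)

print(f"video overhead  = {overhead_sdram:.1%}")
print(f"SDRAM cost      = {cost_sdram:.1f} ns")
print(f"final cost      = {cost_final:.1f} ns")
print(f"final rate      = {triangles_per_sec(39.0) / 1e6:.1f} M-triangle/sec")
```

The overhead evaluates to about 35.4%, which the text quotes as 35.5%. With the quoted percentages the costs come out at about 282.8 ns and 39.0 ns, and the final rate at about 25.6 M-triangle/sec.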
7. Experimental Result
A 3D frame-buffer using a 0.25 µm CMOS embedded DRAM process technology has been developed. The chip photo is shown in Fig. 15. The DRAM is divided into eight memory mats, and SRAM is placed at the top (or bottom) of each memory mat. The data-bus, which comprises a 1k-bit read-bus and a 1k-bit write-bus plus an additional 40-bit bus for a particular frame, is placed over the DRAM array without an increase in die area; the word-line (MRD) uses the first metal layer, the data-bus uses the second metal layer, and the power line uses the third metal layer. The memory control circuitry and the pixel processor are located between the right memory mats and the left memory mats. All inputs are registered, and the input registers are located at the center of the chip to balance the skew from the pins to the input registers. The implemented graphics hardware reduces the tasks on the external I/O-bus and doubles the pixel-rate. The reported embedded 3D frame-buffer realized 42.4 G-BPS of bandwidth and a rendering rate of 25.6 M-triangle/sec. A summary of the characteristic data is shown in Table 3. The power consumption when the 2k-bit bus is working is 0.3 W, which is the same as that of the row access.
Fig. 15 Chip photo.
Table 3 Characteristic of the embedded frame-buffer.
8. Conclusion
A low-voltage single-ended read-modify-write bus architecture has been presented. It achieves double the pixel-rate through concurrent operation of the read-bus and write-bus, while the single-ended structure maintains the same bus size and the same bus speed. Also, an implemented DTB converts the read-modify-write bus operation into MOS-level signals with an amplitude of 0 V–1 V. Both architectures contribute to reducing the power consumption and eliminating the noise problem in a high-bandwidth eRAM.
A programmable page-size is also presented to reduce the page-miss rate, which improves the average bandwidth as efficiently as the wide-bus and fast-speed approach. It provides programmable control of two different kinds of page, one of horizontal shape and the other of rectangular shape, to reduce the page-miss rate in 3D frame-buffer applications.
References
[1] A. Yamazaki, T. Yamagata, Y. Arita, M. Taniguchi, and M. Yamada, “Large scale embedded DRAM technology,” IEICE Trans. Electron., vol.E81-C, no.5, pp.750–757, May 1998.
[2] T. Tsuruda, M. Kobayashi, M. Tsukude, T. Yamagata, K. Arimoto, and M. Yamada, “High speed high-bandwidth design methodologies for on-chip DRAM core multimedia system LSI's,” IEEE J. Solid-State Circuits, vol.32, pp.477–482, March 1997.
[3] R. Torrance, I. Mes, B. Hold, D. Jones, J. Crepeau, P. DeMone, D. MacDonald, C. O'Connell, P. Gillingham, R. White, S. Duggins, and D. Fielder, “A 33 GB/s 13.4 Mb integrated graphics accelerator and frame buffer,” ISSCC98, Dig. of Tech., p.340, 1998.
[4] S. Miyano, K. Numata, K. Sato, T. Yabe, M. Wada, R. Haga, M. Enkaku, M. Shiochi, Y. Kawashima, M. Iwase, M. Ohgata, J. Kumagai, T. Yoshida, M. Sakurai, S. Kaki, N. Yanagiya, H. Shinya, T. Furuyama, P. Hansen, M. Hannah, Nagy, A. Nagarajan, and Rungsea, “A 1.6 GB/s data-transfer-rate 8 Mb embedded DRAM,” ISSCC95, Dig. of Tech., p.300, 1995.
[5] T. Watanabe, R. Fujita, K. Yanagisawa, H. Tanaka, K. Ayukawa, M. Soga, Y. Tanaka, Y. Sugie, and Y. Nakagome, “A modular architecture for a 6.4-Gbyte/s, 8-Mbit media chip,” 1996 Symposium on VLSI Circuits, Dig. of Tech., p.42.
[6] K. Ayukawa, T. Watanabe, and S. Narita, “An access-sequence control scheme to enhance random access performance of embedded DRAMs,” 1997 Symposium on VLSI Circuits, Dig. of Tech., p.59.
[7] K. Inoue, H. Nakamura, and H. Kawai, “A 10 Mb frame buffer memory with Z-compare and A-blend unit,” IEEE J. Solid-State Circuits, vol.30, no.12, pp.1563–1568, Dec. 1995.
Kazunari Inoue was born in 1961. He received his B.S. degree in electrical engineering from Yokohama National University, Japan, in 1984. In 1984, he joined Mitsubishi Electric Corporation, Hyogo, Japan. Since 1984, he has been engaged in the development of semiconductor memories for graphics applications.
Hideaki Abe was born in 1966. He received the B.S. degree in electronic engineering from Kyoto Institute of Technology, Japan, in 1990. In 1990, he joined Mitsubishi Electric Corporation, Hyogo, Japan. Since 1990, he has been engaged in the development of semiconductor memories for graphics applications.
Kaori Mori was born in 1969. She received the B.S. degree in physics from Kyoto Institute of Technology, Japan, in 1992. In 1992, she joined Mitsubishi Electric Corporation, Hyogo, Japan. Since 1992, she has been engaged in the development of semiconductor memories for graphics applications.
Shuji Fukagawa was born in 1972. He received the B.S. and M.S. degrees in electronic engineering from Himeji Institute of Technology, Japan, in 1997. In 1997, he joined Mitsubishi Electric Corporation, Hyogo, Japan. Since 1997, he has been engaged in the development of semiconductor memories for graphics applications.