This directory contains supplementary data relevant to:

Hegyi & Gerstein, Functional divergence in multidomain proteins:
a large-scale survey of structural superfamilies

__________________
function-codes.txt
~~~~~~~~~~~~~~~~~~

We divided functions into enzyme and non-enzyme. Enzymatic functions
were classified by the EC system (Bairoch, 2000). Enzymatic functions
were compared the same way as in our earlier analysis: two functions
were considered different if their EC numbers differ within the first
3 components. As a result, our analysis dealt with a total of 112
enzymatic functions. Non-enzymatic functions were classified into 508
different categories based on a simple thesaurus we assembled of
synonymous keywords drawn from Swiss-Prot description lines. In
addition, we created 49 categories for functions that have an enzymatic
component but which are not part of the EC system. This gave us a total
of 669 functions (112+508+49).

The file contains the list of synonyms in the Swiss-Prot database that
we used for our functional classification. A given protein can have
more than 1 function. In the function codes, E marks a function in the
EC system, and D marks an enzyme that is not annotated in the EC system.

__________________________________________
new_combo_wmultidomcoverage_wlen_wcode.lst
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

4763 lines

XXXXXXXXXXXXXXXXXXXXXX|YYYYYYYYYY|ZZZ|WWW|U#| ... |VVVVVVVV....

1.100.1 3.5.4 4.33.1|CH60_ACTAC|546|1 1|1#|1.100.1 4.33.1 3.5.4 4.33.1 1.100.1|60 KD CHAPERONIN (PROTEIN CPN60) (GROEL PROTEIN)
1.100.1 3.5.4 4.33.1|CH60_ACTPL|546|1 1|1#|1.100.1 4.33.1 3.5.4 4.33.1 1.100.1|60 KD CHAPERONIN (PROTEIN CPN60) (GROEL PROTEIN)
1.100.1 3.5.4 4.33.1|CH60_ACYPS|548|1 0|1#|1.100.1 4.33.1 3.5.4 1.100.1|60 KD CHAPERONIN (PROTEIN CPN60) (GROEL PROTEIN) (SYMBIONIN)
1.100.1 3.5.4 4.33.1|CH60_AGRTU|544|1 0|1#|1.100.1 4.33.1 3.5.4 1.100.1|60 KD CHAPERONIN (PROTEIN CPN60) (GROEL PROTEIN)
2.1.1 4.15.1|HB2P_HUMAN|258|1 0|45#|4.15.1 2.1.1|HLA CLASS II HISTOCOMPATIBILITY ANTIGEN, DP(W4) BETA CHAIN PRECURSOR - HOMO SAPIENS (HUMAN)
2.1.1 4.15.1|HB2P_RABIT|257|1 0|45#|4.15.1 2.1.1|RLA CLASS II HISTOCOMPATIBILITY ANTIGEN, DP BETA CHAIN PRECURSOR (D10 HAPLOTYPE) - ORYCTOLAGUS CUNICULUS (RABBIT)
2.1.1 4.15.1|HB2Q_HUMAN|258|1 0|45#|4.15.1 2.1.1|HLA CLASS II HISTOCOMPATIBILITY ANTIGEN, DP(W2) BETA CHAIN PRECURSOR (SB-2-BETA) - HOMO SAPIENS (HUMAN)
2.1.1 4.15.1|HB2Q_MOUSE|265|1 0|45#|4.15.1 2.1.1|H-2 CLASS II HISTOCOMPATIBILITY ANTIGEN, A-Q BETA CHAIN PRECURSOR - MUS MUSCULUS (MOUSE)
2.1.1 4.15.1|HB2S_MOUSE|263|1 0|45#|4.15.1 2.1.1|H-2 CLASS II HISTOCOMPATIBILITY ANTIGEN, A-S BETA CHAIN PRECURSOR - MUS MUSCULUS (MOUSE)
4.72.1 5.13.1|GYRB_HELPY|773|1 0|E5.99.1|4.72.1 5.13.1|DNA GYRASE SUBUNIT B (EC 5.99.1.3)
4.72.1 5.13.1|GYRB_MYCCA|643|1 0|E5.99.1|4.72.1 5.13.1|DNA GYRASE SUBUNIT B (EC 5.99.1.3)
2.39.2 3.69.1|BISC_RHOSH|744|1 1|D369#|3.69.1 2.39.2|BIOTIN SULFOXIDE REDUCTASE (EC 1.-.-.-) (BDS REDUCTASE) (BSO REDUCTASE) - RHODOBACTER SPHAEROIDES (RHODOPSEUDOMONAS SPHAEROIDES)
2.39.2 3.69.1|BISZ_ECOLI|809|1 1|D369#|3.69.1 2.39.2|BIOTIN SULFOXIDE REDUCTASE 2 (EC 1.-.-.-) (BDS REDUCTASE 2) (BSO REDUCTASE 2)

X = SCOP superfamily identifiers, giving the non-redundant combination
Y = Swiss-Prot ID
Z = length
W = is it fully covered? (1 1)
U = function code
V = Swiss-Prot description (DE) line

____________________________________________
realsingle_sfam_wfun_nofrag_nohypo_wcode.lst
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

1818 lines describing single-domain classifications, in a format
similar to the multi-domain file above

YYYYYYYYYY|XXXXX|UUU#|VVVVVV.....

GLB1_CALSO|1.1.1|167#|GLOBIN I (HB I) - CALYPTOGENA SOYOAE (DEEP-SEA COLD-SEEP CLAM)
GLB1_PHESE|1.1.1|167#|GLOBIN I, EXTRACELLULAR (ERYTHROCRUORIN) - PHERETIMA SIEBOLDI (EARTHWORM)
GLB2_CALSO|1.1.1|167#|GLOBIN II (HB II) - CALYPTOGENA SOYOAE (DEEP-SEA COLD-SEEP CLAM)
GLBC_NIPBR|1.1.1|167#|GLOBIN, CUTICULAR ISOFORM PRECURSOR
GLB1_GLYDI|1.1.1|167#|GLOBIN, MAJOR MONOMERIC COMPONENT - GLYCERA DIBRANCHIATA (BLOODWORM)

X = SCOP superfamily identifiers
Y = Swiss-Prot ID
U = function code
V = Swiss-Prot description (DE) line

---
last updated on 2000,12.11
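
The two pipe-delimited listings described above can be read with a few
lines of Python. The following is only a minimal sketch, not code that
was used for the analysis. All function and class names are
hypothetical. It assumes that a trailing '#' on a function code is a
terminator that can be stripped, that a code with no E or D prefix
denotes a non-enzymatic category, and it keeps the sixth field of the
multi-domain file as a raw string because the legend does not describe
it. The EC comparison follows the rule stated for function-codes.txt
(first 3 components of the EC number).

```python
from typing import List, NamedTuple, Tuple


class MultiDomainRecord(NamedTuple):
    superfamilies: List[str]  # X: non-redundant SCOP superfamily combination
    swiss_id: str             # Y
    length: int               # Z
    coverage: str             # W: raw flag string, e.g. "1 1"
    function_code: str        # U: raw code, e.g. "45#", "E5.99.1", "D369#"
    extra: str                # sixth field, not described in the legend
    description: str          # V: Swiss-Prot DE line


class SingleDomainRecord(NamedTuple):
    swiss_id: str             # Y
    superfamily: str          # X
    function_code: str        # U
    description: str          # V


def parse_multidomain_line(line: str) -> MultiDomainRecord:
    """Split one line of the multi-domain listing into its 7 fields."""
    # maxsplit=6 keeps any '|' inside the description intact.
    x, y, z, w, u, extra, v = line.rstrip("\n").split("|", 6)
    return MultiDomainRecord(x.split(), y, int(z), w, u, extra, v)


def parse_singledomain_line(line: str) -> SingleDomainRecord:
    """Split one line of the single-domain listing into its 4 fields."""
    y, x, u, v = line.rstrip("\n").split("|", 3)
    return SingleDomainRecord(y, x, u, v)


def classify_function_code(raw: str) -> Tuple[str, str]:
    """Return (kind, code) for a raw function code.

    Assumption: the trailing '#' is only a terminator; a code without an
    E or D prefix is taken to be a non-enzymatic category.
    """
    code = raw.rstrip("#")
    if code.startswith("E"):
        return ("EC", code[1:])
    if code.startswith("D"):
        return ("enzyme-non-EC", code[1:])
    return ("non-enzyme", code)


def same_ec_function(ec_a: str, ec_b: str) -> bool:
    """True if two EC numbers agree in their first 3 components."""
    return ec_a.split(".")[:3] == ec_b.split(".")[:3]


# Example using one line quoted in this README.
sample = ("4.72.1 5.13.1|GYRB_HELPY|773|1 0|E5.99.1|"
          "4.72.1 5.13.1|DNA GYRASE SUBUNIT B (EC 5.99.1.3)")
record = parse_multidomain_line(sample)
print(record.swiss_id, classify_function_code(record.function_code))
```

To process a whole file, apply the matching parse function to each
line in turn. The coverage field is left as a string here so that its
interpretation is not fixed by the parser.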