#include <Gapseq.hh>
Public Methods | |
Gapseq_ (Monomer_::Type_ Mt=Monomer_::PROT) | |
Create a gapped sequence object with the prescribed monomer type (amino acid codes by default). | |
Gapseq_ (const char *Codes) | |
Create a gapped sequence object with the prescribed monomer alphabet given in Codes. | |
Gapseq_ (const Bioseq_ &B) | |
copy ctor & conversion from base class. | |
Gapseq_ (const Gapseq_ &G) | |
copy ctor. | |
virtual char | at_1 (size_t Idx) const |
at_1(Idx), at_1(Idx, C): safe index operator for the Idx-th char in the sequence. | |
virtual bool | at_1 (size_t Idx, char C) |
virtual const char* | at_3 (size_t Idx) const |
at_3(Idx), at_3(Idx, Ccc): safe index operator for getting and setting a monomer using the 3-letter abbreviations. | |
virtual bool | at_3 (size_t Idx, const char *Ccc) |
const string& | seq () const |
seq(): inherited from Bioseq_, returns a const ref to the ungapped sequence. | |
virtual void | seq (const string &S) |
seq(S): stores the sequence S in the calling object. | |
string | gapped_sequence (bool Gcg=false) const |
gapped_sequence([Gcg]): create and return a string with the gap characters actually inserted. | |
void | gapped_sequence (const string &Gseq, int Mtype=Monomer_::UNKNOWN) |
virtual bool | is_legal (char C) const |
is_legal(C): returns true if C is a legal character in the calling object under the current monomer alphabet and case-sensitivity setting. | |
virtual bool | is_legal (const char *Ccc) const |
bool | is_legal (const string &S) const |
void | insert_gaps (size_t Idx, size_t N) |
insert_gaps(Idx, N): insert N gap characters before position [Idx]. | |
void | remove_gaps (size_t Idx) |
remove_gaps([Idx]): removes the gap characters before net position [Idx]. | |
void | remove_gaps () |
size_t | net_pos (size_t Gross) const |
net_pos(Gross): given the "gross" position Gross, the ungapped net position is returned. | |
size_t | gross_pos (size_t Net) const |
gross_pos(Net): given the "net" position Net, the gapped gross position is returned (inverse of net_pos(Gross) above). | |
size_t | net_length () const |
net_length(): the length of the sequence without the gaps. | |
size_t | gross_length () const |
gross_length(): the total length of the sequence with gaps. | |
virtual Errtype_ | read_seq (istream &Inf, int Type=Monomer_::USER, int Format=ANYFORMAT) |
read_seq(Inf, Type, Format): tries to read a sequence from the input stream Inf. | |
virtual Errtype_ | write_seq (ostream &Outf, int Format) const |
write_seq(Outf, Format): writes the contents of the calling object to Outf according to the format specification in Format. | |
Static Public Attributes | |
const char | GAP |
the gap character '-'. | |
const char* | GAP3 |
three-letter version "---". | |
const char | GCG_GAP |
the gap character '.'. | |
const char* | GCG_GAP3 |
three-letter version "...". | |
const char | GCG_ENDGAP |
the end gap character '~'. | |
const char* | GCG_ENDGAP3 |
three-letter version "~~~". | |
Protected Methods | |
virtual bool | is_legal (int Mt, char C) const |
Private Attributes | |
list<Gap_> | Gaplist |
list of gaps in gapped sequence. | |
size_t | Gaplen |
total no. of gap chars in gapped sequence. |
Provides methods for inserting and deleting gaps.
Definition at line 49 of file Gapseq.hh.
|
Create a gapped sequence object with the prescribed monomer type (amino acid codes by default). Consult "Monomer.hh" for available monomer types. |
|
Create a gapped sequence object with the prescribed monomer alphabet given in Codes. Consult "Monomer.hh" for code string syntax. |
|
copy ctor & conversion from base class.
|
|
copy ctor.
|
|
at_1(Idx), at_1(Idx, C): safe index operator for the Idx-th char in the sequence. at_1(NPOS) returns a GAP character. The non-const version refuses to store characters that do not correspond to the current type of the calling object: it throws Indexrangexc_ if Idx is NPOS or illegal, and returns true on success or false when C was illegal. Reimplemented from RazorBack::Bioseq_. |
|
Reimplemented from RazorBack::Bioseq_. |
|
at_3(Idx), at_3(Idx, Ccc): safe index operator for getting and setting a monomer using the 3-letter abbreviations. at_3(NPOS) returns a GAP3 string. The non-const version refuses to store abbreviations that are illegal. Returns true on success, false when Ccc was illegal. Reimplemented from RazorBack::Bioseq_. |
|
Reimplemented from RazorBack::Bioseq_. |
|
seq(): inherited from Bioseq_, returns a const ref to the ungapped sequence. seq(S): stores the sequence S in the calling object. Refuses storage if S does not correspond to the type of the calling object. Note that changing the underlying sequence would invalidate the gap records; therefore, all gaps are removed by this method. Reimplemented from RazorBack::Bioseq_. Definition at line 169 of file Gapseq.hh. Referenced by gross_length() and net_length(). |
|
seq(S): stores the sequence S in the calling object. Refuses storage if S does not correspond to the type of the calling object. Reimplemented from RazorBack::Bioseq_. |
|
gapped_sequence([Gcg]): create and return a string with the gap characters actually inserted. If Gcg==true (default false), then the GCG gap characters are used, i.e. '~' for trailing gaps and '.' for internal gaps. gapped_sequence(Gseq, [Mtype]): set the calling object to contain the gapped sequence Gseq, in which the gap positions are indicated by gap characters. All gap characters are accepted here, no matter where they occur. The new monomer type can be forced by specifying Mtype; if it is Monomer_::UNKNOWN (the default), autodetection is performed. |
|
|
|
is_legal(C): returns true if C is a legal character in the calling object under the current monomer alphabet and case-sensitivity setting. is_legal(Ccc): does the same for 3-letter abbreviations. is_legal(S): does the same for the complete sequence string S. Reimplemented from RazorBack::Bioseq_. Definition at line 203 of file Gapseq.hh. Referenced by seq(). |
|
Reimplemented from RazorBack::Bioseq_. |
|
Reimplemented from RazorBack::Bioseq_. |
|
insert_gaps(Idx, N): insert N gap characters before position [Idx]. If Idx==NPOS, the gap will be placed after the sequence. If Idx is out of range, then Indexrangexc_ will be thrown. If there is already a gap before Idx, then the length of this existing gap will be changed to N. N must be nonzero; if N==0, nothing is done. |
|
remove_gaps([Idx]): removes the gap characters before net position [Idx]. Nothing is done if there were no gaps there. If Idx is out of range, Indexrangexc_ will be thrown. If Idx is omitted, then all gaps are removed. |
|
Definition at line 235 of file Gapseq.hh. Referenced by seq(). |
|
net_pos(Gross): given the "gross" position Gross, the ungapped net position is returned. If there is a gap at Gross then NPOS is returned. If Gross is illegal (>=gross_length()), then Indexrangexc_ is thrown. |
|
gross_pos(Net): given the "net" position Net, the gapped gross position is returned (inverse of net_pos(Gross) above). If Net is illegal (>=net_length()), then Indexrangexc_ is thrown. |
|
net_length(): the length of the sequence without the gaps.
|
|
gross_length(): the total length of the sequence with gaps.
|
|
read_seq(Inf, Type, Format): tries to read a sequence from the input stream Inf. If Type is set to Monomer_::PROT, only protein sequences will be read. If it is set to Monomer_::[DEOXY[RIBO]]NUCL, only the corresponding nucleic acid sequences will be read. If Type is Monomer_::USER (default) or an OR-ed combination of PROT and *NUCL, all types are considered and polymer type autodetection is attempted; this is not foolproof. Format can be set to ANYFORMAT (default), in which case automatic file format detection is performed, or to FASTA (protein or nucleic acid) or PIR (proteins only). The other formats provided by the Bioseq_ class are not accepted because they cannot be used in multiple alignments. No read is performed and FORMATERR is returned when an incompatible type and format are specified. Return value: OK, or an error enum if something went wrong. Reimplemented from RazorBack::Bioseq_. |
|
write_seq(Outf, Format): writes the contents of the calling object to Outf according to the format specification in Format. Only the FASTA and PIR formats are accepted; see the comments on read_seq(). Returns OK, or an error enum if something went wrong. Reimplemented from RazorBack::Bioseq_. |
|
Reimplemented from RazorBack::Bioseq_. |
|
the gap character '-'.
|
|
three-letter version "---".
|
|
the gap character '.'.
|
|
three-letter version "...".
|
|
the end gap character '~'.
|
|
three-letter version "~~~".
|
|
list of gaps in gapped sequence.
|
|
total no. of gap chars in gapped sequence.
|