Biology

class polars_extensions.biology.BioExtensionNamespace(expr: Expr)

Bases: object

Polars expression namespace for biological sequence conversions and analysis.

Methods

at_content()

Return AT content as a percentage.

contains_motif(motif)

Return True if the sequence contains the given motif.

count_codons()

Return the number of codons (sequence length / 3).

count_motif(motif)

Count occurrences of a motif in the sequence.

count_nucleotides()

Return a struct of counts for A, T, G, and C.

delete_sequence(start, end)

Delete a segment of the sequence from start to end.

dna_complement()

Return the DNA complement (A↔T, C↔G).

dna_reverse_complement()

Return the reverse complement of a DNA sequence.

dna_to_rna()

Convert DNA sequences (A, T, G, C) to RNA sequences (A, U, G, C).

dna_transcribe()

Transcribe DNA to RNA (same as dna_to_rna).

gc_content()

Return GC content as a percentage.

gc_skew()

Compute GC skew = (G - C) / (G + C).

hamming_distance(other)

Return Hamming distance between two equal-length sequences.

insert_sequence(position, subseq)

Insert a subsequence at the given position.

is_valid_dna()

Return True if the sequence only contains valid DNA bases (A, T, G, C, N).

is_valid_rna()

Return True if the sequence only contains valid RNA bases (A, U, G, C, N).

mutate_sequence(position, new_base)

Mutate a sequence by replacing one base at a given position (0-indexed).

repeat_sequence(n)

Repeat the sequence n times.

reverse_sequence()

Reverse a sequence string.

rna_to_dna()

Convert RNA sequences (A, U, G, C) to DNA sequences (A, T, G, C).

sequence_length()

Return the sequence length.

at_content() → Expr

Return AT content as a percentage.

contains_motif(motif: str) → Expr

Return True if the sequence contains the given motif.

count_codons() → Expr

Return the number of codons (sequence length / 3).
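
The description does not say how a length that is not a multiple of 3 is handled. As a rough plain-Python sketch of the arithmetic, assuming trailing bases that do not form a full codon are dropped (the helper name is hypothetical and not part of polars_extensions):

```python
# Hypothetical helper for illustration only; not the library's implementation.
def count_codons_sketch(seq: str) -> int:
    """Number of complete 3-base codons in seq."""
    # Assumption: floor division, so an incomplete trailing codon is ignored.
    return len(seq) // 3


print(count_codons_sketch("ATGAAATAG"))  # 3
print(count_codons_sketch("ATGAA"))      # 1
```

If the method returns a fractional value instead, a cast or floor on the result would reproduce the behaviour sketched here.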

count_motif(motif: str) → Expr

Count occurrences of a motif in the sequence.
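
Whether overlapping occurrences are counted is not stated here. The following plain-Python sketch shows both conventions so the difference is visible; the helper names are hypothetical and do not reflect how the library counts.

```python
# Hypothetical helpers for illustration only; not the library's implementation.
def count_motif_non_overlapping(seq: str, motif: str) -> int:
    """Count matches without reusing bases between matches."""
    if not motif:
        return 0  # assumption: an empty motif counts as zero matches
    return seq.count(motif)


def count_motif_overlapping(seq: str, motif: str) -> int:
    """Count matches at every start position, allowing overlap."""
    if not motif:
        return 0  # assumption: an empty motif counts as zero matches
    last_start = len(seq) - len(motif)
    return sum(seq.startswith(motif, i) for i in range(last_start + 1))


print(count_motif_non_overlapping("AAAA", "AA"))  # 2
print(count_motif_overlapping("AAAA", "AA"))      # 3
```

Checking the method against a repetitive input such as "AAAA" is a quick way to find out which convention applies.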

count_nucleotides() → Expr

Return a struct of counts for A, T, G, and C.
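
As an illustration of the shape of the result, here is a plain-Python sketch that builds a dictionary with one count per base. The key names and the choice to ignore other characters such as N are assumptions; the struct field names used by the library are not given in this page.

```python
from collections import Counter


# Hypothetical helper for illustration only; not the library's implementation.
def count_nucleotides_sketch(seq: str) -> dict[str, int]:
    """Return counts for A, T, G, and C; any other character is ignored."""
    counts = Counter(seq)
    return {base: counts[base] for base in "ATGC"}


print(count_nucleotides_sketch("AATGCN"))  # {'A': 2, 'T': 1, 'G': 1, 'C': 1}
```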

delete_sequence(start: int, end: int) → Expr

Delete a segment of the sequence from start to end.
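
The indexing convention for start and end is not documented here. The sketch below assumes Python slice semantics (0-indexed, end exclusive), which may differ from the library's behaviour; the helper name is hypothetical.

```python
# Hypothetical helper for illustration only; not the library's implementation.
def delete_segment_sketch(seq: str, start: int, end: int) -> str:
    """Remove seq[start:end], assuming 0-indexed start and exclusive end."""
    return seq[:start] + seq[end:]


print(delete_segment_sketch("ATGCGT", 1, 3))  # ACGT
```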

dna_complement() → Expr

Return the DNA complement (A↔T, C↔G).

dna_reverse_complement() → Expr

Return the reverse complement of a DNA sequence.
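
For reference, the reverse complement combines the two operations described above: complement each base, then reverse the result. A minimal plain-Python sketch, assuming uppercase input and leaving any other character (for example N) unchanged; the helper name is hypothetical:

```python
# Hypothetical helper for illustration only; not the library's implementation.
_DNA_COMPLEMENT = str.maketrans("ATGC", "TACG")


def reverse_complement_sketch(seq: str) -> str:
    """Complement each base (A<->T, C<->G), then reverse the string."""
    return seq.translate(_DNA_COMPLEMENT)[::-1]


print(reverse_complement_sketch("ATGC"))  # GCAT
print(reverse_complement_sketch("AAC"))   # GTT
```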

dna_to_rna() → Expr

Convert DNA sequences (A, T, G, C) to RNA sequences (A, U, G, C).

dna_transcribe() → Expr

Transcribe DNA to RNA (same as dna_to_rna).
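
Since both methods are described as the same conversion, the operation amounts to replacing T with U in the given strand (the coding-strand convention) rather than complementing a template strand. A plain-Python sketch under that reading, with a hypothetical helper name and an assumption of uppercase input:

```python
# Hypothetical helper for illustration only; not the library's implementation.
def dna_to_rna_sketch(seq: str) -> str:
    """Replace every T with U; all other characters are left as they are."""
    return seq.replace("T", "U")


print(dna_to_rna_sketch("ATGCTT"))  # AUGCUU
```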

gc_content() → Expr

Return GC content as a percentage.
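
The calculation can be sketched in plain Python as follows. This assumes the percentage is on a 0 to 100 scale and that an empty sequence yields 0.0; neither detail is confirmed by this page, and the helper name is hypothetical.

```python
# Hypothetical helper for illustration only; not the library's implementation.
def gc_content_percent_sketch(seq: str) -> float:
    """Share of G and C bases in seq, as a percentage of its length."""
    if not seq:
        return 0.0  # assumption: avoid division by zero for empty input
    gc = sum(1 for base in seq if base in "GC")
    return 100.0 * gc / len(seq)


print(gc_content_percent_sketch("ATGC"))  # 50.0
```

AT content follows the same pattern with "AT" in place of "GC".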

gc_skew() → Expr

Compute GC skew = (G - C) / (G + C).
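
A worked instance of the formula in plain Python. The formula is undefined when a sequence has no G or C; returning None in that case is an assumption made for this sketch, and the library may behave differently (for example by returning NaN or null). The helper name is hypothetical.

```python
from typing import Optional


# Hypothetical helper for illustration only; not the library's implementation.
def gc_skew_sketch(seq: str) -> Optional[float]:
    """Compute (G - C) / (G + C) from base counts in seq."""
    g = seq.count("G")
    c = seq.count("C")
    if g + c == 0:
        return None  # assumption: skew is undefined without any G or C
    return (g - c) / (g + c)


print(gc_skew_sketch("GGGC"))  # 0.5, from (3 - 1) / (3 + 1)
print(gc_skew_sketch("ATAT"))  # None
```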

hamming_distance(other: Expr) → Expr

Return Hamming distance between two equal-length sequences.

Parameters:
other : pl.Expr

Another expression containing sequences of equal length.
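
The Hamming distance is the number of positions at which two equal-length sequences differ. A plain-Python sketch of the definition; raising an error for unequal lengths is an assumption of this sketch, as the page does not say how the library handles that case, and the helper name is hypothetical.

```python
# Hypothetical helper for illustration only; not the library's implementation.
def hamming_distance_sketch(a: str, b: str) -> int:
    """Count positions where a and b hold different characters."""
    if len(a) != len(b):
        # Assumption: unequal lengths are rejected rather than truncated.
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


print(hamming_distance_sketch("GATTACA", "GACTATA"))  # 2
```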

insert_sequence(position: int, subseq: str) → Expr

Insert a subsequence at the given position.
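
The page does not state whether position is 0-indexed or whether the subsequence lands before or after that index. The plain-Python sketch below assumes a 0-indexed position with the subsequence inserted before the base currently at that index; the helper name is hypothetical.

```python
# Hypothetical helper for illustration only; not the library's implementation.
def insert_subsequence_sketch(seq: str, position: int, subseq: str) -> str:
    """Insert subseq so that it starts at index position (0-indexed)."""
    return seq[:position] + subseq + seq[position:]


print(insert_subsequence_sketch("ATGC", 2, "NN"))  # ATNNGC
```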

is_valid_dna() → Expr

Return True if the sequence only contains valid DNA bases (A, T, G, C, N).

is_valid_rna() → Expr

Return True if the sequence only contains valid RNA bases (A, U, G, C, N).
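
Both validity checks reduce to testing that every character belongs to an allowed alphabet. A plain-Python sketch, assuming case-sensitive matching against uppercase bases and treating an empty string as valid; these details are not confirmed here, and the helper names are hypothetical.

```python
# Hypothetical helpers for illustration only; not the library's implementation.
DNA_ALPHABET = frozenset("ATGCN")
RNA_ALPHABET = frozenset("AUGCN")


def is_valid_dna_sketch(seq: str) -> bool:
    """True if every character of seq is one of A, T, G, C, N."""
    return set(seq) <= DNA_ALPHABET


def is_valid_rna_sketch(seq: str) -> bool:
    """True if every character of seq is one of A, U, G, C, N."""
    return set(seq) <= RNA_ALPHABET


print(is_valid_dna_sketch("ATGCN"))  # True
print(is_valid_dna_sketch("AUGC"))   # False
print(is_valid_rna_sketch("AUGC"))   # True
```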

mutate_sequence(position: int, new_base: str) → Expr

Mutate a sequence by replacing one base at a given position (0-indexed).
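
A point mutation at a 0-indexed position can be sketched in plain Python as below. Raising IndexError for an out-of-range position is an assumption of this sketch; the library's handling of that case is not described here, and the helper name is hypothetical.

```python
# Hypothetical helper for illustration only; not the library's implementation.
def mutate_base_sketch(seq: str, position: int, new_base: str) -> str:
    """Replace the single base at the 0-indexed position with new_base."""
    if not 0 <= position < len(seq):
        # Assumption: out-of-range positions are rejected.
        raise IndexError("position is outside the sequence")
    return seq[:position] + new_base + seq[position + 1:]


print(mutate_base_sketch("ATGC", 2, "A"))  # ATAC
```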

repeat_sequence(n: int) → Expr

Repeat the sequence n times.

reverse_sequence() → Expr

Reverse a sequence string.

rna_to_dna() → Expr

Convert RNA sequences (A, U, G, C) to DNA sequences (A, T, G, C).

sequence_length() → Expr

Return the sequence length.