Bachelor Thesis from the year 2015 in the subject Computer Science -
Bioinformatics, Technical University of Munich, language: English,
abstract: Nuclear transport of proteins is a basic cellular mechanism
that precedes many biological processes. The classical transport
mechanism for nuclear proteins involves karyopherins importing and
exporting the proteins. The karyopherins typically recognize nuclear
transport signals in the protein sequence. Research on nuclear protein
transport focuses on three main types of nuclear localization signals
(NLS): monopartite, bipartite and PY-NLS. Studies on nuclear export
signals (NES) often investigate the specific type of leucine-rich
signals. The first goal of this thesis was to
update NLSdb, a database containing 114 experimental and 194 potential
NLS, to the current state of available data. To this end, a set of
2452 novel signals with published experimental evidence was extracted
from the literature and used as a development set. An in silico
mutagenesis approach was applied to this set to detect 4301 novel
potential NLS in nuclear proteins. We matched these potential NLS
against protein sequences without annotated subcellular localization to
identify nuclear proteins. We were able to confirm in the literature the
localization predicted by our potential NLS. In addition to the
collection of data, an extensive analysis of protein sequences
containing NLS and NES was performed to provide insights into the
subcellular localization of proteins and their occurrence in various
organisms. Clustering the NLS sequences separated the signals into
distinct sub-groups, each with a clearly defined consensus sequence.
Aligning potential NLS against the sub-groups refined the consensus
sequences. The results from this study
reflect the scientific progress, contribute further knowledge to the
field of nuclear transport and highlight the usefulness of
bioinformatics methods.
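
The in silico mutagenesis and matching steps described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis pipeline: the abstract does not state the filtering criterion, so the rule used here (keep a variant only if it occurs in at least one nuclear sequence and in no non-nuclear sequence) is an assumption, as are the function names and the toy protein sequences. Only the seed signal PKKKRKV, the well-known monopartite NLS of the SV40 large T antigen, is real.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def single_substitution_variants(signal):
    """Yield every sequence that differs from `signal` at exactly one position."""
    for i, original in enumerate(signal):
        for aa in AMINO_ACIDS:
            if aa != original:
                yield signal[:i] + aa + signal[i + 1:]


def potential_signals(signal, nuclear, non_nuclear):
    """Keep variants seen in a nuclear but in no non-nuclear sequence.

    This filter is an assumption made for illustration only.
    """
    kept = []
    for variant in single_substitution_variants(signal):
        in_nuclear = any(variant in seq for seq in nuclear)
        in_other = any(variant in seq for seq in non_nuclear)
        if in_nuclear and not in_other:
            kept.append(variant)
    return kept


def scan(sequence, signals):
    """Return sorted (position, signal) pairs for every occurrence in `sequence`."""
    hits = []
    for s in signals:
        start = sequence.find(s)
        while start != -1:
            hits.append((start, s))
            start = sequence.find(s, start + 1)
    return sorted(hits)


# Toy data: invented sequences, not taken from any database.
known_nls = "PKKKRKV"
nuclear = ["MSTPKKKRRVEDAL", "MAAPRKKRKVGG", "GGPKKKRKLTT"]
non_nuclear = ["MLLPKKKRKLQQ"]
unannotated = "MQEPKKKRRVAAPRKKRKVS"

candidates = potential_signals(known_nls, nuclear, non_nuclear)
print(candidates)
print(scan(unannotated, candidates))
```

Plain substring matching keeps the sketch short. A realistic pipeline would also have to handle gapped bipartite signals and redundancy between sequences, which this example ignores.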