Collection of scripts for bacterial genomics
A collection of scripts for bacterial genomics based on data from high-throughput sequencing (also known as next-generation sequencing); some might also be useful for eukaryotes.
calc_fastq-stats
cat_seq
cdd2cog
cds_extractor
ecoli_mlst
genomes_feature_table
ncbi_ftp_download
order_fastx
po2anno
po2group_stats
prot_finder
rename_fasta_id
revcom_seq
rod_finder
sam_insert-size
sample_fastx-txt
seq_format-converter
tbl2tab
trunc_seq
All the scripts here are written in Perl (some include bash shell wrappers).
Each script is hosted in its own folder, so that a separate README.md can be included for more information. In addition, all of the Perl scripts include a usage/help text or a comprehensive POD (Plain Old Documentation), which is shown when the script is called either without arguments/options or with the option -h|-help.
The scripts have only been tested under UNIX; some won't run in a Windows environment (because they call UNIX commands). If you are on Windows, Cygwin might be an alternative.
To download the repository, either use the 'Download ZIP' link after clicking the green 'Clone or download' button at the top, or clone the repository with git:
git clone https://github.com/aleimba/bac-genomics-scripts.git
If there is an update to this GitHub repository (see the commits and releases above), you can refresh your local repository by running the following command inside the local folder:
git pull
To install the scripts, copy them e.g. to a bin folder in your home directory that is in your PATH, and make them executable:
$ find . \( -name '*.pl' -o -name '*.sh' -o -name '*.fas' -o -name '*.txt' \) -exec cp {} ~/bin \;
$ chmod u+x ~/bin/*.pl
The scripts can then be run from anywhere on your system. Of course, you can also call them directly by prefixing the command with perl, or with './' for bash wrappers:
$ perl /path/to/script/script.pl <options>
or
$ ./script.sh <arguments>
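The install step above only works as described if ~/bin is actually listed in PATH. The following is a minimal sketch for checking this; the function name `in_path` is a hypothetical helper, and the suggestion to edit ~/.bashrc assumes you use bash as your login shell.

```shell
# Return success if the given directory is listed in PATH.
# (in_path is a hypothetical helper name, not part of this repository.)
in_path() {
    case ":$PATH:" in
        *":$1:"*) return 0 ;;
        *) return 1 ;;
    esac
}

if in_path "$HOME/bin"; then
    echo "$HOME/bin is already in PATH"
else
    # Assumption: bash is the login shell and reads ~/.bashrc.
    echo "$HOME/bin is not in PATH; consider adding this line to ~/.bashrc:"
    echo 'export PATH="$HOME/bin:$PATH"'
fi
```

Wrapping PATH in colons before matching avoids false hits on directories that merely share a prefix, such as ~/bin2.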
Single scripts can be downloaded as well. To do so, click on the folder you're interested in and then on the link of the script. There, click on the Raw button and save the page to a file (without Raw you'll get an unusable HTML file). The same applies to other files (e.g. PDFs).
All scripts are tested with Perl v5.22.1.
Most of the Perl scripts use modules from BioPerl, as stated in their respective README.md or POD, so BioPerl has to be installed on your system. For installation instructions see the BioPerl website (Installation).
Some scripts need additional Perl modules, which are stated in the associated README.md or POD. If they're not yet installed on your system, get them from CPAN (installation instructions can be found on the website, see e.g. Getting Started...Installing Perl Modules or FAQ).
Furthermore, some scripts call the statistical computing language R and dependent packages for plotting (again, see the respective README.md or POD).
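Before running a script, it can help to check whether these dependencies are present. The sketch below is one possible way to do that; `have_perl_module` and `have_program` are hypothetical helper names, and Bio::SeqIO is used only as a representative BioPerl module (an assumption; check each script's README.md or POD for the modules it really needs).

```shell
# Return success if the perl found in PATH can load the given module.
have_perl_module() {
    perl -M"$1" -e 1 2>/dev/null
}

# Return success if the given program is available in PATH.
have_program() {
    command -v "$1" >/dev/null 2>&1
}

# Assumption: Bio::SeqIO stands in for "BioPerl is installed".
for module in Bio::SeqIO; do
    if have_perl_module "$module"; then
        echo "Perl module $module: found"
    else
        echo "Perl module $module: missing"
    fi
done

# Rscript is the command-line front end that ships with R.
if have_program Rscript; then
    echo "R (Rscript): found"
else
    echo "R (Rscript): missing"
fi
```

Further module names can be appended to the `for` list, separated by spaces.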
A handy tip: if you want to run a script on all files in the current working directory, you can use a loop in UNIX, e.g.:
$ for file in *.fasta; do perl script.pl "$file"; done
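A slightly more defensive variant of this loop is sketched below as a dry run that only prints the commands it would execute. It assumes that the script writes its result to STDOUT, which is not true for every script here, and the `.out` suffix and the helper `out_name` are hypothetical choices for illustration.

```shell
# Derive an output file name from an input name by replacing the
# last extension with .out (hypothetical naming scheme).
out_name() {
    printf '%s.out\n' "${1%.*}"
}

for file in *.fasta; do
    # If no file matches, the pattern stays unexpanded; skip that case.
    [ -e "$file" ] || continue
    # Dry run: print the command instead of executing it.
    echo "would run: perl script.pl \"$file\" > \"$(out_name "$file")\""
done
```

Once the printed commands look right, the `echo` line can be replaced by the actual call.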
Finally, some of the scripts don't like Windows-formatted line breaks. Consider running such input files through the UNIX utility dos2unix:
$ dos2unix input
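To find out which input files are affected in the first place, one option is to search them for carriage return characters. This is a minimal sketch; `has_crlf` is a hypothetical helper name, and the `*.fasta` pattern is only an example.

```shell
# Return success if the file contains at least one carriage return (CR),
# the extra character in Windows (CRLF) line breaks.
has_crlf() {
    cr=$(printf '\r')
    grep -q "$cr" "$1"
}

for file in *.fasta; do
    # If no file matches, the pattern stays unexpanded; skip that case.
    [ -e "$file" ] || continue
    if has_crlf "$file"; then
        echo "$file has Windows line breaks; consider: dos2unix \"$file\""
    fi
done
```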
For now cite the latest major release (tag: bovine_ecoli_mastitis) hosted on Zenodo:
Leimbach A. 2016. bac-genomics-scripts: Bovine E. coli mastitis comparative genomics edition. Zenodo. http://dx.doi.org/10.5281/zenodo.215824.
Also, all scripts have a version number (see option -v), which might be included in a materials and methods section.
All scripts are licensed under GPLv3, which is contained in the file LICENSE.
For help, suggestions, bugs etc. use the GitHub issues or write an email to aleimba [at] gmx [dot] de.
Andreas Leimbach (Microbial Genome Plasticity, Institute of Hygiene, University of Muenster)