python3 -m pip install -r requirements.txt
Then try python3 -m pip gurobipy
to install python support for the gurobi
solver.
In OSX you may need to install girubi
support for python manualy by executing this command: python3 /Library/gurobiXXX/mac64/setup.py install
, where XXX
is the version installed in your computer.
The files nosqlinjection_projects.txt
, sqlinjection_projects.txt
, and xss_projects.txt
contains each one a
a list of datatabases to be fetched from the LGTM site.
Run python3 -m misc.scrape -dld [project-slug] -o [outputdirectory]
where
project-slug
is one database listed in the three aforementioned files (e.g. 1046224544/fontend
). The result of the script is a zip file (e.g.,outputdirectory/1046224544-fontend.zip
) will be placed in the folder outputdirectory
that must exist beforehand.
Finally unzip the zip file corresponding to the downloaded database (e.g.,:output/1046224544-fontend.zip
)
To run the analysis pipeline there are currently two options:
- Using the Orquestator to run the analysis end-to-end or indivitual steps
- Execute indivirual the scripts
The Orchestrator
can be used to execute each phase of the analysis pipeline.
The pipeline at the moment has the following steps implemented:
generate_entities
: Generaterepr
functions for sources/sinks/sanitizers, propagation graph nodes and edges and a node-to-repr
mapping.generate_model
: Generategurobi
model to optimize.optimize
: Rungurobi
with the model generated in thegenerate_model
step.generate_scores
: Generate scores info for sinks, sources and sanitizers. This will leave the scores in the folderdata/[db-name]/*tsmworse-*.prop.csv
These steps can be executed individually or all together in an end-to-end runner. You can use the orchestrator in code, or with it's CLI. The latter one is located in main.py
.
First, configure the config.json
file, which has to be located at the constraintsolving/
root dir. It has the following properties:
{
"codeQLExecutable": "absolute path to the codeql executable",
"codeQLSourcesRoot": "absolute path to this project's root directory (where the `.git` folder lives)",
"workingDirectory": "absolute path to the working directory",
"resultsDirectory": "absolute path to the results dir"
}
Then, you can either run a single step of the pipeline:
# Run the whole pipeline:
python main.py --project-dir output/abhinavkumarl-bidding-system/ --query-type Xss --query-name DomBasedXssWorse
For instance this command will run the generate_scores
step
# Execute one step:
python main.py --project-dir output/abhinavkumarl-bidding-system/ --single-step generate_scores --query-type Xss --query-name DomBasedXssWorse
To see more options or get help from the CLI:
python main.py --help
Follow the steps below to prepare the environment:
- Let's say the CodeQL database has bees stored in
output/1046224544_fontend_19c10c3
- Set the following environment variables:
CODEQL=
path tocodelql
bynary (e.g.,/home/tools/semmle/codeql-cli-atm-home/codeql/codeql
)CODEQL_SOURCE_ROOT=
path to theql
queries root (e.g.,/home/dev/microsoft/ql
)QUERY_TYPE=
query type, one of [Xss
,NoSql
,Sql
]QUERY_NAME=
query name, one of [NosqlInjectionWorse
,SqlInjectionWorse
,DomBasedXssWorse
]
Then invoke python3 -m generation.main --step entities --project-dir [projectDir]
where projectDir
is the name of the resulting folder after of the unzipped database and the output folder (e.g.,output/1046224544_fontend_19c10c3
) or simply invoke ./generateData.sh [projectDir]
.
This module will orchestrate all the calls to the CodeQL
toolchain, to generate and extract:
- Sources, sinks and sanitizers information
- Propagation graph
- A seldon-like repr mapping
Now, we will generate the constraints out of the information generated in the previous phase and send invoke the gurobi optimizer.
To produce the constraints perform python3 main.py --mode [project] -g
, where project is the name of the project to analyze (e.g.: 1046224544_fontend_19c10c3
)
This will obtain the information from the CVS files in the data
folder generate all constraints in the constraints/projectdir
folder (e.g.,constraints/1046224544_fontend_19c10c3/...
)
To analyze several projects use the option --mode combined
and add the parameter --project_folders [folder]
to indicate the folder containing all the projects to be analyzed.
To run the solver perform: python3 main.py --mode [projectdir] -s
This will generate results in results/projectdir/[query-name+timestamp]
and the gurobi model in models/projectdir/[query-name+timestamp]
folder.
results/projectdir/[query-name=timestamp]
will contain reprScores.txt
which needs to be added to ql/javascript/ql/src/TSM/tsm_[query-type]_worse.qll
file or use the next steps to combine scores
Use cd misc; python3 combinescores.py
to combine the scores from each database.
Copy the scores to the target query-type
: codeql/javascript/ql/src/TSM/tsm_[query-type]_worse.qll
For each database, run
python3 -m generation.main --step scores --project-dir [projectDir]
where projectDir
is the name of the resulting folder after of the unzipped database and the output folder (e.g.,output/1046224544_fontend_19c10c3
).
This will generate csv files scoring all events in data/[db-name]/*tsmworse-*.prop.csv
Use python3 generateMetrics.py [projectList]
where projectList
is one of these files nosqlinjection_projects.txt
, sqlinjection_projects.txt
, and xss_projects.txt
that contains list of projects. This script will compute precision and recall across different thresholds for the query type.